### Scoping My Data
From what I've been learning through Codecademy, it seems best to look over the data first and then plan my analysis accordingly. Some questions come to mind immediately, but as the owner of this data, I can answer them without doing any real analysis. For example:

**On average, what month(s) incurred the most electricity costs?** This one is fairly easy to answer, even for someone who has never seen this data. Summer months in Central California almost always use the most electricity of the year. The same can be said of the **gas** bill and the winter months.

#### About the Data 

This is pretty self-explanatory: at the end of each month, I grabbed the bill for each utility type. I then put them in a spreadsheet, totalled them, and divided by the number of occupants living in the household. When it comes to occupants, I *could* provide context here, but for my current needs it would be a waste of time. There will also be instances where the data does not align with what you might expect. Unfortunately, I did not document the months where I received credits on bills. Also undocumented are the few months in 2021 where I had a crypto mining rig eating up about $50 in electricity each month.

With these small cases, I will be sure to provide a tidbit of information for clarity (where I can). 

Because I will be analysing this data multiple times throughout my DS journey, I think it's best that I start with simple questions. As I learn more advanced methods of analysis, I can then add harder ones. For example, taking the average cost of each utility type over each year and then comparing them will be much easier than doing the same but also calculating the average cost *per occupant*, and then adjusting values based on that (that actually seems pretty fun as I'm typing it out). Doing it the easy way, unfortunately, will most likely produce skewed results. But again, this is a subjective set of data and the accuracy isn't very important.

By the end of this, I only hope to get a better awareness of where costs go each month. I will not be using this data to make "better business decisions." It will be an amusing project that will assist in my data science journey!

#### So my first question will be simple: What was the average cost of each utility *per year*? Secondly, which year had the highest cost for *each* utility? Finally, which year was the most expensive *in total* utilities?

***

First, of course, I need to import the data from the CSV. Initially, I'll be using Python's `csv` module. Why not use something like a pandas DataFrame? Simply because I'm still very much a beginner. Eventually, I will be using the more advanced methods.

In [363]:
import csv

Next, I'll be using `DictReader` to grab all the data. I'll store each row in a list for the scope of this part of the project.

In [364]:
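cost = []

# Read every row of the CSV into a dict keyed by the header names.
# newline='' is what the csv module documentation recommends when opening files.
with open('cost_distribution.csv', newline='') as data:
    reader = csv.DictReader(data)
    for row in reader:
        cost.append(row)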

More Data Stuff:

I'll need to add the year to each row, and then remove the data for 2022. This will be fun. There's probably a much more efficient way of doing this, but for practice's sake, I'll be using nested loops here.

I will also be stripping the 2022 data rather crudely: I will remove any row with 9 items in it, as the rows with a year added to them will have 10.

### Special Thanks to [EddisFargo](https://github.com/EddisFargo) for helping me with this :) 

In [365]:
# Tag the first four years of rows (12 months each) with their year.
current_year = 2018
index = 0
while current_year < 2022:
    count = 0
    while count < 12:
        cost[index]["Year"] = current_year
        count += 1
        index += 1
    current_year += 1

# Drop the 2022 rows: they never received a "Year" key, so they still have 9 items.
# Filtering (instead of `del cost[-4:]`) makes this cell safe to run more than once.
cost = [row for row in cost if len(row) == 10]

print(cost)

[{'Month': 'January', 'Utility': '$22.19', 'Gas': '$51.42', 'Electricity': '$58.52', 'Water': '$43.85', 'Internet': '$59.99', 'TOTAL': '$235.97', 'Occupants': '2', 'EACH': '$117.99', 'Year': 2018}, {'Month': 'February', 'Utility': '$55.41', 'Gas': '$68.60', 'Electricity': '$60.61', 'Water': '$49.14', 'Internet': '$59.99', 'TOTAL': '$293.75', 'Occupants': '2', 'EACH': '$146.88', 'Year': 2018}, {'Month': 'March', 'Utility': '$55.41', 'Gas': '$46.63', 'Electricity': '$29.75', 'Water': '$65.13', 'Internet': '$59.99', 'TOTAL': '$256.91', 'Occupants': '2', 'EACH': '$128.46', 'Year': 2018}, {'Month': 'April', 'Utility': '$55.41', 'Gas': '$19.07', 'Electricity': '$56.67', 'Water': '$62.32', 'Internet': '$59.99', 'TOTAL': '$253.46', 'Occupants': '3', 'EACH': '$84.49', 'Year': 2018}, {'Month': 'May', 'Utility': '$55.41', 'Gas': '$19.07', 'Electricity': '$56.67', 'Water': '$62.32', 'Internet': '$59.99', 'TOTAL': '$253.46', 'Occupants': '3', 'EACH': '$84.49', 'Year': 2018}, {'Month': 'June', 'Util
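
As for the "more efficient way" of adding the year: one option, sketched here under the assumption that the rows are in chronological order with exactly 12 per year, is integer division on the row index. The `tag_years` helper and the stand-in `rows` list are hypothetical names for illustration, not part of the notebook above.

```python
# Hypothetical stand-in for `cost`: 26 rows, i.e. two full years plus 2 extra months.
rows = [{"Month": f"m{i % 12 + 1}"} for i in range(26)]

START_YEAR = 2018  # first year in the data
END_YEAR = 2020    # first year to drop (this would be 2022 for the real data)


def tag_years(rows, start_year, end_year):
    """Return copies of the rows with a "Year" key, dropping rows from end_year on."""
    tagged = []
    for i, row in enumerate(rows):
        year = start_year + i // 12  # every block of 12 rows is one year
        if year < end_year:
            tagged.append({**row, "Year": year})
    return tagged


tagged = tag_years(rows, START_YEAR, END_YEAR)
print(len(tagged), tagged[0]["Year"], tagged[-1]["Year"])
```

Because this builds a new list rather than deleting from the old one, running it twice gives the same result.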

There's more that could be done here, like verifying the TOTAL and EACH columns. Since I double-checked in the spreadsheet and saw that each is the result of a calculation and not a raw input, I trust these values. For time's sake I will not be doing that, but later on I think it will be something I can work on.
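
If I do come back to that check, it might look something like the sketch below. It assumes TOTAL is the sum of the five utility columns and EACH is TOTAL divided by Occupants, which matches the first two rows of the output above (used here as sample data). The `dollars` and `row_is_consistent` helpers and the one-cent tolerance are my own hypothetical choices.

```python
UTILS = ["Utility", "Gas", "Electricity", "Water", "Internet"]


def dollars(text):
    """Convert a string like '$22.19' to a float."""
    return float(text.strip("$"))


def row_is_consistent(row, tol=0.01):
    """Check TOTAL and EACH against values recomputed from the utility columns."""
    total = sum(dollars(row[u]) for u in UTILS)
    each = total / int(row["Occupants"])
    # Allow up to a cent of difference to absorb the spreadsheet's rounding.
    return (abs(total - dollars(row["TOTAL"])) < tol
            and abs(each - dollars(row["EACH"])) < tol)


# The first two rows from the printed output above.
sample = [
    {"Month": "January", "Utility": "$22.19", "Gas": "$51.42",
     "Electricity": "$58.52", "Water": "$43.85", "Internet": "$59.99",
     "TOTAL": "$235.97", "Occupants": "2", "EACH": "$117.99"},
    {"Month": "February", "Utility": "$55.41", "Gas": "$68.60",
     "Electricity": "$60.61", "Water": "$49.14", "Internet": "$59.99",
     "TOTAL": "$293.75", "Occupants": "2", "EACH": "$146.88"},
]

bad_rows = [row["Month"] for row in sample if not row_is_consistent(row)]
print(bad_rows)
```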

**So to my first question: What was the average cost of each utility per year?**

For this, I think I want to write a function, since I haven't written one so far in this project. It will take the utility type as its parameter, add each year's average for that utility type to a dict, and then return that dict. I will be able to look up a specific year by its key.

In [366]:
def util_avg(utility):
    """Print and return the average monthly cost of `utility` for each year."""
    averages = {}  # maps year -> average monthly cost
    current_year = 2018
    index = 0
    while current_year < 2022:
        util_sum = 0
        count = 0
        while count < 12:
            util_sum += float(cost[index][utility].strip('$'))
            count += 1
            index += 1
        # Compute the average once, after all 12 months are summed.
        rounded_avg = round(util_sum / 12, 2)
        averages[current_year] = rounded_avg
        print("The average {util} price in {year} was ${cost}.".format(util=utility, year=current_year, cost=rounded_avg))
        current_year += 1
    return averages


This is great for finding averages for *each* utility, but for this question, I'd like to get them all in one fell swoop. However, I'm going to just switch the year and utility, so I'll be passing the year as the parameter this time. This way I only have to call the function four times (once per year, 2018 through 2021).

### NOTE: This was not the first iteration of this function. I've spent several hours figuring out different loops, functions, parameters, etc. to get the averages for ALL years and utilities within one function. I even tried using a list of utility types as a parameter and then getting all the averages based on that (future-proofing in case I want to get averages for only certain groups of utilities). Iterating through years is much easier for me at this time. Since I won't be using this function for anything else, I'll just print the output of each average instead of returning the data in lists/dicts.

## UPDATE: I've decided to just stick with what I have, because now that I'm learning pandas/dataframes, I'm finding they have ways to do all of this so much easier. So for now, I will answer my questions with the methods I've been using. And I will start a new notebook using Pandas and DataFrame methods. 
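
For reference, the "all years and utilities in one function" idea can also be sketched in plain Python (no pandas) by grouping on the `Year` key. This is only an illustration: `all_averages` is a hypothetical name, and the sample rows are just three months of 2018 taken from the output earlier, so the numbers it prints are not real yearly averages.

```python
from collections import defaultdict


def all_averages(rows, utils):
    """Return {utility: {year: average}} from a single pass over the rows."""
    sums = defaultdict(lambda: defaultdict(float))  # utility -> year -> running sum
    counts = defaultdict(int)                       # year -> number of rows seen
    for row in rows:
        counts[row["Year"]] += 1
        for util in utils:
            sums[util][row["Year"]] += float(row[util].strip("$"))
    return {
        util: {year: round(total / counts[year], 2) for year, total in by_year.items()}
        for util, by_year in sums.items()
    }


# Three sample rows (January to March 2018) from the printed output.
sample = [
    {"Month": "January", "Gas": "$51.42", "Water": "$43.85", "Year": 2018},
    {"Month": "February", "Gas": "$68.60", "Water": "$49.14", "Year": 2018},
    {"Month": "March", "Gas": "$46.63", "Water": "$65.13", "Year": 2018},
]

result = all_averages(sample, ["Gas", "Water"])
print(result)
```

Passing a shorter list of utilities gives averages for just that group, and the function does not need to know how many months each year has.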

### Q1: What was the average cost of each utility per year?

Let's call the function once for each utility and find out.

In [367]:
utils = ["Utility", "Gas", "Electricity", "Water", "Internet"]

for util in utils:
    util_avg(util)
    print(" ")

The average Utility price in 2018 was $53.49.
The average Utility price in 2019 was $55.47.
The average Utility price in 2020 was $55.49.
The average Utility price in 2021 was $55.58.
 
The average Gas price in 2018 was $29.24.
The average Gas price in 2019 was $42.95.
The average Gas price in 2020 was $34.26.
The average Gas price in 2021 was $34.38.
 
The average Electricity price in 2018 was $87.58.
The average Electricity price in 2019 was $158.03.
The average Electricity price in 2020 was $139.14.
The average Electricity price in 2021 was $163.72.
 
The average Water price in 2018 was $58.75.
The average Water price in 2019 was $41.7.
The average Water price in 2020 was $43.48.
The average Water price in 2021 was $47.74.
 
The average Internet price in 2018 was $59.99.
The average Internet price in 2019 was $81.79.
The average Internet price in 2020 was $61.83.
The average Internet price in 2021 was $80.0.
 


### Q2: Which year had the highest cost for each utility?

This should be straightforward (should be): I can loop through each utility and grab the highest value.

In [368]:
def highest_cost_year(utility):
    """Print the year and month in which `utility` had its highest monthly cost."""
    base_cost = 0
    year = 0
    month = 0
    for i in range(len(cost)):
        new_cost = float(cost[i][utility].strip("$"))
        # Keep track of the largest cost seen so far and when it occurred.
        if new_cost > base_cost:
            base_cost = new_cost
            year = cost[i]["Year"]
            month = cost[i]["Month"]
    print("The highest {util} cost occurred in {year}. Its cost was ${cost} in the month of {month}.".format(util=utility, year=year, cost=base_cost, month=month))

Now we run this function to find out in which year and month each utility cost the most.

In [369]:
utils = ["Utility", "Gas", "Electricity", "Water", "Internet"]

for util in utils:
    highest_cost_year(util)

The highest Utility cost occurred in 2018. Its cost was $65.41 in the month of June.
The highest Gas cost occurred in 2019. Its cost was $115.41 in the month of February.
The highest Electricity cost occurred in 2019. Its cost was $347.2 in the month of August.
The highest Water cost occurred in 2019. Its cost was $75.04 in the month of August.
The highest Internet cost occurred in 2019. Its cost was $91.95 in the month of April.


Boy was 2019 expensive. This was probably due to the introduction of a 4th person who may or may not have been a utility hog. 
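
As a side note on the Q2 loop, Python's built-in `max` accepts a `key` function, which could replace the manual bookkeeping. A minimal sketch, using four 2018 rows from the earlier output as sample data (`highest_cost_row` is a hypothetical helper, not something defined above):

```python
def highest_cost_row(rows, utility):
    """Return the row in which `utility` has its largest dollar value."""
    return max(rows, key=lambda row: float(row[utility].strip("$")))


# January to April 2018, trimmed to two utility columns.
sample = [
    {"Month": "January", "Gas": "$51.42", "Water": "$43.85", "Year": 2018},
    {"Month": "February", "Gas": "$68.60", "Water": "$49.14", "Year": 2018},
    {"Month": "March", "Gas": "$46.63", "Water": "$65.13", "Year": 2018},
    {"Month": "April", "Gas": "$19.07", "Water": "$62.32", "Year": 2018},
]

for util in ["Gas", "Water"]:
    top = highest_cost_row(sample, util)
    print(util, top["Month"], top["Year"], top[util])
```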

Now for the last question!

### Q3: Which year was the most expensive in TOTAL utilities?

In [373]:
current_year = 2018
base_total = 0
index = 0
year_of_total = 0
while current_year < 2022:
    new_total = 0
    count = 0
    # Sum the 12 monthly totals for the current year.
    while count < 12:
        new_total += float(cost[index]["TOTAL"].strip("$"))
        index += 1
        count += 1
    # Compare only complete yearly totals, not partial running sums.
    if new_total > base_total:
        base_total = new_total
        year_of_total = current_year
    current_year += 1
tot_avg = round(base_total / 12, 2)

print("{year} was the most expensive year, with total utilities costing ${cost} at an average monthly cost of ${avg}.".format(cost=base_total, year=year_of_total, avg=tot_avg))

2021 was the most expensive year, with total utilities costing $4577.09 at an average monthly cost of $381.42.


Very anti-climactic if I do say so myself. It makes sense that things cost more as time went on. Unlike the previous question, I opted to answer the third question directly inside the loop, keeping only the highest yearly total and discarding the totals for all other years.

I could calculate the year-over-year percentage change in cost next if I wanted to, but I believe this is it. I did it!
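
For when I do get to the year-over-year idea, here is a rough sketch. It assumes a dict that maps each year to a value, which is the shape described for `util_avg`, and it uses the Electricity averages printed under Q1 as input. The `pct_change_by_year` name is hypothetical.

```python
def pct_change_by_year(values_by_year):
    """Return {year: percent change versus the previous year}."""
    years = sorted(values_by_year)
    changes = {}
    # Pair each year with the one before it; the first year has no predecessor.
    for prev, curr in zip(years, years[1:]):
        old, new = values_by_year[prev], values_by_year[curr]
        changes[curr] = round((new - old) / old * 100, 2)
    return changes


# Average monthly Electricity cost per year, copied from the Q1 output.
electricity_avg = {2018: 87.58, 2019: 158.03, 2020: 139.14, 2021: 163.72}

changes = pct_change_by_year(electricity_avg)
for year, change in changes.items():
    print(f"{year}: {change:+.2f}%")
```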