# Goals of this assignment
-------------------------

* Demonstrate that you can access specific rows, columns, and entries of a data frame
* Demonstrate that you can iterate through a list (a particular row or column, or a set of rows or columns)
* Demonstrate a filtering pattern using if-then-else
* Demonstrate that you can reconstruct an accumulator pattern
* Write a function to make use of the same block of code over and over again.
* Understand how indentation is used in Python to indicate logical and functional blocks (and compare to how this is done in other languages)


# The Dataset
--------------

This dataset is also taken from Zillow and is formatted similarly, but it covers median prices for 3-bedroom *rentals* in each state over the past eight years. There is more missing data, so we will have to handle it differently this time. You will be doing the same kinds of things with this data; the goal is to be sure everyone is clear on how to access the data and how to use it.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('../data/State_MedianRentalPrice_3Bedroom.csv', index_col='RegionName')

In [None]:
print(df.columns)
months = df.columns[1:]
print(months)

# How many entries are missing?
--------------------------

In [None]:
for state in df.index:
    # count the missing (NaN) entries among this state's monthly prices
    print('{:22}{:2d}'.format(state, df.loc[state, months].isna().sum()))

In [None]:
len(months)

# A lot of the data are missing

Of the 105 months in the dataset, more than half of the entries are missing for some of the states, and most of the states have some missing entries. So we're going to have to do something different from what we did in project 3, where we just dropped the one state that had any missing entries. This time we have to ignore just the entries that are missing, which means that the number of data values can differ from state to state.
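As a minimal sketch of what "ignoring just the missing entries" means, the cell below counts the usable values per state on a tiny made-up data frame. The names `toy`, `StateA`, `StateB` and all of the prices are hypothetical and are not taken from the Zillow file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two made-up states and three made-up months
toy = pd.DataFrame(
    {
        '2010-01': [1000.0, np.nan],
        '2010-02': [np.nan, np.nan],
        '2010-03': [1200.0, 900.0],
    },
    index=['StateA', 'StateB'],
)

# notna() marks the usable entries; summing across the columns counts them per state
valid_counts = toy.notna().sum(axis=1)
print(valid_counts)
```

Each state ends up with its own count of usable values, so any average has to be divided by that count rather than by the total number of months.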



# Task 1 - accessing the data
----------


Demonstrate that you can access specific rows, columns, and entries of a data frame

In separate cells, print:

 1. Every entry for the state of Wyoming
 2. Every entry for the month of March, 2014
 3. The entry for New York in March of 2014
 4. The entry for Rhode Island in May of 2012
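
The three kinds of access can be sketched on a small made-up data frame. This assumes, as in the loading cell, that the state names form the index and that the first column is not a month (called `SizeRank` here as an assumption). All names and values in `toy` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data shaped like the assignment's data frame
toy = pd.DataFrame(
    {
        'SizeRank': [1, 2],              # assumed non-month first column
        '2010-01': [1000.0, np.nan],
        '2010-02': [1100.0, 950.0],
    },
    index=pd.Index(['StateA', 'StateB'], name='RegionName'),
)

row = toy.loc['StateA']                  # one row: a Series indexed by column name
column = toy['2010-02']                  # one column: a Series indexed by state
entry = toy.loc['StateB', '2010-02']     # one entry: row label, then column label

print(row)
print(column)
print(entry)
```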
 ----------

# Task 2 - looping through the data
----------

Demonstrate that you can iterate through a list (a particular row or column, or a set of rows or columns)

 1. Make a for loop that accesses every entry for Arizona and prints it out
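
A loop over one row might look like the sketch below, which again uses a hypothetical toy data frame rather than the real file. The list `seen` is only there so the visited values can be inspected afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two made-up states and two made-up months
toy = pd.DataFrame(
    {'2010-01': [1000.0, np.nan], '2010-02': [1100.0, 950.0]},
    index=['StateA', 'StateB'],
)
months = toy.columns

seen = []
for month in months:
    value = toy.loc['StateB', month]   # one entry per pass through the loop
    seen.append((month, value))
    print(month, value)
```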

-----------

# Task 3 - filtering the data
---------

Demonstrate a filtering pattern using if-then-else

 1. Make a for loop that accesses every entry for California.
 2. If the entry is "nan" it should print the month and "Missing"
 3. If the value is a number AND it is greater than $2000 it should print the month and "Expensive"
 4. Otherwise, it should print the month and "Less than 2000"
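
The three-way test above can be sketched with a small helper applied to made-up prices. The function name `label_price` and the sample values are hypothetical. The missing-value check comes first because any comparison with `nan` evaluates to `False`.

```python
import numpy as np
import pandas as pd

def label_price(value):
    """Hypothetical helper: classify one price as Missing, Expensive, or Less than 2000."""
    if pd.isna(value):
        return 'Missing'
    elif value > 2000:
        return 'Expensive'
    else:
        return 'Less than 2000'

# Made-up prices for three made-up months
prices = pd.Series([np.nan, 2500.0, 1800.0],
                   index=['2010-01', '2010-02', '2010-03'])

for month in prices.index:
    print(month, label_price(prices[month]))
```

Note that a value of exactly 2000 falls into the final branch, since the test is strictly "greater than".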

----------- 

# Task 4 - filtering the data again
-----------

Demonstrate a filtering pattern using if-then-else

   1. Make a for loop that accesses every state during the month of October 2010
   2. If the entry is "nan" it should print the state and "Missing"
   3. If the value is a number AND it is greater than $2000 it should print the state and "Expensive"
   4. Otherwise, it should print the state and "Less than 2000"

-----------

# Task 5 - accumulating the data
----------

Demonstrate that you can reconstruct an accumulator pattern in the presence of missing values

 1. Compute the average national price during the month of October 2010, but you should ignore states that don't have data
 2. So construct an accumulator pattern that initializes three holder variables: one to hold the running total of prices, a second to hold the running count of valid data points, and a third to hold the running count of invalid data points
 3. Compute and output the average price as well as the number of valid and invalid data points

-----------

# Task 6 - Functions
-----------------------

The bit of code above works for getting information about a particular month, but it's inconvenient to use it to get information about every month. So we're going to wrap that work in a function, which lets us call on that capability repeatedly.

------------------------

In [None]:
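def monthly_avg(df, month):
    """Print the average price across states for one month, skipping missing values."""
    total_price = 0
    valid_data = 0
    invalid_data = 0

    # accumulate over every state, counting missing entries separately
    for state in df.index:
        new_price = df.loc[state, month]
        if pd.isna(new_price):
            invalid_data = invalid_data + 1
        else:
            valid_data = valid_data + 1
            total_price = total_price + new_price

    # avoid dividing by zero when every state is missing for this month
    if valid_data == 0:
        print('{:8}{:>7} {:2} {:2} {:2}'.format(month, 'n/a',
                valid_data, invalid_data, len(df.index)))
        return

    print('{:8}{:06.2f} {:2} {:2} {:2}'.format(month,
                round(total_price/valid_data, 2),
                valid_data, invalid_data, len(df.index)))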


In [None]:
monthly_avg(df, '2012-09')

In [None]:
for month in months:
    monthly_avg(df, month)
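
One way to sanity-check the hand-written accumulator is to compare it against pandas' own summary methods, which skip missing values by default. The sketch below uses a hypothetical one-column toy data frame; the same three calls should work on any single month column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three made-up states, one made-up month
toy = pd.DataFrame(
    {'2010-10': [1000.0, np.nan, 2000.0]},
    index=['StateA', 'StateB', 'StateC'],
)

column = toy['2010-10']
average = column.mean()          # mean() skips NaN by default
valid = column.count()           # count() counts only the non-missing entries
invalid = column.isna().sum()    # number of missing entries

print(average, valid, invalid)
```

If the accumulator and these built-ins disagree for a month, the loop logic is the first place to look.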