# Homework 03 - Plotting

### Spark 010, Spring 2024

We will work with the `zcounty.csv` file (in the `data` folder), which was downloaded from the [Zillow Research data](https://www.zillow.com/research/data/) page.

The columns we will use are `RegionName` (county name), `State`, and every column whose label is a date. There is one such column for every month from Jan 2000 through Jan 2024, and each holds every county's median home value for that month. For example, the column labeled `2000-02-29` holds the median home values for February 2000.
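One way to pick out the date-labeled columns programmatically is to try parsing each column label as a date and keep the ones that succeed. The sketch below uses a small hypothetical table shaped like the Zillow file; the county names and numbers are made up, not taken from the real data.

```python
import pandas as pd

# Hypothetical toy table shaped like the Zillow file: two ID columns,
# then one column per month labeled by that month's last day.
toy = pd.DataFrame({
    "RegionName": ["A County", "B County"],
    "State": ["CA", "CA"],
    "2000-01-31": [150.0, 90.0],
    "2000-02-29": [151.0, 91.0],
})

# Try to parse every column label as a date; labels that are not dates become NaT.
parsed = pd.to_datetime(pd.Series(toy.columns), format="%Y-%m-%d", errors="coerce")
date_cols = toy.columns[parsed.notna().to_numpy()]
print(list(date_cols))  # ['2000-01-31', '2000-02-29']
```

On the real table, the same two lines applied to `Counties.columns` should list only the month columns.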


In [None]:
# just run me
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option("display.max_columns",None)
pd.set_option("display.max_rows",None)
pd.options.display.width = 0
pd.options.display.max_colwidth = 100

In [None]:
# read the csv file
Counties = pd.read_csv("data/zcounty.csv")

Let's drop the columns we aren't going to use. This will make the table more readable.

In [None]:
Counties = Counties.drop(columns = ["Metro",'RegionType','SizeRank','RegionID','StateCodeFIPS','MunicipalCodeFIPS','StateName'])
Counties.head()

There appears to be one column for every month from Jan 2000 through Jan 2024. Let's check the table's dimensions (rows, columns).

In [None]:
Counties.shape
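As a sanity check on the column count: Jan 2000 through Jan 2024 inclusive is 24 × 12 + 1 = 289 months. So, assuming `RegionName` and `State` are the only non-date columns left after the drop, `Counties.shape` should report 2 + 289 = 291 columns. A quick sketch that counts the months with pandas:

```python
import pandas as pd

# One monthly period for every month from Jan 2000 through Jan 2024, inclusive.
months = pd.period_range("2000-01", "2024-01", freq="M")
print(len(months))            # 289
print(months[0], months[-1])  # 2000-01 2024-01
```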

Let's narrow our search to only include counties in California.

In [None]:
CA_counties = Counties[Counties['State'] == 'CA']
CA_counties.head()
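The filter above works in two steps: `Counties['State'] == 'CA'` builds a column of `True`/`False` values (one per row), and indexing the table with that column keeps only the `True` rows. A small sketch on a hypothetical toy table:

```python
import pandas as pd

# Hypothetical toy table standing in for Counties.
toy = pd.DataFrame({
    "RegionName": ["A County", "B County", "C County"],
    "State": ["CA", "NV", "CA"],
})

mask = toy["State"] == "CA"   # a Series of True/False, one entry per row
print(mask.tolist())          # [True, False, True]

ca_only = toy[mask]           # keeps only the rows where mask is True
print(ca_only["RegionName"].tolist())  # ['A County', 'C County']
```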

### Question 1: 

Which county had the highest median home value in February 2000? Feel free to use Google, or even ChatGPT, for help.
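If you want a hint: pandas' `idxmax` returns the row label of a column's largest value (skipping `NaN` by default), and `.loc` can then look up that row's county name. Here it is on a hypothetical toy table with made-up numbers, so it does not give away the answer for the real data:

```python
import pandas as pd

# Hypothetical toy table; the values are made up.
toy = pd.DataFrame({
    "RegionName": ["A County", "B County", "C County"],
    "State": ["CA", "CA", "CA"],
    "2000-02-29": [151.0, 420.0, 98.0],
})

# Row label (index) of the largest value in the February 2000 column...
best_row = toy["2000-02-29"].idxmax()
# ...used with .loc to read that row's county name.
print(toy.loc[best_row, "RegionName"])  # B County
```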

In [None]:
# Put your answer here

In [None]:
# End Question 1

Let's isolate the rows for Merced County and LA County.

In [None]:
Merced = CA_counties[CA_counties['RegionName'] == 'Merced County']
LA = CA_counties[CA_counties['RegionName'] == 'Los Angeles County']

In [None]:
Merced.head(10)

It looks like there is no Merced County data for the year 2000: those month columns show `NaN` (missing values).
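To see how much is missing and where the data starts, you can count the `NaN` entries and ask for the first non-missing month with `first_valid_index`. The sketch below uses a hypothetical one-row toy table; on the real data the same calls should work on `Merced.iloc[0, 2:].astype(float)`, assuming `RegionName` and `State` are the first two columns.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row toy table with two missing months.
toy = pd.DataFrame({
    "RegionName": ["X County"],
    "State": ["CA"],
    "2000-01-31": [np.nan],
    "2000-02-29": [np.nan],
    "2000-03-31": [88.0],
})

# Take the single row, keep only the month columns, and convert to numbers.
values = toy.iloc[0, 2:].astype(float)
print(values.isna().sum())          # 2
print(values.first_valid_index())   # 2000-03-31
```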

In [None]:
LA.head(10)

### Question 2:

In which month was LA County's median home value highest?
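A hint for this question: if you take a single county's row and keep only the month columns, `idxmax` on that Series returns the column label, i.e. the month, of the largest value. Converting to `float` first avoids comparing values stored with the generic `object` dtype. A sketch on a hypothetical toy row with made-up numbers:

```python
import pandas as pd

# Hypothetical one-row toy table; the values are made up.
toy = pd.DataFrame({
    "RegionName": ["A County"],
    "State": ["CA"],
    "2001-01-31": [100.0],
    "2001-02-28": [120.0],
    "2001-03-31": [110.0],
})

# Month columns of the single row, as floats.
values = toy.iloc[0, 2:].astype(float)
# idxmax returns the label (the month) where the maximum occurs.
print(values.idxmax())  # 2001-02-28
print(values.max())     # 120.0
```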

In [None]:
# Put your answer here


In which month was Merced County's median home value highest?

In [None]:
# Put your answer here


In [None]:
# End Question 2

## Plotting

Plotting the median home value over time should make it easier to see what's going on. Although we haven't taught you everything used below, you can learn by practicing, copy-pasting, using Google, or asking classmates and teachers! 😃

Let's start by plotting LA County data.

In [None]:
# Plot a timeline of the median home value by month in LA County

# take only the home value columns (after transposing, the first two rows are RegionName and State)
data = LA.T.iloc[2:].astype(float)
months = data.index
prices = data.values
# initialize a figure
plt.figure()
# plot the data
plt.plot(months,prices)
# Give a good title
plt.title("Timeline of Median Home Value in LA County")
# Mark the axes with the appropriate labels
plt.xlabel("Month")
plt.ylabel("Median home value (USD)")
# Mark the y-axis ticks as 200K, 300K, ..., 900K
yticks = [k*10**5 for k in range(2,10)]
ylabels = [str(k)+'00K' for k in range(2,10)]
plt.yticks(yticks, labels = ylabels)
# For the x-axis, label only every 20th month for clarity
plt.xticks(months[::20],rotation = 45)
# show the figure
plt.show()
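The y-axis tick labels above are hard-coded, so the range has to be picked by hand for each county's prices. As an alternative sketch, matplotlib's `FuncFormatter` lets matplotlib choose the tick positions while a small function formats each label. The helper name `thousands_label` is hypothetical, chosen for this example.

```python
from matplotlib.ticker import FuncFormatter

def thousands_label(value, pos):
    """Hypothetical helper: format a dollar tick value as e.g. '200K'."""
    return f"{value / 1000:.0f}K"

formatter = FuncFormatter(thousands_label)
print(formatter(200000))  # 200K
print(formatter(850000))  # 850K
```

To apply it to a plot, call `plt.gca().yaxis.set_major_formatter(formatter)` after plotting and before `plt.show()`, in place of the `plt.yticks(...)` call.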

### Question 3

What do you notice about the above graph?

In [None]:
# Put your answer here


In [None]:
# End Question 3

### Question 4

Can you make a similar plot for Merced County? Replace each `...` below with the right arguments.

In [None]:
# Do the same for Merced

# take only the home value columns (after transposing, the first two rows are RegionName and State)
data = Merced.T.iloc[2:].astype(float)
months = data.index
prices = data.values
# initialize a figure
plt.figure()
# plot the data
plt.plot(...)
# Give a good title
plt.title(...)
# Mark the axes with the appropriate labels
# Mark the y-axis labels
yticks = [k*10**5 for k in range(1,5)]
ylabels = [str(k)+'00K' for k in range(1,5)]
plt.yticks(yticks, labels = ylabels)
# For the x-axis, label only every 20th month for clarity
plt.xticks(months[::20],rotation = 45)
# show the figure
plt.show()



In [None]:
# End Question 4

### Plotting both together using a legend

The two counties are easier to compare when they share one set of axes. Let's plot LA and Merced Counties together with a color-coded legend, and add a third county, Monterey, for perspective.

In [None]:
Merced = CA_counties[CA_counties['RegionName'] == 'Merced County']
LA = CA_counties[CA_counties['RegionName'] == 'Los Angeles County']
Mont = CA_counties[CA_counties['RegionName'] == 'Monterey County']

In [None]:
# Now let's put all three counties on the same plot and make a color-coded legend

# take only the home value columns for each county
dataMerced = Merced.T.iloc[2:].astype(float)
dataLA = LA.T.iloc[2:].astype(float)
dataMont = Mont.T.iloc[2:].astype(float)
# initialize a figure
plt.figure()
# plot the data; each label becomes an entry in the legend
plt.plot(dataLA.index,dataLA.values,label = 'LA County')
plt.plot(dataMerced.index,dataMerced.values,label = 'Merced County')
plt.plot(dataMont.index,dataMont.values,label = 'Monterey County')
# Give a good title
plt.title("Merced, LA, and Monterey Counties - Home Value Comparison")
# Show the legend
plt.legend()
# Mark the axes with the appropriate labels
plt.xlabel("Month")
plt.ylabel("Median home value (USD)")
# label only every 20th month on the x-axis; y-axis ticks at 100K, 200K, ..., 900K
xticks = dataMerced.index.tolist()[::20]
plt.xticks(xticks,rotation = 45)
yticks = [k*10**5 for k in range(1,10)]
labels = [str(k)+'00K' for k in range(1,10)]
plt.yticks(yticks,labels = labels)
# show the figure
plt.show()
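The three `plt.plot` calls above repeat one pattern per county. A hedged sketch of the same idea as a loop over rows, using matplotlib's `fig, ax = plt.subplots()` interface and a hypothetical toy table (not the Zillow data), so adding a fourth county would only mean adding a row:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy rows shaped like CA_counties: ID columns, then months.
toy = pd.DataFrame({
    "RegionName": ["A County", "B County"],
    "State": ["CA", "CA"],
    "2000-01-31": [150.0, None],
    "2000-02-29": [151.0, 91.0],
    "2000-03-31": [152.5, 91.5],
})

fig, ax = plt.subplots()
# One line per row; the county name becomes the legend label.
for _, row in toy.iterrows():
    values = row.iloc[2:].astype(float)  # month columns only
    ax.plot(values.index, values.values, label=row["RegionName"])
ax.legend()
ax.set_title("Toy comparison")
plt.show()
```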


### Question 5

What do you notice about this plot?

In [None]:
# Put your answer here

In [None]:
# End Question 5

### Conclusion

What kinds of datasets would you enjoy working with in this class?

## Submission

Before submitting, make sure you have run all cells in your notebook so that all images and graphs (if any) appear in the output. **Please create a PDF using File->Save and Export Notebook as->PDF**, then upload it to the appropriate assignment on CatCourses.