# Analysis of rising housing costs for UCSD students
### By
- Andrew T. Li, A12818225
- Andrew Yoo, A11346949
- Chelin Huang, A53053719
- Alex LaBranche, A14266131
- Matthias Baker, A13788705

### Introduction and Background:
As students, we have to worry about whether we can graduate with good grades and without too much debt. Facing so many unknowns can make a student's life more difficult than it has to be, so we wanted to explore one of them: off-campus housing. Many of us live off campus because UCSD no longer offers a four-year on-campus housing guarantee, and many of us rely on financial aid. We therefore wanted to explore the relationships among off-campus housing costs, rising tuition, and whether financial aid is keeping up with both.

Our group would like to determine whether housing prices and tuition costs are rising while financial aid fails to adjust accordingly. If so, financial aid packages should be adjusted.


### DataSet: Zillow API
- Link to the dataset: https://www.zillow.com/howto/api/APIOverview.htm

From the Zillow API we use the GetDeepSearchResults call, which returns information such as rent price, when the home entered the market, location, region name, and much more. The specific fields our group is interested in are location, the time the property entered the market, and rent per month. These allow us to analyze the financial impact of housing on students in the UCSD area.
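As a rough illustration of how such a request could be assembled, the sketch below builds a GetDeepSearchResults URL with `urllib.parse.urlencode` so that spaces and commas are escaped automatically. The helper name `build_search_url`, the placeholder key, and the example address are assumptions made for illustration; only the endpoint and parameter names come from the code later in this notebook.

```python
from urllib.parse import urlencode

# Endpoint and parameter names as used later in this notebook.
BASE_URL = "http://www.zillow.com/webservice/GetDeepSearchResults.htm"


def build_search_url(zws_id, street, city_state_zip):
    """Return a GetDeepSearchResults URL with URL-encoded query parameters."""
    params = {
        "zws-id": zws_id,
        "address": street,
        "citystatezip": city_state_zip,
        "rentzestimate": "true",
    }
    return BASE_URL + "?" + urlencode(params)


# Hypothetical key and address, for illustration only.
example_url = build_search_url("YOUR_ZWS_ID", "123 Example St", "San Diego, CA")
print(example_url)
```

Letting `urlencode` handle the escaping avoids assembling the `+` and `%2C` separators by hand.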

The Zillow API allows up to 1000 requests per day, and for the purposes of our project we will be using roughly 10,000 observations. These observations range from 2010 to 2017. 
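The daily limit determines how long data collection takes. The short calculation below is a sketch that assumes one API request per observation and a single API key; both are assumptions, not facts stated above.

```python
import math

DAILY_LIMIT = 1000            # requests per day allowed by the API
TARGET_OBSERVATIONS = 10_000  # approximate number of observations we want


def days_needed(target, daily_limit, calls_per_observation=1):
    """Number of whole days needed to collect `target` observations."""
    return math.ceil(target * calls_per_observation / daily_limit)


collection_days = days_needed(TARGET_OBSERVATIONS, DAILY_LIMIT)
print(collection_days)
```

Under these assumptions, collection has to be spread over about ten days per key.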

### DataSet: UCSD Financial Aid
- Link to data:


In [1]:
import urllib.request
import json
import requests
import pprint

# For XML -> json
import xml.etree.ElementTree as ET
from xmljson import badgerfish as bf

In [2]:
# For url -> json
!curl -i -X PUT -H "Content-Type: application/json" -d

curl: option -d: requires parameter
curl: try 'curl --help' or 'curl --manual' for more information


In [3]:
# Put in your zws_id here as a string
zws_id=""

In [None]:
# Use redirected request of zpid only url to get address on zillow
import requests
response = requests.get("https://www.zillow.com/homedetails/16842323_zpid/")
if response.history:
    print ("Request was redirected")
    for resp in response.history:
        print (resp.status_code, resp.url)
    print ("Final destination:")
    print (response.status_code, response.url)
else:
    print ("Request was not redirected")

In [None]:
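def API_URL(zws_id, zpid):
    """Build the GetDeepSearchResults URL for the property with the given zpid."""
    # The zpid-only URL redirects to a URL whose path contains the address slug
    response = requests.get("https://www.zillow.com/homedetails/" + str(zpid) + "_zpid/")
    address = response.url.split('/')[-3]
    url = "http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=" + zws_id + "&address="
    # Everything before the last four dash-separated tokens is treated as the street;
    # this assumes the slug ends with a two-word city name, the state, and the zip code
    street = address.split('-')[:-4]
    for i in range(len(street)-1):
        url += street[i]
        url += "+"
    url += street[-1] + "&citystatezip=" + address.split('-')[-4] + "+" + address.split('-')[-3]
    url += "%2C+CA&rentzestimate=true"
    return url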


def rentData_URL(zpid):
    return "https://www.zillow.com/ajax/homedetail/HomeValueChartData.htm?mt=9&zpid=" + str(zpid) + "&format=json"

result_str = '{http://www.zillow.com/static/xsd/SearchResults.xsd}searchresults'

In [None]:
# open in append mode ('a') so newly scraped data is added to the existing file
outfile = open('data.json', 'a')

merged_json = dict()
for zpid in range(16837859,(16837859+0)):
    try:
        json_property = json.loads(json.dumps(bf.data(ET.fromstring(urllib.request.urlopen(API_URL(zws_id,zpid)).read()))))
    except IndexError:
        continue
    if json_property[result_str]['message']['code']['$'] != 0:
        if json_property[result_str]['message']['code']['$'] == 7:
            print("this account has reached its maximum number of calls for today")
            print("The last index")
            print(zpid)
            break
        else:
            continue
    json_propertyResponse = json_property[result_str]['response']
    json_rentHistory = json.loads(urllib.request.urlopen(rentData_URL(zpid)).read())[0]
    json_propertyResponse["HomeValueChartData"] = json.dumps(json_rentHistory)
    merged_json['zpid'] = zpid
    merged_json['data'] = json_propertyResponse
    json.dump(merged_json, outfile)
    outfile.write('\n')
outfile.close()
print("Done")

In [None]:
# Load the result json file to dest_json
dest_json = dict()
for line in open('data.json','r'):
    temp = json.loads(line)
    dest_json[temp['zpid']] = temp['data']

In [None]:
pprint.pprint(dest_json[16842300])

### Data Cleaning

To analyze housing prices near UCSD, the data pulled from Zillow needed to be cleaned. Observations were filtered so that the homes and apartments were in University City and were posted between 2010 and 2017. The following code categorizes the filtered listings by number of bedrooms and bathrooms, then averages the rent in each category per year to find a trend from 2010 to 2017.
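The cleaning steps described above can be sketched on a small scale as follows. The listings are invented purely for illustration (they are not Zillow data), and the function name `average_rent_by_category` is hypothetical; the sketch only shows the filter, group, and average pattern.

```python
from collections import defaultdict
from statistics import mean

# Toy listings, invented for illustration only.
listings = [
    {"bedrooms": 1, "bathrooms": 1.0, "year": 2010, "rent": 1000},
    {"bedrooms": 1, "bathrooms": 1.0, "year": 2010, "rent": 1200},
    {"bedrooms": 2, "bathrooms": 2.0, "year": 2011, "rent": 1900},
    {"bedrooms": 2, "bathrooms": 2.0, "year": 2011, "rent": 2100},
    {"bedrooms": 2, "bathrooms": 2.0, "year": 2009, "rent": 1500},  # outside the period
]


def average_rent_by_category(rows, first_year=2010, last_year=2017):
    """Average rent per (bedrooms, bathrooms, year), ignoring other years."""
    grouped = defaultdict(list)
    for row in rows:
        if first_year <= row["year"] <= last_year:
            key = (row["bedrooms"], row["bathrooms"], row["year"])
            grouped[key].append(row["rent"])
    return {key: mean(rents) for key, rents in grouped.items()}


avg_rent = average_rent_by_category(listings)
for key in sorted(avg_rent):
    print(key, avg_rent[key])
```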

In [None]:
# imports for data cleaning (TODO: move all imports into a single cell at the top)
import pandas as pd
import json
import time
import statistics

In [None]:
# read the json file

df = pd.read_json('data.json',lines=True)

In [None]:
# initialize dictionaries that will track year and price

bedroom1_bathroom1 = {}
bedroom1_bathroom15 = {}
bedroom2_bathroom1 = {}
bedroom2_bathroom15 = {}
bedroom2_bathroom2 = {}
bedroom2_bathroom25 = {}
bedroom3_bathroom2 = {}
bedroom3_bathroom25 = {}
bedroom4 = {}
bedroom5 = {}

c = 0

In [None]:
# helper function that will be used to update dictionaries with rent prices
# note: it reads the module-level `bathroom` value set by the loop that calls it

def update_bedroom_dict(num_bed, curr_zip, update_dict):
    if num_bed == 1:
        if bathroom == 1.0:
            bedroom1_bathroom1[curr_zip] = update_dict
        elif bathroom == 1.5:
            bedroom1_bathroom15[curr_zip] = update_dict
    elif num_bed == 2:
        if bathroom == 1.0:
            bedroom2_bathroom1[curr_zip] = update_dict
        elif bathroom == 1.5:
            bedroom2_bathroom15[curr_zip] = update_dict
        elif bathroom == 2.0:
            bedroom2_bathroom2[curr_zip] = update_dict
        elif bathroom == 2.5:
            bedroom2_bathroom25[curr_zip] = update_dict
    elif num_bed == 3:
        if bathroom == 2.0:
            bedroom3_bathroom2[curr_zip] = update_dict
        elif bathroom == 2.5:
            bedroom3_bathroom25[curr_zip] = update_dict
    elif num_bed == 4:
        # use the function's own arguments rather than the globals zpid and new_dict
        bedroom4[curr_zip] = update_dict
    elif num_bed == 5:
        bedroom5[curr_zip] = update_dict

In [None]:
# iterate over the data to populate the dictionaries for each bedroom and bathroom type
for zpid, data in zip(df['zpid'].tolist(), df['data']):
    homeValue = data['HomeValueChartData']

    try:
        bedroom = data['results']['result']['bedrooms']['$']
        bathroom = data['results']['result']['bathrooms']['$']
    except (TypeError, KeyError):
        # skip listings without a single, well-formed bedroom/bathroom entry
        continue

    chart = json.loads(homeValue)
    y = {2010:[], 2011:[], 2012:[], 2013:[], 2014:[], 2015:[], 2016:[], 2017:[]}

    # group rents by year so that they can be averaged later
    # x is a timestamp in milliseconds; points outside 2010-2017 are ignored
    for i in chart['points']:
        year = time.localtime(i['x']/1000).tm_year
        if year in y:
            y[year].append(i['y'])

    # new_dict will have key: year, value: average rent price
    new_dict = {}
    for key, value in y.items():
        if len(value) != 0:
            new_dict[key] = statistics.mean(value)

    update_bedroom_dict(bedroom, zpid, new_dict)
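The loop above treats each chart point's `x` value as a Unix timestamp in milliseconds and reduces it to a calendar year. A minimal sketch of that conversion, using `datetime` with an explicit UTC time zone so the result does not depend on the machine's local time, is shown below. The helper name `year_from_millis` is hypothetical, and the millisecond assumption is inferred from the division by 1000 in the notebook code.

```python
from datetime import datetime, timezone


def year_from_millis(ms):
    """Return the UTC calendar year for a Unix timestamp given in milliseconds."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).year


# 1_500_000_000_000 ms after the epoch falls in July 2017 (UTC).
example_year = year_from_millis(1_500_000_000_000)
print(example_year)
```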

In [None]:
# Note that the logic here is repeated for the other bedroom options below

print("Average rent: 1 bedroom 1 bathrooms\n")

# need to create a new dict that will be the one printed for result
# so that we don't modify the dictionary that we are iterating over 

rent_dict_1 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom1_bathroom1.items():
    for year, rent in value.items():
        rent_dict_1[year].append(rent)
        
# update the dictionary with same key but with value of averages     

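for key, value in rent_dict_1.items():
    # statistics.mean raises on an empty list, so years with no data become NaN
    rent_dict_1[key] = statistics.mean(value) if value else float('nan')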
    
    
print(rent_dict_1)
        
print("\nAverage rent: 1 bedroom 1.5 bathrooms\n")

rent_dict_15 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom1_bathroom15.items():
    for year, rent in value.items():
        rent_dict_15[year].append(rent)
        
        
for key, value in rent_dict_15.items():
    rent_dict_15[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_15)


In [None]:
print("Average rent: 2 bedroom 1 bathrooms\n")

rent_dict_21 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom2_bathroom1.items():
    for year, rent in value.items():
        rent_dict_21[year].append(rent)
        
        
for key, value in rent_dict_21.items():
    rent_dict_21[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_21)

print("\nAverage rent: 2 bedroom 1.5 bathrooms\n")

rent_dict_2 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom2_bathroom15.items():
    for year, rent in value.items():
        rent_dict_2[year].append(rent)
        
        
for key, value in rent_dict_2.items():
    rent_dict_2[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_2)

print("\nAverage rent: 2 bedroom 2 bathrooms\n")

rent_dict_22 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom2_bathroom2.items():
    for year, rent in value.items():
        rent_dict_22[year].append(rent)
        
        
for key, value in rent_dict_22.items():
    rent_dict_22[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_22)

print("\nAverage rent: 2 bedroom 2.5 bathrooms\n")

rent_dict_25 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom2_bathroom25.items():
    for year, rent in value.items():
        rent_dict_25[year].append(rent)
        
        
for key, value in rent_dict_25.items():
    rent_dict_25[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_25)

In [None]:
print("Average rent: 3 bedroom 2 bathrooms\n")

rent_dict_3 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom3_bathroom2.items():
    for year, rent in value.items():
        rent_dict_3[year].append(rent)
        
        
for key, value in rent_dict_3.items():
    rent_dict_3[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_3)

print("\nAverage rent: 3 bedroom 2.5 bathrooms\n")

rent_dict_35 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom3_bathroom25.items():
    for year, rent in value.items():
        rent_dict_35[year].append(rent)
        
        
for key, value in rent_dict_35.items():
    rent_dict_35[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_35)

In [None]:
print("Average rent: 4 bedroom\n")

rent_dict_4 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom4.items():
    for year, rent in value.items():
        rent_dict_4[year].append(rent)
        
        
for key, value in rent_dict_4.items():
    rent_dict_4[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_4)

print("\nAverage rent: 5 bedroom\n")

rent_dict_5 = {2010:[],2011:[],2012:[],2013:[],2014:[],2015:[],2016:[],2017:[]}
for key, value in bedroom5.items():
    for year, rent in value.items():
        rent_dict_5[year].append(rent)
        
        
for key, value in rent_dict_5.items():
    rent_dict_5[key] = statistics.mean(value) if value else float('nan')
    
    
print(rent_dict_5)
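The averaging cells above all follow the same pattern, so the logic could be collected into one helper. The sketch below is one possible way to do that; the name `yearly_average` and the sample data are hypothetical, and years with no rents are reported as NaN rather than raising an error.

```python
import math
from statistics import mean

YEARS = range(2010, 2018)  # study period, 2010 through 2017


def yearly_average(per_property):
    """Map {property id: {year: rent}} to {year: mean rent across properties}."""
    by_year = {year: [] for year in YEARS}
    for yearly_rents in per_property.values():
        for year, rent in yearly_rents.items():
            if year in by_year:
                by_year[year].append(rent)
    # mean() raises on an empty list, so years without data become NaN
    return {year: mean(rents) if rents else math.nan
            for year, rents in by_year.items()}


# Invented sample data: two properties with partial rent histories.
sample = {101: {2010: 1000, 2011: 1100}, 102: {2010: 1200}}
averages = yearly_average(sample)
print(averages)
```

With such a helper, each category would need only one call, for example `yearly_average(bedroom1_bathroom1)`.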

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots(figsize=(15, 10))
# ax.plot(list(rent_dict_5.keys()), list(rent_dict_5.values()), 'ro-', label='5 bedroom')
# ax.plot(list(rent_dict_4.keys()),list(rent_dict_4.values()), 'o-', label='4 bedroom')
ax.plot(list(rent_dict_35.keys()),list(rent_dict_35.values()), 'ro-', label='3 bedroom 2.5 bathrooms')
ax.plot(list(rent_dict_3.keys()),list(rent_dict_3.values()), 'o-', label='3 bedroom 2 bathrooms')
ax.plot(list(rent_dict_25.keys()),list(rent_dict_25.values()), 'o-', label='2 bedroom 2.5 bathrooms')
ax.plot(list(rent_dict_22.keys()),list(rent_dict_22.values()), 'o-', label='2 bedroom 2 bathrooms')
ax.plot(list(rent_dict_2.keys()),list(rent_dict_2.values()), 'o-', label='2 bedroom 1.5 bathrooms')
ax.plot(list(rent_dict_21.keys()),list(rent_dict_21.values()), 'o-', label='2 bedroom 1 bathrooms')
ax.plot(list(rent_dict_15.keys()),list(rent_dict_15.values()), 'o-', label='1 bedroom 1.5 bathrooms')
ax.plot(list(rent_dict_1.keys()),list(rent_dict_1.values()), 'o-', label='1 bedroom 1 bathrooms')
legend = ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), shadow=True, fontsize='x-large')
plt.title('rent history')
plt.xlabel('Year')
plt.ylabel('Rent')
plt.grid(True)

for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
             ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(15)

In [None]:
fig, ag = plt.subplots(figsize=(15, 10))
ag.plot(list(rent_dict_5.keys()), list(rent_dict_5.values()), 'ro-', label='5 bedroom')
ag.plot(list(rent_dict_4.keys()),list(rent_dict_4.values()), 'o-', label='4 bedroom')
legend = ag.legend(loc='center left', bbox_to_anchor=(1, 0.5), shadow=True, fontsize='x-large')
plt.title('rent history')
plt.xlabel('Year')
plt.ylabel('Rent')
plt.grid(True)

for item in ([ag.title, ag.xaxis.label, ag.yaxis.label] +
             ag.get_xticklabels() + ag.get_yticklabels()):
    item.set_fontsize(15)