In [1]:
# Problem Statement:
# During a disaster, it is important to model and estimate the potential or forecasted effect of the event, 
# including the projected/forecasted damage.

In [2]:
# Considerations:
# Bear in mind that the value you provide to New Light Technologies may come from data ingestion, data cleaning, 
# EDA, and/or a dashboard, etc. 
# While a model may not be immediately apparent, be creative. 
# Without being told exactly what model to build, how could we build a model to increase performance or 
# generate better insights when answering our problem statement?

In [3]:
# Useful Features:
# Existing indicators of forecasted damage include number of structures within the affected area, number of 
# people in the area, number of households, demographics of the impacted population, etc.
# A key feature of this project is the value of the properties in the affected area. Property values can be 
# estimated from the market prices of houses.

In [4]:
# Goals/Objectives:
# Leverage property market prices published in different real-estate websites (e.g. Zillow, Trulia, Realtor.com), 
# according to zip codes.
# The solution must allow users to automatically search for the mean, median, min, and max value of the 
# properties in each zip code area.
# The objective is not to download the database from these sources. Rather, it should allow the user to feed 
# the code with a list of affected areas (zip codes) as input, and retrieve the current, historical, 
# annual, monthly, and/or quarterly property values.
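
# A minimal sketch of the zip-code summary described above, not the project's final implementation.
# It assumes the property values have already been collected into a DataFrame; the column names
# `zip_code` and `price`, the function name `summarize_by_zip`, and the sample values are all
# hypothetical and exist only for illustration.

```python
from typing import Sequence

import pandas as pd


def summarize_by_zip(prices: pd.DataFrame, zip_codes: Sequence[str]) -> pd.DataFrame:
    """Return the mean, median, min, and max price for each requested zip code."""
    # Keep only the rows that belong to the requested zip codes.
    selected = prices[prices["zip_code"].isin(zip_codes)]
    # One row per zip code, one column per summary statistic.
    return selected.groupby("zip_code")["price"].agg(["mean", "median", "min", "max"])


# Made-up sample values, for illustration only (not real market data).
sample_prices = pd.DataFrame({
    "zip_code": ["20001", "20001", "20001", "20002", "20002"],
    "price": [500_000, 650_000, 800_000, 300_000, 400_000],
})

summary = summarize_by_zip(sample_prices, ["20001", "20002"])
print(summary)
```

# Zip codes are kept as strings so that leading zeros (e.g. "02134") are not lost.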

In [5]:
# Deliverables:
# 1) Brief write-up describing the project.
# - A problem statement.
# - A succinct formulation of the question your analysis seeks to answer.
# - A table of contents, which should indicate which notebook or scripts a stakeholder should start 
#   with, and a link to an executive summary.
# - A paragraph description of the data you used, plus your data acquisition, ingestion, and cleaning steps.
# - A short description of software requirements (e.g., Pandas, Scikit-learn) required by your analysis.

# 2) Open-source code (or a simple API) that takes, as input, a list of zip codes, and outputs the mean, 
# median, min and max property values in these areas.
# - Jupyter notebook(s) must be reproducible and error-free!
# - You should set a random seed at the start of every notebook, using np.random.seed(...). This will ensure 
#   that the random numbers generated in your notebook will be the same every time.
# - You need to provide a relative path to your data, so that if I clone your repo to my machine I can 
#   run everything in your repo without error. (Also provide links to any publicly accessible data.)
# - I should be able to Restart & Run All in your notebook(s) and see that the exact same results are reproduced.
# - To check that everything worked properly, you may consider forking your own repo to a different location 
#   on your computer and checking that all notebooks can run properly from top to bottom.
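
# The reproducibility requirements above could be met with a first cell along these lines.
# This is a sketch under assumptions: the seed value 42 is arbitrary, and the `data` folder and
# `zip_codes.csv` file name are hypothetical placeholders, not files known to exist in the repo.

```python
from pathlib import Path

import numpy as np

SEED = 42  # arbitrary value; any fixed integer works
np.random.seed(SEED)

# Relative paths resolve against the working directory, so they keep working
# after the repo is cloned to another machine.
DATA_DIR = Path("data")                      # hypothetical folder name
zip_codes_path = DATA_DIR / "zip_codes.csv"  # hypothetical file name

# Re-seeding with the same value reproduces the same random numbers.
first_draw = np.random.rand(3)
np.random.seed(SEED)
second_draw = np.random.rand(3)
print((first_draw == second_draw).all())
```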

In [6]:
# PROJECT EVALUATION CRITERIA:
# 1) Project Requirements - Did our group meet all project requirements?
# 2) Audience - Is our project presentation appropriate for our stakeholder?
# 3) Methods - Are the methods we used appropriate for solving our problem statement?
# 4) Value - Did we provide value to our stakeholder through clear, data-driven recommendations?

In [7]:
#################################### Phase 1) Data Gathering ##################################################

In [21]:
# Import libraries:
import pandas as pd
import requests
from bs4 import BeautifulSoup
# Custom User-Agent header sent with the requests below:
header = {'User-Agent': 'cjbratkovics2'}

In [22]:
# URL resources:
zillow_url = 'https://www.zillow.com/'
trulia_url = 'https://www.trulia.com/'
realtor_url = 'https://www.realtor.com/'

In [23]:
# Request each site's home page (the Realtor.com request is sent without the custom header):
z_res = requests.get(zillow_url, headers=header)
t_res = requests.get(trulia_url, headers=header)
r_res = requests.get(realtor_url)

In [24]:
print('Zillow status code = ', z_res.status_code)
print('Trulia status code = ', t_res.status_code)
print('Realtor.com status code = ', r_res.status_code)

Zillow status code =  200
Trulia status code =  200
Realtor.com status code =  200
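
# A 200 status here only confirms that each home page was served. As a hedged sketch, the status
# check can be folded into the request itself so that failures are reported instead of silently
# parsed. The helper name `fetch` and the 10-second timeout are assumptions, not part of the
# original notebook.

```python
import requests


def fetch(url, headers=None, timeout=10):
    """Return the response for `url`, or None if the request fails."""
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        # Raise an HTTPError for 4xx/5xx status codes.
        response.raise_for_status()
    except requests.RequestException as error:
        print(f"Request to {url} failed: {error}")
        return None
    return response
```

# Usage would look like `z_res = fetch(zillow_url, headers=header)`, followed by a check for None
# before the content is parsed.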


In [25]:
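# Parse each response body into a BeautifulSoup tree (the 'lxml' parser requires the lxml package):
z_soup = BeautifulSoup(z_res.content, 'lxml')
t_soup = BeautifulSoup(t_res.content, 'lxml')
r_soup = BeautifulSoup(r_res.content, 'lxml')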

In [26]:
t_res.content[:500] # preview Trulia content

b'<!DOCTYPE html>\n<html lang="en">\n<head>\n\n  \n    <meta charset="utf-8">\n    \n  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, shrink-to-fit=no" />\n\n  \n\n      <meta name="csrft" value="mjk8ZYFWowze18XpPlHpdpj/oLtes10eyNr5CGPOCok=">\n  \n  \n      \n\n  \n\n  <title>\n      Trulia: Real Estate Listings, Homes For Sale, Housing Data\n  </title>\n\n  \n    <meta name="description" content="Your destination for all real estate listings and rental propert'
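
# Once a page is parsed, individual elements can be pulled out of the tree. The sketch below
# extracts the page title from a shortened copy of the Trulia preview above. The `sample_html`
# string is a trimmed stand-in for the real response, and Python's built-in 'html.parser' is used
# here (instead of 'lxml') only so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the start of the Trulia response shown above.
sample_html = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>
      Trulia: Real Estate Listings, Homes For Sale, Housing Data
  </title>
</head>
<body></body>
</html>"""

soup = BeautifulSoup(sample_html, "html.parser")

# `soup.title` is the first <title> tag; strip=True trims the surrounding whitespace.
page_title = soup.title.get_text(strip=True)
print(page_title)
```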