![Our Stakeholder, Dwell Development](images/Dwell.png)

# Green Housing Development Analysis

&nbsp;

**Authors:** Stefano Caruso, Holly Gultiano, Raul Torres
***

## Overview

**Disclaimer: this notebook is for a Flatiron School data analysis project, for educational purposes only** 

For the past 17 years, [Dwell Development](https://www.dwelldevelopment.com/home/about/) has been building sustainable houses in the Seattle area. In this project, we use linear regression and other exploratory data analysis techniques to help them plan their next steps.

## Business Problem

Seattle's population has been growing every year, largely fueled by its world-renowned tech sector, which is home to many top companies.
As the population grows, [rent seems to be growing with it](https://www.kiro7.com/news/local/report-seattle-rent-increased-nearly-19-year-over-year/LMUY74T3FRF5FNEWTG2KBKMJQ4/). Dwell Development wants to respond to this demand by creating new, affordable, multi-family housing for the growing population of tech workers in King County.

The business problem then breaks into two parts: which houses are available in the same zip codes as local tech companies, to minimize commute, and what features correlate most with price, in order to minimize costs for both Dwell Development and the families on the housing market.
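
Both parts of the problem can be sketched on a small scale. The example below is only an illustration under stated assumptions: the rows are made-up toy values, the column names (`price`, `zipcode`, `sqft_living`, `bedrooms`) are assumed rather than confirmed by this notebook, and `tech_zipcodes` is a hypothetical set of zip codes.

```python
import pandas as pd

# Toy stand-in for the housing data; values are made up for illustration only.
# Column names are assumptions, not confirmed by the notebook.
houses = pd.DataFrame({
    "price":       [450000, 620000, 380000, 910000, 530000, 700000],
    "zipcode":     [98101, 98052, 98101, 98004, 98109, 98052],
    "sqft_living": [900, 1500, 800, 2400, 1200, 1700],
    "bedrooms":    [1, 3, 2, 4, 2, 3],
})

# Hypothetical zip codes that contain tech company offices
tech_zipcodes = {98101, 98109, 98052}

# Part 1: keep only houses located in a tech-company zip code
near_tech = houses[houses["zipcode"].isin(tech_zipcodes)]

# Part 2: rank numeric features by absolute Pearson correlation with price
price_corr = (
    houses.drop(columns=["zipcode"])  # zip code is a label, not a magnitude
          .corr()["price"]
          .drop("price")
          .abs()
          .sort_values(ascending=False)
)
```

Correlation only measures linear association, so a ranking like `price_corr` is a starting point for choosing regression features rather than a conclusion on its own.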

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***

In [None]:
# importing packages used for our analysis
import pandas as pd
import numpy as np
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt 
import folium
from sklearn.preprocessing import OneHotEncoder

%matplotlib inline

#### Source Dataset
Our housing data CSV is located in the [data](./data) folder of this repository.

In [None]:
# Load the King County housing data from the repository's data folder (relative path)
housing_df = pd.read_csv('data/kc_house_data.csv')

#### Map of Tech Companies in the King County Area
Sourced from [this blog post](https://flatironschool.com/blog/best-tech-companies-seattle/), which covers 30 high-profile companies, including Google and Amazon, as well as some local start-ups.

In [None]:
techcompany_name = ['Microsoft', 'Google', 'Expedia', 'Getty', 'Outreach', 'Avalara', 'Amazon', 'Big Fish Games',
                   'Tableau', 'Cray', 'Zulily', 'Redfin', 'Porch', 'SAP Concur', 'F5', 'Xealth', 'Pulumi', 'Apptentive',
                   'Highspot', 'Impinj', 'Upbound', 'Skytap', 'Glowforge', 'Auction Edge', 'GeoEngineers', 'Twilio Zipwhip',
                   'Whitepages', 'Amperity', 'SkyKick', 'PitchBook']

techcompany_coord = {47.6395481:-122.1316979, 47.6491022:-122.3512428, 47.6278727:-122.3771439, 47.5968424:-122.3288311,
                    47.6207149:-122.3623911, 47.5978827:-122.3309175, 47.6149968:-122.3382836, 47.6035842:-122.3375176,
                    47.6478044:-122.3382225, 47.605816:-122.3319745, 47.6142513:-122.3522433, 47.616631:-122.332592,
                    47.5835923:-122.3336612, 47.6161371:-122.1968104, 47.6051851:-122.331118, 47.6019789:-122.3317164,
                    47.6107471:-122.3397581, 47.6110571:-122.3422495, 47.6114079:-122.3478381, 47.6227313:-122.33609, 47.5995348:-122.3313931,
                    47.5980919:-122.3309701, 47.5838846:-122.3328815, 47.5990386:-122.3349373, 47.6141707:-122.3424675,
                    47.6218759:-122.3615888, 47.614592:-122.3391944, 47.6046363:-122.3307528, 47.6210721:-122.3599327, 47.6056348:-122.3321834}

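# Convert the {latitude: longitude} mapping into a list of (lat, lon) pairs
tc_coord_list = list(techcompany_coord.items())

def coordlister(index_num):
    '''Return the [latitude, longitude] of the tech company at index_num, for Folium markers.'''
    return list(tc_coord_list[index_num])

# Base map centered on downtown Seattle
techmap = folium.Map(location=[47.605, -122.331])

def maplabeler(n):
    '''Add a marker for the n-th tech company to techmap.'''
    folium.Marker(coordlister(n)).add_to(techmap)

# Add one marker per company instead of hard-coding the count
for i in range(len(tc_coord_list)):
    maplabeler(i)

techmap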

In [None]:
# TODO add some graphs here probably 

## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***
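
As one possible preparation step, the sketch below drops rows that are missing the target and one-hot encodes zip code. The notebook imports scikit-learn's `OneHotEncoder`; this example swaps in `pandas.get_dummies` for brevity. The rows are made-up toy values and the column names are assumptions.

```python
import pandas as pd

# Toy rows, made up for illustration; the column names are assumptions
raw = pd.DataFrame({
    "price":       [450000, 620000, None, 910000],
    "zipcode":     [98101, 98052, 98101, 98004],
    "sqft_living": [900, 1500, 800, 2400],
})

# Rows without a price cannot be used to fit a price model, so drop them
clean = raw.dropna(subset=["price"])

# One-hot encode zip code so a linear model treats it as a category, not a magnitude;
# drop_first removes one dummy column to avoid perfectly collinear predictors
encoded = pd.get_dummies(clean, columns=["zipcode"], drop_first=True)
```

Dropping the first category makes it the baseline, so each remaining zip code coefficient in a regression would be read relative to that baseline.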

In [None]:
# Here you run your code to clean the data
# TODO put the 

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***
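
The planned simple and multiple linear regressions could look roughly like the sketch below. It is not the notebook's model: it uses NumPy's least-squares solver instead of the imported `statsmodels`, the data are made-up toy values, and `fit_ols` is a hypothetical helper introduced only for this example.

```python
import numpy as np

# Toy data, made up for illustration only: living area (sqft), bedrooms, sale price
sqft = np.array([900, 1500, 800, 2400, 1200, 1700], dtype=float)
bedrooms = np.array([1, 3, 2, 4, 2, 3], dtype=float)
price = np.array([450000, 620000, 380000, 910000, 530000, 700000], dtype=float)

def fit_ols(features, target):
    """Fit ordinary least squares with an intercept; return (coefficients, R^2)."""
    X = np.column_stack([np.ones(len(target)), features])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return coef, 1 - ss_res / ss_tot

# Simple linear regression: price ~ sqft
simple_coef, simple_r2 = fit_ols(sqft, price)

# Multiple linear regression: price ~ sqft + bedrooms
multi_coef, multi_r2 = fit_ols(np.column_stack([sqft, bedrooms]), price)
```

On the training data, adding a predictor never lowers R², so comparing `simple_r2` with `multi_r2` alone does not show that the larger model generalizes better; adjusted R² or a held-out test set would be a fairer comparison.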

In [None]:
# Here you run your code to model the data
# TODO all simpleLR and one multipleLR models here


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***