# Predicting Home Value for Zillow

by David Schneemann

## Project Goal

My goal with this project is to identify the key drivers of home value in Zillow's data and to explain why and how these factors produce certain home values. With this information and the recommendations that follow, our organization can improve its processes and procedures to predict home values more accurately moving forward.

## Project Description

Zillow is a website that utilizes a database of homes around the country in order to inform people who may be looking to buy, sell, or rent.

The ability to predict home value is essential because new homes are built each year and some existing homes do not yet have an assessed value in this database.

To predict home value more accurately, we will analyze the attributes (features) of homes within a predetermined dataset. This dataset includes Single Family Properties that had a transaction during 2017.
We will then develop models that predict home value from these attributes and provide Zillow with predictions and recommendations for improving its home value estimates moving forward.

## Initial Questions

#### 1. Does a higher number of bedrooms increase home value?

- Ho = More bedrooms translate to an equal or lower home value
- Ha = More bedrooms translate to a higher home value

#### 2. Does a higher number of bathrooms increase home value?

- Ho = More bathrooms translate to an equal or lower home value
- Ha = More bathrooms translate to a higher home value

#### 3. Do more garage spaces increase home value?

- Ho = More garage spaces translate to an equal or lower home value
- Ha = More garage spaces translate to a higher home value

#### 4. Does county location affect home value?

- Ho = Orange County home values are equal to or lower than Ventura or LA County home values
- Ha = Orange County home values are higher than Ventura or LA County home values
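
A hypothesis pair like this one compares a value distribution between groups, so a one-sided two-sample test is a natural fit. The sketch below uses a one-sided Mann-Whitney U test, which is my choice for illustration rather than the test used in this notebook (the notebook imports `kruskal`). The home values are randomly generated stand-ins, not Zillow data, and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic stand-ins for home values by county (NOT real Zillow data).
rng = np.random.default_rng(42)
orange = rng.normal(600_000, 100_000, size=200)  # hypothetical Orange County values
other = rng.normal(450_000, 100_000, size=200)   # hypothetical LA/Ventura values

alpha = 0.05

# alternative="greater" tests Ha: values in `orange` tend to exceed values in `other`.
stat, p_value = mannwhitneyu(orange, other, alternative="greater")

reject_null = bool(p_value < alpha)
if reject_null:
    print("Reject Ho: Orange County values tend to be higher.")
else:
    print("Fail to reject Ho.")
```

A rank-based test is used here because home values are usually right-skewed, which makes a mean-based comparison less reliable.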

#### 5. Does a higher square footage increase home value?

- Ho = More sq_ft translates to an equal or lower home value
- Ha = More sq_ft translates to a higher home value

## Data Dictionary

The analysis relies on the modules imported in the cell below, which lists everything I imported and used to complete this analysis for Zillow. \
The table that follows the imports defines each variable used in the project.

In [1]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math

from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, RFE, f_regression, SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from scipy import stats
import sklearn.preprocessing
from sklearn.metrics import mean_squared_error
from scipy.stats import pearsonr, spearmanr, kruskal


from env import user, password, host
import wrangle

| Variable      | Meaning |
| ----------- | ----------- |
| home_value      | The total tax assessed value of the parcel       |
| bedrooms   | The total number of bedrooms in a home        |
| bathrooms      | The total number of bathrooms in a home       |
| garage_spaces      | The total number of car slots in a garage       |
| year_built      | The year the home was built       |
| location      | Location of a home by county      |
| sq_ft      | The total square feet of a home       |
| lot_sq_ft      | The total square feet of a property lot       |
| decade_built   | The decade in which the home was built       |
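
The `decade_built` variable does not appear in the wrangled DataFrame shown later, so it is presumably derived from `year_built`. A minimal sketch of one way to derive it, assuming integer years and using the `year_built` values from the first five rows of the data:

```python
import pandas as pd

# year_built values taken from the first five rows of the wrangled data.
df_example = pd.DataFrame({"year_built": [1934, 1990, 1936, 1948, 1936]})

# Floor-divide by 10 and multiply back to drop the last digit (1934 -> 1930).
df_example["decade_built"] = (df_example["year_built"] // 10) * 10

print(df_example)
```

The exact derivation used in `wrangle.py` may differ; this only illustrates the idea behind the column.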

## Wrangle Zillow Data

In [2]:
df = wrangle.wrangle_zillow()

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54167 entries, 0 to 56078
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   bedrooms        54167 non-null  int64  
 1   bathrooms       54167 non-null  float64
 2   garage_spaces   54167 non-null  float64
 3   year_built      54167 non-null  int64  
 4   location        54167 non-null  object 
 5   sq_ft           54167 non-null  int64  
 6   home_value      54167 non-null  int64  
 7   lot_sq_ft       54167 non-null  float64
 8   property_value  54167 non-null  float64
dtypes: float64(4), int64(4), object(1)
memory usage: 4.1+ MB


In [4]:
df.shape

(54167, 9)

In [5]:
df.head()

|   | bedrooms | bathrooms | garage_spaces | year_built | location | sq_ft | home_value | lot_sq_ft | property_value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3 | 4.0 | 4.0 | 1934 | LA County | 2822 | 1538506 | 7093.0 | 1076955.0 |
| 1 | 3 | 3.0 | 3.0 | 1990 | LA County | 2815 | 1106839 | 7601.0 | 651740.0 |
| 2 | 4 | 2.0 | 2.0 | 1936 | LA County | 2386 | 196622 | 7609.0 | 72463.0 |
| 3 | 2 | 1.0 | 1.0 | 1948 | LA County | 2406 | 628434 | 6244.0 | 456614.0 |
| 4 | 2 | 1.0 | 1.0 | 1936 | LA County | 962 | 545000 | 3752.0 | 387200.0 |


## Prepare Zillow Data
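
The exploration that follows runs on a `train` split, and `train_test_split` is among the imports, but the split itself is not shown. The sketch below is one common way to produce train, validate, and test sets. The proportions, the `random_state`, and the toy DataFrame are all assumptions for illustration, not the values used in this project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the wrangled DataFrame (NOT real Zillow data).
df_toy = pd.DataFrame({
    "sq_ft": range(1000, 1100),
    "home_value": range(300_000, 400_000, 1000),
})

# First hold out a test set, then carve a validate set out of the remainder.
train_validate, test = train_test_split(df_toy, test_size=0.2, random_state=123)
train, validate = train_test_split(train_validate, test_size=0.3, random_state=123)

print(train.shape, validate.shape, test.shape)
```

Exploring and fitting only on `train` keeps the validate and test sets unseen until model evaluation.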

## Set the Data Context

## Exploratory Analysis

### 1. Does a higher number of bedrooms increase home value?
- Ho = More bedrooms translate to an equal or lower home value
- Ha = More bedrooms translate to a higher home value

#### Statistical Analysis

In [2]:
# Use scipy's pearsonr to calculate the correlation coefficient and the p-value.
# `train` is assumed to be the training split created during the Prepare step.
from scipy.stats import pearsonr

alpha = 0.05

r_bed, p_bed = pearsonr(x=train['bedrooms'], y=train['home_value'])

if p_bed < alpha:
    print('The number of bedrooms is correlated with home value.')
    print('The correlation coefficient is {}.'.format(r_bed))
else:
    print('The number of bedrooms is NOT correlated with home value.')

#### Answer 1: Yes
Our statistical test indicates that the number of bedrooms is correlated with home value, which gives statistical support to the claim that homes with more bedrooms tend to have a higher home value. Thus we move forward with this feature.
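
Because Ha is directional (more bedrooms, higher value), the two-sided p-value that `pearsonr` returns by default can be converted to a one-sided one. The sketch below shows one way to do that. The helper `one_sided_pearson` is a hypothetical name, and the data is randomly generated for illustration, not drawn from the Zillow dataset.

```python
import numpy as np
from scipy.stats import pearsonr

def one_sided_pearson(x, y):
    """Return r and the p-value for Ha: the correlation is greater than 0."""
    r, p_two_sided = pearsonr(x, y)
    # Halve the two-sided p-value when r points in the hypothesized direction;
    # otherwise the one-sided p-value is the complement of that half.
    p_one_sided = p_two_sided / 2 if r > 0 else 1 - p_two_sided / 2
    return r, p_one_sided

# Synthetic stand-in data (NOT real Zillow data).
rng = np.random.default_rng(0)
bedrooms = rng.integers(1, 7, size=500)
home_value = 100_000 * bedrooms + rng.normal(0, 150_000, size=500)

alpha = 0.05
r, p = one_sided_pearson(bedrooms, home_value)

if p < alpha:
    print(f"Reject Ho: positive correlation (r = {r:.2f}).")
else:
    print("Fail to reject Ho.")
```

A significant result here still describes association only; it does not show that adding a bedroom causes the value to rise.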

### Summary: Which are the best predictors of home value?

## Predicting Home Value

#### Baseline
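
A baseline gives every model a number to beat. The sketch below shows one common way to set it, which may not be the approach used for the final models: predict the mean `home_value` of the training split for every home and report the RMSE. The values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical values standing in for train['home_value'] and validate['home_value'].
y_train = np.array([200_000, 350_000, 500_000, 650_000, 800_000], dtype=float)
y_validate = np.array([300_000, 450_000, 700_000], dtype=float)

# The baseline predicts the training mean for every home.
baseline = y_train.mean()

def rmse(actual, constant_prediction):
    """RMSE when every observation receives the same predicted value."""
    predictions = np.full_like(actual, constant_prediction)
    return float(np.sqrt(mean_squared_error(actual, predictions)))

rmse_train = rmse(y_train, baseline)
rmse_validate = rmse(y_validate, baseline)

print(f"Baseline prediction: {baseline:,.0f}")
print(f"RMSE train: {rmse_train:,.0f}")
print(f"RMSE validate: {rmse_validate:,.0f}")
```

The median is an equally reasonable baseline when home values are heavily skewed; whichever is chosen, a model is only worth keeping if its RMSE is lower.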

### Fit [ ] Models

#### Predict and Evaluate Test dataset

## Conclusion

### Summary

### Recommendations

### Next Steps