# Estimate Home Tax Values in Zillow
Authors: Matthew Mays, Gilbert Noriega

## Goals
- The goal of this project is to build a model that predicts the tax-assessed value of single-unit properties, using data from properties whose last transaction took place in May or June 2017.

## Import Modules:

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from scipy import stats

from acquire import get_zillow_data
from prepare import prep_zillow_final
from model import plot_variable_pairs
from model import select_kbest, rfe
from model import linearReg_train, lassoLars_train, poly_linearReg_train
from model import linearReg_validate, lassoLars_validate, poly_linearReg_validate

from sklearn.metrics import mean_squared_error

import warnings
warnings.filterwarnings("ignore")

## Data Acquire

- Our function, **get_zillow_data**, runs the SQL query we wrote against the Zillow database and returns only single-unit residential properties whose last transaction occurred in May or June 2017. Joining a table that identifies each property's last transaction gave us a total of 15,036 rows.
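
The "last transaction in May or June 2017" filter can be pictured in pandas. This is only a minimal sketch on a toy table: the column names `parcelid` and `transactiondate` and all values are assumptions for illustration, and the real **get_zillow_data** does this work in SQL.

```python
import pandas as pd

# Toy stand-in for a transactions table (names and values are assumptions).
transactions = pd.DataFrame({
    "parcelid": [1, 1, 2, 3, 3],
    "transactiondate": pd.to_datetime(
        ["2017-03-10", "2017-05-20", "2017-06-01", "2017-05-05", "2017-08-15"]
    ),
})

# Keep only each parcel's most recent transaction.
last_tx = (
    transactions.sort_values("transactiondate")
    .drop_duplicates("parcelid", keep="last")
)

# Keep parcels whose last transaction falls in May or June 2017.
start, end = pd.Timestamp("2017-05-01"), pd.Timestamp("2017-06-30")
selected = last_tx[last_tx["transactiondate"].between(start, end)]
```

Parcel 3 is dropped even though it sold in May, because its last transaction was in August.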

## Data Prep

In [None]:
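# Assumes prep_zillow_final returns the train, validate, and test splits used in the cells below
train, validate, test = prep_zillow_final()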
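
The later cells rely on `train`, `validate`, and `test` frames that carry `_scaled` columns. A minimal sketch of how such splits could be produced follows. The 60/20/20 proportions, the min-max scaling, and the toy data are all assumptions, and this is not the `prep_zillow_final` implementation.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the prepared Zillow data (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sqrft": rng.integers(500, 4000, size=100),
    "bedroom": rng.integers(1, 6, size=100),
    "taxvalue": rng.integers(50_000, 900_000, size=100),
})

# Shuffle once, then cut into 60% train, 20% validate, 20% test (assumed proportions).
shuffled = df.sample(frac=1, random_state=123)
n_train = int(0.6 * len(shuffled))
n_validate = int(0.2 * len(shuffled))
train = shuffled.iloc[:n_train].copy()
validate = shuffled.iloc[n_train:n_train + n_validate].copy()
test = shuffled.iloc[n_train + n_validate:].copy()

# Learn the min-max bounds on train only, then apply them to every split
# so that no information from validate or test leaks into the scaling.
for col in ["sqrft", "bedroom"]:
    low, high = train[col].min(), train[col].max()
    for split in (train, validate, test):
        split[f"{col}_scaled"] = (split[col] - low) / (high - low)
```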

## Data Exploration

In [None]:
plot_variable_pairs(train)
plt.show()

### Is tax value independent of the number of bedrooms?

##### Hypothesis 1:

$H_0$: Tax value and the number of bedrooms are **independent**

$H_a$: Tax value and the number of bedrooms are **dependent**

##### Assigning test variables

In [None]:
x = train.bedroom
y = train.taxvalue

##### Setting Alpha

In [None]:
alpha = .05

##### Running Correlation Test

In [None]:
corr, p = stats.pearsonr(x, y)
corr, p

##### Analyzing the Results

In [None]:
print('Correlation between tax value and bedrooms')
print(f'  r = {corr:.3f}')

In [None]:
if p < alpha:
    print("We reject the null")
else:
    print("We fail to reject the null")
p

###### Conclusion: The tax value and the number of bedrooms are dependent
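
The test above rests on Pearson's r, which measures linear association. As a minimal sketch on synthetic data (the numbers are made up and are not project results), r can be computed from its definition and checked against `stats.pearsonr`:

```python
import numpy as np
from scipy import stats

# Synthetic bedrooms and tax values with a built-in linear relationship.
rng = np.random.default_rng(42)
bedrooms = rng.integers(1, 6, size=200).astype(float)
tax_value = 80_000 * bedrooms + rng.normal(0, 60_000, size=200)

# Pearson's r: sum of co-deviations divided by the product of the deviation norms.
x_c = bedrooms - bedrooms.mean()
y_c = tax_value - tax_value.mean()
r_manual = (x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum())

# SciPy returns the same r plus a two-sided p-value.
r_scipy, p_value = stats.pearsonr(bedrooms, tax_value)

alpha = 0.05
reject_null = p_value < alpha
```

Because Pearson's r only captures linear relationships, rejecting the null supports dependence, while failing to reject it would not by itself show independence.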

## Feature Engineering

##### MVP Features

In [None]:
# MVP feature set; these names match the ones used in the feature-selection and modeling cells below
features_mvp = ['sqrft_scaled', 'bedroom_scaled', 'bathroom_scaled']

X_train = train[features_mvp]
y_train = train[['taxvalue']]

X_validate = validate[features_mvp]
y_validate = validate[['taxvalue']]

X_test = test[features_mvp]
y_test = test[['taxvalue']]

##### SelectKBest Correlation

In [None]:
select_kbest(X_train, y_train, 2)
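
The body of `select_kbest` lives in `model.py` and is not shown here. A minimal sketch of what such a helper might look like, assuming it wraps scikit-learn's `SelectKBest` with the `f_regression` score, is given here with a hypothetical name and synthetic data:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

def select_kbest_sketch(X, y, k):
    """Return the names of the k features with the highest F-scores (hypothetical helper)."""
    selector = SelectKBest(score_func=f_regression, k=k)
    selector.fit(X, np.ravel(y))
    return X.columns[selector.get_support()].tolist()

# Synthetic features: the target depends on the first two columns only.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "sqrft_scaled": rng.random(300),
    "bedroom_scaled": rng.random(300),
    "bathroom_scaled": rng.random(300),
})
y_demo = (
    500_000 * X_demo["sqrft_scaled"]
    + 300_000 * X_demo["bedroom_scaled"]
    + rng.normal(0, 20_000, 300)
)

top_two = select_kbest_sketch(X_demo, y_demo, 2)
```

`SelectKBest` scores each feature on its own, so it can favor features that are redundant with one another.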

##### Recursive Feature Elimination

In [None]:
rfe(X_train, y_train, 2)
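
The `rfe` helper is likewise defined in `model.py`. A minimal sketch, assuming it wraps scikit-learn's `RFE` around a `LinearRegression` estimator (the helper name and data here are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

def rfe_sketch(X, y, k):
    """Return the k features kept by recursive feature elimination (hypothetical helper)."""
    selector = RFE(estimator=LinearRegression(), n_features_to_select=k)
    selector.fit(X, np.ravel(y))
    return X.columns[selector.support_].tolist()

# Synthetic features on a common 0-1 scale; the third has no effect on the target.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "sqrft_scaled": rng.random(300),
    "bedroom_scaled": rng.random(300),
    "bathroom_scaled": rng.random(300),
})
y_demo = (
    500_000 * X_demo["sqrft_scaled"]
    + 300_000 * X_demo["bedroom_scaled"]
    + rng.normal(0, 20_000, 300)
)

kept = rfe_sketch(X_demo, y_demo, 2)
```

RFE refits the model after each elimination and drops the feature with the smallest coefficient magnitude, which is only meaningful when the features share a scale, as the `_scaled` columns do.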

#### Top Features for Second Model

In [None]:
X_train2 = train[['sqrft_scaled', 'bedroom_scaled']]
y_train2 = train[['taxvalue']]

X_validate2 = validate[['sqrft_scaled', 'bedroom_scaled']]
y_validate2 = validate[['taxvalue']]

## Modeling

##### Setting the Baseline

In [None]:
np.mean(y_train)

In [None]:
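# Baseline predicts the training mean for every row; len(y_train) replaces the hard-coded 8994
baseline_rmse = mean_squared_error(y_train, np.full(len(y_train), y_train.values.mean()))**(1/2)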

In [None]:
baseline_rmse
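
As a small worked example of the baseline, here is the same calculation on five made-up tax values (not project data). Predicting the mean for every row gives an RMSE equal to the population standard deviation of the target:

```python
import numpy as np

# Five illustrative tax values (synthetic).
y_demo = np.array([150_000.0, 220_000.0, 310_000.0, 480_000.0, 640_000.0])

# The baseline "model" predicts the mean for every observation.
baseline_pred = np.full(len(y_demo), y_demo.mean())

# RMSE: square root of the mean squared difference.
baseline_rmse_demo = np.sqrt(np.mean((y_demo - baseline_pred) ** 2))
```

Any model worth keeping should produce an RMSE below this value.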

### Train

##### Linear Regression Model 1

In [None]:
lm_rmse = linearReg_train(X_train, y_train)

lm_rmse
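
`linearReg_train` is imported from `model.py`, so its body is not visible in this notebook. A minimal sketch of a helper with the same shape, assuming it fits an ordinary least squares model and returns the RMSE on the training data (hypothetical name, synthetic data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def linear_reg_train_sketch(X, y):
    """Fit OLS on the training data and return its training RMSE (hypothetical helper)."""
    model = LinearRegression().fit(X, y)
    y_pred = model.predict(X)
    return mean_squared_error(y, y_pred) ** 0.5

# Synthetic scaled features and a noisy linear target.
rng = np.random.default_rng(1)
X_demo = rng.random((200, 3))
y_demo = X_demo @ np.array([400_000.0, 50_000.0, 120_000.0]) + rng.normal(0, 30_000, 200)

train_rmse = linear_reg_train_sketch(X_demo, y_demo)

# Mean-only baseline on the same data, for comparison.
baseline_demo = mean_squared_error(y_demo, np.full(len(y_demo), y_demo.mean())) ** 0.5
```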

##### LassoLars Model 1

In [None]:
lars_rmse = lassoLars_train(X_train, y_train)

lars_rmse

##### Polynomial Squared Model 1

In [None]:
lm_squared_rmse = poly_linearReg_train(X_train, y_train, 2)

lm_squared_rmse
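
The final argument `2` presumably sets the polynomial degree. A minimal sketch, assuming the helper expands the features with scikit-learn's `PolynomialFeatures` before fitting a linear regression (hypothetical name, synthetic data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

def poly_linear_reg_train_sketch(X, y, degree):
    """Expand features to the given degree, fit OLS, return (train RMSE, column count)."""
    X_poly = PolynomialFeatures(degree=degree).fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    rmse = mean_squared_error(y, model.predict(X_poly)) ** 0.5
    return rmse, X_poly.shape[1]

# Synthetic target with an interaction term that a plain linear model cannot capture.
rng = np.random.default_rng(2)
X_demo = rng.random((200, 3))
y_demo = (
    300_000 * X_demo[:, 0]
    + 200_000 * X_demo[:, 0] * X_demo[:, 1]
    + rng.normal(0, 20_000, 200)
)

poly_rmse, n_columns = poly_linear_reg_train_sketch(X_demo, y_demo, 2)

# Plain linear fit on the same data, for comparison.
linear_model = LinearRegression().fit(X_demo, y_demo)
linear_rmse = mean_squared_error(y_demo, linear_model.predict(X_demo)) ** 0.5
```

With three inputs, degree 2 yields ten columns: a bias term, three linear terms, three squares, and three pairwise products. The extra flexibility can only lower the training error, which is why the validate step matters for these models.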

##### Linear Regression Model 2

In [None]:
lm_rmse2 = linearReg_train(X_train2, y_train2)

lm_rmse2

##### LassoLars Model 2

In [None]:
lars_rmse2 = lassoLars_train(X_train2, y_train2)

lars_rmse2

##### Polynomial Squared Model 2

In [None]:
lm_squared_rmse2 = poly_linearReg_train(X_train2, y_train2, 2)

lm_squared_rmse2

##### Grouping the Results

In [None]:
print("Baseline, Mean: ", baseline_rmse)
print("Linear Regression Model 1: ", lm_rmse)
print("Linear Regression Model 2: ", lm_rmse2)
print("LassoLars Model 1: ", lars_rmse)
print("LassoLars Model 2: ", lars_rmse2)
print("Polynomial Squared Model 1: ", lm_squared_rmse)
print("Polynomial Squared Model 2: ", lm_squared_rmse2)

### Validate

##### Linear Regression Model 1

In [None]:
lm_rmse_val = linearReg_validate(X_train, y_train, X_validate, y_validate)

lm_rmse_val
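
Each `*_validate` helper takes both the train and the validate split. A minimal sketch of that pattern, assuming the model is fit on train only and scored on validate (the helper name, the `LassoLars` alpha, and the data are assumptions):

```python
import numpy as np
from sklearn.linear_model import LassoLars, LinearRegression
from sklearn.metrics import mean_squared_error

def validate_rmse_sketch(model, X_train, y_train, X_validate, y_validate):
    """Fit on train only, then return the RMSE on the held-out validate split."""
    model.fit(X_train, y_train)
    return mean_squared_error(y_validate, model.predict(X_validate)) ** 0.5

# Synthetic data, split 150 / 50 into train and validate.
rng = np.random.default_rng(3)
X_all = rng.random((200, 3))
y_all = X_all @ np.array([400_000.0, 50_000.0, 120_000.0]) + rng.normal(0, 30_000, 200)
X_tr, X_val = X_all[:150], X_all[150:]
y_tr, y_val = y_all[:150], y_all[150:]

lm_val = validate_rmse_sketch(LinearRegression(), X_tr, y_tr, X_val, y_val)
lars_val = validate_rmse_sketch(LassoLars(alpha=1.0), X_tr, y_tr, X_val, y_val)

# Baseline for validate: predict the training mean for every validate row.
baseline_val = mean_squared_error(y_val, np.full(len(y_val), y_tr.mean())) ** 0.5
```

Passing the estimator in as an argument lets one function cover the linear and LassoLars cases.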

##### LassoLars Model 1

In [None]:
lars_rmse_val = lassoLars_validate(X_train, y_train, X_validate, y_validate)

lars_rmse_val

##### Polynomial Squared Model 1

In [None]:
lm_squared_rmse_val = poly_linearReg_validate(X_train, y_train, X_validate, y_validate, 2)

lm_squared_rmse_val

##### Linear Regression Model 2

In [None]:
lm_rmse_val2 = linearReg_validate(X_train2, y_train2, X_validate2, y_validate2)

lm_rmse_val2

##### LassoLars Model 2

In [None]:
lars_rmse_val2 = lassoLars_validate(X_train2, y_train2, X_validate2, y_validate2)

lars_rmse_val2

##### Polynomial Squared Model 2

In [None]:
lm_squared_rmse_val2 = poly_linearReg_validate(X_train2, y_train2, X_validate2, y_validate2, 2)

lm_squared_rmse_val2

##### Grouping the Results

In [None]:
# Each model line prints the train RMSE followed by the validate RMSE
print("Baseline, Mean: ", baseline_rmse)
print("Linear Regression Model 1: ", lm_rmse, lm_rmse_val)
print("Linear Regression Model 2: ", lm_rmse2, lm_rmse_val2)
print("LassoLars Model 1: ", lars_rmse, lars_rmse_val)
print("LassoLars Model 2: ", lars_rmse2, lars_rmse_val2)
print("Polynomial Squared Model 1: ", lm_squared_rmse, lm_squared_rmse_val)
print("Polynomial Squared Model 2: ", lm_squared_rmse2, lm_squared_rmse_val2)
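
The train and validate scores can also be gathered into one table, which makes overfitting easier to spot. A minimal sketch with a hypothetical helper; the model names and numbers are placeholders, not project results:

```python
import pandas as pd

def summarize_rmse(results, baseline_rmse):
    """Tabulate (train, validate) RMSE pairs, best validate score first (hypothetical helper)."""
    table = pd.DataFrame.from_dict(
        results, orient="index", columns=["train_rmse", "validate_rmse"]
    )
    # A large positive gap suggests the model is overfitting the training data.
    table["gap"] = table["validate_rmse"] - table["train_rmse"]
    table["beats_baseline"] = table["validate_rmse"] < baseline_rmse
    return table.sort_values("validate_rmse")

# Placeholder values only.
placeholder_results = {
    "Model A": (80.0, 85.0),
    "Model B": (60.0, 95.0),
    "Model C": (90.0, 105.0),
}
summary = summarize_rmse(placeholder_results, baseline_rmse=100.0)
```

In this made-up table, Model B has the lowest train RMSE but the largest gap, so Model A would be the safer choice to carry to the test split.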

### Test

## Conclusion