# Some Title

## Goals:

- Explore how the number of bedrooms, the number of bathrooms, and square footage relate to the tax-assessed value of Single Family Properties that had a transaction in 2017.

- Construct an ML regression model that accurately predicts a property's tax-assessed value (`tax_value`).
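Whether a model predicts tax value "accurately" is usually judged against a baseline. The sketch below is one way to make that comparison. It assumes a mean-value baseline and RMSE as the metric (both are choices made here, not taken from this notebook). `compare_to_baseline` is a hypothetical helper, and the toy DataFrame is synthetic, not Zillow data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, taken as sqrt(MSE) so it works across sklearn versions."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def compare_to_baseline(train: pd.DataFrame, features: list[str], target: str = "tax_value") -> dict:
    """Hypothetical helper: RMSE of a mean baseline vs. ordinary least squares."""
    X, y = train[features], train[target]
    # Baseline: predict the mean target value for every row
    baseline_pred = np.full(len(y), y.mean())
    # Candidate model: ordinary least squares on the chosen features
    model = LinearRegression().fit(X, y)
    return {
        "baseline_rmse": rmse(y, baseline_pred),
        "ols_rmse": rmse(y, model.predict(X)),
    }


# Toy data invented for illustration only (not from the Zillow database)
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, 200),
    "bathrooms": rng.integers(1, 5, 200) + rng.choice([0.0, 0.5], 200),
    "sq_feet": rng.integers(600, 4000, 200),
})
toy["tax_value"] = 150 * toy["sq_feet"] + 20_000 * toy["bathrooms"] + rng.normal(0, 50_000, 200)

scores = compare_to_baseline(toy, ["bedrooms", "bathrooms", "sq_feet"])
print(scores)
```

On the rows it was fit on, OLS with an intercept can never do worse than the mean baseline, so the more informative comparison is on the validate set.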

# Imports

```python
import numpy as np
import pandas as pd
import seaborn as sns
import math
import matplotlib.pyplot as plt

# Local project modules
import wrangle as w
import explore as e
# import stat_analysis_viz as viz
# import modeling_eval

import env
from scipy import stats

from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, RFE, f_regression, SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, LassoLars, TweedieRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler, QuantileTransformer, PolynomialFeatures

# Silence library warnings in the notebook output
import warnings
warnings.filterwarnings("ignore")
```

# Acquire

- Data were acquired from the Zillow database.
- Before cleaning, the data contained 52,441 rows and 7 columns.
- Each row represents a Single Family Property that had a transaction in 2017.
- Each column represents a feature describing that property.

```python
# Acquire and clean the data
df = w.wrangle_zillow()
```

# Prepare

- Checked for nulls; rows containing nulls were removed.
- Renamed columns for readability.
- Converted columns to integer data types where this could be done without losing information.
- Properties with 6 or more bedrooms and bathrooms were considered outliers and removed.
- Properties with a tax value greater than $2,000,000 were considered outliers and removed.
- Properties with more than 10,000 square feet of living area were considered outliers and removed.
- 50,790 rows remain after cleaning, about 97% of the original 52,441.
- Split the cleaned data into train, validate, and test sets.
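The cleaning steps above can be sketched in pandas as follows. `clean_sketch` is a hypothetical stand-in for what `wrangle_zillow` might do, not its actual code. The note "6 or more bedrooms and bathrooms" is interpreted here, as an assumption, as dropping a row when either count reaches 6.

```python
import numpy as np
import pandas as pd


def clean_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-creation of the Prepare steps; not the project's wrangle module."""
    # 1. Drop any row that contains a null
    df = df.dropna().copy()

    # 2. Cast float columns to int only when every value is a whole number (no information lost)
    for col in df.select_dtypes(include="float").columns:
        if (df[col] % 1 == 0).all():
            df[col] = df[col].astype(int)

    # 3. Remove outliers using the thresholds from the Prepare notes.
    #    Assumption: a row is dropped if bedrooms OR bathrooms is 6 or more.
    mask = (
        (df["bedrooms"] < 6)
        & (df["bathrooms"] < 6)
        & (df["tax_value"] <= 2_000_000)
        & (df["sq_feet"] <= 10_000)
    )
    return df[mask]


# Toy rows invented for illustration: row 2 has 7 bedrooms, row 3 is too large, row 4 has a null
raw = pd.DataFrame({
    "bedrooms": [3.0, 4.0, 7.0, 2.0, np.nan],
    "bathrooms": [2.0, 2.5, 3.0, 1.0, 2.0],
    "sq_feet": [1500.0, 2200.0, 3000.0, 12000.0, 1800.0],
    "tax_value": [450000.0, 610000.0, 800000.0, 900000.0, 500000.0],
})
cleaned = clean_sketch(raw)
print(cleaned.dtypes)
print(len(cleaned))  # 2
```

`bathrooms` stays a float because 2.5 cannot become an integer without losing information; the other columns become integers.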

# Data Dictionary

| Feature | Definition |
| :- | :- |
| bedrooms | Integer, number of bedrooms in the property |
| bathrooms | Decimal, number of bathrooms in the property, including fractional bathrooms (e.g., 2.5) |
| sq_feet | Integer, calculated total living area of the property, in square feet |
| tax_value | Integer, total tax-assessed value of the parcel; the target variable |
| year_built | Integer, year the property was built |
| tax_amount | Decimal, total property tax assessed for the assessment year |
| fips | Integer, Federal Information Processing Standards (FIPS) code identifying the property's county |

```python
# Split the data into train, validate, and test sets
train, validate, test = w.split_data(df)
```
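`w.split_data` is a project helper whose proportions are not shown here. A common way to build a three-way split is two calls to scikit-learn's `train_test_split`; the 60/20/20 proportions, the `random_state` value, and the name `split_sketch` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_sketch(df: pd.DataFrame, seed: int = 123):
    """Hypothetical three-way split; 60/20/20 is an assumed ratio, not the project's."""
    # First hold out 20% of all rows as the test set
    train_validate, test = train_test_split(df, test_size=0.20, random_state=seed)
    # Then take 25% of the remaining 80% (= 20% of the total) as the validate set
    train, validate = train_test_split(train_validate, test_size=0.25, random_state=seed)
    return train, validate, test


# Synthetic frame with 1,000 rows, used only to show the resulting sizes
toy = pd.DataFrame({"sq_feet": np.arange(1000), "tax_value": np.arange(1000) * 300})
toy_train, toy_validate, toy_test = split_sketch(toy)
print(len(toy_train), len(toy_validate), len(toy_test))  # 600 200 200
```

Fixing `random_state` makes the split reproducible, so exploration and modeling always see the same train rows.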

# Explore

## Questions:

- Is the number of bedrooms related to tax value?

- Is the number of bathrooms related to tax value?

- Is square footage related to tax value?
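Each question can be framed as a hypothesis test on the train set. This sketch uses Spearman's rank correlation (`scipy.stats.spearmanr`) with alpha = 0.05. Both the test and the significance level are choices made here rather than the notebook's confirmed method, `correlation_tests` is a hypothetical helper, and the toy data is synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance level


def correlation_tests(train: pd.DataFrame, features, target: str = "tax_value") -> pd.DataFrame:
    """Hypothetical helper: Spearman test of each feature against the target."""
    rows = []
    for feature in features:
        # H0: there is no monotonic relationship between the feature and the target
        r, p = stats.spearmanr(train[feature], train[target])
        rows.append({"feature": feature, "spearman_r": r, "p_value": p, "reject_h0": p < ALPHA})
    return pd.DataFrame(rows)


# Synthetic data: tax_value is driven by sq_feet; bedrooms/bathrooms are random noise here
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, 300),
    "bathrooms": rng.integers(1, 5, 300),
    "sq_feet": rng.integers(600, 4000, 300),
})
toy["tax_value"] = 150 * toy["sq_feet"] + rng.normal(0, 30_000, 300)

results = correlation_tests(toy, ["bedrooms", "bathrooms", "sq_feet"])
print(results)
```

Spearman only assumes a monotonic relationship, which suits skewed values such as tax value; `stats.pearsonr` would test for a linear relationship instead.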