# Iterating The Machine Learning Model With Normalized Data

In this notebook I repeat the normalization step that did not work out originally.
I then apply the normalized data to our models and check whether it has a significant impact.
I skip visualizing the data in this notebook. Please refer to the data wrangling notebook [here](https://github.com/Caparisun/Linear_Regression_Project/blob/master/Notebooks_and_data/2.Datawrangling.ipynb).

In [30]:
#importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn import preprocessing
import statsmodels.formula.api as sm

In [45]:
# Import data into a pandas dataframe.
# I am using the csv file produced by the data wrangling notebook so that I don't have to repeat those steps
df = pd.read_csv('model.csv')

In [46]:
# check if the import worked 
df.head()

Unnamed: 0.1,Unnamed: 0,bedrooms,bathrooms,sqft_living,floors,waterfront,view,condition,grade,yr_renovated,price
0,0,3,1.0,1180,1.0,0,0,3,7,0,221900
1,1,3,2.25,2570,2.0,0,0,3,7,1,538000
2,2,2,1.0,770,1.0,0,0,3,6,0,180000
3,3,4,3.0,1960,1.0,0,0,5,7,0,604000
4,4,3,2.0,1680,1.0,0,0,3,8,0,510000


In [47]:
# we drop the column "Unnamed: 0" right away

df_drop = df.drop(columns=['Unnamed: 0'])


In [48]:
# check again if the column was removed successfully
df_drop.head()

Unnamed: 0,bedrooms,bathrooms,sqft_living,floors,waterfront,view,condition,grade,yr_renovated,price
0,3,1.0,1180,1.0,0,0,3,7,0,221900
1,3,2.25,2570,2.0,0,0,3,7,1,538000
2,2,1.0,770,1.0,0,0,3,6,0,180000
3,4,3.0,1960,1.0,0,0,5,7,0,604000
4,3,2.0,1680,1.0,0,0,3,8,0,510000


In [49]:
# define a quick function to convert the year renovated into a boolean value
# this is because we know from our real estate experience that a renovation has an impact on the price,
# but the year of renovation usually doesn't matter, only the fact that it got renovated

def boolean(x):
    # return 1 for any positive value (renovated), otherwise 0
    # a single expression avoids leaving the result undefined for unexpected input such as negative values
    return 1 if x > 0 else 0

In [50]:
# apply boolean function to the yr_renovated column
df_drop['yr_renovated'] = df_drop['yr_renovated'].apply(boolean)
# check if that worked by looking at the head again
df_drop.head()

Unnamed: 0,bedrooms,bathrooms,sqft_living,floors,waterfront,view,condition,grade,yr_renovated,price
0,3,1.0,1180,1.0,0,0,3,7,0,221900
1,3,2.25,2570,2.0,0,0,3,7,1,538000
2,2,1.0,770,1.0,0,0,3,6,0,180000
3,4,3.0,1960,1.0,0,0,5,7,0,604000
4,3,2.0,1680,1.0,0,0,3,8,0,510000
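
As a side note, the same 0/1 conversion can be written without a helper function by comparing the whole column at once. This is only a minimal sketch on made-up values (the `years` series is hypothetical and not taken from `model.csv`); it is not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical renovation years; 0 means "never renovated"
years = pd.Series([0, 1991, 0, 2014], name="yr_renovated")

# The comparison yields True/False per row; astype(int) turns that into 1/0
renovated = (years > 0).astype(int)

print(renovated.tolist())  # [0, 1, 0, 1]
```

On a large dataframe the vectorized comparison is usually faster than `apply`, but the result is the same flag.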


## Normalizing the data

In [53]:
# normalizing the columns so the values get distributed between 0 and 1
# we use the preprocessing module from sklearn to achieve this

x = df_drop.values # returns a numpy array (the column names are lost here)
min_max_scaler = preprocessing.MinMaxScaler() # choose the scaler with which we are normalizing
x_scaled = min_max_scaler.fit_transform(x) # create normalized values
df_norm = pd.DataFrame(x_scaled) # create new normalized dataframe
df_norm.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,0.2,0.066667,0.061503,0.0,0.0,0.0,0.5,0.4,0.0,0.01888
1,0.2,0.233333,0.167046,0.4,0.0,0.0,0.5,0.4,1.0,0.060352
2,0.1,0.066667,0.030372,0.0,0.0,0.0,0.5,0.3,0.0,0.013382
3,0.3,0.333333,0.120729,0.0,0.0,0.0,1.0,0.4,0.0,0.069011
4,0.2,0.2,0.099468,0.0,0.0,0.0,0.5,0.5,0.0,0.056678
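
For reference, min-max scaling with the default range of 0 to 1 maps every column value `x` to `(x - min) / (max - min)`, where `min` and `max` are taken per column. The sketch below applies that formula directly in pandas on a small made-up frame (the `toy` data is hypothetical and not taken from `model.csv`). Working on the dataframe instead of the numpy array keeps the column names.

```python
import pandas as pd

# Hypothetical toy data, only for illustrating the formula
toy = pd.DataFrame({
    "bedrooms": [1, 3, 5],
    "price": [100000, 250000, 400000],
})

# Column-wise min-max scaling: (x - min) / (max - min)
toy_norm = (toy - toy.min()) / (toy.max() - toy.min())

print(toy_norm)
```

Each column now runs from 0 (its smallest value) to 1 (its largest value), and the middle row lands at 0.5 in both columns.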


In [54]:
# define headers for the normalized dataframe
df_norm.columns = ["bedrooms", "bathrooms", "sqft_living", "floors", "waterfront", "view", "condition", "grade", "yr_renovated", "price"]

In [55]:
# check if the headers were applied by looking at the first two normalized rows
df_norm.head(2)

Unnamed: 0,bedrooms,bathrooms,sqft_living,floors,waterfront,view,condition,grade,yr_renovated,price
0,0.2,0.066667,0.061503,0.0,0.0,0.0,0.5,0.4,0.0,0.01888
1,0.2,0.233333,0.167046,0.4,0.0,0.0,0.5,0.4,1.0,0.060352
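
Looking at the first rows only shows a few values. A quick way to check the whole frame could look like the sketch below. The helper name `in_unit_interval` is my own and hypothetical, and `sample` is a stand-in holding a few values copied from the output above rather than the full `df_norm`.

```python
import pandas as pd

def in_unit_interval(frame: pd.DataFrame) -> bool:
    """Return True if every value in the frame lies between 0 and 1 (inclusive)."""
    return bool(((frame >= 0) & (frame <= 1)).all().all())

# Stand-in for df_norm: two columns of the first two rows shown above
sample = pd.DataFrame({"bedrooms": [0.2, 0.2], "price": [0.018880, 0.060352]})

print(in_unit_interval(sample))
# per-column minimum and maximum, handy for spotting a column that was not scaled
print(sample.agg(["min", "max"]))
```

Called on the full `df_norm`, the same function would confirm that no column escaped the scaling.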


***

### I will now export this data as a CSV file and use it in a copy of the applying_model notebook.
Please refer to [this](https://github.com/Caparisun/Linear_Regression_Project/blob/master/Notebooks_and_data/3.Applying_Model.ipynb) notebook to see the original.

In [57]:
# export to csv
# note: by default the index is written as an extra unnamed column; pass index=False to omit it
df_norm.to_csv('norm_model.csv')
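
The "Unnamed: 0" column that had to be dropped at the top of this notebook is what pandas produces when a CSV written with its index is read back in. The sketch below shows the round trip on a small made-up frame (the `frame` data is hypothetical) and uses in-memory buffers instead of files so it does not touch `norm_model.csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the normalized dataframe
frame = pd.DataFrame({"bedrooms": [0.2, 0.1], "price": [0.01888, 0.013382]})

# Default export: the index is written as a first column without a header
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
cols_default = pd.read_csv(with_index).columns.tolist()

# Export without the index: only the real columns are written
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
cols_clean = pd.read_csv(without_index).columns.tolist()

print(cols_default)  # ['Unnamed: 0', 'bedrooms', 'price']
print(cols_clean)    # ['bedrooms', 'price']
```

Whether to pass `index=False` here depends on what the follow-up notebook expects. If it drops "Unnamed: 0" after loading, the default export has to stay.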

***