# House Price Calculator Model v1.0

## Import Data

In [17]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
 
df = pd.read_csv("../data/house_data.csv")
 
# Printing first 5 records of the dataset
print(df.head(5))
df.shape

           id             date     price  bedrooms  bathrooms  sqft_living  \
0  7129300520  20141013T000000  221900.0         3       1.00         1180   
1  6414100192  20141209T000000  538000.0         3       2.25         2570   
2  5631500400  20150225T000000  180000.0         2       1.00          770   
3  2487200875  20141209T000000  604000.0         4       3.00         1960   
4  1954400510  20150218T000000  510000.0         3       2.00         1680   

   sqft_lot  floors  waterfront  view  ...  grade  sqft_above  sqft_basement  \
0      5650     1.0           0     0  ...      7        1180              0   
1      7242     2.0           0     0  ...      7        2170            400   
2     10000     1.0           0     0  ...      6         770              0   
3      5000     1.0           0     0  ...      7        1050            910   
4      8080     1.0           0     0  ...      8        1680              0   

   yr_built  yr_renovated  zipcode      lat     lo

(21613, 21)

## Feature Selection

In [18]:
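# Four predictor columns plus the target column (price)
columns = ['bedrooms',
           'bathrooms',
           'floors',
           'yr_built',
           'price']

df = df[columns]

# Features are every selected column except the last; the target is price
X = df[columns[:-1]]
y = df['price']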

df.head(5)

   bedrooms  bathrooms  floors  yr_built     price
0         3       1.00     1.0      1955  221900.0
1         3       2.25     2.0      1951  538000.0
2         2       1.00     1.0      1933  180000.0
3         4       3.00     1.0      1965  604000.0
4         3       2.00     1.0      1987  510000.0


## Split the data

In [19]:
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
 
X = df.drop(['price'], axis=1)
Y = df['price']
 
# Split the dataset into training and validation set
X_train, X_valid, Y_train, Y_valid = train_test_split(
    X, Y, train_size=0.8, test_size=0.2, random_state=0)

## Train Model

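### SVM – Support Vector Machine

SVM can be used for both regression and classification. For classification it finds a separating hyperplane in the n-dimensional feature space; the regression variant used here, SVR, fits a function that keeps as many training points as possible within a margin of the prediction. To read more about SVM, [refer to this article](https://www.geeksforgeeks.org/support-vector-machine-algorithm/).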



In [20]:

from sklearn import svm
from sklearn.metrics import mean_absolute_percentage_error
 
model_SVR = svm.SVR()
model_SVR.fit(X_train,Y_train)
Y_pred = model_SVR.predict(X_valid)
 
print(mean_absolute_percentage_error(Y_valid, Y_pred))

0.4260895330802694
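
SVR with its default RBF kernel is sensitive to the scale of the input features, and the columns here differ widely in scale (`yr_built` is in the thousands, `bedrooms` is a single digit). The following is a minimal sketch of standardizing features before fitting, assuming synthetic stand-in data rather than `house_data.csv`; the names `X_demo`, `y_demo`, and `scaled_svr` are hypothetical. The notebook does not report whether scaling changes the score on the housing data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for two housing features (hypothetical data)
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.integers(1, 6, size=200),        # bedrooms-like column
    rng.integers(1900, 2015, size=200),  # yr_built-like column
]).astype(float)
y_demo = 50_000 * X_demo[:, 0] + 1_000 * (X_demo[:, 1] - 1900) + 100_000

# Standardize each feature to zero mean and unit variance, then fit the SVR
scaled_svr = make_pipeline(StandardScaler(), SVR())
scaled_svr.fit(X_demo, y_demo)
demo_pred = scaled_svr.predict(X_demo)

print(mean_absolute_percentage_error(y_demo, demo_pred))
```

Wrapping the scaler and the model in one pipeline means the scaling statistics are learned from the training data only and reapplied automatically at prediction time.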


### Random Forest Regression

Random Forest is an ensemble technique that combines the predictions of multiple decision trees and can be used for both regression and classification tasks. To read more about random forests, [refer to this article](https://www.geeksforgeeks.org/random-forest-regression-in-python/).

In [21]:
from sklearn.ensemble import RandomForestRegressor
 
model_RFR = RandomForestRegressor(n_estimators=10)
model_RFR.fit(X_train, Y_train)
Y_pred = model_RFR.predict(X_valid)
 
mean_absolute_percentage_error(Y_valid, Y_pred)

0.3898434754771682
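
To make the "ensemble" idea concrete, the sketch below checks that a `RandomForestRegressor` prediction is the average of the predictions of its individual trees. It uses synthetic data, not the housing dataset; `X_demo`, `y_demo`, and `forest` are hypothetical names chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data with four features (hypothetical data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(0, 0.1, size=100)

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X_demo, y_demo)

# Predict with each fitted tree separately, then average across trees
per_tree = np.stack([tree.predict(X_demo) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# Compare the manual average with the forest's own prediction
print(np.allclose(manual_average, forest.predict(X_demo)))
```

Because each tree is trained on a different bootstrap sample, `n_estimators` controls how many such trees are averaged.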

### Linear Regression

Linear Regression predicts the dependent (target) value as a weighted sum of the independent features. Here, the target is the price and the features are the number of bedrooms, bathrooms, and floors, and the year of construction. To read more about Linear Regression, [refer to this article](https://www.geeksforgeeks.org/ml-linear-regression/).

In [22]:
from sklearn.linear_model import LinearRegression
 
model_LR = LinearRegression()
model_LR.fit(X_train, Y_train)
Y_pred = model_LR.predict(X_valid)
 
print(mean_absolute_percentage_error(Y_valid, Y_pred))

0.41123936322827975
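
As an illustration of what the fitted linear model computes, the sketch below reproduces `predict` by hand as the intercept plus the dot product of the features with the learned coefficients. The data is synthetic and noise-free, and the names `X_demo`, `y_demo`, and `linreg` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free linear target over four features (hypothetical data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 4))
true_weights = np.array([2.0, -1.0, 0.5, 3.0])
y_demo = X_demo @ true_weights + 7.0

linreg = LinearRegression().fit(X_demo, y_demo)

# Manual prediction: intercept + sum of (coefficient * feature value)
manual_pred = linreg.intercept_ + X_demo @ linreg.coef_

print(np.allclose(manual_pred, linreg.predict(X_demo)))
print(linreg.coef_, linreg.intercept_)
```

On the housing data, the same `coef_` and `intercept_` attributes of `model_LR` would show how much each feature contributes to the predicted price.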


## Dumping Model

In [23]:
import pickle

# Save the random forest model; the with-block closes the file after writing
with open('model_v1.0.pkl', 'wb') as f:
    pickle.dump(model_RFR, f)
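
A saved model is only useful if it can be loaded back. The following is a minimal round-trip sketch, assuming a small forest trained on synthetic data and a temporary file named `demo_model.pkl` (both hypothetical, so the real `model_v1.0.pkl` is not touched).

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small model on synthetic data (hypothetical stand-in for model_RFR)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(60, 4))
y_demo = X_demo.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "demo_model.pkl")  # hypothetical file name

    # Write the model to disk
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Read it back into a new object
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give the same predictions as the original
same_predictions = np.allclose(model.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code, and a model is best loaded with the same scikit-learn version that saved it.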