### Setup

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn import linear_model

# Training data

In [2]:
df = pd.read_csv(r'data\2_homeprices.csv')
df

Unnamed: 0,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000


## Features and Labels
Here we will use __area of the house__, __number of bedrooms__ and __the age of the house__ as _features_ to determine the __price__ of the house, which will be our _label_.

We can represent this as an equation:

_`price` = (m1 × area) + (m2 × bedrooms) + (m3 × age) + b_

_b, m1, m2, m3_ are also written in academic literature as _θ0, θ1, θ2, θ3_.
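
As a minimal sketch, the equation above can be written as a plain Python function. The function name `linear_price` and the coefficient values below are made up purely for illustration; they are not the values the model learns.

```python
def linear_price(area, bedrooms, age, m1, m2, m3, b):
    """Evaluate price = (m1 * area) + (m2 * bedrooms) + (m3 * age) + b."""
    return m1 * area + m2 * bedrooms + m3 * age + b

# Hypothetical coefficients, chosen only to show how the terms combine.
example_price = linear_price(
    area=3000, bedrooms=3, age=40,
    m1=100.0, m2=20000.0, m3=-3000.0, b=200000.0,
)
print(example_price)  # 300000 + 60000 - 120000 + 200000 = 440000.0
```

Training the model amounts to finding the values of _m1, m2, m3_ and _b_ that best fit the training data.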

Before we train our model, we must first deal with the missing values in our data.

##### Filling missing values
In this simple case we will just fill the missing number of bedrooms with the median of the `bedrooms` column.

In [3]:
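# Assign the result back instead of using inplace=True on a column selection,
# which is unreliable (and deprecated) in recent pandas versions
df['bedrooms'] = df['bedrooms'].fillna(df['bedrooms'].median())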
df

Unnamed: 0,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,4.0,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000
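
The fill step can be checked with a small self-contained sketch. It rebuilds the table shown above by hand (the name `df_demo` is hypothetical, used so the example does not depend on the CSV file), counts missing values, and confirms that the median used for the fill is 4.0.

```python
import numpy as np
import pandas as pd

# Same rows as the training data above, typed in by hand for illustration.
df_demo = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0, 6.0],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

# Count missing values per column before filling.
missing_before = df_demo.isna().sum()

# median() skips NaN by default, so it is computed over 3, 3, 4, 5, 6.
bedrooms_median = df_demo["bedrooms"].median()
df_demo["bedrooms"] = df_demo["bedrooms"].fillna(bedrooms_median)

# Total number of missing values left in the whole frame.
missing_after = int(df_demo.isna().sum().sum())

print(missing_before["bedrooms"], bedrooms_median, missing_after)
```

The median is a reasonable default here because it is not pulled around by extreme values, but it is only one of several possible imputation choices.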


# Training our Linear Model

In [4]:
model = linear_model.LinearRegression()
model.fit(X=df[['area', 'bedrooms', 'age']], y=df['price'])

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

##### Coefficients
Below are the coefficients _(θ1, θ2, θ3)_ for each of our features: __area__, __bedrooms__ and __age__.

In [5]:
model.coef_

array([  112.06244194, 23388.88007794, -3231.71790863])

##### Intercept
Below is the y-intercept _(θ0)_ for our model.

In [6]:
model.intercept_

221323.0018654043

# Prediction

Prediction for a house with an area of 3000 sq ft, 3 bedrooms, and an age of 40 years:

In [7]:
model.predict([[3000, 3, 40]])

array([498408.25158031])
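
The prediction above can be reproduced by hand from the printed coefficients and intercept. The sketch below copies the values from the outputs of `model.coef_` and `model.intercept_`; because those printed values are rounded, the result matches `model.predict` only up to a small rounding error.

```python
import numpy as np

# Values copied from the printed model.coef_ and model.intercept_ outputs.
coef = np.array([112.06244194, 23388.88007794, -3231.71790863])
intercept = 221323.0018654043

# Features in the same order used for training: area, bedrooms, age.
house = np.array([3000, 3, 40])

# price = (m1 * area) + (m2 * bedrooms) + (m3 * age) + b
manual_prediction = float(np.dot(coef, house) + intercept)
print(round(manual_prediction, 2))
```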

In [8]:
model.predict([[2500, 3, 40]])

array([442377.03060924])
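
The two predictions differ only in area (3000 vs. 2500 sq ft), so their difference should equal the area coefficient times 500. The short check below uses the printed outputs from above and illustrates how a coefficient can be read as the change in predicted price per unit change in its feature, with the other features held fixed.

```python
# Values copied from the printed outputs above.
pred_3000 = 498408.25158031   # area 3000, 3 bedrooms, age 40
pred_2500 = 442377.03060924   # area 2500, 3 bedrooms, age 40
area_coef = 112.06244194      # first entry of model.coef_

difference = pred_3000 - pred_2500
expected = area_coef * (3000 - 2500)

# The two numbers agree up to rounding of the printed values.
print(round(difference, 2), round(expected, 2))
```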