# ML Tutorial Day 3

## Multivariate Linear Regression
We will see how to predict house prices when they depend on multiple factors, such as the area, the number of bedrooms, and the age of the house.

Our goal is to predict the prices of the following houses:
1. 3000 sqft area, 3 bedrooms, 40 years old.
2. 2500 sqft area, 4 bedrooms, 5 years old.

In [None]:
import pandas as pd
import numpy as np
from sklearn import linear_model as linMod

df = pd.read_csv('homeprices.csv')
df

Looking at the dataset, we see that one value is missing: row two has an empty cell in the `bedrooms` column. We need to handle it before training the model.
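
Rather than spotting the gap by eye, pandas can count the missing values in each column. The sketch below uses a small, hypothetical DataFrame (not the contents of `homeprices.csv`) with one missing `bedrooms` entry; on the real data the same calls would be made on `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only; not the values in homeprices.csv
toy = pd.DataFrame({
    'area': [1000, 1500, 2000, 2500],
    'bedrooms': [2, 3, np.nan, 4],
    'age': [10, 5, 8, 2],
    'price': [200000, 260000, 310000, 390000],
})

# isna() marks missing cells as True; sum() counts them per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = toy[toy.isna().any(axis=1)]
print(rows_with_missing)
```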

Next, to ascertain that linear regression is the right approach for this problem, we check the data for a roughly linear trend: the price increases as the area or the number of bedrooms increases, while it decreases as the age of the house increases.
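
One way to back up this visual impression is to look at the correlation of each feature with the price: a positive value means the price tends to rise with the feature, a negative value that it tends to fall. The sketch below uses hypothetical toy values chosen to follow the trend described above; on the real data you could call `df.corr()` instead. Keep in mind that correlation only measures linear association one feature at a time.

```python
import pandas as pd

# Hypothetical toy data following the described trend (not homeprices.csv)
toy = pd.DataFrame({
    'area': [1000, 1500, 2000, 2500, 3000],
    'bedrooms': [2, 2, 3, 4, 4],
    'age': [30, 25, 15, 10, 5],
    'price': [150000, 200000, 270000, 330000, 380000],
})

# Pearson correlation of every feature column with the price column
corr_with_price = toy.corr()['price'].drop('price')
print(corr_with_price)
```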

If we look at the mathematical relationship, we get the following equation:

![image.png](attachment:image.png)
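
Because the image attachment may not render outside Jupyter, the relationship can also be written in LaTeX. This assumes the image shows the standard linear form, with one coefficient per feature plus an intercept $b$, which matches the three values in `coef_` and the single `intercept_` printed later:

$$
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
$$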

Here, the independent variables (area, bedrooms, and age) are also known as features, while the price is the dependent variable. The above equation can be generalized as follows:

![image-2.png](attachment:image-2.png)
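
Assuming the generalized form is the usual $y = m_1x_1 + m_2x_2 + \dots + m_nx_n + b$, a prediction is just a weighted sum of the features plus the intercept, i.e. a dot product. The helper name `linear_predict` and the numbers below are hypothetical, for illustration only.

```python
import numpy as np

def linear_predict(features, coefficients, intercept):
    """Hypothetical helper: y = m1*x1 + m2*x2 + ... + mn*xn + b.

    `features` may be one row (shape (n,)) or many rows (shape (k, n)).
    """
    features = np.asarray(features, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    # The weighted sum over the feature axis is a dot product
    return features @ coefficients + intercept

# Illustrative values only: three features and made-up coefficients
y = linear_predict([2000, 3, 10], [100.0, 5000.0, -1000.0], 50000.0)
print(y)  # 2000*100 + 3*5000 - 10*1000 + 50000 = 255000.0
```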

We will cover the following:
1. Data preprocessing: Handling `NA` values
2. Linear regression using multiple variables


In [None]:
# the bedrooms column has one missing value; replace it with the median of that column
median_bedrooms = df['bedrooms'].median()
df['bedrooms'] = df['bedrooms'].fillna(median_bedrooms)
df
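
The median is a reasonable choice here because, unlike the mean, it is not pulled by unusually large or small values. Note that the median of an even number of values can fall between two integers (for example 3.5 bedrooms). If whole bedroom counts are preferred, one option, which is an assumption and not part of the original notebook, is to round down with `np.floor`. A sketch on a hypothetical toy column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value (not homeprices.csv)
toy = pd.DataFrame({'bedrooms': [2, 3, np.nan, 4, 5]})

# median() skips NaN by default: the median of [2, 3, 4, 5] is 3.5
median_bedrooms = toy['bedrooms'].median()

# Option A (as in the notebook): fill with the raw median
filled = toy['bedrooms'].fillna(median_bedrooms)

# Option B (assumption): fill with the median rounded down to a whole number
filled_whole = toy['bedrooms'].fillna(np.floor(median_bedrooms))

print(median_bedrooms, filled.tolist(), filled_whole.tolist())
```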

In [None]:
# create a linear regression model and fit it on area, bedrooms and age to predict price
linReg = linMod.LinearRegression()
linReg.fit(df[['area', 'bedrooms', 'age']].values, df['price'].values)
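
`LinearRegression.fit` estimates the coefficients and intercept by ordinary least squares, i.e. it minimises the sum of squared differences between predicted and actual prices. As a sketch of that idea, the same solution can be obtained with NumPy's `np.linalg.lstsq` after appending a column of ones for the intercept. The toy data below is hypothetical and generated from known coefficients, so both approaches should recover them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy features: area, bedrooms, age (not homeprices.csv)
X = np.array([
    [1000, 2, 30],
    [1500, 3, 20],
    [2000, 3, 15],
    [2500, 4, 10],
    [3000, 4, 5],
    [3500, 5, 2],
], dtype=float)

# Prices generated exactly from known coefficients and intercept
true_coef = np.array([100.0, 5000.0, -1000.0])
true_intercept = 20000.0
y = X @ true_coef + true_intercept

# Least squares by hand: a column of ones makes the last weight the intercept
X_with_ones = np.column_stack([X, np.ones(len(X))])
weights, *_ = np.linalg.lstsq(X_with_ones, y, rcond=None)

# The same fit with scikit-learn
model = LinearRegression().fit(X, y)

print(weights[:-1], weights[-1])
print(model.coef_, model.intercept_)
```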

In [None]:
# checking the coefficients and the intercept
print(linReg.coef_)
print(linReg.intercept_)
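
Each entry of `coef_` belongs to the feature column in the same position as it was passed to `fit` (here `area`, `bedrooms`, `age`), and can be read as the expected change in price for a one-unit increase in that feature while the others are held fixed. A sketch of labelling the coefficients, using a hypothetical model fitted on toy data generated from known coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

feature_names = ['area', 'bedrooms', 'age']

# Hypothetical toy data generated from known coefficients (not homeprices.csv)
X = np.array([[1000, 2, 30], [1500, 3, 20], [2000, 3, 15],
              [2500, 4, 10], [3000, 4, 5], [3500, 5, 2]], dtype=float)
y = X @ np.array([100.0, 5000.0, -1000.0]) + 20000.0

model = LinearRegression().fit(X, y)

# Pair each coefficient with the column it belongs to
coef_by_feature = dict(zip(feature_names, model.coef_))
print(coef_by_feature)

# Raising area by one unit, other features fixed, shifts the prediction by coef_[0]
base = np.array([[2000, 3, 10]], dtype=float)
bumped = base + np.array([[1, 0, 0]])
delta = model.predict(bumped)[0] - model.predict(base)[0]
print(delta)
```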

Now our model is trained, and we are ready to predict the prices of the two houses.

In [None]:
print(linReg.predict([[3000, 3, 40]])[0])
print(linReg.predict([[2500, 4, 5]])[0])
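
Both houses can also be predicted in a single call by passing a 2-D array with one row per house. As a sanity check, each prediction should equal the dot product of the features with `coef_` plus `intercept_`, which is the linear equation shown earlier. This is sketched with a hypothetical model fitted on toy data, since the real coefficients depend on `homeprices.csv`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (not homeprices.csv)
X = np.array([[1000, 2, 30], [1500, 3, 20], [2000, 3, 15],
              [2500, 4, 10], [3000, 4, 5], [3500, 5, 2]], dtype=float)
y = X @ np.array([100.0, 5000.0, -1000.0]) + 20000.0
model = LinearRegression().fit(X, y)

# The two houses from the goal, one row each: [area, bedrooms, age]
houses = np.array([[3000, 3, 40], [2500, 4, 5]], dtype=float)

# One call returns one prediction per row
batch_predictions = model.predict(houses)

# Manual check: m1*area + m2*bedrooms + m3*age + b
manual_predictions = houses @ model.coef_ + model.intercept_
print(batch_predictions)
print(manual_predictions)
```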