# Multiple Linear Regression

### Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Importing dataset

In [None]:
df = pd.read_csv('./sample_data/Real estate.csv',na_values=['??'])

In [None]:
df.head()

| | No | X1 transaction date | X2 house age | X3 distance to the nearest MRT station | X4 number of convenience stores | X5 latitude | X6 longitude | Y house price of unit area |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2012.917 | 32.0 | 84.87882 | 10 | 24.98298 | 121.54024 | 37.9 |
| 1 | 2 | 2012.917 | 19.5 | 306.5947 | 9 | 24.98034 | 121.53951 | 42.2 |
| 2 | 3 | 2013.583 | 13.3 | 561.9845 | 5 | 24.98746 | 121.54391 | 47.3 |
| 3 | 4 | 2013.5 | 13.3 | 561.9845 | 5 | 24.98746 | 121.54391 | 54.8 |
| 4 | 5 | 2012.833 | 5.0 | 390.5684 | 5 | 24.97937 | 121.54245 | 43.1 |


### Dataset description

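1.  transaction date - date on which the transaction was carried out, stored as a fractional year (e.g. 2012.917)
2.  house age - age of the house in years
3.  distance to the nearest MRT station - distance from the house to the nearest Mass Rapid Transit station
4.  number of convenience stores - number of convenience stores near the house
5.  latitude - latitude of the house
6.  longitude - longitude of the house
7.  house price of unit area - price of the house per unit area (the target variable)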

### Dropping NaN values (if any) and the 'No' column

In [None]:
df.dropna(inplace=True)
df.drop(columns=['No'], inplace=True)
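
To illustrate how `na_values=['??']` and `dropna` work together, here is a minimal sketch on a tiny in-memory CSV. The column names and values in `csv_text` are made up for illustration and are not taken from the real estate file.

```python
import io
import pandas as pd

# Hypothetical three-row CSV; the '??' entry stands in for a missing value.
csv_text = "No,age,price\n1,32.0,37.9\n2,??,42.2\n3,13.3,47.3\n"

# na_values=['??'] makes pandas read '??' as NaN instead of as a string,
# so the 'age' column stays numeric.
toy = pd.read_csv(io.StringIO(csv_text), na_values=['??'])
print(toy.isna().sum())   # number of missing values per column

# Drop every row that contains a NaN, then drop the identifier column.
toy = toy.dropna().drop(columns=['No'])
print(toy.shape)
```

Checking `isna().sum()` before calling `dropna` shows how many rows are about to be lost, which is useful when the dataset is small.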

### Dataset shape

In [None]:
df.shape

(403, 7)

### Generating features and labels

In [None]:
x = df.loc[:, 'X1 transaction date':'X6 longitude'].values
y = df['Y house price of unit area'].values.reshape(-1,1)

### Splitting the dataset into train and test

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)
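
As a rough check of what `test_size = 0.3` does for 403 rows, the sketch below splits a synthetic array of the same shape. The arrays `x_demo` and `y_demo` are random stand-ins, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the notebook's shape: 403 rows, 6 features, 1 label.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(403, 6))
y_demo = rng.normal(size=(403, 1))

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=0
)
print(x_tr.shape, x_te.shape)

# Reusing the same random_state reproduces exactly the same split.
x_tr2, x_te2, _, _ = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=0
)
print(np.array_equal(x_te, x_te2))
```

Fixing `random_state` keeps the reported metrics reproducible between runs; a different seed will generally give somewhat different scores.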

### Importing the Linear Regression model and fitting it on the training set

In [None]:
from sklearn.linear_model import LinearRegression
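# Note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2.
# On newer versions use LinearRegression(); for ordinary least squares the predictions are the same.
regressor = LinearRegression(normalize=True)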
regressor.fit(x_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)
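
Fitting a multiple linear regression amounts to solving a least-squares problem for the coefficients and the intercept. The following is a minimal NumPy sketch of that idea on made-up, noise-free data; the names `true_coef` and `true_intercept` are assumptions for the demo, and this is not meant to mirror scikit-learn's internal implementation.

```python
import numpy as np

# Made-up data: 50 samples, 3 features, exact linear relationship.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y = X @ true_coef + true_intercept

# Append a column of ones so the last solved weight plays the role of the intercept.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve min ||X_aug @ w - y||^2 for w.
weights, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
coef, intercept = weights[:-1], weights[-1]
print(coef, intercept)
```

Because the data are noise-free, the solved weights should match `true_coef` and `true_intercept` up to floating-point error.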

### Predicting output for test features

In [None]:
y_pred = regressor.predict(x_test)

### Printing errors and R2 score

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
print('Mean Squared Error => ', mean_squared_error(y_test,y_pred))
print('Mean Absolute Error => ', mean_absolute_error(y_test,y_pred))
print('R2 score => ', r2_score(y_test,y_pred))

Mean Squared Error =>  55.47366509555721
Mean Absolute Error =>  5.6814242307588065
R2 score =>  0.5836173306500637
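
For reference, the three metrics can be computed by hand. The sketch below uses three made-up values (`y_true` and `y_hat` are illustrative, not from the dataset) so the arithmetic is easy to follow.

```python
import numpy as np

# Illustrative values only.
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])

residuals = y_true - y_hat

# Mean squared error: average of squared residuals.
mse = np.mean(residuals ** 2)

# Mean absolute error: average of absolute residuals.
mae = np.mean(np.abs(residuals))

# R2: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mse, mae, r2)
```

An R2 of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the actual values.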


### Printing coefficients

In [None]:
print('Coefficients : ', regressor.coef_)
print('Regressor intercept ', regressor.intercept_)

Coefficients :  [[ 6.45720460e+00 -2.63444182e-01 -4.67009892e-03  1.08482432e+00
   2.21714875e+02  7.04259734e+00]]
Regressor intercept  [-19347.25768737]
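
To make the printed numbers easier to read, the sketch below pairs each coefficient with its feature name and applies the fitted equation by hand to the first row shown by `df.head()`. The coefficients are copied from the rounded output above, so the manual prediction is only approximate.

```python
import numpy as np

feature_names = [
    'X1 transaction date',
    'X2 house age',
    'X3 distance to the nearest MRT station',
    'X4 number of convenience stores',
    'X5 latitude',
    'X6 longitude',
]

# Values copied from the printed (rounded) output of the fitted model.
coefs = np.array([6.45720460e+00, -2.63444182e-01, -4.67009892e-03,
                  1.08482432e+00, 2.21714875e+02, 7.04259734e+00])
intercept = -19347.25768737

for name, c in zip(feature_names, coefs):
    print(f"{name:40s} {c: .6f}")

# Feature values of the first row displayed by df.head().
row0 = np.array([2012.917, 32.0, 84.87882, 10, 24.98298, 121.54024])

# Prediction = dot product of features and coefficients, plus the intercept.
pred0 = row0 @ coefs + intercept
print(pred0)
```

The coefficient sizes are not directly comparable because the features are on very different scales: latitude varies by only a few hundredths of a degree in the rows shown, so its coefficient is large, while distance to the MRT station varies by hundreds, so its coefficient is small.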


### Comparing predicted and actual values

In [None]:
comparison = {
    'actual': y_test.flatten(),
    'predicted': y_pred.flatten()
}

In [None]:
temp = pd.DataFrame(comparison)

In [None]:
temp.head(10)

| | actual | predicted |
|---|---|---|
| 0 | 31.6 | 47.652783 |
| 1 | 36.7 | 46.022305 |
| 2 | 42.4 | 41.406355 |
| 3 | 29.8 | 33.545323 |
| 4 | 44.3 | 44.93317 |
| 5 | 21.8 | 30.884354 |
| 6 | 39.1 | 40.898473 |
| 7 | 33.4 | 36.928207 |
| 8 | 37.5 | 43.265004 |
| 9 | 28.5 | 34.738996 |
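
A per-row error column makes the comparison easier to judge than reading the two columns side by side. The sketch below rebuilds the ten rows displayed above as a standalone frame (`first_ten` is a name introduced here) and adds the absolute error.

```python
import pandas as pd

# The ten rows shown by temp.head(10), copied from the output above.
first_ten = pd.DataFrame({
    'actual': [31.6, 36.7, 42.4, 29.8, 44.3, 21.8, 39.1, 33.4, 37.5, 28.5],
    'predicted': [47.652783, 46.022305, 41.406355, 33.545323, 44.93317,
                  30.884354, 40.898473, 36.928207, 43.265004, 34.738996],
})

# Absolute difference between actual and predicted price for each row.
first_ten['abs_error'] = (first_ten['actual'] - first_ten['predicted']).abs()

print(first_ten)
print('Mean absolute error of these rows:', first_ten['abs_error'].mean())
print('Row with the largest error:', first_ten['abs_error'].idxmax())
```

In the notebook itself the same column could be added to `temp` directly, which would cover the whole test set rather than just the first ten rows.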


### Interpretation

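The predictions track the actual values only moderately well. The R2 score of about 0.58 means the model explains roughly 58% of the variance in house price on the test set, and the mean absolute error is about 5.7 price units. In the first ten test rows some predictions are close (44.3 actual vs. 44.93 predicted), while others are far off (31.6 actual vs. 47.65 predicted).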