# Multiple Linear Regression with sklearn - Exercise Solution

You are given a real estate dataset. 

Real estate is one of those examples that every regression course goes through, as it is extremely easy to understand and there is (almost always) a clear causal relationship to be found.

The data is located in the file: 'real_estate_price_size_year.csv'. 

You are expected to create a multiple linear regression (similar to the one in the lecture), using the new data. 

Apart from that, please:
-  Display the intercept and coefficient(s)
-  Find the R-squared and Adjusted R-squared
-  Compare the R-squared and the Adjusted R-squared
-  Compare the R-squared of this regression and the simple linear regression where only 'size' was used
-  Using the model, predict the price of an apartment with a size of 750 sq.ft. from 2009
-  Find the univariate (or multivariate if you wish - see the article) p-values of the two variables. What can you say about them?
-  Create a summary table with your findings

In this exercise, the dependent variable is 'price', while the independent variables are 'size' and 'year'.
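
Written out, the model to estimate is the following. This is a sketch in standard notation; the symbols b_0, b_1, b_2 and the error term ε are introduced here for illustration.

```latex
\text{price}_i = b_0 + b_1\,\text{size}_i + b_2\,\text{year}_i + \varepsilon_i, \qquad i = 1, \dots, n
```

In scikit-learn terms, b_0 corresponds to `reg.intercept_` and (b_1, b_2) to `reg.coef_`, in the column order of the feature matrix.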

Good luck!

## Import the relevant libraries

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from sklearn.linear_model import LinearRegression

## Load the data

In [4]:
data = pd.read_csv('real_estate_price_size_year.csv')

In [6]:
data.head()

|   | price | size | year |
|---|---:|---:|---:|
| 0 | 234314.144 | 643.09 | 2015 |
| 1 | 228581.528 | 656.22 | 2009 |
| 2 | 281626.336 | 487.29 | 2018 |
| 3 | 401255.608 | 1504.75 | 2015 |
| 4 | 458674.256 | 1275.46 | 2009 |


## Create the regression

### Declare the dependent and the independent variables

In [7]:
x = data[['size','year']]  # independent variables (features)
y = data['price']          # dependent variable (target)

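A quick sanity check after declaring the variables is to confirm that the target is 'price' and that the feature matrix holds 'size' and 'year'. The sketch below rebuilds only the five rows displayed by `data.head()`, so it runs without the CSV; the names `sample`, `x_sample` and `y_sample` are introduced here for illustration.

```python
import pandas as pd

# Only the five rows displayed by data.head(), typed in by hand
sample = pd.DataFrame({
    'price': [234314.144, 228581.528, 281626.336, 401255.608, 458674.256],
    'size': [643.09, 656.22, 487.29, 1504.75, 1275.46],
    'year': [2015, 2009, 2018, 2015, 2009],
})

y_sample = sample['price']           # dependent variable: one column (a Series)
x_sample = sample[['size', 'year']]  # independent variables: one column per feature

print(y_sample.name)           # price
print(list(x_sample.columns))  # ['size', 'year']
print(x_sample.shape)          # (5, 2)
```
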
### Regression

In [9]:
reg = LinearRegression()
reg.fit(x,y)

LinearRegression()
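
`fit` estimates the coefficients by ordinary least squares. As a sketch on hypothetical synthetic data (an assumption, since the course CSV is not reproduced here), the same numbers can be obtained by solving the least-squares problem directly with `numpy.linalg.lstsq` after prepending a column of ones for the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data (assumption): 100 apartments with size and year
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(500, 1500, 100), rng.integers(2006, 2019, 100)])
y = 250 * X[:, 0] + 3_000 * X[:, 1] + rng.normal(0, 15_000, 100)

# Ordinary least squares by hand: a column of ones carries the intercept
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

reg = LinearRegression().fit(X, y)
print(beta[0], reg.intercept_)  # intercept from both methods
print(beta[1:], reg.coef_)      # slopes for size and year from both methods
```
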

### Find the intercept

In [10]:
reg.intercept_

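With an intercept in the model, the OLS fit passes through the point of means, so the intercept equals the mean price minus the coefficients times the feature means. The intercept is also the predicted price at size 0 and year 0, far outside the data, which is why it is rarely interpreted on its own here. A sketch on hypothetical synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data (assumption), not the course CSV
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(500, 1500, 100), rng.integers(2006, 2019, 100)])
y = 300 * X[:, 0] + 2_000 * X[:, 1] + rng.normal(0, 10_000, 100)

reg = LinearRegression().fit(X, y)

# OLS with an intercept passes through (mean of X, mean of y)
intercept_from_means = y.mean() - reg.coef_ @ X.mean(axis=0)
print(reg.intercept_, intercept_from_means)

# The intercept is the prediction when every feature is zero
print(reg.predict([[0, 0]])[0])
```
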
### Find the coefficients

In [11]:
reg.coef_

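`reg.coef_` is a bare array whose order follows the columns of `x`. Pairing it with the column names makes it easier to read; a sketch on hypothetical synthetic data that reuses the exercise's column names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data (assumption) with the exercise's column names
rng = np.random.default_rng(2)
x = pd.DataFrame({'size': rng.uniform(500, 1500, 100),
                  'year': rng.integers(2006, 2019, 100)})
y = 250 * x['size'] + 3_000 * x['year'] + rng.normal(0, 15_000, 100)

reg = LinearRegression().fit(x, y)

# coef_ follows the column order of x, so label it with the column names
coefficients = pd.Series(reg.coef_, index=x.columns)
print(coefficients)
```

Each coefficient is the change in predicted price for a one-unit increase in that feature while the other feature is held fixed.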
### Calculate the R-squared

In [12]:
reg.score(x,y)

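`reg.score(x, y)` returns the coefficient of determination, R² = 1 - SS_res / SS_tot. A sketch computing it by hand on hypothetical synthetic data and checking it against `score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data (assumption), not the course CSV
rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(500, 1500, 100), rng.integers(2006, 2019, 100)])
y = 250 * X[:, 0] + 3_000 * X[:, 1] + rng.normal(0, 15_000, 100)

reg = LinearRegression().fit(X, y)
y_hat = reg.predict(X)

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot
print(r2_manual, reg.score(X, y))
```
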
### Calculate the Adjusted R-squared

In [18]:
x.shape

(100, 2)

In [34]:
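def adj_r2(model, x, y):
    # R-squared of the fitted model on the same data
    r2 = model.score(x, y)
    n = x.shape[0]  # number of observations
    p = x.shape[1]  # number of predictors
    # Adjusted R-squared penalizes R-squared for the number of predictors
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return adjusted_r2

In [36]:
adj_r2(reg, x, y)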

### Compare the R-squared and the Adjusted R-squared

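Answer: The Adjusted R-squared is slightly lower than the R-squared. Adjusted R-squared penalizes the model for every additional predictor, so it can never exceed R-squared; with n = 100 observations and p = 2 predictors the penalty factor (n - 1)/(n - p - 1) = 99/97 is small, so the two values stay close.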

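Adding a predictor can never lower the in-sample R-squared of an OLS fit, even when that predictor is pure noise; the Adjusted R-squared is what charges for it. A sketch on hypothetical synthetic data, where the `adjusted` helper is illustrative and uses the same formula as `adj_r2`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted(r2, n, p):
    # Adjusted R-squared for n observations and p predictors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical synthetic data (assumption): price depends on size only
rng = np.random.default_rng(4)
n = 100
size = rng.uniform(500, 1500, n)
price = 300 * size + rng.normal(0, 40_000, n)
noise = rng.normal(size=n)  # an extra feature with no relation to price

X_base = size.reshape(-1, 1)
X_extra = np.column_stack([size, noise])

r2_base = LinearRegression().fit(X_base, price).score(X_base, price)
r2_extra = LinearRegression().fit(X_extra, price).score(X_extra, price)

# R-squared cannot drop when a feature is added; adjusted R-squared charges for it
print(f'size only:    R2={r2_base:.4f}  adj={adjusted(r2_base, n, 1):.4f}')
print(f'size + noise: R2={r2_extra:.4f}  adj={adjusted(r2_extra, n, 2):.4f}')
```
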
### Compare the Adjusted R-squared with the R-squared of the simple linear regression

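Answer: The Adjusted R-squared is the fair basis for this comparison, because it already accounts for the extra predictor. Relative to the simple regression that used only 'size', adding 'year' does not bring much additional explanatory power.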

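The comparison in this step can be sketched as fitting both models on the same data and comparing their Adjusted R-squared values. The data below is hypothetical and synthetic (an assumption), with a real 'year' effect built in, so in this sketch the extra variable does pay for itself; with the exercise's data the outcome has to be read from the actual numbers. The helper `adjusted_r2` is illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def adjusted_r2(model, x, y):
    # Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    n, p = x.shape
    return 1 - (1 - model.score(x, y)) * (n - 1) / (n - p - 1)

# Hypothetical synthetic data (assumption), not the course CSV
rng = np.random.default_rng(5)
data = pd.DataFrame({'size': rng.uniform(500, 1500, 100),
                     'year': rng.integers(2006, 2019, 100)})
data['price'] = (200 * data['size'] + 5_000 * (data['year'] - 2006)
                 + rng.normal(0, 10_000, 100))

y = data['price']
x_simple = data[['size']]         # simple regression: size only
x_multi = data[['size', 'year']]  # multiple regression: size and year

simple = LinearRegression().fit(x_simple, y)
multi = LinearRegression().fit(x_multi, y)

adj_simple = adjusted_r2(simple, x_simple, y)
adj_multi = adjusted_r2(multi, x_multi, y)
print(f'simple: {adj_simple:.3f}  multiple: {adj_multi:.3f}')
```
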
### Making predictions

Find the predicted price of an apartment that has a size of 750 sq.ft. from 2009.

In [23]:
reg.predict([[750,2009]])

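`predict` matches values to features by position, so `[[750, 2009]]` has to follow the column order used in `fit` ('size', then 'year'). The prediction itself is just the intercept plus the coefficients times these values. A sketch on hypothetical synthetic data; passing a DataFrame with the same column names as in `fit` also avoids scikit-learn's feature-name warning when the model was fitted on a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data (assumption), not the course CSV
rng = np.random.default_rng(6)
x = pd.DataFrame({'size': rng.uniform(500, 1500, 100),
                  'year': rng.integers(2006, 2019, 100)})
y = 250 * x['size'] + 3_000 * x['year'] + rng.normal(0, 15_000, 100)
reg = LinearRegression().fit(x, y)

# New observation with the same column names and order as in fit
new_apartment = pd.DataFrame({'size': [750], 'year': [2009]})
predicted = reg.predict(new_apartment)[0]

# Same number by hand: intercept + b_size * 750 + b_year * 2009
by_hand = reg.intercept_ + reg.coef_ @ np.array([750, 2009])
print(predicted, by_hand)
```
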
### Calculate the univariate p-values of the variables

In [24]:
from sklearn.feature_selection import f_regression

In [25]:
f_regression(x,y)

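`f_regression` tests each feature on its own: for every column it computes the Pearson correlation r with the target, the statistic F = r^2 / (1 - r^2) * (n - 2), and a p-value from an F(1, n - 2) distribution. The univariate p-values therefore ignore the other feature. A sketch reproducing the output with `numpy` and `scipy.stats` on hypothetical synthetic data:

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression

# Hypothetical synthetic data (assumption), not the course CSV
rng = np.random.default_rng(7)
n = 100
X = np.column_stack([rng.uniform(500, 1500, n), rng.integers(2006, 2019, n)])
y = 250 * X[:, 0] + 3_000 * X[:, 1] + rng.normal(0, 15_000, n)

f_sk, p_sk = f_regression(X, y)

# The same statistics, one column at a time
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
f_manual = r**2 / (1 - r**2) * (n - 2)
p_manual = stats.f.sf(f_manual, 1, n - 2)

print(f_sk, f_manual)
print(p_sk, p_manual)
```
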
In [29]:
p_values = f_regression(x,y)[1]
p_values

In [32]:
p_values.round(3)

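For the multivariate version mentioned in the task, each coefficient is tested while the other variable stays in the model: t_j = b_j / se(b_j), where the standard errors come from the diagonal of sigma^2 (X^T X)^(-1) and the test uses n - p - 1 degrees of freedom. scikit-learn's `LinearRegression` does not report these, so the sketch below computes them with `numpy` and `scipy.stats` on hypothetical synthetic data (an assumption; the exercise's CSV is not reproduced here).

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data (assumption), not the course CSV
rng = np.random.default_rng(8)
n = 100
X = np.column_stack([rng.uniform(500, 1500, n), rng.integers(2006, 2019, n)])
y = 250 * X[:, 0] + 3_000 * X[:, 1] + rng.normal(0, 15_000, n)

reg = LinearRegression().fit(X, y)
p = X.shape[1]

# Design matrix with a column of ones for the intercept
X1 = np.column_stack([np.ones(n), X])
beta = np.concatenate([[reg.intercept_], reg.coef_])

residuals = y - reg.predict(X)
dof = n - p - 1
sigma2 = residuals @ residuals / dof           # residual variance estimate
cov_beta = sigma2 * np.linalg.inv(X1.T @ X1)   # covariance of the estimates
se = np.sqrt(np.diag(cov_beta))

t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)  # two-sided p-values
print(p_values[1:].round(3))                     # size, year
```
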
### Create a summary table with your findings

In [33]:
reg_summary = pd.DataFrame(data=x.columns.values, columns=['Features'])
reg_summary['Coefficients'] = reg.coef_
reg_summary['p-values'] = p_values.round(3)
reg_summary

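Answer: The univariate p-value of 'year' is about 0.357, well above the usual 0.05 threshold, so on its own 'year' is not a significant predictor of 'price'. Since `f_regression` tests each feature separately, this does not show whether 'year' still matters once 'size' is in the model; the multivariate p-values are needed for that.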