# Multiple Linear Regression with sklearn on real estate data

We are given a real estate dataset in the file 'real_estate_price_size_year1.csv'.

Create a multiple linear regression and complete the following:

-  Find the intercept and coefficient(s).
-  Calculate the R-squared and the Adjusted R-squared.
-  Compare the R-squared and the Adjusted R-squared.
-  Use the model to predict the price of an apartment with a size of 750 sq.ft. from 2009.
-  Find the univariate (or, if you wish, multivariate; see the article) p-values of the two variables.
-  Create a summary table with your findings.

In this exercise, the dependent variable is 'price', while the independent variables are 'size' and 'year'.

## Importing the relevant libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.linear_model import LinearRegression


## Load the data

In [2]:
data = pd.read_csv('real_estate_price_size_year1.csv')
data

|   | price | size | year |
|---|---|---|---|
| 0 | 234314.144 | 643.09 | 2015 |
| 1 | 228581.528 | 656.22 | 2009 |
| 2 | 281626.336 | 487.29 | 2018 |
| 3 | 401255.608 | 1504.75 | 2015 |
| 4 | 458674.256 | 1275.46 | 2009 |
| 5 | 245050.280 | 575.19 | 2006 |
| 6 | 265129.064 | 570.89 | 2015 |
| 7 | 175716.480 | 620.82 | 2006 |
| 8 | 331101.344 | 682.26 | 2018 |
| 9 | 218630.608 | 694.52 | 2009 |


In [3]:
data.describe()

|       | price | size | year |
|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 |
| mean | 292289.47016 | 853.0242 | 2012.6 |
| std | 77051.727525 | 297.941951 | 4.729021 |
| min | 154282.128 | 479.75 | 2006.0 |
| 25% | 234280.148 | 643.33 | 2009.0 |
| 50% | 280590.716 | 696.405 | 2015.0 |
| 75% | 335723.696 | 1029.3225 | 2018.0 |
| max | 500681.128 | 1842.51 | 2018.0 |


In [4]:
data.head()

|   | price | size | year |
|---|---|---|---|
| 0 | 234314.144 | 643.09 | 2015 |
| 1 | 228581.528 | 656.22 | 2009 |
| 2 | 281626.336 | 487.29 | 2018 |
| 3 | 401255.608 | 1504.75 | 2015 |
| 4 | 458674.256 | 1275.46 | 2009 |


## Creating the regression

### Declaring the dependent and the independent variables

In [5]:
x = data[["size","year"]]
y = data["price"]

### Regression

In [6]:
reg = LinearRegression()

In [7]:
reg.fit(x,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)
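`LinearRegression` fits ordinary least squares. As a sanity check, the minimal sketch below solves the same least-squares problem directly with `np.linalg.lstsq` on a design matrix that has a column of ones for the intercept, and compares the result with sklearn. It uses synthetic data shaped loosely like the real estate file (an assumption; the CSV is not loaded here), and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real estate data (assumption: not the actual CSV)
rng = np.random.default_rng(0)
n = 100
size = rng.uniform(480, 1850, n)
year = rng.choice([2006, 2009, 2015, 2018], n)
price = -5.77e6 + 227.7 * size + 2916.8 * year + rng.normal(0, 35000, n)

X = np.column_stack([size, year])

# Prepend a column of ones so the first solved coefficient is the intercept
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, price, rcond=None)

reg = LinearRegression().fit(X, price)
print("lstsq intercept:", beta[0], " sklearn intercept:", reg.intercept_)
print("lstsq coefs:", beta[1:], " sklearn coefs:", reg.coef_)
```

Both routes should agree up to floating-point error, since sklearn also solves a least-squares problem (after centering the data).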

### Finding the intercept

In [8]:
reg.intercept_

-5772267.017463278

### Finding the coefficients

In [9]:
reg.coef_

array([ 227.70085401, 2916.78532684])

### Calculate the R-squared

In [10]:
reg.score(x,y)

0.7764803683276793
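For a regressor, `reg.score(x, y)` returns the R-squared, i.e. one minus the ratio of the residual sum of squares to the total sum of squares. The short sketch below recomputes it by hand on synthetic data (an assumption, not the exercise file) to show what the number means.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): two features roughly like size and year
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([rng.uniform(480, 1850, n), rng.choice([2006, 2009, 2015, 2018], n)])
y = 100000 + 220 * X[:, 0] + 3000 * (X[:, 1] - 2012) + rng.normal(0, 35000, n)

reg = LinearRegression().fit(X, y)
y_hat = reg.predict(X)

ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, reg.score(X, y))
```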

### Calculate the Adjusted R-squared

In [11]:
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
r2 = reg.score(x,y)
n = x.shape[0]   # number of observations
p = x.shape[1]   # number of predictors
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
adjusted_r2

0.7718717161282502
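The Adjusted R-squared is $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors (the intercept is not counted). Operator precedence is an easy trap: writing `1-(1-r2)*n-1/(n-p-1)` without parentheses around `n - 1` evaluates to about -21.36 for these inputs, which is not a valid Adjusted R-squared. The helper below (the name `adjusted_r2` is illustrative) uses the R-squared, `n` and `p` from this notebook.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2, n, p = 0.7764803683276793, 100, 2   # values from this notebook

adj = adjusted_r2(r2, n, p)
buggy = 1 - (1 - r2) * n - 1 / (n - p - 1)   # missing parentheses: wrong precedence

print(adj)
print(buggy)
```

Because the penalty factor $(n-1)/(n-p-1)$ is greater than 1 whenever $p \ge 1$, the Adjusted R-squared is always below the R-squared for an imperfect fit.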

### Compare the R-squared and the Adjusted R-squared

Answer...

### Compare the Adjusted R-squared with the R-squared of the simple linear regression

Answer...
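One way to make this comparison concrete is to fit the simple regression (size only) and the multiple regression (size and year) on the same data and report R-squared and Adjusted R-squared for both. The sketch below does this on synthetic data (an assumption; with the real file you would use `data[["size"]]` and `data[["size","year"]]`). If the multiple model's Adjusted R-squared is higher than the simple model's, the extra variable adds more explanatory power than the penalty it costs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic data (assumption): price depends on size and, more weakly, on year
rng = np.random.default_rng(2)
n = 100
size = rng.uniform(480, 1850, n)
year = rng.choice([2006, 2009, 2015, 2018], n).astype(float)
price = 100000 + 220 * size + 3000 * (year - 2012) + rng.normal(0, 35000, n)

X_simple = size.reshape(-1, 1)            # simple regression: size only
X_multi = np.column_stack([size, year])   # multiple regression: size and year

r2_simple = LinearRegression().fit(X_simple, price).score(X_simple, price)
r2_multi = LinearRegression().fit(X_multi, price).score(X_multi, price)

adj_simple = adjusted_r2(r2_simple, n, 1)
adj_multi = adjusted_r2(r2_multi, n, 2)
print(f"simple:   R2={r2_simple:.4f}  adj={adj_simple:.4f}")
print(f"multiple: R2={r2_multi:.4f}  adj={adj_multi:.4f}")
```

Note that the in-sample R-squared can never drop when a feature is added, which is exactly why the Adjusted R-squared is the fairer basis for comparison.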

### Making predictions

Find the predicted price of an apartment that has a size of 750 sq.ft. from 2009.

In [12]:
reg.predict(pd.DataFrame({"size": [750], "year": [2009]}))

array([258330.34465995])
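A prediction from a linear model is just the intercept plus the dot product of the coefficients with the new feature vector. The sketch below reproduces the prediction above from the printed intercept and coefficients; because those printed values are rounded, the result matches `reg.predict` only to a few decimal places.

```python
import numpy as np

# Intercept and coefficients copied from the notebook output above
intercept = -5772267.017463278
coef = np.array([227.70085401, 2916.78532684])   # order: [size, year]

x_new = np.array([750, 2009])                    # 750 sq.ft., year 2009
price_hat = intercept + coef @ x_new
print(price_hat)
```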

### Calculate the univariate p-values of the variables

In [13]:
from sklearn.feature_selection import f_regression

In [14]:
f_regression(x,y)

(array([285.92105192,   0.85525799]), array([8.12763222e-31, 3.57340758e-01]))

In [15]:
p_value = f_regression(x,y)[1]
p_value

array([8.12763222e-31, 3.57340758e-01])

In [16]:
p_value.round(3)

array([0.   , 0.357])
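`f_regression` tests each feature on its own: for every column it computes the correlation $r$ with the target, converts it to an F-statistic $F = \frac{r^2}{1 - r^2}(n - 2)$, and takes the upper-tail probability of an $F(1, n-2)$ distribution. The sketch below recomputes this by hand with SciPy on synthetic data (an assumption, not the exercise file) and checks it against sklearn. Because each test ignores the other variable, these p-values are univariate.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression

# Synthetic data (assumption): only the first column actually drives y
rng = np.random.default_rng(3)
n = 100
X = np.column_stack([rng.uniform(480, 1850, n), rng.choice([2006, 2009, 2015, 2018], n)])
y = 100000 + 220 * X[:, 0] + rng.normal(0, 35000, n)

F_sk, p_sk = f_regression(X, y)

# One-feature test per column: correlation -> F-statistic -> p-value
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
F_manual = r**2 / (1 - r**2) * (n - 2)
p_manual = stats.f.sf(F_manual, 1, n - 2)   # upper tail of F(1, n - 2)

print(F_manual, p_manual)
```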

### Create a summary table with your findings

In [17]:
# One row per feature: its name, fitted coefficient and univariate p-value
new_table = pd.DataFrame(data=x.columns.values, columns=["features"])
new_table["coefficient"] = reg.coef_
new_table["p_value"] = p_value.round(3)
new_table

|   | features | coefficient | p_value |
|---|---|---|---|
| 0 | size | 227.700854 | 0.0 |
| 1 | year | 2916.785327 | 0.357 |


Answer...
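The task also allows multivariate p-values, which test each coefficient while the other variable stays in the model, so they can differ noticeably from the univariate ones in the table. A library such as statsmodels reports them in its OLS results (`pvalues`); the minimal sketch below computes the same classical t-test p-values directly with NumPy and SciPy, assuming homoskedastic, normally distributed errors. The function name `ols_pvalues` and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """OLS coefficients, standard errors and two-sided p-values (intercept first)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - p - 1
    sigma2 = resid @ resid / dof                # unbiased estimate of the error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)       # covariance matrix of the estimates
    se = np.sqrt(np.diag(cov))
    t = beta / se
    pvals = 2 * stats.t.sf(np.abs(t), dof)      # two-sided t-test for each coefficient
    return beta, se, pvals

# Synthetic data (assumption): price depends on size and year
rng = np.random.default_rng(4)
n = 100
size = rng.uniform(480, 1850, n)
year = rng.choice([2006, 2009, 2015, 2018], n).astype(float)
price = 100000 + 220 * size + 3000 * (year - 2012) + rng.normal(0, 35000, n)

beta, se, pvals = ols_pvalues(np.column_stack([size, year]), price)
for name, b, pv in zip(["const", "size", "year"], beta, pvals):
    print(f"{name:>5}: coef={b:12.3f}  p={pv:.3g}")
```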