# Multiple Linear Regression with Dummies - Exercise

You are given a real estate dataset. 

Real estate is an example that almost every regression course covers, because it is easy to understand and there is almost always a causal relationship to be found.

The data is located in the file: 'real_estate_price_size_year_view.csv'. 

You are expected to create a multiple linear regression (similar to the one in the lecture), using the new data. 

In this exercise, the dependent variable is 'price', while the independent variables are 'size', 'year', and 'view'.

#### Regarding the 'view' variable:
There are two options: 'Sea view' and 'No sea view'. You are expected to create a dummy variable for 'view' and include it in the regression.

Good luck!

## Import the relevant libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn
seaborn.set()

## Load the data

In [2]:
data = pd.read_csv('real_estate_price_size_year_view.csv')

In [3]:
data

,price,size,year,view
0,234314.144,643.09,2015,No sea view
1,228581.528,656.22,2009,No sea view
2,281626.336,487.29,2018,Sea view
3,401255.608,1504.75,2015,No sea view
4,458674.256,1275.46,2009,Sea view
...,...,...,...,...
95,252460.400,549.80,2009,Sea view
96,310522.592,1037.44,2009,No sea view
97,383635.568,1504.75,2006,No sea view
98,225145.248,648.29,2015,No sea view


In [4]:
data.describe()

,price,size,year
count,100.0,100.0,100.0
mean,292289.47016,853.0242,2012.6
std,77051.727525,297.941951,4.729021
min,154282.128,479.75,2006.0
25%,234280.148,643.33,2009.0
50%,280590.716,696.405,2015.0
75%,335723.696,1029.3225,2018.0
max,500681.128,1842.51,2018.0


## Create a dummy variable for 'view'

In [5]:
data['view'] = data['view'].map({'No sea view' : 0, 'Sea view' : 1})

In [6]:
data

,price,size,year,view
0,234314.144,643.09,2015,0
1,228581.528,656.22,2009,0
2,281626.336,487.29,2018,1
3,401255.608,1504.75,2015,0
4,458674.256,1275.46,2009,1
...,...,...,...,...
95,252460.400,549.80,2009,1
96,310522.592,1037.44,2009,0
97,383635.568,1504.75,2006,0
98,225145.248,648.29,2015,0
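
One thing worth checking after a `.map()` call like the one above is that every label was actually covered by the dictionary, since `Series.map` returns `NaN` for any value it does not find. The snippet below is a minimal sketch on a hypothetical toy frame (`toy` and `view_dummy` are illustrative names, not part of the exercise data):

```python
import pandas as pd

# Hypothetical toy frame standing in for the 'view' column of the exercise data.
toy = pd.DataFrame({'view': ['No sea view', 'Sea view', 'Sea view', 'No sea view']})

# Same mapping as in the exercise: 0 = no sea view, 1 = sea view.
mapping = {'No sea view': 0, 'Sea view': 1}
toy['view_dummy'] = toy['view'].map(mapping)

# .map() yields NaN for labels missing from the mapping (typos, extra spaces),
# so count them before the column goes into a regression.
unmapped = int(toy['view_dummy'].isna().sum())
assert unmapped == 0, f"{unmapped} rows had an unexpected 'view' label"

toy['view_dummy'] = toy['view_dummy'].astype(int)
print(toy)
```

With only two categories, a single 0/1 column is enough; adding a second column for the other category would be perfectly collinear with the constant.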


## Create the regression

### Declare the dependent and the independent variables

In [7]:
y = data['price']
x1 = data[['size', 'year', 'view']]

### Regression

In [8]:
x = sm.add_constant(x1)
result = sm.OLS(y, x).fit()
result.summary()

Dep. Variable:,price,R-squared:,0.913
Model:,OLS,Adj. R-squared:,0.91
Method:,Least Squares,F-statistic:,335.2
Date:,"Fri, 28 Oct 2022",Prob (F-statistic):,1.02e-50
Time:,20:24:05,Log-Likelihood:,-1144.6
No. Observations:,100,AIC:,2297.0
Df Residuals:,96,BIC:,2308.0
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-5.398e+06,9.94e+05,-5.431,0.000,-7.37e+06,-3.43e+06
size,223.0316,7.838,28.455,0.000,207.473,238.590
year,2718.9489,493.502,5.510,0.000,1739.356,3698.542
view,5.673e+04,4627.695,12.258,0.000,4.75e+04,6.59e+04

Omnibus:,29.224,Durbin-Watson:,1.965
Prob(Omnibus):,0.0,Jarque-Bera (JB):,64.957
Skew:,1.088,Prob(JB):,7.85e-15
Kurtosis:,6.295,Cond. No.,942000.0
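
Reading the coefficients off the table above (rounded as printed there), the fitted model can be written out as follows. This is only a restatement of the summary, not an additional result:

$$
\widehat{\text{price}} \approx -5.398 \times 10^{6} + 223.0316 \cdot \text{size} + 2718.9489 \cdot \text{year} + 56730 \cdot \text{view}
$$

Because 'view' takes only the values 0 and 1, its coefficient acts as a shift of the intercept: for two properties with the same size and year, the model estimates the one with a sea view to be about 56,730 more expensive.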


## Make Predictions

I will predict the price of two properties with the following characteristics:
<ul>
<li> size = 300; year = 2021; view = 1 </li>
<li> size = 800; year = 2019; view = 0 </li>
</ul>

In [9]:
x

,const,size,year,view
0,1.0,643.09,2015,0
1,1.0,656.22,2009,0
2,1.0,487.29,2018,1
3,1.0,1504.75,2015,0
4,1.0,1275.46,2009,1
...,...,...,...,...
95,1.0,549.80,2009,1
96,1.0,1037.44,2009,0
97,1.0,1504.75,2006,0
98,1.0,648.29,2015,0


In [10]:
new_data = pd.DataFrame({'const' : 1, 'size' : [300, 800], 'year' : [2021, 2019], 'view' : [1, 0]})
new_data

,const,size,year,view
0,1,300,2021,1
1,1,800,2019,0


In [11]:
new_data.index = ['New small home Sea View', 'Old large home Hill View']
new_data

,const,size,year,view
New small home Sea View,1,300,2021,1
Old large home Hill View,1,800,2019,0


In [12]:
predictions = result.predict(new_data)

In [13]:
predictions

New small home Sea View     220717.028005
Old large home Hill View    270068.920150
dtype: float64
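
As a sanity check, these predictions can be roughly reproduced by hand from the coefficients printed in the regression summary. The sketch below hard-codes those rounded values, so it is an approximation: the fitted model stores the coefficients at full precision, and small differences from the output above are expected. `predict_price` is a hypothetical helper written for illustration, not a statsmodels function.

```python
# Coefficients as printed (rounded) in the regression summary.
coef = {'const': -5.398e6, 'size': 223.0316, 'year': 2718.9489, 'view': 5.673e4}

def predict_price(size, year, view):
    """Hypothetical helper: evaluate the fitted linear equation by hand."""
    return (coef['const']
            + coef['size'] * size
            + coef['year'] * year
            + coef['view'] * view)

# The same two properties used in the exercise.
small_sea = predict_price(size=300, year=2021, view=1)
large_no_sea = predict_price(size=800, year=2019, view=0)

print(round(small_sea), round(large_no_sea))
```

Both values should land close to the statsmodels predictions; the remaining gap comes mainly from the intercept being printed with only four significant digits.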

In [14]:
predictions = pd.DataFrame({'Predictions':predictions})
new_data = new_data.join(predictions)
new_data

,const,size,year,view,Predictions
New small home Sea View,1,300,2021,1,220717.028005
Old large home Hill View,1,800,2019,0,270068.92015
