# Multiple Linear Regression with Dummies - Exercise

You are given a real estate dataset. 

Real estate is an example that nearly every regression course covers: it is easy to understand, and there is almost always a clear causal relationship to be found.

The data is located in the file: 'real_estate_price_size_year_view.csv'. 

You are expected to create a multiple linear regression (similar to the one in the lecture), using the new data. 

In this exercise, the dependent variable is 'price', while the independent variables are 'size', 'year', and 'view'.

#### Regarding the 'view' variable:
There are two options: 'Sea view' and 'No sea view'. You are expected to create a dummy variable for 'view' and include it in the regression.
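
As a sketch of the model this exercise asks for, the regression can be written as follows. The symbols $\beta_0, \dots, \beta_3$ and $\varepsilon_i$ are notation introduced here for illustration, and coding 'Sea view' as 1 (with 'No sea view' as the baseline) is an assumption; the opposite coding would only flip the sign of $\beta_3$.

$$
\text{price}_i = \beta_0 + \beta_1\,\text{size}_i + \beta_2\,\text{year}_i + \beta_3\,\text{view}_i + \varepsilon_i,
\qquad
\text{view}_i =
\begin{cases}
1 & \text{if 'Sea view'} \\
0 & \text{if 'No sea view'}
\end{cases}
$$

Under this coding, $\beta_3$ is the expected price difference between a property with a sea view and one without, holding size and year fixed.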

Good luck!

## Import the relevant libraries

In [12]:
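import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn
# Detailed_Details is assumed to be a local helper module; it is not part of the libraries above
from Detailed_Details import Detailed_Details

seaborn.set()  # apply seaborn's default plot styling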

## Load the data

In [2]:
# The CSV is read from a datasets/ subfolder here; adjust the path if your copy is elsewhere
df = pd.read_csv('datasets/real_estate_price_size_year_view.csv')

In [4]:
df.sample(5)

Unnamed: 0,price,size,year,view
38,292965.216,685.48,2018,Sea view
17,234178.16,623.94,2006,Sea view
51,393069.76,1021.95,2015,Sea view
0,234314.144,643.09,2015,No sea view
19,299416.976,1027.76,2018,No sea view


In [7]:
Detailed_Details(df,'view','price',2)

0,1,2,3,4,5,6
view,Total No. (view),Percentage (view),Total No. (Greater Than Mean),Percentage (Greater Than Mean),Total No. (Less Than Mean),Percentage (Less Than Mean)
No sea view,51,51.0 %,16,31.37 %,35,68.63 %
Sea view,49,49.0 %,29,59.18 %,20,40.82 %
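
`Detailed_Details` is not defined in this notebook, so its exact logic is unknown. The sketch below is a hypothetical pandas approximation of a similar summary, assuming that "Greater Than Mean" compares each price with the overall mean price. The toy prices are invented for illustration and are not rows from the exercise data.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'view': ['Sea view', 'Sea view', 'No sea view', 'No sea view'],
    'price': [300.0, 100.0, 160.0, 40.0],
})

overall_mean = toy['price'].mean()

# Flag each row as above or below the overall mean price
above = toy['price'] > overall_mean
below = toy['price'] < overall_mean

# Count rows per category, then count how many fall on each side of the mean
summary = toy.groupby('view').agg(total=('price', 'size'))
summary['above_mean'] = above.groupby(toy['view']).sum()
summary['below_mean'] = below.groupby(toy['view']).sum()
summary['pct_above'] = (100 * summary['above_mean'] / summary['total']).round(2)

print(summary)
```

Rows exactly equal to the mean would fall in neither group in this sketch; how the original helper treats ties is not known.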


## Create a dummy variable for 'view'

In [8]:
# Encode 'view' as a 0/1 dummy; 'No sea view' is the baseline (0)
m = {'No sea view': 0, 'Sea view': 1}
df['view'] = df['view'].map(m)
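
One caveat with the mapping approach: `Series.map` returns `NaN` for any label that is missing from the dictionary, so a quick check can be worthwhile. The following is a minimal sketch on invented toy data (the `toy` frame and the `view_dummy` column are hypothetical names), and it also shows `pd.get_dummies` as one possible alternative.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({'view': ['Sea view', 'No sea view', 'Sea view']})

mapping = {'No sea view': 0, 'Sea view': 1}
toy['view_dummy'] = toy['view'].map(mapping)

# map() yields NaN for labels not present in the mapping (e.g. typos),
# so count them before fitting a model
unmapped = int(toy['view_dummy'].isna().sum())

# Alternative: get_dummies with drop_first=True keeps a single 0/1 column
# and drops the alphabetically first category ('No sea view') as the baseline
dummies = pd.get_dummies(toy['view'], drop_first=True, dtype=int)

print(toy)
print(dummies)
```

Both approaches give the same 0/1 column here; the explicit dictionary makes the choice of baseline visible, while `get_dummies` scales more easily to variables with many categories.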

In [9]:
df.sample(5)

Unnamed: 0,price,size,year,view
88,211904.536,601.66,2018,0
77,365863.936,1334.1,2006,0
72,298926.496,656.22,2015,1
75,286161.6,685.48,2018,1
78,251560.04,682.26,2009,1


In [10]:
Detailed_Details(df,'view','price',2)

0,1,2,3,4,5,6
view,Total No. (view),Percentage (view),Total No. (Greater Than Mean),Percentage (Greater Than Mean),Total No. (Less Than Mean),Percentage (Less Than Mean)
0,51,51.0 %,16,31.37 %,35,68.63 %
1,49,49.0 %,29,59.18 %,20,40.82 %


## Create the regression

### Declare the dependent and the independent variables

In [11]:
y = df['price']                      # dependent variable
x1 = df[['size', 'year', 'view']]    # independent variables ('view' is the 0/1 dummy)

### Regression

In [15]:
x = sm.add_constant(x1)          # add a column of ones so the model has an intercept
results = sm.OLS(y, x).fit()     # fit by ordinary least squares
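
To see what these two steps amount to, here is a minimal NumPy sketch of the same idea: prepend a column of ones for the intercept, then solve the least-squares problem. It is not how statsmodels is implemented internally, and the data and "true" coefficients are invented and noise-free so that the fit recovers them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Invented predictors: a continuous size and a 0/1 view dummy
size = rng.uniform(500, 1500, n)
view = rng.integers(0, 2, n)

# Noise-free toy prices built from known coefficients
price = 10000.0 + 200.0 * size + 50000.0 * view

# The column of ones plays the role of the constant added by add_constant
X = np.column_stack([np.ones(n), size, view])

# Least-squares solution: [intercept, size coefficient, view coefficient]
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

print(beta)
```

Without the column of ones, the fitted plane would be forced through the origin, which is rarely appropriate for price data.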

In [16]:
results.summary()

Dep. Variable:,price,R-squared:,0.913
Model:,OLS,Adj. R-squared:,0.91
Method:,Least Squares,F-statistic:,335.2
Date:,"Wed, 14 Dec 2022",Prob (F-statistic):,1.02e-50
Time:,21:09:11,Log-Likelihood:,-1144.6
No. Observations:,100,AIC:,2297.0
Df Residuals:,96,BIC:,2308.0
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-5.398e+06,9.94e+05,-5.431,0.000,-7.37e+06,-3.43e+06
size,223.0316,7.838,28.455,0.000,207.473,238.590
year,2718.9489,493.502,5.510,0.000,1739.356,3698.542
view,5.673e+04,4627.695,12.258,0.000,4.75e+04,6.59e+04

Omnibus:,29.224,Durbin-Watson:,1.965
Prob(Omnibus):,0.0,Jarque-Bera (JB):,64.957
Skew:,1.088,Prob(JB):,7.85e-15
Kurtosis:,6.295,Cond. No.,942000.0
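
Reading the coefficient table as an equation may help with interpretation. The values below are taken from the summary as displayed, so they are rounded (the constant in particular is shown to only four significant figures), and predictions computed from them by hand are approximate.

$$
\widehat{\text{price}} \approx -5.398 \times 10^{6} + 223.03\,\text{size} + 2718.95\,\text{year} + 56730\,\text{view}
$$

Because 'view' takes only the values 0 and 1, this describes two parallel fitted surfaces:

$$
\begin{aligned}
\text{view} = 0:&\quad \widehat{\text{price}} \approx -5.398 \times 10^{6} + 223.03\,\text{size} + 2718.95\,\text{year} \\
\text{view} = 1:&\quad \widehat{\text{price}} \approx \left(-5.398 \times 10^{6} + 56730\right) + 223.03\,\text{size} + 2718.95\,\text{year}
\end{aligned}
$$

In other words, for the same size and year, the model estimates that a property with a sea view is priced about 56,730 higher than one without.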
