# Multiple Linear Regression - Exercise Solution

You are given a real estate dataset. 

Real estate is a staple example in regression courses: it is easy to understand, and there is almost always a clear causal relationship to be found.

The data is located in the file: 'real_estate_price_size_year.csv'. 

You are expected to create a multiple linear regression (similar to the one in the lecture), using the new data. 

In this exercise, the dependent variable is 'price', while the independent variables are 'size' and 'year'.

Good luck!

## Import the relevant libraries

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
sns.set()

## Load the data

In [21]:
data = pd.read_csv('real_estate_price_size_year.csv')
# Express 'year' as an offset from 2000 (e.g. 2015 -> 15) to keep the numbers small
data['year'] = data['year'] - 2000

In [22]:
data.head()

| | price | size | year |
|---|---|---|---|
| 0 | 234314.144 | 643.09 | 15 |
| 1 | 228581.528 | 656.22 | 9 |
| 2 | 281626.336 | 487.29 | 18 |
| 3 | 401255.608 | 1504.75 | 15 |
| 4 | 458674.256 | 1275.46 | 9 |
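
The `year` column now holds offsets from 2000 (15 means 2015). Shifting a regressor by a constant changes only the intercept of a linear fit, not the slopes. The sketch below illustrates this on synthetic data; the data-generating numbers and the helper `fit_ols` are made up for illustration and are not taken from the exercise file.

```python
import numpy as np

def fit_ols(features, target):
    """Least-squares fit with an intercept; returns [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(target)), features])
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs

# Synthetic stand-in for the real estate data (assumed values, for illustration only)
rng = np.random.default_rng(0)
size = rng.uniform(480, 1850, 100)
year = rng.choice([2006, 2009, 2015, 2018], 100).astype(float)
price = 60_000 + 230 * size + 2_900 * (year - 2000) + rng.normal(0, 30_000, 100)

# Fit once with calendar years and once with years measured from 2000
raw = fit_ols(np.column_stack([size, year]), price)
shifted = fit_ols(np.column_stack([size, year - 2000]), price)

print("slopes (raw):    ", raw[1:])
print("slopes (shifted):", shifted[1:])
print("intercepts:      ", raw[0], shifted[0])
```

The slopes should agree up to floating-point noise, while the intercepts differ by 2000 times the `year` slope. The shifted version is easier to read because its intercept refers to the year 2000 rather than the year 0.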


In [23]:
data.describe()

| | price | size | year |
|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 |
| mean | 292289.47016 | 853.0242 | 12.6 |
| std | 77051.727525 | 297.941951 | 4.729021 |
| min | 154282.128 | 479.75 | 6.0 |
| 25% | 234280.148 | 643.33 | 9.0 |
| 50% | 280590.716 | 696.405 | 15.0 |
| 75% | 335723.696 | 1029.3225 | 18.0 |
| max | 500681.128 | 1842.51 | 18.0 |


## Create the regression

### Declare the dependent and the independent variables

In [24]:
y = data['price']
x1 = data[['size','year']]

### Regression

In [25]:
x = sm.add_constant(x1)
results = sm.OLS(y,x).fit()
results.summary()

OLS Regression Results

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.776 |
| Model: | OLS | Adj. R-squared: | 0.772 |
| Method: | Least Squares | F-statistic: | 168.5 |
| Date: | Wed, 23 Jan 2019 | Prob (F-statistic): | 2.77e-32 |
| Time: | 15:53:56 | Log-Likelihood: | -1191.7 |
| No. Observations: | 100 | AIC: | 2389.0 |
| Df Residuals: | 97 | BIC: | 2397.0 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 6.13e+04 | 1.57e+04 | 3.913 | 0.000 | 3.02e+04 | 9.24e+04 |
| size | 227.7009 | 12.474 | 18.254 | 0.000 | 202.943 | 252.458 |
| year | 2916.7853 | 785.896 | 3.711 | 0.000 | 1357.000 | 4476.571 |

| | | | |
|---|---|---|---|
| Omnibus: | 10.083 | Durbin-Watson: | 2.25 |
| Prob(Omnibus): | 0.006 | Jarque-Bera (JB): | 3.678 |
| Skew: | 0.095 | Prob(JB): | 0.159 |
| Kurtosis: | 2.08 | Cond. No. | 3850.0 |
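
A few entries in the summary can be cross-checked by hand. The sketch below recomputes the adjusted R-squared and the t-statistics from the printed values, using the standard formulas `adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)` and `t = coef / std err`. The inputs are the rounded figures from the summary, so a difference in the last digit is expected; the variable names are illustrative.

```python
# Values copied from the summary (already rounded by statsmodels)
r_squared = 0.776
n_obs = 100        # No. Observations
n_predictors = 2   # Df Model (size, year)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Each t-statistic is the coefficient divided by its standard error
t_size = 227.7009 / 12.474
t_year = 2916.7853 / 785.896

print(f"adj. R-squared ~ {adj_r_squared:.3f}")
print(f"t(size) ~ {t_size:.3f}, t(year) ~ {t_year:.3f}")
```

Both t-statistics are far from zero, which matches the reported p-values of 0.000 for `size` and `year`.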


In [32]:
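# Two houses to price; 'year' uses the same offset from 2000 as the training data
new_data = pd.DataFrame({'const': 1, 'size': [1200, 700], 'year': [15, 10]})
# Keep the column order of the design matrix x: const, size, year
new_data = new_data[['const', 'size', 'year']]
new_data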

| | const | size | year |
|---|---|---|---|
| 0 | 1 | 1200 | 15 |
| 1 | 1 | 700 | 10 |


In [33]:
predictions = results.predict(new_data)
predictions

0    378296.440924
1    249862.087286
dtype: float64
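
For a linear model, `results.predict` amounts to multiplying each row by the coefficient vector and summing. The sketch below repeats that calculation with NumPy, using the coefficients as printed in the regression summary. Because the summary shows the intercept only to three significant figures (6.13e+04), the results agree with the predictions above to within a few units rather than exactly; in a live session, `results.params` would hold the full-precision values.

```python
import numpy as np

# Coefficients as displayed in the summary (const is rounded to 6.13e+04)
coefs = np.array([6.13e+04, 227.7009, 2916.7853])  # const, size, year

# Same two houses as new_data: columns are const, size, year (year = offset from 2000)
houses = np.array([
    [1, 1200, 15],
    [1, 700, 10],
])

# price = const + b_size * size + b_year * year, computed for both rows at once
manual_predictions = houses @ coefs
print(manual_predictions)
```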

In [34]:
predictionsdf = pd.DataFrame({'Predictions':predictions})
joined = new_data.join(predictionsdf)
joined.rename(index={0: 'h1',1:'h2'})

| | const | size | year | Predictions |
|---|---|---|---|---|
| h1 | 1 | 1200 | 15 | 378296.440924 |
| h2 | 1 | 700 | 10 | 249862.087286 |
