# Simple linear regression - Exercise

You are given a real estate dataset. 

Real estate is an example that nearly every regression course covers, because it is easy to understand and there is almost always a clear causal relationship to be found.

The data is located in the file: 'real_estate_price_size.csv'. 

You are expected to create a simple linear regression (similar to the one in the lecture), using the new data. 

Apart from that, please:
-  Create a scatter plot (with or without a regression line)
-  Calculate the R-squared
-  Display the intercept and coefficient(s)
-  Using the model, make a prediction about an apartment with a size of 750 sq.ft.

Note: In this exercise, the dependent variable is 'price', while the independent variable is 'size'.

Good luck!

## Import the relevant libraries

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from sklearn.linear_model import LinearRegression

## Load the data

In [3]:
data = pd.read_csv('real_estate_price_size.csv')

In [4]:
data.head()

        price     size
0  234314.144   643.09
1  228581.528   656.22
2  281626.336   487.29
3  401255.608  1504.75
4  458674.256  1275.46

## Create the regression

### Declare the dependent and the independent variables

In [5]:
x = data['size']
y = data['price']

### Explore the data

In [6]:
x.shape, y.shape

((100,), (100,))

### Transform the inputs into a matrix (2D object)

In [8]:
x_matrix = x.values.reshape(-1, 1)
x_matrix.shape

(100, 1)
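scikit-learn estimators expect the features as a 2D array of shape `(n_samples, n_features)`, which is why the 1D `size` column is reshaped. A minimal sketch of the effect, using only the first five `size` values from the dataset as an illustration (the name `sizes_matrix` is chosen here and is not part of the exercise):

```python
import numpy as np

# First five 'size' values from the dataset, used only as a small illustration
sizes = np.array([643.09, 656.22, 487.29, 1504.75, 1275.46])

print(sizes.shape)         # 1D: (5,)

# -1 lets NumPy infer the number of rows; 1 fixes a single feature column
sizes_matrix = sizes.reshape(-1, 1)
print(sizes_matrix.shape)  # 2D: (5, 1)
```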

### Regression itself

In [9]:
reg = LinearRegression()
reg.fit(x_matrix, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)
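For a single feature, ordinary least squares has a simple closed form, which can help to see what `fit` estimates. The sketch below is an illustration on the first ten rows of the dataset only, so its numbers will differ from those of the full 100-row fit; it uses NumPy's `np.polyfit` as a cross-check rather than scikit-learn.

```python
import numpy as np

# First ten rows of the dataset (a small subset, for illustration only)
size = np.array([643.09, 656.22, 487.29, 1504.75, 1275.46,
                 575.19, 570.89, 620.82, 682.26, 694.52])
price = np.array([234314.144, 228581.528, 281626.336, 401255.608, 458674.256,
                  245050.280, 265129.064, 175716.480, 331101.344, 218630.608])

# Closed-form ordinary least squares for one feature:
# slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# intercept = mean(y) - slope * mean(x)
x_dev = size - size.mean()
slope = np.sum(x_dev * (price - price.mean())) / np.sum(x_dev ** 2)
intercept = price.mean() - slope * size.mean()

# Cross-check against NumPy's degree-1 polynomial fit (returns slope first)
check_slope, check_intercept = np.polyfit(size, price, deg=1)
print(slope, intercept)
print(np.isclose(slope, check_slope), np.isclose(intercept, check_intercept))
```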

### Calculate the R-squared

In [10]:
reg.score(x_matrix, y)

0.7447391865847586
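For a linear regression, `score` returns the coefficient of determination, R² = 1 - SS_res / SS_tot. As a rough sketch of that formula, the snippet below computes it by hand on the first ten rows of the dataset; because it uses a subset, the value will not match the 0.7447 reported for the full data.

```python
import numpy as np

# First ten rows of the dataset (a small subset, for illustration only)
size = np.array([643.09, 656.22, 487.29, 1504.75, 1275.46,
                 575.19, 570.89, 620.82, 682.26, 694.52])
price = np.array([234314.144, 228581.528, 281626.336, 401255.608, 458674.256,
                  245050.280, 265129.064, 175716.480, 331101.344, 218630.608])

# Fit a straight line to the subset and compute its predictions
slope, intercept = np.polyfit(size, price, deg=1)
predicted = intercept + slope * size

# Residual sum of squares and total sum of squares
ss_res = np.sum((price - predicted) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)

r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

In simple linear regression with an intercept, this value equals the squared correlation between `size` and `price`.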

### Find the intercept

In [12]:
reg.intercept_

101912.60180122906

### Find the coefficients

In [13]:
reg.coef_

array([223.17874259])
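With the intercept and coefficient known, the scatter plot requested in the exercise can be drawn together with the regression line. The following is one possible sketch, not the exercise's official solution: it plots only the first ten rows of the dataset and hard-codes the intercept and slope as displayed above, so that it runs without the CSV file. In the notebook itself, `x`, `y`, `reg.intercept_` and `reg.coef_[0]` could be used instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# First ten rows of the dataset (a small subset, for illustration only)
size = np.array([643.09, 656.22, 487.29, 1504.75, 1275.46,
                 575.19, 570.89, 620.82, 682.26, 694.52])
price = np.array([234314.144, 228581.528, 281626.336, 401255.608, 458674.256,
                  245050.280, 265129.064, 175716.480, 331101.344, 218630.608])

# Intercept and slope as displayed by the fitted model (slope rounded as shown)
intercept = 101912.60180122906
slope = 223.17874259

# Points along the fitted line, spanning the plotted size range
x_line = np.linspace(size.min(), size.max(), 50)
y_line = intercept + slope * x_line

fig, ax = plt.subplots()
ax.scatter(size, price, label='observations (subset)')
ax.plot(x_line, y_line, color='orange', label='regression line')
ax.set_xlabel('size (sq.ft.)')
ax.set_ylabel('price')
ax.legend()
plt.show()
```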

### Making predictions

You find an apartment online with a size of 750 sq.ft.

All else being equal, what should its price be according to the model?

In [15]:
reg.predict(np.array([[750]]))

array([269296.65874718])

Thus, the model predicts a price of $269,297, rounded to the nearest dollar.
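The prediction is simply the fitted line evaluated at 750, that is, intercept + coefficient × size. A quick check by hand, using the intercept and coefficient as displayed in the outputs above (the coefficient is rounded in that display, so the result agrees with `reg.predict` only to a few decimal places):

```python
# Values copied from the outputs of reg.intercept_ and reg.coef_
intercept = 101912.60180122906
slope = 223.17874259  # rounded as displayed

size_sqft = 750
predicted_price = intercept + slope * size_sqft

print(round(predicted_price))  # 269297
```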