# Simple linear regression - Exercise

You are given a real estate dataset. 

Real estate is an example that nearly every regression course covers, because it is easy to understand and there is almost always a clear causal relationship to be found.

The data is located in the file: 'real_estate_price_size.csv'. 

You are expected to create a simple linear regression (similar to the one in the lecture), using the new data. 

Apart from that, please:
-  Create a scatter plot (with or without a regression line)
-  Calculate the R-squared
-  Display the intercept and coefficient(s)
-  Using the model, make a prediction about an apartment with a size of 750 sq.ft.

Note: In this exercise, the dependent variable is 'price', while the independent variable is 'size'.

Good luck!

## Import the relevant libraries

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import seaborn as sns
sns.set()

## Load the data

In [5]:
data = pd.read_csv('real_estate_price_size.csv')

In [6]:
data.head()

| | price | size |
|---|---|---|
| 0 | 234314.144 | 643.09 |
| 1 | 228581.528 | 656.22 |
| 2 | 281626.336 | 487.29 |
| 3 | 401255.608 | 1504.75 |
| 4 | 458674.256 | 1275.46 |


## Create the regression

### Declare the dependent and the independent variables

In [7]:
x = data['size']
y = data['price']

### Explore the data

In [11]:
x.shape

(100,)

In [12]:
y.shape

(100,)

### Transform the inputs into a matrix (2D object)

In [13]:
x_matrix = x.values.reshape(-1,1)

In [14]:
x_matrix.shape

(100, 1)
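
scikit-learn estimators expect the feature input as a 2D array of shape (n_samples, n_features), which is why the 1D series is reshaped before fitting. A minimal sketch of what `reshape(-1, 1)` does, assuming only the first five sizes shown in `data.head()` as sample values (the names `sizes` and `sizes_2d` are illustrative, not part of the notebook):

```python
import numpy as np

# First five 'size' values shown in data.head(); used here only as a sample.
sizes = np.array([643.09, 656.22, 487.29, 1504.75, 1275.46])

# -1 lets NumPy infer the number of rows; 1 fixes a single feature column.
sizes_2d = sizes.reshape(-1, 1)

print(sizes.shape)     # (5,)
print(sizes_2d.shape)  # (5, 1)
```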

### Regression itself

In [32]:
x_matrix

array([[ 643.09],
       [ 656.22],
       [ 487.29],
       [1504.75],
       [1275.46],
       [ 575.19],
       [ 570.89],
       [ 620.82],
       [ 682.26],
       [ 694.52],
       [1060.36],
       [1842.51],
       [ 694.52],
       [1009.25],
       [1300.96],
       [1379.72],
       [ 690.54],
       [ 623.94],
       [ 681.07],
       [1027.76],
       [ 620.71],
       [ 549.69],
       [1207.45],
       [ 518.38],
       [ 525.81],
       [1103.3 ],
       [ 570.89],
       [1334.1 ],
       [ 681.07],
       [1496.36],
       [1010.33],
       [ 681.07],
       [ 597.9 ],
       [ 525.81],
       [ 857.54],
       [ 622.97],
       [ 823.21],
       [ 570.25],
       [ 685.48],
       [ 698.29],
       [1021.95],
       [ 682.26],
       [ 823.21],
       [1334.1 ],
       [1060.36],
       [ 698.29],
       [ 633.19],
       [ 698.29],
       [ 633.19],
       [ 617.05],
       [ 647.5 ],
       [1021.95],
       [1021.95],
       [ 727.88],
       [ 647.5 ],
       [15

In [15]:
reg = LinearRegression()

In [16]:
reg.fit(x_matrix,y)

LinearRegression()

### Calculate the R-squared

In [20]:
reg.score(x_matrix,y)

0.7447391865847587
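
For a regressor, `score` returns the coefficient of determination, R-squared, which can also be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below illustrates that formula under a loud assumption: it uses only the five rows shown in `data.head()`, so its result will not match the 0.7447 obtained on the full dataset. The fit is done with `np.polyfit` rather than `LinearRegression` to keep the example short.

```python
import numpy as np

# Sample: the five rows shown in data.head(), not the full 100-row dataset,
# so the resulting value will differ from the one reported by reg.score.
size = np.array([643.09, 656.22, 487.29, 1504.75, 1275.46])
price = np.array([234314.144, 228581.528, 281626.336, 401255.608, 458674.256])

# Ordinary least squares fit of price = intercept + slope * size.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(size, price, deg=1)
predicted = intercept + slope * size

ss_res = np.sum((price - predicted) ** 2)     # residual sum of squares
ss_tot = np.sum((price - price.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(r_squared)
```

In a simple (one-feature) linear regression, this value equals the squared Pearson correlation between `size` and `price`.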

### Find the intercept

In [21]:
reg.intercept_

101912.60180122912

### Find the coefficients

In [22]:
reg.coef_

array([223.17874259])
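
The task list also asks for a scatter plot, optionally with the regression line. A minimal sketch of one way to draw it, assuming the five rows from `data.head()` as sample points and the intercept and coefficient reported above (rounded); in the notebook itself, the full `x` and `y` would be plotted and the line would come from `reg.intercept_` and `reg.coef_`:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample: the five rows shown in data.head(); in the notebook, x and y
# (all 100 observations) would be plotted instead.
size = np.array([643.09, 656.22, 487.29, 1504.75, 1275.46])
price = np.array([234314.144, 228581.528, 281626.336, 401255.608, 458674.256])

# Values reported above by reg.intercept_ and reg.coef_ (rounded).
intercept = 101912.60
slope = 223.18

fig, ax = plt.subplots()
ax.scatter(size, price)

# Draw the fitted line across the observed range of sizes.
line_x = np.linspace(size.min(), size.max(), 100)
ax.plot(line_x, intercept + slope * line_x, color='orange')

ax.set_xlabel('size')
ax.set_ylabel('price')
plt.show()
```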

### Making predictions

You find an apartment online with a size of 750 sq.ft.

All else being equal, what should its price be according to the model?

In [45]:
pred = pd.DataFrame(data = [750], columns = ['size'])

In [47]:
pred.shape

(1, 1)

In [48]:
# Pass a NumPy array, matching the array the model was fitted on
reg.predict(pred.values)



array([269296.65874718])
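
The prediction can be checked by hand, since the model is just price = intercept + coefficient × size. A short sketch using the intercept and coefficient printed above (the coefficient is shown with limited precision, so the result agrees with `reg.predict` only to a few decimal places):

```python
# Values reported above by reg.intercept_ and reg.coef_.
intercept = 101912.60180122912
coefficient = 223.17874259  # printed with limited precision

size = 750  # sq.ft.
predicted_price = intercept + coefficient * size

print(round(predicted_price, 2))  # 269296.66
```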