# ***`Multiple Linear Regression`***

We are going to predict house prices. A house's price doesn't depend on its size alone. It depends on:

- Size (sq ft)
- Number of bedrooms
- Age of the house (old vs. new)

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
#new tool
from sklearn.metrics import r2_score 

In [2]:
# 1. Create fake data for 100 houses

np.random.seed(42)  # fixed seed: the same "random" numbers come out on every run
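
To see what the seed does, here is a small standalone sketch. The names `first_draw` and `second_draw` are made up for illustration. Resetting the seed to the same value replays the same sequence of numbers. It is best run outside this notebook, since drawing extra numbers here would change the data generated in the next cells.

```python
import numpy as np

np.random.seed(42)
first_draw = np.random.randint(1000, 3000, 5)

np.random.seed(42)  # reset to the same seed
second_draw = np.random.randint(1000, 3000, 5)

# Same seed, same sequence of "random" numbers
print(np.array_equal(first_draw, second_draw))  # True
```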

In [3]:
# Feature 1: size, from 1000 to 2999 sq ft (randint's upper bound is exclusive)

size = np.random.randint(1000 , 3000 , 100)

In [4]:
# feature 2: Bedrooms ranging from 2 to 5

bedrooms = np.random.randint(2,6,100)

In [5]:
# Feature 3: age of the place, from 0 to 49 years (the upper bound 50 is exclusive)

age = np.random.randint(0,50,100)

In [6]:
# The "Real" Price Formula (The Robot doesn't know this!)
# Base $50k + $100/sqft + $10k/bedroom - $500/year_old + random noise of up to about $5k either way

price = 50000 + (size * 100) + (bedrooms * 10000) - (age*500) + np.random.randint(-5000 , 5000 , 100)

In [7]:
# Put it in a DataFrame so it looks nice

df = pd.DataFrame({'Size':size,'Bedrooms':bedrooms,'Age':age,'Price':price})
print(df)

    Size  Bedrooms  Age   Price
0   2126         3   38  278161
1   2459         5   48  318886
2   1860         3   31  253838
3   2294         3    3  305811
4   2130         3   29  275234
..   ...       ...  ...     ...
95  2795         5   28  363670
96  2845         2    2  349199
97  2500         4   19  330588
98  1702         4   35  247031
99  1401         5   18  226290

[100 rows x 4 columns]
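
As a quick sanity check, the formula can be applied by hand to the first row of the printed table. This is only a sketch with the row 0 values copied in as plain numbers. The names `noise_free` and `noise` are illustrative.

```python
# Values copied from row 0 of the printed DataFrame
size, bedrooms, age, price = 2126, 3, 38, 278161

# The price formula without the random noise term
noise_free = 50000 + size * 100 + bedrooms * 10000 - age * 500

# Whatever is left over is the noise that was added for this house
noise = price - noise_free

print(noise_free)  # 273600
print(noise)       # 4561
```

The leftover of 4561 falls inside the -5000 to 4999 range that the noise term can produce, so the row is consistent with the formula.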


In [11]:
from tabulate import tabulate
table = (tabulate(df, headers='keys', tablefmt='github', showindex=False))

In [14]:
print(table)

|   Size |   Bedrooms |   Age |   Price |
|--------|------------|-------|---------|
|   2126 |          3 |    38 |  278161 |
|   2459 |          5 |    48 |  318886 |
|   1860 |          3 |    31 |  253838 |
|   2294 |          3 |     3 |  305811 |
|   2130 |          3 |    29 |  275234 |
|   2095 |          5 |    36 |  288343 |
|   2724 |          3 |    22 |  345080 |
|   2044 |          4 |    38 |  273376 |
|   2638 |          5 |    44 |  338759 |
|   1121 |          4 |    14 |  195859 |
|   1466 |          5 |    42 |  227294 |
|   2238 |          3 |    28 |  287185 |
|   1330 |          4 |    35 |  205236 |
|   2482 |          5 |    12 |  339002 |
|   1087 |          2 |    31 |  166355 |
|   2396 |          3 |     6 |  319720 |
|   2123 |          5 |    21 |  303416 |
|   1871 |          2 |    27 |  244134 |
|   2687 |          5 |     1 |  367261 |
|   1130 |          2 |    41 |  157762 |
|   2685 |          3 |    44 |  327787 |
|   2332 |          4 |     5 |  3

# The Split

In [15]:
# X is now 3 columns

X = df[['Size','Bedrooms' ,'Age']]
y = df['Price']

X_train , X_test , y_train , y_test= train_test_split(X, y , test_size = 0.2, random_state = 42)
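
To make the 80/20 split concrete, here is a minimal sketch on stand-in data. The `demo` DataFrame and its values are placeholders, not the notebook's data. Only the shapes matter here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data with the same columns as df (values are placeholders)
demo = pd.DataFrame({
    "Size": np.arange(100),
    "Bedrooms": np.arange(100) % 4 + 2,
    "Age": np.arange(100) % 50,
    "Price": np.arange(100) * 1000,
})
X_demo = demo[["Size", "Bedrooms", "Age"]]
y_demo = demo["Price"]

# test_size=0.2 keeps 20% of the rows aside for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)  # (80, 3) (20, 3)
```

With 100 houses, the model trains on 80 rows and is graded on the 20 rows it has never seen.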

# Train the Model

In [16]:
model = LinearRegression()
model.fit(X_train , y_train)

# The Inspection (Coefficients)

In [17]:
# This will print 3 numbers corresponding to [Size, Bedrooms, Age]

print("Coefficients", model.coef_)
print("Intercept", model.intercept_)

Coefficients [   99.90252051 10098.28550674  -535.48828204]
Intercept 50494.71727998307


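The 1st number should be around 100 (Size). The model found about 99.90.

The 2nd number should be around 10,000 (Bedrooms). The model found about 10,098.

The 3rd number should be negative (around -500), because older houses are worth less! The model found about -535.

The intercept should be close to the $50k base price, and it is: about 50,495.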
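
To put a number on "around", the sketch below compares the learned coefficients with the true ones from the price formula. The coefficient values are copied from the printed output. The names `true_coefs`, `learned_coefs` and `gap_pct` are illustrative.

```python
import numpy as np

names = ["Size", "Bedrooms", "Age"]

# True values from the price formula
true_coefs = np.array([100.0, 10000.0, -500.0])

# Learned values copied from the model.coef_ output
learned_coefs = np.array([99.90252051, 10098.28550674, -535.48828204])

# Gap between learned and true, as a percentage of the true value
gap_pct = (learned_coefs - true_coefs) / np.abs(true_coefs) * 100

for name, gap in zip(names, gap_pct):
    print(f"{name}: {gap:+.2f}%")
```

Size and Bedrooms land within about 1% of the truth, while Age is off by roughly 7%. A likely reason is that the age effect is small compared with the random noise, so it is harder to pin down from 80 training rows.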

# The Report Card (R-Squared)

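How good is the model? We check the R^2 score, which measures how much of the variation in price the model explains.

1.0 = Perfect Score (God-like).

0.0 = F- (no better than always guessing the average price). The score can even go below zero for a model that does worse than that.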

In [22]:
y_pred = model.predict(X_test)
score = r2_score(y_test , y_pred)

print(f"Model R-squared score : {score:.2f}")

Model R-squared score : 0.99
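
For intuition on what the score measures, R^2 can be computed by hand as 1 minus the ratio of the model's squared errors to the squared errors of always guessing the mean. The sketch below uses made-up prices (`y_true` and `y_hat` are not from the notebook) and checks the result against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted prices, for illustration only
y_true = np.array([200000.0, 250000.0, 300000.0, 350000.0])
y_hat = np.array([205000.0, 245000.0, 310000.0, 340000.0])

# Squared error of the model vs. squared error of guessing the mean
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(np.isclose(manual_r2, r2_score(y_true, y_hat)))  # True

# A "model" that always guesses the average price scores exactly 0
mean_guess = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_guess))  # 0.0
```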


# Make a Custom Prediction

Let's predict the price of your dream house.

House: 2500 sq ft, 4 bedrooms, 5 years old.

In [21]:
# Must match the order: [Size, Bedrooms, Age]

my_house = pd.DataFrame([[2500 , 4, 5]], columns=['Size','Bedrooms','Age'])

predicted_price = model.predict(my_house)
print(f"Predicted Price : ${predicted_price[0]:,.2f}")

Predicted Price : $337,966.72
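
The prediction is nothing more than the intercept plus each feature multiplied by its coefficient. As a rough check, the sketch below redoes that arithmetic by hand, with the coefficient and intercept values copied from the printed output. The names `coefs`, `house` and `manual_price` are illustrative.

```python
import numpy as np

# Values copied from the model.coef_ and model.intercept_ output
coefs = np.array([99.90252051, 10098.28550674, -535.48828204])  # [Size, Bedrooms, Age]
intercept = 50494.71727998307

# The dream house, in the same order: [Size, Bedrooms, Age]
house = np.array([2500, 4, 5])

# intercept + 99.90*2500 + 10098.29*4 - 535.49*5
manual_price = intercept + coefs @ house
print(f"Manual Price : ${manual_price:,.2f}")  # Manual Price : $337,966.72
```

The hand calculation matches what `model.predict` returned.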
