# Linear Regression

To predict rental housing prices, we will begin with a linear regression model fitted on house age and its square (degree-2 polynomial features). To find out more about the maths behind regression, see [this article](https://onlinestatbook.com/2/regression/regression.html).

In [28]:
"""
A linear regression model fitted on degree-2 polynomial features of house age,
with price per unit area as the target variable.
"""

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

In [29]:
# import csv dataset
df = pd.read_csv("../data/realestate.csv")

# split data into "X" and "y" set
X = df[["house age"]]
y = df["price_per_unit_area"]

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# split the data into training and test sets, with 25% of the data in the test set
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.25)

In [31]:
# make linear regression object
reg = LinearRegression()

# fit model to data
reg.fit(X_train, y_train)

In [32]:
# generate predictions on X_test
y_pred = reg.predict(X_test)

# compute the mean squared error (MSE) on the test set
mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)

MSE: 100.85666170816141
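
MSE is expressed in squared units of the target, which makes it hard to read directly. Taking the square root gives the root mean squared error (RMSE), which is in the same units as `price_per_unit_area`. The small sketch below does this for the value printed above; `mse_reported` is simply that number copied by hand, so it will differ on a rerun because the split is random.

```python
import math

# MSE value copied from the output above (changes with each random split)
mse_reported = 100.85666170816141

# RMSE is in the same units as the target variable
rmse = math.sqrt(mse_reported)
print(f"RMSE: {rmse:.2f}")
```

Read this way, the model's typical prediction error is roughly 10 price units on the test set.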


In [33]:
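# view the fit of the developed model across the full dataset
# reuse the already-fitted transformer: the model expects the 3 polynomial
# columns, so passing raw X to reg.predict instead raises the ValueError
# shown in the output below
predictions = reg.predict(poly.transform(X))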

# scatter plot of house age & price of unit area
sns.scatterplot(data=df, x="house age", y="price_per_unit_area")
sns.lineplot(x=X.iloc[:, 0], y=predictions, color="r")
plt.show()



ValueError: X has 1 features, but LinearRegression is expecting 3 features as input.
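
The error above arises when the model, trained on three polynomial columns, is handed the raw single-column `X`. The sketch below reproduces the mismatch on synthetic data and shows two possible ways around it: transforming the input with the already-fitted `PolynomialFeatures` object, or wrapping both steps in a scikit-learn pipeline so the expansion is applied automatically. The data and the names `X_demo`, `y_demo`, `reg_demo`, and `pipe` are assumptions made for illustration, not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (not the real estate dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 40, size=(50, 1))
y_demo = 40 - 0.8 * X_demo[:, 0] + 0.02 * X_demo[:, 0] ** 2 + rng.normal(0, 1, 50)

# Fit on the expanded features, mirroring the cells above
poly_demo = PolynomialFeatures(degree=2)
reg_demo = LinearRegression().fit(poly_demo.fit_transform(X_demo), y_demo)

# Passing the raw single column reproduces the feature-count mismatch
try:
    reg_demo.predict(X_demo)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Option 1: reuse the fitted transformer so the model sees 3 columns
preds_manual = reg_demo.predict(poly_demo.transform(X_demo))

# Option 2: a pipeline applies the expansion on both fit and predict
pipe = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipe.fit(X_demo, y_demo)
preds_pipe = pipe.predict(X_demo)
```

With the pipeline, `pipe.predict` accepts the same single-column input used for fitting, so the transform step cannot be forgotten at prediction time.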