## Predicting California Housing Price With Polynomial Regression
- Dataset used: sklearn's California housing dataset (fetch_california_housing)
- Model used: Polynomial Regression

### Global Variables

In [1]:
DEGREE = 2 # Degree of the polynomial regression model

### Importing The Data

In [2]:
from sklearn.datasets import fetch_california_housing
import numpy as np
from sklearn.model_selection import train_test_split

In [3]:
housing = fetch_california_housing()



In [4]:
X = housing.data
y = housing.target

### Adding Polynomial Columns

In [5]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=DEGREE, include_bias=False)
print(X.shape[1])  # number of feature columns before expansion
X = poly.fit_transform(X)
print(X.shape[1])  # number of feature columns after expansion

8
44
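The jump from 8 to 44 columns can be checked by counting monomials: with n input features, total degree up to d, and no bias column, there are C(n + d, d) - 1 terms. The sketch below uses only the standard library; the helper name `n_poly_features` is hypothetical and is not part of scikit-learn.

```python
from math import comb


def n_poly_features(n_features: int, degree: int, include_bias: bool = False) -> int:
    """Count the monomials of total degree <= `degree` in `n_features` variables."""
    total = comb(n_features + degree, degree)  # this count includes the constant term
    return total if include_bias else total - 1


# 8 linear terms + 8 squared terms + 28 pairwise products
print(n_poly_features(8, 2))  # 44
```

Raising DEGREE grows this count quickly, which is worth keeping in mind before increasing it.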


### Splitting the Data

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Feature Scaling
- Important note: When scaling the test set, we use the mean and standard deviation of the training set, not those of the test set. The test set stands in for unseen data when we evaluate the model, so nothing derived from it, including its mean and standard deviation, should influence preprocessing or training.
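As a rough illustration of this rule, the sketch below fits a StandardScaler on one synthetic array and reuses it on another. The `*_demo` arrays are made-up stand-ins, not the housing data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for X_train / X_test (assumption: not the housing data)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(800, 3))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))

scaler_demo = StandardScaler()
# fit_transform learns mean_ and scale_ from the training rows only
X_train_demo_scaled = scaler_demo.fit_transform(X_train_demo)
# transform reuses those training statistics; nothing is learned from the test rows
X_test_demo_scaled = scaler_demo.transform(X_test_demo)

# Training columns are centered by construction; test columns only approximately
print(X_train_demo_scaled.mean(axis=0))
print(X_test_demo_scaled.mean(axis=0))

# The transform matches applying the stored training statistics by hand
manual = (X_test_demo - scaler_demo.mean_) / scaler_demo.scale_
print(np.allclose(manual, X_test_demo_scaled))
```

The scaled test columns will generally not have a mean of exactly zero, and that is expected.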

In [7]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

### Training The Model

In [8]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

model = LinearRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Root mean squared error (RMSE); the target is in units of $100,000
training_error = np.sqrt(mean_squared_error(y_train, model.predict(X_train_scaled)))
error = np.sqrt(mean_squared_error(y_test, y_pred))

print("Error on training set", training_error)
print("Error on test set", error)

Error on training set 0.6486344233521343
Error on test set 0.6813967448044512
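Both numbers are root mean squared errors. Because the dataset's target is the median house value in units of $100,000, a test error of about 0.68 corresponds to roughly $68,000. The sketch below spells out the same calculation by hand on toy values; the helper `rmse` and the `*_demo` lists are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values (hypothetical), in units of $100,000 like the dataset's target
y_true_demo = [2.0, 3.0, 5.0]
y_pred_demo = [2.5, 2.5, 4.0]

err = rmse(y_true_demo, y_pred_demo)
print(f"RMSE = {err:.3f}, about ${err * 100_000:,.0f}")
```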


### Prediction Function
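Use the function below to get the model's estimate for one set of feature values. Pass the 8 raw (untransformed) features in the dataset's column order; the returned median house value is in units of $100,000.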

In [22]:
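def predict(x):
    """Return the prediction (in units of $100,000) for one row of 8 raw features."""
    x = x.reshape(1, -1)     # single sample -> 2D array of shape (1, 8)
    x = poly.transform(x)    # same polynomial expansion as in training
    x = scaler.transform(x)  # scale with the training-set statistics
    return model.predict(x)[0]

# Order: MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude
x = np.array([8.3252, 41, 6.98412698, 1.02380952, 322, 2.55555556, 37.88, -122.23])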
print(predict(x))

4.007520962383484
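One possible alternative to chaining the transforms by hand is scikit-learn's `make_pipeline`, which bundles the expansion, scaling, and regression into a single estimator so that `predict` accepts raw features directly. This is a minimal sketch on synthetic data, not the notebook's method; `X_demo`, `y_demo`, `pipe`, and `x_new` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
# Synthetic data (assumption: not the housing set); y is an exact quadratic in two features
X_demo = rng.uniform(-1.0, 1.0, size=(200, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] ** 2

# One estimator that expands, scales, and fits, in that order
pipe = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LinearRegression(),
)
pipe.fit(X_demo, y_demo)

# Raw, untransformed input goes straight into predict
x_new = np.array([[0.5, -0.5]])
print(pipe.predict(x_new)[0])  # the true value of the quadratic at this point is 1.25
```

With this design, fitting the pipeline on the training split alone keeps the scaler's statistics tied to the training data automatically.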
