## 1. Create a new Git branch
##### In the terminal
- Create a new Git branch called adv_mla_1_poly
<br /> git checkout -b adv_mla_1_poly # Switched to a new branch 'adv_mla_1_poly'
- Launch Jupyter Lab from your virtual environment
<br /> poetry run jupyter lab

## 2. Load the dataset

In [1]:
# Launch magic commands to automatically reload modules
%load_ext autoreload
%autoreload 2

In [2]:
# Import the pandas and numpy packages, and dump from joblib
import pandas as pd
import numpy as np
from joblib import dump

In [3]:
# Load the saved sets from data/processed
X_train = pd.read_csv('../data/processed/X_train.csv')
X_val   = pd.read_csv('../data/processed/X_val.csv'  )
X_test  = pd.read_csv('../data/processed/X_test.csv' )
y_train = pd.read_csv('../data/processed/y_train.csv')
y_val   = pd.read_csv('../data/processed/y_val.csv'  )
y_test  = pd.read_csv('../data/processed/y_test.csv' )

## 3. Apply Polynomial Transformation

In [4]:
# Import PolynomialFeatures from sklearn.preprocessing
from sklearn.preprocessing import PolynomialFeatures

In [5]:
# Instantiate PolynomialFeatures with degree 2
poly = PolynomialFeatures(2)

In [6]:
# Fit the PolynomialFeatures and perform transformation on X_train
X_train = poly.fit_transform(X_train)

In [7]:
# Display the dimensions of X_train
X_train.shape

(32000, 45)
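
The 45 columns reported above are what a degree-2 expansion produces from 8 input features, counting the bias column, the original features, and all squares and pairwise products. The sketch below checks that count with the standard library only. The value of 8 input features is an assumption inferred from the output shape, since the original column count is not shown in this notebook, and `count_poly_terms` is a hypothetical helper, not part of scikit-learn.

```python
from itertools import combinations_with_replacement
from math import comb


def count_poly_terms(n_features: int, degree: int) -> int:
    """Count monomials of total degree <= degree, including the bias term."""
    total = 0
    for d in range(degree + 1):
        # A monomial of degree d is a multiset of d feature indices.
        total += sum(1 for _ in combinations_with_replacement(range(n_features), d))
    return total


# Assumption: 8 input features, inferred from the 45 output columns.
n_features = 8
n_terms = count_poly_terms(n_features, degree=2)

# Closed form for the same count: C(n_features + degree, degree).
closed_form = comb(n_features + 2, 2)
print(n_terms, closed_form)
```

This count assumes the `PolynomialFeatures` defaults (`include_bias=True`, `interaction_only=False`); changing either would give a different number of columns.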

In [8]:
# Perform transformation on X_val and X_test with PolynomialFeatures
X_val = poly.transform(X_val)
X_test = poly.transform(X_test)

## 4. Train Linear Regression model

In [9]:
# Import the linear regression module from sklearn
from sklearn.linear_model import LinearRegression

In [10]:
# Task: instantiate the LinearRegression class into a variable called reg
reg = LinearRegression()

In [11]:
# Task: Fit the model with the prepared data
reg.fit(X_train, y_train)

In [12]:
# Save the fitted model into the models folder as linear_poly_2.joblib (dump was already imported from joblib above)
dump(reg, '../models/linear_poly_2.joblib')

['../models/linear_poly_2.joblib']

In [13]:
# Save the predictions from this model for the training and validation sets into 2 variables called y_train_preds and y_val_preds
y_train_preds = reg.predict(X_train)
y_val_preds = reg.predict(X_val)

In [14]:
# Import root_mean_squared_error and mean_absolute_error from sklearn.metrics
from sklearn.metrics import root_mean_squared_error as rmse
from sklearn.metrics import mean_absolute_error as mae
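
As a reference for what these two metrics measure, here is a minimal hand-written sketch using only the standard library. The names `rmse_manual` and `mae_manual` and the toy values are illustrative assumptions, not part of scikit-learn or of this dataset.

```python
from math import sqrt


def rmse_manual(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def mae_manual(y_true, y_pred):
    """Mean absolute error: mean of the absolute differences."""
    absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)


# Toy values chosen only for illustration.
y_true_toy = [3.0, 5.0, 7.0]
y_pred_toy = [2.0, 5.0, 10.0]

rmse_toy = rmse_manual(y_true_toy, y_pred_toy)
mae_toy = mae_manual(y_true_toy, y_pred_toy)
print(rmse_toy, mae_toy)
```

Both functions are symmetric in their two arguments, so swapping the true and predicted values gives the same score. RMSE penalises large errors more heavily than MAE, which is why it is never smaller than MAE on the same data.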

In [15]:
# Display the RMSE and MAE scores of this model on the training set (arguments in y_true, y_pred order)
print(rmse(y_train, y_train_preds))
print(mae(y_train, y_train_preds))

6108.7816870858305
4104.514999981773


In [16]:
# Display the RMSE and MAE scores of this model on the validation set (arguments in y_true, y_pred order)
print(rmse(y_val, y_val_preds))
print(mae(y_val, y_val_preds))

6107.831215961238
4105.425597989007


In [17]:
# Predict on the testing set, then display the RMSE and MAE scores of this model (arguments in y_true, y_pred order)
y_test_preds = reg.predict(X_test)
print(rmse(y_test, y_test_preds))
print(mae(y_test, y_test_preds))

6183.970656308844
4213.112978610777
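
One way to read the three pairs of scores above is to express the validation and testing errors relative to the training error. The sketch below does that arithmetic with the values printed in this notebook; `relative_gap` is a hypothetical helper introduced only for this illustration.

```python
# Scores copied from the cell outputs above.
scores = {
    "train": {"rmse": 6108.7816870858305, "mae": 4104.514999981773},
    "val": {"rmse": 6107.831215961238, "mae": 4105.425597989007},
    "test": {"rmse": 6183.970656308844, "mae": 4213.112978610777},
}


def relative_gap(reference: float, other: float) -> float:
    """Return (other - reference) / reference, i.e. the gap as a fraction."""
    return (other - reference) / reference


gaps = {}
for split in ("val", "test"):
    for metric in ("rmse", "mae"):
        gaps[(split, metric)] = relative_gap(
            scores["train"][metric], scores[split][metric]
        )
        print(f"{split} {metric}: {gaps[(split, metric)]:+.2%}")
```

Gaps of only a few percent between training and held-out error would suggest that the model is not overfitting badly, though this comparison alone does not say whether the error level itself is acceptable.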


## 5. Push changes
- Add your changes to the Git staging area
  <br /> git add .
- Create a snapshot of your repository and add a description
  <br /> git commit -m "linear regression with poly 2"
- Push your snapshot to GitHub (after running the command, go to GitHub and merge your changes into the master/main branch)
  <br /> git push -u origin adv_mla_1_poly
  
- Check out the master branch
  <br /> git checkout master
- Pull the latest updates
  <br /> git pull