# **Lab: ML Engineering**




## Exercise 4: Organising Git Repository

This time we will perform a polynomial transformation before training a Linear Regression model.


**Pre-requisites:**
- Create a GitHub account (https://github.com/join)
- Install git (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)


The steps are:
1.   Create new Git branch
2.   Load the dataset
3.   Apply Polynomial Transformation
4.   Train Linear Regression model
5.   Push changes


## 1. Create new Git branch


**[1.1]** Create a new git branch called `adv_dsi_1_3`


In [None]:
git checkout -b adv_dsi_1_3

**[1.2]** Navigate to the folder `notebooks` and create a new Jupyter notebook called `3_linear_poly.ipynb`

## 2. Load the dataset


**[2.1]** Import the pandas, numpy packages and dump from joblib

In [None]:
# Placeholder for student's code (3 lines of Python code)
# Task: Import the pandas, numpy packages and dump from joblib

In [1]:
# Solution
import pandas as pd
import numpy as np
from joblib import dump

**[2.2]** Load the saved sets from `data/processed` using numpy

In [None]:
# Placeholder for student's code (6 lines of Python code)
# Task: Load the saved sets from data/processed using numpy

In [3]:
# Solution:
X_train = np.load('./data/processed/X_train.npy')
X_val   = np.load('./data/processed/X_val.npy'  )
X_test  = np.load('./data/processed/X_test.npy' )
y_train = np.load('./data/processed/y_train.npy')
y_val   = np.load('./data/processed/y_val.npy'  )
y_test  = np.load('./data/processed/y_test.npy' )
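
Before going further, it can help to confirm that each feature array has the same number of rows as its target array. The sketch below is a hypothetical check, not part of the lab solution: it writes two small arrays to a temporary folder (standing in for `data/processed`), reloads them with `np.load`, and compares row counts. The helper name `load_split` and the toy data are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_split(folder, name):
    """Load X_<name>.npy and y_<name>.npy from folder and check that their row counts match."""
    X = np.load(Path(folder) / f"X_{name}.npy")
    y = np.load(Path(folder) / f"y_{name}.npy")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"{name}: X has {X.shape[0]} rows but y has {y.shape[0]}")
    return X, y


# Toy stand-in for data/processed (hypothetical data, not the lab dataset)
with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "X_train.npy", np.arange(12.0).reshape(4, 3))
    np.save(Path(tmp) / "y_train.npy", np.arange(4.0))
    X_demo, y_demo = load_split(tmp, "train")

print(X_demo.shape, y_demo.shape)
```

With the real files, the same helper could be called as `load_split('./data/processed', 'train')`, and likewise for `'val'` and `'test'`.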

## 3. Apply Polynomial Transformation

**[3.3]** Import PolynomialFeatures from sklearn.preprocessing

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import PolynomialFeatures from sklearn.preprocessing

In [4]:
# Solution:
from sklearn.preprocessing import PolynomialFeatures

**[3.4]** Instantiate a PolynomialFeatures with degree 2

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Instantiate a PolynomialFeatures with degree 2

In [5]:
# Solution:
poly = PolynomialFeatures(2)

**[3.5]** Fit the PolynomialFeatures and perform transformation on X_train

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Fit the PolynomialFeatures and perform transformation on X_train

In [6]:
# Solution:
X_train = poly.fit_transform(X_train)

**[3.6]** Display the dimensions of X_train

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Display the dimensions of X_train

In [7]:
# Solution
X_train.shape

(25372, 1830)
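
The column count can be checked by hand. With its default settings, `PolynomialFeatures` produces one column per monomial of total degree at most `d` (including the constant term), which for `n` input features is the binomial coefficient C(n + d, d). The sketch below counts and enumerates these terms using only the standard library. The value of 59 input features is inferred from the displayed output (C(61, 2) = 1830) and is not stated in the lab; the helper names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def n_poly_features(n_features, degree):
    """Number of monomials of total degree <= degree in n_features variables (bias included)."""
    return comb(n_features + degree, degree)


def poly_terms(names, degree):
    """Enumerate the monomials as tuples of feature names; () is the bias term."""
    terms = []
    for d in range(degree + 1):
        terms.extend(combinations_with_replacement(names, d))
    return terms


# Two features, degree 2: 1, a, b, a^2, a*b, b^2
print(poly_terms(["a", "b"], 2))
print(n_poly_features(2, 2))

# 59 input features (inferred, see above) match the 1830 columns displayed
print(n_poly_features(59, 2))
```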

**[3.7]** Perform transformation on X_val and X_test with PolynomialFeatures

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Perform transformation on X_val and X_test with PolynomialFeatures

In [8]:
# Solution:
X_val = poly.transform(X_val)
X_test = poly.transform(X_test)
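
The transformer is fitted once on the training set and then reused on the validation and test sets, so all three end up with the same columns in the same order. The following is a minimal sketch of that pattern on toy arrays (hypothetical data, with `_demo` names chosen so that the lab variables are not overwritten).

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy arrays with 2 features each (hypothetical data)
X_fit_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
X_new_demo = np.array([[5.0, 6.0]])

poly_demo = PolynomialFeatures(2)
X_fit_poly = poly_demo.fit_transform(X_fit_demo)  # fit on the "training" rows, then expand them
X_new_poly = poly_demo.transform(X_new_demo)      # reuse the fitted transformer on unseen rows

print(X_fit_poly.shape, X_new_poly.shape)
print(X_new_poly[0])  # columns: 1, a, b, a^2, a*b, b^2
```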

## 4. Train Linear Regression model

**[4.1]** Import the linear regression module from sklearn

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import the linear regression module from sklearn

In [9]:
# Solution:
from sklearn.linear_model import LinearRegression

**[4.2]** Instantiate the LinearRegression class into a variable called `reg`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: instantiate the LinearRegression class into a variable called reg

In [10]:
# Solution
reg = LinearRegression()

**[4.3]** Fit the model with the prepared data

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Fit the model with the prepared data

In [11]:
# Solution
reg.fit(X_train, y_train)
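
Fitting a linear model on polynomial features is still an ordinary least-squares problem; only the design matrix changes. The sketch below illustrates this with plain NumPy on noise-free toy data generated from a known quadratic. `np.linalg.lstsq` stands in for `LinearRegression` here, which is an assumption made to keep the example small, and all `_toy` names are hypothetical.

```python
import numpy as np

# Toy data from a known quadratic: y = 2 + 3x - 0.5x^2 (hypothetical, noise-free)
x_toy = np.linspace(-3.0, 3.0, 20)
y_toy = 2.0 + 3.0 * x_toy - 0.5 * x_toy**2

# Degree-2 design matrix with columns 1, x, x^2
A_toy = np.column_stack([np.ones_like(x_toy), x_toy, x_toy**2])

# Ordinary least squares on the expanded columns
coef_toy, *_ = np.linalg.lstsq(A_toy, y_toy, rcond=None)
print(np.round(coef_toy, 6))
```

The recovered coefficients match the ones used to generate the data, even though the model is linear in its inputs: the curvature comes entirely from the added `x^2` column.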

**[4.4]** Import `dump` from `joblib` and save the fitted model into the folder `models` as a file called `linear_poly_2.joblib`

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import dump from joblib and save the fitted model into the folder models as a file called linear_poly_2.joblib

In [13]:
# Solution:
from joblib import dump

dump(reg, './models/linear_poly_2.joblib')

['./models/linear_poly_2.joblib']
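
A saved file can be read back with `joblib.load`. The sketch below shows the round trip in a temporary folder, with a plain dictionary standing in for the fitted model; the stub object and the temporary path are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

from joblib import dump, load

# A plain dict stands in for the fitted model (hypothetical object)
model_stub = {"name": "linear_poly_2", "degree": 2}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "linear_poly_2.joblib"
    saved = dump(model_stub, path)  # returns the list of file names written
    restored = load(path)

print(restored == model_stub)
```

In the lab, the equivalent call would be `load('./models/linear_poly_2.joblib')`, which should return an object that predicts like `reg`.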

**[4.5]** Save the predictions from this model for the training and validation sets into 2 variables called `y_train_preds` and `y_val_preds`


In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Save the predictions from this model for the training and validation sets into 2 variables called y_train_preds and y_val_preds

In [14]:
# Solution:
y_train_preds = reg.predict(X_train)
y_val_preds = reg.predict(X_val)

**[4.6]** Import the MSE and MAE metrics from sklearn

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import the MSE and MAE metrics from sklearn

In [15]:
# Solution:
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import mean_absolute_error as mae

**[4.7]** Display the RMSE and MAE scores of this model on the training set

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Display the RMSE and MAE scores of this model on the training set

In [16]:
# Solution:
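print(np.sqrt(mse(y_train, y_train_preds)))  # RMSE as the square root of MSE; does not rely on the squared=False argument, which newer scikit-learn releases no longer accept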
print(mae(y_train, y_train_preds))

10643.227917160193
3767.8800190662937
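
For reference, both metrics can be computed by hand from the residuals. The sketch below uses NumPy on four toy values; the function names `rmse` and `mean_abs_error` are hypothetical and chosen so that the `mse` and `mae` aliases imported earlier are not overwritten.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mean_abs_error(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Toy values: the residuals are 0, 0, 3 and -4
y_true_demo = [10.0, 20.0, 30.0, 40.0]
y_pred_demo = [10.0, 20.0, 27.0, 44.0]

print(rmse(y_true_demo, y_pred_demo))            # sqrt((9 + 16) / 4) = 2.5
print(mean_abs_error(y_true_demo, y_pred_demo))  # (3 + 4) / 4 = 1.75
```

RMSE is never smaller than MAE, and the gap widens when a few residuals are much larger than the rest. The training scores above, with an RMSE close to three times the MAE, suggest that this may be the case here.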




**[4.8]** Display the RMSE and MAE scores of this model on the validation set

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Display the RMSE and MAE scores of this model on the validation set

In [17]:
# Solution:
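print(np.sqrt(mse(y_val, y_val_preds)))  # RMSE as the square root of MSE; does not rely on the squared=False argument, which newer scikit-learn releases no longer accept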
print(mae(y_val, y_val_preds))

10262.059076368314
4139.872955423301
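
One simple way to compare the training and validation scores is the relative change from one to the other. The sketch below applies that arithmetic to the four values printed in steps 4.7 and 4.8; the helper name `relative_change` is hypothetical.

```python
def relative_change(train_score, val_score):
    """Relative change of the validation score with respect to the training score."""
    return (val_score - train_score) / train_score


# Scores copied from the outputs of steps 4.7 and 4.8
rmse_train, mae_train = 10643.227917160193, 3767.8800190662937
rmse_val, mae_val = 10262.059076368314, 4139.872955423301

print(f"RMSE: {relative_change(rmse_train, rmse_val):+.1%}")
print(f"MAE:  {relative_change(mae_train, mae_val):+.1%}")
```

On these numbers the validation RMSE is a few percent lower than the training RMSE, while the validation MAE is roughly ten percent higher.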




## 5. Push changes

**[5.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (1 command line)
# Task: Add your changes to the git staging area

In [None]:
# Solution:
git add .

**[5.2]** Create the snapshot of your repository and add a description

In [None]:
# Placeholder for student's code (1 command line)
# Task: Create the snapshot of your repository and add a description

In [None]:
# Solution:
git commit -m "linear regression with poly 2"

**[5.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (1 command line)
# Task: Push your snapshot to GitHub

In [None]:
# Solution:
git push --set-upstream origin adv_dsi_1_3

**[5.4]** Go to GitHub and merge your changes into the master/main branch

**[5.5]** Check out the master branch

In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the master branch

In [None]:
# Solution:
git checkout master