
# **Lab: ML Engineering**



## Exercise 3: Organising Git Repository

This time we will start our data science project by running a Docker container from a Docker image. We will train a Linear Regression model on the same dataset as before, but this time we will first apply a polynomial transformation to the features.


**Prerequisites:**
- Create a GitHub account (https://github.com/join)
- Install git (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- Install Docker (https://docs.docker.com/get-docker/)

The steps are:
1.   Build Docker image
2.   Prepare Data
3.   Train Linear Regression model
4.   Push changes


### 1. Build Docker image

**[1.1]** Go to the `adv_dsi_lab_1` folder you created previously

In [None]:
# Placeholder for student's code (1 command line)
# Task: Go to the adv_dsi_lab_1 folder you created previously

In [None]:
#Solution:
cd ~/Projects/adv_dsi_lab_1

**[1.2]** Run a container from the jupyter/scipy-notebook image (https://hub.docker.com/r/jupyter/scipy-notebook). `docker run` pulls the image automatically if it is not already available locally.

In [None]:
docker run -dit --rm --name adv_dsi_lab_1 -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v ${PWD}:/home/jovyan/work -v ${PWD}/../adv_dsi_lab_1/src:/home/jovyan/work/src jupyter/scipy-notebook:0ce64578df46

Syntax: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Options:

- `-dit`: combines `-d` (run the container in the background, detached), `-i` (keep STDIN open, interactive) and `-t` (allocate a pseudo-TTY)
- `--rm`: automatically remove the container when it exits
- `--name`: assign a name to the container
- `-p`: publish a container's port(s) to the host
- `-e`: set environment variables
- `-v`: bind mount a volume

Documentation: https://docs.docker.com/engine/reference/commandline/run/

**[1.3]** List all Docker images available locally

In [None]:
docker images

Documentation: https://docs.docker.com/engine/reference/commandline/images/

**[1.4]** List running containers

In [None]:
docker ps

Documentation: https://docs.docker.com/engine/reference/commandline/ps/

**[1.5]** Display the last 50 lines of the container's logs

In [None]:
docker logs --tail 50 adv_dsi_lab_1

Syntax: docker logs [OPTIONS] CONTAINER

Options:

- `--tail`: number of lines to show from the end of the logs

Documentation: https://docs.docker.com/engine/reference/commandline/logs/

Copy the URL displayed in the logs (it includes an access token) and paste it into a browser to open JupyterLab.

**[1.6]** Create a new git branch called `adv_dsi_1_3`

In [None]:
git checkout -b adv_dsi_1_3

Documentation: https://www.atlassian.com/git/tutorials/using-branches/git-checkout

**[1.7]** Navigate to the `notebooks` folder and create a new Jupyter notebook called `2_linear_poly.ipynb`

### 2. Prepare Data

**[2.1]** Import the pandas and numpy packages

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import the pandas and numpy packages

In [None]:
# Solution
import pandas as pd
import numpy as np

**[2.2]** Load the saved sets from `data/processed` using numpy

In [None]:
# Placeholder for student's code (6 lines of Python code)
# Task: Load the saved sets from data/processed using numpy

In [None]:
#Solution:
X_train = np.load('../data/processed/X_train.npy')
X_val   = np.load('../data/processed/X_val.npy'  )
X_test  = np.load('../data/processed/X_test.npy' )
y_train = np.load('../data/processed/y_train.npy')
y_val   = np.load('../data/processed/y_val.npy'  )
y_test  = np.load('../data/processed/y_test.npy' )
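The `.npy` files are assumed to have been written with `np.save` in an earlier exercise (that notebook is not shown here). A minimal sketch of the round trip, using a hypothetical random array `X_demo` and a temporary folder instead of `data/processed`, shows that `np.load` restores the same values, dtype and shape:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for one of the processed sets (random values, not the lab data)
X_demo = np.random.default_rng(0).normal(size=(4, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "X_demo.npy")
    np.save(path, X_demo)      # writes the array together with its dtype and shape
    X_loaded = np.load(path)   # reads it fully back into memory as an ndarray

print(X_loaded.shape)  # (4, 3)
```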

**[2.3]** Display the dimensions of X_train

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Display the dimensions of X_train

In [None]:
# Solution
X_train.shape

(25372, 59)

**[2.4]** Import PolynomialFeatures from sklearn.preprocessing

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import PolynomialFeatures from sklearn.preprocessing

In [None]:
# Solution:
from sklearn.preprocessing import PolynomialFeatures

**[2.5]** Instantiate a `PolynomialFeatures` transformer with degree 2

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Instantiate a PolynomialFeatures with degree 2

In [None]:
# Solution:
poly = PolynomialFeatures(degree=2)

**[2.6]** Fit the `PolynomialFeatures` transformer on X_train and transform X_train with it

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Fit the PolynomialFeatures and perform transformation on X_train

In [None]:
# Solution:
X_train = poly.fit_transform(X_train)
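To see what the transformation does, here is a minimal sketch on a hypothetical one-row array with two features (not the lab data): a degree-2 expansion keeps a constant column, the original features, their squares and their pairwise product.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two features: x0 = 2 and x1 = 3
X_small = np.array([[2, 3]])

poly_small = PolynomialFeatures(degree=2)
X_small_poly = poly_small.fit_transform(X_small)

# Column order: 1, x0, x1, x0^2, x0*x1, x1^2
print(X_small_poly)  # [[1. 2. 3. 4. 6. 9.]]
```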

**[2.7]** Display the dimensions of X_train

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Display the dimensions of X_train

In [None]:
# Solution
X_train.shape

(25372, 1830)
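The jump from 59 to 1830 columns can be checked by counting monomials. With the default `include_bias=True`, a degree-2 expansion of $n$ features produces $\binom{n+2}{2}$ columns; for $n = 59$ that is 1 constant + 59 linear terms + 59 squares + 1711 pairwise products = 1830. The sketch below, which uses a dummy all-zeros input rather than the lab data, computes the same count both ways:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features, degree = 59, 2

# All monomials of total degree <= 2, including the constant column
expected = comb(n_features + degree, degree)
print(expected)  # 1830

# Cross-check with scikit-learn on a dummy input that has the same number of columns
dummy = np.zeros((1, n_features))
sk_shape = PolynomialFeatures(degree=degree).fit_transform(dummy).shape
print(sk_shape)  # (1, 1830)
```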

**[2.8]** Transform X_val and X_test with the `PolynomialFeatures` transformer already fitted on X_train

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Perform transformation on X_val and X_test with PolynomialFeatures

In [None]:
# Solution:
X_val = poly.transform(X_val)
X_test = poly.transform(X_test)
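X_val and X_test are only transformed, not refitted, so that all three sets share exactly the column layout learned from X_train. A minimal sketch on small hypothetical random arrays (`X_tr`, `X_va`) shows this fit-once, transform-many pattern, and that the fitted transformer rejects an input with a different number of features:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X_tr = rng.normal(size=(5, 3))  # hypothetical training set: 3 features
X_va = rng.normal(size=(2, 3))  # hypothetical validation set: the same 3 features

poly_demo = PolynomialFeatures(degree=2)
X_tr_poly = poly_demo.fit_transform(X_tr)  # fit on the training data only
X_va_poly = poly_demo.transform(X_va)      # reuse the fitted transformer unchanged

print(X_tr_poly.shape, X_va_poly.shape)  # (5, 10) (2, 10)

# An input with a different number of columns is rejected
mismatch_rejected = False
try:
    poly_demo.transform(rng.normal(size=(2, 4)))
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```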

### 3. Train Linear Regression model

**[3.1]** Import the `LinearRegression` class from `sklearn.linear_model`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import the linear regression module from sklearn

In [None]:
# Solution:
from sklearn.linear_model import LinearRegression 

**[3.2]** Instantiate the LinearRegression class into a variable called reg

In [None]:
# Placeholder for student's code (1 line of code)
# Task: instantiate the LinearRegression class into a variable called reg

In [None]:
# Solution
reg = LinearRegression()

**[3.3]** Fit the model on the prepared training data

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Fit the model with the prepared data

In [None]:
# Solution
reg.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
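Although the model is still a *linear* regression (linear in its coefficients), training it on polynomial features lets it fit curved relationships. A minimal sketch on a synthetic, noise-free quadratic target (an illustrative assumption, not the lab dataset) shows the model recovering the coefficients of $y = 1 + 2x + 3x^2$:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free quadratic target: y = 1 + 2x + 3x^2
x = np.linspace(-3, 3, 20).reshape(-1, 1)
y = 1 + 2 * x.ravel() + 3 * x.ravel() ** 2

X_quad = PolynomialFeatures(degree=2).fit_transform(x)  # columns: 1, x, x^2
reg_demo = LinearRegression().fit(X_quad, y)

# The fitted model reproduces the curved relationship
print(np.allclose(reg_demo.predict(X_quad), y))  # True
print(reg_demo.coef_[1:], reg_demo.intercept_)   # coefficients on x and x^2, then the intercept
```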

**[3.4]** Import `dump` from `joblib` and save the fitted model into the folder `models` as a file called `linear_poly_2.joblib`

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import dump from joblib and save the fitted model into the folder models as a file called linear_poly_2.joblib

In [None]:
# Solution:
from joblib import dump

dump(reg, '../models/linear_poly_2.joblib')

['../models/linear_poly_2.joblib']
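The saved file can later be reloaded with `joblib.load` to make predictions without retraining. Here is a minimal sketch of the round trip, using a small hypothetical model and a temporary folder instead of `models/`. Reloading generally requires a compatible scikit-learn version.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Small hypothetical model standing in for `reg` (y = 1 + 2x, not the lab dataset)
X_fit = np.array([[0.0], [1.0], [2.0], [3.0]])
y_fit = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X_fit, y_fit)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear_demo.joblib")
    dump(model, path)      # serialise the fitted estimator to disk
    restored = load(path)  # load it back, e.g. in a later session or script

print(np.allclose(restored.predict(X_fit), model.predict(X_fit)))  # True
```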

**[3.5]** Save the predictions from this model for the training and validation sets into 2 variables called `y_train_preds` and `y_val_preds`


In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Save the predictions from this model for the training and validation sets into 2 variables called y_train_preds and y_val_preds

In [None]:
# Solution:
y_train_preds = reg.predict(X_train)
y_val_preds = reg.predict(X_val)

**[3.6]** Import the MSE and MAE metrics from sklearn

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import the MSE and MAE metrics from sklearn

In [None]:
# Solution:
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import mean_absolute_error as mae

**[3.7]** Display the RMSE and MAE scores of this model on the training set

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Display the RMSE and MAE scores of this model on the training set

In [None]:
# Solution:
print(np.sqrt(mse(y_train, y_train_preds)))  # RMSE as the square root of MSE (squared=False is deprecated in newer scikit-learn)
print(mae(y_train, y_train_preds))

10640.860619369127
3762.726320399333
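As a reminder of what these two metrics measure, here is a small worked example on hypothetical vectors that computes them by hand and with scikit-learn. RMSE squares the errors before averaging, so it penalises large errors more heavily than MAE does and is always at least as large. The gap between the training RMSE and MAE above therefore suggests that some predictions are off by much more than the typical error.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Small illustrative vectors (not the lab data)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred
rmse_manual = np.sqrt(np.mean(errors ** 2))  # sqrt((0.25 + 0.25 + 0 + 1) / 4)
mae_manual = np.mean(np.abs(errors))         # (0.5 + 0.5 + 0 + 1) / 4

# np.sqrt(mean_squared_error(...)) gives RMSE on any scikit-learn version
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(round(rmse_manual, 4), mae_manual)  # 0.6124 0.5
```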


**[3.8]** Display the RMSE and MAE scores of this model on the validation set

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Display the RMSE and MAE scores of this model on the validation set

In [None]:
# Solution:
print(np.sqrt(mse(y_val, y_val_preds)))  # RMSE as the square root of MSE (squared=False is deprecated in newer scikit-learn)
print(mae(y_val, y_val_preds))

10263.2950112644
4134.950863033068


### 4. Push changes

**[4.1]** Add your changes to the Git staging area

In [None]:
# Placeholder for student's code (1 command line)
# Task: Add your changes to the Git staging area

In [None]:
# Solution:
git add .

**[4.2]** Commit a snapshot of your repository with a descriptive message

In [None]:
# Placeholder for student's code (1 command line)
# Task: Commit a snapshot of your repository with a descriptive message

In [None]:
# Solution:
git commit -m "linear regression with poly 2"

**[4.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (1 command line)
# Task: Push your snapshot to GitHub

In [None]:
# Solution:
git push --set-upstream origin adv_dsi_1_3

**[4.4]** Check out the master branch

In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the master branch

In [None]:
# Solution:
git checkout master

**[4.5]** Pull the latest updates

In [None]:
# Placeholder for student's code (1 command line)
# Task: Pull the latest updates

In [None]:
# Solution:
git pull

**[4.6]** Check out the `adv_dsi_1_3` branch

In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the adv_dsi_1_3 branch

In [None]:
# Solution:
git checkout adv_dsi_1_3

**[4.7]** Merge the `master` branch into `adv_dsi_1_3` and push your changes

In [None]:
# Placeholder for student's code (2 command lines)
# Task: Merge the master branch and push your changes

In [None]:
# Solution:
git merge master
git push

Documentation: https://www.atlassian.com/git/tutorials/using-branches/git-merge

**[4.8]** Go to GitHub, open a pull request from `adv_dsi_1_3` into `master`, review the code, resolve any conflicts and merge the branch

**[4.9]** Stop the Docker container

In [None]:
docker stop adv_dsi_lab_1

Documentation: https://docs.docker.com/engine/reference/commandline/stop/