# **Lab: Time-Series**



## Exercise 2: Random Forest

We will train a RandomForest model on the same dataset as in the previous exercise.


**Pre-requisites:**
- Create a GitHub account (https://github.com/join)
- Install git (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- Install Docker (https://docs.docker.com/get-docker/)

The steps are:
1.   Launch Docker image
2.   Load Data
3.   Train RandomForest model
4.   Push changes


### 1. Launch Docker image

**[1.1]** Go to the folder you created previously `adv_dsi_lab_2`

In [None]:
# Placeholder for student's code (1 command line)
# Task: Go to the folder you created previously adv_dsi_lab_2

In [None]:
# Solution:
cd ~/Projects/adv_dsi/adv_dsi_lab_2

**[1.2]** Run the built Docker image

In [None]:
docker run  -dit --rm --name adv_dsi_lab_2 -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v ~/Projects/adv_dsi/adv_dsi_lab_2:/home/jovyan/work -v ~/Projects/adv_dsi/src:/home/jovyan/work/src fbprophet-notebook:latest

Syntax: docker run [OPTIONS] IMAGE

Options:

`-dit: Run the container detached (in the background) with an interactive terminal attached`

`--rm: Automatically remove the container when it exits`

`--name: Assign a name to the container`

`-p: Publish a container's port(s) to the host`

`-e: Set environment variables`

`-v: Bind mount a volume`

Documentation: https://docs.docker.com/engine/reference/commandline/run/
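To make the role of each option easier to see, the command from [1.2] can be rebuilt piece by piece. The following is only an illustrative sketch: it assembles the argument list in Python and prints it, without starting a container. The variable names (`project_dir`, `src_dir`, `docker_cmd`) are chosen for this example and are not part of the lab.

```python
import os
import shlex

# Expand "~" explicitly: a quoted "~" is not expanded by the shell.
project_dir = os.path.expanduser("~/Projects/adv_dsi/adv_dsi_lab_2")
src_dir = os.path.expanduser("~/Projects/adv_dsi/src")

docker_cmd = [
    "docker", "run",
    "-dit",                                      # detached, interactive, with a terminal
    "--rm",                                      # remove the container when it exits
    "--name", "adv_dsi_lab_2",                   # container name used by later commands
    "-p", "8888:8888",                           # host port:container port
    "-e", "JUPYTER_ENABLE_LAB=yes",              # environment variable
    "-v", f"{project_dir}:/home/jovyan/work",    # mount the lab folder
    "-v", f"{src_dir}:/home/jovyan/work/src",    # mount the shared src folder
    "fbprophet-notebook:latest",                 # image to run
]

# Print the equivalent shell command; nothing is executed here.
print(shlex.join(docker_cmd))
```

The list form could be handed to `subprocess.run(docker_cmd, check=True)` if you wanted to launch the container from Python instead of the terminal.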

**[1.3]** Display last 50 lines of logs

In [None]:
docker logs --tail 50 adv_dsi_lab_2

Syntax: docker logs [OPTIONS] CONTAINER

Options:

`--tail: Number of lines to show from the end of the logs`

Documentation: https://docs.docker.com/engine/reference/commandline/logs/

Copy the URL displayed in the logs and paste it into a browser to launch JupyterLab

**[1.4]** Create a new git branch called `rf_default`

In [None]:
git checkout -b rf_default

Documentation: https://www.atlassian.com/git/tutorials/using-branches/git-checkout

**[1.5]** Navigate to the folder `notebooks` and create a new Jupyter notebook called `2_rf_default.ipynb`

### 2. Load Data

**[2.1]** Import the function you created `load_sets` from `src/data/sets`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import the function you created load_sets from src/data/sets

In [None]:
# Solution
from src.data.sets import load_sets
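If this import raises `ModuleNotFoundError`, a likely cause is that the project root (the folder that contains `src`) is not on Python's import path. The sketch below shows one possible workaround. It assumes the notebook runs from the `notebooks` folder and that `src` sits one level above it. The helper name `add_project_root` is hypothetical.

```python
import sys
from pathlib import Path


def add_project_root(notebook_dir=None):
    """Put the parent of the notebook folder on sys.path (hypothetical helper)."""
    notebook_dir = Path(notebook_dir) if notebook_dir else Path.cwd()
    root = str(notebook_dir.resolve().parent)
    # Insert only once so repeated cell runs do not grow sys.path.
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


project_root = add_project_root()
print(project_root)
```

After this cell has run, `from src.data.sets import load_sets` should resolve, provided the folder layout matches the assumption above.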

**[2.2]** Load the saved sets from `data/processed`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Load the saved sets from data/processed

In [None]:
# Solution:
X_train, y_train, X_val, y_val, X_test, y_test = load_sets(path='../data/processed/')
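As a reminder of what a loader of this kind does, here is a minimal sketch. It is not the `load_sets` you wrote in the previous exercise: the name `load_sets_sketch`, the `.npy` file names and the "return `None` when a file is missing" behaviour are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def load_sets_sketch(path="../data/processed/"):
    """Hypothetical stand-in for load_sets: load six .npy files from `path`.

    Any file that is missing is returned as None.
    """
    names = ["X_train", "y_train", "X_val", "y_val", "X_test", "y_test"]
    sets = []
    for name in names:
        file_path = os.path.join(path, f"{name}.npy")
        sets.append(np.load(file_path) if os.path.isfile(file_path) else None)
    return tuple(sets)


# Demo on throwaway data written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    np.save(os.path.join(tmp_dir, "X_train.npy"), np.arange(6).reshape(3, 2))
    np.save(os.path.join(tmp_dir, "y_train.npy"), np.array([1.0, 2.0, 3.0]))
    X_tr, y_tr, X_v, y_v, X_te, y_te = load_sets_sketch(path=tmp_dir)

print(X_tr.shape, y_tr.shape, X_v)
```

Returning the six sets in a fixed order is what allows the one-line tuple unpacking used in the solution above.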

### 3. Train RandomForest model

**[3.1]** Import the `RandomForestRegressor` class from `sklearn.ensemble`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import the RandomForestRegressor class from sklearn.ensemble

In [None]:
# Solution:
from sklearn.ensemble import RandomForestRegressor

**[3.2]** Instantiate the `RandomForestRegressor` class into a variable called `rf` with `random_state=8`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Instantiate the RandomForestRegressor class into a variable called rf with random_state=8

In [None]:
# Solution
rf = RandomForestRegressor(random_state=8)
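Fixing `random_state` makes the training run repeatable: the bootstrap samples and feature choices are drawn from a seeded generator. The sketch below illustrates this on synthetic data rather than the lab dataset. The variables `X_demo`, `y_demo`, `rf_a` and `rf_b` exist only for this example, and `n_estimators=10` is used only to keep it fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, for illustration only (not the lab dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)

# Two forests with the same seed, trained on the same data.
rf_a = RandomForestRegressor(n_estimators=10, random_state=8).fit(X_demo, y_demo)
rf_b = RandomForestRegressor(n_estimators=10, random_state=8).fit(X_demo, y_demo)

# With identical seeds and data, the predictions should match exactly.
same_predictions = np.array_equal(rf_a.predict(X_demo), rf_b.predict(X_demo))
print(same_predictions)
```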

**[3.3]** Fit the model with the prepared data

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Fit the model with the prepared data

In [None]:
# Solution
rf.fit(X_train, y_train)

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=8, verbose=0, warm_start=False)

**[3.4]** Import `dump` from `joblib` and save the fitted model into the folder `models` as a file called `rf_default.joblib`

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import dump from joblib and save the fitted model into the folder models as a file called rf_default.joblib

In [None]:
# Solution:
from joblib import dump 

dump(rf,  '../models/rf_default.joblib')

['../models/rf_default.joblib']
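A saved model can later be read back with `joblib.load`. The round trip is sketched below with a plain dictionary standing in for the fitted model and a temporary folder standing in for `models`, so the example runs without the lab data. `demo_model` and `model_path` are names made up for this illustration.

```python
import os
import tempfile

from joblib import dump, load

# Stand-in for the fitted model; any picklable Python object works here.
demo_model = {"name": "rf_default", "random_state": 8}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "rf_default.joblib")
    saved_files = dump(demo_model, model_path)  # returns the list of files written
    restored = load(model_path)                 # read the object back from disk

print(restored == demo_model)
```

In the lab itself, the same `load` call pointed at `'../models/rf_default.joblib'` should return the fitted `rf` object.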

**[3.5]** Save the predictions from this model for the training and validation sets into 2 variables called `y_train_preds` and `y_val_preds`


In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Save the predictions from this model for the training and validation sets into 2 variables called y_train_preds and y_val_preds

In [None]:
# Solution:
y_train_preds = rf.predict(X_train)
y_val_preds = rf.predict(X_val)

**[3.6]** Import `print_reg_perf` from `src/models/performance` and display the RMSE and MAE scores of this model on the training and validation sets

In [None]:
# Placeholder for student's code (3 lines of Python code)
# Task: Import print_reg_perf from src/models/performance and display the RMSE and MAE scores of this model on the training and validation sets

In [None]:
# Solution
from src.models.performance import print_reg_perf

print_reg_perf(y_preds=y_train_preds, y_actuals=y_train, set_name='Training')
print_reg_perf(y_preds=y_val_preds, y_actuals=y_val, set_name='Validation')

RMSE Training: 49.094319563872496
MAE Training: 28.843348519362184
RMSE Validation: 1047.282041263682
MAE Validation: 846.8197260273972
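For reference, the two metrics can be computed by hand. This is a plain-Python sketch, not the `print_reg_perf` from `src/models/performance`: the helpers `rmse`, `mae` and `print_reg_perf_sketch` are hypothetical and only mimic the output format shown above.

```python
import math


def rmse(y_actuals, y_preds):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(y_actuals, y_preds)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_actuals, y_preds):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(y_actuals, y_preds)]
    return sum(absolute) / len(absolute)


def print_reg_perf_sketch(y_preds, y_actuals, set_name=None):
    """Print both scores for one data set (hypothetical helper)."""
    print(f"RMSE {set_name}: {rmse(y_actuals, y_preds)}")
    print(f"MAE {set_name}: {mae(y_actuals, y_preds)}")


# Toy values, chosen only to exercise the functions.
print_reg_perf_sketch(y_preds=[2.0, 4.0, 7.0], y_actuals=[1.0, 4.0, 5.0], set_name="Demo")
```

In the scores reported above, the validation RMSE is roughly twenty times the training RMSE, which usually suggests the model is overfitting the training set.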


### 4. Push changes

**[4.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (1 command line)
# Task: Add your changes to the git staging area

In [None]:
# Solution:
git add .

**[4.2]** Create the snapshot of your repository and add a description

In [None]:
# Placeholder for student's code (1 command line)
# Task: Create the snapshot of your repository and add a description

In [None]:
# Solution:
git commit -m "randomforest default"

**[4.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (1 command line)
# Task: Push your snapshot to GitHub

In [None]:
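# Solution:
# The first push of a new branch needs an upstream (this assumes the remote is named origin)
git push --set-upstream origin rf_default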

**[4.4]** Check out the master branch

In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the master branch

In [None]:
# Solution:
git checkout master

**[4.5]** Pull the latest updates

In [None]:
# Placeholder for student's code (1 command line)
# Task: Pull the latest updates

In [None]:
# Solution:
git pull

**[4.6]** Check out the `rf_default` branch

In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the rf_default branch

In [None]:
# Solution:
git checkout rf_default

**[4.7]** Merge the `master` branch and push your changes



In [None]:
# Placeholder for student's code (2 command lines)
# Task: Merge the master branch and push your changes

In [None]:
# Solution:
git merge master
git push

Documentation: https://www.atlassian.com/git/tutorials/using-branches/git-merge

**[4.8]** Go to GitHub and merge the branch after reviewing the code and resolving any conflicts




**[4.9]** Stop the Docker container

In [None]:
docker stop adv_dsi_lab_2

Documentation: https://docs.docker.com/engine/reference/commandline/stop/