# **Lab: Time-Series**



## Exercise 2: Random Forest

We will train a RandomForest model on the same dataset as in the previous exercise.


**Pre-requisites:**
- Create a GitHub account (https://github.com/join)
- Install git (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)


The steps are:
1.   Set up the branch and notebook
2.   Load Data
3.   Train RandomForest model
4.   Push changes


**[1.1]** Go to the folder you created previously `adv_dsi_lab_2`

In [None]:
# Placeholder for student's code (1 command line)
# Task: Go to the folder you created previously adv_dsi_lab_2

In [None]:
#Solution:
cd ~/Projects/adv_dsi/adv_dsi_lab_2

**[1.2]** Create a new git branch called `rf_default`

In [None]:
git checkout -b rf_default

Documentation: https://www.atlassian.com/git/tutorials/using-branches/git-checkout

**[1.3]** Navigate to the folder `notebooks` and create a new Jupyter notebook called `2_rf_default.ipynb`

### 2. Load Data

**[2.1]** Import the function you created `load_sets`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import the function you created load_sets from src/data/sets

In [1]:
# Solution
# from src.data.sets import load_sets
def load_sets(path='../data/processed/', val=False):
    """Load the different locally saved sets

    Parameters
    ----------
    path : str
        Path to the folder where the sets are saved (default: '../data/processed/')

    Returns
    -------
    Numpy Array
        Features for the training set
    Numpy Array
        Target for the training set
    Numpy Array
        Features for the validation set
    Numpy Array
        Target for the validation set
    Numpy Array
        Features for the testing set
    Numpy Array
        Target for the testing set
    """
    import numpy as np
    import os.path

    X_train = np.load(f'{path}X_train.npy', allow_pickle=True) if os.path.isfile(f'{path}X_train.npy') else None
    X_val   = np.load(f'{path}X_val.npy', allow_pickle=True  ) if os.path.isfile(f'{path}X_val.npy')   else None
    X_test  = np.load(f'{path}X_test.npy' , allow_pickle=True) if os.path.isfile(f'{path}X_test.npy')  else None
    y_train = np.load(f'{path}y_train.npy', allow_pickle=True) if os.path.isfile(f'{path}y_train.npy') else None
    y_val   = np.load(f'{path}y_val.npy' , allow_pickle=True ) if os.path.isfile(f'{path}y_val.npy')   else None
    y_test  = np.load(f'{path}y_test.npy', allow_pickle=True ) if os.path.isfile(f'{path}y_test.npy')  else None

    return X_train, y_train, X_val, y_val, X_test, y_test


**[2.2]** Load the saved sets from `data/processed`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Load the saved sets from data/processed

In [4]:
#Solution:
X_train, y_train, X_val, y_val, X_test, y_test = load_sets(path='../data/processed/')
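Because `load_sets` returns `None` for any file it cannot find, it can be worth checking which sets actually loaded before training. The snippet below is a minimal sketch of that check, not part of the lab solution: the helper `load_if_present`, the temporary folder, and the toy arrays are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def load_if_present(folder, name):
    """Return the array saved as <folder>/<name>.npy, or None if the file is missing."""
    file_path = os.path.join(folder, f"{name}.npy")
    return np.load(file_path, allow_pickle=True) if os.path.isfile(file_path) else None


with tempfile.TemporaryDirectory() as tmp_dir:
    # Save only the training sets (toy values) so the validation sets are missing
    np.save(os.path.join(tmp_dir, "X_train.npy"), np.arange(6).reshape(3, 2))
    np.save(os.path.join(tmp_dir, "y_train.npy"), np.array([1.0, 2.0, 3.0]))

    set_names = ["X_train", "y_train", "X_val", "y_val"]
    loaded = {name: load_if_present(tmp_dir, name) for name in set_names}

# Report which sets are missing and the shape of those that loaded
missing = [name for name, arr in loaded.items() if arr is None]
shapes = {name: arr.shape for name, arr in loaded.items() if arr is not None}
print("Missing sets:", missing)
print("Shapes:", shapes)
```

The same kind of check on the real `X_train`, `y_train`, `X_val` and `y_val` would catch a wrong `path` early, before `fit` fails with a less obvious error.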

### 3. Train Random Forest model

**[3.1]** Import the RandomForest module from sklearn

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import the RandomForest module from sklearn

In [6]:
# Solution:
from sklearn.ensemble import RandomForestRegressor

**[3.2]** Instantiate the RandomForest class into a variable called `rf` with `random_state=8`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: instantiate the RandomForest class into a variable called rf

In [7]:
# Solution
rf = RandomForestRegressor(random_state=8)

**[3.3]** Fit the model with the prepared data

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Fit the model with the prepared data

In [8]:
# Solution
rf.fit(X_train, y_train)
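Fixing `random_state` makes the bootstrap sampling and feature selection repeatable, so two runs give the same forest. The following is a small sketch of that behaviour on synthetic data; the arrays `X_demo` and `y_demo` and the choice of `n_estimators=20` are assumptions for illustration and have nothing to do with the lab dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, made up for this illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Two forests trained with the same seed on the same data
rf_a = RandomForestRegressor(n_estimators=20, random_state=8).fit(X_demo, y_demo)
rf_b = RandomForestRegressor(n_estimators=20, random_state=8).fit(X_demo, y_demo)

# Their predictions should agree, since the randomness is seeded identically
same_seed_match = np.allclose(rf_a.predict(X_demo), rf_b.predict(X_demo))
print("Same seed, matching predictions:", same_seed_match)
```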

**[3.4]** Import `dump` from `joblib` and save the fitted model into the folder `models` as a file called `rf_default`

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import dump from joblib and save the fitted model into the folder models as a file called rf_default

In [9]:
# Solution:
from joblib import dump

dump(rf,  '../models/rf_default.joblib')

['../models/rf_default.joblib']
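A saved model can be read back with `load` from `joblib`. The sketch below shows a save and reload round trip in a temporary folder; the toy data, the variable names, and the file name `toy_model.joblib` are assumptions for illustration, not part of the lab.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.ensemble import RandomForestRegressor

# Toy data, made up for this illustration only
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0.0, 1.0, 2.0, 3.0]
toy_model = RandomForestRegressor(n_estimators=5, random_state=8).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "toy_model.joblib")
    saved_files = dump(toy_model, file_path)  # returns the list of files written
    restored_model = load(file_path)          # read the model back from disk

# The restored model should reproduce the original predictions
same_predictions = list(toy_model.predict(X_toy)) == list(restored_model.predict(X_toy))
print("Files written:", len(saved_files))
print("Restored model predicts the same values:", same_predictions)
```

In the lab, the equivalent call would presumably be `load('../models/rf_default.joblib')`, run from the `notebooks` folder.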

**[3.5]** Save the predictions from this model for the training and validation sets into 2 variables called `y_train_preds` and `y_val_preds`


In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Save the predictions from this model for the training and validation sets into 2 variables called y_train_preds and y_val_preds

In [10]:
# Solution:
y_train_preds = rf.predict(X_train)
y_val_preds = rf.predict(X_val)
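For regression, a random forest prediction is the average of the predictions of its individual trees. The sketch below checks this on synthetic data using the fitted trees exposed in `estimators_`; the data and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, made up for this illustration only
rng = np.random.default_rng(1)
X_demo = rng.uniform(size=(100, 3))
y_demo = X_demo.sum(axis=1)

forest = RandomForestRegressor(n_estimators=10, random_state=8).fit(X_demo, y_demo)

# Stack the predictions of each tree: shape (n_trees, n_samples)
per_tree = np.stack([tree.predict(X_demo) for tree in forest.estimators_])

# Averaging over the trees should reproduce the forest's own prediction
manual_average = per_tree.mean(axis=0)
matches = np.allclose(manual_average, forest.predict(X_demo))
print("Mean of tree predictions equals forest prediction:", matches)
```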

**[3.6]** Import `print_reg_perf` from `src/models/performance` and display the RMSE and MAE scores of this baseline model on the training and validation sets

In [None]:
# Placeholder for student's code (3 lines of Python code)
# Task: Import print_reg_perf from src/models/performance and display the RMSE and MAE scores of this baseline model on the training and validation sets

In [12]:
# Solution
# from src.models.performance import print_reg_perf
def print_reg_perf(y_preds, y_actuals, set_name=None):
    """Print the RMSE and MAE for the provided data

    Parameters
    ----------
    y_preds : Numpy Array
        Predicted target
    y_actuals : Numpy Array
        Actual target
    set_name : str
        Name of the set to be printed

    Returns
    -------
    None
    """
    from sklearn.metrics import mean_squared_error as mse
    from sklearn.metrics import mean_absolute_error as mae

    # Take the square root explicitly: the squared=False argument is deprecated in recent scikit-learn releases
    print(f"RMSE {set_name}: {mse(y_actuals, y_preds) ** 0.5}")
    print(f"MAE {set_name}: {mae(y_actuals, y_preds)}")

print_reg_perf(y_preds=y_train_preds, y_actuals=y_train, set_name='Training')
print_reg_perf(y_preds=y_val_preds, y_actuals=y_val, set_name='Validation')

RMSE Training: 48.12836273965697
MAE Training: 28.957653758542158
RMSE Validation: 1021.0753040836846
MAE Validation: 819.0061643835616
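To make the two scores concrete, here is a minimal sketch that computes RMSE and MAE by hand with the standard library. The functions `rmse` and `mae` and the four toy values are assumptions for illustration; they are not the lab's `print_reg_perf`.

```python
import math


def rmse(y_actuals, y_preds):
    """Root mean squared error: square root of the average squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(y_actuals, y_preds)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(y_actuals, y_preds):
    """Mean absolute error: average of the absolute differences."""
    absolute_errors = [abs(a - p) for a, p in zip(y_actuals, y_preds)]
    return sum(absolute_errors) / len(absolute_errors)


# Toy values, made up for this illustration only
actuals = [10.0, 20.0, 30.0, 40.0]
preds = [12.0, 18.0, 30.0, 50.0]

rmse_value = rmse(actuals, preds)
mae_value = mae(actuals, preds)
print(f"RMSE: {rmse_value}")
print(f"MAE: {mae_value}")
```

Because the errors are squared before averaging, the single large miss (40 predicted as 50) weighs more heavily in RMSE than in MAE, which is why RMSE is never smaller than MAE on the same predictions.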




### 4. Push changes

**[4.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (1 command line)
# Task: Add your changes to the git staging area

In [None]:
# Solution:
git add .

**[4.2]** Create the snapshot of your repository and add a description

In [None]:
# Placeholder for student's code (1 command line)
# Task: Create the snapshot of your repository and add a description

In [None]:
# Solution:
git commit -m "randomforest default"

**[4.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (1 command line)
# Task: Push your snapshot to GitHub

In [None]:
# Solution:
git push --set-upstream origin rf_default

**[4.4]** Check out the master branch

In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the master branch

In [None]:
# Solution:
git checkout master

**[4.5]** Pull the latest updates

In [None]:
# Placeholder for student's code (1 command line)
# Task: Pull the latest updates

In [None]:
# Solution:
git pull

**[4.6]** Check out the `rf_default` branch


In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the rf_default branch

In [None]:
# Solution:
git checkout rf_default

**[4.7]** Merge the `master` branch and push your changes



In [None]:
# Placeholder for student's code (2 command lines)
# Task: Merge the master branch and push your changes

In [None]:
# Solution:
git merge master  # or main, if that is the name of your default branch
git push --set-upstream origin rf_default


Documentation: https://www.atlassian.com/git/tutorials/using-branches/git-merge

**[4.8]** Go to GitHub and merge the branch after reviewing the code and fixing any conflicts


