# **Lab: Neural Networks**

## Exercise 3: Multi-Class Classification with PyTorch

In this exercise, we will build a neural network with PyTorch to predict car evaluations. We will be working on the Car dataset:
https://raw.githubusercontent.com/aso-uts/applied_ds/master/unit3/dataset/Car%20Evaluation.csv


The steps are:
1.   Setup Repository
2.   Load and Explore Dataset
3.   Prepare Data
4.   Baseline Model
5.   Define Architecture
6.   Train Model
7.   Push Changes

### 1. Setup Repository

**[1.1]** Go inside the created folder `adv_dsi_lab_5`

**[1.2]** Create a new git branch called `pytorch_multi_class`

In [None]:
# Go inside the created folder adv_dsi_lab_5
cd adv_dsi_lab_5

# Create a new git branch called pytorch_multi_class
git checkout -b pytorch_multi_class

**[1.3]** Run the built image

**[1.4]** Display the last 50 lines of logs

In [None]:
# Run the built image (a trailing backslash continues the command on the next line)
docker run -dit --rm --name adv_dsi_lab_5 -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes \
  -v ~/Projects/adv_dsi/adv_dsi_lab_5:/home/jovyan/work \
  -v ~/Projects/adv_dsi/src:/home/jovyan/work/src \
  -v ~/Projects/adv_dsi/data:/home/jovyan/work/data \
  pytorch-notebook:latest

# Display the last 50 lines of logs
docker logs --tail 50 adv_dsi_lab_5

### 2.   Load and Explore Dataset

**[2.1]** Launch the magic commands for auto-reloading external modules


In [1]:
# Launch the magic commands for auto-reloading external modules
%load_ext autoreload
%autoreload 2

**[2.2]** Import the pandas and numpy packages

**[2.3]** Create a variable called `file_url` containing the URL to the raw dataset

**[2.4]** Load the data in a dataframe called `df`

**[2.5]** Display the first 5 rows of df

In [2]:
# import the pandas and numpy packages
import pandas as pd
import numpy as np

# Create a variable called file_url containing the URL to the raw dataset
file_url = 'https://raw.githubusercontent.com/aso-uts/applied_ds/master/unit3/dataset/Car%20Evaluation.csv'

# Load the data in a dataframe called df
df = pd.read_csv(file_url)

# Display the first 5 rows of df
df.head()

Unnamed: 0,buying_price,maintenance_cost,doors,persons_capacity,luggage_boot,safety,evaluation
0,vhigh,vhigh,2,2,small,low,unacc
1,vhigh,vhigh,2,2,small,med,unacc
2,vhigh,vhigh,2,2,small,high,unacc
3,vhigh,vhigh,2,2,med,low,unacc
4,vhigh,vhigh,2,2,med,med,unacc


**[2.6]** Display the dimensions (shape) of df

In [3]:
# Display the dimensions (shape) of df
df.shape

(1728, 7)

**[2.7]** Display the summary (info) of df

In [4]:
# Display the summary (info) of df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1728 entries, 0 to 1727
Data columns (total 7 columns):
buying_price        1728 non-null object
maintenance_cost    1728 non-null object
doors               1728 non-null object
persons_capacity    1728 non-null object
luggage_boot        1728 non-null object
safety              1728 non-null object
evaluation          1728 non-null object
dtypes: object(7)
memory usage: 94.6+ KB


**[2.8]** Display the descriptive statistics of df

In [6]:
# Display the descriptive statistics of df
df.describe()

Unnamed: 0,buying_price,maintenance_cost,doors,persons_capacity,luggage_boot,safety,evaluation
count,1728,1728,1728,1728,1728,1728,1728
unique,4,4,4,3,3,3,4
top,med,med,2,2,med,med,unacc
freq,432,432,432,576,576,576,1210


**[2.9]** Save the dataframe locally in the `data/raw` folder

In [7]:
# Save the dataframe locally in the data/raw folder
df.to_csv('../data/raw/car_evaluation.csv', index=False)

### 3. Prepare Data

**[3.1]** Create a copy of `df` and save it into a variable called `df_cleaned`

**[3.2]** Create a dictionary called `cats_dict` that contains the categorical variables as keys and their respective values sorted in ascending order

In [8]:
# Create a copy of df and save it into a variable called df_cleaned
df_cleaned = df.copy()

# Create a dictionary called cats_dict that contains the categorical variables as keys and their respective values sorted in ascending order
cats_dict = {
    'buying_price': [['low', 'med', 'high', 'vhigh']],
    'maintenance_cost': [['low', 'med', 'high', 'vhigh']],
    'doors': [['2', '3', '4', '5more']],
    'persons_capacity': [['2', '4', 'more']],
    'luggage_boot': [['small', 'med', 'big']],
    'safety': [['low', 'med', 'high']],
    'evaluation': [['unacc', 'acc', 'good', 'vgood']],
}


**[3.3]** Import `StandardScaler` and `OrdinalEncoder` from `sklearn.preprocessing`

**[3.4]** Iterate through the elements of `cats_dict`, instantiate an `OrdinalEncoder()` for each column and transform its values with this encoder

**[3.5]** Create a list called `num_cols` that contains all numeric columns



In [9]:
# Import StandardScaler and OrdinalEncoder from sklearn.preprocessing
from sklearn.preprocessing import StandardScaler, OrdinalEncoder

# Iterate through the elements of cats_dict, instantiate an OrdinalEncoder() for each column and transform its values with this encoder
for col, cats in cats_dict.items():
    col_encoder = OrdinalEncoder(categories=cats)
    df_cleaned[col] = col_encoder.fit_transform(df_cleaned[[col]])
    
# Create a list called num_cols that contains all numeric columns
num_cols = ['buying_price', 'maintenance_cost', 'doors', 'persons_capacity', 'luggage_boot', 'safety']
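
Passing an explicit `categories` list is what makes the encoding follow the intended order rather than alphabetical order. A minimal sketch on a toy column (the `toy` dataframe is made up for illustration and is not part of the lab data):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy column using the same levels as `safety` in the Car dataset
toy = pd.DataFrame({'safety': ['low', 'high', 'med', 'low']})

# The position of each level in the list becomes its code: low=0, med=1, high=2
encoder = OrdinalEncoder(categories=[['low', 'med', 'high']])
encoded = encoder.fit_transform(toy[['safety']])

print(encoded.ravel())  # [0. 2. 1. 0.]
```

Without the explicit list, the levels would be sorted alphabetically (`high`, `low`, `med`), and the codes would no longer reflect the low-to-high ranking.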

**[3.6]** Instantiate a `StandardScaler` and call it `sc`

**[3.7]** Fit and transform the numeric features of `df_cleaned` and write the scaled values back into it

**[3.8]** Convert the column `evaluation` to integer

In [10]:
# Instantiate a StandardScaler and call it sc
sc = StandardScaler()

# Fit and transform the numeric features of df_cleaned and write the scaled values back into it
df_cleaned[num_cols] = sc.fit_transform(df_cleaned[num_cols])

# Convert the column evaluation to integer
df_cleaned['evaluation'] = df_cleaned['evaluation'].astype(int)
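
To make the scaling step concrete, here is a small sketch showing that `StandardScaler` subtracts the column mean and divides by the population standard deviation. The `codes` array is a made-up stand-in for one ordinal-encoded column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up ordinal codes for a single column with 4 levels
codes = np.array([[0.0], [1.0], [2.0], [3.0]])

# Scale with scikit-learn
scaled = StandardScaler().fit_transform(codes)

# Same computation by hand: (x - mean) / std, using the population std (ddof=0)
manual = (codes - codes.mean(axis=0)) / codes.std(axis=0)

print(np.allclose(scaled, manual))  # True
```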

**[3.9]** Import `split_sets_random` and `save_sets` from `src.data.sets`

**[3.10]** Split the data into training, validation and testing sets with a test ratio of 0.2

**[3.11]** Create the folder `../data/processed/car_evaluation/`

In [11]:
# Import split_sets_random and save_sets from src.data.sets
from src.data.sets import split_sets_random, save_sets

# Split the data into training, validation and testing sets with a test ratio of 0.2
X_train, y_train, X_val, y_val, X_test, y_test = split_sets_random(df_cleaned, target_col='evaluation', test_ratio=0.2)

# Create the folder ../data/processed/car_evaluation (-p avoids an error if it already exists)
!mkdir -p ../data/processed/car_evaluation
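
`split_sets_random()` lives in `src/data/sets.py` and its body is not shown in this notebook. The sketch below is one plausible way to write such a helper with `train_test_split`. The function name `split_sets_random_sketch`, the `random_state` value and the way the validation ratio is derived are all assumptions, not the lab's actual code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_sets_random_sketch(df, target_col, test_ratio=0.2, random_state=8):
    """Hypothetical stand-in for src.data.sets.split_sets_random."""
    features = df.drop(columns=[target_col])
    target = df[target_col]

    # First carve out the testing set
    X_rest, X_test, y_rest, y_test = train_test_split(
        features, target, test_size=test_ratio, random_state=random_state
    )

    # Then take a validation set of the same size from what is left
    val_ratio = test_ratio / (1 - test_ratio)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=val_ratio, random_state=random_state
    )
    return X_train, y_train, X_val, y_val, X_test, y_test


# Toy dataframe with 100 rows and a 4-class target (made up for illustration)
toy = pd.DataFrame({'x': np.arange(100), 'evaluation': np.arange(100) % 4})
X_tr, y_tr, X_va, y_va, X_te, y_te = split_sets_random_sketch(toy, 'evaluation')
print(len(X_tr), len(X_va), len(X_te))
```

Under these assumptions a test ratio of 0.2 yields roughly a 60/20/20 split between training, validation and testing sets.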

**[3.12]** Save the sets in the `data/processed/car_evaluation` folder

**[3.13]** Import `PytorchDataset` from `src.models.pytorch` and convert all sets to `PytorchDataset`

In [12]:
# Save the sets in the data/processed/car_evaluation folder
save_sets(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, 
          path='../data/processed/car_evaluation/')

# Import PytorchDataset from src.models.pytorch and convert all sets to PytorchDataset
from src.models.pytorch import PytorchDataset

train_dataset = PytorchDataset(X=X_train, y=y_train)
val_dataset = PytorchDataset(X=X_val, y=y_val)
test_dataset = PytorchDataset(X=X_test, y=y_test)
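
`PytorchDataset` is defined in `src/models/pytorch.py` and is not reproduced here. As a rough sketch of what such a wrapper could look like, the class below subclasses `torch.utils.data.Dataset`. The class name `PytorchDatasetSketch` and the attribute names are hypothetical:

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class PytorchDatasetSketch(Dataset):
    """Hypothetical stand-in for src.models.pytorch.PytorchDataset."""

    def __init__(self, X, y):
        # Features as float32 tensors; targets as int64 class indices,
        # which is the target format nn.CrossEntropyLoss expects
        self.X_tensor = torch.as_tensor(np.asarray(X), dtype=torch.float32)
        self.y_tensor = torch.as_tensor(np.asarray(y), dtype=torch.long)

    def __len__(self):
        return len(self.X_tensor)

    def __getitem__(self, index):
        return self.X_tensor[index], self.y_tensor[index]


# Toy data: 3 observations with 6 features each (made up for illustration)
toy_dataset = PytorchDatasetSketch(X=np.zeros((3, 6)), y=np.array([0, 2, 3]))
features, target = toy_dataset[1]
print(len(toy_dataset), features.shape, target.item())
```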

### 4. Baseline Model

**[4.1]** Import `NullModel` from `src.models.null`

**[4.2]** Instantiate a `NullModel` and call `.fit_predict()` on the training target to extract your predictions into a variable called `y_base`

**[4.3]** Import `print_class_perf` from `src.models.performance`

**[4.4]** Print the classification metrics for this baseline model



In [13]:
# Import NullModel from src.models.null
from src.models.null import NullModel

# Instantiate a NullModel and call .fit_predict() on the training target to extract 
# your predictions into a variable called y_base
baseline_model = NullModel(target_type='classification')
y_base = baseline_model.fit_predict(y_train)

# Import print_class_perf from src.models.performance
from src.models.performance import print_class_perf

# Print the classification metrics for this baseline model
print_class_perf(y_base, y_train, set_name='Training', average='weighted')

Accuracy Training: 0.6988416988416989
F1 Training: 0.5749561249561249
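
`NullModel` and `print_class_perf` come from `src/` and are not shown in this notebook. The sketch below reproduces the idea under the assumption that the classification null model always predicts the most frequent class. The function name `majority_baseline` and the toy labels are made up:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def majority_baseline(y):
    """Hypothetical stand-in: predict the most frequent class for every row."""
    values, counts = np.unique(np.asarray(y), return_counts=True)
    return np.full(len(y), values[np.argmax(counts)])


# Toy target where 6 of the 10 labels belong to class 0
y_toy = np.array([0, 0, 0, 1, 2, 0, 1, 0, 0, 3])
y_pred = majority_baseline(y_toy)

acc = accuracy_score(y_toy, y_pred)
# zero_division=0 silences the warning for classes that are never predicted
f1 = f1_score(y_toy, y_pred, average='weighted', zero_division=0)
print(f'Accuracy: {acc:.2f} | F1: {f1:.2f}')  # Accuracy: 0.60 | F1: 0.45
```

For a majority-class predictor with majority share p, the weighted F1 works out to 2p² / (1 + p). With p ≈ 0.699 this gives about 0.575, which is consistent with the scores printed above.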


### 5. Define Architecture

**[5.1]** Create in `src/models/pytorch.py` a class called `PytorchMultiClass` that inherits from `nn.Module` with:
- `num_features` as input parameter
- attributes:
    - `layer_1`: fully-connected layer with 32 neurons
    - `layer_out`: fully-connected layer with 4 neurons
    - `softmax`: softmax function
- methods:
    - `forward()` with `inputs` as input parameter: apply ReLU and dropout to the output of `layer_1`, then pass the result through `layer_out` followed by softmax
    

**[5.2]** Import `torch`, `torch.nn` as `nn` and `torch.nn.functional` as `F`

**[5.3]** Instantiate `PytorchMultiClass` with the correct number of input features and save it into a variable called `model`

**[5.4]** Import `get_device()` from `src.models.pytorch` and set `model` to use the available device

**[5.5]** Print the architecture of `model`

In [15]:
# Import torch, torch.nn as nn and torch.nn.functional as F
import torch
import torch.nn as nn
import torch.nn.functional as F

# Instantiate PytorchMultiClass with the correct number of input features and save it into a variable called model
from src.models.pytorch import PytorchMultiClass

model = PytorchMultiClass(X_train.shape[1])

# Set model to use the device available
from src.models.pytorch import get_device

device = get_device()
model.to(device)

# Print the architecture of model
print(model)

PytorchMultiClass(
  (layer_1): Linear(in_features=6, out_features=32, bias=True)
  (layer_out): Linear(in_features=32, out_features=4, bias=True)
  (softmax): Softmax(dim=1)
)
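
The class and helper imported above are written in `src/models/pytorch.py`, which this notebook does not show. A minimal sketch that matches the printed architecture could look as follows. The names `PytorchMultiClassSketch` and `get_device_sketch`, and the default dropout probability, are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PytorchMultiClassSketch(nn.Module):
    """Hypothetical version of PytorchMultiClass following the spec in [5.1]."""

    def __init__(self, num_features):
        super().__init__()
        self.layer_1 = nn.Linear(num_features, 32)
        self.layer_out = nn.Linear(32, 4)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        # ReLU then dropout on the hidden layer; dropout is only active in training mode
        x = F.dropout(F.relu(self.layer_1(inputs)), training=self.training)
        # Output layer followed by softmax, so each row sums to 1
        return self.softmax(self.layer_out(x))


def get_device_sketch():
    """Hypothetical version of get_device(): use the GPU when one is available."""
    return torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')


demo_device = get_device_sketch()
demo_model = PytorchMultiClassSketch(num_features=6).to(demo_device)
demo_model.eval()
with torch.no_grad():
    probs = demo_model(torch.randn(5, 6, device=demo_device))
print(probs.shape)  # torch.Size([5, 4])
```

Using `F.dropout` in `forward()` rather than an `nn.Dropout` attribute is one way to end up with a printed architecture that lists only `layer_1`, `layer_out` and `softmax`.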

### 6. Train Model

**[6.1]** Instantiate a `nn.CrossEntropyLoss()` and save it into a variable called `criterion` 

**[6.2]** Instantiate a `torch.optim.Adam()` optimizer with the model's parameters and 0.1 as learning rate and save it into a variable called `optimizer`

In [17]:
# Instantiate a nn.CrossEntropyLoss() and save it into a variable called criterion
criterion = nn.CrossEntropyLoss()

# Instantiate a torch.optim.Adam() optimizer with the model's parameters and 0.1 as learning rate
# and save it into a variable called optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
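
`nn.CrossEntropyLoss` takes unnormalised scores and integer class indices, and applies log-softmax internally before computing the negative log-likelihood. The sketch below checks this on a made-up row of scores:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One observation with made-up scores for the 4 classes; the true class is index 0
scores = torch.tensor([[2.0, 0.5, -1.0, 0.0]])
true_class = torch.tensor([0])

ce_loss = nn.CrossEntropyLoss()(scores, true_class)

# Same value computed in two explicit steps: log-softmax, then negative log-likelihood
manual_loss = F.nll_loss(F.log_softmax(scores, dim=1), true_class)

print(torch.allclose(ce_loss, manual_loss))  # True
```

Because the model specified in [5.1] already ends with a softmax, the criterion here receives probabilities rather than raw scores. Training still works, but the loss values are compressed into a narrow range. Returning the raw output of `layer_out` during training is the more common pattern.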

**[6.3]** Create a function called `train_classification()` that will perform forward and back propagation and calculate loss and accuracy scores

**[6.4]** Create a function called `test_classification()` that will perform forward propagation and calculate loss and accuracy scores

**[6.5]** Create 2 variables called `N_EPOCHS` and `BATCH_SIZE` that will take respectively 50 and 32 as values

**[6.6]** Create a for loop that will iterate through the specified number of epochs and will train the model with the training set and assess the performance on the validation set and print their scores



In [18]:
# variables N_EPOCHS and BATCH_SIZE that will take respectively 50 and 32 as values
N_EPOCHS = 50
BATCH_SIZE = 32

# Create a for loop that will iterate through the specified number of epochs and will train the model
# with the training set and assess the performance on the validation set and print their scores
from src.models.pytorch import train_classification, test_classification

for epoch in range(N_EPOCHS):
    train_loss, train_acc = train_classification(train_dataset, model=model, criterion=criterion, optimizer=optimizer, batch_size=BATCH_SIZE, device=device)
    valid_loss, valid_acc = test_classification(val_dataset, model=model, criterion=criterion, batch_size=BATCH_SIZE, device=device)

    print(f'Epoch: {epoch}')
    print(f'\t(train)\t|\tLoss: {train_loss:.4f}\t|\tAcc: {train_acc * 100:.1f}%')
    print(f'\t(valid)\t|\tLoss: {valid_loss:.4f}\t|\tAcc: {valid_acc * 100:.1f}%')

Epoch: 0
	(train)	|	Loss: 0.0340	|	Acc: 67.9%
	(valid)	|	Loss: 0.0332	|	Acc: 69.4%
Epoch: 1
	(train)	|	Loss: 0.0318	|	Acc: 73.6%
	(valid)	|	Loss: 0.0301	|	Acc: 79.5%
Epoch: 2
	(train)	|	Loss: 0.0306	|	Acc: 77.7%
	(valid)	|	Loss: 0.0298	|	Acc: 80.1%
Epoch: 3
	(train)	|	Loss: 0.0302	|	Acc: 78.8%
	(valid)	|	Loss: 0.0291	|	Acc: 82.1%
Epoch: 4
	(train)	|	Loss: 0.0290	|	Acc: 83.4%
	(valid)	|	Loss: 0.0288	|	Acc: 83.8%
Epoch: 5
	(train)	|	Loss: 0.0290	|	Acc: 83.3%
	(valid)	|	Loss: 0.0280	|	Acc: 85.8%
Epoch: 6
	(train)	|	Loss: 0.0285	|	Acc: 84.8%
	(valid)	|	Loss: 0.0283	|	Acc: 85.3%
Epoch: 7
	(train)	|	Loss: 0.0284	|	Acc: 85.4%
	(valid)	|	Loss: 0.0279	|	Acc: 86.7%
Epoch: 8
	(train)	|	Loss: 0.0279	|	Acc: 86.6%
	(valid)	|	Loss: 0.0279	|	Acc: 86.7%
Epoch: 9
	(train)	|	Loss: 0.0280	|	Acc: 86.0%
	(valid)	|	Loss: 0.0278	|	Acc: 87.0%
Epoch: 10
	(train)	|	Loss: 0.0282	|	Acc: 85.5%
	(valid)	|	Loss: 0.0277	|	Acc: 87.3%
Epoch: 11
	(train)	|	Loss: 0.0285	|	Acc: 84.8%
	(valid)	|	Loss: 0.0279	|	Acc: 86.7%
Ep

**[6.7]** Save the model into the `models` folder

**[6.8]** Assess the model performance on the testing set and print its scores

In [19]:
# Save the model into the models folder
torch.save(model, "../models/pytorch_multi_car_evaluation.pt")

# Assess the model performance on the testing set and print its scores
test_loss, test_acc = test_classification(test_dataset, model=model, criterion=criterion, batch_size=BATCH_SIZE, device=device)
print(f'\tLoss: {test_loss:.4f}\t|\tAccuracy: {test_acc:.1f}')

	Loss: 0.0285	|	Accuracy: 0.8
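
`torch.save(model, ...)` pickles the whole object, so loading the file later requires the `PytorchMultiClass` class to be importable from the same module path. An alternative, sketched below on a made-up `nn.Linear` model and a temporary file, is to save only the weights through `state_dict()`:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Made-up model standing in for the trained network
net = nn.Linear(6, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, 'weights.pt')

    # Save only the learned parameters
    torch.save(net.state_dict(), weights_path)

    # Rebuild the same architecture, then load the parameters into it
    restored = nn.Linear(6, 4)
    restored.load_state_dict(torch.load(weights_path))

# Switch to evaluation mode before making predictions
restored.eval()
print(torch.equal(net.weight, restored.weight))  # True
```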


### 7.   Push Changes

In [None]:
"""
# Add your changes to git staging area
git add .

# Create the snapshot of your repository and add a description
git commit -m "pytorch multi-class"

# Push your snapshot to Github
git push https://<insert_pat>@github.com/CazMayhem/adv_dsi_lab_5.git

# Check out to the master branch
git checkout master

# Pull the latest updates
git pull https://<insert_pat>@github.com/CazMayhem/adv_dsi_lab_5.git

# Check out to the pytorch_multi_class branch
git checkout pytorch_multi_class

# Merge the master branch and push your changes, 
# any merge issues use:  git merge master --allow-unrelated-histories
git merge master 
git push https://<insert_pat>@github.com/CazMayhem/adv_dsi_lab_5.git

"""