<img src="https://futurejobs.my/wp-content/uploads/2021/05/d-min-1024x297.png" width="300"> </img>

> **Copyright &copy; 2021 Skymind Education Group Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). \
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0** 

# Hyperparameter Tuning Exercise
Authored by : [Nazurah Kamil](mailto:nazurah.kamil@skymind.my)

## Learning Outcome
By the end of this notebook, you will be able to:
- Perform hyperparameter tuning on relevant hyperparameters
- Avoid overfitting the model
- Train the model to reach optimal accuracy


### **Dataset Used: Wine Classification Dataset**
This dataset describes red and white variants of the Portuguese "Vinho Verde" wine. It can be retrieved from [here](https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009).<br>

**Content :**<br>
For more information on this dataset, you may refer to <a href=https://repositorium.sdum.uminho.pt/bitstream/1822/10029/1/wine5.pdf>[Cortez et al., 2009]</a>.<br>

**Input variables**<br>
The variables below are the features used in this modelling task:<br>
1 - fixed acidity<br>
2 - volatile acidity<br>
3 - citric acid<br>
4 - residual sugar<br>
5 - chlorides<br>
6 - free sulfur dioxide<br>
7 - total sulfur dioxide<br>
8 - density<br>
9 - pH<br>
10 - sulphates<br>
11 - alcohol<br>

**Output variable (based on sensory data):**<br>
The variable below is the label used in this modelling task:<br>
12 - quality (score between 0 and 10)

We will treat this as a binary classification task, even though the target variable has a total of 6 classes. To do so, we first need to bin the target variable into 2 classes:
- **0**: Lesser quality, consists of labels from 3 to 5 (inclusive)
- **1**: Better quality, consists of labels from 6 to 8 (inclusive)

We will perform this preprocessing step as we go along.

## Task :
For this notebook, you are required to complete the following tasks:

1. Build a deep neural network model that converges by the end of training
2. Prevent overfitting (the difference between train and test accuracy should fall within 3%)
3. Achieve a minimum of 75% accuracy on both the train and test sets
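
Tasks 2 and 3 can be expressed as a small check. The sketch below is illustrative only: `meets_targets` is a hypothetical helper that is not part of this notebook, the accuracies are made-up percentages, and it assumes "within 3%" means an absolute gap of at most 3 percentage points.

```python
def meets_targets(train_acc, test_acc, min_acc=75.0, max_gap=3.0):
    """Hypothetical helper: check tasks 2 and 3 for accuracies given in percent."""
    gap = abs(train_acc - test_acc)
    return train_acc >= min_acc and test_acc >= min_acc and gap <= max_gap

# Made-up accuracies for illustration only
print(meets_targets(78.2, 76.5))  # both above 75 and the gap is small
print(meets_targets(92.0, 76.0))  # a 16-point gap suggests overfitting
print(meets_targets(74.0, 73.5))  # both below the 75% minimum
```

Task 1 (convergence) is easier to judge visually from the loss curve than with a fixed rule, so it is left out of the check.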

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Torch libraries
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F


# Path to find dataset
from pathlib import Path

# Prevent warning 
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load the dataset from a remote CSV file (pandas is already imported above)
wine_data = pd.read_csv("https://raw.githubusercontent.com/Saranya-Skymind/Datasets/main/wine.csv")

In [None]:
# Print the first 5 rows of our dataset
wine_data.head()

In [None]:
wine_data.shape

In [None]:
wine_data.describe()

In [None]:
# Check data types and non-null counts to spot missing values
wine_data.info()

Here, all the columns in the dataset have `int64` or `float64` data types. Next, let's display the classes of the target variable.

In [None]:
wine_data['quality'].unique()

Let's bin our target variable to just two classes: **0** (lesser quality) and **1** (better quality).

In [None]:
# If wine quality is 6 or higher, label it as 1, otherwise 0
wine_data['quality'] = np.where(wine_data['quality'] >= 6, 1, 0)

# Check again classes of target variable
wine_data['quality'].unique()

Next, let us check the class distribution of the target variable.

In [None]:
wine_data['quality'].value_counts()

In [None]:
class_0_distribution = wine_data['quality'].value_counts()[0] / len(wine_data) * 100
class_1_distribution = wine_data['quality'].value_counts()[1] / len(wine_data) * 100
print("Class 0 is: ", class_0_distribution)
print("Class 1 is: ", class_1_distribution)

Here we can see that about 53% of the wines are 'better quality' and the rest are 'lesser quality'. Next, let's separate the features and the label.
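
A split this close to even means that a model which always predicts the majority class would score only about 53% accuracy, which is a useful reference point for the 75% target. A minimal sketch with made-up labels (not the actual dataset) that mimics this split:

```python
import numpy as np

# Made-up binary labels with roughly the same 53/47 split (not the real data)
labels = np.array([1] * 53 + [0] * 47)

# Share of each class, in percent
counts = np.bincount(labels)
shares = counts / counts.sum() * 100

# Accuracy of always predicting the most frequent class
baseline_accuracy = shares.max()
print(shares, baseline_accuracy)
```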

## Features and Label
Here we will split the dataset into features and label.

In [None]:
X = wine_data.iloc[:, :-1].values
y = wine_data.iloc[:, -1].values

## Split into Train And Test
Split into train and test by using `train_test_split`.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size = 0.3, random_state = 21)

## Data Preprocessing
Perform feature scaling onto `X_train` and `X_test`.

In [None]:
# Feature scaling
scaler = MinMaxScaler()

# Only fit the train_set but transform both train and test sets
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
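
To see why the scaler is fitted on the training set only, the sketch below reproduces the min-max formula, x' = (x - min) / (max - min), by hand with NumPy. The arrays are toy values, and the sketch assumes the default feature range of [0, 1].

```python
import numpy as np

# Toy single-feature data (made up for illustration)
train = np.array([[2.0], [4.0], [6.0]])
test = np.array([[3.0], [8.0]])

# "Fit": learn the minimum and maximum from the training set only
col_min = train.min(axis=0)
col_max = train.max(axis=0)

# "Transform": apply the same training statistics to both sets
train_scaled = (train - col_min) / (col_max - col_min)
test_scaled = (test - col_min) / (col_max - col_min)

print(train_scaled.ravel())  # training values lie in [0, 1]
print(test_scaled.ravel())   # test values may fall outside [0, 1]
```

Because the test set plays the role of unseen data, its own minimum and maximum are never used; a test value larger than the training maximum simply maps above 1.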

## Hyperparameters

Hyperparameter tuning is the process of selecting a good set of hyperparameters for a machine learning algorithm. A hyperparameter is a setting whose value is fixed before the training process starts, rather than learned from the data.

Here are the hyperparameters that can be used to fine-tune our model.

In [None]:
# Hyperparameter
num_input = 11
num_classes = 1
"""
Tweak these hyperparameters to increase accuracy
"""
epochs = 20
learning_rate = 0.001
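
One simple way to organise the tuning is to list the combinations you intend to try before training. The sketch below is only a suggestion: the candidate values and the names `candidate_epochs`, `candidate_learning_rates`, and `search_space` are hypothetical and not prescribed by this exercise.

```python
from itertools import product

# Hypothetical candidate values; choose ranges that suit your own experiments
candidate_epochs = [20, 50, 100]
candidate_learning_rates = [0.01, 0.001, 0.0001]

# Every (epochs, learning_rate) pair to try, one training run per pair
search_space = list(product(candidate_epochs, candidate_learning_rates))

for epochs_value, lr_value in search_space:
    print(f"epochs={epochs_value}, learning_rate={lr_value}")
```

Each pair would then be assigned to `epochs` and `learning_rate` before re-running the training cells.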

## Dataset

In [None]:
# Create Custom_Dataset class
class Custom_Dataset(Dataset):
    def __init__(self, features, labels):
        # Convert NumPy arrays to tensors
        self.features = torch.tensor(features, dtype = torch.float32)
        self.labels = torch.tensor(labels, dtype  = torch.float32)

    def __len__(self):
        return self.features.shape[0]
    
    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

In [None]:
# Instantiate Custom_Dataset for the train and test sets
train_set = Custom_Dataset(X_train, y_train)
test_set = Custom_Dataset(X_test, y_test)

## DataLoader
`DataLoader` lets us mini-batch and shuffle the data from the `Dataset` objects we have just instantiated.

In [None]:
# Input train_set and test_set into DataLoader
train_loader = DataLoader(train_set, batch_size = 128, shuffle = True)
test_loader = DataLoader(test_set, batch_size = 128, shuffle = False)

dataloader = {'train':train_loader, 'test':test_loader}
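
With `batch_size = 128`, the final batch of an epoch is usually smaller than the rest, because `DataLoader` keeps the last partial batch by default (`drop_last=False`). A rough sketch of the resulting batch sizes in plain Python, using a made-up dataset size and a hypothetical helper `batch_sizes`:

```python
import math

def batch_sizes(num_samples, batch_size):
    """Hypothetical helper: sizes of the batches when the last partial batch is kept."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

sizes = batch_sizes(num_samples=1000, batch_size=128)  # 1000 is a made-up dataset size
print(len(sizes), sizes[-1])

# The number of batches equals the ceiling of num_samples / batch_size
assert len(sizes) == math.ceil(1000 / 128)
```

This is why the training loop later counts samples per batch instead of assuming every batch holds 128 rows.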

## Model Training
We will define the model by subclassing `nn.Module`, because it gives more flexibility in designing the network. For example, you can write your own `forward()` method.

In [None]:
# Create class for LogisticRegression 
class LogisticRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc_1 = nn.Linear(num_input, 500)
        """
        Add layers or increase their size until the error no longer improves
        """
        self.fc_2 = nn.Linear(500, 150)
        self.fc_3 = nn.Linear(150, num_classes)
        
    def forward(self, x):
        y_hat = F.relu(self.fc_1(x))
        y_hat = F.relu(self.fc_2(y_hat))
        # Apply sigmoid at the output layer (torch.sigmoid replaces the deprecated F.sigmoid)
        y_hat = torch.sigmoid(self.fc_3(y_hat))
        return y_hat

###### Change optimizer and criterion accordingly

In [None]:
torch.manual_seed(5)
model = LogisticRegression()
"""
Try a different optimizer, and a criterion that matches the output layer
"""
optimizer = optim.Adam(model.parameters(), lr = learning_rate)
criterion = nn.BCELoss()
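
`nn.BCELoss` expects probabilities between 0 and 1, which is why it pairs with the sigmoid output of the model. With its default mean reduction it averages -[y·log(p) + (1 - y)·log(1 - p)] over the batch. The NumPy sketch below reproduces that formula on made-up values; `binary_cross_entropy` is a hypothetical helper, and it omits the clamping of the log terms that PyTorch applies for numerical stability.

```python
import numpy as np

def binary_cross_entropy(probabilities, targets):
    """Mean binary cross-entropy for predicted probabilities in (0, 1)."""
    probabilities = np.asarray(probabilities, dtype=float)
    targets = np.asarray(targets, dtype=float)
    per_sample = -(targets * np.log(probabilities)
                   + (1 - targets) * np.log(1 - probabilities))
    return per_sample.mean()

# Made-up sigmoid outputs and labels
confident_and_right = binary_cross_entropy([0.9, 0.1], [1, 0])
confident_and_wrong = binary_cross_entropy([0.1, 0.9], [1, 0])
print(confident_and_right, confident_and_wrong)
```

If you switch the criterion to `nn.BCEWithLogitsLoss`, the model should return raw logits instead, since that loss applies the sigmoid internally.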

By default, our model is in training mode (`model.train()`). For validation and testing, we set the model to evaluation mode with `model.eval()`.

* **model.train**<br>
It tells the model that we are in the training phase, so layers such as dropout and batch normalization use their training behaviour: dropout randomly zeroes activations, and batch normalization normalizes with the statistics of the current batch.

* **model.eval**<br>
`model.eval` does the opposite. Dropout is switched off and batch normalization uses its stored running statistics, so the model produces its inference output as expected.
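
The difference between the two modes can be observed directly on a single dropout layer. The sketch below is illustrative: the layer, its dropout probability, and the input are made up. Note that the network defined earlier contains neither dropout nor batch normalization, so the two modes behave the same for it unless you add such a layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)   # dropout probability chosen for illustration
x = torch.ones(1, 8)

layer.train()               # training mode: elements are zeroed at random
train_out = layer(x)        # surviving elements are scaled by 1 / (1 - p)

layer.eval()                # evaluation mode: dropout passes the input through
eval_out = layer(x)

print(train_out)
print(eval_out)
```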

In [None]:

# Keep track of Loss and accuracy
loss_score = {'train': [], 'test': []}
accuracy_score = {'train': [], 'test': []}

for epoch in range(1, epochs+1):
    print(f'\nEpoch {epoch}\n--------')
    
    for loader in ['train', 'test']:
        running_loss = 0.0
        running_size = 0
        correct = 0
        
        if loader == 'train':
            model.train()
        else:
            model.eval()

        for X, y in dataloader[loader]:
            with torch.set_grad_enabled(loader == 'train'):
                output = model(X)
                loss = criterion(output, y.unsqueeze(1)) # Unsqueeze to prevent broadcasting
                
                # Accumulate loss, weighting each batch mean by its batch size
                running_loss += loss.item()*output.size(0)
                running_size += output.size(0) # Count samples, including the smaller final batch

                # Calculate accuracy
                # Set threshold for sigmoid instead of using torch.max
                predictions = output > 0.5
                correct += (predictions == y.unsqueeze(1)).sum().item()

                if loader == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

        # Accuracy and loss per epoch
        accuracy = (100*correct) / running_size
        loss_per_epoch = running_loss / running_size
        
        print(f'{loader.capitalize()} Loss:{loss_per_epoch} {loader.capitalize()} Accuracy:{accuracy}')
        loss_score[loader].append(loss_per_epoch)
        accuracy_score[loader].append(accuracy)
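
The loop multiplies each batch loss by the batch size before summing because the criterion returns a per-batch mean and the last batch may be smaller than the others. A short worked example with made-up numbers shows how this differs from averaging the batch means directly:

```python
# Made-up per-batch mean losses and batch sizes (the last batch is smaller)
batch_losses = [0.70, 0.60, 0.20]
sizes_per_batch = [128, 128, 16]

# Same bookkeeping as the loop: weight each batch mean by its size
running_loss = sum(loss * size for loss, size in zip(batch_losses, sizes_per_batch))
running_size = sum(sizes_per_batch)
weighted_mean = running_loss / running_size

# Naive alternative: average the batch means directly
naive_mean = sum(batch_losses) / len(batch_losses)

print(weighted_mean, naive_mean)
```

The naive average lets the 16-sample batch count as much as a 128-sample batch, whereas the weighted version is the true mean loss per sample.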

Now that model training has completed, we can visualize the model's loss by epoch.

In [None]:
# Visualize loss
fig, ax = plt.subplots()
fig.set_size_inches(14, 7)
ax.set_title("Loss Score against Epoch")
ax.grid(True)
ax.set_xlabel("Epoch Number")
ax.set_ylabel("Loss Score")

ax.plot(loss_score['train'], color='blue', label='Training Loss')
ax.plot(loss_score['test'], color='red', label='Testing Loss')
ax.legend();

Let's also visualize the model's accuracy by epoch.

In [None]:
# Visualize accuracy
fig, ax = plt.subplots()
fig.set_size_inches(14, 7)
ax.set_title("Accuracy against Epoch")
ax.grid(True)
ax.set_xlabel("Epoch Number")
ax.set_ylabel("Accuracy")

ax.plot(accuracy_score['train'], color='blue', label='Training Accuracy')
ax.plot(accuracy_score['test'], color='red', label='Testing Accuracy')
ax.legend();