# Beer_type Prediction - Neural Network Model

This project deploys a machine learning model into production. I will train a custom neural network that accurately predicts the type of a beer from rating criteria such as appearance, aroma, palate and taste. I will also build a web app and deploy it online to serve the model for real-time predictions. This notebook covers designing, training and analysing the model.

**Student Name:** Wenying Wu 

**Student Number:** 14007025

**Prerequisites:**
- Docker
- Functions from the lab exercises
- Datasets saved by the Beer_type Prediction - Data Preparation notebook

**Sections:**
1. Prepare Datasets
2. Baseline Model
3. Define Neural Network Architecture
4. Train & Save Model
5. Evaluate Model

## 1. Prepare Datasets

### 1.1 Load magic command

In [1]:
%load_ext autoreload
%autoreload 2

### 1.2 Import packages and data set

In [2]:
from src.data.sets import load_sets

In [3]:
X_train, y_train, X_val, y_val, X_test, y_test = load_sets()

### 1.3 Convert NumPy arrays to tensors

In [4]:
from src.models.pytorch import PytorchDataset

train_dataset = PytorchDataset(X=X_train, y=y_train)
val_dataset = PytorchDataset(X=X_val, y=y_val)
test_dataset = PytorchDataset(X=X_test, y=y_test)

In [5]:
# Quick sanity check of the target tensor
train_dataset.y_tensor

tensor([65., 44., 25.,  ..., 13., 12., 29.])

## 2. Baseline Model

### 2.1 Load baseline model and get baseline prediction

In [6]:
from src.models.null import NullModel
baseline_model = NullModel(target_type='classification')
y_base = baseline_model.fit_predict(y_train)

In [7]:
from src.models.performance import print_class_perf
print_class_perf(y_base, y_train, set_name='Training', average='weighted')

Accuracy Training: 0.07444192974099043
F1 Training: 0.010315310209270104


**Note:** The baseline accuracy is very low, around 7.4%, and the weighted F1 score is only about 0.01. The accuracy matches the share of the most frequent beer type, 'American IPA', which is around 7.4% (see command 12 of the data preparation notebook).
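Both baseline numbers are consistent with a model that always predicts the most frequent class. The sketch below is a framework-free illustration of that idea, not the `NullModel` implementation (which is not shown in this notebook). `majority_baseline_scores` and `closed_form_weighted_f1` are hypothetical helper names and the toy labels are made up. For such a baseline with majority share p, the accuracy is p and the weighted F1 reduces to 2p²/(1+p), because only the majority class has a non-zero F1.

```python
from collections import Counter


def majority_baseline_scores(labels):
    """Accuracy and support-weighted F1 when every prediction is the most frequent label."""
    counts = Counter(labels)
    majority, _ = counts.most_common(1)[0]
    n = len(labels)
    preds = [majority] * n

    accuracy = sum(p == t for p, t in zip(preds, labels)) / n

    weighted_f1 = 0.0
    for cls, support in counts.items():
        true_pos = sum(p == cls and t == cls for p, t in zip(preds, labels))
        predicted = sum(p == cls for p in preds)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weighted_f1 += (support / n) * f1
    return accuracy, weighted_f1


def closed_form_weighted_f1(p):
    """Weighted F1 of a majority-class baseline whose majority share is p."""
    return 2 * p * p / (1 + p)


# Made-up labels: class "a" is the majority with a share of 0.5.
toy_labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
toy_accuracy, toy_f1 = majority_baseline_scores(toy_labels)
print(toy_accuracy, toy_f1, closed_form_weighted_f1(0.5))

# Plugging in the training accuracy reported above gives a value close to the reported F1.
print(closed_form_weighted_f1(0.07444192974099043))
```

Any trained model therefore has to beat roughly 7.4% accuracy to be more useful than guessing the most common beer type.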

## 3. Define Neural Network Architecture

This notebook only shows the final architecture used for this project. The trial-and-error process is not shown here, but it is available in the drafts folder if you are interested.

After trial and error, the architecture below was chosen as the final model. It is an 8-layer feed-forward neural network (1 input layer, 6 hidden layers and 1 output layer) with batch normalisation after each of the first seven layers and dropout (p=0.2) after layers 2 to 7. The number of neurons in each layer is given in the cells below.
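To get a feel for the size of this network, the sketch below counts its trainable parameters from the layer widths alone (6 input features, hidden widths from 8192 down to 128, 104 output classes). It is a back-of-the-envelope calculation in plain Python, not something taken from the original experiments. `linear_params` and `batchnorm_params` are hypothetical helper names.

```python
# Layer widths of the final model: 6 input features, 104 output classes.
widths = [6, 8192, 4096, 2048, 1024, 512, 256, 128, 104]


def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of one BatchNorm1d layer (running statistics excluded)."""
    return 2 * n_features


linear_total = sum(linear_params(a, b) for a, b in zip(widths[:-1], widths[1:]))
# Batch-norm follows every linear layer except the output layer.
batchnorm_total = sum(batchnorm_params(w) for w in widths[1:-1])
total_params = linear_total + batchnorm_total

print(f"linear: {linear_total:,}  batch-norm: {batchnorm_total:,}  total: {total_params:,}")
```

The total comes to about 44.8 million parameters, and roughly three quarters of them sit in the single 8192-to-4096 layer.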

### 3.1 Import pytorch, define architecture and initialize model

In [8]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [9]:
class PytorchMultiClass(nn.Module):
    def __init__(self, num_features):
        super(PytorchMultiClass, self).__init__()
        
        self.layer_1 = nn.Linear(num_features, 8192)  # 8192 = 2**13
        self.layer_2 = nn.Linear(8192, 4096)
        self.layer_3 = nn.Linear(4096, 2048)
        self.layer_4 = nn.Linear(2048, 1024)
        self.layer_5 = nn.Linear(1024, 512)
        self.layer_6 = nn.Linear(512, 256)
        self.layer_7 = nn.Linear(256, 128) 
        self.layer_out = nn.Linear(128, 104)
        self.softmax = nn.Softmax(dim=1)
        
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.2)
        self.batchnorm1 = nn.BatchNorm1d(8192)
        self.batchnorm2 = nn.BatchNorm1d(4096)
        self.batchnorm3 = nn.BatchNorm1d(2048)
        self.batchnorm4 = nn.BatchNorm1d(1024)
        self.batchnorm5 = nn.BatchNorm1d(512)
        self.batchnorm6 = nn.BatchNorm1d(256)
        self.batchnorm7 = nn.BatchNorm1d(128)


    def forward(self, x):
        x = self.layer_1(x)
        x = self.batchnorm1(x)
        x = self.relu(x)
        
        x = self.layer_2(x)
        x = self.batchnorm2(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        x = self.layer_3(x)
        x = self.batchnorm3(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        x = self.layer_4(x)
        x = self.batchnorm4(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        x = self.layer_5(x)
        x = self.batchnorm5(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        x = self.layer_6(x)
        x = self.batchnorm6(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        x = self.layer_7(x)
        x = self.batchnorm7(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        x = self.layer_out(x)
        return x # nn.CrossEntropyLoss does log_softmax() for us so we can simply return x
    
model = PytorchMultiClass(X_train.shape[1])

### 3.2 Check if GPU is active

In [10]:
from src.models.pytorch import get_device
device = get_device()
model.to(device)

PytorchMultiClass(
  (layer_1): Linear(in_features=6, out_features=8192, bias=True)
  (layer_2): Linear(in_features=8192, out_features=4096, bias=True)
  (layer_3): Linear(in_features=4096, out_features=2048, bias=True)
  (layer_4): Linear(in_features=2048, out_features=1024, bias=True)
  (layer_5): Linear(in_features=1024, out_features=512, bias=True)
  (layer_6): Linear(in_features=512, out_features=256, bias=True)
  (layer_7): Linear(in_features=256, out_features=128, bias=True)
  (layer_out): Linear(in_features=128, out_features=104, bias=True)
  (softmax): Softmax(dim=1)
  (relu): ReLU()
  (dropout): Dropout(p=0.2, inplace=False)
  (batchnorm1): BatchNorm1d(8192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm2): BatchNorm1d(4096, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm3): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm4): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affi

## 4. Train & Save Model

### 4.1 Define 2 dictionaries that will store the accuracy/epoch and loss/epoch for both train and validation sets for later visualization

In [None]:
accuracy_stats = {'train': [], 'val': []}
loss_stats = {'train': [], 'val': []}
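As a small illustration of how such per-epoch dictionaries can be used after training, the sketch below finds the epoch with the best validation accuracy. It assumes one value is appended to each list per epoch, which is how the lab functions appear to be used but is not shown in this notebook. `best_epoch` is a hypothetical helper and the numbers are made up.

```python
def best_epoch(stats, split="val"):
    """Return (epoch index, value) of the highest entry recorded for the given split."""
    values = stats[split]
    if not values:
        raise ValueError(f"no values recorded for split {split!r}")
    index = max(range(len(values)), key=values.__getitem__)
    return index, values[index]


# Made-up accuracies for three epochs, in the same layout as accuracy_stats.
example_accuracy_stats = {"train": [0.19, 0.23, 0.25], "val": [0.24, 0.27, 0.26]}

epoch, accuracy = best_epoch(example_accuracy_stats)
print(f"best validation accuracy {accuracy:.2f} at epoch {epoch}")
```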

### 4.2 Initialize criterion, optimizer and scheduler

CrossEntropyLoss is used because this is a multi-class classification problem. We don't have to add a log_softmax layer after the final layer because nn.CrossEntropyLoss applies it internally. For validation and testing, however, log_softmax is applied to the raw outputs before the predicted class is taken.
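The relationship between raw outputs (logits), log_softmax and the cross-entropy loss can be sketched for a single sample in plain Python. This is an illustration of the math only, not the PyTorch implementation. `log_softmax`, `cross_entropy` and `predict` are hypothetical helper names and the logits are made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits, target):
    """Cross-entropy of one sample: negative log-probability of the true class."""
    return -log_softmax(logits)[target]


def predict(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


# A model that is completely undecided between the 104 beer types
# has a per-sample loss of log(104).
uniform_logits = [0.0] * 104
uniform_loss = cross_entropy(uniform_logits, target=3)
print(uniform_loss, math.log(104))

# log_softmax only shifts all scores by the same constant,
# so the predicted class is the same with or without it.
example_logits = [2.0, 0.5, -1.0]
print(predict(example_logits), predict(log_softmax(example_logits)))
```

Because log_softmax preserves the ordering of the scores, applying it at prediction time changes the scale of the outputs but not which class is selected.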

In [12]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9)
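With a step size of 1 and gamma of 0.9, StepLR multiplies the learning rate by 0.9 every time the scheduler steps. The sketch below reproduces that schedule in plain Python, assuming the scheduler is stepped once per epoch inside `train_classification` (the notebook does not show this, and stepping once per batch would decay the rate much faster). `step_lr_schedule` is a hypothetical helper.

```python
def step_lr_schedule(initial_lr, gamma, step_size, n_epochs):
    """Learning rate before training and after each of n_epochs scheduler steps."""
    return [initial_lr * gamma ** (epoch // step_size) for epoch in range(n_epochs + 1)]


lrs = step_lr_schedule(initial_lr=0.01, gamma=0.9, step_size=1, n_epochs=50)

for epoch in (0, 10, 25, 50):
    print(f"after {epoch:2d} epochs: lr = {lrs[epoch]:.6f}")
```

Under this assumption the learning rate ends at about 0.5% of its initial value after 50 epochs, which by itself limits how much the model can still change in the last epochs.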

### 4.3 Set number of epochs and batch size

50 epochs were chosen because, in the trial-and-error runs, the model stopped improving after about 40 to 45 epochs.

In [13]:
N_EPOCHS = 50
BATCH_SIZE = 1000

### 4.4 Start training

In [14]:
# use train_classification and test_classification as defined in lab 5
from src.models.pytorch import train_classification, test_classification

for epoch in range(N_EPOCHS):
    train_loss, train_acc = train_classification(train_dataset, model=model, criterion=criterion, optimizer=optimizer, batch_size=BATCH_SIZE, device=device, scheduler=scheduler, accuracy_stats=accuracy_stats, loss_stats=loss_stats, shuffle=True)
    valid_loss, valid_acc = test_classification(val_dataset, model=model, criterion=criterion, batch_size=BATCH_SIZE, device=device, accuracy_stats=accuracy_stats, loss_stats=loss_stats)

    print(f'Epoch: {epoch}')
    print(f'\t(train)\t|\tLoss: {train_loss:.4f}\t|\tAcc: {train_acc * 100:.1f}%')
    print(f'\t(valid)\t|\tLoss: {valid_loss:.4f}\t|\tAcc: {valid_acc * 100:.1f}%')

Epoch: 0
	(train)	|	Loss: 0.0032	|	Acc: 19.2%
	(valid)	|	Loss: 0.0029	|	Acc: 23.6%
Epoch: 1
	(train)	|	Loss: 0.0030	|	Acc: 23.2%
	(valid)	|	Loss: 0.0028	|	Acc: 27.1%
Epoch: 2
	(train)	|	Loss: 0.0029	|	Acc: 25.4%
	(valid)	|	Loss: 0.0026	|	Acc: 29.9%
Epoch: 3
	(train)	|	Loss: 0.0028	|	Acc: 27.1%
	(valid)	|	Loss: 0.0025	|	Acc: 32.5%
Epoch: 4
	(train)	|	Loss: 0.0027	|	Acc: 28.6%
	(valid)	|	Loss: 0.0024	|	Acc: 34.4%
Epoch: 5
	(train)	|	Loss: 0.0026	|	Acc: 29.7%
	(valid)	|	Loss: 0.0024	|	Acc: 35.9%
Epoch: 6
	(train)	|	Loss: 0.0026	|	Acc: 30.8%
	(valid)	|	Loss: 0.0023	|	Acc: 36.7%
Epoch: 7
	(train)	|	Loss: 0.0026	|	Acc: 31.8%
	(valid)	|	Loss: 0.0023	|	Acc: 37.6%
Epoch: 8
	(train)	|	Loss: 0.0025	|	Acc: 32.7%
	(valid)	|	Loss: 0.0022	|	Acc: 39.2%
Epoch: 9
	(train)	|	Loss: 0.0025	|	Acc: 33.6%
	(valid)	|	Loss: 0.0022	|	Acc: 40.4%
Epoch: 10
	(train)	|	Loss: 0.0025	|	Acc: 34.3%
	(valid)	|	Loss: 0.0022	|	Acc: 41.0%
Epoch: 11
	(train)	|	Loss: 0.0024	|	Acc: 34.9%
	(valid)	|	Loss: 0.0021	|	Acc: 42.1%
Ep

**Experiment log (accuracy in %):**
- lab 5 architecture: train 16.5 | val 16.7 | test ~20 (30 epochs, no batch-norm)
- lab 5 architecture + SMOTE dataset: train 15.6 | val 16.2 | test ~20 (30 epochs, no batch-norm)
- lab 5 architecture + oversampled dataset: train 30.1 | val 21.6 | test ~30 (30 epochs, no batch-norm); oversampling tends to overfit the model
- 2 layers (6-108-104): train 20.2 | val 21.1 | test ~30 (30 epochs)
- 2 layers (6-55-104): train 19.3 | val 20.3 | test ~30 (30 epochs)
- 4 layers (6-512-256-128-104): train 30.9 | val 37.4 | test ~37 (30 epochs)
- 6 layers (6-1024-512-256-128-64-104): train 33.1 | val 40.7 | test 40.5 (100 epochs)
- 7 layers (6-2048-1024-512-256-128-64-104): train 33.8 | val 41.3 | test 41.0 (100 epochs)
- 7 layers (6-2048-1024-512-256-128-108-104): train 35.1 | val 43.0 | test 42.7 (100 epochs; more neurons in the last hidden layer seem to help)
- 7 layers (6-2048-1512-1024-512-256-128-104): train 39.8 | val 47.7 | test 47.5 (100 epochs; more neurons in all layers seem to help)
- 11 layers (6-4096-3584-3072-2560-2048-1536-1024-512-256-128-104): train 40.7 | val 48.4 | test 48.7 (28 epochs, accidentally stopped; losses of 0.0022, 0.0019 and 0.0019 are higher than the previous 0.0003, which seems more reasonable)
- 8 layers (6-8192-4096-2048-1024-512-256-128-104): train 41.1 | val 49.8 | test 49.5 (50 epochs)

### 4.5 Save model state_dict

Save the state_dict instead of the whole model. According to https://pytorch.org/tutorials/beginner/saving_loading_models.html:

> Saving a whole model will save the entire module using Python’s pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is because pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors.

In [15]:
# save the state_dict instead of the whole model (the approach recommended by the PyTorch documentation)
torch.save(model.state_dict(), "../models/nn_final_dict.pt")
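Loading a saved state_dict later, for example in the web app, requires re-creating the architecture first and then copying the weights into it. The sketch below shows that round trip with a tiny stand-in network and a temporary file so that it runs on its own. `make_model` and the file name `demo_dict.pt` are hypothetical; in this project the class would be `PytorchMultiClass` and the path the one used above.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    """Tiny stand-in network with the same input and output sizes as the real model."""
    return nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 104))


trained = make_model()
trained.eval()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_dict.pt")  # hypothetical file name
    torch.save(trained.state_dict(), path)

    # The architecture has to exist before the weights can be loaded into it.
    restored = make_model()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

# Switch dropout and batch-norm layers (if any) to inference behaviour.
restored.eval()

x = torch.randn(4, 6)
with torch.no_grad():
    same_output = torch.equal(trained(x), restored(x))
print(same_output)
```

Calling `eval()` before serving predictions matters for the real model in particular, because it uses both dropout and batch normalisation.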

## 5. Evaluate Model

### 5.1 Predict on test set

In [18]:
test_loss, test_acc, y_pred_list = test_classification(test_dataset, model=model, criterion=criterion, batch_size=BATCH_SIZE, device=device, accuracy_stats=accuracy_stats, loss_stats=loss_stats)
print(f'\tLoss: {test_loss:.4f}\t|\tAccuracy: {test_acc:.3f}')

	Loss: 0.0018	|	Accuracy: 0.495
