# Part 2: Model Build and Evaluation

This notebook is structured to help guide you through the second half of this challenge. If additional cells are needed to build and train your classifier, please feel free to add them. Otherwise, please refrain from adding cells at any point in the notebook during this challenge. Please also do not delete or modify the provided cell headers. You are welcome to add comments, though, if needed! Thank you!

### Import your libraries in the cell below

In [1]:
import pandas as pd
import numpy as np
from sklearn import svm
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch import nn
import pickle
from tqdm import tqdm
from sklearn.metrics import classification_report

### Import in your csv from the previous notebook in the cell below

In [2]:
train_data = pd.read_csv('preprocessed_train.csv')
test_data = pd.read_csv('preprocessed_test.csv')
train = (train_data.iloc[:, 4:].to_numpy(dtype=np.float32), train_data.iloc[:,3].to_numpy(dtype=np.int64))
test = (test_data.iloc[:, 4:].to_numpy(dtype=np.float32), test_data.iloc[:,3].to_numpy(dtype=np.int64))

In [3]:
# Store data for PyTorch
batch_size = 500
train_tensor = (torch.tensor(train[0]), torch.tensor(train[1]))
test_tensor = (torch.tensor(test[0]), torch.tensor(test[1]))
train_dataset = TensorDataset(*train_tensor)
test_dataset = TensorDataset(*test_tensor)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size = 1)

### Build and Train your Classifier in this and the following cell(s) 

In [36]:
# Linear SVM
linear_svm = svm.LinearSVC(dual=False)
linear_svm.fit(*train)
# Use a context manager so the file handle is closed after writing
with open('linear_svm.pth', 'wb') as f:
    pickle.dump(linear_svm, f)
print('Completed training')

Completed training


In [None]:
# Radial SVM
radial_svm = svm.SVC(kernel='rbf')
radial_svm.fit(*train)
# Use a context manager so the file handle is closed after writing
with open('radial_svm.pth', 'wb') as f:
    pickle.dump(radial_svm, f)
print('Completed Training')

In [10]:
# Define PyTorch's Train Loop
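def train_net(dataloader, model, loss_fn, optimizer, epochs):
    """Train `model` for `epochs` passes over `dataloader`, reporting each batch's loss in the progress bar."""
    model.train()  # enable training mode once; nothing in the loop switches it off
    for i in range(epochs):
        with tqdm(dataloader, desc=f'Epoch {i+1}') as progress:
            for (X, y) in progress:
                pred = model(X)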
                loss = loss_fn(pred, y)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                progress.set_postfix(Loss=f'{loss.item()}')

In [21]:
# Make PyTorch network
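# NOTE: nn.CrossEntropyLoss expects raw scores (logits) and applies log-softmax itself,
# so neither a final Softmax nor the trailing ReLU is needed. The ReLU is kept here
# because the results recorded in this notebook were produced with it.
network = nn.Sequential(
    nn.Linear(301, 600), nn.ReLU(),
    nn.Linear(600, 400), nn.ReLU(),
    nn.Linear(400, 5), nn.ReLU(),  # previously tried instead: nn.Softmax(dim=1)
)
loss_fn = nn.CrossEntropyLoss()
learning_rate = 1e-3
batch_size = 100  # NOTE: has no effect; train_dataloader was already built with batch_size=500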
epochs = 5
optimizer = torch.optim.Adam(params=network.parameters(), lr=learning_rate)
train_net(train_dataloader, network, loss_fn, optimizer, epochs)
torch.save(network, 'network.pth')
print('Completed Training')

Epoch 1: 100%|██████████| 188/188 [00:49<00:00,  3.80it/s, Loss=1.190236210823059] 
Epoch 2: 100%|██████████| 188/188 [00:46<00:00,  4.09it/s, Loss=1.1171693801879883]
Epoch 3: 100%|██████████| 188/188 [00:53<00:00,  3.49it/s, Loss=1.1013606786727905]
Epoch 4: 100%|██████████| 188/188 [00:45<00:00,  4.17it/s, Loss=0.9732760190963745]
Epoch 5: 100%|██████████| 188/188 [00:44<00:00,  4.23it/s, Loss=0.8131004571914673]

Completed Training
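
The network above ends in a ReLU and has a commented-out `nn.Softmax`. The following is a minimal NumPy sketch, not the notebook's actual training code, of what a cross-entropy loss on raw scores computes. The helper names (`log_softmax`, `cross_entropy_from_logits`) and the example scores are made up for illustration. It suggests two things: softmax never changes which class has the highest score, and clamping negative scores to zero (what a trailing ReLU does) changes the loss.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy_from_logits(logits, targets):
    # Mean negative log-probability assigned to the true class.
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(targets)), targets].mean()

# Hypothetical raw scores for 3 samples and 5 classes.
logits = np.array([[2.0, -1.0, 0.5, 0.0, -3.0],
                   [-0.5, 1.5, 0.2, -2.0, 0.1],
                   [0.0, -0.3, -1.2, 2.2, 0.4]])
targets = np.array([0, 1, 3])

loss_raw = cross_entropy_from_logits(logits, targets)
# Clamping negatives to zero mimics a trailing ReLU on the output layer.
loss_relu = cross_entropy_from_logits(np.maximum(logits, 0.0), targets)
# Softmax is monotonic within each row, so the predicted class is unchanged.
same_argmax = np.array_equal(logits.argmax(1), np.exp(log_softmax(logits)).argmax(1))
```

In this toy case the clamped scores give a higher loss, because the wrong classes can no longer be pushed below zero.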





### Create your Predictions in the cell below

In [4]:
# Use a context manager so the file handle is closed after reading
with open('linear_svm.pth', 'rb') as f:
    linear_svm = pickle.load(f)
linear_predictions = linear_svm.predict(test[0])

In [15]:
# Use a context manager so the file handle is closed after reading
with open('radial_svm.pth', 'rb') as f:
    radial_svm = pickle.load(f)
print('This takes a while')
radial_predictions = radial_svm.predict(test[0])

This takes a while


In [22]:
print(f'Needs to reach {len(test_dataloader)}')
network = torch.load('network.pth')
network.eval()
torch_predictions = np.zeros(len(test_dataloader))
torch_actual = np.zeros(len(test_dataloader))
with torch.no_grad():
    for i, (X,y) in tqdm(enumerate(test_dataloader)):
        torch_predictions[i] = network(X).argmax(1)
        torch_actual[i] = y
print('Completed')

1it [00:00,  5.64it/s]

Needs to reach 62416


62416it [06:09, 168.70it/s]

Completed
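
The prediction loop above feeds the network one sample at a time, which took about six minutes for 62416 rows. A possible alternative is to predict in larger batches and concatenate the results. This is only a sketch: the small `model`, the random `features` and `labels`, and the batch size of 256 are hypothetical stand-ins, not the notebook's trained network or data.

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the real model and test set (301 features, 5 classes).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(301, 16), nn.ReLU(), nn.Linear(16, 5))
features = torch.randn(1000, 301)
labels = torch.randint(0, 5, (1000,))
loader = DataLoader(TensorDataset(features, labels), batch_size=256)

model.eval()
pred_batches, label_batches = [], []
with torch.no_grad():
    for X, y in loader:
        # argmax over the class dimension gives one predicted label per row.
        pred_batches.append(model(X).argmax(dim=1))
        label_batches.append(y)

# Stitch the per-batch results back into flat arrays, in the original order.
batched_predictions = torch.cat(pred_batches).numpy()
batched_actual = torch.cat(label_batches).numpy()
```

Because the loader is not shuffled, `batched_actual` comes back in the same order as the original labels, so the arrays can be passed straight to `print_metrics`.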





### Perform the final evaluation of the Performance of your model in the cell below

In [23]:
def print_metrics(pred, actual):
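    # NOTE: zero_division=1 reports undefined metrics (e.g. precision for a class
    # that is never predicted) as 1.0, which can look better than it really is.
    print(classification_report(actual, pred, zero_division=1))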
    correct = pred == actual
    accuracy = correct.sum() / len(correct)
    if accuracy < 0.5:
        print('Bad')
    elif accuracy < 0.6:
        print('Okay')
    elif accuracy < 0.7:
        print('Good!')
    else:
        print('Great!!')

In [6]:
print('Linear SVM Results:')
print_metrics(linear_predictions, test[1])

Linear SVM Results:
              precision    recall  f1-score   support

           0       0.60      0.00      0.01      3007
           1       0.44      0.24      0.31     11487
           2       0.60      0.90      0.72     30971
           3       0.48      0.34      0.40     13133
           4       0.62      0.02      0.05      3818

    accuracy                           0.56     62416
   macro avg       0.55      0.30      0.30     62416
weighted avg       0.55      0.56      0.50     62416

Okay


In [16]:
print('Radial SVM Results:')
print_metrics(radial_predictions, test[1])

Radial SVM Results:
              precision    recall  f1-score   support

           0       0.75      0.00      0.00      3007
           1       0.46      0.36      0.40     11487
           2       0.65      0.86      0.74     30971
           3       0.48      0.48      0.48     13133
           4       0.00      0.00      0.00      3818

    accuracy                           0.59     62416
   macro avg       0.47      0.34      0.33     62416
weighted avg       0.55      0.59      0.54     62416

Okay




In [24]:
print('Neural Network Results:')
print_metrics(torch_predictions, torch_actual)

Neural Network Results:
              precision    recall  f1-score   support

         0.0       0.68      0.02      0.03      3007
         1.0       0.47      0.47      0.47     11487
         2.0       0.70      0.79      0.74     30971
         3.0       0.48      0.58      0.52     13133
         4.0       1.00      0.00      0.00      3818

    accuracy                           0.60     62416
   macro avg       0.67      0.37      0.35     62416
weighted avg       0.63      0.60      0.57     62416

Good!
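
All three reports show the same pattern: overall accuracy near 0.6, but recall close to zero for classes 0 and 4. A rough NumPy sketch of why accuracy alone can hide this is given below. The helper `per_class_recall` and the toy labels are hypothetical and chosen only to illustrate the effect.

```python
import numpy as np

def per_class_recall(actual, pred, n_classes):
    # Confusion matrix: rows are true classes, columns are predicted classes.
    confusion = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(confusion, (actual, pred), 1)
    support = confusion.sum(axis=1)
    # Recall = correctly predicted / total true members; guard empty classes.
    return np.diag(confusion) / np.maximum(support, 1)

# Hypothetical imbalanced labels in which class 2 dominates.
actual = np.array([2] * 8 + [0] * 1 + [4] * 1)
pred = np.full_like(actual, 2)  # a model that always predicts class 2

accuracy = (pred == actual).mean()
recall = per_class_recall(actual, pred, n_classes=5)
# Average recall only over classes that actually occur in `actual`.
present = np.bincount(actual, minlength=5) > 0
balanced_accuracy = recall[present].mean()
```

Here the always-predict-2 model scores 0.8 accuracy but only about 0.33 balanced accuracy. scikit-learn provides `balanced_accuracy_score` in `sklearn.metrics` for the same idea.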


In [14]:
torch_predictions.std()

0.0

In [None]:
"""
Conclusions:
The linear SVM made passable predictions (0.56 accuracy), but it left a lot to be desired: recall for classes 0 and 4 was
close to zero. It might perform better with upsampling of the 0s and 4s.
The radial SVM did slightly better (0.59 accuracy), but its recall on the rare classes 0 and 4 was 0.00, so it would also
benefit from upsampling.
The torch model did the best, with 0.60 accuracy, which is barely better than the radial SVM. I could not get a softmax output
layer to work properly. Note that nn.CrossEntropyLoss already applies log-softmax to its inputs, so an explicit softmax layer
should not be needed; the final ReLU applied to the output scores may be part of the reason the gap is so small.
"""