# MACHINE LEARNING FINAL PROJECT


Student information:
- Vũ Minh Chiến - 22127045
- Cao Nguyễn Huy Hoàng - 22127120

### **Import Libraries**

In [3]:
import numpy as np
import pickle
import gzip
import matplotlib.pyplot as plt
from sklearn import (tree, neural_network)
import tensorflow as tf
import seaborn as sns
import torch
from torch import nn, optim
import os, sys, humanize, psutil, GPUtil

### **Data Preparation**

#### 1. Data collection

This dataset was collected from [cs.toronto.edu](https://www.cs.toronto.edu/~kriz/cifar.html). All files have been downloaded, unpacked, and stored in the `cifar-10-python` folder. The data is loaded into four variables with the following sizes:
- train_X (50000x3072)
- train_Y (50000 labels, flattened to 1-D after loading)
- test_X (10000x3072)
- test_Y (10000 labels, flattened to 1-D after loading)

The code below reads the data from the files and keeps only the `data` and `labels` entries, dropping the redundant metadata fields `batch_label` and `filenames`.

In [4]:
# function to read a pickle file
def unpickle(file):
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='latin1')
    return batch


# define variables
test_X = test_Y = train_X = train_Y = None


# read batch files 1-5 into train_X and train_Y
for i in range(1, 6):
    # file name
    file = 'cifar-10-python/data_batch_' + str(i)

    # read data to a dictionary (named `batch` to avoid shadowing the built-in `dict`)
    batch = unpickle(file)

    # convert data to numpy arrays and append them to train_X and train_Y
    if train_X is None:
        train_X = np.array(batch['data'])
        train_Y = np.array(batch['labels']).reshape(-1, 1)
    else:
        train_X = np.concatenate((train_X, np.array(batch['data'])), axis=0)
        train_Y = np.concatenate((train_Y, np.array(batch['labels']).reshape(-1, 1)), axis=0)
# flatten train_Y to 1-D
train_Y = train_Y.reshape(-1)

# read test_batch into test_X and test_Y
batch = unpickle('cifar-10-python/test_batch')
test_X = np.array(batch['data'])
test_Y = np.array(batch['labels']).reshape(-1)
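
Each row of `train_X` holds one 32x32 colour image flattened to 3072 values: the first 1024 entries are the red plane, the next 1024 the green plane, and the last 1024 the blue plane, each stored row by row. The sketch below shows how such a row could be turned back into a height x width x channel image, for example to display it with `plt.imshow`. The `row` array is a synthetic stand-in, not real CIFAR-10 data.

```python
import numpy as np

# Synthetic stand-in for one row of train_X (values are just indices 0..3071)
row = np.arange(3072)

# (3072,) -> (channel, height, width) -> (height, width, channel)
image = row.reshape(3, 32, 32).transpose(1, 2, 0)

print(image.shape)   # shape of the reconstructed image
print(image[0, 0])   # the R, G, B entries of the top-left pixel
```

With the real data, `train_X[i].reshape(3, 32, 32).transpose(1, 2, 0)` would give an array that `plt.imshow` accepts directly.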

#### 2. Data preprocessing

Check for missing values in the dataset.

In [5]:
print(np.isnan(train_X).any(), np.isnan(train_Y).any(), np.isnan(test_X).any(), np.isnan(test_Y).any())

False False False False


So there aren't any missing values in this dataset.

Next is the normalization step. Because this is an image dataset, the pixel values should lie in [0, 255], but let's verify that first.

In [6]:
print(train_X.min(), train_X.max())

0 255


The values do lie in [0, 255], so we can normalize them to the range [0, 1]. This is an important step because it helps gradient-based training converge faster.

In [7]:
# shorthand for min-max scaling (min = 0, max = 255)
train_X = train_X / 255
test_X = test_X / 255
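
To make the link to min-max scaling explicit, here is a small sketch of the general formula `(x - min) / (max - min)` applied to a few pixel values. The helper name `min_max_scale` is only illustrative and is not part of the notebook.

```python
import numpy as np

def min_max_scale(x, lo, hi):
    """Map values from the range [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)

# Three sample pixel intensities stored as unsigned bytes, like the raw CIFAR-10 data
pixels = np.array([0, 51, 255], dtype=np.uint8)

scaled = min_max_scale(pixels.astype(np.float64), 0.0, 255.0)
print(scaled)
```

With `lo = 0` and `hi = 255` the formula reduces to `x / 255`, which is what the cell above computes. Note that dividing a `uint8` array by an integer yields `float64`, so the full training array takes roughly 1.2 GB; casting to `float32` would halve that if memory becomes a concern.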

Next, one-hot encode the image labels. First we need to find the range of the labels.

In [8]:
print(np.unique(train_Y))

[0 1 2 3 4 5 6 7 8 9]


Use an identity matrix to generate the one-hot matrix.

In [9]:
# create identity matrix with len = 10 (0-9)
eye_matrix = np.eye(10)

# generate one-hot
train_Y_one_hot = eye_matrix[train_Y.reshape(-1)]
test_Y_one_hot = eye_matrix[test_Y.reshape(-1)]
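
The indexing trick works because row `k` of the identity matrix is exactly the one-hot vector for class `k`. A minimal sketch on three made-up labels, which also shows that `argmax` recovers the original class indices:

```python
import numpy as np

labels = np.array([3, 0, 9])          # made-up class indices
one_hot = np.eye(10)[labels]          # pick row k of the identity matrix for each label k
recovered = one_hot.argmax(axis=1)    # position of the single 1 in each row

print(one_hot.shape)
print(recovered)
```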

### **Define MLP Architecture:**
1. Number of layers: 2 hidden layers.
2. Number of neurons per layer: 128 and 64 for the first and second hidden layer, respectively.
3. Activation function: ReLU for the hidden layers and Softmax for the output layer.
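
As a framework-independent illustration of this architecture, the following NumPy sketch runs a forward pass through a 3072-128-64-10 network and counts its trainable parameters. The weights are small random placeholders rather than trained values, and the helper names (`relu`, `softmax`, `forward`) are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3072, 128, 64, 10]  # input, two hidden layers, output

# Placeholder parameters: small random weights and zero biases
weights = [rng.normal(0.0, 0.01, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    h = x
    # ReLU after each hidden layer
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    # Softmax on the output layer gives class probabilities
    return softmax(h @ weights[-1] + biases[-1])

batch = rng.random((4, 3072))   # four fake, already normalized images
probs = forward(batch)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))

print(probs.shape, n_params)
```

The parameter count works out to (3072·128 + 128) + (128·64 + 64) + (64·10 + 10) = 402,250, almost all of it in the first layer.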

### **Loss function and Optimizer:**
1. Loss function: Cross-entropy loss
2. Optimizer: Stochastic Gradient Descent (SGD)
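
To show how these two pieces fit together, here is a toy sketch of cross-entropy loss and the gradient-descent update `W ← W - lr · ∇W`. For brevity it uses a single linear layer on random data instead of the full MLP, and every step uses the whole toy batch. The data, the layer size, and the learning rate of 0.1 are assumptions made for this illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, y_one_hot, eps=1e-12):
    # Mean over the batch of -sum_k y_k * log(p_k)
    return -np.mean(np.sum(y_one_hot * np.log(probs + eps), axis=1))

rng = np.random.default_rng(0)
x = rng.random((8, 5))                       # toy batch: 8 samples, 5 features
y = np.eye(3)[rng.integers(0, 3, size=8)]    # toy one-hot labels, 3 classes
W = np.zeros((5, 3))                         # single linear layer (assumption, for brevity)
lr = 0.1                                     # toy learning rate

loss_before = cross_entropy(softmax(x @ W), y)
for _ in range(20):
    probs = softmax(x @ W)
    grad_W = x.T @ (probs - y) / len(x)      # gradient of the mean loss with respect to W
    W -= lr * grad_W                         # gradient-descent update
loss_after = cross_entropy(softmax(x @ W), y)

print(loss_before, loss_after)
```

With all-zero weights the predictions are uniform, so the starting loss equals ln 3; the updates then push it lower.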

### **Model Training**

1. Epochs: 100
2. Batch size: 32
3. Early stopping: validation fraction of 0.1
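
With 50,000 training images and a batch size of 32, one epoch consists of ceil(50000 / 32) = 1563 updates, which matches the `1563/1563` step counter in the Keras progress log later in this notebook. The sketch below works through that arithmetic and shows one possible way to draw shuffled mini-batches; `iterate_minibatches` is a hypothetical helper, not a function from any of the three libraries.

```python
import math
import numpy as np

n_samples, batch_size = 50_000, 32

steps_per_epoch = math.ceil(n_samples / batch_size)
# The last batch is smaller because 50,000 is not a multiple of 32
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size

def iterate_minibatches(n, batch_size, rng):
    """Yield index arrays that cover 0..n-1 once, in random order."""
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        yield order[start:start + batch_size]

batches = list(iterate_minibatches(n_samples, batch_size, np.random.default_rng(0)))
print(steps_per_epoch, last_batch_size, len(batches))
```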

In [10]:
# function to report GPU memory utilization
def mem_report():
    GPUs = GPUtil.getGPUs()
    for gpu in GPUs:
        print(f'GPU usage: {round(gpu.memoryUtil * 100, 2)}%')

- Sklearn

In [11]:
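model_sklearn = neural_network.MLPClassifier(
    hidden_layer_sizes=(128, 64),  # Two hidden layers: 128 and 64 neurons
    activation='relu',             # Activation function for hidden layers
    solver='sgd',                  # Optimizer
    learning_rate_init=0.01,       # Initial learning rate
    max_iter=100,                  # Maximum number of epochs
    batch_size=32,                 # Mini-batch size
    early_stopping=True,           # Required for validation_fraction to take effect
    validation_fraction=0.1        # Hold out 10% of the training data for early stopping
)
# No manual output activation is needed: MLPClassifier sets out_activation_
# during fit() and uses softmax for multi-class targets, so assigning it
# before fit() has no effect.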

In [12]:
%%time
# fit the sklearn model on the first 1000 training samples
model_1 = model_sklearn.fit(train_X[:1000, :], train_Y[:1000])

# get gpu usage
mem_report()

GPU usage: 0.0%
CPU times: total: 9.77 s
Wall time: 39.1 s


- Pytorch

In [13]:
class MLPClassifier_pytorch(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(input_size, 128), # First hidden layer with 128 neurons and input size
            nn.ReLU(),                  # Activation function for first layer
            nn.Linear(128, 64),         # Second hidden layer with 64 neurons
            nn.ReLU(),                  # Activation function for the second layer
            nn.Linear(64, num_classes)  # Output layer (raw logits)
            # No nn.Softmax here: nn.CrossEntropyLoss applies log-softmax internally,
            # so adding Softmax would apply it twice
        )

    def forward(self, x):
        return self.model(x) # Forward pass, returns logits

model_pytorch = MLPClassifier_pytorch(input_size=3072, num_classes=10)

loss_pytorch = nn.CrossEntropyLoss() # Loss (expects logits and class indices)
optimizer_pytorch = optim.SGD(model_pytorch.parameters(), lr=0.01) # Optimizer


In [14]:
# https://medium.com/@juanc.olamendy/mini-batch-gradient-descent-in-pytorch-4bc0ee93f591
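
The notebook does not yet include a PyTorch training loop, so the following is a minimal sketch of mini-batch training with `DataLoader`, using the same layer sizes, loss, optimizer, and batch size as above. It runs on a small synthetic dataset for only three epochs so that it finishes quickly; the data, the epoch count, and the variable names are assumptions made for this example. The network outputs raw logits because `nn.CrossEntropyLoss` applies log-softmax internally.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-ins for the normalized images and integer labels
X = torch.rand(256, 3072)
y = torch.randint(0, 10, (256,))

# DataLoader shuffles the data and splits it into mini-batches of 32
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Linear(3072, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),               # raw logits, no Softmax layer
)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(3):               # the notebook's setting is 100 epochs
    running = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()        # clear gradients from the previous step
        loss = criterion(model(xb), yb)
        loss.backward()              # backpropagate
        optimizer.step()             # SGD update
        running += loss.item() * xb.size(0)
    epoch_losses.append(running / len(loader.dataset))

print(epoch_losses)
```

To train on the real data, the NumPy arrays would first need converting, for example with `torch.from_numpy(train_X).float()` for the images and `torch.from_numpy(train_Y).long()` for the labels (class indices, not the one-hot matrix).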

- Tensorflow

In [19]:
Dense = tf.keras.layers.Dense
Softmax = tf.keras.layers.Softmax
ReLU = tf.keras.layers.ReLU

SGD = tf.keras.optimizers.SGD

model_tensor = tf.keras.Sequential([
    Dense(128, input_shape=(3072, )),  # First hidden layer with 128 neurons and input size
    ReLU(),                           # Activation function for the first layer 
    Dense(64),                        # Second hidden layer with 64 neurons
    ReLU(),                           # Activation function for the second layer 
    Dense(10),                        # Output layer
    Softmax()                         # Softmax for multi-class classification probabilities
])

model_tensor.compile(optimizer=SGD(learning_rate=0.01), #Optimizer 
              loss='categorical_crossentropy',          #Loss
              metrics=['accuracy'])                     #Accuracy for evaluation

In [None]:
%%time
# https://www.tensorflow.org/api_docs/python/tf/keras/Sequential (this page shows a usage example, so I followed it)
epochs = 100
batch_size = 32
model_3 = model_tensor.fit(train_X, train_Y_one_hot, batch_size=batch_size, epochs=epochs)

mem_report()

Epoch 1/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6575 - loss: 0.9648
Epoch 2/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6636 - loss: 0.9469
Epoch 3/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6628 - loss: 0.9554
Epoch 4/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6643 - loss: 0.9471
Epoch 5/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6694 - loss: 0.9307
Epoch 6/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6695 - loss: 0.9312
Epoch 7/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.6728 - loss: 0.9239
Epoch 8/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.6737 - loss: 0.9288
Epoch 9/100

### **Reference**

1. Sklearn model: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
 
2. Pytorch neural network: https://pytorch.org/docs/stable/nn.html

3. Pytorch optimization: https://pytorch.org/docs/stable/optim.html

4. Tensorflow model: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential

5. Tensorflow layers: https://www.tensorflow.org/api_docs/python/tf/keras/layers

6. Tensorflow optimizer: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers