# Model training - without PCA's feature reduction

This file was used for training models without PCA reduced features. It has detailed steps and described reasoning, the reasoning and steps apply to ModelTrainingPCA.ipynb

## Loading data

In [2]:
from keras import Model
from keras.layers import Input, Dense
from sklearn.model_selection import train_test_split
import pandas as pd
import ImageLoader
import tensorflow as tf
import numpy as np

In [3]:
images_dataframe = ImageLoader.load()

In [4]:
targets_dataframe = pd.read_csv("cleaned_fashion.csv", sep=';')
targets_dataframe.set_index("id", inplace=True)
targets_dataframe.index

Index([15970, 39386, 59263, 21379, 53759,  1855, 30805, 26960, 29114, 30039,
       ...
       30614, 13496, 55283, 12544, 42234, 17036,  6461, 18842, 46694, 51623],
      dtype='int64', name='id', length=41121)

In [50]:
images_dataframe.index

Index([15970, 39386, 59263, 21379, 53759,  1855, 30805, 26960, 29114, 30039,
       ...
       30614, 13496, 55283, 12544, 42234, 17036,  6461, 18842, 46694, 51623],
      dtype='int32', name='id', length=41121)

In [5]:
X = images_dataframe.values

In [6]:
classes = targets_dataframe.target.unique()
targets_numeric = np.arange(classes.size, dtype=int)
targets_dict = dict(zip(classes, targets_numeric))
targets_dataframe['target_numeric'] = [ targets_dict.get(target) for target in targets_dataframe.target ]

In [53]:
targets_dataframe.head(4)

Unnamed: 0_level_0,target,target_numeric
id,Unnamed: 1_level_1,Unnamed: 2_level_1
15970,Topwear,0
39386,Bottomwear,1
59263,Accessories,2
21379,Bottomwear,1


In [7]:
y = targets_dataframe.target_numeric

## ***Irregular distribution of classes!***

Numbers of classes in the chosen dataset are not equally distributed thus making the learning process harder.   
One of the key points I wanted to observe in my project was the impact of using PCA's feature reduction on model's performance and classification for underrepresented classes.   
I am aware of having only one sample in class 7 so the model won't be capable of neither learning nor testing the sample of this class. Instead of removing it, I am simply pretending this class is not there; in accuracy testing, it won't have almost any impact on the final result.

In [291]:
targets_dataframe['target_numeric'].value_counts()

target_numeric
0    15398
2    11279
3     9221
1     2692
4     2400
5      105
6       25
7        1
Name: count, dtype: int64

## Deciding classifier
### KNN?
I reject KNN because of the data size and computation speed. Without reducing number of features there is in total 41121 rows and 4800 columns, therefore computing distances to predict the class would be very inefficient and tedious.
### MLP?
Neural network is my pick, because thanks to saving and loading weights, I gain fast computation and flexibility for classification while still keeping good accuracy. I choose MLP, because of big number of features - it does require more layers for good performance.

## Deciding activation functions
Input values are in range of [0, 255] therefore using sigmoid function would badly cut values, also initial values are greater or equal to zero. Best activation function for input and hidden layers seems to be ReLU as it doesn't cut the values at all.    
Another approach would be normalizing data with min-max normalization to values [0, 1] then sigmoid wouldn't be that bad choice, however bigger problem would be floating point precision.
   
However, activation function related to output layer should be softmax, because from 7 classes, I need to later define the most probable. Previous layers had ReLU activation functions, so next the numbers coming from them should be squashed to a vector in the range (0, 1) and all the resulting elements would add up to 1. Thanks to that, the model can finally predicted the class for a sample picking the biggest number. 

In [78]:
print(f"Minimum pixel value = {images_dataframe.values.min()}")
print(f"Maximum pixel value = {images_dataframe.values.max()}")

Minimum pixel value = 0.0
Maximum pixel value = 255.0


Features number

In [79]:
X.shape[1]

4800

In [80]:
y_train.shape

(28784,)

## Data split, neuron set up

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=120)

In [9]:
classes_num = classes.size

### One hot 

As loss function I will be using categorical crossentropy, because I have multi-class classification with softmax activation on output. The labels need to be one-hot (it's required by categorical crossentropy), so only the positive class keeps its term in the loss. Using one-hot, there is only one element of the Target vector which is not zero.

In [10]:
from tensorflow.keras.utils import to_categorical
y_train = to_categorical(y_train, num_classes=8)
y_test = to_categorical(y_test, num_classes=8)

In [14]:
y_train[235]

array([0., 0., 1., 0., 0., 0., 0., 0.])

### Neurons

In [38]:
num_features = X.shape[1]
input_layer = Input( shape = (num_features,) )

In [39]:
hidden_layer_1 = Dense( num_features//2, activation='relu')(input_layer)

In [40]:
hidden_layer_2 = Dense( num_features//4, activation='relu')(hidden_layer_1)

In [41]:
hidden_layer_last = Dense( num_features//4, activation='relu')(hidden_layer_2)

In [42]:
output_layer = Dense(classes_num, activation='softmax')(hidden_layer_last)

In [43]:
mlp_class = Model(inputs=[input_layer], outputs=[output_layer])
mlp_class.compile(loss='categorical_crossentropy', optimizer='adam')

## Model Training

### Checkpointing

I'm observing and saving the model for the lowest val_loss. Validation loss should be the lowest and the gap between validation loss and loss should be the small for good performance. Validation loss is more important for me, because it better indicates model's performance on unseen data, while loss shows model's performance on trained data. High val_loss and small loss indicates possible overfitting. High val_loss and high loss indicates underfitting. I want them both as small as possible. I observed that when validation loss is small, the loss is small as well - so model should work equally well on both: trained and unseen data.

In [44]:
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint(
    filepath='best_model_3_layers_2_3_quarter-long-train.keras',  # File to save the model
    monitor='val_loss',        # Monitor validation loss
    save_best_only=True,       # Save only the best weights
    mode='min',                # We are minimizing the validation loss
    verbose=1
)

### Training

In [45]:
training_history = mlp_class.fit(X_train, y_train, epochs=500, 
                                 batch_size=500, validation_split=0.2, 
                                 verbose=2, callbacks=[checkpoint], 
                                 shuffle=True, )

Epoch 1/500





Epoch 1: val_loss improved from inf to 2.20014, saving model to best_model_3_layers_2_3_quarter-long-train.keras
47/47 - 10s - 219ms/step - loss: 15.6218 - val_loss: 2.2001
Epoch 2/500

Epoch 2: val_loss did not improve from 2.20014
47/47 - 6s - 137ms/step - loss: 2.0575 - val_loss: 2.3323
Epoch 3/500

Epoch 3: val_loss did not improve from 2.20014
47/47 - 7s - 139ms/step - loss: 1.4787 - val_loss: 2.8195
Epoch 4/500

Epoch 4: val_loss improved from 2.20014 to 1.05665, saving model to best_model_3_layers_2_3_quarter-long-train.keras
47/47 - 8s - 167ms/step - loss: 1.7643 - val_loss: 1.0567
Epoch 5/500

Epoch 5: val_loss did not improve from 1.05665
47/47 - 7s - 141ms/step - loss: 1.3676 - val_loss: 1.5862
Epoch 6/500

Epoch 6: val_loss improved from 1.05665 to 0.93481, saving model to best_model_3_layers_2_3_quarter-long-train.keras
47/47 - 8s - 166ms/step - loss: 1.4097 - val_loss: 0.9348
Epoch 7/500

Epoch 7: val_loss did not improve from 0.93481
47/47 - 7s - 144ms/step - loss: 0.99

## Save training history, load best weights

In [46]:
import pickle

# Save history
with open("training_history_3_layers_2_3_quarter-long-train.pkl", "wb") as file:
    pickle.dump(training_history.history, file)

In [68]:
mlp_class.load_weights('best_model_3_layers_2_3_quarter-long-train.keras')

## Test accuracy

In [69]:
y_pred = mlp_class.predict(X_test)



[1m386/386[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 5ms/step


In [70]:
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)
accuracy = np.mean(y_pred_classes == y_test_classes) * 100
print(f"Accuracy: {accuracy:.2f}%")

Accuracy: 94.99%


In [297]:
keys = list(targets_dict.keys())
values = list(targets_dict.values())
print(targets_dict)

{'Topwear': 0, 'Bottomwear': 1, 'Accessories': 2, 'Footwear': 3, 'Personal Care': 4, 'Free Items': 5, 'Sporting Goods': 6, 'Home': 7}


# With dropout layer

## Training

In [26]:
from keras.layers import Dropout
checkpoint_dropout = ModelCheckpoint(
    filepath='dropout_best_model_3_layers_2_3_quarter-long-train-dropout.keras',  # File to save the model
    monitor='val_loss',        # Monitor validation loss
    save_best_only=True,       # Save only the best weights
    mode='min',                # We are minimizing the validation loss
    verbose=1
)

### Model

In [27]:
input_layer = Input( shape = (num_features,) )
hidden_layer_1 = Dense( num_features//2, activation='relu')(input_layer)
hidden_layer_2 = Dense( num_features//4, activation='relu')(hidden_layer_1)
hidden_layer_last = Dense( num_features//4, activation='relu')(hidden_layer_2)
dropout_layer = Dropout(0.5)(hidden_layer_last)
output_layer_dropout = Dense(classes_num, activation='softmax')(dropout_layer)
mlp_class_dropout = Model(inputs=[input_layer], outputs=[output_layer_dropout])
mlp_class_dropout.compile(loss='categorical_crossentropy', optimizer='adam')

### Training

In [28]:
training_history = mlp_class.fit(X_train, y_train, epochs=500, 
                                 batch_size=500, validation_split=0.2, 
                                 verbose=2, callbacks=[checkpoint_dropout], 
                                 shuffle=True, )

Epoch 1/500

Epoch 1: val_loss improved from inf to 0.71365, saving model to dropout_best_model_3_layers_2_3_quarter-long-train-dropout.keras
47/47 - 8s - 175ms/step - loss: 0.8735 - val_loss: 0.7137
Epoch 2/500

Epoch 2: val_loss did not improve from 0.71365
47/47 - 7s - 143ms/step - loss: 1.0558 - val_loss: 0.9830
Epoch 3/500

Epoch 3: val_loss improved from 0.71365 to 0.57719, saving model to dropout_best_model_3_layers_2_3_quarter-long-train-dropout.keras
47/47 - 9s - 197ms/step - loss: 0.8551 - val_loss: 0.5772
Epoch 4/500

Epoch 4: val_loss did not improve from 0.57719
47/47 - 7s - 144ms/step - loss: 0.6541 - val_loss: 0.6457
Epoch 5/500

Epoch 5: val_loss improved from 0.57719 to 0.56140, saving model to dropout_best_model_3_layers_2_3_quarter-long-train-dropout.keras
47/47 - 8s - 169ms/step - loss: 0.5339 - val_loss: 0.5614
Epoch 6/500

Epoch 6: val_loss improved from 0.56140 to 0.55733, saving model to dropout_best_model_3_layers_2_3_quarter-long-train-dropout.keras
47/47 - 10

## After training

In [31]:
# Save history
import pickle
with open("training_history_dropout_3_layers_2_3_quarter-long-train.keras.pkl", "wb") as file:
    pickle.dump(training_history.history, file)

In [29]:
mlp_class.load_weights("dropout_best_model_3_layers_2_3_quarter-long-train-dropout.keras")
y_pred = mlp_class.predict(X_test)

[1m 19/386[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m2s[0m 6ms/step 



[1m386/386[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 5ms/step


In [32]:
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)
accuracy = np.mean(y_pred_classes == y_test_classes) * 100
print(f"Accuracy: {accuracy:.2f}%")

Accuracy: 94.95%
