### Preliminaries

In [None]:
import pandas as pd
import numpy as np
from numpy import save

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier

import keras
from keras.utils import to_categorical
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Conv1D, MaxPooling1D
# BatchNormalization and LeakyReLU are exported from keras.layers directly;
# the older submodule paths are not available in recent Keras releases
from keras.layers import BatchNormalization, LeakyReLU

In [None]:
## load  processed data
training_data = np.load("training_data.npy", allow_pickle = True)
labels = np.load("labels.npy", allow_pickle = True)
training_data.shape, labels.shape

### 2D Convolutional Neural Network 

Now for the moment we have all been waiting for: let's apply a Convolutional Neural Network. Here we use 2D convolutional layers, which are mainly used for image classification, but by reshaping our input we can also reach a high accuracy (more than 90%) on this data. This model was inspired by this well-written [tutorial](https://www.datacamp.com/community/tutorials/convolutional-neural-networks-python) on DataCamp.

First of all, let's assign the data to new variables so that everything stays nice and clean.

In [None]:
# change name of variables 
train_X = training_data
#test_X = test_data
train_Y = labels

As I said before, we need to reshape the input; NumPy's [reshape function](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html) can help us.

In [None]:
# reshape in four dimensions for input CNN
train_X = train_X.reshape(-1, 99,13, 1)
#test_X = test_X.reshape(-1, 99,13, 1)
train_X.shape  # test_X is commented out above, so only train_X is checked
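
To make the reshape step concrete, here is a minimal sketch on a dummy array. The array `dummy` and its 5 samples are assumptions for illustration only; the real `training_data` has its own number of samples.

```python
import numpy as np

# Hypothetical stand-in for training_data: 5 samples, each a 99 x 13 array.
dummy = np.zeros((5, 99, 13))

# -1 lets NumPy infer the number of samples from the total size;
# the trailing 1 is the single "channel" axis that Conv2D expects.
reshaped = dummy.reshape(-1, 99, 13, 1)
print(reshaped.shape)
```

The data itself is untouched; only the way it is indexed changes, so each sample now looks like a one-channel image.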

In [None]:
# convert the data type to float32
train_X = train_X.astype('float32')
#test_X = test_X.astype('float32')

With a CNN we predict a probability for each possible class for each example, so we need to transform our target once more. We will use the [to_categorical](https://keras.io/utils/) function. For each target, this function creates a vector with as many positions as there are classes. Positions are indexed from 0 to (number of classes - 1), and the *i*th position represents the *i*th class.

In [None]:
# transform the labels
train_Y_one_hot = to_categorical(train_Y)
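
As a rough idea of what one-hot encoding produces, here is a plain NumPy sketch. The helper `one_hot` is hypothetical and written only for illustration; it is not the Keras implementation, and it assumes the labels are integers starting at 0.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # For row i, switch on the column given by labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([0, 2, 1], num_classes=3)
print(example)
```

Each row sums to 1, which is what the softmax output of the network is compared against during training.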

For example, here we can see that the first target is the 22nd class: the position with index 21 (counting from 0) holds a 1, while all the others hold 0.

In [None]:
train_Y_one_hot[0]
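
To read the class index back from a one-hot row, `np.argmax` is enough. The row below is built by hand as an assumed example with 35 classes and a 1 at index 21; it is not taken from the real data.

```python
import numpy as np

# Hand-built one-hot row: 35 classes, class index 21 switched on.
row = np.zeros(35, dtype="float32")
row[21] = 1.0

# argmax returns the position of the largest value, i.e. the class index.
class_index = int(np.argmax(row))
print(class_index)
```

The same call with `axis=1` turns a whole matrix of predicted probabilities into predicted class indices.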

Before running the model, we of course need to create a validation set.

In [None]:
## CREATE THE VALIDATION SET 
train_X,valid_X,train_label,valid_label = train_test_split(train_X, train_Y_one_hot, test_size=0.2, 
                                                           random_state=13)
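
For intuition, a hold-out split amounts to shuffling the row indices and slicing them. The sketch below uses only NumPy; `holdout_split` is a hypothetical helper, and it will not select the same rows as `train_test_split` even with the same seed.

```python
import numpy as np

def holdout_split(X, y, test_size=0.2, seed=13):
    """Shuffle the row indices, then cut off a validation slice."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_valid = int(np.ceil(len(X) * test_size))
    valid_idx, train_idx = order[:n_valid], order[n_valid:]
    return X[train_idx], X[valid_idx], y[train_idx], y[valid_idx]

# Toy data: 10 samples whose label equals their single feature value.
X_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10)
X_tr, X_va, y_tr, y_va = holdout_split(X_demo, y_demo)
print(X_tr.shape, X_va.shape)
```

Features and labels are indexed with the same shuffled indices, so every sample keeps its own label after the split.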

One last check.

In [None]:
# check all the shape
train_X.shape,valid_X.shape,train_label.shape,valid_label.shape

#### First attempt

As you might know, before running CNN we need to set up some hyperparameters.

In [None]:
# set up hyperparameters 
batch_size = 64
epochs = 1
num_classes = 35 # fix
np.random.seed(222)

Here is our model: three Conv2D layers and a fully connected layer before the output layer.

In [None]:
# set up the layers 
fashion_model = Sequential()

fashion_model.add(Conv2D(32, kernel_size=(3, 3),activation='linear',input_shape=(99,13,1),padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D((2, 2),padding='same'))

fashion_model.add(Conv2D(64, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))

fashion_model.add(Conv2D(128, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))                  
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))

fashion_model.add(Flatten())
fashion_model.add(Dense(128, activation='linear'))
fashion_model.add(LeakyReLU(alpha=0.1))  
fashion_model.add(Dense(num_classes, activation='softmax'))

In [None]:
fashion_model.compile(loss=keras.losses.categorical_crossentropy, 
                      optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
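
As a rough illustration of what this loss measures, here is a NumPy sketch of softmax followed by categorical cross-entropy for a single example. The function names are hypothetical, and this is a simplified version, not the Keras implementation.

```python
import math
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability assigned to the true class."""
    return float(-np.sum(one_hot_target * np.log(probabilities)))

# A model that knows nothing yet gives every one of the 35 classes the same score.
probs = softmax(np.zeros(35))
target = np.zeros(35)
target[21] = 1.0
loss = categorical_crossentropy(target, probs)
print(loss, math.log(35))
```

With uniform predictions the loss equals the natural log of the number of classes, so a loss well below that value means the model has started to learn something.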

In [None]:
## check the summary
fashion_model.summary()
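
The shapes and parameter counts in the summary can be worked out by hand. The sketch below assumes the first-attempt architecture (3x3 kernels, `padding='same'`, 2x2 pooling, a 128-unit dense layer and 35 classes); with `'same'` padding a convolution keeps height and width, and each 2x2 pooling halves them, rounding up.

```python
import math

def pooled(size, pool=2):
    """Output length of a pooling layer with padding='same'."""
    return math.ceil(size / pool)

height, width, channels = 99, 13, 1
params = 0
for filters in (32, 64, 128):
    # 3x3 kernel weights for every input/output channel pair, plus one bias per filter.
    params += 3 * 3 * channels * filters + filters
    height, width, channels = pooled(height), pooled(width), filters

flat = height * width * channels  # size of the Flatten output
params += flat * 128 + 128        # Dense(128)
params += 128 * 35 + 35           # Dense(num_classes) with 35 classes
print((height, width, channels), flat, params)
```

Most of the parameters sit in the first dense layer, because it connects every flattened feature to each of its 128 units.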

Let's run it. Here too, I use just one epoch for the sake of the example; you should try at least 20 epochs.

In [None]:
## train and evaluate accuracy on the validation set
fashion_train = fashion_model.fit(train_X, train_label, batch_size=batch_size,epochs=epochs,
                                  verbose=1,validation_data=(valid_X, valid_label))

In conclusion, with this model you should reach about 85% validation accuracy. Not bad, eh? However, we can go much further.

#### Second attempt

Here we try to control overfitting by adding Dropout layers, and we hope to improve accuracy. Spoiler alert: we did!

To put it simply, during training a random fraction of a layer's outputs is ignored, or "dropped out", at each update. The number of parameters stays the same, but the network cannot rely too heavily on any single unit, and as a result we can control overfitting.
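
The idea can be sketched in a few lines of NumPy. This is a simplified "inverted dropout" written for illustration, with a hypothetical `dropout` helper; it is not the Keras layer itself. Surviving outputs are scaled up by `1 / (1 - rate)` so the expected value stays the same, and at prediction time nothing is dropped.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero out roughly `rate` of the values and rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((4, 5), dtype="float32")
dropped = dropout(x, rate=0.25, rng=rng)
print(dropped)
```

Every entry of `dropped` is either 0 or 1 / 0.75, and a different random mask is drawn on every call.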

In [None]:
# set up hyperparameters 
batch_size = 124
epochs = 1
num_classes = 35 # fix
np.random.seed(222)

In [None]:
## set up the dropout to improve accuracy
fashion_model = Sequential()
fashion_model.add(Conv2D(32, kernel_size=(3, 3),activation='linear',input_shape=(99,13,1),padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D((2, 2),padding='same'))
fashion_model.add(Dropout(0.25))

fashion_model.add(Conv2D(64, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Dropout(0.25))

fashion_model.add(Conv2D(128, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))                  
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Dropout(0.4))

fashion_model.add(Flatten())
fashion_model.add(Dense(128, activation='linear'))
fashion_model.add(LeakyReLU(alpha=0.1))  
fashion_model.add(Dropout(0.3))
fashion_model.add(Dense(num_classes, activation='softmax'))

In [None]:
fashion_model.compile(loss=keras.losses.categorical_crossentropy, 
                      optimizer=keras.optimizers.Adam(),
                      metrics=['accuracy'])

In [None]:
fashion_model.summary()

In [None]:
fashion_train = fashion_model.fit(train_X, train_label, 
                                  batch_size=batch_size,epochs=epochs,verbose=1,
                                  validation_data=(valid_X, valid_label))

In conclusion, with this model you should reach at least 90% accuracy once you increase the number of epochs, say to somewhere between 20 and 50.
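
When training for many epochs, it helps to check which epoch gave the best validation accuracy. The object returned by `fit` has a `history` dictionary with one list per metric; the sketch below works on such a dictionary. The helper `best_epoch` is hypothetical, the numbers are made up for illustration, and the key is assumed to be `val_accuracy` (older Keras versions use `val_acc`).

```python
def best_epoch(history):
    """Return (1-based epoch number, score) for the best validation accuracy."""
    key = "val_accuracy" if "val_accuracy" in history else "val_acc"
    scores = history[key]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]

# Made-up values, only to show the shape of the data; not real results.
made_up_history = {"val_accuracy": [0.61, 0.78, 0.84, 0.83]}
epoch, score = best_epoch(made_up_history)
print(epoch, score)
```

With the real model you would call `best_epoch(fashion_train.history)` after training.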