<a href="https://colab.research.google.com/github/Koshman-Nikita/machineLearning/blob/main/Untitled2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Importing key libraries, and reading data**

In [1]:
import pandas as pd
import numpy as np

np.random.seed(1212)

import keras
from keras.models import Model
from keras.layers import *
from keras import optimizers

**Using TensorFlow backend.**

In [10]:
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

In [11]:
df_train.head() 
# 784 features, 1 label


Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


# **Splitting into training and validation dataset**

In [12]:
df_features = df_train.iloc[:, 1:785]
df_label = df_train.iloc[:, 0]

X_test = df_test.iloc[:, 0:784]

print(X_test.shape)

(28000, 784)


In [21]:
from sklearn.model_selection import train_test_split
X_train, X_cv, y_train, y_cv = train_test_split(df_features, df_label, 
                                                test_size = 0.2,
                                                random_state = 1212)

X_train = X_train.values.reshape(33600,784)
X_cv = X_cv.values.reshape(8400,784)
X_test = X_test.reshape(28000, 784)

# **Data cleaning, normalization and selection**

In [22]:
print((min(X_train[1]), max(X_train[1])))

(0, 255)


In [23]:
# Feature Normalization 
X_train = X_train.astype('float32'); X_cv= X_cv.astype('float32'); X_test = X_test.astype('float32')
X_train /= 255; X_cv /= 255; X_test /= 255

# Convert labels to One Hot Encoded
num_digits = 10
y_train = keras.utils.to_categorical(y_train, num_digits)
y_cv = keras.utils.to_categorical(y_cv, num_digits)

In [24]:
# Printing 2 examples of labels after conversion
print(y_train[0]) # 2
print(y_train[3]) # 7

[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]


# **Model Fitting**

We proceed by fitting several simple neural network models using Keras (with TensorFlow as our backend) and collect their accuracy. The model that performs the best on the validation set will be used as the model of choice for the competition.

**Model 1: Simple Neural Network with 4 layers (300, 100, 100, 200)**

In our first model, we will use the Keras library to train a neural network with the activation function set as ReLu. To determine which class to output, we will rely on the SoftMax function

In [25]:
# Input Parameters
n_input = 784 # number of features
n_hidden_1 = 300
n_hidden_2 = 100
n_hidden_3 = 100
n_hidden_4 = 200
num_digits = 10

In [26]:
Inp = Input(shape=(784,))
x = Dense(n_hidden_1, activation='relu', name = "Hidden_Layer_1")(Inp)
x = Dense(n_hidden_2, activation='relu', name = "Hidden_Layer_2")(x)
x = Dense(n_hidden_3, activation='relu', name = "Hidden_Layer_3")(x)
x = Dense(n_hidden_4, activation='relu', name = "Hidden_Layer_4")(x)
output = Dense(num_digits, activation='softmax', name = "Output_Layer")(x)

In [27]:
# Our model would have '6' layers - input layer, 4 hidden layer and 1 output layer
model = Model(Inp, output)
model.summary() # We have 297,910 parameters to estimate

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 784)]             0         
                                                                 
 Hidden_Layer_1 (Dense)      (None, 300)               235500    
                                                                 
 Hidden_Layer_2 (Dense)      (None, 100)               30100     
                                                                 
 Hidden_Layer_3 (Dense)      (None, 100)               10100     
                                                                 
 Hidden_Layer_4 (Dense)      (None, 200)               20200     
                                                                 
 Output_Layer (Dense)        (None, 10)                2010      
                                                                 
Total params: 297,910
Trainable params: 297,910
Non-trainable

In [28]:
# Insert Hyperparameters
learning_rate = 0.1
training_epochs = 20
batch_size = 100
sgd = optimizers.SGD(lr=learning_rate)

  super().__init__(name, **kwargs)


In [30]:
# We rely on the plain vanilla Stochastic Gradient Descent as our optimizing methodology
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

In [31]:
history1 = model.fit(X_train, y_train,
                     batch_size = batch_size,
                     epochs = training_epochs,
                     verbose = 2,
                     validation_data=(X_cv, y_cv))

Epoch 1/20
336/336 - 4s - loss: 1.8833 - accuracy: 0.4814 - val_loss: 1.0472 - val_accuracy: 0.7706 - 4s/epoch - 12ms/step
Epoch 2/20
336/336 - 3s - loss: 0.6589 - accuracy: 0.8315 - val_loss: 0.4629 - val_accuracy: 0.8756 - 3s/epoch - 8ms/step
Epoch 3/20
336/336 - 3s - loss: 0.4051 - accuracy: 0.8867 - val_loss: 0.3546 - val_accuracy: 0.9000 - 3s/epoch - 10ms/step
Epoch 4/20
336/336 - 2s - loss: 0.3322 - accuracy: 0.9048 - val_loss: 0.3137 - val_accuracy: 0.9111 - 2s/epoch - 7ms/step
Epoch 5/20
336/336 - 3s - loss: 0.2922 - accuracy: 0.9149 - val_loss: 0.2811 - val_accuracy: 0.9198 - 3s/epoch - 8ms/step
Epoch 6/20
336/336 - 2s - loss: 0.2634 - accuracy: 0.9229 - val_loss: 0.2622 - val_accuracy: 0.9220 - 2s/epoch - 7ms/step
Epoch 7/20
336/336 - 4s - loss: 0.2424 - accuracy: 0.9298 - val_loss: 0.2431 - val_accuracy: 0.9275 - 4s/epoch - 12ms/step
Epoch 8/20
336/336 - 2s - loss: 0.2236 - accuracy: 0.9344 - val_loss: 0.2265 - val_accuracy: 0.9346 - 2s/epoch - 7ms/step
Epoch 9/20
336/336 - 

Using a 4 layer neural network with:

20 training epochs

A training batch size of 100

Hidden layers set as (300, 100, 100, 200)

Learning rate of 0.1

Achieved a training score of around 96-98% and a test score of around 95 - 97%.

In [32]:
test_pred = pd.DataFrame(model.predict(X_test, batch_size=200))
test_pred = pd.DataFrame(test_pred.idxmax(axis = 1))
test_pred.index.name = 'ImageId'
test_pred = test_pred.rename(columns = {0: 'Label'}).reset_index()
test_pred['ImageId'] = test_pred['ImageId'] + 1

test_pred.head()



Unnamed: 0,ImageId,Label
0,1,2
1,2,0
2,3,9
3,4,9
4,5,3


In [33]:
test_pred.to_csv('mnist_submission.csv', index = False)