# Image classification model using TensorFlow

A deep neural network with an input layer, two hidden layers, one dropout layer, and an output layer is trained to classify MNIST digits.

In [1]:
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout
from sklearn.model_selection import GridSearchCV

In [2]:
params = {
    'dropout': 0.25,
    'batch-size': 128,
    'epochs': 50,
    'layer-1-size': 128,
    'layer-2-size': 128,
    'initial-lr': 0.01,
    'decay-steps': 2000,
    'decay-rate': 0.9,
    'optimizer': 'adamax'
}

In [3]:
# Load the MNIST data (28x28 grayscale images of handwritten digits)
mnist = tf.keras.datasets.mnist
num_class = 10  # digits 0-9
(x_train, y_train), (x_test, y_test) = mnist.load_data()

The training and test images need to be reshaped and normalized. Reshaping turns each 28x28 image into a 784-element vector, and normalization scales pixel intensities from the range 0-255 to the range 0-1.

In [5]:
# reshape and normalize the data
x_train = x_train.reshape(60000, 784).astype("float32")/255
x_test = x_test.reshape(10000, 784).astype("float32")/255

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, num_class)
y_test = to_categorical(y_test, num_class)
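
To make the label conversion concrete, here is a minimal plain-Python sketch of the one-hot encoding that `to_categorical` produces. The helper name `one_hot` is hypothetical and only illustrates the idea; it is not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Return one row per label with 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


# Two example digit labels (illustrative values, not taken from the dataset)
encoded = one_hot([3, 0], num_classes=10)
print(encoded[0])
```

Each label becomes a vector of length `num_class` with a single 1 at the position of the digit.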

Architecture:
1. A Flatten layer converts each input to a vector. Because the images were already reshaped to 784-element vectors above, this layer passes them through unchanged; the 784 input neurons correspond to the pixel values.
2. Two hidden dense layers are added with Dense(), taking their sizes from the "params" dictionary defined earlier and using ReLU (Rectified Linear Unit) as the activation function.
3. A dropout layer is added with Dropout() to reduce overfitting while training the network.
4. The output layer, the last layer in the network, is defined with Dense() and has one unit per class (10, one for each digit).

In [7]:
# Model Definition
# Get parameters from logged hyperparameters
model = Sequential([
  Flatten(input_shape=(784, )),
  Dense(params['layer-1-size'], activation='relu'),
  Dense(params['layer-2-size'], activation='relu'),
  Dropout(params['dropout']),
  Dense(num_class)  # one logit per class (no softmax here)
  ])
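
As a rough check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the layer sizes from `params` (128 and 128) and counts weights plus biases for each Dense layer; Flatten and Dropout have no trainable parameters. The helper `dense_params` is a hypothetical name used for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input size, two hidden layers, output layer
layer_sizes = [784, 128, 128, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(counts)
print(counts, total_params)
```

The first hidden layer holds most of the parameters because it connects to all 784 inputs.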

In [10]:
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=params['initial-lr'],
    decay_steps=params['decay-steps'],
    decay_rate=params['decay-rate']
    )
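
The schedule above can be sketched in plain Python. With its default (non-staircase) setting, `ExponentialDecay` computes `initial_learning_rate * decay_rate ** (step / decay_steps)`. The function name `exponential_decay` is hypothetical; the step count assumes 60000 training images, a batch size of 128, and 50 epochs, as in `params`.

```python
import math


def exponential_decay(step, initial_lr=0.01, decay_steps=2000, decay_rate=0.9):
    """Learning rate after `step` optimizer updates (continuous decay)."""
    return initial_lr * decay_rate ** (step / decay_steps)


steps_per_epoch = math.ceil(60000 / 128)  # batches per epoch
total_steps = steps_per_epoch * 50        # updates over 50 epochs

for step in (0, 2000, total_steps):
    print(step, round(exponential_decay(step), 6))
```

Under these assumptions the learning rate would fall from 0.01 to roughly 0.003 by the end of training. A schedule only takes effect when it is passed to an optimizer as its `learning_rate`.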

In [11]:
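# The model outputs raw logits, so the loss applies softmax internally
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Note: passing the optimizer by name uses its default learning rate, so
# lr_schedule defined above is not applied. To apply it, pass
# tf.keras.optimizers.Adamax(learning_rate=lr_schedule) instead.
model.compile(optimizer=params['optimizer'],
              loss=loss_fn,
              metrics=['accuracy'])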
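
Because the output layer has no activation, the loss is created with `from_logits=True`. The following plain-Python sketch shows the idea behind that setting: softmax turns logits into probabilities, and cross-entropy is the negative log-probability of the true class. The function names and the example logits are hypothetical, and Keras's own implementation differs in numerical details.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy_from_logits(one_hot_target, logits):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))


logits = [2.0, 0.5, -1.0]   # illustrative scores for three classes
target = [1.0, 0.0, 0.0]    # true class is index 0
probs = softmax(logits)
loss = categorical_crossentropy_from_logits(target, logits)
print(probs, loss)
```

The loss shrinks toward zero as the logit of the true class grows relative to the others.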

In [13]:
model.fit(x_train, y_train,
    batch_size=params['batch-size'],
    epochs=params['epochs'],
    validation_data=(x_test, y_test))

Epoch 1/50
469/469 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.7642 - loss: 0.8231 - val_accuracy: 0.9324 - val_loss: 0.2318
Epoch 2/50
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.9262 - loss: 0.2556 - val_accuracy: 0.9500 - val_loss: 0.1711
Epoch 3/50
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.9446 - loss: 0.1890 - val_accuracy: 0.9571 - val_loss: 0.1429
Epoch 4/50
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.9543 - loss: 0.1538 - val_accuracy: 0.9616 - val_loss: 0.1225
Epoch 5/50
469/469 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.9611 - loss: 0.1313 - val_accuracy: 0.9648 - val_loss: 0.1096
Epoch 6/50
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9659 - loss: 0.1141 - val_accuracy: 0.9688 - val_loss: 0.1003
Epoch 7/50
469/469

<keras.src.callbacks.history.History at 0x1432445b310>

In [14]:
score = model.evaluate(x_test, y_test)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9784 - loss: 0.0921


#### Tuning Hyperparameters in Neural Networks
##### Learning Rate:  
The learning rate controls how much the model's weights change in response to the error on each update. With a high learning rate the model learns quickly but may overshoot good solutions or fail to converge. With a low learning rate the model learns slowly but more stably, which often gives lower error at the cost of longer training. The learning rate can also be adjusted during training, for example by decaying it at predefined intervals. Furthermore, adaptive optimizers such as Adam scale the update for each parameter based on the gradients seen during training.
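
The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², whose minimum is at w = 0. The function `gradient_descent` and the three rates are illustrative assumptions and are unrelated to the MNIST model.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 starting from `start`; return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2
        w = w - learning_rate * grad
    return w


small = gradient_descent(0.01)      # slow but steady progress
moderate = gradient_descent(0.1)    # gets close to the minimum
too_large = gradient_descent(1.1)   # each step overshoots and diverges
print(small, moderate, too_large)
```

With the same number of steps, the small rate is still far from the minimum, the moderate rate is close to it, and the overly large rate moves away from it.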

##### Batch Size:
##### Number of Epochs:
##### Activation Function:
Common activation functions include ReLU, Sigmoid and Tanh.  
ReLU often makes training faster because it passes positive inputs through unchanged and outputs zero for negative inputs, which is cheap to compute. Sigmoid is useful for producing probabilities because it outputs a value between 0 and 1. Tanh outputs values between -1 and 1 and is useful when zero-centered activations are wanted. The choice of activation function matters because it affects how well the network trains and predicts.
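
A minimal plain-Python sketch of the three functions, using only the standard `math` module, makes their output ranges visible. The sample inputs are arbitrary.

```python
import math


def relu(x):
    """Zero for negative inputs, identity for positive inputs."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any real input into the interval (-1, 1), centered at 0."""
    return math.tanh(x)


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(sigmoid(x), 4), round(tanh(x), 4))
```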
##### Dropout:
Dropout is a technique used to reduce overfitting. During each training iteration it randomly deactivates, or "drops out", some neurons by setting their outputs to zero. This prevents neurons from relying too heavily on specific inputs, features, or other neurons, so the network learns more robust features. Dropout is applied only during training and is disabled at inference.
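
The behavior described above can be sketched in a few lines of plain Python. This is an illustration of "inverted" dropout, where kept values are scaled by 1 / (1 - rate) during training so that the expected output stays the same; the Keras `Dropout` layer applies the same scaling. The function name `dropout` and the sample activations are hypothetical.

```python
import random


def dropout(values, rate, training, rng=random):
    """Zero each value with probability `rate` when training; scale the rest."""
    if not training or rate == 0.0:
        return list(values)  # inference: values pass through unchanged
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)   # seeded so repeated runs drop the same units
activations = [1.0] * 8  # illustrative layer outputs
train_out = dropout(activations, rate=0.25, training=True, rng=rng)
infer_out = dropout(activations, rate=0.25, training=False)
print(train_out)
print(infer_out)
```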


In [None]:
# Grid of hyperparameter values for tuning
param_grid = {
    'initial-lr': [0.001, 0.01, 0.1],
    'batch-size': [32, 64, 128]
}


In [None]:
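# Grid search. GridSearchCV expects a scikit-learn compatible estimator, so the
# Keras model must first be wrapped (e.g. KerasClassifier from scikeras), with
# param_grid keys matching the wrapper's parameter names.
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(x_train, y_train)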
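
Since GridSearchCV expects a scikit-learn estimator, one alternative is a hand-written loop over the grid. The sketch below is an assumption-laden illustration, not the notebook's method: `grid_search` and `train_and_score` are hypothetical names, and `fake_score` is a placeholder that stands in for building, compiling, and fitting a model and returning its validation accuracy. Unlike GridSearchCV, this loop does no cross-validation.

```python
from itertools import product


def grid_search(param_grid, train_and_score):
    """Try every combination in param_grid; return the best params and score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        candidate = dict(zip(names, values))
        score = train_and_score(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


param_grid = {
    'initial-lr': [0.001, 0.01, 0.1],
    'batch-size': [32, 64, 128],
}


def fake_score(p):
    # Placeholder scorer for demonstration only; NOT a real training result.
    return -abs(p['initial-lr'] - 0.01) - abs(p['batch-size'] - 64) / 1000


best_params, best_score = grid_search(param_grid, fake_score)
print(best_params, best_score)
```

In real use, `train_and_score` would train a fresh model for each candidate, so the cost grows with the number of combinations (nine for this grid).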