
<p align="center">
  <img src="https://storage.googleapis.com/kaggle-datasets-images/2243/3791/9384af51de8baa77f6320901f53bd26b/dataset-cover.png" />
  Image source: https://www.kaggle.com/
</p>

## Stage 1: Installing dependencies and setting up GPU environment

In [106]:
!pip install tensorflow



## Stage 2: Import dependencies for the project

In [107]:
import numpy as np
import datetime
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist  # Fashion-MNIST ships with the Keras datasets module

In [108]:
tf.__version__

'2.17.0'

## Stage 3: Dataset preprocessing



### Loading the dataset

In [109]:
# Load the Fashion-MNIST dataset: Zalando's article images, split into a training set
# of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28
# grayscale image associated with a label from 10 classes.
# fashion_mnist.load_data() downloads the data on first use and returns the training
# and test splits as NumPy arrays of images and integer labels.
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

### Image normalization

We divide each image in the training and test sets by the maximum pixel value (255).

This puts every pixel in the range [0, 1]. Inputs on a small, consistent scale generally help gradient-based training of the network (ANN) converge faster and more stably.

In [110]:
X_train = X_train / 255.0

In [111]:
X_test = X_test / 255.0
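
A small sketch of what the division does, using a synthetic `uint8` array as a stand-in for the real images (so no download is needed): raw pixels are 8-bit integers from 0 to 255, and dividing by `255.0` yields floats in [0, 1]. NumPy promotes the result to `float64`; the optional cast to `float32` is an assumption of this sketch, not something the notebook does, and simply halves the memory used.

```python
import numpy as np

# Hypothetical stand-in for two 2x2 grayscale images (uint8, like Fashion-MNIST)
raw = np.array([[[0, 128], [255, 64]],
                [[32, 0], [200, 255]]], dtype=np.uint8)

scaled = raw / 255.0  # true division promotes uint8 to float64
print(scaled.dtype, scaled.min(), scaled.max())  # float64 0.0 1.0

# Optional: float32 uses half the memory of float64
scaled32 = (raw / 255.0).astype(np.float32)
```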

In [112]:
X_train.shape  # 3D array: (number of images, height, width), one 28x28 pixel grid per image

(60000, 28, 28)

In [113]:
print(X_train)

[[[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 ...

 [[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]]


In [114]:
X_test.shape #3D Array

(10000, 28, 28)

### Reshaping of the dataset


Here, `X_train` contains 60,000 images, so it is a 3D tensor: the first dimension is the image index, telling which image it is, and the other two are the height and width of the array that holds that image's pixels.

A fully connected (Dense) network expects each sample as a 1D vector, so we flatten every image: each of the 60,000 2D arrays in `X_train` becomes a 1D vector of 28 x 28 = 784 pixels. We do this with `reshape`, and we reshape the test set in the same way.

In [115]:
# Each image is 28x28, so we reshape the whole dataset to (-1, height * width).
# -1 tells NumPy to infer that dimension from the array's size (here, 60,000 images),
# and 28*28 = 784 is the number of columns: one per pixel.
X_train = X_train.reshape(-1, 28*28)

In [116]:
X_train.shape  # 2D: (image index, flattened 784-pixel vector for each image)

(60000, 784)
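
A sketch of what `reshape(-1, 28*28)` does, on a hypothetical array of three 2x2 "images" instead of the real data. The `-1` is inferred from the total number of elements, and pixels are flattened in row-major (C) order, so each image's pixels stay together and the operation can be reversed without losing information.

```python
import numpy as np

# Hypothetical mini-dataset: 3 images of 2x2 pixels (the real data is 60000 x 28 x 28)
images = np.arange(12).reshape(3, 2, 2)

flat = images.reshape(-1, 2 * 2)  # -1 is inferred as 12 / 4 = 3
print(flat.shape)  # (3, 4)
print(flat[1])     # [4 5 6 7], the second image read row by row

# Reshaping back restores the original images exactly
restored = flat.reshape(-1, 2, 2)
print(np.array_equal(restored, images))  # True
```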

In [117]:
print(X_train)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


In [118]:
#Reshape the testing subset in the same way
X_test = X_test.reshape(-1, 28*28)

In [119]:
print(X_test)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


## Stage 4: Building an Artificial Neural network

### Defining the model

Create a `Sequential` model object; layers are then added to it one at a time.

In [120]:
model = tf.keras.models.Sequential()

### Adding the first layer (Dense layer)

Layer hyper-parameters:
- number of units/neurons: 128
- activation function: ReLU
- input_shape: (784, )

In [121]:
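# First fully connected hidden layer: 128 neurons with ReLU activation.
# input_shape describes a single sample, not the whole dataset: each image is a
# flattened vector of 784 pixels (from the preprocessing step), and the batch
# dimension (the 60,000 training images) is left out.
# Note: recent Keras versions warn about input_shape here and recommend starting the
# model with tf.keras.Input(shape=(784,)) instead; the resulting model is the same.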
model.add(tf.keras.layers.Dense(units=128, activation='relu', input_shape=(784, )))
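
A `Dense` layer computes `activation(inputs @ kernel + bias)`. The NumPy sketch below mirrors that computation for this first layer using random weights purely for illustration (Keras initializes and learns its own). The kernel has shape (784, 128) and the bias has shape (128,); the function name `dense_relu` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random parameters with the same shapes as Dense(units=128) on 784 inputs
kernel = rng.normal(scale=0.05, size=(784, 128))
bias = np.zeros(128)

def dense_relu(x, kernel, bias):
    """Fully connected layer followed by ReLU: max(0, x @ W + b)."""
    return np.maximum(0.0, x @ kernel + bias)

batch = rng.random((32, 784))  # a hypothetical batch of 32 flattened images
hidden = dense_relu(batch, kernel, bias)
print(hidden.shape)  # (32, 128)
```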

### Adding a Dropout layer to prevent overfitting



1) Overfitting: This occurs when a model becomes too complex and learns the training data too well, including the noise and irregularities. As a result, the model performs poorly on new, unseen data. It's like memorizing a test rather than understanding the underlying concepts.

2) Underfitting: This happens when a model is too simple and fails to capture the underlying patterns in the training data. It performs poorly on both the training and test data. It's like trying to explain a complex concept with overly simplistic terms.


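Dropout is a regularization technique: during training, a Dropout layer randomly sets a fraction of its inputs (the outputs of the previous layer's neurons) to zero on each step, so those neurons contribute nothing to that step's prediction and their incoming weights receive no update from it. Because the network cannot rely on any particular neuron always being present, it learns more redundant, robust features and is less likely to overfit. Dropout is only active during training; at evaluation and prediction time all neurons are used.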

In [122]:
model.add(tf.keras.layers.Dropout(0.2))  # randomly zero 20% of the previous layer's outputs on each training step
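
A sketch of inverted dropout, the scheme Keras' `Dropout` layer uses: during training each input is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)` so the expected value is unchanged; at inference the layer passes inputs through untouched. The `dropout` function and the NumPy implementation are illustrative assumptions; in Keras the mode switch is automatic (`fit` runs the layer in training mode, `evaluate` and `predict` do not).

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each element with probability `rate` during training."""
    if not training or rate == 0.0:
        return x  # inference: identity
    keep_mask = rng.random(x.shape) >= rate  # keeps about 80% of elements when rate=0.2
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(42)
activations = np.ones((1000, 128))

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
infer_out = dropout(activations, rate=0.2, training=False, rng=rng)

print((train_out == 0).mean())  # close to 0.2
print(train_out.mean())         # close to 1.0, thanks to the 1/(1-rate) scaling
```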

### Adding the second layer (output layer)

- units == number of classes (10 in the case of Fashion MNIST)
- activation = 'softmax'

In [123]:
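# Fashion-MNIST has 10 classes, so the output layer needs 10 units (one per class).
# Since each image belongs to exactly one of several classes, we use softmax, which
# outputs a probability distribution over the 10 classes. For a binary problem we
# would typically use a single unit with a sigmoid activation instead.
# Check this video to understand the reason: https://youtu.be/Y9qdKsOHRjA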

model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
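
Softmax turns the 10 raw scores (logits) of the output layer into probabilities that are non-negative and sum to 1, and the predicted class is the index with the highest probability. The pure-Python sketch below (function names and logits are illustrative) also shows why a single sigmoid unit suffices for two classes: a two-way softmax over the scores `[z, 0]` gives exactly `sigmoid(z)` for the first class.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits for one image over the 10 Fashion-MNIST classes
logits = [1.2, -0.3, 0.5, 2.8, 0.0, -1.1, 0.9, -0.4, 0.1, -2.0]
probs = softmax(logits)
print(round(sum(probs), 6))     # 1.0
print(probs.index(max(probs)))  # 3, the class with the largest logit

# Two classes: softmax([z, 0])[0] equals sigmoid(z)
z = 0.7
print(abs(softmax([z, 0.0])[0] - sigmoid(z)) < 1e-12)  # True
```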

### Compiling the model

- Optimizer: Adam
- Loss: Sparse categorical cross-entropy (softmax cross-entropy computed directly from integer labels)
- Metrics: Sparse categorical accuracy, the fraction of samples whose predicted class (the argmax of the softmax output) matches the label. Like the loss, it expects integer-encoded labels, as in Fashion-MNIST, rather than one-hot encoded vectors.



In [124]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
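
Both the loss and the metric work directly on integer labels. For one sample with true class `y`, sparse categorical cross-entropy is `-log(p[y])`, averaged over the batch, and sparse categorical accuracy is the fraction of samples whose `argmax` equals `y`. A pure-Python sketch on hypothetical predictions for 3 classes (the real model has 10):

```python
import math

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class)."""
    return sum(-math.log(p[y]) for y, p in zip(labels, probs)) / len(labels)

def sparse_categorical_accuracy(labels, probs):
    """Fraction of samples whose most probable class matches the integer label."""
    hits = sum(1 for y, p in zip(labels, probs) if p.index(max(p)) == y)
    return hits / len(labels)

labels = [0, 1]            # integer-encoded targets, like y_train
probs = [[0.7, 0.2, 0.1],  # correct: argmax is 0
         [0.1, 0.3, 0.6]]  # wrong: argmax is 2, label is 1

print(round(sparse_categorical_crossentropy(labels, probs), 4))  # 0.7803
print(sparse_categorical_accuracy(labels, probs))                # 0.5
```

With one-hot targets such as `[1, 0, 0]` you would use `categorical_crossentropy` instead; the value is the same, only the label format differs.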

In [125]:
# Summary of the layers: the first is a Dense layer with 128 neurons, the second is a
# Dropout layer to reduce overfitting, and the last is a Dense layer with 10 neurons,
# one per class.

# "Params" are the weights and biases the network learns during training.
# Non-trainable params (e.g. frozen layers or BatchNormalization statistics) are 0
# here because every parameter in this model is trained.
model.summary()
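
The parameter counts reported by `model.summary()` follow from the layer shapes: a `Dense` layer has `inputs x units` weights plus `units` biases, and `Dropout` has none. A quick check of the arithmetic for this architecture (the helper `dense_params` is illustrative):

```python
def dense_params(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units

layers = [
    ("dense (784 -> 128)", dense_params(784, 128)),
    ("dropout", 0),
    ("dense_1 (128 -> 10)", dense_params(128, 10)),
]
for name, count in layers:
    print(f"{name}: {count}")
print("total:", sum(count for _, count in layers))
# dense (784 -> 128): 100480
# dropout: 0
# dense_1 (128 -> 10): 1290
# total: 101770
```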

### Training the model

In [126]:
# epochs is the number of full passes over the training set (all images in X_train).
# In each pass, every mini-batch of images goes through the network to get predictions,
# the loss is computed, and its gradient is backpropagated to update the weights.
model.fit(X_train, y_train, epochs=7)

Epoch 1/7
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 8s 2ms/step - loss: 0.6687 - sparse_categorical_accuracy: 0.7654
Epoch 2/7
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - loss: 0.4060 - sparse_categorical_accuracy: 0.8533
Epoch 3/7
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - loss: 0.3650 - sparse_categorical_accuracy: 0.8677
Epoch 4/7
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - loss: 0.3504 - sparse_categorical_accuracy: 0.8723
Epoch 5/7
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - loss: 0.3260 - sparse_categorical_accuracy: 0.8794
Epoch 6/7
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - loss: 0.3163 - sparse_categorical_accuracy: 0.8821
Epoch 7/7
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - loss: 0.3078 - sparse_categorical_accuracy: 0.8859


<keras.src.callbacks.history.History at 0x7cb989b3f9a0>
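
The `1875/1875` in the training log is the number of gradient-update steps per epoch. `model.fit` uses a default `batch_size` of 32 when none is given, so one epoch over the 60,000 training images takes ceil(60000 / 32) steps, and 7 epochs perform 7 times as many weight updates. The same default explains the `313/313` steps shown when evaluating on the 10,000 test images.

```python
import math

n_train = 60_000
n_test = 10_000
batch_size = 32  # Keras' default when fit()/evaluate() are called without batch_size
epochs = 7

steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)                    # 1875
print(steps_per_epoch * epochs)           # 13125 weight updates in total
print(math.ceil(n_test / batch_size))     # 313 (the last batch holds only 16 images)
```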

### Model evaluation and prediction

So far, we have built the model and trained it for 7 epochs, monitoring its accuracy on the training set.

Now we evaluate the model on the test data (`X_test`, `y_test`), which it has not seen during training:

- test_loss: This represents the average loss value calculated across all samples in the test set. It quantifies how well the model's predictions match the ground truth.
- test_accuracy: This indicates the proportion of correctly classified samples in the test set. It's a measure of the model's overall performance on the test data.

In [127]:
test_loss, test_accuracy = model.evaluate(X_test, y_test)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 0.3435 - sparse_categorical_accuracy: 0.8771


In [128]:
print("Test accuracy: {}".format(test_accuracy))  # print the test accuracy returned by evaluate() in the previous cell

Test accuracy: 0.8744999766349792
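
To turn the trained model into predictions, `model.predict` returns one row of 10 softmax probabilities per image, and `argmax` over each row gives the class index, which can be mapped to the Fashion-MNIST class names. The helper `decode_predictions` is hypothetical, and the probability rows below are made-up placeholders rather than real model output.

```python
import numpy as np

# Label order used by Fashion-MNIST (index 0 to 9)
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def decode_predictions(probs):
    """Map each row of class probabilities to (class_index, class_name, confidence)."""
    probs = np.asarray(probs)
    indices = probs.argmax(axis=1)
    return [(int(i), CLASS_NAMES[i], float(probs[row, i])) for row, i in enumerate(indices)]

# In the notebook this would be: probs = model.predict(X_test[:2])
probs = np.array([
    [0.01, 0.00, 0.02, 0.01, 0.01, 0.05, 0.02, 0.08, 0.02, 0.78],
    [0.03, 0.90, 0.01, 0.02, 0.01, 0.00, 0.01, 0.00, 0.02, 0.00],
])
for idx, name, conf in decode_predictions(probs):
    print(idx, name, round(conf, 2))
# 9 Ankle boot 0.78
# 1 Trouser 0.9
```

Comparing the predicted indices with `y_test[:2]` shows whether the model classified those images correctly.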
