In [2]:
import tensorflow as tf

In [3]:
tf.__version__

'2.4.1'

In [4]:
import numpy as np
import datetime
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# <mark>`Data Preprocessing`</mark>

**<mark>`Loading data`</mark>**

In [5]:
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

<mark>**`Normalizing the images`**</mark>

We divide every pixel in the training and test sets by the maximum pixel value (255), so each pixel falls in the range [0, 1]. Normalizing the inputs this way typically helps the model (ANN) train faster and more stably.
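A minimal NumPy sketch of the normalization step, using a tiny hypothetical 1x4 "image" instead of the real dataset. Fashion-MNIST pixels are stored as `uint8`, and dividing by the float `255.0` both converts them to floating point and rescales them to [0, 1]:

```python
import numpy as np

# Hypothetical stand-in for an image: uint8 pixel values in [0, 255]
pixels = np.array([[0, 51, 128, 255]], dtype=np.uint8)

# Dividing by a float converts to float64 and rescales to [0, 1]
normalized = pixels / 255.0

print(normalized.dtype)  # float64
print(normalized)
```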


In [6]:
X_train = X_train / 255.0

In [7]:
X_test = X_test / 255.0

In [8]:
X_train.shape

(60000, 28, 28)

## <mark>`Reshaping the data`</mark>

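We are building a fully connected neural network, whose `Dense` layers expect each sample to be a 1D vector. So we reshape (flatten) every 28x28 image in the training and test sets into a vector of 784 pixels.

  * vector ---> a flattened image (784 values)
  * tensor ---> only the image tensors (`X_train`, `X_test`) are reshaped; the label tensors (`y`) already hold one integer class per image and stay as they are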
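A small NumPy sketch of what `reshape(-1, 28*28)` does, on a hypothetical batch of 5 images rather than the real data. The `-1` lets NumPy infer the number of images, and the flattening is row-major, so pixel (row, column) of an image lands at position `row * 28 + column` of its vector:

```python
import numpy as np

# Hypothetical batch of 5 images, each 28x28, filled with distinct values
images = np.arange(5 * 28 * 28).reshape(5, 28, 28)

# -1 lets NumPy infer the first dimension: (5 * 28 * 28) / 784 = 5
flat = images.reshape(-1, 28 * 28)

print(images.shape)  # (5, 28, 28)
print(flat.shape)    # (5, 784)

# Row-major flattening: row 1, column 3 of image 2 sits at index 1 * 28 + 3
print(flat[2, 1 * 28 + 3] == images[2, 1, 3])  # True
```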

In [9]:
# Each image in the dataset is 28x28 pixels.
# Reshape (flatten) every image into a single 784-element vector.

X_train = X_train.reshape(-1, 28*28)

# 28*28 = 784 is the number of pixels per image (rows x columns).
# -1 tells NumPy to infer that dimension from the array size,
# i.e. it becomes the number of images (60,000 here).


In [10]:
X_train.shape

(60000, 784)

In [11]:
X_test = X_test.reshape(-1, 28*28)

In [12]:
X_test.shape

(10000, 784)

**The shapes above confirm that every image in both sets is now flattened into a 784-element vector.**

# <mark>`Building Artificial Neural Network`</mark>
**Defining the Model**

We start by creating a `Sequential` model object; layers are then added to it one at a time with `model.add()`.

### <mark>`ANN architecture`</mark>

<img src="https://www.researchgate.net/profile/Facundo-Bre/publication/321259051/figure/fig1/AS:614329250496529@1523478915726/Artificial-neural-network-architecture-ANN-i-h-1-h-2-h-n-o.png">


**<mark>`Activation Functions`</mark>**


<img src="https://miro.medium.com/max/1200/1*ZafDv3VUm60Eh10OeJu1vw.png">

**<mark>`Dropout`</mark>**
    
    
<img src="https://www.oreilly.com/library/view/tensorflow-for-deep/9781491980446/assets/tfdl_0408.png">

In [13]:
model = tf.keras.models.Sequential()

**Adding a fully-connected Hidden layer**
Layer hyperparameters:
  * number of neurons/units: 128
  * activation function: ReLU
  * kernel (weight) initializer: He uniform
  * input_shape: (784, )

In [14]:
model.add(tf.keras.layers.Dense(units=128, activation='relu', kernel_initializer='he_uniform', input_shape=(784, )))
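The hidden layer above uses `kernel_initializer='he_uniform'`. Keras describes He uniform initialization as drawing each weight from a uniform distribution over [-limit, limit] with limit = sqrt(6 / fan_in), where fan_in is the number of inputs to the layer (784 here). A minimal NumPy sketch of that rule; `he_uniform` is a hypothetical helper and will not reproduce Keras' exact random values:

```python
import numpy as np

def he_uniform(fan_in, fan_out, seed=0):
    """Hypothetical helper: sample a (fan_in, fan_out) weight matrix with He uniform init."""
    limit = np.sqrt(6.0 / fan_in)               # limit = sqrt(6 / fan_in)
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-limit, limit, size=(fan_in, fan_out))
    return weights, limit

weights, limit = he_uniform(784, 128)
print(weights.shape)    # (784, 128)
print(round(limit, 4))  # 0.0875
```

Scaling the range by the fan-in keeps the variance of the layer's outputs roughly stable, which is why this initializer is commonly paired with ReLU.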

`ReLU` (rectified linear unit) computes max(0, x): if x > 0 it returns x, otherwise it returns 0. So positive values pass through to the next layer unchanged, while negative values are replaced by 0.
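A NumPy sketch of the activation functions tried in this notebook, applied to a few sample inputs. `leaky_relu` uses the same alpha=0.01 as the commented-out `LeakyReLU` cell; PReLU has the same form but learns alpha during training. These helpers are illustrative, not the Keras implementations:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

def relu(x):
    # max(0, x) element-wise: negatives become 0, positives pass through
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but negative inputs are scaled by a small slope instead of zeroed
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(relu(x))
print(leaky_relu(x))
print(np.tanh(x))   # tanh squashes values into (-1, 1)
print(sigmoid(x))
```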



**Adding a PReLU activation function to the model's architecture**

In [15]:
# model.add(tf.keras.layers.PReLU(alpha_initializer='zeros'))

**with tanh activation functions**

In [16]:
# model.add(tf.keras.layers.Dense(units=64, activation='tanh', input_shape=(784, )))

In [17]:
# model.add(tf.keras.layers.Dense(units=64, activation='relu',  input_shape=(784, ))) 
# # we can initialize weight by kernel_initializer='he_uniform',

**Defining a LeakyReLU activation function**

In [18]:
# from tensorflow.keras.layers import LeakyReLU

# leaky_relu = LeakyReLU(alpha=0.01)
# model.add(tf.keras.layers.Dense(units=64, activation=leaky_relu, input_shape=(784, )))

In [19]:
# model.add(tf.keras.layers.Dense(units=32, activation='relu',  input_shape=(784, )))

# # we can also initialize weight by kernel_initializer='he_uniform',

**Adding a second layer: Dropout**
>Dropout is a regularization technique: during training, it randomly sets a fraction of a layer's outputs (here 20%) to zero at each step. The dropped neurons contribute nothing to that step, so their weights are not updated by it. Because the network cannot rely on any single neuron, it is less likely to overfit, although training can take longer to converge. At inference time dropout is disabled.
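Keras documents its `Dropout` layer as zeroing inputs with frequency `rate` during training and scaling the remaining inputs by 1/(1 - rate), so the expected sum is unchanged; at inference it passes inputs through untouched. A minimal NumPy sketch of that "inverted dropout" idea; `dropout` is a hypothetical helper, not the Keras implementation:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Hypothetical inverted-dropout helper."""
    if not training:
        # At inference time dropout does nothing
        return x
    keep = rng.random(x.shape) >= rate             # keep each unit with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)   # scale survivors so the expected sum is unchanged

rng = np.random.default_rng(42)
activations = np.ones((1, 10))
print(dropout(activations, rate=0.2, training=True, rng=rng))   # mix of 0.0 and 1.25
print(dropout(activations, rate=0.2, training=False, rng=rng))  # unchanged
```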

In [20]:
model.add(tf.keras.layers.Dropout(0.2))

**Adding the output layer**
  * units: number of classes (10 in the Fashion MNIST dataset)
  * activation: softmax

In [21]:
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))

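`Softmax` turns the raw outputs (logits) of the last layer into a probability distribution over the 10 classes: every value becomes positive, the values sum to 1, and the largest logit gets the largest probability. For example, if the last layer outputs [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1], softmax assigns almost all of the probability (about 0.999) to the class at index 4. It does not produce a one-hot vector by itself; the predicted class is the index of the highest probability (its argmax), and the full probabilities are what the cross-entropy loss uses.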
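A NumPy sketch of softmax applied to the example logits from the paragraph above. Subtracting the maximum first is a standard trick to avoid overflow in `exp` and does not change the result:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; mathematically the result is unchanged
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Example logits for the 10 Fashion-MNIST classes
logits = np.array([0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1])
probs = softmax(logits)

print(probs.round(4))
print(probs.sum())             # 1.0 up to floating-point rounding
print(int(np.argmax(probs)))   # 4
```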

**Output layer with Sigmoid activation function**

In [22]:
# model.add(tf.keras.layers.Dense(units=10, activation='sigmoid'))

**Compiling the model**
  * Optimizer: Adam
  * Loss: sparse categorical cross-entropy (the labels are integer class IDs 0-9, not one-hot vectors)
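Sparse categorical cross-entropy takes the probability the model assigns to each sample's true class and returns its negative log, averaged over the batch; "sparse" means the labels are plain integers rather than one-hot vectors. A small NumPy sketch with made-up probabilities (4 classes instead of 10 for brevity):

```python
import numpy as np

# Hypothetical softmax outputs for a batch of 3 images (rows) over 4 classes
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])
# Sparse labels: one integer class ID per image
labels = np.array([0, 1, 3])

# Pick the probability of the true class in each row, take -log, then average
per_sample = -np.log(probs[np.arange(len(labels)), labels])
loss = per_sample.mean()
print(per_sample)
print(loss)
```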

**Setting up a learning-rate schedule: initial_learning_rate, decay_steps, decay_rate**

In [33]:
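# Exponential decay: lr(step) = initial_learning_rate * decay_rate ** (step / decay_steps)
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=10000,
    decay_rate=0.0001)

## decay_steps: number of steps over which the learning rate is multiplied by decay_rate once
## decay_rate: multiplicative decay factor (this is a decay schedule, not a warmup)
## Note: lr_schedule is not passed to the Adam optimizer below, which uses a fixed learning rate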
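TensorFlow's documentation gives the (non-staircase) `ExponentialDecay` formula as `initial_learning_rate * decay_rate ^ (step / decay_steps)`. A pure-Python sketch of that formula with the values above shows how aggressive `decay_rate=0.0001` is; `exponential_decay` is a hypothetical helper, and the 1,875 steps per epoch assume the default `model.fit` batch size of 32 on 60,000 images:

```python
def exponential_decay(step, initial_learning_rate=1e-2, decay_steps=10000, decay_rate=0.0001):
    # Same formula as ExponentialDecay with staircase=False
    return initial_learning_rate * decay_rate ** (step / decay_steps)

# 60,000 images / batch size 32 = 1,875 steps per epoch
for step in [0, 1875, 5000, 10000]:
    print(step, exponential_decay(step))
```

With these settings the learning rate drops from 0.01 to about 0.0018 after one epoch and to 1e-6 after 10,000 steps (about 5.3 epochs).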

**Setting up Learning rate of optimizer**

In [24]:
# Adam optimizer with a fixed learning rate of 0.001;
# beta_1, beta_2 and epsilon below are the Keras default values
adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07)

**Compiling the model**

In [25]:
model.compile(optimizer=adam, loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])

**Trying the SGD (Stochastic Gradient Descent) optimizer**

In [26]:
# Defining an SGD optimizer and assigning its learning rate
# from tensorflow.keras.optimizers import SGD

# # decay=0.01
# # With learning rate decay, the learning rate is recalculated at each update (e.g. after each mini-batch) as follows:
# # lrate = initial_lrate * (1 / (1 + decay * iteration))

# opt = SGD(learning_rate=0.01, momentum=0.9, decay=0.01)
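The time-based decay formula quoted in the commented-out SGD cell can be checked directly in plain Python; `time_based_decay` is a hypothetical helper that evaluates `initial_lrate * (1 / (1 + decay * iteration))` with the cell's values (learning rate 0.01, decay 0.01):

```python
def time_based_decay(initial_lrate, decay, iteration):
    # lrate = initial_lrate * (1 / (1 + decay * iteration))
    return initial_lrate * (1.0 / (1.0 + decay * iteration))

for iteration in [0, 1, 10, 100, 1000]:
    print(iteration, time_based_decay(0.01, 0.01, iteration))
```

Unlike exponential decay, this schedule shrinks the rate slowly: it halves only after 100 updates and is about 0.0009 after 1,000.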


**Checking the architecture of our model**

In [27]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
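The parameter counts in the summary follow from the layer sizes: a `Dense` layer has one weight per (input, unit) pair plus one bias per unit, and `Dropout` has no parameters. A quick check of the arithmetic; `dense_params` is a hypothetical helper:

```python
def dense_params(n_inputs, n_units):
    # weights (n_inputs * n_units) + biases (n_units)
    return n_inputs * n_units + n_units

hidden = dense_params(784, 128)   # 784*128 + 128
output = dense_params(128, 10)    # 128*10 + 10
print(hidden, output, hidden + output)  # 100480 1290 101770
```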


**<mark>`Train the model`</mark>**

In [28]:
model.fit(X_train, y_train, epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x1f5e3aa7700>

In [29]:
X_train.shape

(60000, 784)

**Model evaluation and prediction**

In [30]:
test_loss, test_accuracy = model.evaluate(X_test, y_test)



In [31]:
test_loss

0.33483272790908813

In [32]:
test_accuracy

0.8883000016212463
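The `sparse_categorical_accuracy` metric compares the index of the highest predicted probability with the integer label, and the same argmax step turns predicted probabilities into class predictions. A NumPy sketch with made-up probabilities (3 classes instead of 10 for brevity):

```python
import numpy as np

# Hypothetical predicted probabilities for 4 test images over 3 classes
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])
y_true = np.array([0, 2, 0, 1])

# Predicted class = index of the highest probability in each row
y_pred = np.argmax(probs, axis=1)
accuracy = np.mean(y_pred == y_true)
print(y_pred, accuracy)   # 2 of 4 correct
```

With the trained model, the analogous call would be `np.argmax(model.predict(X_test), axis=1)`, and comparing it with `y_test` should give an accuracy close to `test_accuracy` above.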

# <mark>`Final Overview`</mark>
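  We tried several activation functions and optimizers with different learning rates.

  * ReLU gave the best results for the hidden layers in every architecture we tried.
  * Adam was the best optimizer, with a learning rate of 0.01 (note that the final run above uses Adam with a fixed learning rate of 0.001).
  * Softmax is used in the output layer for multi-class classification.
  * Sigmoid is used in the output layer for binary classification (e.g. 0 or 1, yes or no, apple or banana).
  * `kernel_initializer` is the Keras argument that sets the weight initializer.
  * `decay_steps` is the number of steps over which the learning rate is multiplied by `decay_rate` once (it is not a warmup length).
  * `decay_rate` is the multiplicative decay factor (it is not a warmup learning rate).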

### <mark>`Callback to Control Training`</mark>
> If you set epochs to 10 but your desired accuracy is reached at epoch 6, how can you stop training early? With a callback.

When you train for extra epochs, each one takes time and the loss may keep changing, sometimes for the worse. It would be nice to stop training as soon as a desired value is reached: if 95% accuracy is enough for you and you reach it after 3 epochs, why wait for many more? Like many other programs, Keras lets you hook into the training loop with callbacks. Let's see them in action...

In [None]:
import tensorflow as tf
print(tf.__version__)

class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}
    loss = logs.get('loss')
    # Stop as soon as the training loss for this epoch drops below 0.4
    if loss is not None and loss < 0.4:
      print("\nLoss dropped below 0.4 so cancelling training!")
      self.model.stop_training = True

callbacks = myCallback()
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
# Normalize pixel values to [0, 1]
training_images = training_images / 255.0
test_images = test_images / 255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),  # flattens each 28x28 image into a 784-element vector
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(training_images, training_labels, epochs=5, callbacks=[callbacks])