<a href="https://colab.research.google.com/github/ashikshafi08/Learning_Tensorflow/blob/main/Other%20Courses/Getting_Started_with_TensorFlow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook contains all the materials and notes for the Getting Started with TensorFlow 2 Course by Imperial College London. 



In [None]:
# Importing TensorFlow 
import tensorflow as tf 
import numpy as np 
import matplotlib.pyplot as plt 

## The Sequential model API 

### Build a Sequential Model 

The `Sequential` class is an easy and intuitive way to construct deep learning models. Most of the neural networks we work with can be built with it. 

A `Sequential` model is defined by a list of Keras layers that are applied one after another. 

In [None]:
# Importing the layers we're going to use 

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense , Flatten , Softmax 

Build a feedforward neural network model

In [None]:
model = tf.keras.Sequential([
  Flatten(input_shape = (28 , 28)) , # Explicitly specifying the input_shape (to build the model) 
  Dense(16 , activation = 'relu') , 
  Dense(10 , activation = 'relu'), 
  Dense(10 , activation= 'relu') , 
  #Dense(10 , activation = 'sigmoid')
  tf.keras.layers.Activation('sigmoid')
])

# Getting the model summary 
model.summary()
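
The parameter counts that `model.summary()` reports for the feedforward model above can be checked by hand. A `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, while `Flatten` and `Activation` have no parameters. A minimal sketch in plain Python (the helper name `dense_params` is made up for illustration):

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

# Flatten turns each 28 x 28 input into a vector of 784 values
flat = 28 * 28

# Input size followed by the number of units of each Dense layer
layer_sizes = [flat, 16, 10, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)        # parameters per Dense layer
print(sum(counts))   # total trainable parameters
```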

### Convolutional and Pooling Layers in TensorFlow 

Previously we built our models from fully connected (feedforward) layers. Now we will use convolutional and pooling layers to build our model.

In [None]:
# Importing the needed packages 
from tensorflow.keras.layers import Flatten, Dense , Conv2D , MaxPooling2D

In [None]:
# Building a Convolutional Model 
model = tf.keras.Sequential([
  Conv2D(filters= 16 , kernel_size= 3 , 
         activation = 'relu' , input_shape = (32 , 32 , 3)) , 
  MaxPooling2D(pool_size= 3) , 
  Flatten() , 
  Dense(64 ,  activation= 'relu'), 
  Dense(10 , activation= 'softmax')
])

# Getting the summary of the model 
model.summary()

In [None]:
# Build the Sequential convolutional neural network model

model = Sequential([
    Conv2D(32 , kernel_size=3 , padding = 'SAME' , strides = 2 , input_shape = (224 , 224, 3)) , 
    MaxPooling2D(3), 
    Conv2D(16 , 3 , 2),
    Flatten(),
    Dense(30 , activation = 'relu'),
    Dense(10 , activation = 'sigmoid')
])

# Summary of the model 
model.summary()
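
The output shapes in the two convolutional summaries above follow from simple arithmetic on the kernel size, stride and padding. The sketch below is plain Python, not a Keras API: the helper `conv_out` is a made-up name, and it assumes the usual rules that `'valid'` padding gives `floor((size - kernel) / stride) + 1` and `'same'` padding gives `ceil(size / stride)`.

```python
import math

def conv_out(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling window along one axis."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

# First convolutional model: 32 x 32 x 3 input
s = conv_out(32, kernel=3)             # Conv2D(16, 3), 'valid' padding
s = conv_out(s, kernel=3, stride=3)    # MaxPooling2D(3): stride defaults to the pool size
flat_first = s * s * 16                # Flatten: height * width * channels

# Second model: 224 x 224 x 3 input
t = conv_out(224, kernel=3, stride=2, padding="same")   # Conv2D(32, 3, padding='SAME', strides=2)
t = conv_out(t, kernel=3, stride=3)                     # MaxPooling2D(3)
t = conv_out(t, kernel=3, stride=2)                     # Conv2D(16, 3, 2): third argument is strides
flat_second = t * t * 16

print(flat_first, flat_second)
```

The two printed values are the lengths of the vectors that `Flatten` passes to the first `Dense` layer of each model.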

### Weight and bias initializers 
This section covers the different ways to initialize the weights and biases in the layers of a neural network.

#### Default weight and biases
In the models we have built so far, we have not specified the **initial values of the weights and biases** in each layer. 

TensorFlow sets default initial values that depend on the type of layer we are using. 

For instance, 
- In `Dense` layer the **biases** are set to zero (`zeros`) by default. 
- While the **weights** are set according to the `glorot_uniform`, or the Glorot uniform initializer. 
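
To make the default concrete: the Glorot uniform initializer draws weights from a uniform distribution on `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`. The following is a NumPy sketch of that rule, not the Keras implementation; the function name and the seed are chosen here for illustration.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out)), limit

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily for reproducibility
weights, limit = glorot_uniform(784, 16, rng)
biases = np.zeros(16)            # the default bias initializer: all zeros

print(weights.shape, limit)
```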



#### Initializing your own weights and biases 

We can even initialize our own weights and biases, and TensorFlow makes the process quite straightforward. 

This is done through two optional arguments available on each layer that has trainable parameters: 
- `kernel_initializer` for the weights. 
- `bias_initializer` for the biases. 

Note: `MaxPooling` layers have no weights or biases, so they do not take these arguments. Passing them raises an error. 

Let's initialize the weights and biases by ourselves. 

In [None]:
# Importing again (for practice)
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Flatten, Dense , Conv2D , MaxPooling2D

In [None]:
# Constructing a model (with manual weight and bias initializer)

model = Sequential([
  Conv2D(16 , 3 , kernel_initializer='random_uniform' , 
         bias_initializer = 'zeros' , activation = 'relu' 
         , input_shape = (224 ,224 , 3)), 
  MaxPooling2D(3) , 
  Flatten(), 
  Dense(64, kernel_initializer= 'he_uniform' , bias_initializer='ones' , 
        activation = 'relu'), 
  
])

We can also instantiate initializers as objects, which lets us set the optional arguments of the initialization method. 

- https://keras.io/api/layers/initializers/
- https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/initializers

In [None]:
# Slightly different method for more flexibility 

model = Sequential([
  Conv2D(16 , 3 , kernel_initializer='random_uniform' , 
         bias_initializer = 'zeros' , activation = 'relu' 
         , input_shape = (224 ,224 , 3)), 
  MaxPooling2D(3) , 
  Flatten(), 
  Dense(64, kernel_initializer= 'he_uniform' , bias_initializer='ones' , 
        activation = 'relu'), 
  Dense(64 , kernel_initializer= tf.keras.initializers.TruncatedNormal(mean = 0.0 , stddev= 0.05) , 
        bias_initializer = tf.keras.initializers.Zeros() , 
        activation = 'relu') , 
  Dense(8 , kernel_initializer= tf.keras.initializers.Orthogonal(gain = 1.0 , seed = None) , 
        bias_initializer = tf.keras.initializers.Constant(value = 0.4) , 
        activation = 'relu')
])

# Getting the summary of the model 
model.summary()

#### Custom weight and bias initializers 

It's also possible to define your own weight and bias initializers. 

A custom initializer must be a callable that takes two arguments: 
- the `shape` of the tensor to be initialized. 
- the `dtype`.

In [None]:
import tensorflow.keras.backend as K 

In [None]:
# Define a custom initializer 

def my_init(shape , dtype = None):
  return K.random_normal(shape , dtype = dtype)

# Checking how our initializer works
model = Sequential([
  Flatten(input_shape = (28 , 28) ) , 
  Dense(64 , kernel_initializer= my_init)
])

# Summary of the model 
model.summary()

In [None]:
model.weights[:10]

We can also visualize the initialized weights and biases to see the effect of the initializers.

In [None]:
# Filter out the pooling and flatten layers, because they don't have any weights
weight_layers = [layer for layer in model.layers if len(layer.weights) > 0]

weight_layers

In [None]:
# Plot histograms of weight and bias values 

fig , axes = plt.subplots(5 , 2 , figsize = (12 , 16))
fig.subplots_adjust(hspace= 0.5 , wspace = 0.5)

for i , layer in enumerate(weight_layers):
  for j in [0 , 1]:
    axes[i , j].hist(layer.weights[j].numpy().flatten() , align = 'left')
    axes[i , j].set_title(layer.weights[j].name)

### Compiling our model 
We saw how to build a model, but before we can train it on our data we need to specify: 
- a loss function 
- an optimizer 
- one or more metrics 

We pass all three to `compile` to get the model ready for training.  

In [None]:
# Let's build a simple binary classification model 

model = Sequential([
  Dense(64 , activation= 'elu' , input_shape = (32,)) , 
  Dense(1 , activation = 'sigmoid')
])

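# Now our important step is to compile our model 
# The output layer already applies a sigmoid, so the loss receives probabilities (from_logits = False)
model.compile(loss = tf.keras.losses.BinaryCrossentropy(from_logits= False) , 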
              optimizer = tf.keras.optimizers.SGD(learning_rate= 0.001 , momentum= 0.9 , 
                                                  nesterov = True) , 
              metrics = [tf.keras.metrics.BinaryAccuracy(threshold= 0.7) , 
                         tf.keras.metrics.MeanAbsoluteError()])

Passing the loss, optimizer and metrics as objects rather than strings gives us more flexibility to set their parameters. 
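
One parameter worth understanding is `from_logits` on `BinaryCrossentropy`. It tells the loss whether the model output is a raw score (a logit) or already a probability. With a `sigmoid` output layer the outputs are probabilities, so `from_logits` should be `False`. The NumPy sketch below, with made-up scores, illustrates that both routes give the same loss when each is used with the matching kind of input. It is a sketch of the formulas, not the Keras implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probs(y_true, p):
    """Binary cross-entropy when the model already outputs probabilities."""
    return np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def bce_from_logits(y_true, z):
    """Same loss computed directly from raw scores, in a numerically stable form."""
    return np.mean(np.maximum(z, 0.0) - z * y_true + np.log1p(np.exp(-np.abs(z))))

y_true = np.array([0.0, 1.0, 1.0])
logits = np.array([-0.4, 1.4, -0.8])   # made-up raw scores

loss_a = bce_from_probs(y_true, sigmoid(logits))
loss_b = bce_from_logits(y_true, logits)
print(loss_a, loss_b)
```

Feeding probabilities to a loss that expects logits applies the sigmoid a second time, which distorts the loss.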

In [None]:
# Getting the summary of our model 
model.summary()

In [None]:
# What's stored in the model 
print(model.optimizer)
print(model.loss)
print(model.metrics)

In [None]:
# Printing the learning rate of the optimizer 
model.optimizer.learning_rate

#### **Metrics in Keras**
 
We will explore the different metrics in Keras. 

One of the most common metrics for classification problems in Keras is `accuracy`. 

In [None]:
# Compile the model 
model.compile(loss = 'sparse_categorical_crossentropy' , 
              optimizer = 'Adam' , 
              metrics = ['accuracy'])

We now have a model that uses accuracy as a metric to judge its performance. 

But how is this metric actually calculated? We can break this into two cases. 

**Case 1 - Binary Classification with sigmoid activation function**

We are training a model for a binary classification problem (cat or dog) with a sigmoid activation function in the output layer. 

- Given an input, the model outputs a float between 0 and 1. 
- This float is compared with the `threshold` of the accuracy metric (0.5 by default). 
- Values above the threshold become 1 and the rest become 0, which gives the predicted class `y_pred`.

The accuracy metric then compares 
- the value `y_pred` for each training example 
- with the true label `y_true`, which is either 0 or 1. 

Finally, the accuracy is the mean of $\delta(y^{(i)}_{pred}, y^{(i)}_{true})$ over all training examples, where $\delta$ is 1 when its two arguments are equal and 0 otherwise.


In [None]:
# Let's see in code how it's implemented (Sigmoid Function)

y_true = tf.constant([0.0 , 1.0 , 1.0]) # the class 
y_pred = tf.constant([0.4 , 0.8 , 0.3]) # the prediction prob of y_true 

# Accuracy
accuracy = K.mean(K.equal(y_true , K.round(y_pred))) 
print(f'The overall accuracy (taking mean over all the examples): {accuracy} ')
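
Rounding corresponds to a threshold of 0.5. A NumPy sketch of the same computation with an adjustable threshold is shown below. The function is written for illustration and the probabilities are made up; the strict `>` comparison is an assumption about how the threshold is applied.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predicted_class = (y_pred > threshold).astype(y_true.dtype)
    return np.mean(predicted_class == y_true)

y_true = np.array([0.0, 1.0, 1.0])
y_pred = np.array([0.4, 0.6, 0.8])   # made-up probabilities

acc_default = binary_accuracy(y_true, y_pred)                 # threshold 0.5
acc_strict = binary_accuracy(y_true, y_pred, threshold=0.7)   # stricter threshold
print(acc_default, acc_strict)
```

With the stricter threshold the second example (0.6) is no longer counted as class 1, so the accuracy drops.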


**Case 2 - Categorical Classification** 

Imagine we want to train a model for a classification problem with more than two classes (dog breeds). We use an activation function called `softmax` in the last layer. 

Given a training example $x^{(i)}$, the model outputs a tensor of probabilities $p_1, p_2, \dots, p_m$, one for each of the $m$ classes that $x^{(i)}$ could fall into. 

Here the accuracy metric works a bit differently. Rather than comparing rounded values with `y_true`, 
- it finds the index of the largest value in the `y_pred` tensor of each sample. 
- It then compares this index with the index of the maximum value of $y^{(i)}_{true}$ to determine $\delta(y^{(i)}_{pred}, y^{(i)}_{true})$. 
- It then computes the accuracy in the same way as in the binary classification case. 

*Note*: The accuracy of a binary classification problem is the same whether we use a sigmoid or a softmax activation function to obtain the output. 
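
One way to see why the note holds: a two-class softmax over the scores `[0, z]` assigns the second class exactly the probability `sigmoid(z)`, so taking the argmax of the softmax picks the same class as rounding the sigmoid. The NumPy sketch below checks this on made-up scores. It is an illustration, not Keras code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

z = np.array([-1.2, 0.3, 2.0])   # made-up raw scores for the positive class

# Two-class softmax over the scores [0, z] versus a sigmoid on z
two_class = softmax(np.stack([np.zeros_like(z), z], axis=-1))
p_sigmoid = sigmoid(z)

print(two_class[:, 1], p_sigmoid)
print(two_class.argmax(axis=-1), np.round(p_sigmoid))
```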

In [None]:
# Binary classification with softmax 

y_true = tf.constant([[0.0 , 1.0] , [1.0 , 0.0] , 
                      [1.0 , 0.0] , [0.0 , 1.0]])

y_pred = tf.constant([[0.4 , 0.6] , [0.3 , 0.7] , 
                      [0.05 , 0.95] , [0.33 , 0.67]])

accuracy = K.mean(K.equal(y_true , K.round(y_pred)))
accuracy

In [None]:
# Categorical classification when m > 2 (num_classes > 2)

y_true = tf.constant([[0.0,1.0,0.0,0.0],[1.0,0.0,0.0,0.0],[0.0,0.0,1.0,0.0]])
y_pred = tf.constant([[0.4,0.6,0.0,0.0], [0.3,0.2,0.1,0.4], [0.05,0.35,0.5,0.1]])

# We need to find the maximum index (argmax)
accuracy = K.mean(K.equal(K.argmax(y_true , axis = -1) , K.argmax(y_pred , axis = -1)))
accuracy

Let's compile our model with different accuracy metrics.

In [None]:
# Compile the model with different accuracy 

# Binary Accuracy 
model.compile(optimizer = 'Adam' , 
              loss = 'sparse_categorical_crossentropy' , 
              metrics = [tf.keras.metrics.BinaryAccuracy(threshold = 0.5)])


**Sparse Categorical Accuracy** 

This metric is very similar to categorical accuracy, with one major difference. 

The label `y_true` of each training example is not expected to be a one-hot encoded vector. 

Instead, it is a tensor consisting of a single integer, the class index. This index is compared with the index of the maximum value of `y_pred`. 

In [None]:
# Using Sparse Categorical Accuracy 

model.compile(loss = 'sparse_categorical_crossentropy' , 
              optimizer ='adam' , 
              metrics = [tf.keras.metrics.SparseCategoricalAccuracy()])
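
A NumPy sketch of what this metric computes, reusing the predictions from the categorical example earlier in the notebook with the one-hot labels replaced by their class indices. The function is written here for illustration and is not the Keras implementation.

```python
import numpy as np

def sparse_categorical_accuracy(y_true, y_pred):
    """y_true holds integer class indices; y_pred holds one row of probabilities per example."""
    return np.mean(y_true == y_pred.argmax(axis=-1))

y_true = np.array([1, 0, 2])
y_pred = np.array([[0.4, 0.6, 0.0, 0.0],
                   [0.3, 0.2, 0.1, 0.4],
                   [0.05, 0.35, 0.5, 0.1]])

sparse_acc = sparse_categorical_accuracy(y_true, y_pred)
print(sparse_acc)
```

The second example is misclassified (the largest probability is at index 3, the label is 0), so two of the three predictions count as correct.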

**(Sparse) Top k - categorical accuracy**

Top-k categorical accuracy 
- does not compute how often the model's single most likely class matches the label of a training example. 
- Instead, it computes how often `y_true` is among the model's top-k predictions. By default **k = 5**.

There are two versions of it: 
- `tf.keras.metrics.SparseTopKCategoricalAccuracy()`
- `tf.keras.metrics.TopKCategoricalAccuracy()`

In [None]:
# Sparse Categorical 
def sparse_categorical_accuracy(y_true, y_pred ):
    return K.cast( K.equal(K.max(y_true, axis=-1),
                          K.cast(K.argmax(y_pred, axis=-1), K.floatx()) ),
                  K.floatx())

In [None]:
# Top k categorical
def top_k_categorical_accuracy(y_true, y_pred, k=5):
    return K.mean(K.in_top_k(y_pred, K.argmax(y_true, axis=-1), k), axis=-1)

In [None]:
# For the Sparse one 

def sparse_top_k_categorical_accuracy(y_true , y_pred , k= 5):
  # in_top_k returns one boolean per example, so the mean is taken over the last (only) axis
  return K.mean(K.in_top_k(y_pred , K.cast(K.max(y_true , axis = -1), 
                                           'int32') , k) , axis = -1)
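
To see the effect of `k` on concrete numbers, here is a NumPy sketch of the sparse top-k idea. It uses `np.argsort` rather than `in_top_k`, so ties may be handled differently, and the function name is chosen here for illustration.

```python
import numpy as np

def sparse_top_k_accuracy(y_true, y_pred, k=5):
    """Fraction of examples whose true class index is among the k highest scores."""
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]   # indices of the k largest scores per row
    hits = (top_k == y_true[:, None]).any(axis=-1)
    return hits.mean()

y_true = np.array([1, 0, 2])
y_pred = np.array([[0.4, 0.6, 0.0, 0.0],
                   [0.3, 0.2, 0.1, 0.4],
                   [0.05, 0.35, 0.5, 0.1]])

top1 = sparse_top_k_accuracy(y_true, y_pred, k=1)
top2 = sparse_top_k_accuracy(y_true, y_pred, k=2)
print(top1, top2)
```

With `k=1` this reduces to ordinary sparse categorical accuracy. With `k=2` the second example also counts as a hit, because its true class has the second-highest probability.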

Further reading: 
- https://keras.io/metrics/
- https://github.com/keras-team/keras/blob/master/keras/metrics.py

### The Fit method 

In [None]:
# Let's build a model and fit it. 
model = Sequential([
  Dense(64 , activation= 'elu' , input_shape = (32 ,)), 
  Dense(100 , activation= 'softmax')
])

# Compile the model 
model.compile(loss = 'categorical_crossentropy' , 
              optimizer = 'rmsprop' , 
              metrics = ['accuracy'])

# Fitting the model 
# model.fit(X_train , y_train)

# X_train (num_samples , num_features)
# y_train (num_samples , num_classes)

In [None]:
# Let's build a model and fit it. 
model = Sequential([
  Dense(64 , activation= 'elu' , input_shape = (32 ,)), 
  Dense(100 , activation= 'softmax')
])

# Compile the model 
model.compile(loss = 'sparse_categorical_crossentropy' , 
              optimizer = 'rmsprop' , 
              metrics = ['accuracy'])

# Fitting the model 
# history = model.fit(X_train , y_train , epochs = 10)

# X_train (num_samples , num_features)
# y_train (num_samples , ) # one-dimensional array with length == num_samples
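
The two cells above differ only in the label format that the loss expects: `categorical_crossentropy` takes one-hot labels of shape `(num_samples, num_classes)`, while `sparse_categorical_crossentropy` takes integer labels of shape `(num_samples,)`. A small NumPy sketch of converting between the two, using made-up labels:

```python
import numpy as np

num_classes = 4
sparse_labels = np.array([2, 0, 3, 2])   # shape (num_samples,)

# One-hot encode: pick row `label` of the identity matrix for each sample
one_hot_labels = np.eye(num_classes)[sparse_labels]   # shape (num_samples, num_classes)

# Going back: the index of the 1 in each row
recovered = one_hot_labels.argmax(axis=-1)

print(one_hot_labels.shape, recovered)
```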



`history` is a Keras `History` object that records the progress of training. Its `history` attribute is a dictionary holding the loss and each metric we passed to `compile`, with one value per epoch.

It's time to work on real data! 

In [None]:
# Load the Fashion-MNIST dataset 

fashion_mnist_data = tf.keras.datasets.fashion_mnist

# Splitting into sets 
(train_images , train_labels) , (test_images , test_labels) = fashion_mnist_data.load_data()


# Checking the shapes 
train_images.shape , train_labels.shape , test_images.shape , test_labels.shape

In [None]:
# Building a Convolutional model 

model = Sequential([
  Conv2D(16 , 3 , activation= 'relu' , input_shape = (28 , 28 , 1)), 
  MaxPooling2D(3) , 
  Flatten() , 
  Dense(10 , activation= 'softmax')
])

# Model summary 
model.summary()

In [None]:
# Compiling the model 
acc = tf.keras.metrics.SparseCategoricalAccuracy()
mae = tf.keras.metrics.MeanAbsoluteError()


model.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy() , 
              optimizer = tf.keras.optimizers.Adam(learning_rate= 0.005) , 
              metrics = [acc , mae] )

In [None]:
# Model Attributes 
print(model.loss)
print(model.optimizer)
print(model.metrics)
print(model.optimizer.learning_rate)

In [None]:
# Labels for our data 

labels = [
    'T-shirt/top',
    'Trouser',
    'Pullover',
    'Dress',
    'Coat',
    'Sandal',
    'Shirt',
    'Sneaker',
    'Bag',
    'Ankle boot'
]

In [None]:
# Rescaling the values to lie between 0 and 1 
train_images = train_images / 255.
test_images = test_images / 255. 

In [None]:
# Display one of the images 
i = 8899
img = train_images[i , : , :]
plt.imshow(img)
print(f'The label of the image is: {labels[train_labels[i]]}')

In [None]:
tf.rank(train_images), tf.rank(train_images[... , np.newaxis])
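
The `np.newaxis` indexing used above adds a trailing channel axis. The Fashion-MNIST images are stored as a rank-3 array `(num_samples, 28, 28)`, while a `Conv2D` layer with `input_shape = (28, 28, 1)` expects rank-4 batches `(num_samples, height, width, channels)`. A NumPy sketch on a small stand-in array:

```python
import numpy as np

images = np.zeros((5, 28, 28))           # stand-in for a batch of grayscale images
with_channel = images[..., np.newaxis]   # append a channel axis of length 1

print(images.ndim, images.shape)
print(with_channel.ndim, with_channel.shape)
```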

In [None]:
tf.rank(train_labels)

In [None]:
tf.config.run_functions_eagerly(True)


In [None]:
# Fitting the model 

history = model.fit(train_images[... , np.newaxis] , train_labels , epochs = 10 , 
                    batch_size = 256)

In [None]:
# Our history dataframe 

import pandas as pd 
df = pd.DataFrame(history.history)
df.head()

In [None]:
# Make a plot for the loss 

loss_plot = df.plot(y = 'loss' , title = 'Loss vs Epochs' , legend = False)
loss_plot.set(xlabel = 'Epochs' , ylabel = 'Loss')

### Evaluate and Predict methods

In [None]:
# Evaluate on the test dataset 

test_loss , test_accuracy , test_mae =  model.evaluate(test_images[... , np.newaxis] , test_labels)

In [None]:
test_images[np.newaxis,...,np.newaxis].shape

In [None]:
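# Make predictions from the model 

random_inx = np.random.choice(test_images.shape[0])

test_image = test_images[random_inx]
plt.imshow(test_image)
plt.show()
print(f'Label: {labels[test_labels[random_inx]]}')

# Add batch and channel axes so the single image has shape (1, 28, 28, 1)
predictions = model.predict(test_image[np.newaxis , ... , np.newaxis])
print(f'Model prediction: {labels[np.argmax(predictions)]}')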

## Validation, Regularization and Callbacks 



### Validation Sets
A validation set is used to measure how well our model performs on data outside the training set.

In [None]:
# Loading the data 

from sklearn.datasets import load_diabetes 
diabetes_dataset = load_diabetes()

In [None]:
# Save the input and target variables 

inputs = diabetes_dataset['data']
targets = diabetes_dataset['target']

# Checking the shape 
inputs.shape , targets.shape

In [None]:
# Spread of targets 
min(targets) , max(targets)

In [None]:
# The targets span a wide range, so we standardize them (this gives clearer training curves)

targets = (targets - targets.mean(axis = 0)) / targets.std() 
targets

In [None]:
# Split the data into train and test sets
from sklearn.model_selection import train_test_split
train_data , test_data , train_targets , test_targets = train_test_split(inputs , targets , test_size = 0.2)

train_data.shape , train_targets.shape , test_data.shape , test_targets.shape

**Train a feedforward neural network model** 


In [None]:
# Importing the things we need 
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def get_model():
  model = Sequential([
    Dense(128 , activation= 'relu' , input_shape = (train_data.shape[1], )),
    Dense(128 , activation = 'relu'),
    Dense(128 , activation= 'relu'), 
    Dense(128 , activation='relu'), 
    Dense(128 , activation='relu'), 
    Dense(128 , activation='relu'), 
    Dense(1)
  ])

  return model 

# Instantiating the model 
model = get_model()

# Summary of the model 
model.summary()



In [None]:
train_data.shape[1]

In [None]:
# Compile the model 
model.compile(optimizer= 'adam' , 
              loss = 'mse' , 
              metrics = ['mae'])

In [None]:
# Train the model 
history = model.fit(train_data , train_targets , epochs = 100 , 
                    validation_split = 0.15 , batch_size = 64 , 
                    verbose = False)
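
`validation_split` holds out part of the arrays passed to `fit`: Keras takes the validation data from the last samples of the provided data, before shuffling, and reports the loss and metrics on it at the end of each epoch. The NumPy sketch below illustrates that behaviour on made-up data. The helper name and the exact index arithmetic are assumptions for illustration, not the Keras source.

```python
import numpy as np

def split_last_fraction(x, y, validation_split):
    """Hold out the last fraction of the samples, taken before any shuffling."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(20).reshape(20, 1)   # 20 made-up samples with one feature each
y = np.arange(20)

(train_x, train_y), (val_x, val_y) = split_last_fraction(x, y, validation_split=0.25)
print(len(train_x), len(val_x), val_y)
```

Because the split is positional, data that is ordered (for example sorted by target) should be shuffled before calling `fit` with `validation_split`. Here `train_test_split` has already shuffled the rows.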

In [None]:
# Evaluate the model on test data 
model.evaluate(test_data , test_targets)

In [None]:
# Plotting the learning curves 

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss vs Epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training' , 'Validation'] , loc = 'upper right')
plt.show()

Hmm... our model is severely overfitting. Let's see in the next section how to overcome this!

### Regularizations 

Regularization techniques are used to reduce overfitting during training. They work by constraining the capacity of the model. 

We'll look at: 
- **L2 regularization**, also known as weight decay in neural networks. 
- **L1 regularization** 
- How to use Dropout in our models 
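
Both L1 and L2 regularization add a penalty on the layer's weights to the loss: L1 adds the regularization factor times the sum of absolute weight values, and L2 adds the factor times the sum of squared weight values. A NumPy sketch with a made-up weight matrix (the function names are chosen here for illustration):

```python
import numpy as np

def l1_penalty(weights, factor):
    """factor * sum of absolute weight values."""
    return factor * np.sum(np.abs(weights))

def l2_penalty(weights, factor):
    """factor * sum of squared weight values."""
    return factor * np.sum(np.square(weights))

w = np.array([[1.0, -2.0],
              [0.5,  0.0]])   # made-up weight matrix

l1_value = l1_penalty(w, 0.005)   # 0.005 * 3.5
l2_value = l2_penalty(w, 0.001)   # 0.001 * 5.25
print(l1_value, l2_value)
```

The L2 term grows quadratically with the size of a weight, so it mainly discourages large weights, while the L1 term penalizes all non-zero weights in proportion to their magnitude and tends to push small weights to exactly zero.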


In [None]:
# Creating a model with regularization (kernel_regularizer works for both Dense and Conv layers)

model = Sequential([
  Dense(64 , activation= 'relu' , 
        kernel_regularizer = tf.keras.regularizers.l2(0.001)), 
  
])

# Compile the model 
model.compile(loss = 'binary_crossentropy' , 
              optimizer = 'adadelta' , 
              metrics = ['acc'])

# Fitting 
# model.fit(inputs , targets , validation_split= 0.2)

In [None]:
# Let's look into l1 regularizers 

model = Sequential([
  Dense(64 , activation = 'relu' , 
        kernel_regularizer = tf.keras.regularizers.l1(0.005)) ,
  Dense(1 , activation = 'sigmoid')
])


In [None]:
# Using both l1 and l2 regularizers

model = Sequential([
  Dense(64 , activation = 'relu' , 
        kernel_regularizer = tf.keras.regularizers.l1_l2(l1 = 0.005 , 
                                                         l2 = 0.001)) ,
  Dense(1 , activation = 'sigmoid')
])


**Regularizer for Bias** 

In [None]:
# Regularizer for bias

model = Sequential([
  Dense(64 , activation = 'relu' , 
        kernel_regularizer = tf.keras.regularizers.l1_l2(l1 = 0.005 ,l2 = 0.001) , 
        bias_regularizer = tf.keras.regularizers.l2(0.001)) ,
  Dense(1 , activation = 'sigmoid')
])


**Dropout layer**

Dropout also has a regularizing effect on the neural network. We can add it to the model just like any other layer. 

The `Dropout` layer takes an argument called `rate`. 
- In the example here the rate is set to 0.5. 
- That means each unit feeding into the next layer is set to **zero** with probability 0.5 during training, which removes all of that unit's outgoing connections for that step. The remaining units are scaled up by `1 / (1 - rate)` so that the expected sum is unchanged. 

This is also known as **Bernoulli Dropout**, since the values are effectively multiplied by a Bernoulli random variable. 

Units are dropped independently of one another, and dropout is also applied independently to each example in the batch at training time.

In [None]:
# Dropout layer
from tensorflow.keras.layers import Dropout
model = Sequential([
  Dense(64 , activation = 'relu' , 
        kernel_regularizer = tf.keras.regularizers.l1_l2(l1 = 0.005 ,l2 = 0.001) , 
        bias_regularizer = tf.keras.regularizers.l2(0.001)) ,
  Dropout(0.5),
  Dense(1 , activation = 'sigmoid')
])


A `Dropout` layer behaves differently in two modes: 
- Training mode, with dropout. Units are dropped randomly. This happens during `model.fit()`.
- Testing mode, no dropout. Nothing is dropped. This happens during 
  - `model.evaluate()`
  - `model.predict()`

These two modes are handled behind the scenes. 

Later in the course we will see how to control these two modes ourselves by taking more control over the model. 
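
The two modes can be sketched in a few lines of NumPy. This is an illustration of the idea (a Bernoulli mask plus rescaling during training, and a pass-through otherwise) under the assumption of "inverted" dropout. The function, its `training` flag and the seed are written here for the example and are not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each element with probability `rate` during training."""
    if not training:
        return x                              # evaluate/predict: inputs pass through unchanged
    keep_mask = rng.random(x.shape) >= rate   # one Bernoulli(1 - rate) draw per element
    return x * keep_mask / (1.0 - rate)       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)   # arbitrary seed
x = np.ones((4, 6))

train_out = dropout(x, rate=0.5, training=True, rng=rng)
test_out = dropout(x, rate=0.5, training=False, rng=rng)
print(train_out)
print(test_out)
```

In training mode every entry is either 0 (dropped) or 2 (kept and scaled by `1 / (1 - 0.5)`). In testing mode the input comes back unchanged.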

In [None]:
# Getting back our overfitting model to fix it 
# Importing the things we need 
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense , Dropout
from tensorflow.keras import regularizers

def get_model(wd , rate):
  '''
  wd --> Weight decay 
  rate --> dropout rate 
  '''
  model = Sequential([
    Dense(128 , activation= 'relu' ,
          kernel_regularizer = regularizers.l2(wd), input_shape = (train_data.shape[1], )),
    Dropout(rate),
    Dense(128 , activation = 'relu' , kernel_regularizer = regularizers.l2(wd)),
    Dropout(rate),
    Dense(128 , activation= 'relu' , kernel_regularizer = regularizers.l2(wd)), 
    Dropout(rate),
    Dense(128 , activation='relu', kernel_regularizer = regularizers.l2(wd)), 
    Dropout(rate),
    Dense(128 , activation='relu' , kernel_regularizer = regularizers.l2(wd)), 
    Dropout(rate),
    Dense(128 , activation='relu' , kernel_regularizer = regularizers.l2(wd)), 
    Dropout(rate),
    Dense(1)
  ])

  return model 

# Instantiating the model 
model = get_model(1e-5 , 0.3)

# Summary of the model 
model.summary()



In [None]:
# Compile the model 
model.compile(loss = 'mse' , 
              optimizer = 'adam' , 
              metrics = ['mae'])

# Training the model and see the performance 
history = model.fit(train_data , train_targets , validation_split = 0.2 ,
                    epochs = 100 , batch_size = 64)

In [None]:
# Evaluate the performance of the model on test data 
model.evaluate(test_data , test_targets)

In [None]:
# Plotting the learning curves 

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss vs Epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training' , 'Validation'] , loc = 'upper right')
plt.show()

Great! Our loss is lower than that of the unregularized model. 

The overfitting isn't completely fixed, but the regularization had a significant effect. 