# Advanced Certification Programme in AI and MLOps
## A programme by IISc and TalentSprint
### Assignment 3: Residuals, Filters, CNN

## Learning Objectives:

At the end of the experiment, you will be able to:
1.   Build important blocks for modern CNNs
      - Residual connections
      - Batch normalisation
      - Depthwise separable convolution
2.   Interpret what CNNs learn
      - Visualising activations
      - Visualising filters
      - Visualising heatmaps


## 1. Important building blocks for modern CNNs

Here we will study three important building blocks:
* Residual connection
* Batch normalization
* Depthwise separable convolution



### Residual connections

Why do we need them?

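* CNNs can become extremely deep.
* Very deep networks are prone to the vanishing gradient problem: during backpropagation the gradient is multiplied through many layers and can shrink towards zero.

Solution:
* Add a shortcut (skip connection) that lets gradients flow directly around a block of layers.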




![picture](https://drive.google.com/uc?export=view&id=1gOwZPYnxfCGSsevLCzc7_41apUrkJKI5)
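To see why the shortcut helps, here is a toy scalar sketch. It is an illustration only, not a CNN. Each plain layer computes `sigmoid(w * x)` and each residual layer computes `x + sigmoid(w * x)`, with an arbitrarily chosen weight `w = 0.5`. The chain rule multiplies the local derivatives of all layers together. The identity path adds 1 to each local derivative, so the product cannot collapse towards zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def chain_gradient(depth, residual, w=0.5, x0=0.1):
    """d(output)/d(input) for a toy scalar network of `depth` layers.

    Plain layer:    x_next = sigmoid(w * x)
    Residual layer: x_next = x + sigmoid(w * x)
    """
    x, grad = x0, 1.0
    for _ in range(depth):
        s = sigmoid(w * x)
        local = w * s * (1.0 - s)      # derivative of sigmoid(w * x) with respect to x
        if residual:
            local = 1.0 + local        # the identity shortcut contributes +1
        grad *= local                  # chain rule: multiply local derivatives
        x = x + s if residual else s
    return grad

for depth in (5, 20, 50):
    print(depth, chain_gradient(depth, residual=False), chain_gradient(depth, residual=True))
```

With these settings, each plain layer's local derivative is at most 0.125 (0.5 times the sigmoid's maximum slope of 0.25), so the gradient vanishes geometrically with depth. The residual chain's gradient stays at or above 1.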

### Setup Steps:

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "" #@param {type:"string"}

In [None]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "" #@param {type:"string"}

In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "M2_AST_03_Modern_CNN_Architectures_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")

    ipython.magic("sx wget https://cdn.iisc.talentsprint.com/AIandMLOps/Datasets/cats_vs_dogs_small.zip")
    ipython.magic("sx unzip '/content/cats_vs_dogs_small.zip'")
    ipython.magic("sx gdown https://drive.google.com/uc?id=1QrATuMoM1o0UKkhc_R4jX2ZI01yOYU-n")

    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://aimlops-iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else:
      return submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError
    else:
      return Answer
  except NameError:
    print ("Please answer Question")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")



### Import libraries

In [None]:
import numpy as np  # Import the numpy library, commonly used for numerical operations on arrays and matrices.
import pandas as pd  # Import the pandas library, useful for data manipulation and analysis with DataFrames.
import matplotlib.pyplot as plt  # Import the pyplot module from matplotlib, mainly used for plotting data visualizations.

from tensorflow import keras  # Import the core TensorFlow Keras API, used for building and training neural networks.
from tensorflow.keras import layers  # Import layers from Keras, which provide building blocks for creating neural network architectures.
from tensorflow.keras.utils import plot_model  # Import the plot_model utility from Keras to visualize the architecture of neural networks.

### **Build model with residual connection**

In [None]:
inputs = keras.Input(shape=(32, 32, 3), name="input")  # Define the input layer with shape 32x32x3 (e.g., a 32x32 RGB image).

x = layers.Conv2D(32, 3, activation="relu", padding="same", name="C1")(inputs)
# Apply a 2D convolutional layer (C1) with 32 filters of size 3x3.
# Activation function is ReLU; padding is 'same' to keep output size the same as input.

residual = x  # Store the output of the first convolution layer (C1) as the residual for a skip connection.

x = layers.Conv2D(32, 3, activation="relu", padding="same", name="C2a")(x)
# Apply another 2D convolutional layer (C2a) with the same parameters as C1, transforming the feature maps further.

x = layers.add([x, residual])  # Add the output of C2a to the residual connection, creating a shortcut connection (skip connection).
# Both x and residual have the same shape due to the 'same' padding.

x = layers.Conv2D(32, 3)(x)  # Apply a final 2D convolution layer with 32 filters and a 3x3 kernel, without specifying activation or padding.

model1 = keras.Model(inputs=inputs, outputs=x)  # Define the Keras Model, specifying the inputs and the final output layer.

plot_model(model1, show_shapes=True)  # Plot the model architecture with layer shapes displayed.

In [None]:
model1.summary()  # Print a summary of the model's architecture, showing each layer's details, output shapes, and the total number of parameters.

The residual branch may contain one layer, typically a 1x1 convolution, so that its output shape matches the main branch and the addition is possible.
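A 1x1 convolution applies the same linear map over channels at every spatial position. The numpy sketch below uses that to project a hypothetical `(30, 30, 32)` residual to `(30, 30, 64)` so it can be added to the main branch. The random feature maps and kernel are placeholders, not values from the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature maps with the shapes from the model above (batch dimension omitted)
main_branch = rng.standard_normal((30, 30, 64))   # stands in for the output of C2l2
residual = rng.standard_normal((30, 30, 32))      # stands in for the output of C1

# Direct addition fails: the channel dimensions (64 vs 32) do not match
try:
    main_branch + residual
except ValueError as err:
    print("Direct addition fails:", err)

# A 1x1 convolution = one (C_in, C_out) matrix plus a bias, applied at every (h, w) position
kernel = rng.standard_normal((32, 64)) * 0.1
bias = np.zeros(64)

def conv1x1(x, w, b):
    return np.einsum("hwc,cd->hwd", x, w) + b

projected = conv1x1(residual, kernel, bias)
out = main_branch + projected
print(projected.shape, out.shape)   # (30, 30, 64) (30, 30, 64)
```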

In [None]:
# Build model with a residual connection that includes an additional layer in the residual branch

inputs = keras.Input(shape=(32, 32, 3), name="input")  # Define the input layer with shape 32x32x3 for an RGB image.
x = layers.Conv2D(32, 3, activation="relu", name="C1")(inputs)
# Apply a 2D convolutional layer (C1) with 32 filters and a 3x3 kernel.
# The activation function is ReLU. This is the first layer of the main path.

residual = x  # Store the output of C1 as the residual connection (skip connection) for later addition.

x = layers.Conv2D(64, 3, activation="relu", padding="same", name="C2l1")(x)
# Apply a 2D convolutional layer (C2l1) with 64 filters and a 3x3 kernel, padding set to 'same' to keep dimensions consistent.

x = layers.Conv2D(64, 3, activation="relu", padding="same", name="C2l2")(x)
# Apply another 2D convolutional layer (C2l2) with the same parameters as C2l1, further transforming the feature maps.

residual = layers.Conv2D(64, 1, name="C2b")(residual)
# Apply a 1x1 convolution (C2b) to the residual to match the shape of `x`.
# The 1x1 filter adjusts the channel dimension to 64 without changing the spatial size.

x = layers.add([x, residual])
# Add the transformed residual branch to `x` to create a shortcut connection (skip connection).
# `x` and `residual` now have the same shape due to the adjustments made.

model2 = keras.Model(inputs=inputs, outputs=x)  # Define the model with specified inputs and outputs.

In [None]:
plot_model(model2, show_shapes=True)  # Visualize the model architecture, displaying each layer's shape and connections.

In [None]:
model2.summary()

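**Important**: Only tensors of the **same** shape can be added!

Q: Can we add layers of shapes (30,30,64) and (30,30,32)?

A: No. The channel dimensions differ (64 vs 32), so element-wise addition is not defined. The residual must first be projected to 64 channels, e.g. with a 1x1 convolution as above.

Q: But what happens if you have a pooling layer in between?

A: Pooling reduces the spatial dimensions (a 2x2 max pool with stride 2 halves the height and width), so the main path and the residual no longer have the same shape.

Solution: Use the same stride in the 1x1 Conv layer on the skip connection.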
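The spatial sizes can be checked by hand. The sketch below uses the standard output-size rules for Keras-style layers along one axis: `(n - k) // s + 1` for `'valid'` padding and `ceil(n / s)` for `'same'` padding. It traces the next model, assuming a 32x32 input. The helper `conv_output_size` is a hypothetical name introduced here.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size along one axis for a Conv2D / MaxPooling2D layer."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

after_c1 = conv_output_size(32, 3)                                     # Conv2D(32, 3), valid -> 30
after_c2 = conv_output_size(after_c1, 3, padding="same")               # Conv2D(64, 3, same) -> 30
after_pool = conv_output_size(after_c2, 2, stride=2, padding="same")   # MaxPooling2D(2, same) -> 15
skip = conv_output_size(after_c1, 1, stride=2)                         # Conv2D(64, 1, strides=2) -> 15
print(after_c1, after_c2, after_pool, skip)
```

Without `strides=2`, the skip branch would stay at 30x30 and the addition would fail.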

### Model with residual connections and a max pool layer in between

In [None]:
# Model with residual connections and a max pooling layer in between

inputs = keras.Input(shape=(32, 32, 3))  # Define the input layer with shape 32x32x3 for an RGB image.

x = layers.Conv2D(32, 3, activation="relu")(inputs)
# Apply a 2D convolutional layer with 32 filters and a 3x3 kernel, using ReLU activation.
# This is the first layer of the main path.

residual = x  # Store the output of the initial convolutional layer as the residual for the skip connection.

x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
# Apply another convolutional layer with 64 filters and a 3x3 kernel.
# Padding is set to 'same' to maintain spatial dimensions.

x = layers.MaxPooling2D(2, padding="same")(x)
# Apply a max pooling layer with a 2x2 pool size, reducing the spatial dimensions by half.
# With 'same' padding the output size is ceil(input / 2), so odd-sized inputs are padded rather than truncated; here 30 -> 15.

residual = layers.Conv2D(64, 1, strides=2)(residual)
# Apply a 1x1 convolution to the residual connection with a stride of 2, matching the downsampling done by max pooling.
# This adjusts both the depth (to 64 channels) and the spatial dimensions of the residual branch.

x = layers.add([x, residual])
# Add the transformed residual connection to the main path, forming a shortcut connection.
# Both `x` and `residual` now have the same shape due to adjustments.

model3 = keras.Model(inputs=inputs, outputs=x)  # Define the model with the specified input and final output layer.

plot_model(model3, show_shapes=True)  # Plot the model architecture, displaying the shapes of each layer.

In [None]:
model3.summary()

With residual connections, you can build very deep networks while largely avoiding the vanishing gradient problem. We will see an example later.

**Intuitions on why residual blocks work:**
*   Shorter path for gradients
*   Easy to learn the identity mapping
*   Ensemble of shallow networks
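The "easy to learn the identity" intuition can be made concrete with a tiny sketch. Below, `F` is a hypothetical two-layer dense branch, not a layer from this notebook. If the branch's last weights are zero, the residual block `x + F(x)` is exactly the identity. A plain block `F(x)` with the same weights outputs zeros, and would have to learn the identity explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8)          # a hypothetical 8-dimensional activation vector

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    # y = x + F(x), where F is a tiny two-layer network
    return x + w2 @ relu(w1 @ x)

def plain_block(x, w1, w2):
    # y = F(x) only, no shortcut
    return w2 @ relu(w1 @ x)

w1 = rng.standard_normal((8, 8))
w2_zero = np.zeros((8, 8))

print(np.allclose(residual_block(x, w1, w2_zero), x))   # True: block is the identity
print(np.allclose(plain_block(x, w1, w2_zero), 0.0))    # True: plain block outputs zeros
```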

### **Batch Normalization**
*   Adaptively normalize data even as the mean and variance change over time during training
*   During training, it uses the mean and variance of the current batch of data to normalize samples
*    During inference (when a big enough batch of representative data may not be available), it uses an exponential moving average of the batch-wise mean and variance of the data seen during training.
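The training/inference difference can be sketched in numpy. The class `ToyBatchNorm` is a hypothetical minimal re-implementation, not Keras code. Its `momentum=0.99` and `epsilon=1e-3` are chosen to mirror the Keras defaults. In training mode it normalizes with the current batch statistics and updates the moving averages. In inference mode it uses only the moving averages.

```python
import numpy as np

class ToyBatchNorm:
    """Minimal per-channel batch normalization sketch (illustrative only)."""

    def __init__(self, channels, momentum=0.99, epsilon=1e-3):
        self.gamma = np.ones(channels)         # trainable scale
        self.beta = np.zeros(channels)         # trainable shift
        self.moving_mean = np.zeros(channels)  # non-trainable state
        self.moving_var = np.ones(channels)    # non-trainable state
        self.momentum = momentum
        self.epsilon = epsilon

    def __call__(self, x, training):
        # x has shape (batch, ..., channels); statistics are computed per channel
        axes = tuple(range(x.ndim - 1))
        if training:
            mean = x.mean(axis=axes)
            var = x.var(axis=axes)  # population variance in this sketch
            # Exponential moving averages, used later at inference time
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
bn = ToyBatchNorm(channels=4)
for _ in range(500):
    batch = rng.normal(loc=5.0, scale=2.0, size=(32, 4))
    out = bn(batch, training=True)

test_batch = rng.normal(loc=5.0, scale=2.0, size=(1000, 4))
out_inf = bn(test_batch, training=False)
print(bn.moving_mean.round(2), out_inf.mean(axis=0).round(2))
```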


<div>
<img src="https://miro.medium.com/max/1153/1*xQhPvRh08oKFC63swgWr_w.png" width="500"/>
</div>

Now let's calculate the number of parameters introduced by batch normalization in the following example.

### Model with a batch normalization layer

In [None]:
# Model with a batch normalization layer

inputs = keras.Input(shape=(32, 32, 3))  # Define the input layer with shape 32x32x3 for an RGB image.

# Because the output of the Conv2D layer is normalized right away, the layer doesn't need its own bias vector
x = layers.Conv2D(32, 3, use_bias=False)(inputs)
# 2D convolution with 32 filters and a 3x3 kernel, with no bias and no activation yet.
# BatchNormalization's learnable offset (beta) plays the role of the bias.

x = layers.BatchNormalization()(x)
# Normalize each channel of the convolution output; this tends to stabilize and speed up training.

x = layers.Activation("relu")(x)
# Apply ReLU after normalization (the same Conv -> BN -> ReLU ordering used in the mini Xception model later).

x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
# Add another 2D convolutional layer with 64 filters and a 3x3 kernel, using ReLU activation.
# Padding is set to 'same' to maintain spatial dimensions.

model4 = keras.Model(inputs=inputs, outputs=x)  # Define the model with specified input and final output layer.

plot_model(model4, show_shapes=True)  # Plot the model architecture, displaying each layer's shape.

Q: How many parameters does the BN layer introduce in the above model?

A: 128


Q: HOW?

A: A Batch Normalization layer has **four parameters** per channel. Only **two** of them, γ (scale) and β (shift), are **learnable/trainable**. The other **two**, moving_mean and moving_variance, are **non-trainable**. They are updated during training as exponential moving averages of the batch means and variances, and are saved as part of the layer's state for use at inference time.

Here, the preceding layer has 32 channels, so the total number of parameters is 32 * 4 = 128. Of these, only 32 * 2 = 64 are trainable.
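The same arithmetic can be extended to the whole model above. This is a small sketch with hypothetical helper names. Its totals should agree with what `model4.summary()` reports (the printed summary is not reproduced here).

```python
def conv2d_params(kernel, c_in, c_out, use_bias=True):
    # kernel x kernel x c_in weights per filter, plus an optional bias per filter
    return kernel * kernel * c_in * c_out + (c_out if use_bias else 0)

def batchnorm_params(channels):
    trainable = 2 * channels        # gamma, beta
    non_trainable = 2 * channels    # moving_mean, moving_variance
    return trainable, non_trainable

conv1 = conv2d_params(3, 3, 32, use_bias=False)         # 864
bn_trainable, bn_non_trainable = batchnorm_params(32)   # 64, 64
conv2 = conv2d_params(3, 32, 64)                        # 18496

total = conv1 + bn_trainable + bn_non_trainable + conv2
print("BN params:", bn_trainable + bn_non_trainable)    # 128
print("total:", total, "trainable:", total - bn_non_trainable, "non-trainable:", bn_non_trainable)
```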

In [None]:
model4.summary()

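**Intuitions:**
*  Batch Normalization also acts as a (weak) regularizer.
    - It does add a few parameters (4 per channel, 2 of them trainable).
    - But because each sample is normalized with its batch's statistics, it injects noise during training, similar in spirit to data augmentation or dropout.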
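The noise comes from the batch statistics. In the sketch below, the same hypothetical one-feature sample is normalized inside five different random batches. It gets a different normalized value each time. The sketch ignores γ and β, which is equivalent to their initial values of 1 and 0.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([1.0])  # the same hypothetical single-feature sample every time

def normalize_in_batch(sample, others, epsilon=1e-3):
    batch = np.concatenate([sample, others])
    return (sample - batch.mean()) / np.sqrt(batch.var() + epsilon)

# Place the sample in 5 different random batches of 32 and normalize it each time
outputs = [normalize_in_batch(sample, rng.standard_normal(31))[0] for _ in range(5)]
print(np.round(outputs, 3))
```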

### **Depthwise separable convolutions**

* This layer performs a **spatial convolution [Depthwise Conv.]** on each channel of its input independently. It then mixes the output channels via a **pointwise (1x1) convolution**.

* **Depthwise separable convolution = Depthwise Conv. + Pointwise Conv.**

* This makes your model smaller and acts as a strong prior: we assume that spatial patterns and cross-channel patterns can be modeled separately. This is equivalent to separating the learning of spatial features from the learning of channel-wise features.
* Depthwise separable convolution relies on the assumption that spatial locations in intermediate activations are highly correlated, but different channels are largely independent. This assumption does not hold at the input, because RGB channels are **highly correlated**. So we typically do not use depthwise separable convolution directly after the input layer.
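The two steps can be sketched directly in numpy. This is a naive, loop-based illustration, not the Keras implementation. It assumes `'valid'` padding (the Keras example below uses `'same'`) and a depth multiplier of 1. The depthwise step convolves each channel only with its own 3x3 kernel. The pointwise step is a 1x1 convolution that mixes channels.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """'valid' depthwise conv: x (H, W, C), kernels (k, k, C) -> (H-k+1, W-k+1, C)."""
    k = kernels.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((h_out, w_out, x.shape[2]))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + k, j:j + k, :]                      # (k, k, C)
            out[i, j, :] = (patch * kernels).sum(axis=(0, 1))   # each channel uses only its own kernel
    return out

def pointwise_conv2d(x, weights, bias):
    """1x1 conv mixing channels: x (H, W, C_in), weights (C_in, C_out) -> (H, W, C_out)."""
    return np.einsum("hwc,cd->hwd", x, weights) + bias

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 32))     # hypothetical 8x8 input with 32 channels
dw = rng.standard_normal((3, 3, 32))    # one 3x3 kernel per input channel
pw = rng.standard_normal((32, 64))      # 1x1 kernel: 32 -> 64 channels
b = np.zeros(64)                        # single bias after the pointwise step

y = pointwise_conv2d(depthwise_conv2d(x, dw), pw, b)
print(y.shape)                          # (6, 6, 64)
print(dw.size + pw.size + b.size)       # 2400 parameters
```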

<figure>
<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/5/56/RGB_channels_separation.png" width="400"/><img src="https://drive.google.com/uc?export=view&id=1e4h4NdbHRCxB1Oe_eoNhQQ6ZbAERf22K" width="500"/> <figcaption>Left: RGB channels are highly correlated. Right: depthwise separable convolution.</figcaption>

</center>
</figure>


Let's quickly look at the code first

In [None]:
# Build a small model: a regular Conv2D followed by a SeparableConv2D (keras and layers are imported above)
inputs = keras.Input(shape=(32, 32, 3))  # Define the input shape of the model (32x32 image with 3 channels, e.g., RGB)
x = layers.Conv2D(32, 3, activation="relu")(inputs)  # Applying a regular Conv2D layer with 32 filters, a 3x3 kernel, and ReLU activation function

# Separable convolution: Depthwise and pointwise convolutions
x = layers.SeparableConv2D(64, 3, activation="relu", padding="same")(x)  # Applying a separable convolution with 64 filters, 3x3 kernel, ReLU activation, and 'same' padding

# Defining the model by specifying input and output
sep_model = keras.Model(inputs=inputs, outputs=x)

# Plotting the model to visualize its architecture with layer shapes
plot_model(sep_model, show_shapes=True)

# Printing a summary of the model, which includes the number of parameters for each layer
sep_model.summary()

# Q: Verify the number of parameters in the separable_conv2d layer.
# A: Parameter calculation for the SeparableConv2D layer (32 input channels, 64 output channels, 3x3 kernel):
#    - Depthwise convolution: one 3x3 kernel per input channel = 32 * (3*3) = 288 parameters.
#    - Pointwise convolution: (32 input channels * 1x1 kernel * 64 output filters) = 32 * 1 * 1 * 64 = 2048 parameters.
#    - Biases: each output filter has one bias term, so there are 64 biases.
#    Total parameters: 288 (depthwise) + 2048 (pointwise) + 64 (biases) = 2400 parameters.
# Note: Keras' SeparableConv2D has no bias on the depthwise step, which is why no depthwise bias appears above.

# For comparison, if the depthwise step also had a bias (e.g., a DepthwiseConv2D followed by a 1x1 Conv2D, both with biases):
#    - Depthwise bias: 32 (one per input channel).
#    - Total parameters: 288 (depthwise weights) + 32 (depthwise biases) + 2048 (pointwise weights) + 64 (pointwise biases) = 2432.

Let's compare the above with a model where we replace the SeparableConv2D with a Conv2D layer.
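Before running it, the expected gap can be computed with the two parameter formulas. The helper names below are hypothetical. The separable version assumes a depth multiplier of 1 and a single bias after the pointwise step, as in Keras' SeparableConv2D.

```python
def regular_conv_params(k, c_in, c_out):
    # one k x k x c_in kernel per output channel, plus one bias per output channel
    return k * k * c_in * c_out + c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in            # one k x k kernel per input channel
    pointwise = c_in * c_out            # 1x1 kernels mixing channels
    return depthwise + pointwise + c_out

regular = regular_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(regular, separable, round(regular / separable, 1))   # 18496 2400 7.7
```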

In [None]:
# Defining the input shape of the model (32x32 image with 3 channels, e.g., RGB)
inputs = keras.Input(shape=(32, 32, 3))

# Applying a regular Conv2D layer with 32 filters, 3x3 kernel, and ReLU activation function
x = layers.Conv2D(32, 3, activation="relu")(inputs)

# Applying another Conv2D layer with 64 filters, 3x3 kernel, ReLU activation, and 'same' padding
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)

# Defining the model by specifying input and output
model = keras.Model(inputs=inputs, outputs=x)

# Plotting the model to visualize its architecture with layer shapes
plot_model(model, show_shapes=True)

# Printing a summary of the model, which includes the number of parameters for each layer
model.summary()

# Q: Why does sep_model have far fewer parameters than this model?
# A: The SeparableConv2D layer in sep_model factorizes the convolution into two cheaper steps:
#    - Depthwise convolution applies one 3x3 filter to each input channel individually (no mixing across channels).
#    - Pointwise convolution applies 1x1 filters to combine information across all channels.
# In contrast, the second Conv2D layer in this model learns a full 3x3x32 kernel for each of its 64 filters:
#    3*3*32*64 + 64 = 18,496 parameters, versus 2,400 for the SeparableConv2D layer.

### **A mini Xception-like model**

We'll build a model like the Xception model, but a smaller version.

But first let's see what the actual Xception model looks like.

![picture](https://miro.medium.com/max/833/1*t6qfo9ucYza_lbLfg5-p_w.png)

Q: In the middle-flow blocks, what arguments do you give to the SepConv layer?

A: Homework question.
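Before loading the data, here is a quick trace of how the spatial size shrinks in the mini Xception-like model built below, assuming 180x180 inputs. The first 5x5 `'valid'` Conv2D removes 4 pixels. Inside each block, the SeparableConv2D layers use `'same'` padding. So only the stride-2 max pooling shrinks the main path, and the stride-2 1x1 Conv2D on the skip path shrinks it identically, which is why `layers.add` works. The helper names are hypothetical.

```python
import math

def valid_size(n, k, s=1):
    return (n - k) // s + 1

def same_size(n, s=1):
    return math.ceil(n / s)

side = valid_size(180, 5)            # first Conv2D(32, 5), 'valid' padding -> 176
sizes = [side]
for filters in [32, 64, 128, 256, 512]:
    pooled = same_size(side, 2)      # MaxPooling2D(3, strides=2, padding="same")
    skip = same_size(side, 2)        # Conv2D(filters, 1, strides=2, padding="same")
    assert pooled == skip            # shapes match, so the residual addition is valid
    side = pooled
    sizes.append(side)
print(sizes)                         # [176, 88, 44, 22, 11, 6]
print((side, side, 512))             # final feature map before GlobalAveragePooling2D
```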


Let's use the cats-vs-dogs data and create datasets.

In [None]:
# Defining the root directory path where the dataset is stored
data_dir = '/content/cats_vs_dogs_small'  # This is the main directory containing the dataset

# Defining the path for the training data
train_path = data_dir + '/train'  # Path for the training dataset, assuming the directory has subfolders for images

# Defining the path for the validation data
validation_path = data_dir + '/validation'  # Path for the validation dataset

# Defining the path for the test data
test_path = data_dir + '/test'  # Path for the test dataset

In [None]:
# Importing the utility function to create datasets from a directory
from tensorflow.keras.utils import image_dataset_from_directory

# Loading the training dataset from the 'train' directory
train_dataset = image_dataset_from_directory(
               train_path,  # Path to the training data
               image_size=(180, 180),  # Resizing images to 180x180 pixels
               batch_size=32)  # Setting the batch size to 32 for training

# Loading the validation dataset from the 'validation' directory
validation_dataset = image_dataset_from_directory(
                      validation_path,  # Path to the validation data
                      image_size=(180, 180),  # Resizing images to 180x180 pixels
                      batch_size=32)  # Setting the batch size to 32 for validation

# Loading the test dataset from the 'test' directory
test_dataset = image_dataset_from_directory(
                test_path,  # Path to the test data
                image_size=(180, 180),  # Resizing images to 180x180 pixels
                batch_size=32)  # Setting the batch size to 32 for testing

In [None]:
# Importing necessary libraries from Keras
import keras
from keras import layers

# Defining the input layer with shape 180x180 pixels and 3 color channels (RGB)
inputs = keras.Input(shape=(180, 180, 3))

# Normalizing the pixel values to the range [0, 1] by dividing by 255
x = layers.Rescaling(1./255)(inputs)

# Applying a regular Conv2D layer with 32 filters, a 5x5 kernel, and no bias (filter weights only)
x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)

# Q: Why not use depth-wise separable convolution here?
# A: RGB channels in input images are highly correlated, so regular convolution is preferred for capturing complex patterns between channels in the initial layers.

# Repeated residual blocks; each block has more filters than the one before
for size in [32, 64, 128, 256, 512]:  # `size` is the number of filters (output channels) in the block
    residual = x  # Storing the current output for skip connection (residual block)

    # Applying BatchNormalization to stabilize learning by normalizing activations
    x = layers.BatchNormalization()(x)

    # ReLU activation for non-linearity
    x = layers.Activation("relu")(x)

    # Applying SeparableConv2D layer for more efficient convolutions with reduced parameters
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    # Applying BatchNormalization and ReLU activation again
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    # Another SeparableConv2D layer
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    # MaxPooling operation to reduce the spatial dimensions (down-sampling)
    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

    # Skip connection (residual) for the current block
    residual = layers.Conv2D(size, 1, strides=2, padding="same", use_bias=False)(residual)

    # Adding the skip connection to the current output
    x = layers.add([x, residual])

# GlobalAveragePooling2D reduces the spatial dimensions to a single value per feature map
x = layers.GlobalAveragePooling2D()(x)

# Dropout for regularization, reducing overfitting by randomly setting a fraction of input units to 0 during training
x = layers.Dropout(0.5)(x)

# Output layer with a sigmoid activation function, suitable for binary classification
outputs = layers.Dense(1, activation="sigmoid")(x)

# Defining the model with the specified inputs and outputs
model = keras.Model(inputs=inputs, outputs=outputs)

In [None]:
plot_model(model)

In [None]:
# Compiling the model with the specified loss function, optimizer, and evaluation metric
model.compile(
    loss="binary_crossentropy",  # Binary Crossentropy loss function, suitable for binary classification tasks
    optimizer="rmsprop",         # RMSprop optimizer, an adaptive learning rate method that works well for CNNs
    metrics=["accuracy"]         # Accuracy as the evaluation metric to track performance during training
)

# Training the model using the training dataset
history = model.fit(
    train_dataset,               # The training dataset to train the model
    epochs=100,                  # Number of epochs (iterations over the entire dataset)
    validation_data=validation_dataset  # The validation dataset to evaluate the model's performance after each epoch
)

In [None]:
# Converting the training history to a Pandas DataFrame for easier manipulation
data = pd.DataFrame(history.history)

# Plotting the training accuracy ('accuracy') over epochs
plt.plot(range(1, len(data) + 1), data['accuracy'], 'bo', label="Training accuracy")  # 'bo' = blue circles for training accuracy
plt.plot(range(1, len(data) + 1), data['val_accuracy'], 'b', label="Validation accuracy")  # 'b' = blue line for validation accuracy
plt.legend()  # Display the legend to label the curves
plt.xlabel("Epochs")  # Label for the x-axis (epochs)
plt.ylabel("Accuracy")  # Label for the y-axis (accuracy)
plt.show()  # Display the accuracy plot

# Creating a new figure for the loss plot
plt.figure()

# Plotting the training loss ('loss') over epochs
plt.plot(range(1, len(data) + 1), data['loss'], 'bo', label="Training loss")  # 'bo' = blue circles for training loss
plt.plot(range(1, len(data) + 1), data['val_loss'], 'b', label="Validation loss")  # 'b' = blue line for validation loss
plt.legend()  # Display the legend to label the curves
plt.xlabel("Epochs")  # Label for the x-axis (epochs)
plt.ylabel("Loss")  # Label for the y-axis (loss)
plt.show()  # Display the loss plot

## 2. Interpreting what ConvNets learn
* Visualizing intermediate ConvNet outputs (intermediate activations)
* Visualizing ConvNet filters
* Visualizing heatmaps of class activation in an image

NOTE:
* We will focus mostly on the concepts and key ideas
* A lot of the code is pre-processing and post-processing. We will not spend time on these parts.
* We will see the part of the code that implements the key ideas.

Benefits:
*   Developing ideas
*   Developing thought process

### **Visualizing intermediate activations**

The output of a layer is called its 'activation' (it is the output of the layer's activation function).

These activations can be visualized by plotting the feature maps.

![picture](https://drive.google.com/uc?export=view&id=1vT8e59AYTFRlrrI3C-iUHTctxyhfBiJJ)

We will plot each feature map independently as a 2D image, since they encode relatively independent features.
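To make "feature map" concrete, here is a toy sketch with two hand-made 3x3 filters. They are hypothetical stand-ins for learned Conv2D filters, not filters from the trained model. Applied to a small image containing a vertical edge, each filter produces its own 2D feature map. The vertical-edge filter lights up along the edge, and the horizontal-edge filter stays silent.

```python
import numpy as np

# Hypothetical 6x6 grayscale image: dark left half, bright right half (a vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical 3x3 filters standing in for two learned Conv2D filters
vertical = np.array([[-1, 0, 1]] * 3, dtype=float)
filters = {"vertical_edge": vertical, "horizontal_edge": vertical.T}

def feature_map(img, kernel):
    """'valid' 2D cross-correlation followed by ReLU, like one channel of Conv2D(activation='relu')."""
    k = kernel.shape[0]
    h, w = img.shape[0] - k + 1, img.shape[1] - k + 1
    out = np.array([[np.sum(img[i:i + k, j:j + k] * kernel) for j in range(w)] for i in range(h)])
    return np.maximum(out, 0.0)

maps = {name: feature_map(image, kern) for name, kern in filters.items()}
for name, fmap in maps.items():
    print(name)
    print(fmap)
```

With matplotlib, each map could be displayed with `plt.matshow(fmap, cmap="viridis")`, as the cells below do for the real activations.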

In [None]:
# Importing the Keras API from TensorFlow for model building and manipulation
from tensorflow import keras

# Importing the plot_model utility from Keras to visualize the model architecture
from tensorflow.keras.utils import plot_model

# Importing numpy for numerical operations, often used with arrays and matrices in machine learning
import numpy as np

# Importing matplotlib for plotting graphs and visualizations
import matplotlib.pyplot as plt

# Importing the layers module from Keras to build the model by adding layers such as Conv2D, Dense, etc.
from tensorflow.keras import layers  # <----- Note this

**Model to be used for visualization:** We are going to use a trained model from the previous assignment, M2_AST_01_Convolutional_Neural_Networks. In that assignment, we created a model with data augmentation and saved it through a callback under the name "convnet_from_scratch_with_augmentation_keras". You can download that model from there and load it by providing its path.

For simplicity, we have already provided this model; it was downloaded along with the dataset during setup.

In [None]:
# Loading a pre-trained model from a file (assuming the model is saved as 'convnet_from_scratch_with_augmentation_keras.keras')
model = keras.models.load_model('convnet_from_scratch_with_augmentation_keras.keras')

# Visualizing the architecture of the loaded model using plot_model
plot_model(model)  # This will display a graphical representation of the model's layers and structure

In [None]:
# Evaluating the model's performance on the test dataset
test_loss, test_acc = model.evaluate(test_dataset)  # Evaluates the model on test data and returns the loss and accuracy

# Printing the test accuracy to display the result
print(f"Test accuracy is:{test_acc:.3f}")  # Prints the test accuracy rounded to 3 decimal places

**Getting and preprocessing the image** that will be passed into the model for visualization.

In [None]:
# Downloading the image of a cat from the specified URL and saving it as "cat.jpg"
img_path = keras.utils.get_file(fname="cat.jpg", origin="https://img-datasets.s3.amazonaws.com/cat.jpg")

# Function to preprocess the image into an array suitable for input into a model
def get_img_array(img_path, target_size):
    # Loading the image from the path and resizing it to the target size (180x180)
    img = keras.utils.load_img(img_path, target_size=target_size)

    # Converting the loaded image into a numpy array
    array = keras.utils.img_to_array(img)  # Converts image to a 3D numpy array (height, width, channels)

    # Adding an extra dimension to create a batch of one sample
    # This changes the shape from (height, width, channels) to (1, height, width, channels)
    array = np.expand_dims(array, axis=0)  # The shape is now (1, 180, 180, 3)

    # Returning the processed image array
    return array

# Calling the function to preprocess the downloaded image
img_tensor = get_img_array(img_path, target_size=(180, 180))  # Resize the image to (180, 180)

# Displaying the image
plt.axis("off")  # Disable axis to show the image clearly
plt.imshow(img_tensor[0].astype("uint8"))  # Convert the image tensor back to uint8 for display
plt.show()  # Display the image

In [None]:
# Focus here: Instantiate a model that returns the activations of specific layers

# Initialize lists to store the outputs and names of layers that we are interested in
layer_outputs = []
layer_names = []

# Loop through each layer in the original model to identify Conv2D or MaxPooling2D layers
for layer in model.layers:
    # Check if the layer is a Conv2D or MaxPooling2D layer
    if isinstance(layer, (layers.Conv2D, layers.MaxPooling2D)):
        layer_outputs.append(layer.output)  # Add the output of the layer to the layer_outputs list
        layer_names.append(layer.name)  # Add the layer's name to the layer_names list

# Creating a new model that outputs the activations from the selected layers
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)  # Output will be the activations of Conv2D and MaxPooling2D layers

# Visualize the architecture of the new model that returns activations of the selected layers
plot_model(activation_model)

In [None]:
activation_model.summary()

In [None]:
# Compute the activations of the selected layers (Conv2D and MaxPooling2D) for the input image
activations = activation_model.predict(img_tensor)  # Predict the activations for the input image tensor (batch of 1 image)

# Print the number of outputs (activations) returned by the model
print(f"No. of outputs= {len(activations)}")

# Get the activations of the first layer (Conv2D or MaxPooling2D layer) in the model
first_layer_feature_maps = activations[0]  # activations[0] corresponds to the first layer's activations in the list

# Print the shape of the first layer's activations (feature maps)
print(f"first_layer_activation.shape= {first_layer_feature_maps.shape}")

In [None]:
# Visualize the activation of the first feature map from the first layer

import matplotlib.pyplot as plt

# Visualize the first feature map of the first layer's activations using 'viridis' colormap
plt.matshow(first_layer_feature_maps[0, :, :, 0], cmap="viridis")  # Accessing the 1st feature map (index 0 in the last dimension)

# Display the image
plt.show()

It seems that the filter has detected ______.

Let's look at a feature map after each layer.

In [None]:
# Visualize the 3rd feature map (index 2) of each of the first 9 selected layers

import matplotlib.pyplot as plt

# Loop through (up to) the first 9 layers' activations
for i in range(min(9, len(activations))):
    # '0' selects the first (and only) image in the batch, '2' selects the 3rd feature map
    plt.matshow(activations[i][0, :, :, 2], cmap="viridis")
    plt.title(layer_names[i])  # Label each plot with its layer name


Note the axis ticks on the images above: deeper feature maps have smaller spatial dimensions, but each one is scaled to the same display size during visualization.
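The shrinking sizes follow from applying the output-size formula floor((n + 2p - f) / s) + 1 layer by layer. As a sketch, the code below traces a hypothetical stack of 3x3 'valid' convolutions alternating with 2x2 max pooling on an assumed 180x180 input; the real sizes for this notebook's model are the ones reported by its `model.summary()`.

```python
def conv_out(n, f, s=1, p=0):
    """Spatial output size of a conv or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# Hypothetical layer stack: (layer type, kernel/pool size, stride)
stack = [("conv", 3, 1), ("pool", 2, 2),
         ("conv", 3, 1), ("pool", 2, 2),
         ("conv", 3, 1), ("pool", 2, 2),
         ("conv", 3, 1), ("pool", 2, 2),
         ("conv", 3, 1)]

size = 180  # assumed input height/width
sizes = []
for kind, f, s in stack:
    size = conv_out(size, f, s)  # 'valid' padding, so p = 0
    sizes.append((kind, size))

for kind, size in sizes:
    print(f"{kind:>4}: {size}x{size}")
```

Each pooling step roughly halves the map, so the deeper maps are much coarser even though matplotlib draws them at the same size.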

Now let's visualise all the feature maps of all the layers.

In [None]:
# Post-processing code - visualization of every channel in every intermediate activation

images_per_row = 16  # Set the number of images to display per row in the grid

# Iterate over each layer's name and activation output
for layer_name, layer_activation in zip(layer_names, activations):
    n_features = layer_activation.shape[-1]  # Number of feature maps (channels) in the current layer's activation
    size = layer_activation.shape[1]  # Height and width of the feature maps
    n_cols = n_features // images_per_row  # Number of rows of tiles in the grid (each row holds images_per_row feature maps)

    # Create a blank display grid to place the feature maps
    display_grid = np.zeros(((size + 1) * n_cols - 1, images_per_row * (size + 1) - 1))

    # Loop through each feature map (channel) and place it in the grid
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_index = col * images_per_row + row  # Calculate the index of the current feature map
            channel_image = layer_activation[0, :, :, channel_index].copy()  # Get the feature map for the current channel

            # Normalize and adjust the feature map for better visualization
            if channel_image.std() > 0:  # Skip constant (e.g. all-zero) maps to avoid division by zero
                channel_image -= channel_image.mean()  # Subtract the mean to center the data
                channel_image /= channel_image.std()  # Normalize by the standard deviation
                channel_image *= 64  # Scale the values for better contrast
                channel_image += 128  # Shift values to a visible range
            channel_image = np.clip(channel_image, 0, 255).astype("uint8")  # Clip values to valid pixel range (0-255)

            # Place the processed feature map into the display grid
            display_grid[
                col * (size + 1): (col + 1) * size + col,
                row * (size + 1): (row + 1) * size + row] = channel_image

    # Scale the grid and plot it
    scale = 1. / size  # Calculate scale factor based on feature map size
    plt.figure(figsize=(scale * display_grid.shape[1], scale * display_grid.shape[0]))  # Set figure size based on grid dimensions
    plt.title(layer_name)  # Set title to the layer name
    plt.grid(False)  # Remove grid lines for clarity
    plt.axis("off")  # Remove axis for better visualization
    plt.imshow(display_grid, aspect="auto", cmap="viridis")  # Display the feature maps with the 'viridis' colormap

* The first layer acts as a collection of various edge detectors.

* As you go deeper, the activations become increasingly abstract and less visually interpretable. They begin to encode higher-level concepts such as “cat ear” and “cat eye.”

* The sparsity of the activations increases with depth: in the first layer, almost all filters are activated by the input image, but in later layers more and more filters are blank. A blank filter means the pattern it encodes isn't found in the input image.
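The sparsity observation can be checked numerically. The helper below, `blank_fraction` (a hypothetical name, not part of Keras), is a sketch that counts the channels whose feature map is entirely zero for the first image in a batch; `tol` is an assumed threshold that you may want to raise slightly above zero.

```python
import numpy as np

def blank_fraction(layer_activation, tol=0.0):
    """Fraction of channels whose feature map is all (near) zero.

    layer_activation: array of shape (batch, height, width, channels);
    only the first image in the batch is inspected.
    """
    maps = layer_activation[0]                        # (height, width, channels)
    max_per_channel = np.abs(maps).max(axis=(0, 1))   # strongest response per channel
    return float(np.mean(max_per_channel <= tol))     # share of channels that never fire

# Synthetic check: 4 channels, two of which are completely blank
demo = np.zeros((1, 5, 5, 4))
demo[0, :, :, 0] = 1.0
demo[0, 2, 2, 3] = 0.5
print(blank_fraction(demo))
```

With the variables from the cells above, `for name, act in zip(layer_names, activations): print(name, blank_fraction(act))` should show the fraction rising with depth if the observation holds for your input image.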

### **Visualising ConvNet filters**
*   Pick a filter.
*   Ask: what kind of input image excites this filter the most?
*   That is, what input would produce a feature map full of high (yellow) values?
*   In other words, we want to visualize the patterns in the input image that the filter picks up, i.e. the ones that produce high (yellow) values in its feature map.


In [None]:
# Instantiating the Xception convolutional base

model = keras.applications.xception.Xception(weights="imagenet", include_top=False)

**Key Points**:

1. `keras.applications.xception.Xception`:

* This is a pre-built Keras implementation of the Xception architecture, a deep convolutional neural network (CNN) designed for image classification.

* Xception stands for "Extreme Inception" and is a more advanced version of the Inception architecture, using depthwise separable convolutions.

2. `weights="imagenet"`:

* This argument loads weights pre-trained on ImageNet, a large dataset of labeled images widely used to train image classifiers. Starting from these weights means the model does not have to be trained from scratch; in this notebook they provide already-learned filters to visualize.

3. `include_top=False`:

* This argument tells Keras not to include the top fully connected layers (also called the "classification head") of the Xception model.
* By setting `include_top=False`, you obtain only the convolutional base of the model, which is typically used for feature extraction. The model can then be fine-tuned or adapted for a new task (e.g., adding custom layers for classification).

In [None]:
# Q: Print the names of the Conv2D and SeparableConv2D layers in Xception

for layer in model.layers:
    if isinstance(layer, (keras.layers.Conv2D, keras.layers.SeparableConv2D)):
        print(layer.name)

In [None]:
# Creating a feature extractor model

# Define the name of the layer to extract features from
layer_name = "block3_sepconv1"  # This is the name of the specific layer in the Xception model.

# Get the actual layer object by its name from the model
layer = model.get_layer(name=layer_name)  # Retrieves the layer object by its name.

# Create a new model that takes the same input as the original model, but outputs the specified layer's output
feature_extractor = keras.Model(inputs=model.input, outputs=layer.output)
# The feature_extractor model will output the activations (features) from the 'block3_sepconv1' layer.

# Display the summary of the feature extractor model
feature_extractor.summary()  # Prints the summary of the new model, showing its layers and the shape of the output.


Q: What is the last layer?

Q: How many filters does block3_sepconv1 have?

Q: Why are there so many Nones in the shapes?

In [None]:
# Using the feature extractor

activation = feature_extractor(keras.applications.xception.preprocess_input(img_tensor))
# The image is first preprocessed the way Xception expects before being passed to feature_extractor

Here comes the key idea:

*   Define an objective function: the mean activation of the chosen filter's feature map.
*   Use gradient *ascent* in the input-image space to maximize this objective.

Here's an analogy: (drawing)
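A toy version of the idea in NumPy, not TensorFlow: for a single hypothetical 3x3 linear filter `w`, the "activation" of a patch `x` is `sum(w * x)`, whose gradient with respect to `x` is simply `w`. Repeated normalized ascent steps push the patch toward the filter's own pattern. A real CNN filter is nonlinear, so this only sketches the intuition; the filter values, step size and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 filter: a vertical-edge detector (dark left, bright right)
w = np.array([[-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0]])

def activation(x):
    """Response of the linear filter to patch x (the objective to maximize)."""
    return float(np.sum(w * x))

def gradient(x):
    """d activation / d x; for a linear filter this is w itself, whatever x is."""
    return w

x = rng.uniform(0.4, 0.6, size=(3, 3))  # random starting "image", as in generate_filter_pattern
start = activation(x)
learning_rate = 1.0
for _ in range(30):
    g = gradient(x)
    g = g / np.linalg.norm(g)        # L2-normalize the gradient
    x = x + learning_rate * g        # "ascent": move along the gradient, not against it

cosine = np.sum(x * w) / (np.linalg.norm(x) * np.linalg.norm(w))
print(f"activation: {start:.2f} -> {activation(x):.2f}")
print(f"cosine similarity between patch and filter: {cosine:.3f}")
```

The cosine similarity approaches 1: the input that most excites this filter looks like the filter itself.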

In [None]:
import tensorflow as tf

# Define a function to compute the loss for a specific filter's activation
def compute_loss(image, filter_index):  # Q: How many indices do we have? A: 256
    # Get the activations from the feature_extractor model for the input image
    activation = feature_extractor(image)

    # Extract the activation corresponding to the selected filter
    # We slice out the boundaries (2 pixels from each side) to avoid edge effects
    filter_activation = activation[:, 2:-2, 2:-2, filter_index]  # Leaving out the boundaries

    # Return the mean activation value for the selected filter
    return tf.reduce_mean(filter_activation)

In [None]:
# Loss maximization via gradient ascent

@tf.function  # This decorator compiles the function into a graph for optimized execution
def gradient_ascent_step(image, filter_index, learning_rate):
    # Use TensorFlow's GradientTape to record the operations for gradient computation
    with tf.GradientTape() as tape:
        tape.watch(image)  # Watch the image tensor to track its gradients
        loss = compute_loss(image, filter_index)  # Compute the loss based on the filter activation

    # Compute the gradient of the loss with respect to the image
    grads = tape.gradient(loss, image)    # Q: Is the gradient a vector or a scalar? A: a tensor with the same shape as the image
    grads = tf.math.l2_normalize(grads)    # L2-normalize so every step has the same size, however large or small the raw gradient is

    # Update the image in the direction of the gradient (gradient ascent)
    image += learning_rate * grads          # Q: What makes this gradient "ascent"? A: plus sign

    return image

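The `tf.math.l2_normalize` call is what keeps the update size fixed. TensorFlow documents it (for `axis=None`) as `x / sqrt(max(sum(x**2), epsilon))`; the NumPy function below re-implements that formula for illustration only. It shows that every step has length `learning_rate`, whether the raw gradient is tiny or huge.

```python
import numpy as np

def l2_normalize(x, epsilon=1e-12):
    """NumPy analogue of tf.math.l2_normalize(x) with axis=None:
    x / sqrt(max(sum(x**2), epsilon))."""
    return x / np.sqrt(max(np.sum(x ** 2), epsilon))

learning_rate = 10.0
for scale in (1e-6, 1.0, 1e6):
    # Same random direction, very different magnitudes
    grads = scale * np.random.default_rng(1).normal(size=(1, 8, 8, 3))
    step = learning_rate * l2_normalize(grads)
    print(f"gradient norm {np.linalg.norm(grads):.2e} -> step norm {np.linalg.norm(step):.2f}")
```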
In [None]:
# Function to generate filter visualizations

img_width = 200  # Width of the generated image for filter visualization
img_height = 200  # Height of the generated image for filter visualization

def generate_filter_pattern(filter_index):
    iterations = 30  # Number of iterations to perform gradient ascent
    learning_rate = 10.  # Learning rate for the gradient ascent update
    # Initialize a random image with pixel values between 0.4 and 0.6
    image = tf.random.uniform(
        minval=0.4,  # Lower bound for random initialization
        maxval=0.6,  # Upper bound for random initialization
        shape=(1, img_height, img_width, 3)  # (batch, height, width, channels): one 200x200 image with 3 channels
    )

    # Perform gradient ascent for 'iterations' steps
    for i in range(iterations):
        image = gradient_ascent_step(image, filter_index, learning_rate)  # Update the image to maximize filter activation

    return image[0].numpy()  # Return the final image as a NumPy array (drop the batch dimension)

In [None]:
# Utility function to convert a tensor into a valid image

def deprocess_image(image):
    # Subtract the mean of the image to center the pixel values around zero
    image -= image.mean()

    # Normalize the pixel values to have a standard deviation of 1
    image /= image.std()

    # Multiply by 64 to scale the pixel values
    image *= 64

    # Add 128 to shift the pixel values back to a standard range
    image += 128

    # Clip the pixel values to stay within the valid range [0, 255] for display
    image = np.clip(image, 0, 255).astype("uint8")

    # Crop the borders to remove unwanted pixels (usually artifacts from the gradient ascent)
    image = image[25:-25, 25:-25, :]

    return image

# Visualizing the filter pattern generated for filter index 2
plt.axis("off")  # Turn off axis to focus on the image itself
plt.imshow(deprocess_image(generate_filter_pattern(filter_index=2)))  # Generate and display the processed image


In [None]:
# Post-processing - Just visualization
# Generating a grid of all filter response patterns in a layer

# List to hold all processed filter images
all_images = []
for filter_index in range(64):  # Only the first 64 of the layer's 256 filters (one per cell of the 8x8 grid)
    print(f"Processing filter {filter_index}")

    # Generate and deprocess the filter pattern for the current filter index
    image = deprocess_image(
        generate_filter_pattern(filter_index)
    )

    # Append the processed image to the list
    all_images.append(image)

# Defining parameters for the grid layout
margin = 5  # Space between the filter images
n = 8  # Number of filters in each row and column (8x8 grid)
cropped_width = img_width - 25 * 2  # Width of each pattern after cropping 25 pixels from each side
cropped_height = img_height - 25 * 2  # Height of each pattern after cropping 25 pixels from each side
width = n * cropped_width + (n - 1) * margin  # Total width of the grid
height = n * cropped_height + (n - 1) * margin  # Total height of the grid

# Initialize an empty (height, width, channels) array to stitch the filter patterns together
stitched_filters = np.zeros((height, width, 3))

# Stitch the filter images into a grid layout
for i in range(n):  # Loop over rows
    for j in range(n):  # Loop over columns
        image = all_images[i * n + j]  # Select the filter image
        # Place the image in the grid: rows index the height axis, columns the width axis
        stitched_filters[
            (cropped_height + margin) * i : (cropped_height + margin) * i + cropped_height,
            (cropped_width + margin) * j : (cropped_width + margin) * j + cropped_width,
            :
        ] = image

# Save the stitched filter grid as an image
keras.utils.save_img(
    f"filters_for_layer_{layer_name}.png", stitched_filters)

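The grid dimensions above are simple arithmetic: n tiles of the cropped size plus n - 1 margins. The two helper functions below (hypothetical names) reproduce the numbers for the 8x8 grid of 150-pixel tiles with a 5-pixel margin, and check that the last tile ends exactly at the edge of the canvas.

```python
def grid_extent(n_tiles, tile_size, margin):
    """Pixels spanned by n_tiles tiles of tile_size, separated by margin pixels."""
    return n_tiles * tile_size + (n_tiles - 1) * margin

def tile_slice(index, tile_size, margin):
    """(start, stop) pixel indices of the tile at position `index` along one axis."""
    start = (tile_size + margin) * index
    return start, start + tile_size

tile = 200 - 25 * 2              # 200-pixel pattern minus the 25-pixel crop on each side
print(grid_extent(8, tile, 5))   # total width (and height) of the 8x8 grid: 1235
print(tile_slice(7, tile, 5))    # last tile: (1085, 1235), ending at the grid edge
```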

In [None]:
# plt.figure(figsize=(40,40))
# plt.matshow(stitched_filters)

for i in [0,8,16,32]:
  plt.figure()
  plt.imshow((all_images[i]))

![picture](https://drive.google.com/uc?export=view&id=1bwf3RIEp9yNTICbf1f5FWg9bX1H5BGNm)

![picture](https://drive.google.com/uc?export=view&id=1VMtNw4qCs4BoN7d9Us8tNEtiKK_J4Csd)

![picture](https://drive.google.com/uc?export=view&id=1eXCejZ3bZP1rBwtLzUQ9RMO9N0tuqDCP)



###  **Visualizing heatmaps of class activation**

* Visualise which parts of a given image led a ConvNet to its final classification decision
* Such techniques are called **class activation map** (CAM) visualisation
* Produce heatmaps of class activation over input images.

Example:

![picture](https://drive.google.com/uc?export=view&id=1z1XY2GoYq_tEXhTXNMNT9qT3Yp2ic9ZR)

First let's understand the idea behind CAM:

(Remember: The CNN is already trained. Now we are just visualising aspects of the trained CNN)




![picture](https://drive.google.com/uc?export=view&id=1-Kn8BsHj1rPr61NGdRy822o1fvQuxW3l)


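Note:
*   CAM learns one set of weights per class: the heatmap for a class is a weighted sum of the last convolutional layer's feature maps, using that class's weights.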
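A minimal NumPy sketch of CAM, assuming the architecture CAM requires: global average pooling of the last feature maps followed by a dense layer. All sizes and weights here are random placeholders. It also shows why the dense weights are the right CAM weights: averaging the class map over space gives back the class score (minus the bias).

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, K, n_classes = 7, 7, 16, 3                # hypothetical sizes

A = rng.random((H, W, K))                       # last conv feature maps A_k
dense_w = rng.normal(size=(K, n_classes))       # dense weights after global average pooling
dense_b = rng.normal(size=n_classes)

def class_activation_map(A, dense_w, c):
    """CAM for class c: weighted sum of the feature maps using class c's dense weights."""
    return np.tensordot(A, dense_w[:, c], axes=([2], [0]))   # shape (H, W)

# Global average pooling -> Dense gives the class scores
logits = A.mean(axis=(0, 1)) @ dense_w + dense_b

c = 1
cam = class_activation_map(A, dense_w, c)
print(cam.shape)
# Averaging the CAM over space recovers the class score (minus the bias)
print(np.isclose(cam.mean() + dense_b[c], logits[c]))
```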



Now let's look at an improved version of CAM: Grad-CAM.



![picture](https://drive.google.com/uc?export=view&id=10QgdjWzPejhmCXLvzoXpf3_ySYx2BVCj)

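Key take-aways:

*   We need the weights. Q: What are these weights?
    - They are the coefficients of the weighted sum of feature maps that produces the heatmap.
*   CAM learns them by solving a new sub-problem (training the dense layer that sits on top of the globally pooled feature maps).
*   Grad-CAM computes them directly from gradients, without any extra training.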
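In the original Grad-CAM formulation, the weight of feature map $k$ for class $c$ is the spatially averaged gradient, and the heatmap is the ReLU of the weighted combination:

$$
\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_{k}\alpha_k^c A^k\right)
$$

Here $A^k$ is the $k$-th feature map of the last convolutional layer, $Z$ is the number of spatial positions $(i, j)$, and $y^c$ is the score of class $c$. In the code below, `pooled_grads` plays the role of $\alpha_k^c$, and the ReLU and normalization happen in the post-processing cell. One difference worth noting: the original formulation differentiates the pre-softmax score, whereas this notebook differentiates the softmax output of the `predictions` layer.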



In [None]:
# Loading the Xception network with pretrained weights

model = keras.applications.xception.Xception(weights="imagenet")

In [None]:
# model.summary()

In [None]:
# Preprocessing an input image for Xception

# Download the image from the specified URL and cache it locally (Keras stores it under ~/.keras/datasets by default)
img_path = keras.utils.get_file(fname="cat.jpg",  # The filename to save the image as
                                origin="https://img-datasets.s3.amazonaws.com/cat.jpg")  # URL of the image to download

# Function to load, resize, and preprocess the image for Xception input
def get_img_array(img_path, target_size):
    # Load the image from the given path and resize it to the target size (e.g., 299x299 for Xception)
    img = keras.utils.load_img(img_path, target_size=target_size)

    # Convert the loaded image into a NumPy array with shape (height, width, channels)
    array = keras.utils.img_to_array(img)

    # Add an extra dimension at the start of the array to represent the batch (shape becomes: (1, height, width, channels))
    array = np.expand_dims(array, axis=0)

    # Preprocess the image to match the Xception model's input requirements (normalization, centering, etc.)
    array = keras.applications.xception.preprocess_input(array)  # Preprocess the image for Xception model

    # Return the preprocessed image array ready for model input
    return array

# Call the function with the image path and the target size of (299, 299) as required by Xception
img_array = get_img_array(img_path, target_size=(299, 299))  # Resize the image to 299x299 and preprocess it

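`keras.applications.xception.preprocess_input` uses Keras's 'tf' preprocessing mode, which maps pixel values from [0, 255] to [-1, 1]. The function below is a NumPy re-implementation of that mapping for illustration (the name `xception_style_preprocess` is hypothetical), not the Keras source.

```python
import numpy as np

def xception_style_preprocess(array):
    """Scale pixel values from [0, 255] to [-1, 1] ('tf' mode): x / 127.5 - 1."""
    return array.astype("float32") / 127.5 - 1.0

pixels = np.array([[[0.0, 127.5, 255.0]]])   # one pixel with three channel values
print(xception_style_preprocess(pixels))      # maps 0 -> -1, 127.5 -> 0, 255 -> 1
```

This is also why `img_array` is not displayed directly; the superimposing cell later reloads the original image instead.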
In [None]:
# Predicting the top three labels

# Use the model to make a prediction on the preprocessed image (img_array)
preds = model.predict(img_array)  # The model returns a prediction, typically the class probabilities

# Decode the predictions to map the class indices to human-readable labels
# 'decode_predictions' converts the model's output (predicted class probabilities) to actual class labels
# It also returns the top predictions along with their probabilities
print(keras.applications.xception.decode_predictions(preds, top=3)[0])  # Print the top 3 predicted labels and their probabilities

In [None]:
# Index of the top predicted class (an integer between 0 and 999 for ImageNet), not its name
np.argmax(preds[0])

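`np.argmax(preds[0])` returns the index of the most probable class, while `decode_predictions` is what maps indices to human-readable labels. An equivalent top-k ranking can be obtained in NumPy as sketched below, using a made-up 4-class probability vector.

```python
import numpy as np

def top_k_indices(probs, k=3):
    """Indices of the k largest probabilities, highest first."""
    return np.argsort(probs)[::-1][:k]

probs = np.array([0.05, 0.60, 0.10, 0.25])          # hypothetical 4-class output
print(top_k_indices(probs))                          # [1 3 2]
print(np.argmax(probs) == top_k_indices(probs)[0])   # True: argmax is the top-1 entry
```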
In [None]:
# Setting up a model that returns the last convolutional output

# The layer name for the last convolutional layer in the Xception model
last_conv_layer_name = "block14_sepconv2_act"

# Specifying the layers that follow the convolutional layers (for classification)
classifier_layer_names = ["avg_pool", "predictions"]

# Retrieving the last convolutional layer from the model by its name
last_conv_layer = model.get_layer(last_conv_layer_name)

# Creating a new model that takes the original model's input and outputs the last convolutional layer's output
last_conv_layer_model = keras.Model(model.inputs, last_conv_layer.output)

# (Optional) Plot the model architecture up to the last convolutional layer to visualize the structure
# plot_model(last_conv_layer_model)  # Uncomment to visualize the model

In [None]:
# last_conv_layer_model.summary()

In [None]:
# Reapplying the classifier on top of the last convolutional output

# Create an input layer with the shape of the last convolutional layer's output
classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])

# Set the initial input tensor to the classifier_input
x = classifier_input

# Sequentially apply the classifier layers to the input (avg_pool and predictions layers)
for layer_name in classifier_layer_names:
    x = model.get_layer(layer_name)(x)  # Apply each layer from the original model

# Create a new model that takes the classifier_input and outputs the final predictions after applying the classifier layers
classifier_model = keras.Model(classifier_input, x)

In [None]:
# Retrieving the gradients of the top predicted class

import tensorflow as tf

# Use GradientTape to record the operations for automatic differentiation
with tf.GradientTape() as tape:
    # Pass the input image through the last convolutional layer model
    last_conv_layer_output = last_conv_layer_model(img_array)

    # Watch the output of the last convolutional layer to compute the gradient
    tape.watch(last_conv_layer_output)

    # Pass the convolutional layer output through the classifier model to get predictions
    preds = classifier_model(last_conv_layer_output)

    # Get the index of the top predicted class
    top_pred_index = tf.argmax(preds[0])  # Find the index of the highest predicted class

    # Extract the value of the top predicted class
    top_class_channel = preds[:, top_pred_index]

# Calculate the gradient of the top predicted class with respect to the output feature map of the last convolutional layer
grads = tape.gradient(top_class_channel, last_conv_layer_output)  # This computes the gradient w.r.t. feature maps

![picture](https://drive.google.com/uc?export=view&id=10QgdjWzPejhmCXLvzoXpf3_ySYx2BVCj)

In [None]:
# Gradient pooling and channel-importance weighting

# Average the gradients over the batch and spatial axes (0 = batch, 1 = height, 2 = width) to get one importance weight per channel
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2)).numpy()  # Shape: (channels,)

# Convert the last convolutional layer output to a numpy array for further manipulation
last_conv_layer_output = last_conv_layer_output.numpy()[0]  # We take the first (and only) image in the batch

# Multiply each feature map by its corresponding importance weight (pooled gradients)
for i in range(pooled_grads.shape[-1]):  # Loop through each channel (feature map)
    last_conv_layer_output[:, :, i] *= pooled_grads[i]  # Element-wise multiplication of feature map with corresponding weight

# Combine the weighted feature maps into a 2D heatmap
heatmap = np.mean(last_conv_layer_output, axis=-1)  # Mean over channels = weighted sum scaled by 1/n_channels; the constant cancels when normalizing below

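The loop in the cell above multiplies each channel by its weight one at a time; NumPy broadcasting does the same in a single expression. The sketch below, on random placeholder arrays, checks that the two agree and that averaging over channels differs from a true weighted sum only by the constant 1/n_channels, which cancels when the heatmap is later divided by its maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((10, 10, 8))   # hypothetical (height, width, channels) conv output
pooled = rng.normal(size=8)              # hypothetical per-channel weights (pooled gradients)

# Loop version, as in the cell above
looped = feature_maps.copy()
for i in range(pooled.shape[-1]):
    looped[:, :, i] *= pooled[i]
heatmap_loop = np.mean(looped, axis=-1)

# Broadcasting version: (10, 10, 8) * (8,) scales each channel by its weight
heatmap_vec = np.mean(feature_maps * pooled, axis=-1)

# True weighted sum over channels
heatmap_sum = np.tensordot(feature_maps, pooled, axes=([2], [0]))

print(np.allclose(heatmap_loop, heatmap_vec))
print(np.allclose(heatmap_sum / pooled.shape[0], heatmap_vec))
```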
In [None]:
# Heatmap post-processing

# Apply ReLU (Rectified Linear Unit) to the heatmap to remove negative values, as we are only interested in positive contributions
heatmap = np.maximum(heatmap, 0)  # Negative values are replaced with 0, preserving only the positive areas that contributed to the prediction.

# Normalize the heatmap by dividing by its maximum value so that the heatmap values are in the range [0, 1]
heatmap /= np.max(heatmap)  # Scaling the heatmap so that the maximum value becomes 1

# Display the heatmap using matplotlib
plt.matshow(heatmap)  # Visualize the heatmap with a color map
plt.axis('off')  # Hide axis labels for better visualization
plt.show()  # Show the plot

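One edge case: if every weighted value is negative, the ReLU leaves an all-zero heatmap and `heatmap /= np.max(heatmap)` divides by zero, producing NaNs. A common guard, shown here as an assumption rather than part of the original recipe, is to add a small epsilon to the denominator. The function name `normalize_heatmap` is hypothetical.

```python
import numpy as np

def normalize_heatmap(heatmap, eps=1e-8):
    """ReLU, then scale to [0, 1]; eps guards against an all-zero heatmap."""
    heatmap = np.maximum(heatmap, 0)
    return heatmap / (heatmap.max() + eps)

print(normalize_heatmap(np.array([[-2.0, 1.0], [3.0, 0.0]])))
print(normalize_heatmap(np.array([[-1.0, -0.5]])))   # all negative: zeros, no NaN
```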
In [None]:
# Superimposing the heatmap on the original image

import matplotlib  # matplotlib.colormaps is the colormap registry (Matplotlib 3.5+)

# Load the original image from the file path
img = keras.utils.load_img(img_path)  # Load image from the given path
img = keras.utils.img_to_array(img)  # Convert the image into a NumPy array for processing

# Convert the heatmap values to the range [0, 255] for visualization
heatmap = np.uint8(255 * heatmap)  # Scale heatmap values from [0, 1] to [0, 255]

# Get the 'jet' colormap (a gradient from blue to red) to color the heatmap
jet = matplotlib.colormaps["jet"]  # cm.get_cmap() was deprecated in Matplotlib 3.7 and removed in 3.9
jet_colors = jet(np.arange(256))[:, :3]  # Get RGB values (without alpha channel)
jet_heatmap = jet_colors[heatmap]  # Map the heatmap values to the 'jet' colormap

# Convert the heatmap to an image
jet_heatmap = keras.utils.array_to_img(jet_heatmap)  # Convert array to a PIL image
jet_heatmap = jet_heatmap.resize((img.shape[1], img.shape[0]))  # Resize heatmap to match the original image dimensions
jet_heatmap = keras.utils.img_to_array(jet_heatmap)  # Convert back to a NumPy array

# Superimpose the heatmap on the original image with transparency
superimposed_img = jet_heatmap * 0.4 + img  # Add the heatmap at 40% intensity on top of the image; array_to_img below rescales the sum to [0, 255]
superimposed_img = keras.utils.array_to_img(superimposed_img)  # Convert back to a PIL image

# Save the resulting superimposed image to a file
save_path = "cat_cam.jpg"  # Saved to the working directory under a name distinct from the downloaded cat.jpg
superimposed_img.save(save_path)  # Save the superimposed image to disk

In [None]:
plt.imshow(superimposed_img)  # imshow displays the RGB PIL image directly
plt.axis("off")

Let's see more Grad-CAM results.

Recommended reading:
* [blog](https://towardsdatascience.com/understand-your-algorithm-with-grad-cam-d3b62fce353#:~:text=Gradient%2Dweighted%20Class%20Activation%20Mapping,regions%20in%20the%20image%20for)

## Self-Practice Problem:
Solve an image classification problem on the cats-vs-dogs dataset by training a 'mini-Xception-like' model, following the instructions below:


1.  Set the global random seed to 42.
2.  We are using a **cats-vs-dogs** dataset here. Download it using the instructions below.

  Download the data with the following commands in your notebook:

`!wget -qq https://cdn.iisc.talentsprint.com/AIandMLOps/Datasets/cats_vs_dogs_small.zip`

`!unzip -qq '/content/cats_vs_dogs_small.zip'`

  Use the `image_dataset_from_directory` utility from `tensorflow.keras.utils` to create the appropriate datasets. (0)
3.  Build the model based on this [model_summary](https://indianinstituteofscience-my.sharepoint.com/:t:/g/personal/rohitc1_iisc_ac_in/EZ36t8eQFu9MrBnPjwueKcABD-2_8AZyDpyJ3vJEqUqlLQ?e=njNCGf) and its corresponding [model_plot](https://indianinstituteofscience-my.sharepoint.com/:i:/g/personal/rohitc1_iisc_ac_in/EU2WCnpqi8BEtfzltqI2vc4B5OFx53lMwn2tv6gqMebTig?e=lSsbz6). Make sure you follow the instructions below: (16)

      i).   For the initial layers of the model shown in this [summary](https://indianinstituteofscience-my.sharepoint.com/:i:/g/personal/ksumanth_iisc_ac_in/EYDi7MfYgkRGvb-NqKKuGiABon9CyOUMEiffHac1sXyEsg?e=gYNdVB), use a random horizontal flip, random rotation of 0.1, random zoom of 0.2 and rescaling by 1./255, and set kernel_size = 5 and use_bias=False in the convolution layer.

      ii).  Define a block of the following layers:

          *   Batch Normalization layer
          *   Activation layer with relu as activation function
          *   Depthwise separable convolution layer (kernel size = 3)
          *   Batch Normalization layer
          *   Activation layer with relu as activation function
          *   Depthwise separable convolution layer (kernel size = 5)
          *   Batch Normalization layer
          *   Activation layer with relu as activation function
          *   Depthwise separable convolution layer (kernel size = 7)
          *   MaxPooling2D layer (pool_size = 3, strides = 2)
          *   Convolution layer
          *   Add layer for the residual connection. Infer the connection points of the skip connection from the model summary and model plot.

          **Infer unspecified arguments from the summary**

      iii). The block defined in (ii) repeats 4 times. Note that the number of filters changes in each repetition; infer it from the model plot/summary.

      iv).  The last two layers are GlobalAveragePooling and Dense layers. The dense layer is the output layer (infer the number of neurons and the activation function).
4. Compile the model with rmsprop as the optimizer and with an appropriate loss and metric for this problem.
5. Fit the model with a batch_size of 32 for 20 epochs (don't use the EarlyStopping callback). Use the validation dataset from the data you downloaded. We have specified a small number of epochs because training may take time. Try running Colab on a GPU via Edit > Notebook settings > Hardware accelerator > GPU.
6. Return the history as a DataFrame. Show loss and accuracy for training and validation through appropriate plots.
7. Evaluate the model on the test dataset from the data you downloaded.

**Note**:

1. If you are using any parameter values or arguments apart from the ones mentioned or the ones that you must infer, state explicitly where and why you are using them.

2. Also verify that the total number of params in your model is the same as in the model_summary txt file given to you.


### Please answer the questions below to complete the experiment:




In [None]:
#@title  Depth-wise separable convolution layer having 2 filters with size 3X3 is applied on an image of size 7X7X3. What is the count of trainable parameters in this layer? Consider all biases.{run: "auto", form-width: "500px", display-mode: "form" }
Answer = "" #@param ["", "38", "33", "56", "60"]

In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "" #@param ["","Yes", "No"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

In [None]:
def calculate_conv_output(n, p, f, s):
    """
    Calculate the output shape of a convolution operation.

    Parameters:
    n (int): Size of the input (e.g., width or height of the input image).
    p (int): Padding applied to the input.
    f (int): Size of the convolutional filter (kernel size).
    s (int): Stride of the convolution.

    Returns:
    int: Size of the output (width or height after the convolution).
    """
    return ((n + 2 * p - f) // s) + 1

# Example usage
n = 6  # Size of the input
p = 0  # Padding
f = 3  # Filter size
s = 1  # Stride

output_size = calculate_conv_output(n, p, f, s)
print(f"The output size after the convolution is: {output_size}")


The output size after the convolution is: 4


In [None]:
def calculate_conv_params(f, n_c_prev, n_f):
    """
    Calculate the number of parameters in a convolutional layer.

    Parameters:
    f (int): Size of the filter (kernel size, e.g., 3 for a 3x3 filter).
    n_c_prev (int): Number of channels in the previous layer.
    n_f (int): Number of filters in the current layer.

    Returns:
    int: Total number of parameters in the convolutional layer.
    """
    # The formula: (f^2 * n_c_prev + 1) * n_f
    return (f * f * n_c_prev + 1) * n_f

# Example usage
f = 3       # Filter size (e.g., 3x3 filter)
n_c_prev = 3  # Number of channels in the previous layer
n_f = 2       # Number of filters in the current layer

total_params = calculate_conv_params(f, n_c_prev, n_f)
print(f"The total number of parameters in the convolutional layer is: {total_params}")


The total number of parameters in the convolutional layer is: 56


In [None]:
f= 3
n_c_prev = 1
n_f = 32
total_params = calculate_conv_params(f, n_c_prev, n_f)
print(f"The total number of parameters in the convolutional layer is: {total_params}")


The total number of parameters in the convolutional layer is: 320


In [None]:
# Import libraries
from tensorflow import keras
from tensorflow.keras import layers

# Define the input layer: 39x39 pixels with 3 color channels (RGB image)
inputs = keras.Input(shape=(39, 39, 3))

# First convolutional layer
# - 10 filters, each of size 3x3, stride 1, 'valid' padding -> 37x37x10
# - ReLU activation function for non-linearity
x = layers.Conv2D(filters=10, kernel_size=3, activation="relu")(inputs)

# Second convolutional layer
# - 20 filters, each of size 5x5, stride 2 -> 17x17x20
# - ReLU activation function
x = layers.Conv2D(filters=20, kernel_size=5, activation="relu", strides=2)(x)

# Third convolutional layer
# - 40 filters, each of size 5x5, stride 2 -> 7x7x40
# - ReLU activation function
x = layers.Conv2D(filters=40, kernel_size=5, activation="relu", strides=2)(x)

# Optional pooling layer, left disabled: this model downsamples with strided convolutions instead
#x = layers.MaxPooling2D(pool_size=2)(x)

# Flatten the output of the last convolutional layer
# This converts the 7x7x40 feature map into a 1960-element vector for the dense layer
x = layers.Flatten()(x)

# Output layer
# - 10 units for 10 classes
# - Softmax activation function for multi-class classification
outputs = layers.Dense(10, activation="softmax")(x)

# Create the model
model_no_max_pool = keras.Model(inputs=inputs, outputs=outputs)

In [None]:
# Print a summary of the model's architecture
model_no_max_pool.summary()
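
As a cross-check, the layer shapes and parameter counts of `model_no_max_pool` can be worked out by hand with the two helper functions defined earlier (re-defined here so the sketch runs on its own). The expected values are 37x37x10, 17x17x20 and 7x7x40, a 1960-element flattened vector, and 44,950 parameters in total; these should match the summary above.

```python
def calculate_conv_output(n, p, f, s):
    """Output height/width: floor((n + 2p - f) / s) + 1."""
    return ((n + 2 * p - f) // s) + 1

def calculate_conv_params(f, n_c_prev, n_f):
    """Weights plus one bias per filter: (f*f*n_c_prev + 1) * n_f."""
    return (f * f * n_c_prev + 1) * n_f

size, channels, total = 39, 3, 0
for n_f, f, s in [(10, 3, 1), (20, 5, 2), (40, 5, 2)]:   # (filters, kernel size, stride)
    total += calculate_conv_params(f, channels, n_f)      # uses the previous layer's channels
    size, channels = calculate_conv_output(size, 0, f, s), n_f  # 'valid' padding, p = 0
    print(f"conv {f}x{f}, stride {s}: {size}x{size}x{channels}")

flat = size * size * channels
total += (flat + 1) * 10                  # Dense(10): weights plus biases
print(f"flatten: {flat}, total params: {total}")
```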