1. **What are the advantages of a CNN over a fully connected DNN for image classification?**
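   - A CNN's convolutional layers are only locally connected and reuse the same kernel weights at every position, so they need far fewer parameters than a fully connected DNN. That makes them faster to train, less prone to overfitting, and less hungry for training data. Once a kernel has learned to detect a feature, it can detect that feature anywhere in the image, whereas a DNN must relearn it at each location. Stacking convolutional layers also matches the spatial hierarchy of images: lower layers pick up small local patterns and higher layers combine them into larger ones.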
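
To put a rough number on the parameter savings, the sketch below compares one fully connected layer with one convolutional layer on a 200 × 300 RGB image. The layer width of 100 and the 3 × 3 kernel are illustrative choices, not part of the question.

```python
# Parameter count for a single layer applied to a 200 x 300 RGB image.
height, width, channels = 200, 300, 3
units_or_filters = 100  # illustrative layer width

# Fully connected: every unit has one weight per input value, plus a bias.
dense_params = (height * width * channels) * units_or_filters + units_or_filters

# Convolutional: every filter has one 3 x 3 x channels kernel that is shared
# across all positions, plus a bias.
kernel = 3
conv_params = (kernel * kernel * channels) * units_or_filters + units_or_filters

print(f"Dense layer: {dense_params:,} parameters")
print(f"Conv layer:  {conv_params:,} parameters")
```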

2. **Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels. What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?**
   - Total number of parameters: \[ \text{Layer 1: } (3 \times 3 \times 3 \times 100) + 100 = 2800 \]
\[ \text{Layer 2: } (3 \times 3 \times 100 \times 200) + 200 = 180200 \]
\[ \text{Layer 3: } (3 \times 3 \times 200 \times 400) + 400 = 720400 \]
\[ \text{Total: } 2800 + 180200 + 720400 = 903400 \]

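   - RAM for a single instance: with "same" padding and a stride of 2, each layer halves the height and width (rounding up), so the feature maps are 100 × 150, then 50 × 75, then 25 × 38.
\[ \text{Layer 1 maps: } 100 \times 150 \times 100 \times 4 = 6000000 \text{ bytes} \]
\[ \text{Layer 2 maps: } 50 \times 75 \times 200 \times 4 = 3000000 \text{ bytes} \]
\[ \text{Layer 3 maps: } 25 \times 38 \times 400 \times 4 = 1520000 \text{ bytes} \]
\[ \text{Parameters: } 903400 \times 4 = 3613600 \text{ bytes} \]
     At inference time a layer's output can be released as soon as the next layer has been computed, so only two consecutive layers need to be held at once. The peak occurs while layer 2 is computed from layer 1:
\[ 6000000 + 3000000 + 3613600 = 12613600 \text{ bytes} \approx 12.6 \text{ MB} \]

   - RAM for a mini-batch of 50 images: during training every feature map must be kept for the backward pass, and the input batch and the parameters are held as well.
\[ \text{Feature maps: } 50 \times (6000000 + 3000000 + 1520000) = 526000000 \text{ bytes} \]
\[ \text{Inputs: } 50 \times 200 \times 300 \times 3 \times 4 = 36000000 \text{ bytes} \]
\[ \text{Total: } 526000000 + 36000000 + 3613600 = 565613600 \text{ bytes} \approx 566 \text{ MB} \]
     This is a lower bound: gradients and optimizer state need additional memory.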
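
The arithmetic for this question can be checked with a short script. This is a sketch under stated assumptions: "same" padding with stride 2 gives an output size of ceil(input / 2), inference keeps only two consecutive layers' feature maps in memory at once, and training keeps every feature map plus the input batch. The helper name `conv_stack_stats` is made up for this example.

```python
import math

BYTES_PER_FLOAT = 4  # 32-bit floats

def conv_stack_stats(input_hwc, filters_per_layer, kernel=3, stride=2):
    """Return (total_params, [feature-map element count per layer])."""
    height, width, channels = input_hwc
    total_params = 0
    map_sizes = []
    for filters in filters_per_layer:
        # Weights plus one bias per filter.
        total_params += kernel * kernel * channels * filters + filters
        # "same" padding: output size is ceil(input size / stride).
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        map_sizes.append(height * width * filters)
        channels = filters
    return total_params, map_sizes

params, maps = conv_stack_stats((200, 300, 3), [100, 200, 400])
param_bytes = params * BYTES_PER_FLOAT
map_bytes = [m * BYTES_PER_FLOAT for m in maps]

# Inference: peak over pairs of consecutive layers, plus the parameters.
inference_bytes = max(a + b for a, b in zip(map_bytes, map_bytes[1:])) + param_bytes

# Training: all feature maps and the input for every image, plus the parameters.
batch = 50
input_bytes = 200 * 300 * 3 * BYTES_PER_FLOAT
training_bytes = batch * (sum(map_bytes) + input_bytes) + param_bytes

print(f"Parameters:     {params:,}")
print(f"Inference RAM:  {inference_bytes / 1e6:.1f} MB")
print(f"Training RAM:   {training_bytes / 1e6:.1f} MB (lower bound)")
```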

3. **If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?**
   - Reduce the batch size.
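   - Use mixed precision training (16-bit floats instead of 32-bit).
   - Remove layers, or shrink the feature maps with a larger stride or smaller input images.
   - Use gradient checkpointing, which recomputes activations during the backward pass instead of storing them.
   - Distribute the model across several devices, or offload part of the work to the CPU.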

4. **Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?**
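   - A max pooling layer has no parameters at all, whereas a convolutional layer with the same stride adds weights that must be learned and stored. Max pooling is also cheaper to compute, and it gives the network a small amount of invariance to translations of the input.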
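
A one-dimensional toy example illustrates both points. The function `max_pool_1d` is a hypothetical helper written for this sketch. The invariance it shows is limited: the output is unchanged only because the shifted peak stays inside the same pooling window.

```python
def max_pool_1d(values, size=2, stride=2):
    """Max pooling over a 1D list; note that there are no weights to learn."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, stride)]

signal = [0, 0, 5, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 5, 0, 0, 0, 0]  # same peak, moved one step to the right

print(max_pool_1d(signal))
print(max_pool_1d(shifted))
```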

5. **When would you want to add a local response normalization layer?**
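   - Local response normalization makes the most strongly activated neurons inhibit neurons at the same position in neighboring feature maps. This competition pushes the feature maps to specialize and cover a wider range of features. It is typically used in the lower layers, so that the upper layers have a larger pool of low-level features to build on.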
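
The inhibition effect can be seen in a minimal pure-Python sketch of the usual formula, where each activation is divided by `(bias + alpha * sum of squared neighboring activations) ** beta`. The parameter names mirror `tf.nn.local_response_normalization`, but the function itself and the hyperparameter values are illustrative assumptions, not AlexNet's settings.

```python
def local_response_norm(activations, depth_radius=1, bias=1.0, alpha=1.0, beta=0.5):
    """Normalize one pixel's activations across neighboring feature maps.

    activations[i] is the activation of feature map i at a single position.
    """
    last = len(activations) - 1
    normalized = []
    for i, a in enumerate(activations):
        # Neighboring feature maps within depth_radius, clipped at the edges.
        low, high = max(0, i - depth_radius), min(last, i + depth_radius)
        squared_sum = sum(activations[j] ** 2 for j in range(low, high + 1))
        normalized.append(a / (bias + alpha * squared_sum) ** beta)
    return normalized

# A weak activation next to a strong one is suppressed far more than the
# same weak activation with quiet neighbors.
crowded = local_response_norm([1.0, 10.0, 0.0])
quiet = local_response_norm([1.0, 0.0, 0.0])
print(crowded[0], quiet[0])
```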

6. **Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet, and Xception?**
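   - **AlexNet:** Much larger and deeper than LeNet-5, and it stacks convolutional layers directly on top of one another instead of placing a pooling layer after each one. It also used ReLU activations, dropout, data augmentation, local response normalization, and GPU training.
   - **GoogLeNet:** Introduced the Inception module, which runs convolutions of several kernel sizes in parallel and concatenates their outputs, using \(1 \times 1\) convolutions as bottlenecks. This allows a much deeper network with fewer parameters than earlier architectures.
   - **ResNet:** Introduced skip (residual) connections, so each block learns a residual that is added to its input. This eases gradient flow and makes it possible to train networks well over 100 layers deep.
   - **SENet:** Introduced squeeze-and-excitation blocks, small networks that recalibrate channel-wise feature responses after each unit.
   - **Xception:** Replaced Inception modules with depthwise separable convolutions, which handle spatial patterns and cross-channel patterns in separate steps and need far fewer parameters.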
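
The efficiency gain from depthwise separable convolutions shows up in a simple parameter count. The sketch below assumes one spatial filter per input channel with no bias on the depthwise step, followed by a 1 × 1 pointwise convolution with biases. The function names are made up for illustration, and the 200-to-400 channel sizes are borrowed from question 2.

```python
def regular_conv_params(kernel, in_channels, out_channels):
    # One kernel x kernel x in_channels filter per output channel, plus biases.
    return kernel * kernel * in_channels * out_channels + out_channels

def separable_conv_params(kernel, in_channels, out_channels):
    depthwise = kernel * kernel * in_channels               # one spatial filter per input channel
    pointwise = in_channels * out_channels + out_channels   # 1 x 1 convolution plus biases
    return depthwise + pointwise

regular = regular_conv_params(3, 200, 400)
separable = separable_conv_params(3, 200, 400)
print(f"Regular 3x3 conv:   {regular:,} parameters")
print(f"Separable 3x3 conv: {separable:,} parameters")
```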

7. **What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?**
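   - A fully convolutional network (FCN) is built only from convolutional and pooling layers, with no dense layers, so it can process images of any size. To convert a dense layer into a convolutional layer, use as many filters as the dense layer has units, set the kernel size equal to the size of the input feature maps, and use "valid" padding. The output is then \(1 \times 1\) per filter and equal to the dense layer's output. A \(1 \times 1\) kernel is enough only when the input feature maps are already \(1 \times 1\).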
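
The equivalence between a dense layer and a convolution whose kernel covers the whole feature map can be checked numerically. This is a pure-Python sketch with small, made-up sizes and random weights. It assumes the dense layer sees the input flattened in row, column, channel order.

```python
import random

random.seed(0)
H, W, C, UNITS = 2, 3, 4, 5  # hypothetical small sizes for the demo

# Input feature maps, indexed as feature_map[row][col][channel].
feature_map = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]

# Dense layer: one weight per flattened input value, for each unit.
dense_w = [[random.uniform(-1, 1) for _ in range(H * W * C)] for _ in range(UNITS)]
biases = [random.uniform(-1, 1) for _ in range(UNITS)]

flat = [v for row in feature_map for pixel in row for v in pixel]
dense_out = [sum(w * x for w, x in zip(dense_w[u], flat)) + biases[u] for u in range(UNITS)]

# Convolution: view each unit's weights as an H x W x C kernel. With "valid"
# padding the kernel fits at exactly one position, so each filter yields a
# single 1 x 1 output.
def kernel_weight(u, r, c, ch):
    return dense_w[u][(r * W + c) * C + ch]

conv_out = [
    sum(kernel_weight(u, r, c, ch) * feature_map[r][c][ch]
        for r in range(H) for c in range(W) for ch in range(C)) + biases[u]
    for u in range(UNITS)
]

outputs_match = all(abs(d - k) < 1e-9 for d, k in zip(dense_out, conv_out))
print(outputs_match)
```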

8. **What is the main technical difficulty of semantic segmentation?**
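   - The main difficulty is that a CNN progressively loses spatial information as the signal passes through pooling layers and layers with a stride greater than 1. That resolution has to be restored somehow, for example with upsampling or transposed convolutions combined with skip connections from lower layers, in order to predict an accurate class for every pixel.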

9. **Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.**
   ```python
   import tensorflow as tf
   from tensorflow.keras import layers, models

   # Load and preprocess MNIST data
   (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
   train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
   test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

   # Build the CNN model
   model = models.Sequential([
       layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
       layers.MaxPooling2D((2, 2)),
       layers.Conv2D(64, (3, 3), activation='relu'),
       layers.MaxPooling2D((2, 2)),
       layers.Conv2D(64, (3, 3), activation='relu'),
       layers.Flatten(),
       layers.Dense(64, activation='relu'),
       layers.Dense(10, activation='softmax')
   ])

   # Compile and train the model
   model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
   model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_split=0.1)

   # Evaluate the model
   test_loss, test_acc = model.evaluate(test_images, test_labels)
   print(f'Test accuracy: {test_acc}')
   ```

10. **Use transfer learning for large image classification, going through these steps:**
    - **a. Create a training set containing at least 100 images per class.**
      - You can use TensorFlow Datasets, e.g. `tfds.load('cats_vs_dogs', split='train[:10%]', as_supervised=True)` for a small sample. Passing `as_supervised=True` makes the dataset yield `(image, label)` pairs.
    - **b. Split it into a training set, a validation set, and a test set.**
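      - `cats_vs_dogs` only ships a `train` split, so slice it yourself, e.g. pass `split=['train[:80%]', 'train[80%:90%]', 'train[90%:]']` to `tfds.load`. Alternatively, call `dataset.take(n)` and `dataset.skip(n)` on a `tf.data.Dataset` with an integer `n` such as `int(0.8 * dataset_size)`.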
    - **c. Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.**
      - ```python
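        import tensorflow as tf

        def preprocess(image, label):
            # Resize to the input size the pretrained model expects.
            image = tf.image.resize(image, [224, 224])
            # MobileNetV2's ImageNet weights expect pixels scaled to [-1, 1].
            image = tf.keras.applications.mobilenet_v2.preprocess_input(image)
            return image, label

        # Shuffle individual examples before batching; shuffling after batch()
        # would only reorder whole batches.
        train_dataset = train_dataset.map(preprocess).shuffle(buffer_size=1000).batch(32).prefetch(tf.data.AUTOTUNE)
        val_dataset = val_dataset.map(preprocess).batch(32)
        test_dataset = test_dataset.map(preprocess).batch(32)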
        ```
    - **d. Fine-tune a pretrained model on this dataset.**
      - ```python
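        import tensorflow as tf
        from tensorflow.keras import layers, models

        # Load MobileNetV2 pretrained on ImageNet, without its classification head.
        base_model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
        # Phase 1: freeze the pretrained weights and train only the new head.
        base_model.trainable = False

        model = models.Sequential([
            base_model,
            layers.GlobalAveragePooling2D(),
            layers.Dense(1, activation='sigmoid')  # single output for the two-class problem
        ])

        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        model.fit(train_dataset, validation_data=val_dataset, epochs=10)

        # Phase 2: unfreeze the base model and fine-tune end to end with a much
        # lower learning rate. Recompile so the change to `trainable` takes effect.
        base_model.trainable = True
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss='binary_crossentropy', metrics=['accuracy'])
        model.fit(train_dataset, validation_data=val_dataset, epochs=10)

        # Final evaluation on the held-out test set.
        test_loss, test_acc = model.evaluate(test_dataset)
        print(f'Test accuracy: {test_acc}')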
        ```