In [None]:
Advantages of CNN over Fully Connected DNN for Image Classification:

Local Connectivity: CNNs leverage local connectivity, focusing on small regions of the input at a time. This allows them to capture local features effectively, which is crucial for image analysis.
Parameter Sharing: CNNs reuse the same weights (kernels) at every position of the image. This greatly reduces the number of parameters, and a pattern learned in one location can be detected anywhere else in the image.
Hierarchical Feature Extraction: CNNs learn hierarchical features, starting with simple features like edges and gradually combining them to represent more complex patterns and objects.
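Preserved Spatial Structure: A fully connected network flattens the image and treats every pixel as an unrelated input. CNN feature maps stay 2D, so the spatial relationships between neighboring pixels are preserved from layer to layer.
Effective with Large Inputs: Because connections are sparse and weights are shared, the number of parameters in a convolutional layer does not depend on the image size, so CNNs scale to large images far better than dense layers.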
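
To make the parameter-sharing argument concrete, the following minimal sketch compares the parameter count of one dense layer with that of one convolutional layer. The image size (100 x 100 RGB), the 1,000 dense units, and the 64 filters are illustrative assumptions, not values taken from the text above.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return (n_inputs + 1) * n_units


def conv2d_params(kernel_size, in_channels, n_filters):
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_size * kernel_size * in_channels + 1) * n_filters


# Hypothetical input: a 100 x 100 RGB image
height, width, channels = 100, 100, 3

dense_count = dense_params(height * width * channels, n_units=1000)
conv_count = conv2d_params(kernel_size=3, in_channels=channels, n_filters=64)

print(f"Dense layer, 1000 units:    {dense_count:,} parameters")
print(f"Conv layer, 64 3x3 filters: {conv_count:,} parameters")
```

Doubling the image size would roughly quadruple the dense layer's count, while the convolutional layer's count would stay the same.
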
Total Number of Parameters and RAM Usage:

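Each 3x3 kernel spans every input channel, so one filter has 3 * 3 * (input channels) weights plus 1 bias, not just 9.
The lowest layer has 100 feature maps, the middle layer has 200, and the top layer has 400.
Assuming RGB input images (3 channels; the input is not stated here): layer 1 = (3*3*3 + 1) * 100 = 2,800, layer 2 = (3*3*100 + 1) * 200 = 180,200, layer 3 = (3*3*200 + 1) * 400 = 720,400.
Total parameters = 2,800 + 180,200 + 720,400 = 903,400 parameters.
With 32-bit floats, the weights alone take approximately 3.6 MB (903,400 parameters * 4 bytes/parameter). A single instance prediction also needs RAM for the feature maps of at least two consecutive layers, and that amount depends on the input resolution and the stride.
For training on a mini-batch of 50 images, the weights are still stored only once. What grows 50-fold is the memory for the feature maps, because every layer's outputs must be kept for the backward pass of each instance, and this is usually far larger than the weights themselves.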
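
The arithmetic can be checked with a short script. This is a sketch under assumptions: a convolutional filter spans all input channels and has one bias, and the input size (200 x 300 RGB), the stride of 2, and "same" padding are hypothetical values chosen for illustration. The memory figures count only weights and feature maps, not gradients or optimizer state.

```python
import math


def conv_layer_params(kernel_size, in_channels, n_filters):
    """Each filter spans all input channels and has one bias term."""
    return (kernel_size * kernel_size * in_channels + 1) * n_filters


def conv_stack_summary(height, width, in_channels, feature_maps,
                       kernel_size=3, stride=2):
    """Return (parameter counts, output value counts) per layer.

    Assumes "same" padding, so each output dimension is ceil(input / stride).
    """
    params, activations = [], []
    for n_filters in feature_maps:
        params.append(conv_layer_params(kernel_size, in_channels, n_filters))
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        activations.append(height * width * n_filters)
        in_channels = n_filters
    return params, activations


BYTES_PER_FLOAT32 = 4

# Assumed input: 200 x 300 RGB image (hypothetical, for illustration)
params, activations = conv_stack_summary(200, 300, 3, [100, 200, 400])

total_params = sum(params)
weights_bytes = total_params * BYTES_PER_FLOAT32
activation_bytes_per_image = sum(activations) * BYTES_PER_FLOAT32

print("Parameters per layer:", params)
print("Total parameters:", total_params)
print("Weights (MB):", weights_bytes / 1e6)
print("Feature maps per image (MB):", activation_bytes_per_image / 1e6)
print("Feature maps for a batch of 50 (MB):", 50 * activation_bytes_per_image / 1e6)
```

Under these assumptions the feature maps kept for a 50-image batch (about 526 MB) dwarf the weights (about 3.6 MB), which is why batch size and resolution, not the parameter count, usually decide whether training fits in memory.
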
Solutions for GPU Out of Memory:

Reduce Batch Size: Use smaller mini-batches during training.
Reduce Model Complexity: Decrease the number of layers, neurons, or parameters.
Lower Resolution: Resize input images to a smaller resolution.
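Use Mixed Precision: Compute activations and gradients in a lower-precision floating-point format (e.g., float16 instead of float32), which roughly halves the memory they occupy.
Gradient Accumulation: Process several small mini-batches, accumulate their gradients, and update the weights once. This gives the effect of a large batch while only one small mini-batch's activations are held in memory at a time.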
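
Gradient accumulation is the least obvious item in this list, so here is a minimal framework-free sketch of the idea. It uses a one-variable linear model with a mean squared error loss. The data, the starting weights, and the micro-batch size are all made up for illustration. The point is only that gradients accumulated over small chunks match the gradient of the full batch.

```python
import math


def mse_gradient(w, b, xs, ys):
    """Gradient of the mean squared error of y_hat = w * x + b over one batch."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def accumulated_gradient(w, b, xs, ys, micro_batch_size):
    """Process the batch in small chunks and accumulate their gradients.

    Each chunk's mean gradient is weighted by its share of the full batch,
    so the accumulated result equals the full-batch mean gradient.
    """
    n = len(xs)
    acc_w, acc_b = 0.0, 0.0
    for start in range(0, n, micro_batch_size):
        chunk_x = xs[start:start + micro_batch_size]
        chunk_y = ys[start:start + micro_batch_size]
        grad_w, grad_b = mse_gradient(w, b, chunk_x, chunk_y)
        share = len(chunk_x) / n
        acc_w += share * grad_w
        acc_b += share * grad_b
    return acc_w, acc_b


# Toy data (hypothetical): 8 samples, processed 2 at a time
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0, 11.1, 12.8, 15.0]
w, b = 0.5, 0.0

full = mse_gradient(w, b, xs, ys)
accumulated = accumulated_gradient(w, b, xs, ys, micro_batch_size=2)

print("Full-batch gradient: ", full)
print("Accumulated gradient:", accumulated)
print("Match:", all(math.isclose(f, a) for f, a in zip(full, accumulated)))
```

In a real training loop the same pattern applies: only one chunk's activations are alive at a time, and the optimizer step runs after the last chunk.
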
Max Pooling vs. Convolutional Layer with Same Stride:

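Max pooling layers downsample feature maps by keeping only the maximum value in each pooling window.
A max pooling layer has no parameters at all, whereas a convolutional layer with the same stride adds kernel_height * kernel_width * input_channels weights plus a bias for every output feature map, all of which must be stored and trained.
Max pooling also provides a small amount of translation invariance: if a feature moves within the same pooling window, the output is unchanged. A strided convolutional layer has this property only if training happens to produce suitable kernels. In exchange, the convolutional layer can learn how to downsample instead of applying a fixed rule.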
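
The invariance claim, and its limits, can be seen on a tiny example. The sketch below is pure Python on a 4 x 4 image. The kernel used for the strided convolution is a hypothetical one that only looks at the top-left pixel of each window. A trained kernel could behave differently, which is exactly the point: the behavior depends on the learned weights.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a list-of-lists image (even dimensions)."""
    return [
        [max(image[i][j], image[i][j + 1], image[i + 1][j], image[i + 1][j + 1])
         for j in range(0, len(image[0]), 2)]
        for i in range(0, len(image), 2)
    ]


def strided_conv_2x2(image, kernel):
    """2x2 convolution with stride 2 and no padding, using the given kernel."""
    return [
        [sum(kernel[di][dj] * image[i + di][j + dj]
             for di in range(2) for dj in range(2))
         for j in range(0, len(image[0]), 2)]
        for i in range(0, len(image), 2)
    ]


def image_with_feature(column):
    """4x4 image of zeros with a single activation of 9 in row 0."""
    image = [[0] * 4 for _ in range(4)]
    image[0][column] = 9
    return image


# Hypothetical kernel that only looks at the top-left pixel of each window
kernel = [[1, 0],
          [0, 0]]

original = image_with_feature(0)
shifted_by_1 = image_with_feature(1)  # still inside the same pooling window
shifted_by_2 = image_with_feature(2)  # moves into the next pooling window

print("max pool, original:     ", max_pool_2x2(original))
print("max pool, shifted by 1: ", max_pool_2x2(shifted_by_1))
print("max pool, shifted by 2: ", max_pool_2x2(shifted_by_2))
print("strided conv, original:    ", strided_conv_2x2(original, kernel))
print("strided conv, shifted by 1:", strided_conv_2x2(shifted_by_1, kernel))
```

The pooled output is identical for the one-pixel shift but changes once the feature crosses into the next window, so the invariance is real but small. The strided convolution with this particular kernel loses the feature after a single-pixel shift.
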
Local Response Normalization Layer:

Local Response Normalization (LRN) layers introduce lateral inhibition between feature maps: a neuron that activates strongly suppresses the neurons at the same location in neighboring feature maps. Each activation is divided by a term that grows with the squared activations of those neighbors.
This competition pushes different feature maps to specialize and respond to different patterns, which improves the network's ability to generalize.
It was used in some older architectures but has been largely replaced by batch normalization in modern networks.
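
A minimal sketch of the cross-channel normalization described above, applied to the activations at a single spatial position. The function and parameter names are made up for this example. The default hyperparameters are meant to mirror the values reported for AlexNet (a window of 5 feature maps, k = 2, alpha = 1e-4, beta = 0.75), but treat them as assumptions rather than as any library's defaults.

```python
def local_response_normalization(activations, depth_radius=2, bias=2.0,
                                 alpha=1e-4, beta=0.75):
    """Normalize one spatial position across neighboring feature maps.

    activations: list with one value per feature map at the same (row, column).
    Each value is divided by (bias + alpha * sum of squares of its neighbors
    within depth_radius feature maps) ** beta.
    """
    n_maps = len(activations)
    normalized = []
    for i, value in enumerate(activations):
        low = max(0, i - depth_radius)
        high = min(n_maps - 1, i + depth_radius)
        sum_sq = sum(activations[j] ** 2 for j in range(low, high + 1))
        normalized.append(value / (bias + alpha * sum_sq) ** beta)
    return normalized


# Illustrative activations: one feature map fires strongly in the second case
quiet = [1.0, 1.0, 1.0, 1.0, 1.0]
loud = [1.0, 1.0, 100.0, 1.0, 1.0]

quiet_out = local_response_normalization(quiet)
loud_out = local_response_normalization(loud)

print("Neighbor output without a strong neighbor:", quiet_out[1])
print("Neighbor output next to a strong neighbor:", loud_out[1])
```

The neighbor's input is 1.0 in both cases, yet its normalized output is smaller when the adjacent feature map fires strongly, which is the inhibition effect.
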
Innovations in AlexNet, GoogLeNet, ResNet, SENet, and Xception:

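AlexNet: Won the ImageNet (ILSVRC) competition in 2012 by a large margin and popularized deep CNNs for image classification. It was much larger and deeper than earlier CNNs such as LeNet-5, stacked convolutional layers directly on top of one another, and relied on ReLU activation, dropout, data augmentation, and local response normalization.
GoogLeNet (Inception): Introduced inception modules, which run convolutions with several kernel sizes in parallel and concatenate the results, capturing features at multiple scales. Its 1x1 convolutions act as bottlenecks that reduce the number of channels, so the network is much deeper than AlexNet yet has far fewer parameters.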
ResNet (Residual Network): Introduced residual connections, which help in training very deep networks by allowing gradients to flow directly through shortcuts. This enabled training of networks with hundreds of layers.
SENet (Squeeze-and-Excitation Network): Introduced attention mechanisms at the channel level, allowing the network to focus on informative channels and suppress less useful ones, improving feature representation.
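Xception: Replaced inception modules with depthwise separable convolutions, which apply one spatial filter per input channel followed by a 1x1 (pointwise) convolution that mixes the channels. The technique predates Xception, but building nearly the whole network from it gives high accuracy with fewer parameters and less computation than regular convolutions.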
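
The parameter savings of depthwise separable convolutions can be estimated with a few lines of arithmetic. This sketch assumes one spatial filter per input channel, a 1x1 pointwise convolution, and one bias per output channel. The layer sizes (3x3 kernel, 128 input channels, 256 output channels) are hypothetical.

```python
def regular_conv_params(kernel_size, in_channels, out_channels):
    """Every output channel has its own kernel over all input channels, plus a bias."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Depthwise spatial filters + pointwise 1x1 mixing + one bias per output."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    biases = out_channels
    return depthwise + pointwise + biases


# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels
regular = regular_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)

print(f"Regular convolution:   {regular:,} parameters")
print(f"Separable convolution: {separable:,} parameters")
print(f"Ratio: {regular / separable:.1f}x")
```

For this layer the separable version needs more than eight times fewer parameters, and the gap widens as the number of output channels grows.
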
Fully Convolutional Network (FCN):

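A fully convolutional network (FCN) is a network made up only of convolutional layers (plus pooling or upsampling layers), with no dense layers. Because of this it can process images of varying sizes, which makes it well suited to dense pixel-wise prediction tasks like semantic segmentation.
You can convert a dense (fully connected) layer into a convolutional layer by using one filter per neuron, a kernel size equal to the spatial dimensions of the input feature map, and "valid" padding. On an input of the original size the output is identical to the dense layer's. On a larger input the same weights slide across the image, so the layer is effectively evaluated at every spatial location.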
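
The equivalence between a dense neuron and a full-size "valid" convolution can be verified numerically. The sketch below is pure Python with nested lists in height x width x channels order. The feature map size (3 x 4 x 2), the random weights, and all function names are assumptions made for this illustration.

```python
import math
import random


def dense_unit(feature_map, weights, bias):
    """One dense neuron applied to the flattened H x W x C feature map."""
    flat = [v for row in feature_map for pixel in row for v in pixel]
    return sum(w * x for w, x in zip(weights, flat)) + bias


def weights_to_kernel(weights, height, width, channels):
    """Reshape the neuron's flat weight vector into an H x W x C kernel."""
    values = iter(weights)
    return [[[next(values) for _ in range(channels)]
             for _ in range(width)]
            for _ in range(height)]


def conv_valid(feature_map, kernel, bias):
    """Single-filter convolution with stride 1 and "valid" padding."""
    k_h, k_w, k_c = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_h = len(feature_map) - k_h + 1
    out_w = len(feature_map[0]) - k_w + 1
    return [
        [bias + sum(kernel[di][dj][c] * feature_map[i + di][j + dj][c]
                    for di in range(k_h) for dj in range(k_w) for c in range(k_c))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def random_map(height, width, channels, rng):
    """Random H x W x C feature map with values in [-1, 1]."""
    return [[[rng.uniform(-1, 1) for _ in range(channels)]
             for _ in range(width)]
            for _ in range(height)]


rng = random.Random(0)
H, W, C = 3, 4, 2  # hypothetical feature map size
weights = [rng.uniform(-1, 1) for _ in range(H * W * C)]
bias = 0.1
kernel = weights_to_kernel(weights, H, W, C)

# Same-size input: the convolution yields a single value equal to the dense output
small = random_map(H, W, C, rng)
dense_out = dense_unit(small, weights, bias)
conv_out = conv_valid(small, kernel, bias)
print("Outputs match:", math.isclose(dense_out, conv_out[0][0], abs_tol=1e-12))

# Larger input: the same "neuron" is now evaluated at every valid position
large = random_map(5, 6, C, rng)
grid = conv_valid(large, kernel, bias)
print("Output grid size on the larger input:", len(grid), "x", len(grid[0]))
```

On the 5 x 6 input the converted layer produces a 3 x 3 grid of outputs, one per position where the original dense neuron could have been applied.
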
Main Difficulty of Semantic Segmentation:

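The main difficulty in semantic segmentation is that spatial information is lost as the image passes through pooling and strided layers: the network gains high-level semantic information but loses the fine-grained detail of where things are.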
It requires the network to recognize objects, understand their boundaries, and segment them pixel-wise in the input image.
Combining local and global context information effectively without losing spatial details is challenging. Many modern architectures use skip connections, dilated convolutions, and attention mechanisms to address this issue.
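
A toy one-dimensional illustration of why lost resolution hurts object boundaries. Subsampling is used here as a crude stand-in for strided layers, and nearest-neighbor repetition as a stand-in for naive upsampling. The 16-pixel row and the stride of 4 are made-up values, and no actual network is involved.

```python
def downsample(labels, stride):
    """Keep every stride-th label, mimicking the resolution loss of strided layers."""
    return labels[::stride]


def upsample_nearest(labels, factor):
    """Nearest-neighbor upsampling: repeat each label `factor` times."""
    return [label for label in labels for _ in range(factor)]


# Hypothetical 16-pixel row: an object (label 1) on a background (label 0)
row = [0] * 5 + [1] * 6 + [0] * 5

coarse = downsample(row, stride=4)
restored = upsample_nearest(coarse, factor=4)
wrong_pixels = [i for i, (a, b) in enumerate(zip(row, restored)) if a != b]

print("Original: ", row)
print("Restored: ", restored)
print("Mislabeled pixel indices:", wrong_pixels)
```

The object is still found, but every mislabeled pixel sits at one of its two edges. Skip connections address this by bringing higher-resolution feature maps from earlier layers back into the upsampling path.
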

In [None]:
Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((-1, 28, 28, 1))  # -1 infers the number of images
test_images = test_images.reshape((-1, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Build the CNN model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test accuracy:", test_acc)


In [None]:
10. Use transfer learning for large image classification, going through these steps:
a. Create a training set containing at least 100 images per class. For example, you could
classify your own pictures based on the location (beach, mountain, city, etc.), or
alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).
b. Split it into a training set, a validation set, and a test set.
c. Build the input pipeline, including the appropriate preprocessing operations, and
optionally add data augmentation.
d. Fine-tune a pretrained model on this dataset.

import tensorflow as tf

# Define data paths
train_data_dir = r'C:\Users\Naveen\Documents\train\cnn'
validation_data_dir = r'C:\Users\Naveen\Documents\val\cnn'
test_data_dir = r'C:\Users\Naveen\Documents\test\cnn'

# Define image dimensions
img_height, img_width = 224, 224
batch_size = 32

# MobileNetV2 preprocessing: scales pixels to [-1, 1], as the ImageNet weights expect
preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input

# Training generator: preprocessing plus data augmentation
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=72,  # degrees (0.2 of a full turn)
    zoom_range=0.2
)

# Validation and test images are only preprocessed, never augmented
eval_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocess_input
)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical'
)

validation_generator = eval_datagen.flow_from_directory(
    validation_data_dir,  # Use the separate validation directory defined above
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)

test_generator = eval_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)


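from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import Sequential, layers

# Number of classes, inferred from the class subdirectories of the training set
num_classes = train_generator.num_classes

# Load the pre-trained MobileNetV2 model (excluding top classification layer)
base_model = MobileNetV2(input_shape=(img_height, img_width, 3),
                         include_top=False,
                         weights='imagenet')

# Freeze the base layers so that only the new head is trained at first
base_model.trainable = False

# Add a new classification head
model = Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the new head
history = model.fit(train_generator,
                    epochs=10,  # Adjust as needed
                    validation_data=validation_generator)

# Fine-tune: unfreeze the top of the base model and continue with a low learning rate.
# The cutoff of 20 layers, the learning rate, and the epoch count are illustrative, not tuned.
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False

# Recompile so that the change in trainable layers takes effect
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history_fine = model.fit(train_generator,
                         epochs=5,  # Adjust as needed
                         validation_data=validation_generator)

# Evaluate the fine-tuned model on the test data
test_loss, test_acc = model.evaluate(test_generator)
print("Test accuracy:", test_acc)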
