### Disadvantages of Convolutional Neural Networks (CNNs):
* Computationally expensive to train and require a lot of memory.
* Prone to overfitting when training data is scarce or regularization is inadequate.
* Require large amounts of labeled data.
* Offer limited interpretability: it is hard to understand what the network has learned.

### Different CNN architectures

1. LeNet-5 (1998): a small network of alternating convolution and subsampling layers followed by fully connected layers
2. AlexNet (2012)
3. ZFNet (2013)
4. GoogLeNet/Inception (2014)
5. VGGNet (2014)
6. ResNet (2015)
7. DenseNet (2016)

In [28]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPooling2D
 
# Load the digits dataset
digits = load_digits()
X, y = digits.images, digits.target

In [30]:
# Preprocessing
# Normalizing pixel values
X = X / 16.0
 
# Reshaping the data to fit the model
# CNN in Keras requires an extra dimension at the end for channels,
# and the digits images are grayscale so it's just 1 channel
X = X.reshape(-1, 8, 8, 1)

# Convert labels to categorical (one-hot encoding)
y = to_categorical(y)

X.shape, y.shape

((1797, 8, 8, 1), (1797, 10))

In [32]:
# Splitting into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, y_train.shape, X_test.shape

((1437, 8, 8, 1), (1437, 10), (360, 8, 8, 1))

In [33]:
# Building a simple CNN model
model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(8, 8, 1))) 
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='softmax'))  # 10 classes for digits 0-9
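
As a sanity check on the model above, the output shapes and parameter counts can be worked out by hand. The sketch below assumes Keras defaults (stride 1 and `'valid'` padding for `Conv2D`, pooling stride equal to the pool size). The helper names `conv_out`, `conv_params` and `dense_params` are illustrative and not part of Keras.

```python
# Hand-computed shapes and parameter counts for the small CNN defined above.
# Assumes Keras defaults: Conv2D uses stride 1 and 'valid' (no) padding,
# MaxPooling2D uses a stride equal to its pool size.

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

side = conv_out(8, kernel=3)               # Conv2D(64, 3): 8x8 -> 6x6
side = conv_out(side, kernel=2, stride=2)  # MaxPooling2D(2, 2): 6x6 -> 3x3
flat = side * side * 64                    # Flatten: 3 * 3 * 64 features

layer_params = {
    "conv2d": conv_params(1, 64, 3),
    "dense_50": dense_params(flat, 50),
    "dense_10": dense_params(50, 10),
}
total_params = sum(layer_params.values())
print(flat, layer_params, total_params)
```

Under these assumptions the total should agree with what `model.summary()` reports for the same model.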


In [14]:
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn

In [34]:
# Compiling the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
 
# Training the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10)
 
# Evaluate the model
accuracy = model.evaluate(X_test, y_test)[1]


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [46]:
np.argmax(y_test[0])

6

In [44]:
np.argmax(model.predict(X_test)[0])



0.8526012

## CNN using PyTorch

In [None]:
class Net(nn.Module):   
    def __init__(self):
        super(Net, self).__init__()

        self.cnn_layers = nn.Sequential(
            # Defining a 2D convolution layer
            nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Defining another 2D convolution layer
            nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.linear_layers = nn.Sequential(
            nn.Linear(4 * 7 * 7, 10)
        )

    # Defining the forward pass    
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
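
The `4 * 7 * 7` input size of the linear layer follows from the layer settings if the network is fed 28×28 single-channel images. That input size is an assumption here, since the data-loading code is not shown. A short sketch of the arithmetic, using a hypothetical helper `out_size`:

```python
def out_size(size, kernel, stride, padding=0):
    """Output width/height of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28  # assumed input height/width (e.g. MNIST-style images)
trace = [size]
for _ in range(2):  # the network has two conv + pool stages
    size = out_size(size, kernel=3, stride=1, padding=1)  # Conv2d keeps the size
    size = out_size(size, kernel=2, stride=2)             # MaxPool2d halves it
    trace.append(size)

channels = 4  # output channels of the last Conv2d
flat_features = channels * size * size
print(trace, flat_features)
```

With a different input resolution the flattened size changes, and the `nn.Linear` input dimension would have to change with it.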

In [None]:
# defining the model
model = Net()
# defining the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.07)
# defining the loss function
criterion = nn.CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
    
print(model)

In [None]:
# defining the number of epochs
n_epochs = 25
# empty list to store training losses
train_losses = []
# empty list to store validation losses
val_losses = []

def train(epoch):
    model.train()
    # train_x, train_y, val_x, val_y are assumed to be tensors prepared earlier
    # (nn.Variable does not exist; plain tensors work with autograd directly)
    x_train, y_train = train_x, train_y
    x_val, y_val = val_x, val_y
    # moving the data to the GPU if one is available
    if torch.cuda.is_available():
        x_train = x_train.cuda()
        y_train = y_train.cuda()
        x_val = x_val.cuda()
        y_val = y_val.cuda()

    # clearing the gradients of the model parameters
    optimizer.zero_grad()

    # prediction and loss for the training set
    output_train = model(x_train)
    loss_train = criterion(output_train, y_train)

    # computing the gradients and updating the model parameters
    loss_train.backward()
    optimizer.step()

    # validation loss, computed in eval mode without tracking gradients
    model.eval()
    with torch.no_grad():
        loss_val = criterion(model(x_val), y_val)

    # storing plain Python floats so the losses can be plotted later
    train_losses.append(loss_train.item())
    val_losses.append(loss_val.item())
    if epoch % 2 == 0:
        # printing the validation loss
        print('Epoch : ', epoch + 1, '\t', 'loss :', loss_val.item())

In [None]:

# training the model
for epoch in range(n_epochs):
    train(epoch)

In [None]:

# plotting the training and validation loss
plt.plot(train_losses, label='Training loss')
plt.plot(val_losses, label='Validation loss')
plt.legend()
plt.show()

In [None]:

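# prediction for training set
from sklearn.metrics import accuracy_score

model.eval()
device = next(model.parameters()).device  # works with or without a GPU
with torch.no_grad():
    output = model(train_x.to(device))

# the model outputs raw logits, so softmax turns them into probabilities
prob = torch.softmax(output, dim=1).cpu().numpy()
predictions = np.argmax(prob, axis=1)

# accuracy on training set
accuracy_score(train_y, predictions)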

### VGG Architecture (Visual Geometry Group)

* It is very slow to train: each of the original VGG models took 2–3 weeks on a system with four NVIDIA Titan Black GPUs.
* The ImageNet-trained VGG-16 weights are about 528 MB, so the model is costly in disk space and bandwidth.
* VGG-16 has about 138 million parameters, and as a deep plain stack of layers without skip connections it can suffer from vanishing or exploding gradients.
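
The parameter count and the weight-file size quoted above can be cross-checked with a little arithmetic. The sketch below assumes the standard VGG-16 layout (3×3 convolutions, 2×2 max pooling, a 224×224 RGB input, three fully connected layers) and 4 bytes per `float32` weight. The function name `vgg16_param_count` is illustrative.

```python
# VGG-16 layer widths; "M" marks a 2x2 max-pooling layer.
CONV_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
            512, 512, 512, "M", 512, 512, 512, "M"]
FC_CFG = [4096, 4096, 1000]  # two hidden layers + 1000 ImageNet classes

def vgg16_param_count(input_size=224, in_channels=3):
    total, channels, size = 0, in_channels, input_size
    for item in CONV_CFG:
        if item == "M":
            size //= 2                               # pooling halves height/width
        else:
            total += 3 * 3 * channels * item + item  # 3x3 kernels + biases
            channels = item
    features = channels * size * size                # flattened conv output
    for width in FC_CFG:
        total += features * width + width            # weights + biases
        features = width
    return total

n_params = vgg16_param_count()
size_mib = n_params * 4 / 2**20  # 4 bytes per float32 weight
print(f"{n_params:,} parameters, about {size_mib:.0f} MiB as float32")
```

Most of the parameters sit in the first fully connected layer, which is why the file is so large despite the small 3×3 kernels.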

### ResNet

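* Designed to mitigate the vanishing/exploding gradient problem in very deep networks.
* A skip connection adds the activations of a layer to the output of a later layer, bypassing the layers in between.
* The advantage of this type of skip connection is that if a layer hurts performance, training (for example with weight regularization) can push its weights toward zero, so the block reduces to the identity mapping and the layer is effectively skipped.
* Pooling appears only at the start (max pooling after the first convolution) and at the end (global average pooling before the classifier); downsampling in between is done with strided convolutions.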
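
The "skipped layer" behaviour described above can be illustrated with a tiny NumPy sketch. It is a simplification: it uses dense (matrix) layers instead of convolutions and omits batch normalization, and the name `residual_block` is illustrative rather than taken from any library.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Two-layer residual block: relu(x + F(x)), with F(x) = w2 @ relu(w1 @ x)."""
    fx = w2 @ relu(w1 @ x)
    return relu(x + fx)  # the skip connection adds the input back

rng = np.random.default_rng(0)
x = relu(rng.normal(size=4))  # non-negative input, as after a previous ReLU

# With all weights at zero the block passes its input through unchanged.
w_zero = np.zeros((4, 4))
identity_out = residual_block(x, w_zero, w_zero)

# With non-zero weights the block outputs the input plus a learned correction.
w1, w2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
learned_out = residual_block(x, w1, w2)
print(np.allclose(identity_out, x))
```

Because the identity is the block's "do nothing" setting, adding more blocks should not make the network harder to fit than its shallower counterpart.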


### DenseNet

* Within a dense block, the output of each layer is concatenated to the input of every subsequent layer, which helps with vanishing or exploding gradients.
* All convolutions in a dense block are ReLU-activated and use batch normalization.
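
The connectivity pattern of a dense block can be sketched in NumPy as follows. This is a simplified illustration, not the reference implementation: each layer is a stand-in for BN → ReLU → convolution that uses per-channel normalization of a single image and a 1×1 convolution with random weights. The names `bn_relu_conv`, `dense_block` and `growth_rate` are chosen here for illustration.

```python
import numpy as np

def bn_relu_conv(x, out_channels, rng):
    """Stand-in for BN -> ReLU -> conv, using a 1x1 conv (channel-mixing matmul)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x = np.maximum((x - mean) / (std + 1e-5), 0.0)
    weights = rng.normal(size=(out_channels, x.shape[0]))
    return np.einsum("oc,chw->ohw", weights, x)

def dense_block(x, num_layers, growth_rate, rng):
    features = [x]
    for _ in range(num_layers):
        # each layer sees the concatenation of all earlier feature maps
        layer_input = np.concatenate(features, axis=0)
        features.append(bn_relu_conv(layer_input, growth_rate, rng))
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8, 8))  # (channels, height, width)
out = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(out.shape)
```

Each layer adds `growth_rate` channels, so the block's output has the input channels plus `num_layers * growth_rate` new ones while the spatial size stays the same.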