In the previous lesson, we learned how to use Keras API to build deep neural networks for MINST dataset which has gray-scale images of handwritten single digits between 0 and 9. In this lesson, we will use CIFAR-10 dataset to build a CNN image classifier. CIFAR-10 dataset contains 50,000 training data and 10,000 testing image data with 10 different labels, the image size in CIFAR-10 is 32 × 32 × 3. This dataset comes with Keras library. We will train a Convolutional Neural Network to predict the class of each image. The Python code for the task is as follows: First, we need to import all the necessary modules and libraries.

In [6]:
# from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import keras
from keras.models import Sequential
from keras.utils import to_categorical
from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten

In [7]:
from keras.datasets import cifar10

(train_images, train_labels),(test_images, test_labels) = cifar10.load_data()
                 
###### print training sample shape, test sample shape and total number of classes present in CIFAR-10.

print('Training data shape : ', train_images.shape, train_labels.shape)

print('Testing data shape : ', test_images.shape, test_labels.shape)

# Find the unique numbers from the train labels
classes = np.unique(train_labels)
nClasses = len(classes)
print('Total number of outputs : ', nClasses)
print('Output classes : ', classes)

Training data shape :  (50000, 32, 32, 3) (50000, 1)
Testing data shape :  (10000, 32, 32, 3) (10000, 1)
Total number of outputs :  10
Output classes :  [0 1 2 3 4 5 6 7 8 9]


In [10]:
###### reshape input image into a format for training and testing sets
nRows,nCols,nDims = train_images.shape[1:]
train_data = train_images.reshape(train_images.shape[0], nRows, nCols, nDims)
test_data = test_images.reshape(test_images.shape[0], nRows, nCols, nDims)
input_shape = (nRows, nCols, nDims)

train_data = train_data.astype('float32')
test_data = test_data.astype('float32')

###### normalize the data   
train_data /= 255
test_data /= 255
###### convert all labels into one-hot vector 
train_labels_one_hot = to_categorical(train_labels)
test_labels_one_hot = to_categorical(test_labels)

print('Original label 0 : ', train_labels[0])
print(' One-hot encoding : ', train_labels_one_hot[0])

Original label 0 :  [6]
 One-hot encoding :  [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]


Now we are going to create our CNN model by following layers pattern shown in figure 9:

Layer Pattern of ConvNet Architecture [(conv) x 2 , pooling] x 3, Fc, Fc)

In [11]:
def create_model():
    model = Sequential()
    # 32 filters of window size 3x3 for the first two layers
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=input_shape))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(nClasses, activation='softmax'))
    
    return model

model1 = create_model()
model1.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model1.summary()

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


In [None]:
# fit the model on training data
batch_size = 256
epochs = 50
history = model1.fit(train_data, train_labels_one_hot, batch_size=batch_size, epochs=epochs, verbose=1, 
                   validation_data=(test_data, test_labels_one_hot))

# evaluate the model on testing data
model1.evaluate(test_data, test_labels_one_hot)

NameError: name 'model1' is not defined

In [34]:
from tensorflow.keras import layers, models, optimizers

# Define a lightweight residual block
def small_residual_block(x, filters, downsample=False):
    stride = 2 if downsample else 1

    # Shortcut connection
    shortcut = x
    x = layers.Conv2D(filters, (3, 3), strides=stride, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    x = layers.Conv2D(filters, (3, 3), strides=1, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)

    # Downsample shortcut if needed
    if downsample:
        shortcut = layers.Conv2D(filters, (1, 1), strides=stride, padding='same', use_bias=False)(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)

    x = layers.add([x, shortcut])
    x = layers.ReLU()(x)
    return x

# Define a smaller ResNet for CIFAR-10
def build_lightweight_resnet(input_shape=(32, 32, 3), num_classes=10):
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), strides=1, padding='same', use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Create a simplified ResNet-like architecture with fewer filters and layers
    x = small_residual_block(x, 32)
    x = small_residual_block(x, 32)
    x = small_residual_block(x, 64, downsample=True)
    x = small_residual_block(x, 64)
    x = small_residual_block(x, 128, downsample=True)
    x = small_residual_block(x, 128)

    # Classification head
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = models.Model(inputs, outputs)
    return model

model2 = build_lightweight_resnet()
model2.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model2.summary()

In [35]:
# fit the model on training data
batch_size = 64
epochs = 20
history = model2.fit(
    train_data, 
    train_labels_one_hot, 
    batch_size=batch_size, 
    epochs=epochs, 
    verbose=1, 
    validation_data=(test_data, test_labels_one_hot))

# evaluate the model on testing data
model2.evaluate(test_data, test_labels_one_hot)

Epoch 1/20
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m144s[0m 181ms/step - accuracy: 0.4571 - loss: 1.4984 - val_accuracy: 0.3725 - val_loss: 2.3932
Epoch 2/20
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m151s[0m 192ms/step - accuracy: 0.7060 - loss: 0.8246 - val_accuracy: 0.6063 - val_loss: 1.2428
Epoch 3/20
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m144s[0m 184ms/step - accuracy: 0.7840 - loss: 0.6235 - val_accuracy: 0.5540 - val_loss: 1.6407
Epoch 4/20
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m144s[0m 184ms/step - accuracy: 0.8300 - loss: 0.4919 - val_accuracy: 0.7017 - val_loss: 0.8876
Epoch 5/20
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m145s[0m 185ms/step - accuracy: 0.8605 - loss: 0.4057 - val_accuracy: 0.7434 - val_loss: 0.7516
Epoch 6/20
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m146s[0m 187ms/step - accuracy: 0.8870 - loss: 0.3265 - val_accuracy: 0.7361 - val_loss: 0.8900
Epoc

[1.5494877099990845, 0.7391999959945679]

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torchvision.models import mobilenet_v2

# Check if MPS is available and use it
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

print(f'Using {device} device')

# Transformations for CIFAR-10
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

# Load CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

# Modify MobileNetV2 for CIFAR-10
model = mobilenet_v2(weights=None)  # Start with an untrained MobileNetV2
model.classifier[1] = nn.Linear(model.last_channel, 10)  # Adjust final layer for CIFAR-10 (10 classes)
model = model.to(device)

# Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function
def train(model, loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)

        # Forward pass
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward and optimize
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    return running_loss / len(loader)

# Testing function
def test(model, loader, criterion, device):
    model.eval()
    correct = 0
    total = 0
    test_loss = 0.0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            test_loss += loss.item()

            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    return test_loss / len(loader), accuracy

# Train and evaluate the model
num_epochs = 20
for epoch in range(num_epochs):
    train_loss = train(model, trainloader, criterion, optimizer, device)
    test_loss, test_accuracy = test(model, testloader, criterion, device)
    print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss: {train_loss:.4f}, Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%')


1.4%

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100.0%


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Epoch [1/20], Train Loss: 2.0282, Test Loss: 1.8024, Test Accuracy: 33.21%
Epoch [2/20], Train Loss: 1.7015, Test Loss: 1.5760, Test Accuracy: 41.68%
Epoch [3/20], Train Loss: 1.5527, Test Loss: 1.4802, Test Accuracy: 45.84%
Epoch [4/20], Train Loss: 1.4581, Test Loss: 1.3958, Test Accuracy: 49.25%
Epoch [5/20], Train Loss: 1.3577, Test Loss: 1.2703, Test Accuracy: 54.03%
Epoch [6/20], Train Loss: 1.2711, Test Loss: 1.1938, Test Accuracy: 57.68%
Epoch [7/20], Train Loss: 1.1959, Test Loss: 1.1374, Test Accuracy: 60.14%
Epoch [8/20], Train Loss: 1.1309, Test Loss: 1.0992, Test Accuracy: 60.50%
Epoch [9/20], Train Loss: 1.0761, Test Loss: 1.0786, Test Accuracy: 60.98%
Epoch [10/20], Train Loss: 1.0284, Test Loss: 1.0264, Test Accuracy: 63.66%
Epoch [11/20], Train Loss: 0.9886, Test Loss: 0.9653, Test Accuracy: 65.69%
Epoch [12/20], Train Loss: 0.9592, Test Loss: 0.9269, Test Accuracy: 67.87%
Epoch [1

In [5]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torchvision.models import resnet18

# Check if MPS (Apple Silicon GPU) or CUDA (NVIDIA GPU) is available
device = torch.device("mps" if torch.backends.mps.is_available() else ("cuda" if torch.cuda.is_available() else "cpu"))

# Load CIFAR-10 dataset with transformations
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=4)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False, num_workers=4)

# Load a ResNet-18 model and modify for CIFAR-10 (10 output classes)
model = resnet18(weights=None)  # No pretrained weights
model.fc = nn.Linear(model.fc.in_features, 10)  # Change final layer for CIFAR-10 classes

# If using multiple GPUs (not applicable to Apple Silicon MPS as of latest support)
# model = nn.DataParallel(model)

# Move model to device
model = model.to(device)

# Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function
def train(model, loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    return running_loss / len(loader)

# Testing function
def test(model, loader, criterion, device):
    model.eval()
    correct = 0
    total = 0
    test_loss = 0.0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)CN
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            test_loss += loss.item()

            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    return test_loss / len(loader), accuracy

# Training loop
num_epochs = 20
for epoch in range(num_epochs):
    train_loss = train(model, trainloader, criterion, optimizer, device)
    test_loss, test_accuracy = test(model, testloader, criterion, device)
    print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss: {train_loss:.4f}, Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%')


Files already downloaded and verified
Files already downloaded and verified
Epoch [1/20], Train Loss: 1.5724, Test Loss: 1.3800, Test Accuracy: 51.36%
Epoch [2/20], Train Loss: 1.1948, Test Loss: 1.1422, Test Accuracy: 59.30%
Epoch [3/20], Train Loss: 1.0300, Test Loss: 1.0684, Test Accuracy: 62.96%
Epoch [4/20], Train Loss: 0.9237, Test Loss: 0.8670, Test Accuracy: 69.69%
Epoch [5/20], Train Loss: 0.8449, Test Loss: 0.8781, Test Accuracy: 69.89%
Epoch [6/20], Train Loss: 0.7866, Test Loss: 0.7904, Test Accuracy: 72.74%
Epoch [7/20], Train Loss: 0.7360, Test Loss: 0.7425, Test Accuracy: 74.80%
Epoch [8/20], Train Loss: 0.6988, Test Loss: 0.7322, Test Accuracy: 74.74%
Epoch [9/20], Train Loss: 0.6611, Test Loss: 0.6731, Test Accuracy: 76.97%
Epoch [10/20], Train Loss: 0.6298, Test Loss: 0.6781, Test Accuracy: 76.81%
Epoch [11/20], Train Loss: 0.6008, Test Loss: 0.6799, Test Accuracy: 76.57%
Epoch [12/20], Train Loss: 0.5786, Test Loss: 0.6354, Test Accuracy: 77.86%
Epoch [13/20], Train 

Inspired by Mohamad Kharsa's idea: improve the accuracy of ML algorithm by increase the model complexity in both feature engineering section and classification section while control the sideeffect of overfitting by increasing the 

It worked!

I tried another way to increase the complexity: rather than increase the filter size, increase the kernel size instead, here's what i found:

Conv2D([filter size], ([kernel size], [kernel size]), ...)
Model	filter size (each block)	kernel size (each block)	Training Accuracy (After all epochs)	Test Accuracy
Model2	64/128/512	3/3/3	0.9498	0.8737
Model3	64/128/512	3/5/7	0.9838	0.8635
Model4	64/64/128	3/5/7	0.9836	0.8629
Result: By increasing kernel size, only training accuracy can be improved slightly but no evident increase in test accuracy. Enlarging the kernel size brings another drawback:  drastically increase the training time. 

For details, welcome to my colab notebook: https://colab.research.google.com/drive/1_iUi-xb3ht8v3CYIZtI2nZuTWDY2hX2P?usp=sharing

I asked ChatGPT: 

Why increase size of Conv2D's kernel size only dramatically increase the training time while no effect on accuracy of the model?
Here's the answer, which makes sense: 
Increasing the kernel size in convolutional layers can dramatically increase training time while often having limited or no effect on model accuracy, especially on datasets like CIFAR-10, due to the following reasons:

1. Higher Computational Cost with Larger Kernels
Convolutional Complexity: The computational complexity of a convolutional layer scales with the square of the kernel size. A 3x3 kernel has 9 weights, but a 5x5 kernel has 25 weights (nearly 3 times more), and a 7x7 kernel has 49 weights. This increase in weights requires more multiplications and additions, significantly slowing down training.
Increased Memory Usage: Larger kernels mean more parameters, which requires more memory and may slow down data transfer between memory and processing units. This adds further delays, especially on hardware with limited memory bandwidth.
2. Diminishing Returns on Accuracy for Small Images (e.g., CIFAR-10)
Image Size: CIFAR-10 images are only 32x32 pixels. Larger kernels (like 5x5 or 7x7) are too large relative to this image size, resulting in excessive spatial filtering with minimal benefit. A 3x3 kernel is usually sufficient to capture useful features, such as edges and textures, without excessive parameter growth.
Receptive Field: Stacking multiple 3x3 convolutions achieves a larger effective receptive field while keeping the parameter count and computational load manageable. For example, two 3x3 convolutions have a receptive field of 5x5, but with fewer parameters and computational cost than a single 5x5 kernel.
3. Overfitting Risk Without Adding Informative Features
Excess Parameters: Larger kernels increase the number of parameters but don't necessarily add more meaningful information for smaller images. This can lead to overfitting as the model learns noise or redundant patterns, which don't generalize well.
Feature Redundancy: With small datasets and image sizes, smaller kernels (e.g., 3x3) often capture the necessary spatial information more efficiently. Larger kernels might end up learning redundant or less specific patterns, offering no extra value in terms of accuracy.
Summary
In short, larger convolution kernels increase the model’s complexity, computational cost, and training time without enhancing performance on small image datasets. For CIFAR-10, using a 3x3 kernel strikes a good balance, offering efficient feature extraction and reasonable model size, which generally results in both faster training and better generalization.

Above answer also explained why state-of-art architecture, such as RESNET, Mobile net, also can't perform as expect on cifar10