# Convolutional Neural Networks on CIFAR-10

## Context and Motivation
In this course, neural networks are not treated as black boxes but as architectural components whose design choices affect performance, scalability, and interpretability. This assignment focuses on convolutional layers as a concrete example of how inductive bias is introduced into learning systems. Rather than following a recipe, we analyze and experiment with a convolutional architecture using a real dataset (CIFAR-10).

## Learning Objectives
- Understand the role and intuition behind convolutional layers
- Analyze architectural decisions (kernel size, depth, stride, padding)
- Compare convolutional and fully connected models
- Perform a minimal exploratory data analysis (EDA)
- Communicate architectural and experimental decisions clearly

In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Dataset Selection and Justification
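CIFAR-10 is a standard image-classification benchmark of 60,000 small natural images (32×32 pixels, RGB) in 10 classes, split into 50,000 training and 10,000 test images. It suits convolutional networks for three reasons: nearby pixels are strongly correlated (spatial locality), objects are built from local patterns such as edges and textures, and an object keeps its class label when it shifts position in the frame. These properties match the inductive bias of convolutional layers, which use small local receptive fields and share the same weights across all positions.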
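
The translation property mentioned above can be checked on a toy example. The following NumPy sketch is not part of the assignment pipeline: `correlate2d_valid` is a hypothetical helper written for illustration, implementing the sliding-window cross-correlation that a convolutional layer computes (no padding, stride 1, a single channel). It shows that shifting the input shifts the filter response by the same amount, which is strictly speaking translation equivariance; pooling layers then add a degree of invariance on top.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))      # one arbitrary 3x3 filter

image = np.zeros((8, 8))
image[2:4, 2:4] = 1.0                 # a small bright patch on a dark background

# Move the patch 1 pixel down and 2 pixels right (the wrapped-around border is all zeros).
shifted = np.roll(image, shift=(1, 2), axis=(0, 1))

out = correlate2d_valid(image, kernel)
out_shifted = correlate2d_valid(shifted, kernel)

# The response to the shifted image is the original response, shifted by (1, 2).
print(np.allclose(out_shifted[1:, 2:], out[:-1, :-2]))  # True
```

A dense layer applied to the flattened image has no such property, because every pixel position has its own independent weight.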

In [None]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

## Exploratory Data Analysis (EDA)
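We inspect dataset size, image shape, and class distribution. The CIFAR-10 training set is balanced (5,000 images per class), so plain accuracy is a meaningful metric for the comparisons that follow.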

In [None]:
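# Array shapes are (num_images, height, width, channels)
print('Training set shape:', x_train.shape)
print('Test set shape:', x_test.shape)
# Count images per class; np.unique flattens the (N, 1) label array itself
unique, counts = np.unique(y_train, return_counts=True)
pd.DataFrame({'Class': unique, 'Count': counts})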

In [None]:
class_names = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']
plt.figure(figsize=(8,4))
for i in range(10):
    plt.subplot(2,5,i+1)
    plt.imshow(x_train[i])
    plt.title(class_names[y_train[i][0]])
    plt.axis('off')
plt.tight_layout()
plt.show()

## Baseline Model (Fully Connected)
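A simple fully connected network serves as a reference point. It flattens each 32×32×3 image into a 3,072-dimensional vector, so it has no built-in notion of which pixels are neighbors.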

In [None]:
fc_model = models.Sequential([
    layers.Flatten(input_shape=(32,32,3)),
    layers.Dense(256, activation='relu'),
    layers.Dense(10, activation='softmax')
])
fc_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
fc_model.summary()

In [None]:
fc_history = fc_model.fit(x_train, y_train, epochs=5, validation_split=0.1, verbose=2)
fc_test_loss, fc_test_acc = fc_model.evaluate(x_test, y_test, verbose=0)

## Convolutional Model Design
We design a simple CNN using small kernels (3×3), ReLU activations, and max pooling.

In [None]:
cnn_3x3 = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', padding='same', input_shape=(32,32,3)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu', padding='same'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
cnn_3x3.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
cnn_3x3.summary()
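
The parameter counts reported by `summary()` can be reproduced by hand, which helps show where the model's capacity sits. The sketch below uses plain Python; the helper names (`conv2d_params`, `dense_params`, `cnn_param_count`) are introduced here for illustration and are not Keras functions. It assumes the architecture defined above: `'same'` padding (spatial size unchanged by each convolution) and 2×2 max pooling (spatial size halved).

```python
def conv2d_params(kernel, c_in, c_out):
    """Square kernel: kernel*kernel*c_in weights per filter, plus one bias per filter."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """One weight per input-output pair, plus one bias per output unit."""
    return n_in * n_out + n_out

def cnn_param_count(kernel):
    """Total parameters of the two-block CNN above for a given square kernel size."""
    side = 32                                # input is 32x32x3
    conv1 = conv2d_params(kernel, 3, 32)     # 'same' padding keeps 32x32
    side //= 2                               # 2x2 max pooling: 32 -> 16
    conv2 = conv2d_params(kernel, 32, 64)    # 'same' padding keeps 16x16
    side //= 2                               # 2x2 max pooling: 16 -> 8
    flat = side * side * 64                  # 8 * 8 * 64 = 4096 features
    return conv1 + conv2 + dense_params(flat, 128) + dense_params(128, 10)

fc_params = dense_params(32 * 32 * 3, 256) + dense_params(256, 10)

print(conv2d_params(3, 3, 32))    # 896
print(conv2d_params(3, 32, 64))   # 18496
print(cnn_param_count(3))         # 545098
print(fc_params)                  # 789258
```

Most of the CNN's parameters (524,416 of 545,098) sit in the first dense layer after `Flatten`, not in the convolutional layers, which together hold fewer than 20,000.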

In [None]:
cnn_history = cnn_3x3.fit(x_train, y_train, epochs=5, validation_split=0.1, verbose=2)
cnn_test_loss, cnn_test_acc = cnn_3x3.evaluate(x_test, y_test, verbose=0)

## Controlled Experiment: Kernel Size (5×5)
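We keep every other component fixed (number of filters, padding, pooling, dense head, optimizer, epochs, validation split) and change only the kernel size from 3×3 to 5×5. Weight initialization and batch shuffling are not seeded in this notebook, so small accuracy differences between runs may reflect random variation rather than the kernel size.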

In [None]:
cnn_5x5 = models.Sequential([
    layers.Conv2D(32, (5,5), activation='relu', padding='same', input_shape=(32,32,3)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (5,5), activation='relu', padding='same'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
cnn_5x5.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
cnn_5x5.summary()

In [None]:
cnn5_history = cnn_5x5.fit(x_train, y_train, epochs=5, validation_split=0.1, verbose=2)
loss_5x5, acc_5x5 = cnn_5x5.evaluate(x_test, y_test, verbose=0)
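
One way to make the "only the kernel size changes" guarantee explicit is to build every variant from a single function and fix the random seed before each build. The sketch below is a possible refactoring, not the notebook's original code: `build_cnn` and its `seed` argument are hypothetical names, and it assumes a TensorFlow version that provides `tf.keras.utils.set_random_seed`. Seeding reduces run-to-run variation but does not guarantee bit-identical results on every hardware setup.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(kernel_size, seed=0):
    """Hypothetical helper: same architecture as above, only the kernel size varies."""
    # Seeds the Python, NumPy and TensorFlow random generators in one call.
    tf.keras.utils.set_random_seed(seed)
    model = models.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, kernel_size, activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, kernel_size, activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Build one model per kernel size; everything else is shared by construction.
variants = {k: build_cnn(k) for k in (3, 5)}
for k, model in variants.items():
    print(f'{k}x{k} kernels: {model.count_params():,} parameters')
```

Each model in `variants` can then be trained with the same `fit` call used above, so the comparison differs only in the argument passed to `build_cnn`.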

## Results Summary

In [None]:
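# Report model size next to accuracy so the two can be compared directly
results = pd.DataFrame({
    'Model': ['Fully Connected', 'CNN 3x3', 'CNN 5x5'],
    'Parameters': [m.count_params() for m in (fc_model, cnn_3x3, cnn_5x5)],
    'Test Accuracy': [fc_test_acc, cnn_test_acc, acc_5x5]
})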
results
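
A single test-accuracy number per model hides how training progressed. As a complement, the sketch below plots validation accuracy per epoch for several runs. `plot_val_accuracy` is a hypothetical helper, and it assumes each history dict contains a `'val_accuracy'` key, which is what Keras records when a model is compiled with `metrics=['accuracy']` and trained with a validation split.

```python
import matplotlib.pyplot as plt

def plot_val_accuracy(histories, ax=None):
    """Plot validation accuracy per epoch for several training runs.

    `histories` maps a display name to a Keras-style history dict,
    i.e. the `.history` attribute of the object returned by `fit`.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4))
    for name, hist in histories.items():
        val_acc = hist['val_accuracy']
        epochs = range(1, len(val_acc) + 1)   # epochs are numbered from 1
        ax.plot(epochs, val_acc, marker='o', label=name)
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Validation accuracy')
    ax.legend()
    return ax
```

With the objects from this notebook, a call would look like `plot_val_accuracy({'Fully Connected': fc_history.history, 'CNN 3x3': cnn_history.history, 'CNN 5x5': cnn5_history.history})`, followed by `plt.show()`.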

## Interpretation and Architectural Reasoning
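On image data like this, convolutional models typically outperform the fully connected baseline (check the table above for the actual run), and they do so with fewer parameters: 545,098 (3×3) and 579,402 (5×5) versus 789,258 for the baseline. The advantage comes from their inductive bias: local receptive fields exploit spatial locality, and weight sharing makes the feature maps translation-equivariant, which pooling turns into approximate translation invariance. The 3×3 model has about 34,000 fewer parameters than the 5×5 model and usually reaches comparable or better accuracy; with a single five-epoch run per model, small differences should not be over-interpreted. Convolution is less suitable for data without spatial structure or stationarity, for example tabular features whose column order is arbitrary.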
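
One way to probe the last claim, offered as an optional sketch and not as part of the assignment, is to destroy the spatial structure on purpose. The hypothetical helper `permute_pixels` below applies one fixed random permutation of pixel positions to every image. Each image keeps exactly the same pixel values, but neighboring pixels are no longer related.

```python
import numpy as np

def permute_pixels(images, seed=0):
    """Apply the same fixed random permutation of pixel positions to every image.

    `images` has shape (n, height, width, channels). Channels stay together,
    so each pixel keeps its color but moves to a new location.
    """
    n, h, w, c = images.shape
    perm = np.random.default_rng(seed).permutation(h * w)
    flat = images.reshape(n, h * w, c)          # one row of pixels per image
    return flat[:, perm, :].reshape(n, h, w, c)

# Toy check on random data shaped like a CIFAR-10 batch.
batch = np.random.default_rng(1).random((4, 32, 32, 3)).astype('float32')
scrambled = permute_pixels(batch)
print(scrambled.shape)  # (4, 32, 32, 3)
```

Retraining both model types on `permute_pixels(x_train)` and evaluating on `permute_pixels(x_test)` with the same seed would test the claim. The expectation, to be verified rather than assumed, is that the fully connected model is largely unaffected, since a dense layer treats all input positions symmetrically, while the CNN loses much of its advantage.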