# HW11 for the Data Science course at Sharif University of Technology
## Created by: Mohammad Mahdi Hossein Beiky     SI: 400100995
## GitHub URL: https://github.com/Mmhb1382/Data-Science-HW11.git

### You don't need to run any of the following scripts; they take a very long time to finish.
---

# Preparing Tasks: Downloading the Fashion-MNIST dataset from Kaggle

In [4]:
"""
 1) Point the Kaggle API to the folder where your kaggle.json lives
 2) Download and unzip the Fashion‑MNIST dataset from Kaggle into a local data/ folder
 3) Load the train and test CSV files into pandas DataFrames and sanity‑check their shapes
"""

# Step 1: Tell the Kaggle CLI where to find your API token
import os
os.environ['KAGGLE_CONFIG_DIR'] = r'C:\PyCharm\HW11'  # adjust if your path is different

# Step 2: Download and unzip the dataset from Kaggle
!kaggle datasets download \
    --unzip \
    --force \
    -d zalando-research/fashionmnist \
    -p ./data

# Step 3: Read the CSV files into pandas
import pandas as pd

df_train = pd.read_csv('data/fashion-mnist_train.csv')
df_test  = pd.read_csv('data/fashion-mnist_test.csv')

# Quick sanity check
print("Training data shape:", df_train.shape)
print("Test data shape:    ", df_test.shape)


Dataset URL: https://www.kaggle.com/datasets/zalando-research/fashionmnist
License(s): other
Downloading fashionmnist.zip to ./data






Training data shape: (60000, 785)
Test data shape:     (10000, 785)


# Preparing Tasks: Preprocessing

In [15]:
"""
 1) Extract pixel data and labels from our train/test DataFrames
 2) Reshape the flat 784‑pixel vectors into (28,28,1) image tensors
 3) Normalize pixel values from [0,255] to [0,1]
 4) One‑hot encode the integer labels for 10 classes
 5) Compute per‑pixel mean/std on the training set and standardize both sets
"""

import numpy as np
from tensorflow.keras.utils import to_categorical

# Step 1: Pull apart pixels versus labels
X_train_flat = df_train.drop('label', axis=1).values
y_train      = df_train['label'].values
X_test_flat  = df_test.drop('label', axis=1).values
y_test       = df_test['label'].values

# Step 2: Turn each 784‑vector into a 28×28×1 image
X_train = X_train_flat.reshape(-1, 28, 28, 1).astype('float32')
X_test  = X_test_flat.reshape(-1, 28, 28, 1).astype('float32')

# Step 3: Scale pixels to the [0,1] range
X_train /= 255.0
X_test  /= 255.0

# Step 4: One‑hot encode our labels
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test  = to_categorical(y_test,  num_classes)

# Step 5: Standardize based on training‑set statistics
mean = np.mean(X_train, axis=0, keepdims=True)
std  = np.std(X_train,  axis=0, keepdims=True) + 1e-7  # avoid div0
X_train = (X_train - mean) / std
X_test  = (X_test  - mean) / std


# Task 1: Three CNN Variants with 3‑Fold Cross‑Validation

In [16]:
"""
 1) Define three different ConvNet layouts (options A, B, C)
 2) For each: run 3‑fold CV, collect fold accuracies
 3) Print per‑fold and average for each option
"""

from sklearn.model_selection import KFold
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (
    Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
)
import numpy as np

# 3 model “options” to try:
configs = {
    'A': {  # baseline 2-layer small kernels
        'layers': [
            dict(filters=32, kernel_size=(3,3)),
            dict(filters=64, kernel_size=(3,3)),
        ]
    },
    'B': {  # larger kernels
        'layers': [
            dict(filters=32, kernel_size=(5,5)),
            dict(filters=64, kernel_size=(5,5)),
        ]
    },
    'C': {  # extra conv layer + smaller dense head
        'layers': [
            dict(filters=32, kernel_size=(3,3)),
            dict(filters=64, kernel_size=(3,3)),
            dict(filters=128, kernel_size=(3,3)),
        ],
        'dense_units': 64
    }
}

# Full training set prepared in the "Preparing Tasks: Preprocessing" section
X_full = X_train
y_full = y_train

for name, cfg in configs.items():
    print(f"\n=== Option {name} ===")
    fold_accs = []

    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    for fold, (tr_idx, val_idx) in enumerate(kf.split(X_full), start=1):
        # split out fold
        X_tr, X_val = X_full[tr_idx], X_full[val_idx]
        y_tr, y_val = y_full[tr_idx], y_full[val_idx]

        # build model
        model = Sequential([ Input(shape=X_tr.shape[1:]) ])
        # add augmentation if you like, e.g. data_augmentation
        for layer_cfg in cfg['layers']:
            model.add(Conv2D(**layer_cfg, activation='relu'))
            model.add(MaxPooling2D((2,2)))
        model.add(Flatten())
        # use custom dense size if provided
        dense_units = cfg.get('dense_units', 128)
        model.add(Dense(dense_units, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(num_classes, activation='softmax'))

        model.compile(optimizer='adam',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])

        # train on raw arrays (no ImageDataGenerator)
        model.fit(
            X_tr, y_tr,
            epochs=10,
            batch_size=32,
            verbose=0
        )

        # eval this fold
        _, acc = model.evaluate(X_val, y_val, verbose=0)
        print(f"Fold {fold} accuracy: {acc:.4f}")
        fold_accs.append(acc)

    avg = np.mean(fold_accs)
    print(f"Option {name} avg accuracy: {avg:.4f}")



=== Option A ===
Fold 1 accuracy: 0.9014
Fold 2 accuracy: 0.9140
Fold 3 accuracy: 0.9104
Option A avg accuracy: 0.9086

=== Option B ===
Fold 1 accuracy: 0.9085
Fold 2 accuracy: 0.9083
Fold 3 accuracy: 0.9024
Option B avg accuracy: 0.9064

=== Option C ===
Fold 1 accuracy: 0.8850
Fold 2 accuracy: 0.8896
Fold 3 accuracy: 0.8871
Option C avg accuracy: 0.8873
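
To put the three options in context, the sketch below counts trainable parameters and the number of features reaching the dense head. It is a rough illustration, not part of the original notebook: `count_params` is a hypothetical helper, and it assumes Keras defaults (`'valid'` padding for `Conv2D`, 2×2 max pooling with stride 2).

```python
# Hypothetical helper (not part of the notebook): count parameters for the
# Task 1 options, assuming 'valid' padding and 2x2 max pooling with stride 2.

def count_params(conv_layers, dense_units=128, input_size=28, in_channels=1,
                 num_classes=10):
    """Return (flattened feature count, total trainable parameters)."""
    size, channels, total = input_size, in_channels, 0
    for filters, k in conv_layers:
        total += k * k * channels * filters + filters  # conv weights + biases
        size = size - k + 1                            # 'valid' convolution
        size = size // 2                               # 2x2 max pooling
        channels = filters
    flat = size * size * channels
    total += flat * dense_units + dense_units          # hidden Dense layer
    total += dense_units * num_classes + num_classes   # softmax layer
    return flat, total


options = {
    'A': dict(conv_layers=[(32, 3), (64, 3)]),
    'B': dict(conv_layers=[(32, 5), (64, 5)]),
    'C': dict(conv_layers=[(32, 3), (64, 3), (128, 3)], dense_units=64),
}

for name, kwargs in options.items():
    flat, total = count_params(**kwargs)
    print(f"Option {name}: {flat} flattened features, {total} parameters")
```

Under these assumptions Option C passes only 128 features (a 1×1×128 map) to its dense head, which may be one reason it scored lowest here.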



# Task 2: Tuning Kernel Size, Stride, Pooling Size & Pooling Stride

In [17]:
"""
 1) Define four variants that each adjust one convolution/pooling parameter
     • Variant A: increase kernel size to (5,5)
     • Variant B: set convolution stride to 2
     • Variant C: increase pooling window to (3,3), with pooling stride 3
     • Variant D: set pooling stride to (1,1)
 2) For each variant, run 3‑fold CV and compute average validation accuracy
"""

from sklearn.model_selection import KFold
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
import numpy as np

# Variants to test
variants = {
    'A_kernel5x5':    {'kernel_size': (5,5), 'conv_stride': 1, 'pool_size': (2,2), 'pool_stride': 2},
    'B_convStride2':  {'kernel_size': (3,3), 'conv_stride': 2, 'pool_size': (2,2), 'pool_stride': 2},
    'C_pool3x3':      {'kernel_size': (3,3), 'conv_stride': 1, 'pool_size': (3,3), 'pool_stride': 3},
    'D_poolStride1':  {'kernel_size': (3,3), 'conv_stride': 1, 'pool_size': (2,2), 'pool_stride': 1},
}

# Full training set prepared in the "Preparing Tasks: Preprocessing" section
X_full, y_full = X_train, y_train

for name, cfg in variants.items():
    print(f"\n=== Variant {name} ===")
    fold_accs = []
    kf = KFold(n_splits=3, shuffle=True, random_state=42)

    for fold_idx, (tr_idx, val_idx) in enumerate(kf.split(X_full), start=1):
        # Split data for this fold
        X_tr, X_val = X_full[tr_idx], X_full[val_idx]
        y_tr, y_val = y_full[tr_idx], y_full[val_idx]

        # Build model with variant-specific settings
        model = Sequential([
            Input(shape=X_tr.shape[1:]),
            Conv2D(32,
                   cfg['kernel_size'],
                   strides=cfg['conv_stride'],
                   activation='relu'),
            MaxPooling2D(cfg['pool_size'], strides=cfg['pool_stride']),
            Conv2D(64,
                   cfg['kernel_size'],
                   strides=cfg['conv_stride'],
                   activation='relu'),
            MaxPooling2D(cfg['pool_size'], strides=cfg['pool_stride']),
            Flatten(),
            Dense(128, activation='relu'),
            Dropout(0.5),
            Dense(num_classes, activation='softmax')
        ])
        model.compile(
            optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )

        # Train and evaluate
        model.fit(X_tr, y_tr, epochs=10, batch_size=32, verbose=0)
        _, acc = model.evaluate(X_val, y_val, verbose=0)
        print(f"Fold {fold_idx} accuracy: {acc:.4f}")
        fold_accs.append(acc)

    avg_acc = np.mean(fold_accs)
    print(f"Variant {name} average CV accuracy: {avg_acc:.4f}")



=== Variant A_kernel5x5 ===
Fold 1 accuracy: 0.9025
Fold 2 accuracy: 0.9069
Fold 3 accuracy: 0.9047
Variant A_kernel5x5 average CV accuracy: 0.9047

=== Variant B_convStride2 ===
Fold 1 accuracy: 0.8647
Fold 2 accuracy: 0.8658
Fold 3 accuracy: 0.8627
Variant B_convStride2 average CV accuracy: 0.8644

=== Variant C_pool3x3 ===
Fold 1 accuracy: 0.8880
Fold 2 accuracy: 0.9005
Fold 3 accuracy: 0.8899
Variant C_pool3x3 average CV accuracy: 0.8928

=== Variant D_poolStride1 ===
Fold 1 accuracy: 0.9182
Fold 2 accuracy: 0.9229
Fold 3 accuracy: 0.9141
Variant D_poolStride1 average CV accuracy: 0.9184
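
One way to read these results is to look at how much spatial information survives each variant. The sketch below is an illustration rather than the notebook's own code: it assumes `'valid'` padding throughout, and `out_size` and `flattened_features` are hypothetical helpers that apply the usual output-size formula `floor((n - window) / stride) + 1`.

```python
# Hypothetical helpers (not part of the notebook), assuming 'valid' padding.

def out_size(n, window, stride):
    """Output length of a 'valid' convolution or pooling along one axis."""
    return (n - window) // stride + 1


variants = {
    'A_kernel5x5':   dict(kernel=5, conv_stride=1, pool=2, pool_stride=2),
    'B_convStride2': dict(kernel=3, conv_stride=2, pool=2, pool_stride=2),
    'C_pool3x3':     dict(kernel=3, conv_stride=1, pool=3, pool_stride=3),
    'D_poolStride1': dict(kernel=3, conv_stride=1, pool=2, pool_stride=1),
}


def flattened_features(cfg, input_size=28, last_filters=64):
    """Return (final spatial size, features after Flatten) for two conv+pool stages."""
    size = input_size
    for _ in range(2):
        size = out_size(size, cfg['kernel'], cfg['conv_stride'])
        size = out_size(size, cfg['pool'], cfg['pool_stride'])
    return size, size * size * last_filters


for name, cfg in variants.items():
    size, flat = flattened_features(cfg)
    print(f"{name}: final map {size}x{size}, {flat} features into the dense head")
```

Under these assumptions the variants keep 1024, 64, 256 and 30976 features respectively. That ordering matches the accuracy ranking above (D > A > C > B), though with only three folds per variant this is suggestive rather than conclusive.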



# Task 3: Data Augmentation with ImageDataGenerator

In [22]:
"""
 1) Define three different augmentation pipelines (Options A, B, C)
 2) For each pipeline, run 3‑fold CV
 3) For each fold, build the same 2‑conv CNN
 4) Use ImageDataGenerator.flow() to get augmented batches
 5) Train with model.train_on_batch(...) instead of model.fit()
 6) Evaluate on raw (X_val, y_val) arrays
 7) Report per‑fold and average accuracy for each option
"""

from sklearn.model_selection import KFold
from tensorflow.keras.models    import Sequential
from tensorflow.keras.layers    import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np

# Three augmentation configurations to try
aug_configs = {
    'A': {
        'rotation_range': 15,
        'width_shift_range': 0.1,
        'height_shift_range': 0.1,
        'horizontal_flip': True
    },
    'B': {
        'rotation_range': 30,
        'zoom_range': 0.2,
        'width_shift_range': 0.1,
        'height_shift_range': 0.1
    },
    'C': {
        'shear_range': 0.2,
        'zoom_range': 0.2,
        'horizontal_flip': True
    }
}

# Full training arrays prepared in the "Preparing Tasks: Preprocessing" section
X_full, y_full = X_train, y_train

for name, params in aug_configs.items():
    print(f"\n=== Augmentation Option {name} ===")
    fold_accs = []

    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    for fold_idx, (tr_idx, val_idx) in enumerate(kf.split(X_full), start=1):
        print(f"–– Fold {fold_idx} ––")
        X_tr, X_val = X_full[tr_idx], X_full[val_idx]
        y_tr, y_val = y_full[tr_idx], y_full[val_idx]

        # Build baseline 2‑conv CNN
        model = Sequential([
            Input(shape=X_tr.shape[1:]),
            Conv2D(32, (3,3), activation='relu'),
            MaxPooling2D((2,2)),
            Conv2D(64, (3,3), activation='relu'),
            MaxPooling2D((2,2)),
            Flatten(),
            Dense(128, activation='relu'),
            Dropout(0.5),
            Dense(num_classes, activation='softmax')
        ])
        model.compile(
            optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )

        # Create an augmenting generator for this fold
        train_datagen = ImageDataGenerator(**params)
        train_gen     = train_datagen.flow(X_tr, y_tr, batch_size=32)
        steps_per_epoch = len(train_gen)

        # Manually train for N epochs over augmented batches
        epochs = 10
        for epoch in range(epochs):
            for step in range(steps_per_epoch):
                Xb, yb = next(train_gen)                   # use next() instead of .next()
                model.train_on_batch(Xb, yb)

        # Evaluate on the raw validation arrays
        loss, acc = model.evaluate(X_val, y_val, verbose=0)
        print(f"Fold {fold_idx} accuracy: {acc:.4f}\n")
        fold_accs.append(acc)

    avg_acc = np.mean(fold_accs)
    print(f"Option {name} average CV accuracy: {avg_acc:.4f}")



=== Augmentation Option A ===
–– Fold 1 ––
Fold 1 accuracy: 0.8745

–– Fold 2 ––
Fold 2 accuracy: 0.8778

–– Fold 3 ––
Fold 3 accuracy: 0.8769

Option A average CV accuracy: 0.8764

=== Augmentation Option B ===
–– Fold 1 ––
Fold 1 accuracy: 0.8615

–– Fold 2 ––
Fold 2 accuracy: 0.8719

–– Fold 3 ––
Fold 3 accuracy: 0.8486

Option B average CV accuracy: 0.8606

=== Augmentation Option C ===
–– Fold 1 ––
Fold 1 accuracy: 0.8881

–– Fold 2 ––
Fold 2 accuracy: 0.8936

–– Fold 3 ––
Fold 3 accuracy: 0.8860

Option C average CV accuracy: 0.8892
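
The manual training loop above makes one `train_on_batch` call per augmented batch, so it helps to see how many calls that is. The sketch below is a back-of-the-envelope estimate: `train_steps` is a hypothetical helper, and it assumes the 60000 training images, 3 folds, batch size 32 and 10 epochs used in the cell.

```python
import math


def train_steps(n_samples=60000, n_splits=3, batch_size=32, epochs=10):
    """Return (training images per fold, steps per epoch, steps per fold)."""
    fold_size = n_samples // n_splits   # validation part (60000 splits evenly into 3)
    n_train = n_samples - fold_size     # images left for training in each fold
    steps_per_epoch = math.ceil(n_train / batch_size)
    return n_train, steps_per_epoch, steps_per_epoch * epochs


n_train, steps, per_fold = train_steps()
total = per_fold * 3 * 3  # 3 folds x 3 augmentation options
print(f"{n_train} training images, {steps} steps/epoch, "
      f"{per_fold} steps/fold, {total} steps in total")
```

With these settings each fold trains on 40000 images in 1250 batches per epoch, and the whole cell makes 112500 batch updates, which goes some way to explaining the long runtime.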



# Task 4: Transfer Learning with VGG19 and ResNet50

In [26]:
"""
 1) Expand our (28×28×1) grayscale images to RGB and resize to 32×32
 2) Pre‑load VGG19 and ResNet50 “no top” weights from Keras so they cache once
 3) For each model, run 3‑fold CV:
    • Load base with weights='imagenet', include_top=False
    • Freeze the base, attach a new head (GlobalAvgPool → Dense(256) → Dropout → Dense(10))
    • Compile, train for 5 epochs (verbose=1), evaluate
 4) Print per‑fold and average validation accuracy for each model
"""

import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold
from tensorflow.keras.applications import VGG19, ResNet50
from tensorflow.keras.models       import Model
from tensorflow.keras.layers       import Input, GlobalAveragePooling2D, Dense, Dropout

# Step 1: RGB expand and resize once
X_rgb     = np.repeat(X_train, 3, axis=-1)              # (n,28,28,3)
X_resized = tf.image.resize(X_rgb, [32,32]).numpy()     # (n,32,32,3)
y_full    = y_train

# Step 2: Pre‑cache both sets of weights (downloads happen here)
_ = VGG19   (weights='imagenet', include_top=False, input_shape=(32,32,3))
_ = ResNet50(weights='imagenet', include_top=False, input_shape=(32,32,3))

# Step 3: Set up cross‑validation
kf = KFold(n_splits=3, shuffle=True, random_state=42)
models = {
    'VGG19':    VGG19,
    'ResNet50': ResNet50
}

for name, Constructor in models.items():
    print(f"\n=== Transfer Learning: {name} ===")
    fold_accs = []

    for fold_idx, (tr, val) in enumerate(kf.split(X_resized), start=1):
        # split data
        X_tr, X_val = X_resized[tr], X_resized[val]
        y_tr, y_val = y_full[tr],    y_full[val]

        # load and freeze the base
        base = Constructor(weights='imagenet',
                           include_top=False,
                           input_shape=(32,32,3))
        base.trainable = False

        # attach new classification head
        inputs = Input(shape=(32,32,3))
        x = base(inputs, training=False)
        x = GlobalAveragePooling2D()(x)
        x = Dense(256, activation='relu')(x)
        x = Dropout(0.5)(x)
        outputs = Dense(num_classes, activation='softmax')(x)
        model = Model(inputs, outputs)

        model.compile(optimizer='adam',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])

        # train & validate
        print(f"--- {name} fold {fold_idx} ---")
        model.fit(X_tr, y_tr,
                  epochs=5,            # reduced for speed
                  batch_size=32,
                  verbose=1,
                  validation_data=(X_val, y_val))
        _, acc = model.evaluate(X_val, y_val, verbose=0)
        print(f"{name} fold {fold_idx} accuracy: {acc:.4f}\n")
        fold_accs.append(acc)

    avg_acc = np.mean(fold_accs)
    print(f"{name} average 3‑fold CV accuracy: {avg_acc:.4f}")


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 ━━━━━━━━━━━━━━━━━━━━ 16s 0us/step

=== Transfer Learning: VGG19 ===
--- VGG19 fold 1 ---
Epoch 1/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 149s 118ms/step - accuracy: 0.6964 - loss: 0.8783 - val_accuracy: 0.8132 - val_loss: 0.5096
Epoch 2/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 174s 139ms/step - accuracy: 0.8034 - loss: 0.5386 - val_accuracy: 0.8191 - val_loss: 0.4716
Epoch 3/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 174s 139ms/step - accuracy: 0.8164 - loss: 0.4967 - val_accuracy: 0.8357 - val_loss: 0.4449
Epoch 4/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 184s 147ms/step - accuracy: 0.8275 - loss: 0.4631 - val_accuracy: 0.8341 - val_loss: 0.4391
Epoch 5/5
1250/1250 ━━━━━━━━━━━━━━━

#### VGG19 fold 1 accuracy: 0.8421
#### VGG19 fold 2 accuracy: 0.8475
#### VGG19 fold 3 accuracy: 0.8446

#### VGG19 average 3‑fold CV accuracy: 0.8447
---
#### ResNet50 fold 1 accuracy: 0.7790
#### ResNet50 fold 2 accuracy: 0.7857
#### ResNet50 fold 3 accuracy: 0.7858

#### ResNet50 average 3‑fold CV accuracy: 0.7835

# Task 5: Effects of Receptive Field Size in Convolution Layers

In convolutional neural networks, the “window size” or kernel size determines each layer’s **receptive field**—the region of the input image that influences a single output activation. Adjusting this size has the following effects:

---

## 1. Increasing the receptive field (larger kernels)

- **Broader context per layer**
  A 5×5 filter sees a 5×5 patch at once, capturing larger patterns or textures in a single convolution.

- **Fewer layers to cover large areas**
  Large kernels allow the network to “zoom out” more quickly, which can help when distinguishing global shapes.

- **Higher parameter count & compute cost**
  A 5×5 kernel has 25 weights per channel versus 9 for a 3×3, increasing memory and FLOPs—and raising overfitting risk if data are limited.

- **Smoothed / blurred features**
  Aggregating over a larger region can wash out fine details or high‑frequency signals.

---

## 2. Decreasing the receptive field (smaller kernels)

- **Fine‑grained detail retention**
  A 3×3 filter focuses closely on edges, corners, and small textures.

- **Greater non‑linearity per effective field**
  Stacking two 3×3 layers yields an effective 5×5 receptive field **plus** two activation functions, enabling richer feature learning with fewer total parameters.

- **Deeper networks needed for global context**
  To “see” a 7×7 or larger region you must stack more layers, which can slow training and complicate optimization.

- **Parameter efficiency**
  Architectures such as VGG and ResNet use 3×3 filters in most layers: they build up large receptive fields via depth rather than expensive wide filters.
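
The parameter argument in the bullets above can be checked with a little arithmetic. The sketch below is only an illustration: `conv_weights` is a hypothetical helper, and the channel width of 64 is an assumed value, not one taken from a specific layer in this notebook.

```python
# Hypothetical helper (not part of the notebook): weights in one Conv2D layer.

def conv_weights(kernel, in_ch, out_ch, bias=True):
    """Number of trainable weights in a kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


channels = 64  # assumed channel width, for illustration only

single_5x5 = conv_weights(5, channels, channels)        # one 5x5 layer
stacked_3x3 = 2 * conv_weights(3, channels, channels)   # two 3x3 layers

print(f"one 5x5 layer:  {single_5x5} weights")
print(f"two 3x3 layers: {stacked_3x3} weights")
print(f"ratio: {stacked_3x3 / single_5x5:.2f}")
```

Both choices cover a 5×5 region of the input, but under this assumption the stacked 3×3 pair needs roughly 72% of the weights and adds a second non-linearity.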

---

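## 3. Practical trade‑offs (as seen in the Task 1 options)

- **Two 3×3 layers (Option A)**
  Lightweight and fast, but needs several layers to capture larger patterns. It had the best Task 1 average accuracy (0.9086).

- **Two 5×5 layers (Option B)**
  Captures broader context immediately, but each kernel has about 2.8× the weights of a 3×3 kernel (25 vs. 9) and may lose small details. Its average accuracy (0.9064) was close to Option A's.

- **Three 3×3 layers (Option C)**
  Three stacked 3×3 convolutions have an effective 7×7 field (larger once the pooling layers in between are counted) **plus** extra non‑linear transforms, which often beats a single large kernel at lower cost. In our runs, however, Option C scored lowest (0.8873), possibly because three unpadded conv + pool stages shrink the 28×28 input to a 1×1 feature map ahead of a smaller Dense(64) head.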
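
The receptive-field figures quoted in this section can be reproduced with a short calculation. The sketch below uses the standard recurrence (the field grows by `(window - 1) * jump`, and the jump is multiplied by each stride); `receptive_field` is a hypothetical helper, and the layer lists are my reading of the Task 1 options with 2×2, stride‑2 pooling after every convolution.

```python
# Hypothetical helper (not part of the notebook).

def receptive_field(layers):
    """Receptive field of the last layer's output, in input pixels.

    layers: list of (window, stride) pairs ordered from input to output;
    convolutions and pooling layers are treated the same way.
    """
    rf, jump = 1, 1
    for window, stride in layers:
        rf += (window - 1) * jump  # each layer widens the field
        jump *= stride             # strides spread later layers further apart
    return rf


conv3, conv5, pool2 = (3, 1), (5, 1), (2, 2)

print("two 3x3 convs, no pooling:  ", receptive_field([conv3, conv3]))
print("three 3x3 convs, no pooling:", receptive_field([conv3, conv3, conv3]))
print("Option A (with pooling):    ", receptive_field([conv3, pool2] * 2))
print("Option B (with pooling):    ", receptive_field([conv5, pool2] * 2))
print("Option C (with pooling):    ", receptive_field([conv3, pool2] * 3))
```

Without pooling, two and three stacked 3×3 layers cover 5×5 and 7×7 regions. With the pooling layers included, the same recurrence gives 10, 16 and 22 pixels for Options A, B and C, so all three already see a large share of a 28×28 image.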

---

### Why it matters

- **Depth + small kernels** is generally best for expressive power, regularization, and training stability.
- **Large kernels** can be useful in early layers or very low‑resolution inputs, but should be applied sparingly.

Many widely used CNNs rely mainly on 3×3 filters and reserve larger windows for places where broad context matters, such as the first layer.
