# AlexNet Implementation - Train on ImageNet 2012
- Source Literature: [AlexNet Paper](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
- Data Loader Notebook: `AlexNet_Data_Loader.ipynb`
- Vanilla Model and Fused Model Prototypes: `AlexNet_Prototype_Model.ipynb`

## Dataset Directory Structure & Requirements

Enough HDD/SSD space is required for the following:
- Downloading Raw Dataset - 156.8 GB
- Convert to TFRecord and Store - 155.9 GB
- Total Storage Required - 312.7 GB

An SSD is recommended and a Mechanical HDD should be avoided since it will slow down the data loader significantly. 
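
Before starting the download, it can help to confirm that the target drive has room for both copies. The following is a minimal sketch using only the standard library. The helper name `has_enough_space` is hypothetical, and it assumes the sizes above are decimal gigabytes and that the raw archives and the TFRecord output live on the same filesystem. If they sit on separate drives, check each path against its own figure.

```python
import shutil

# Combined footprint quoted above: raw archives plus the TFRecord copy.
REQUIRED_GB = 312.7


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    # Assumption: 1 GB = 1e9 bytes (decimal), matching typical download-size listings.
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    # "." is a placeholder; point this at the drive that will hold the dataset.
    print("Enough space:", has_enough_space("."))
```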

### Dataset Download

ImageNet Download Link: [Download ImageNet Dataset](https://image-net.org/download-images)

- Download Train Images (Required): `ILSVRC2012_img_train.tar` - Size 137.7 GB
- Download Val Images (Required): `ILSVRC2012_img_val.tar` - Size 6.3 GB
- Download Test Images (Optional): `ILSVRC2012_img_test.tar` - Size 12.7 GB

### Raw/Source Dataset Directory Structure
Download the archives from the link above and place them in a single folder, as shown:
```
imagenet2012/
├── ILSVRC2012_img_test.tar
├── ILSVRC2012_img_train.tar
└── ILSVRC2012_img_val.tar
```

### Processed/Destination Dataset Directory Structure
Create another folder containing the subfolders `data`, `downloaded` & `extracted`, as shown:
```
imagenet/
├── data/
├── downloaded/
└── extracted/
```
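
The destination layout above can also be created programmatically. This is a small sketch, not part of the original notebook. `create_dest_dirs` is a hypothetical helper, and the root path passed to it is whatever you later supply as `dest_data_dir`.

```python
from pathlib import Path


def create_dest_dirs(dest_root: str) -> list:
    """Create the data/, downloaded/ and extracted/ subfolders under dest_root."""
    created = []
    for name in ("data", "downloaded", "extracted"):
        path = Path(dest_root) / name
        # exist_ok=True makes the call safe to repeat on an existing layout.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # "imagenet" is a placeholder root, relative to the working directory.
    for folder in create_dest_dirs("imagenet"):
        print(folder)
```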

In [1]:
import os

os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import numpy as np
from pprint import pprint

plt.rcParams["figure.figsize"] = 30, 30

# ImageNet Data Loader

In [2]:
class ImageNetDataLoader:
    def __init__(
        self,
        source_data_dir: str,
        dest_data_dir: str,
        split: str = "train",
        image_dims: tuple = (224, 224),
        num_classes=1000
    ) -> None:
        """
        __init__
        - Instance Variable Initialization
        - Download and Set Up Dataset (One Time Operation)
        - Use TFDS to Load and convert the ImageNet Dataset
        
        Args:
            source_data_dir (str): Path to Downloaded tar files
            dest_data_dir (str): Path to the location where the dataset will be unpacked
            split (str): Split to load as. Eg. train, test, train[:80%]. Defaults to "train"
            image_dims (tuple, optional): Image Dimensions (width & height). Defaults to (224, 224).
            num_classes (int): Number of Classes contained in this dataset. Defaults to 1000
        """
        
        # Constants
        self.NUM_CLASSES=num_classes
        self.BATCH_SIZE = None
        self.NUM_CHANNELS = 3
        self.LABELS = []
        self.LABELMAP = {}
        self.AUTOTUNE = tf.data.experimental.AUTOTUNE
        self.WIDTH, self.HEIGHT = image_dims    
        
        # Download Config
        download_config = tfds.download.DownloadConfig(
            extract_dir=os.path.join(dest_data_dir, 'extracted'),
            manual_dir=source_data_dir
        )

        download_and_prepare_kwargs = {
            'download_dir': os.path.join(dest_data_dir, 'downloaded'),
            'download_config': download_config,
        }
        
        # TFDS Data Loader (This step also performs dataset conversion to TFRecord)
        self.dataset, self.info = tfds.load(
            'imagenet2012', 
            data_dir=os.path.join(dest_data_dir, 'data'),         
            split=split, 
            shuffle_files=True, 
            download=True, 
            as_supervised=True,
            with_info=True,
            download_and_prepare_kwargs=download_and_prepare_kwargs
        )
    
    
    def preprocess_image(self, image, label):
        """
        preprocess_image
        
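        Process the image and label to perform the following operations:
        - Resize the image to (HEIGHT, WIDTH), 224 x 224 by default
        - Min Max Scale the Image (Divide by 255)
        - Convert the numerical values of the labels to One Hot Encoded Format
        
        Args:
            image (Image Tensor): Raw Image
            label (Tensor): Numeric Labels 0, 1, 2, ...
        Returns:
            tuple: Scaled Image, One-Hot Encoded Label
        """
        image = tf.cast(image, tf.uint8)
        # tf.image.resize returns float32 values in the original [0, 255] range
        image = tf.image.resize(image, [self.HEIGHT, self.WIDTH])
        # Divide by the fixed uint8 maximum, as documented above. Dividing by the
        # per-image maximum scales every image differently and gives NaN for an all-black image
        image = image / 255.0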
        label = tf.one_hot(indices=label, depth=self.NUM_CLASSES)
        return image, label
    
    @tf.function
    def augment_batch(self, image, label) -> tuple:
        """
        augment_batch
        Image Augmentation for Training:
        - Random Contrast
        - Random Brightness
        - Random Hue (Color)
        - Random Saturation
        - Random Horizontal Flip
        - Random Reduction in JPEG Quality
        Args:
            image (Tensor Image): Raw Image
            label (Tensor): Numeric Labels 0, 1, 2, ...
        Returns:
            tuple: Augmented Image, Numeric Labels 0, 1, 2, ...
        """
        # Each colour transform is applied with probability 0.5. The draw is a
        # scalar (shape []) so that `if` receives an unambiguous boolean predicate
        if tf.random.normal([]) < 0:
            image = tf.image.random_contrast(image, 0.2, 0.9)
        if tf.random.normal([]) < 0:
            image = tf.image.random_brightness(image, 0.2)
        if self.NUM_CHANNELS == 3 and tf.random.normal([]) < 0:
            image = tf.image.random_hue(image, 0.3)
        if self.NUM_CHANNELS == 3 and tf.random.normal([]) < 0:
            image = tf.image.random_saturation(image, 0, 15)
        
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_jpeg_quality(image, 10, 100)

        return image, label
    
    def get_dataset_size(self) -> int:
        """
        get_dataset_size
        Get the Dataset Size (Number of Images)
        Returns:
            int: Total Number of images in Dataset
        """
        return len(self.dataset)
    
    def get_num_steps(self) -> int:
        """
        get_num_steps
        Get the Number of Steps (Batches) Required per Epoch
        Raises:
            AssertionError: Dataset Generator needs to be Initialized First
        Returns:
            int: Number of Steps Required per Epoch
        """
        if self.BATCH_SIZE is None:
            raise AssertionError(
                "Batch Size is not Initialized. Call this method only after calling dataset_generator()"
            )
        # Ceiling division: count the final partial batch, but add no extra step
        # when the dataset size is an exact multiple of the batch size
        num_steps = (self.get_dataset_size() + self.BATCH_SIZE - 1) // self.BATCH_SIZE
        return num_steps
    
    def dataset_generator(self, batch_size=32, augment=False):
        """
        dataset_generator
        Create the Data Loader Pipeline and Return a Generator to Generate Datasets
        Args:
            batch_size (int, optional): Batch Size. Defaults to 32.
            augment (bool, optional): Enable/Disable Augmentation. Defaults to False.
        Returns:
            Tf.Data Generator: Dataset Generator
        """
        self.BATCH_SIZE = batch_size

        dataset = self.dataset.apply(tf.data.experimental.ignore_errors())

        dataset = dataset.shuffle(batch_size * 2)
        dataset = dataset.repeat()
               
        if augment:
            dataset = dataset.map(self.augment_batch, num_parallel_calls=self.AUTOTUNE)
        
        dataset = dataset.map(self.preprocess_image, num_parallel_calls=self.AUTOTUNE)
        
        dataset = dataset.batch(batch_size)
        dataset = dataset.prefetch(buffer_size=self.AUTOTUNE)

        return dataset
    
    def visualize_batch(self, augment=True) -> None:
        """
        visualize_batch
        Dataset Sample Visualization
        - Supports Augmentation
        - Automatically Adjusts for Grayscale Images
        Args:
            augment (bool, optional): Enable/Disable Augmentation. Defaults to True.
        """
        if self.NUM_CHANNELS == 1:
            cmap = "gray"
        else:
            cmap = "viridis"

        dataset = self.dataset_generator(batch_size=36, augment=augment)
        image_batch, label_batch = next(iter(dataset))
        image_batch, label_batch = (
            image_batch.numpy(),
            label_batch.numpy(),
        )

        for n in range(len(image_batch)):
            ax = plt.subplot(6, 6, n + 1)
            plt.imshow(image_batch[n], cmap=cmap)
            plt.title(np.argmax(label_batch[n]))
            plt.axis("off")
        plt.show()

# Fused AlexNet Architecture

In [3]:
# Create AlexNet Model (Fused)

# Input Layer
inputs = tf.keras.Input(shape=(224, 224, 3), name="alexnet_input")

# Layer 1 - Convolutions
l1 = tf.keras.layers.Conv2D(filters=96, kernel_size=11, strides=4, padding="same")(inputs)
l1 = tf.keras.layers.BatchNormalization()(l1)
l1 = tf.keras.layers.ReLU()(l1)
l1 = tf.keras.layers.MaxPooling2D(pool_size=3, strides=2)(l1)

# Layer 2 - Convolutions
l2 = tf.keras.layers.Conv2D(filters=256, kernel_size=5, strides=1, padding="same")(l1)
l2 = tf.keras.layers.BatchNormalization()(l2)
l2 = tf.keras.layers.ReLU()(l2)
l2 = tf.keras.layers.MaxPooling2D(pool_size=3, strides=2)(l2)

# Layer 3 - Convolutions
l3 = tf.keras.layers.Conv2D(filters=384, kernel_size=3, strides=1, padding="same")(l2)
l3 = tf.keras.layers.ReLU()(l3)

# Layer 4 - Convolutions
l4 = tf.keras.layers.Conv2D(filters=384, kernel_size=3, strides=1, padding="same")(l3)
l4 = tf.keras.layers.ReLU()(l4)

# Layer 5 - Convolutions
l5 = tf.keras.layers.Conv2D(filters=256, kernel_size=3, strides=1, padding="same")(l4)
l5 = tf.keras.layers.ReLU()(l5)
l5 = tf.keras.layers.MaxPooling2D(pool_size=3, strides=2)(l5)

# Layer 6 - Dense
l6_pre = tf.keras.layers.Flatten()(l5)

l6 = tf.keras.layers.Dense(units=4096)(l6_pre)
l6 = tf.keras.layers.ReLU()(l6)
l6 = tf.keras.layers.Dropout(rate=0.5)(l6)

# Layer 7 - Dense
l7 = tf.keras.layers.Dense(units=4096)(l6)
l7 = tf.keras.layers.ReLU()(l7)
l7 = tf.keras.layers.Dropout(rate=0.5)(l7)

# Layer 8 - Dense
l8 = tf.keras.layers.Dense(units=1000)(l7)
l8 = tf.keras.layers.Softmax(dtype=tf.float32, name="alexnet_output")(l8)

alexnet = tf.keras.models.Model(inputs=inputs, outputs=l8)
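
To see where the size of the first dense layer comes from, the spatial dimensions can be traced through the convolution and pooling stack by hand. The sketch below is plain Python, not part of the original notebook. It assumes the standard Keras output-size rules: `ceil(size / stride)` for `padding="same"` and `floor((size - pool) / stride) + 1` for the default `padding="valid"` used by `MaxPooling2D`. The helper names are hypothetical.

```python
import math


def conv_same(size: int, stride: int) -> int:
    """Output size of a convolution with padding="same"."""
    return math.ceil(size / stride)


def pool_valid(size: int, pool: int = 3, stride: int = 2) -> int:
    """Output size of a pooling layer with padding="valid"."""
    return (size - pool) // stride + 1


size = 224
size = conv_same(size, stride=4)   # Layer 1 conv: 56
size = pool_valid(size)            # Layer 1 pool: 27
size = conv_same(size, stride=1)   # Layer 2 conv: 27
size = pool_valid(size)            # Layer 2 pool: 13
size = conv_same(size, stride=1)   # Layers 3-5 convs keep 13
size = pool_valid(size)            # Layer 5 pool: 6

flattened = size * size * 256           # 256 filters leave Layer 5
dense6_params = flattened * 4096 + 4096  # weights plus biases of Layer 6

print(size, flattened, dense6_params)   # 6 9216 37752832
```

The flattened vector of 9216 values feeding Layer 6 accounts for most of the model's parameters, which is one reason dropout is applied to the dense layers.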

# Training Callbacks
- Early Stopping - Stop Training if Validation Accuracy Stops Increasing
- Model Checkpoint - Save the Full Model (Architecture and Weights) when Validation Accuracy Improves
- Reduce Learning Rate - As per the Author Specifications, Learning Rate is Reduced by a factor of 10 if Validation Accuracy does not Improve
- Tensorboard - Log the Training and Validation Metrics on Tensorboard for Training Analysis

In [4]:
# Callbacks
early_stop_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_categorical_accuracy",
    min_delta=0,
    patience=10,
    verbose=1,
    mode="auto",
    baseline=None,
    restore_best_weights=True,
)
model_ckpt_cb = tf.keras.callbacks.ModelCheckpoint(
    "weights/alexnet.{epoch:02d}-{val_categorical_accuracy:.2f}-{val_loss:.2f}.h5",
    monitor="val_categorical_accuracy",
    verbose=0,
    save_best_only=True,
    save_weights_only=False,
    mode="auto",
    save_freq="epoch",
)
reduce_lr_cb = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_categorical_accuracy",
    factor=0.1,
    patience=2,
    verbose=0,
    mode="auto",
    min_delta=0.0001,
    cooldown=0,
    min_lr=10e-8,
)
tensorboard_cb = tf.keras.callbacks.TensorBoard(
    log_dir="tb_logs/",
    histogram_freq=2,
    write_graph=True,
    write_images=False,
    update_freq="epoch",
    profile_batch=2,
    embeddings_freq=2,
    embeddings_metadata=None,
)
callbacks = [early_stop_cb, model_ckpt_cb, reduce_lr_cb, tensorboard_cb]
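
The effect of the learning-rate callback can be illustrated without TensorFlow. The sketch below is a simplified model of a reduce-on-plateau rule for a metric that should increase. It is not the Keras implementation (it ignores `cooldown`, for example). The function name `plateau_schedule` and the accuracy values are made up for illustration. The defaults mirror the settings above, with `min_lr` written as `1e-7`, which is the value of `10e-8`.

```python
def plateau_schedule(val_accuracies, lr=0.01, factor=0.1, patience=2,
                     min_delta=1e-4, min_lr=1e-7):
    """Return the learning rate in effect after each epoch."""
    best = float("-inf")
    wait = 0
    history = []
    for acc in val_accuracies:
        if acc > best + min_delta:
            # Improvement: remember it and reset the patience counter.
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau: shrink the learning rate, but never below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


if __name__ == "__main__":
    # Hypothetical validation accuracies: two stalled epochs trigger one reduction.
    for epoch, rate in enumerate(plateau_schedule([0.10, 0.20, 0.20, 0.20, 0.30]), 1):
        print(f"epoch {epoch}: lr = {rate:g}")
```

With these inputs the rate stays at 0.01 for three epochs and drops to 0.001 after the second epoch without improvement.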

# Training and Validation Metrics

### Legend:
- **TP** - True Positive
- **FP** - False Positive
- **TN** - True Negative
- **FN** - False Negative

### Metrics:
- **Categorical Accuracy** = $\frac{TP+TN}{TP+FP+TN+FN}$
- **Precision** = $\frac{TP}{TP+FP}$
- **Recall** = $\frac{TP}{TP+FN}$
- **F1 Score** = $2*\frac{Precision*Recall}{Precision+Recall}$
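
As a worked example, the sketch below evaluates these metrics from a set of confusion counts, taking accuracy as the share of correct predictions, (TP + TN) over all predictions. The counts are invented for illustration and the helper name `summarize` is hypothetical.

```python
def summarize(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts: 8 TP, 2 FP, 85 TN, 5 FN out of 100 predictions.
    for name, value in summarize(tp=8, fp=2, tn=85, fn=5).items():
        print(f"{name}: {value:.4f}")
```

For these counts the accuracy is 0.93 even though recall is only about 0.62, which shows why precision, recall and F1 are tracked alongside accuracy.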

In [5]:
# Metrics
metrics = [
    tf.keras.metrics.CategoricalAccuracy(),
    tf.keras.metrics.FalseNegatives(),
    tf.keras.metrics.FalsePositives(),
    tf.keras.metrics.Precision(),
    tf.keras.metrics.Recall(),
    tfa.metrics.F1Score(num_classes=1000)
]

# Mixed Precision Training
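Enable Mixed Precision Training on supported GPUs to use the optimized Tensor Cores for matrix operations. Under the `mixed_float16` policy, layers compute in float16 while keeping their variables in float32. The final Softmax layer of the model is pinned to float32 for numerical stability.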

Read About Tensor Cores Here: [Tensor Cores - Nvidia Developer](https://www.nvidia.com/en-in/data-center/tensor-cores/)

In [6]:
# Mixed Precision
tf.keras.mixed_precision.set_global_policy("mixed_float16")

INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: GeForce RTX 2080 Ti, compute capability 7.5


# Initialize Data Loader for Training & Validation

In [7]:
# Constants
BATCH_SIZE = 128


# Init Data Loaders
train_data_loader = ImageNetDataLoader(
        source_data_dir = "/mnt/data/pycodes/Dataset/imagenet2012",
        dest_data_dir = "/home/ani/Documents/datasets/imagenet",
        split = "train",
        image_dims = (224, 224),
)

val_data_loader = ImageNetDataLoader(
        source_data_dir = "/mnt/data/pycodes/Dataset/imagenet2012",
        dest_data_dir = "/home/ani/Documents/datasets/imagenet",
        split = "validation",
        image_dims = (224, 224),
)

train_generator = train_data_loader.dataset_generator(batch_size=BATCH_SIZE, augment=False)
val_generator = val_data_loader.dataset_generator(batch_size=BATCH_SIZE, augment=False)

train_steps = train_data_loader.get_num_steps()
val_steps = val_data_loader.get_num_steps()
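
Because the generators repeat indefinitely, Keras needs an explicit step count to know where an epoch ends. The arithmetic can be checked by hand with one step per batch, counting the final partial batch. The image counts below are the ILSVRC 2012 split sizes (1,281,167 training and 50,000 validation images). They are an assumption here, since the notebook does not state them, and `steps_per_epoch` is a hypothetical helper.

```python
def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Ceiling division: number of batches needed to cover every image once."""
    return -(-num_images // batch_size)


# Assumed ILSVRC 2012 split sizes (not stated in the notebook itself).
TRAIN_IMAGES = 1_281_167
VAL_IMAGES = 50_000
BATCH = 128

print(steps_per_epoch(TRAIN_IMAGES, BATCH))  # 10010
print(steps_per_epoch(VAL_IMAGES, BATCH))    # 391
```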

# Compile Model and Start Training

In [None]:
# Constants
EPOCHS = 200

# Compile & Train
alexnet.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.SGD(
        learning_rate=0.01, momentum=0.9, nesterov=False, name='SGD'
    ),
    metrics=metrics,
)

history = alexnet.fit(
    epochs=EPOCHS,
    x=train_generator,
    steps_per_epoch=train_steps,
    validation_data=val_generator,
    validation_steps=val_steps,
    callbacks=callbacks
)

Epoch 1/200