**Quick observation (AI)**:

ChatGPT provided some assistance in navigating the Keras/TensorFlow API and in writing the checkpoints/callbacks used in training. The parameters, however, were tuned manually. Everything else related to the model itself was written based on the original paper ([here](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf)) and on different ResNet-18 implementations across the web, plus some fine-tuning (e.g., data augmentation).

In [1]:
%%bash
pip install --upgrade pip
pip install numpy matplotlib keras "tensorflow[and-cuda]"

Traceback (most recent call last):
  File "/home/jovyan/.local/lib/python3.8/site-packages/pip/_vendor/msgpack/__init__.py", line 15, in <module>
    from ._cmsgpack import Packer, unpackb, Unpacker
ModuleNotFoundError: No module named 'pip._vendor.msgpack._cmsgpack'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/pip", line 8, in <module>
    sys.exit(main())
  File "/home/jovyan/.local/lib/python3.8/site-packages/pip/_internal/cli/main.py", line 78, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
  File "/home/jovyan/.local/lib/python3.8/site-packages/pip/_internal/commands/__init__.py", line 114, in create_command
    module = importlib.import_module(module_path)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
 

Error while terminating subprocess (pid=354): 


    from pip._internal.cli.req_command import (
  File "/home/jovyan/.local/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 18, in <module>
    from pip._internal.index.collector import LinkCollector
  File "/home/jovyan/.local/lib/python3.8/site-packages/pip/_internal/index/collector.py", line 38, in <module>
    from pip._internal.network.session import PipSession
  File "/home/jovyan/.local/lib/python3.8/site-packages/pip/_internal/network/session.py", line 33, in <module>
    from pip._vendor.cachecontrol import CacheControlAdapter as _BaseCacheControlAdapter
  File "/home/jovyan/.local/lib/python3.8/site-packages/pip/_vendor/cachecontrol/__init__.py", line 13, in <module>
    from pip._vendor.cachecontrol.adapter import CacheControlAdapter
  File "/home/jovyan/.local/lib/python3.8/site-packages/pip/_vendor/cachecontrol/adapter.py", line 14, in <module>
    from pip._vendor.cachecontrol.controller import PERMANENT_REDIRECT_STATUSES, CacheController
  File "/home

In [1]:
#imports (all Keras symbols come from tensorflow.keras to avoid mixing two Keras packages)
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, LearningRateScheduler
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Conv2D, AveragePooling2D, MaxPooling2D, GlobalAveragePooling2D, Dense, Flatten, BatchNormalization, Add, Input, ReLU
from tensorflow.keras.models import Model, Sequential, load_model
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.regularizers import l2
from tensorflow.keras.utils import to_categorical

2025-01-08 11:36:27.655754: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-08 11:36:27.876533: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-01-08 11:36:29.645148: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-01-08 11:

In [2]:
# to use gpus on jupyter lab
tf.debugging.set_log_device_placement(False)
gpus = tf.config.list_physical_devices('GPU')
index_to_use = [0, 1] # add more depending on the server
device_names = [f'/GPU:{i}' for i in index_to_use]
strategy = tf.distribute.MirroredStrategy(devices=device_names)

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')


2025-01-08 11:36:36.751494: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short; LD_LIBRARY_PATH: /usr/local/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-01-08 11:36:36.751516: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)


In [4]:
#if using only CPU
tf.debugging.set_log_device_placement(False)
# Disable all GPUs
tf.config.set_visible_devices([], 'GPU')

In [3]:
#CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
print(train_images.shape)
train_images = train_images.reshape(train_images.shape[0], 32, 32, 3) #ensure shape 32 W x 32 H x 3 channels for each image
test_images = test_images.reshape(test_images.shape[0], 32, 32, 3)

#range 0-1
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

#One-hot encoding labels
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)

(50000, 32, 32, 3)
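
The preprocessing in the cell above can be illustrated on a tiny made-up batch. This is only a sketch in plain NumPy, assuming the `(N, 1)` integer label layout that `cifar10.load_data()` returns; `fake_images` and `fake_labels` are hypothetical stand-ins for the real data, and `one_hot` is an illustrative helper that mirrors the kind of output `to_categorical` produces, not the Keras implementation itself.

```python
import numpy as np

def scale_to_unit(images):
    """Convert uint8 pixels in [0, 255] to float32 values in [0, 1]."""
    return images.astype("float32") / 255.0

def one_hot(labels, n_classes):
    """Turn integer labels of shape (N, 1) or (N,) into one-hot rows of shape (N, n_classes)."""
    flat = np.asarray(labels).reshape(-1)
    return np.eye(n_classes, dtype="float32")[flat]

# Hypothetical stand-ins: one 1x1 RGB "image" and two labels shaped like CIFAR-10 labels
fake_images = np.array([[[[0, 128, 255]]]], dtype=np.uint8)
fake_labels = np.array([[3], [0]])

scaled = scale_to_unit(fake_images)
encoded = one_hot(fake_labels, 10)

print(scaled.dtype, scaled.min(), scaled.max())
print(encoded.shape)
print(encoded[0])
```

Each row of `encoded` has a single 1 at the index of its class, which is the label format that the `categorical_crossentropy` loss expects.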


In [8]:
# functional API:
# Define ResNetBlock as a function
def ResNetBlock(x, n_filters, kernel_size=(3, 3), kernel_init='HeNormal', downsample=False):
    strides = [2, 1] if downsample else [1, 1]
    
    # Shortcut (residual) connection. When downsampling, a strided 1x1 convolution projects
    # the block input to the new spatial size and number of channels so both branches can be added.
    if downsample:
        res = Conv2D(n_filters, kernel_size=(1, 1), strides=2, padding='same', kernel_initializer=kernel_init)(x)  # Projection shortcut applied to the block input
        res = BatchNormalization()(res)
    else:
        res = x  # Identity shortcut: the residual is the unchanged block input
    
    # First convolution
    x = Conv2D(n_filters, kernel_size, strides=strides[0], padding='same', kernel_initializer=kernel_init)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
        
    # Second convolution
    x = Conv2D(n_filters, kernel_size, strides=strides[1], padding='same', kernel_initializer=kernel_init)(x)
    x = BatchNormalization()(x)
    
    # Add the residual connection (skip connection)
    x = Add()([x, res])
    x = ReLU()(x)
    
    return x

# Create the ResNet18 model using the functional API
def ResNet18(input_shape=(32, 32, 3), n_classes=10):
    input_tensor = Input(shape=input_shape)
    
    # Initial part
    x = Conv2D(64, (3, 3), strides=1, padding='same', kernel_initializer='HeNormal')(input_tensor)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    
    # Blocks - 4 stages x 2 blocks per stage x 2 convolution layers per block
    x = ResNetBlock(x, 64, downsample=False)  # 64 filters, no downsampling
    x = ResNetBlock(x, 64, downsample=False)
    
    x = ResNetBlock(x, 128, downsample=True)  # 128 filters, with downsampling
    x = ResNetBlock(x, 128, downsample=False)
    
    x = ResNetBlock(x, 256, downsample=True)  # 256 filters, with downsampling
    x = ResNetBlock(x, 256, downsample=False)
    
    x = ResNetBlock(x, 512, downsample=True)  # 512 filters, with downsampling
    x = ResNetBlock(x, 512, downsample=False)

    # Final part
    # x = GlobalAveragePooling2D()(x)  # replaced by the 2 lines below; equivalent here, since a 32x32 input gives a 4x4 final feature map
    x = AveragePooling2D(pool_size=(4, 4), strides=(4, 4), padding="valid")(x)
    x = Flatten()(x)
    
    output = Dense(n_classes, activation='softmax')(x)
    
    # Create the complete model
    model = Model(inputs=input_tensor, outputs=output)
    
    return model

In [9]:
with strategy.scope():
    batch_size = 128
    # Loosely following the paper: 4 pixels are padded on each side,
    # and a 32×32 crop is randomly sampled from the padded
    # image or its horizontal flip. In this case we don't pad; instead, each image is randomly shifted by a few pixels in each direction.

    datagen = ImageDataGenerator(
                featurewise_center=False,
                samplewise_center=False,
                featurewise_std_normalization=False,
                samplewise_std_normalization=False,
                zca_whitening=False,
                width_shift_range=4,  # random horizontal shift in pixels (stands in for the paper's 4-pixel padding)
                height_shift_range=4,
                horizontal_flip=True,  # randomly flip images horizontally
                vertical_flip=False,
            )
    print('Data Augmentation...')
    train_gen = datagen.flow(train_images, train_labels, batch_size=batch_size)

    #Build model, set optimizations
    model = ResNet18()
    # model = load_model("....h5")  # use this instead to continue training from a saved model
    
    model.build(input_shape=(None, 32, 32, 3)) #Cifar-10 shape
    model.summary()    
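    #opt = Adam(learning_rate=1e-2)
    # Note: the `decay` argument of the Keras SGD optimizer is a per-update learning-rate decay,
    # not the L2 weight decay (0.0001) described in the paper
    opt = SGD(learning_rate=0.1, momentum=0.9, decay=1e-4)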
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

    # Early stopping (ES) was used initially, but to reach the highest possible accuracy we ended up training for many epochs anyway
    #es = EarlyStopping(patience=20, restore_best_weights=True, monitor="val_accuracy")
    checkpoint = ModelCheckpoint(
        filepath='avg_pool_model.h5',  # change the saved model name here
        monitor='val_accuracy',
        save_best_only=True,
        mode='max',
        verbose=True
    )
    
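    def lr_schedule(epoch, lr):
        # Divide the learning rate by 10 every 100 epochs; epoch 0 is skipped so that
        # training starts at the rate configured in the optimizer
        if epoch > 0 and epoch % 100 == 0:
            new_lr = 0.1 * lr
            print("Learning rate is", new_lr)
            return new_lr
        return lr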

    lr_scheduler = LearningRateScheduler(lr_schedule)

    #fit (the batch size is already set by the generator, so it is not passed to fit again)

    history = model.fit(train_gen,
               epochs=200,
               verbose=1,
               validation_data=(test_images, test_labels),
               callbacks=[checkpoint]) #add/remove lr scheduler here

Data Augmentation...
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).

'\n    history = model.fit(train_gen,\n               batch_size=batch_size,\n               epochs=200,\n               verbose=1,\n               validation_data=(test_images, test_labels),\n               callbacks=[checkpoint]) #add/remove lr scheduler here\n'
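
A stateless alternative to the `lr_schedule` callback in the cell above is sketched here. Instead of multiplying whatever rate the optimizer currently holds, it derives the rate from the epoch index alone, so epoch 0 keeps the base rate and resuming training gives the same values. The name `step_decay` and its keyword parameters are illustrative assumptions; the `(epoch, lr)` signature is the one `LearningRateScheduler` calls, and the incoming `lr` is deliberately ignored.

```python
def step_decay(epoch, lr=None, *, base_lr=0.1, drop=0.1, epochs_per_drop=100):
    """Learning rate for a 0-indexed epoch: base_lr scaled by `drop` once per completed interval.

    `lr` (the optimizer's current rate) is accepted for callback compatibility but not used.
    """
    n_drops = epoch // epochs_per_drop  # number of completed 100-epoch intervals
    return base_lr * (drop ** n_drops)

# Print the rate at the start, just before, and just after each boundary
for epoch in (0, 99, 100, 199, 200):
    print(epoch, step_decay(epoch))
```

Under these assumptions it could be passed to the callback as `LearningRateScheduler(step_decay)` and added to the `callbacks` list.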

In [6]:
model.evaluate(test_images, test_labels, batch_size=128)



[0.48611411452293396, 0.9275000095367432]
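
The two numbers returned by `model.evaluate` are, in order, the loss and the metric passed to `compile`, here categorical cross-entropy and accuracy. The sketch below shows how both can be computed from softmax outputs and one-hot labels in plain NumPy. The `probs` and `onehot` arrays are made-up examples rather than real model output, and the clipping constant `eps` is an assumed numerical guard, not necessarily the value Keras uses.

```python
import numpy as np

def categorical_accuracy(probs, onehot):
    """Fraction of rows whose highest-probability class matches the one-hot label."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(onehot, axis=1)))

def categorical_crossentropy(probs, onehot, eps=1e-7):
    """Mean over samples of -sum(label * log(prob)); probabilities are clipped to avoid log(0)."""
    clipped = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(onehot * np.log(clipped), axis=1)))

# Made-up predictions for 3 samples and 3 classes (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1]])
# True classes are 0, 2 and 1, so the third sample is misclassified
onehot = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype="float32")

acc = categorical_accuracy(probs, onehot)
loss = categorical_crossentropy(probs, onehot)
print([loss, acc])
```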

In [None]:
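# Note: this writes to the same file as the ModelCheckpoint callback, so it replaces the best checkpoint with the current weights
model.save("avg_pool_model.h5")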