# Intro to AI-driven Science on Supercomputers
## Week 3 Homework

#### Dan Horner (danhorner@berkeley.edu)¶
---

# Improving CIFAR-10 dataset classification with CNNs

## CIFAR-10 data set
In this homework, we use the CIFAR-10 data set, which contains 32x32 color images from 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

The original training image data (x_train_orig) is a 3rd-order tensor of size (50000, 32, 32), i.e. it consists of 50000 images of size 32x32 pixels, while y_train_orig is a 50000-dimensional vector containing the correct classes ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck') for each training sample.

Since we are trying to evaluate performance of different models, we will be using a validation data set taken from the original training data set (80% train & 20% validation.).

## Set Up and Data Import


In [1]:
%matplotlib inline

import tensorflow as tf

import numpy as np

from sklearn.model_selection import train_test_split
np.random.seed(1)

import matplotlib.pyplot as plt
import time

from image_dataset_loader import load

In [2]:
# Data import copied from in-class notebook, and adapted for Train/Validation/Test

(x_train_orig, y_train_orig), (x_test, y_test) = load('../03_neural_networks_tensorflow/cifar10', ['train', 'test'])

x_train_orig = x_train_orig.astype(np.float32)
x_test  = x_test.astype(np.float32)

# Normalize values [0-1]
x_train_orig /= 255.
x_test  /= 255.

y_train_orig = y_train_orig.astype(np.int32)
y_test  = y_test.astype(np.int32)

#Train / validation split
x_train_i, x_val, y_train_i, y_val = train_test_split(x_train_orig, y_train_orig, test_size=0.2)

print('CIFAR-10 data loaded: train:',len(x_train_orig),'test:',len(x_test))
print('X_train:', x_train_i.shape)
print('y_train:', y_train_i.shape)
print('X_val:', x_val.shape)
print('y_val:', y_val.shape)
print('X_test:', x_test.shape)
print('y_test:', y_test.shape)

CIFAR-10 data loaded: train: 50000 test: 10000
X_train: (40000, 32, 32, 3)
y_train: (40000,)
X_val: (10000, 32, 32, 3)
y_val: (10000,)
X_test: (10000, 32, 32, 3)
y_test: (10000,)


---

## Model Training

### Function Definitions

In [44]:
class CIFAR10Classifier(tf.keras.models.Model):

    def __init__(self, activation=tf.nn.tanh, dropout=(0.25, 0.50), hl = (32, 64, 128)):
        tf.keras.models.Model.__init__(self)

        self.conv_1 = tf.keras.layers.Conv2D(hl[0], [3, 3], activation='relu')
        self.conv_2 = tf.keras.layers.Conv2D(hl[1], [3, 3], activation='relu')
        self.pool_3 = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))
        self.drop_4 = tf.keras.layers.Dropout(dropout[0])
        self.dense_5 = tf.keras.layers.Dense(hl[2], activation='relu')
        self.drop_6 = tf.keras.layers.Dropout(dropout[1])
        self.dense_7 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, inputs):

        x = self.conv_1(inputs)
        x = self.conv_2(x)
        x = self.pool_3(x)
        x = self.drop_4(x)
        x = tf.keras.layers.Flatten()(x)
        x = self.dense_5(x)
        x = self.drop_6(x)
        x = self.dense_7(x)

        return x

In [45]:
def train_network_concise(_x_train, _y_train, _batch_size, _n_training_epochs, _lr, _dropout, _hl):
    model = CIFAR10Classifier(dropout = _dropout, hl =_hl)
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=['accuracy'])    
    history = model.fit(_x_train, _y_train, batch_size=_batch_size, epochs=_n_training_epochs, verbose=2)
    
    return history, model

In [47]:
epochs = 3
bs_vals = (128,)#(128, 256, 512)
lr_vals = (0.1,)#(0.01, 0.05, 0.1)
do_vals = (0.10, 0.25, 0.50)
hl_vals = (32, 64)

n_runs = len(bs_vals) * len(lr_vals) * len(hl_vals) * len(hl_vals) * len(hl_vals)

runs_li = []
i = 0
for bs in bs_vals:
    for lr in lr_vals:
        for do0 in (0.50,):
            for do1 in (0.25,):
                for hl0 in hl_vals:
                    for hl1 in hl_vals:
                        for hl2 in hl_vals:
                            do = (do0, do1)
                            hl = (hl0, hl1, hl2)
                            i += 1
                            print('Hyperparameter run:', i, '/', n_runs)
                            print(bs, lr, do, hl)
                            history_i, model_i = train_network_concise(x_train_i, y_train_i, bs, epochs, lr, do, hl)
                            scores = model_i.evaluate(x_val, y_val, verbose=0)
                            print(scores)
                            runs_li.append({'scores': scores, 'model': model_i, 'bs': bs, 'lr': lr, 'do': do, 'hl': hl})
                
                

Hyperparameter run: 1 / 8
128 0.1 (0.5, 0.25) (32, 32, 32)
Epoch 1/3
313/313 - 32s - loss: 1.7650 - accuracy: 0.3538 - 32s/epoch - 103ms/step
Epoch 2/3
313/313 - 31s - loss: 1.4859 - accuracy: 0.4606 - 31s/epoch - 100ms/step
Epoch 3/3
313/313 - 30s - loss: 1.3795 - accuracy: 0.5073 - 30s/epoch - 97ms/step
[1.2472456693649292, 0.5641999840736389]
Hyperparameter run: 2 / 8
128 0.1 (0.5, 0.25) (32, 32, 64)
Epoch 1/3
313/313 - 32s - loss: 1.7040 - accuracy: 0.3764 - 32s/epoch - 102ms/step
Epoch 2/3
313/313 - 31s - loss: 1.3908 - accuracy: 0.4999 - 31s/epoch - 98ms/step
Epoch 3/3
313/313 - 31s - loss: 1.2619 - accuracy: 0.5507 - 31s/epoch - 100ms/step
[1.1363106966018677, 0.6004999876022339]
Hyperparameter run: 3 / 8
128 0.1 (0.5, 0.25) (32, 64, 32)
Epoch 1/3
313/313 - 48s - loss: 1.8064 - accuracy: 0.3336 - 48s/epoch - 152ms/step
Epoch 2/3
313/313 - 46s - loss: 1.4924 - accuracy: 0.4547 - 46s/epoch - 148ms/step
Epoch 3/3
313/313 - 47s - loss: 1.3745 - accuracy: 0.4961 - 47s/epoch - 149ms/s

In [32]:
i = 0

i_best = 0
val_acc_best = 0
for r in runs_li:
    val_acc = r['scores'][1]
    if(val_acc > val_acc_best):
        i_best = i
        val_acc_best = val_acc
    i += 1
print(i_best, val_acc_best)
print(runs_li[i_best]['bs'])
print(runs_li[i_best]['lr'])
print(runs_li[i_best]['do'])
print(runs_li[i_best]['hl'])

scores_test = model_i.evaluate(x_test, y_test, verbose=0)
print(scores_test[1])

25 0.6464999914169312
128
0.1
(0.5, 0.25)
0.6040999889373779


In [108]:
def compute_accuracy(model, x, y_true):
    y_pred = np.argmax(model.predict(x), axis = 1)
    N = y_pred.shape[0]
    acc = (y_true == y_pred).sum() / N
    return acc

0.6083

### Function Definitions


In [54]:
def compute_loss(y_true, y_pred):
    # if labels are integers, use sparse categorical crossentropy
    # network's final layer is softmax, so from_logtis=False
    scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
   
    return scce(y_true, y_pred)  

In [47]:
def forward_pass(model, batch_data, y_true):
    y_pred = model(batch_data)
    loss = compute_loss(y_true, y_pred)
    return loss

In [48]:
# Here is a function that will manage the training loop for us:

def train_loop(batch_size, n_training_epochs, model, opt):
    
    @tf.function()
    def train_iteration(data, y_true, model, opt):
        with tf.GradientTape() as tape:
            loss = forward_pass(model, data, y_true)

        trainable_vars = model.trainable_variables

        # Apply the update to the network (one at a time):
        grads = tape.gradient(loss, trainable_vars)

        opt.apply_gradients(zip(grads, trainable_vars))
        return loss

    for i_epoch in range(n_training_epochs):
        print("beginning epoch %d" % i_epoch)
        start = time.time()

        epoch_steps = int(40000/batch_size)
        dataset.shuffle(40000) # Shuffle the whole dataset in memory
        batches = dataset.batch(batch_size=batch_size, drop_remainder=True)
        
        for i_batch, (batch_data, y_true) in enumerate(batches):
            batch_data = tf.reshape(batch_data, [-1, 32, 32, 3])
            loss = train_iteration(batch_data, y_true, model, opt)
        
        end = time.time()
        print("took %1.1f seconds for epoch #%d" % (end-start, i_epoch))
    

# Homework: improve the accuracy of this model

Update this notebook to ensure more accuracy. How high can it be raised? Changes like increasing the number of epochs, altering the learning weight, altering the number of neurons the hidden layer, chnaging the optimizer, etc. could be made directly in the notebook. You can also change the model specification by expanding the network's layer. The current notebook's training accuracy is roughly 58.69%, although it varies randomly.