# Image Classification Using CIFAR-10 dataset

The CIFAR-10 (Canadian Institute For Advanced Research) dataset contains 60,000 32x32 color images. Each image is labeled with one of the following 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. There are 50000 training images and 10000 test images. 

# Table of Contents

This notebook has 5 parts.  You will practice TensorFlow on three different levels of abstraction.

1. Part I, Preparation: load the CIFAR-10 dataset.
2. Part II, Barebone TensorFlow: **Abstraction Level 1**, we will work directly with low-level TensorFlow graphs. 
3. Part III, Keras Model API: **Abstraction Level 2**, we will use `tf.keras.Model` to define arbitrary neural network architecture. 
4. Part IV, Keras Sequential + Functional API: **Abstraction Level 3**, we will use `tf.keras.Sequential` to define a linear feed-forward network very conveniently, and then explore the functional libraries for building unique and uncommon models that require more flexibility.
5. Part V, Tuning: Experiment with different architectures, activation functions, weight initializations, optimizers, hyperparameters, regularizations or other advanced features. Your goal is to get accuracy as high as possible on CIFAR-10 (without using convolutional layers).

We will discuss Keras in more detail later in the notebook.

Here is a table of comparison:

| API           | Flexibility | Convenience |
|---------------|-------------|-------------|
| Barebone      | High        | Low         |
| `tf.keras.Model`     | High        | Medium      |
| `tf.keras.Sequential` | Low         | High        |

# Part I: Preparation

First, we load the CIFAR-10 dataset. The downloading might take a couple minutes the first time you do it, but the files should stay cached after that. 

The `tf.keras.datasets` package in TensorFlow provides prebuilt utility functions for loading many common datasets. For the purposes of this assignment we will write our own code to preprocess the data and iterate through it in minibatches. The `tf.data` package in TensorFlow provides tools for automating this process, but working with this package adds extra complication and is beyond the scope of this notebook. However using `tf.data` can be much more efficient than the simple approach used in this notebook, so you should consider using it for your project.

In [3]:
import os
#import tensorflow as tf
#importing the required project dependencies
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
from tensorflow.python.framework import ops


import numpy as np
import math
import timeit
import matplotlib.pyplot as plt
from datetime import datetime
from keras import regularizers

from keras.preprocessing.image import ImageDataGenerator
from scipy import stats



%matplotlib inline

TensorFlow 2.x selected.


Using TensorFlow backend.


In [4]:
print(tf.__version__)

2.0.0-rc2


In [5]:
cifar10 = tf.keras.datasets.cifar10.load_data()

def load_cifar10(num_training=49000, num_validation=1000, num_test=10000):
    """
    Fetch the CIFAR-10 dataset from the web and perform preprocessing to prepare
    it for the two-layer neural net classifier.
    """
    # Load the raw CIFAR-10 dataset and use appropriate data types and shapes
    (X_train, y_train), (X_test, y_test) = cifar10
    X_train = np.asarray(X_train, dtype=np.float32)
    print("X_trian shape", X_train.shape)
    y_train = np.asarray(y_train, dtype=np.int32).flatten()
    X_test = np.asarray(X_test, dtype=np.float32)
    y_test = np.asarray(y_test, dtype=np.int32).flatten()

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean pixel and divide by std
    mean_pixel = X_train.mean(axis=(0, 1, 2), keepdims=True)
    std_pixel = X_train.std(axis=(0, 1, 2), keepdims=True)
    X_train = (X_train - mean_pixel) / std_pixel
    X_val = (X_val - mean_pixel) / std_pixel
    X_test = (X_test - mean_pixel) / std_pixel

    return X_train, y_train, X_val, y_val, X_test, y_test
  
def load_datagen(x_train, y_train):
  train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
  x = 0;
  for x_batch, y_batch in train_datagen.flow(x_train, y_train, batch_size=32):
      x_new_train =np.concatenate((x_train, x_batch))
      y_new_train = np.concatenate((y_train, y_batch))
      x = x + 1     
      if(x > 200):
        break
  x_new_train = np.asarray(x_new_train, dtype = np.int32)
  y_new_train = np.asarray(y_new_train, dtype = np.int32).flatten()
  return x_new_train, y_new_train

# Invoke the above function to get our data.
NHW = (0, 1, 2)
X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape, y_train.dtype)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
X_trian shape (50000, 32, 32, 3)
Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,) int32
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)


In [0]:
class Dataset(object):
    def __init__(self, X, y, batch_size, shuffle=False):
        """
        Construct a Dataset object to iterate over data X and labels y
        
        Inputs:
        - X: Numpy array of data, of any shape
        - y: Numpy array of labels, of any shape but with y.shape[0] == X.shape[0]
        - batch_size: Integer giving number of elements per minibatch
        - shuffle: (optional) Boolean, whether to shuffle the data on each epoch
        """
        assert X.shape[0] == y.shape[0], 'Got different numbers of data and labels'
        self.X, self.y = X, y
        self.batch_size, self.shuffle = batch_size, shuffle

    def __iter__(self):
        N, B = self.X.shape[0], self.batch_size
        idxs = np.arange(N)
        if self.shuffle:
            np.random.shuffle(idxs)
        return iter((self.X[i:i+B], self.y[i:i+B]) for i in range(0, N, B))


train_dset = Dataset(X_train, y_train, batch_size=64, shuffle=True)
val_dset = Dataset(X_val, y_val, batch_size=64, shuffle=False)
test_dset = Dataset(X_test, y_test, batch_size=64)

In [7]:
# We can iterate through a dataset like this:
for t, (x, y) in enumerate(train_dset):
    print(t, x.shape, y.shape)
    if t > 5: break

0 (64, 32, 32, 3) (64,)
1 (64, 32, 32, 3) (64,)
2 (64, 32, 32, 3) (64,)
3 (64, 32, 32, 3) (64,)
4 (64, 32, 32, 3) (64,)
5 (64, 32, 32, 3) (64,)
6 (64, 32, 32, 3) (64,)


You can optionally **use GPU by setting the flag to True below**. It's not neccessary to use a GPU for this assignment; if you are working on Google Cloud then we recommend that you do not use a GPU, as it will be significantly more expensive.

In [8]:
# Set up some global variables
USE_GPU = False

if USE_GPU:
    device = '/device:GPU:0'
else:
    device = '/cpu:0'

# Constant to control how often we print when training models
print_every = 100

print('Using device: ', device)

Using device:  /cpu:0


# Part II: Barebones TensorFlow
TensorFlow comes with various high-level APIs which make it very convenient to define and train neural networks; we will cover some of these constructs in Part III and Part IV of this notebook. In this section, we will start by building a model with basic TensorFlow constructs to help you better understand what's going on under the hood of the higher-level APIs.

**"Barebones Tensorflow" is important to understanding the building blocks of TensorFlow, but much of it involves concepts from TensorFlow 1.x.** We will be working with legacy modules such as `tf.Variable`.

Therefore, please read and understand the differences between legacy (1.x) TF and the new (2.0) TF.

### Historical background on TensorFlow 1.x

TensorFlow 1.x is primarily a framework for working with **static computational graphs**. Nodes in the computational graph are Tensors which will hold n-dimensional arrays when the graph is run; edges in the graph represent functions that will operate on Tensors when the graph is run to actually perform useful computation.

Before Tensorflow 2.0, we had to configure the graph into two phases. There are plenty of tutorials online that explain this two-step process. The process generally looks like the following for TF 1.x:
1. **Build a computational graph that describes the computation that you want to perform**. This stage doesn't actually perform any computation; it just builds up a symbolic representation of your computation. This stage will typically define one or more `placeholder` objects that represent inputs to the computational graph.
2. **Run the computational graph many times.** Each time the graph is run (e.g. for one gradient descent step) you will specify which parts of the graph you want to compute, and pass a `feed_dict` dictionary that will give concrete values to any `placeholder`s in the graph.

### The new paradigm in Tensorflow 2.0
Now, with Tensorflow 2.0, we can simply adopt a functional form that is more Pythonic and similar in spirit to PyTorch and direct Numpy operation. Instead of the 2-step paradigm with computation graphs, making it (among other things) easier to debug TF code. You can read more details at https://www.tensorflow.org/guide/eager.

The main difference between the TF 1.x and 2.0 approach is that the 2.0 approach doesn't make use of `tf.Session`, `tf.run`, `placeholder`, `feed_dict`. To get more details of what's different between the two version and how to convert between the two, check out the official migration guide: https://www.tensorflow.org/alpha/guide/migration_guide

Later, in the rest of this notebook we'll focus on this new, simpler approach.

### TensorFlow warmup: Flatten Function

We can see this in action by defining a simple `flatten` function that will reshape image data for use in a fully-connected network.

In TensorFlow, data for convolutional feature maps is typically stored in a Tensor of shape N x H x W x C where:

- N is the number of datapoints (minibatch size)
- H is the height of the feature map
- W is the width of the feature map
- C is the number of channels in the feature map

This is the right way to represent the data when using convolutional neural networks (we will explore CNNs in a future assignment). When we use fully connected linear/affine layers to process the image, however, we want each datapoint to be represented by a single vector -- it's no longer useful to segregate the different channels, rows, and columns of the data. So, we use a "flatten" operation to collapse the `H x W x C` values per representation into a single long vector. 

Notice the `tf.reshape` call has the target shape as `(N, -1)`, meaning it will reshape/keep the first dimension to be N, and then infer as necessary what the second dimension is in the output, so we can collapse the remaining dimensions from the input properly.

**NOTE**: TensorFlow and PyTorch differ on the default Tensor layout; TensorFlow uses N x H x W x C but PyTorch uses N x C x H x W.

In [0]:
def flatten(x):
    """    
    Input:
    - TensorFlow Tensor of shape (N, D1, ..., DM)
    
    Output:
    - TensorFlow Tensor of shape (N, D1 * ... * DM)
    """
    N = tf.shape(x)[0]
    return tf.reshape(x, (N, -1))

In [0]:
def test_flatten():
    # Construct concrete values of the input data x using numpy
    x_np = np.arange(24).reshape((2, 3, 4))
    print('x_np:\n', x_np, '\n')
    # Compute a concrete output value.
    x_flat_np = flatten(x_np)
    print('x_flat_np:\n', x_flat_np, '\n')

test_flatten()

x_np:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]] 

x_flat_np:
 tf.Tensor(
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]], shape=(2, 12), dtype=int64) 



### Barebones TensorFlow: Define a Two-Layer Network
We will now implement our first neural network with TensorFlow: a fully-connected ReLU network with two hidden layers and no biases on the CIFAR10 dataset. For now we will use only low-level TensorFlow operators to define the network; later we will see how to use the higher-level abstractions provided by `tf.keras` to simplify the process.

We will define the forward pass of the network in the function `two_layer_fc`; this will accept TensorFlow Tensors for the inputs and weights of the network, and return a TensorFlow Tensor for the scores. 

After defining the network architecture in the `two_layer_fc` function, we will test the implementation by checking the shape of the output.

**It's important that you read and understand this implementation.**

In [0]:
def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully-connected layer -> ReLU -> fully connected layer.
    Note that we only need to define the forward pass here; TensorFlow will take
    care of computing the gradients for us.
    
    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A TensorFlow Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of TensorFlow Tensors giving weights for the
      network, where w1 has shape (D, H) and w2 has shape (H, C).
    
    Returns:
    - scores: A TensorFlow Tensor of shape (N, C) giving classification scores
      for the input data x.
    """
    w1, w2 = params                   # Unpack the parameters
    x = flatten(x)                    # Flatten the input; now x has shape (N, D)
    h = tf.nn.relu(tf.matmul(x, w1))  # Hidden layer: h has shape (N, H)
    scores = tf.matmul(h, w2)         # Compute scores of shape (N, C)
    return scores

In [0]:
def two_layer_fc_test():
    hidden_layer_size = 42

    # Scoping our TF operations under a tf.device context manager 
    # lets us tell TensorFlow where we want these Tensors to be
    # multiplied and/or operated on, e.g. on a CPU or a GPU.
    with tf.device(device):        
        x = tf.zeros((64, 32, 32, 3))
        w1 = tf.zeros((32 * 32 * 3, hidden_layer_size))
        w2 = tf.zeros((hidden_layer_size, 10))
        print(w1.shape)
        print(w2.shape)
        print(x.shape)

        # Call our two_layer_fc function for the forward pass of the network.
        scores = two_layer_fc(x, [w1, w2])

    print(scores.shape)

two_layer_fc_test()

(3072, 42)
(42, 10)
(64, 32, 32, 3)
(64, 10)


In [0]:
def three_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully-connected layer -> ReLU -> fully connected layer.
    Note that we only need to define the forward pass here; TensorFlow will take
    care of computing the gradients for us.
    
    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A TensorFlow Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of TensorFlow Tensors giving weights for the
      network, where w1 has shape (D, H) and w2 has shape (H, C).
    
    Returns:
    - scores: A TensorFlow Tensor of shape (N, C) giving classification scores
      for the input data x.
    """
    w1, w2, w3 = params                   # Unpack the parameters
    x = flatten(x)                    # Flatten the input; now x has shape (N, D)
    h1 = tf.nn.relu(tf.matmul(x, w1))  # Hidden layer: h has shape (N, H)
    h2 = tf.nn.relu(tf.matmul(h1, w2))
    scores = tf.matmul(h2, w3)         # Compute scores of shape (N, C)
    return scores

In [0]:
def three_layer_fc_test():
    hidden_layer_size = 42

    # Scoping our TF operations under a tf.device context manager 
    # lets us tell TensorFlow where we want these Tensors to be
    # multiplied and/or operated on, e.g. on a CPU or a GPU.
    with tf.device(device):        
        x = tf.zeros((64, 32, 32, 3))
        w1 = tf.zeros((32 * 32 * 3, hidden_layer_size))        
        w2 = tf.zeros((hidden_layer_size, hidden_layer_size))
        w3 = tf.zeros((hidden_layer_size, 10))
        
        # Call our two_layer_fc function for the forward pass of the network.
        scores = three_layer_fc(x, [w1, w2, w3])

    print(scores.shape)

three_layer_fc_test()

(64, 10)


In [0]:
def three_layer_convnet(x, params):
    """
    A three-layer convolutional network with the architecture described above.
    
    Inputs:
    - x: A TensorFlow Tensor of shape (N, H, W, 3) giving a minibatch of images
    - params: A list of TensorFlow Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: TensorFlow Tensor of shape (KH1, KW1, 3, channel_1) giving
        weights for the first convolutional layer.
      - conv_b1: TensorFlow Tensor of shape (channel_1,) giving biases for the
        first convolutional layer.
      - conv_w2: TensorFlow Tensor of shape (KH2, KW2, channel_1, channel_2)
        giving weights for the second convolutional layer
      - conv_b2: TensorFlow Tensor of shape (channel_2,) giving biases for the
        second convolutional layer.
      - fc_w: TensorFlow Tensor giving weights for the fully-connected layer.
        Can you figure out what the shape should be?
      - fc_b: TensorFlow Tensor giving biases for the fully-connected layer.
        Can you figure out what the shape should be?
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ############################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.            #
    ############################################################################
    paddings = tf.constant([[0,0], [2,2], [2,2], [0,0]])
    x = tf.pad(x, paddings, 'CONSTANT')
    conv1 = tf.nn.conv2d(x, conv_w1, strides=[1,1,1,1], padding="VALID")+conv_b1

    relu1 = tf.nn.relu(conv1)
    
    paddings = tf.constant([[0,0], [1,1], [1,1], [0,0]])
    conv1 = tf.pad(conv1, paddings, 'CONSTANT')
    conv2 = tf.nn.conv2d(conv1, conv_w2, strides=[1,1,1,1], padding="VALID")+conv_b2
    relu2 = tf.nn.relu(conv2)
    
    relu2 = flatten(relu2)
    scores = tf.matmul(relu2, fc_w) + fc_b
    ############################################################################
    #                              END OF YOUR CODE                            #
    ############################################################################
    return scores

In [0]:
def three_layer_convnet_test():
   # ops.reset_default_graph()   

    with tf.device(device):
        x = tf.zeros((74, 32, 32, 3))
        conv_w1 = tf.zeros((5, 5, 3, 6))
        conv_b1 = tf.zeros((6,))
        conv_w2 = tf.zeros((3, 3, 6, 9))
        conv_b2 = tf.zeros((9,))
        fc_w = tf.zeros((32 * 32 * 9, 10))
        fc_b = tf.zeros((10,))
        params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
        scores = three_layer_convnet(x, params)
        print(scores.shape)

with tf.device('/cpu:0'):
    three_layer_convnet_test()

(74, 10)


### Barebones TensorFlow: Training Step

We now define the `training_step` function performs a single training step. This will take three basic steps:

1. Compute the loss
2. Compute the gradient of the loss with respect to all network weights
3. Make a weight update step using (stochastic) gradient descent.


We need to use a few new TensorFlow functions to do all of this:
- For computing the cross-entropy loss we'll use `tf.nn.sparse_softmax_cross_entropy_with_logits`: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/nn/sparse_softmax_cross_entropy_with_logits

- For averaging the loss across a minibatch of data we'll use `tf.reduce_mean`:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/reduce_mean

- For computing gradients of the loss with respect to the weights we'll use `tf.GradientTape` (useful for Eager execution):  https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape

- We'll mutate the weight values stored in a TensorFlow Tensor using `tf.assign_sub` ("sub" is for subtraction): https://www.tensorflow.org/api_docs/python/tf/assign_sub 


In [0]:
def training_step(model_fn, x, y, params, learning_rate):
    with tf.GradientTape() as tape:
        scores = model_fn(x, params) # Forward pass of the model
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=scores)
        total_loss = tf.reduce_mean(loss)
        grad_params = tape.gradient(total_loss, params)

        # Make a vanilla gradient descent step on all of the model parameters
        # Manually update the weights using assign_sub()
        for w, grad_w in zip(params, grad_params):
            w.assign_sub(learning_rate * grad_w)
                        
        return total_loss

In [0]:
def train_part2(model_fn, init_fn, learning_rate):
    """
    Train a model on CIFAR-10.
    
    Inputs:
    - model_fn: A Python function that performs the forward pass of the model
      using TensorFlow; it should have the following signature:
      scores = model_fn(x, params) where x is a TensorFlow Tensor giving a
      minibatch of image data, params is a list of TensorFlow Tensors holding
      the model weights, and scores is a TensorFlow Tensor of shape (N, C)
      giving scores for all elements of x.
    - init_fn: A Python function that initializes the parameters of the model.
      It should have the signature params = init_fn() where params is a list
      of TensorFlow Tensors holding the (randomly initialized) weights of the
      model.
    - learning_rate: Python float giving the learning rate to use for SGD.
    """
    
    
    params = init_fn()  # Initialize the model parameters            
        
    for t, (x_np, y_np) in enumerate(train_dset):
        # Run the graph on a batch of training data.
        loss = training_step(model_fn, x_np, y_np, params, learning_rate)
        
        # Periodically print the loss and check accuracy on the val set.
        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss))
            check_accuracy(val_dset, x_np, model_fn, params)

In [0]:
def check_accuracy(dset, x, model_fn, params):
    """
    Check accuracy on a classification model, e.g. for validation.
    
    Inputs:
    - dset: A Dataset object against which to check accuracy
    - x: A TensorFlow placeholder Tensor where input images should be fed
    - model_fn: the Model we will be calling to make predictions on x
    - params: parameters for the model_fn to work with
      
    Returns: Nothing, but prints the accuracy of the model
    """
    num_correct, num_samples = 0, 0
    for x_batch, y_batch in dset:
        scores_np = model_fn(x_batch, params).numpy()
        y_pred = scores_np.argmax(axis=1)
        num_samples += x_batch.shape[0]
        num_correct += (y_pred == y_batch).sum()
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))

### Barebones TensorFlow: Initialization
We'll use the following utility method to initialize the weight matrices for our models using Kaiming's normalization method.

[1] He et al, *Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
*, ICCV 2015, https://arxiv.org/abs/1502.01852

In [0]:
def create_matrix_with_kaiming_normal(shape):
    if len(shape) == 2:
        fan_in, fan_out = shape[0], shape[1]
    elif len(shape) == 4:
        fan_in, fan_out = np.prod(shape[:3]), shape[3]
    return tf.keras.backend.random_normal(shape) * np.sqrt(2.0 / fan_in)

### Barebones TensorFlow: Train a Two-Layer Network
We are finally ready to use all of the pieces defined above to train a two-layer fully-connected network on CIFAR-10.

We just need to define a function to initialize the weights of the model, and call `train_part2`.

Defining the weights of the network introduces another important piece of TensorFlow API: `tf.Variable`. A TensorFlow Variable is a Tensor whose value is stored in the graph and persists across runs of the computational graph; however unlike constants defined with `tf.zeros` or `tf.random_normal`, the values of a Variable can be mutated as the graph runs; these mutations will persist across graph runs. Learnable parameters of the network are usually stored in Variables.

Without any hyperparameter tuning, you should achieve validation accuracies above 40% after one epoch of training.

In [0]:
def two_layer_fc_init():
    """
    Initialize the weights of a two-layer network, for use with the
    two_layer_network function defined above. 
    You can use the `create_matrix_with_kaiming_normal` helper!
    
    Inputs: None
    
    Returns: A list of:
    - w1: TensorFlow tf.Variable giving the weights for the first layer
    - w2: TensorFlow tf.Variable giving the weights for the second layer
    """
    
    hidden_layer_size = 4000
    w1 = tf.Variable(create_matrix_with_kaiming_normal((3 * 32 * 32, 4000)))
    w2 = tf.Variable(create_matrix_with_kaiming_normal((4000, 10)))
    return [w1, w2]

learning_rate = 1e-2
train_part2(two_layer_fc, two_layer_fc_init, learning_rate)

Iteration 0, loss = 2.7569
Got 126 / 1000 correct (12.60%)
Iteration 100, loss = 1.8546
Got 392 / 1000 correct (39.20%)


KeyboardInterrupt: ignored

In [0]:
def three_layer_fc_init():
    """
    Initialize the weights of a three-layer network, for use with the
    two_layer_network function defined above. 
    You can use the `create_matrix_with_kaiming_normal` helper!
    
    Inputs: None
    
    Returns: A list of:
    - w1: TensorFlow tf.Variable giving the weights for the first layer
    - w2: TensorFlow tf.Variable giving the weights for the second layer
    """
    
    hidden_layer_size = 4000
    w1 = tf.Variable(create_matrix_with_kaiming_normal((3 * 32 * 32, 4000)))
    w2 = tf.Variable(create_matrix_with_kaiming_normal((4000, 4000)))
    w3 = tf.Variable(create_matrix_with_kaiming_normal((4000, 10)))
    return [w1, w2, w3]

learning_rate = 1e-2
train_part2(three_layer_fc, three_layer_fc_init, learning_rate)

Iteration 0, loss = 2.9402
Got 140 / 1000 correct (14.00%)
Iteration 100, loss = 1.8079
Got 420 / 1000 correct (42.00%)
Iteration 200, loss = 1.4307
Got 425 / 1000 correct (42.50%)
Iteration 300, loss = 1.7175
Got 402 / 1000 correct (40.20%)
Iteration 400, loss = 1.6054
Got 453 / 1000 correct (45.30%)
Iteration 500, loss = 1.7504
Got 458 / 1000 correct (45.80%)
Iteration 600, loss = 1.7125
Got 472 / 1000 correct (47.20%)
Iteration 700, loss = 1.6645
Got 443 / 1000 correct (44.30%)


In [0]:
def three_layer_conv_init():
    """
    Initialize the weights of a three-layer network, for use with the
    two_layer_network function defined above. 
    You can use the `create_matrix_with_kaiming_normal` helper!
    
    Inputs: None
    
    Returns: A list of:
    - w1: TensorFlow tf.Variable giving the weights for the first layer
    - w2: TensorFlow tf.Variable giving the weights for the second layer
    """
    conv_w1 = tf.Variable(create_matrix_with_kaiming_normal([5, 5, 3, 32]))
    conv_b1 = tf.Variable(np.zeros([32]), dtype=tf.float32)
    conv_w2 = tf.Variable(create_matrix_with_kaiming_normal([3, 3, 32, 16]))
    conv_b2 = tf.Variable(np.zeros([16]), dtype=tf.float32)
    fc_w = tf.Variable(create_matrix_with_kaiming_normal([32*32*16,10]))
    fc_b = tf.Variable(np.zeros([10]), dtype=tf.float32)
    params = (conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b)
    return params

learning_rate = 3e-3
train_part2(three_layer_convnet, three_layer_conv_init, learning_rate)

Iteration 0, loss = 3.5214
Got 119 / 1000 correct (11.90%)


KeyboardInterrupt: ignored

# Part III: Keras Model Subclassing API

Implementing a neural network using the low-level TensorFlow API is a good way to understand how TensorFlow works, but it's a little inconvenient - we had to manually keep track of all Tensors holding learnable parameters. This was fine for a small network, but could quickly become unweildy for a large complex model.

Fortunately TensorFlow 2.0 provides higher-level APIs such as `tf.keras` which make it easy to build models out of modular, object-oriented layers. Further, TensorFlow 2.0 uses eager execution that evaluates operations immediately, without explicitly constructing any computational graphs. This makes it easy to write and debug models, and reduces the boilerplate code.

In this part of the notebook we will define neural network models using the `tf.keras.Model` API. To implement your own model, you need to do the following:

1. Define a new class which subclasses `tf.keras.Model`. Give your class an intuitive name that describes it, like `TwoLayerFC` or `ThreeLayerConvNet`.
2. In the initializer `__init__()` for your new class, define all the layers you need as class attributes. The `tf.keras.layers` package provides many common neural-network layers, like `tf.keras.layers.Dense` for fully-connected layers and `tf.keras.layers.Conv2D` for convolutional layers. Under the hood, these layers will construct `Variable` Tensors for any learnable parameters. **Warning**: Don't forget to call `super(YourModelName, self).__init__()` as the first line in your initializer!
3. Implement the `call()` method for your class; this implements the forward pass of your model, and defines the *connectivity* of your network. Layers defined in `__init__()` implement `__call__()` so they can be used as function objects that transform input Tensors into output Tensors. Don't define any new layers in `call()`; any layers you want to use in the forward pass should be defined in `__init__()`.

After you define your `tf.keras.Model` subclass, you can instantiate it and use it like the model functions from Part II.

### Keras Model Subclassing API: Two-Layer Network

Here is a concrete example of using the `tf.keras.Model` API to define a two-layer network. There are a few new bits of API to be aware of here:

We use an `Initializer` object to set up the initial values of the learnable parameters of the layers; in particular `tf.initializers.VarianceScaling` gives behavior similar to the Kaiming initialization method we used in Part II. You can read more about it here: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/initializers/VarianceScaling

We construct `tf.keras.layers.Dense` objects to represent the two fully-connected layers of the model. In addition to multiplying their input by a weight matrix and adding a bias vector, these layer can also apply a nonlinearity for you. For the first layer we specify a ReLU activation function by passing `activation='relu'` to the constructor; the second layer uses softmax activation function. Finally, we use `tf.keras.layers.Flatten` to flatten the output from the previous fully-connected layer.

In [0]:
class TwoLayerFC(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(TwoLayerFC, self).__init__()        
        initializer = tf.initializers.VarianceScaling(scale=2.0)
        self.fc1 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
    
    def call(self, x, training=False):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x


def test_TwoLayerFC():
    """ A small unit test to exercise the TwoLayerFC model above. """
    input_size, hidden_size, num_classes = 50, 42, 10
    x = tf.zeros((64, input_size))
    model = TwoLayerFC(hidden_size, num_classes)
    with tf.device(device):
        scores = model(x)
        print(scores.shape)
        
test_TwoLayerFC()

(64, 10)


In [0]:
class ThreeLayerFC(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(ThreeLayerFC, self).__init__()        
        initializer = tf.initializers.VarianceScaling(scale=2.0)
        self.fc1 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(num_classes, activation='sigmoid',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
    
    def call(self, x, training=False):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

def test_ThreeLayerFC():
    """ A small unit test to exercise the TwoLayerFC model above. """
    input_size, hidden_size, num_classes = 50, 42, 10
    x = tf.zeros((64, input_size))
    model = ThreeLayerFC(hidden_size, num_classes)
    with tf.device(device):
        scores = model(x)
        print(scores)
        
test_ThreeLayerFC()  

tf.Tensor(
[[0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
  0.5014939  0.48024526 0.5195698  0.4236802 ]
 [0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
  0.5014939  0.48024526 0.5195698  0.4236802 ]
 [0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
  0.5014939  0.48024526 0.5195698  0.4236802 ]
 [0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
  0.5014939  0.48024526 0.5195698  0.4236802 ]
 [0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
  0.5014939  0.48024526 0.5195698  0.4236802 ]
 [0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
  0.5014939  0.48024526 0.5195698  0.4236802 ]
 [0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
  0.5014939  0.48024526 0.5195698  0.4236802 ]
 [0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
  0.5014939  0.48024526 0.5195698  0.4236802 ]
 [0.49180257 0.5180142  0.48530337 0.5062271  0.53295815 0.54027903
 

In [0]:
class ThreeLayerConv(tf.keras.Model):
    def __init__(self, channel_1, channel_2, num_classes):
        super(ThreeLayerConv, self).__init__()
        ########################################################################
        # TODO: Implement the __init__ method for a three-layer ConvNet. You   #
        # should instantiate layer objects to be used in the forward pass.     #
        ########################################################################   
        initializer = tf.initializers.VarianceScaling(scale=2.0)       
        self.conv1 = tf.keras.layers.Conv2D(channel_1, [5,5], [1,1], padding='valid',
                                      kernel_initializer=initializer,
                                      activation=tf.nn.relu)
        self.conv2 = tf.keras.layers.Conv2D(channel_2, [3,3], [1,1], padding='valid',
                                      kernel_initializer=initializer,
                                      activation=tf.nn.relu)
        self.fc = tf.keras.layers.Dense(num_classes, kernel_initializer=initializer)
        ########################################################################
        #                           END OF YOUR CODE                           #
        ########################################################################
        
    def call(self, x, training=None):
        scores = None
        ########################################################################
        # TODO: Implement the forward pass for a three-layer ConvNet. You      #
        # should use the layer objects defined in the __init__ method.         #
        ########################################################################
        padding = tf.constant([[0,0],[2,2],[2,2],[0,0]])
        x = tf.pad(x, padding, 'CONSTANT')
        x = self.conv1(x)
        padding = tf.constant([[0,0],[1,1],[1,1],[0,0]])
        x = tf.pad(x, padding, 'CONSTANT')
        x = self.conv2(x)
        x = tf.keras.layers.Flatten()(x)
        scores = self.fc(x)
        ########################################################################
        #                           END OF YOUR CODE                           #
        ########################################################################        
        return scores
      
def test_ThreeLayerConv():    
    channel_1, channel_2, num_classes = 12, 8, 10
    model = ThreeLayerConv(channel_1, channel_2, num_classes)
    with tf.device(device):
        x = tf.zeros((64, 3, 32, 32))
        scores = model(x) 
        print(scores)

test_ThreeLayerConv()

tf.Tensor(
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

### Keras Model Subclassing API: Eager Training

While keras models have a builtin training loop (using the `model.fit`), sometimes you need more customization. Here's an example, of a training loop implemented with eager execution.

In particular, notice `tf.GradientTape`. Automatic differentiation is used in the backend for implementing backpropagation in frameworks like TensorFlow. During eager execution, `tf.GradientTape` is used to trace operations for computing gradients later. A particular `tf.GradientTape` can only compute one gradient; subsequent calls to tape will throw a runtime error. 

TensorFlow 2.0 ships with easy-to-use built-in metrics under `tf.keras.metrics` module. Each metric is an object, and we can use `update_state()` to add observations and `reset_state()` to clear all observations. We can get the current result of a metric by calling `result()` on the metric object.

In [0]:
import datetime
def train_part34(model_init_fn, optimizer_init_fn, num_epochs=1, is_training=False):
    """
    Simple training loop for use with models defined using tf.keras. It trains
    a model for one epoch on the CIFAR-10 training set and periodically checks
    accuracy on the CIFAR-10 validation set.
    
    Inputs:
    - model_init_fn: A function that takes no parameters; when called it
      constructs the model we want to train: model = model_init_fn()
    - optimizer_init_fn: A function which takes no parameters; when called it
      constructs the Optimizer object we will use to optimize the model:
      optimizer = optimizer_init_fn()
    - num_epochs: The number of epochs to train for
    
    Returns: Nothing, but prints progress during trainingn
    """    
    with tf.device(device):
        training_start_time = datetime.datetime.now()

        # Compute the loss like we did in Part II
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
        
        model = model_init_fn()
        optimizer = optimizer_init_fn()
        
        train_loss = tf.keras.metrics.Mean(name='train_loss')
        train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
    
        val_loss = tf.keras.metrics.Mean(name='val_loss')
        val_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='val_accuracy')
        
        t = 0
        for epoch in range(num_epochs):
            
            # Reset the metrics - https://www.tensorflow.org/alpha/guide/migration_guide#new-style_metrics
            train_loss.reset_states()
            train_accuracy.reset_states()
            
            for x_np, y_np in train_dset:
                with tf.GradientTape() as tape:
                    
                    # Use the model function to build the forward pass.
                    scores = model(x_np, training=is_training)
                    loss = loss_fn(y_np, scores)
      
                    gradients = tape.gradient(loss, model.trainable_variables)
                    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
                    
                    # Update the metrics
                    train_loss.update_state(loss)
                    train_accuracy.update_state(y_np, scores)
                    
                    if t % print_every == 0:
                        val_loss.reset_states()
                        val_accuracy.reset_states()
                        for test_x, test_y in val_dset:
                            # During validation at end of epoch, training set to False
                            prediction = model(test_x, training=False)
                            t_loss = loss_fn(test_y, prediction)

                            val_loss.update_state(t_loss)
                            val_accuracy.update_state(test_y, prediction)
                        
                        template = 'Iteration {}, Epoch {}, Loss: {}, Accuracy: {}, Val Loss: {}, Val Accuracy: {}'
                        print (template.format(t, epoch+1,
                                             train_loss.result(),
                                             train_accuracy.result()*100,
                                             val_loss.result(),
                                             val_accuracy.result()*100))
                    t += 1
    training_end_time = datetime.datetime.now()

    time_to_train = training_end_time - training_start_time
  
    dateTimeDifferenceInHours = time_to_train.total_seconds()
    print('Training took {} seconds'.format(dateTimeDifferenceInHours))                   

### Keras Model Subclassing API: Train a Two-Layer Network
We can now use the tools defined above to train a two-layer network on CIFAR-10. We define the `model_init_fn` and `optimizer_init_fn` that construct the model and optimizer respectively when called. Here we want to train the model using stochastic gradient descent with no momentum, so we construct a `tf.keras.optimizers.SGD` function; you can [read about it here](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/optimizers/SGD).

Without any hyperparameter tuning, you should achieve validation accuracies above 40% after one epoch of training.

In [14]:
hidden_size, num_classes = 4000, 10
learning_rate = 1e-2

def model_init_fn():
    return TwoLayerFC(hidden_size, num_classes)

def optimizer_init_fn():
    return tf.keras.optimizers.SGD(learning_rate=learning_rate)

train_part34(model_init_fn, optimizer_init_fn)

NameError: ignored

In [0]:
hidden_size, num_classes = 4000, 10
learning_rate = 1e-2

def model_init_fn():
    return ThreeLayerFC(hidden_size, num_classes)

def optimizer_init_fn():
    return tf.keras.optimizers.SGD(learning_rate=learning_rate)

train_part34(model_init_fn, optimizer_init_fn)

Iteration 0, Epoch 1, Loss: 2.28182315826416, Accuracy: 14.0625, Val Loss: 2.3065717220306396, Val Accuracy: 10.800000190734863
Iteration 100, Epoch 1, Loss: 2.268281936645508, Accuracy: 15.362005233764648, Val Loss: 2.251209020614624, Val Accuracy: 17.5
Iteration 200, Epoch 1, Loss: 2.2586984634399414, Accuracy: 16.81436538696289, Val Loss: 2.2399985790252686, Val Accuracy: 19.400001525878906
Iteration 300, Epoch 1, Loss: 2.2524163722991943, Accuracy: 17.67026710510254, Val Loss: 2.231448173522949, Val Accuracy: 21.19999885559082
Iteration 400, Epoch 1, Loss: 2.2464656829833984, Accuracy: 18.566864013671875, Val Loss: 2.2268855571746826, Val Accuracy: 22.799999237060547
Iteration 500, Epoch 1, Loss: 2.242558002471924, Accuracy: 19.158557891845703, Val Loss: 2.221726894378662, Val Accuracy: 24.200000762939453
Iteration 600, Epoch 1, Loss: 2.2391881942749023, Accuracy: 19.724937438964844, Val Loss: 2.216858148574829, Val Accuracy: 24.69999885559082
Iteration 700, Epoch 1, Loss: 2.235640

In [0]:
learning_rate = 3e-3
channel_1, channel_2, num_classes = 32, 16, 10


def model_init_fn():
    model = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    return ThreeLayerFC(channel_1, channel_2, num_classes)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    #return model(inputs)

def optimizer_init_fn():
    optimizer = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    return optimizer

train_part34(model_init_fn, optimizer_init_fn)

Iteration 0, Epoch 1, Loss: 10.008524894714355, Accuracy: 10.9375, Val Loss: 8.58033561706543, Val Accuracy: 7.90000057220459
Iteration 100, Epoch 1, Loss: 9.202701568603516, Accuracy: 10.210396766662598, Val Loss: 10.24263858795166, Val Accuracy: 7.90000057220459
Iteration 200, Epoch 1, Loss: 9.125349044799805, Accuracy: 10.1445894241333, Val Loss: 8.654484748840332, Val Accuracy: 7.90000057220459
Iteration 300, Epoch 1, Loss: 8.839177131652832, Accuracy: 10.210755348205566, Val Loss: 8.654186248779297, Val Accuracy: 7.90000057220459
Iteration 400, Epoch 1, Loss: 8.813014030456543, Accuracy: 10.080267906188965, Val Loss: 9.82066822052002, Val Accuracy: 7.90000057220459
Iteration 500, Epoch 1, Loss: 8.96052360534668, Accuracy: 10.079840660095215, Val Loss: 9.845467567443848, Val Accuracy: 7.90000057220459
Iteration 600, Epoch 1, Loss: 9.072073936462402, Accuracy: 10.134151458740234, Val Loss: 9.845132827758789, Val Accuracy: 7.90000057220459
Iteration 700, Epoch 1, Loss: 9.158257484436

# Part IV: Keras Sequential API
In Part III we introduced the `tf.keras.Model` API, which allows you to define models with any number of learnable layers and with arbitrary connectivity between layers.

However for many models you don't need such flexibility - a lot of models can be expressed as a sequential stack of layers, with the output of each layer fed to the next layer as input. If your model fits this pattern, then there is an even easier way to define your model: using `tf.keras.Sequential`. You don't need to write any custom classes; you simply call the `tf.keras.Sequential` constructor with a list containing a sequence of layer objects.

One complication with `tf.keras.Sequential` is that you must define the shape of the input to the model by passing a value to the `input_shape` of the first layer in your model.

### Keras Sequential API: Two-Layer Network
In this subsection, we will rewrite the two-layer fully-connected network using `tf.keras.Sequential`, and train it using the training loop defined above.

Without any hyperparameter tuning, you should see validation accuracies above 40% after training for one epoch.

In [0]:
learning_rate = 1e-2

def model_init_fn():
    input_shape = (32, 32, 3)
    hidden_layer_size, num_classes = 4000, 10
    initializer = tf.initializers.VarianceScaling(scale=2.0)
    layers = [
        tf.keras.layers.Flatten(input_shape=input_shape),
        tf.keras.layers.Dense(hidden_layer_size, activation='relu',
                              kernel_initializer=initializer),
        tf.keras.layers.BatchNormalization(axis=1),
        tf.keras.layers.Dense(num_classes, activation='softmax', 
                              kernel_initializer=initializer),
    ]
    model = tf.keras.Sequential(layers)
    return model

def optimizer_init_fn():
    return tf.keras.optimizers.SGD(learning_rate=learning_rate) 

train_part34(model_init_fn, optimizer_init_fn)

Iteration 0, Epoch 1, Loss: 3.283254861831665, Accuracy: 4.6875, Val Loss: 3.265125036239624, Val Accuracy: 10.699999809265137
Iteration 100, Epoch 1, Loss: 2.2577285766601562, Accuracy: 28.449874877929688, Val Loss: 1.907272219657898, Val Accuracy: 39.0
Iteration 200, Epoch 1, Loss: 2.085240125656128, Accuracy: 32.45491409301758, Val Loss: 1.8180067539215088, Val Accuracy: 39.599998474121094
Iteration 300, Epoch 1, Loss: 2.0084543228149414, Accuracy: 34.13621139526367, Val Loss: 1.8622419834136963, Val Accuracy: 37.5
Iteration 400, Epoch 1, Loss: 1.940671682357788, Accuracy: 35.89853286743164, Val Loss: 1.7754285335540771, Val Accuracy: 41.60000228881836
Iteration 500, Epoch 1, Loss: 1.8946212530136108, Accuracy: 36.938621520996094, Val Loss: 1.6755846738815308, Val Accuracy: 42.89999771118164
Iteration 600, Epoch 1, Loss: 1.8617912530899048, Accuracy: 37.92377471923828, Val Loss: 1.7067407369613647, Val Accuracy: 42.599998474121094
Iteration 700, Epoch 1, Loss: 1.8350392580032349, Ac

In [0]:
learning_rate = 1e-2

def three_layer_model_init_fn():
    input_shape = (32, 32, 3)
    hidden_layer_size, num_classes = 4000, 10
    initializer = tf.initializers.VarianceScaling(scale=2.0)
    layers = [
        tf.keras.layers.Flatten(input_shape=input_shape),
        tf.keras.layers.Dense(hidden_layer_size, activation='relu',
                              kernel_initializer=initializer),
        tf.keras.layers.BatchNormalization(axis=1),
        tf.keras.layers.Dense(num_classes, activation='softmax', 
                              kernel_initializer=initializer),
        tf.keras.layers.BatchNormalization(axis=1),        
        tf.keras.layers.Dense(num_classes, activation='sigmoid', 
                              kernel_initializer=initializer),
    ]
    model = tf.keras.Sequential(layers)
    return model

def optimizer_init_fn():
    return tf.keras.optimizers.SGD(learning_rate=learning_rate) 

train_part34(three_layer_model_init_fn, optimizer_init_fn)

Iteration 0, Epoch 1, Loss: 2.290614128112793, Accuracy: 12.5, Val Loss: 2.305014133453369, Val Accuracy: 9.5
Iteration 100, Epoch 1, Loss: 2.274724006652832, Accuracy: 11.215965270996094, Val Loss: 2.265923023223877, Val Accuracy: 13.40000057220459
Iteration 200, Epoch 1, Loss: 2.2671642303466797, Accuracy: 12.041356086730957, Val Loss: 2.253378391265869, Val Accuracy: 15.399999618530273
Iteration 300, Epoch 1, Loss: 2.2602460384368896, Accuracy: 13.102158546447754, Val Loss: 2.2413699626922607, Val Accuracy: 18.700000762939453
Iteration 400, Epoch 1, Loss: 2.2535767555236816, Accuracy: 14.685940742492676, Val Loss: 2.2352731227874756, Val Accuracy: 21.799999237060547
Iteration 500, Epoch 1, Loss: 2.248988628387451, Accuracy: 16.136476516723633, Val Loss: 2.2290380001068115, Val Accuracy: 22.30000114440918
Iteration 600, Epoch 1, Loss: 2.244640350341797, Accuracy: 17.22389793395996, Val Loss: 2.225301504135132, Val Accuracy: 22.799999237060547
Iteration 700, Epoch 1, Loss: 2.241050720

In [0]:
def model_init_fn():
    model = None
    ############################################################################
    # TODO: Construct a three-layer ConvNet using tf.keras.Sequential.         #
    ############################################################################
    input_shape = (32,32,3)
    channel_1, channel_2, num_classes = 32, 16, 10
    initializer = tf.initializers.VarianceScaling(scale=2.0)   
    layers = [
        tf.keras.layers.InputLayer(input_shape=input_shape),
        tf.keras.layers.Conv2D(channel_1, [5,5], [1,1], padding='same',
                               kernel_initializer=initializer,
                               activation=tf.nn.relu),
        tf.keras.layers.Conv2D(channel_2, [3,3], [1,1], padding='same',
                               kernel_initializer=initializer,
                               activation=tf.nn.relu),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, kernel_initializer=initializer)
    ]
    return tf.keras.Sequential(layers)
    ############################################################################
    #                            END OF YOUR CODE                              #
    ############################################################################
   # return model(inputs)

learning_rate = 5e-4
def optimizer_init_fn():
    optimizer = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    #optimizer = tf.kerastrain.MomentumOptimizer(learning_rate, momentum=0.9, 
     #                                      use_nesterov=True)
    optimizer =  tf.keras.optimizers.SGD(learning_rate=learning_rate)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    return optimizer

train_part34(model_init_fn, optimizer_init_fn)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Iteration 0, Epoch 1, Loss: 9.314998626708984, Accuracy: 6.25, Val Loss: 6.736607551574707, Val Accuracy: 11.200000762939453


KeyboardInterrupt: ignored

### Abstracting Away the Training Loop
In the previous examples, we used a customised training loop to train models (e.g. `train_part34`). Writing your own training loop is only required if you need more flexibility and control during training your model. Alternately, you can also use  built-in APIs like `tf.keras.Model.fit()` and `tf.keras.Model.evaluate` to train and evaluate a model. Also remember to configure your model for training by calling `tf.keras.Model.compile.

Without any hyperparameter tuning, you should see validation and test accuracies above 42% after training for one epoch.

In [0]:
model = model_init_fn()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
              loss='sparse_categorical_crossentropy',
              metrics=[tf.keras.metrics.sparse_categorical_accuracy])
model.fit(X_train, y_train, batch_size=64, epochs=1, validation_data=(X_val, y_val))
model.evaluate(X_test, y_test)

##  Part IV: Functional API
### Demonstration with a Two-Layer Network 

In the previous section, we saw how we can use `tf.keras.Sequential` to stack layers to quickly build simple models. But this comes at the cost of losing flexibility.

Often we will have to write complex models that have non-sequential data flows: a layer can have **multiple inputs and/or outputs**, such as stacking the output of 2 previous layers together to feed as input to a third! (Some examples are residual connections and dense blocks.)

In such cases, we can use Keras functional API to write models with complex topologies such as:

 1. Multi-input models
 2. Multi-output models
 3. Models with shared layers (the same layer called several times)
 4. Models with non-sequential data flows (e.g. residual connections)

Writing a model with Functional API requires us to create a `tf.keras.Model` instance and explicitly write input tensors and output tensors for this model. 

In [0]:
def two_layer_fc_functional(input_shape, hidden_size, num_classes):  
    initializer = tf.initializers.VarianceScaling(scale=2.0)
    inputs = tf.keras.Input(shape=input_shape)
    flattened_inputs = tf.keras.layers.Flatten()(inputs)
    fc1_output = tf.keras.layers.Dense(hidden_size, activation='relu',
                                 kernel_initializer=initializer)(flattened_inputs)
    scores = tf.keras.layers.Dense(num_classes, activation='softmax',
                             kernel_initializer=initializer)(fc1_output)

    # Instantiate the model given inputs and outputs.
    model = tf.keras.Model(inputs=inputs, outputs=scores)
    return model

def test_two_layer_fc_functional():
    """ A small unit test to exercise the TwoLayerFC model above. """
    input_size, hidden_size, num_classes = 50, 42, 10
    input_shape = (50,)
    
    x = tf.zeros((64, input_size))
    model = two_layer_fc_functional(input_shape, hidden_size, num_classes)
    
    with tf.device(device):
        scores = model(x)
        print(scores.shape)
        
test_two_layer_fc_functional()

(64, 10)


In [0]:
def three_layer_fc_functional(input_shape, hidden_size, num_classes):  
    initializer = tf.initializers.VarianceScaling(scale=2.0)
    inputs = tf.keras.Input(shape=input_shape)
    flattened_inputs = tf.keras.layers.Flatten()(inputs)
    fc1_output = tf.keras.layers.Dense(hidden_size, activation='relu',
                                 kernel_initializer=initializer)(flattened_inputs)
    fc2_output = tf.keras.layers.Dense(num_classes, activation='softmax',
                             kernel_initializer=initializer)(fc1_output)
    scores = tf.keras.layers.Dense(num_classes, activation='sigmoid',
                             kernel_initializer=initializer)(fc2_output)

    # Instantiate the model given inputs and outputs.
    model = tf.keras.Model(inputs=inputs, outputs=scores)
    return model

def test_three_layer_fc_functional():
    """ A small unit test to exercise the TwoLayerFC model above. """
    input_size, hidden_size, num_classes = 50, 42, 10
    input_shape = (50,)
    
    x = tf.zeros((64, input_size))
    model = three_layer_fc_functional(input_shape, hidden_size, num_classes)
    
    with tf.device(device):
        scores = model(x)
        print(scores.shape)
        
test_three_layer_fc_functional()

(64, 10)


### Keras Functional API: Train a Two-Layer Network
You can now train this two-layer network constructed using the functional API.

Without any hyperparameter tuning, but you should see validation accuracies above 40% after training for one epoch.

In [0]:
input_shape = (32, 32, 3)
hidden_size, num_classes = 4000, 10
learning_rate = 1e-2

def model_init_fn():
    return two_layer_fc_functional(input_shape, hidden_size, num_classes)

def optimizer_init_fn():
    return tf.keras.optimizers.SGD(learning_rate=learning_rate)

train_part34(model_init_fn, optimizer_init_fn)

Iteration 0, Epoch 1, Loss: 3.084482431411743, Accuracy: 6.25, Val Loss: 2.7636258602142334, Val Accuracy: 14.0
Iteration 700, Epoch 1, Loss: 1.8224191665649414, Accuracy: 38.69026184082031, Val Loss: 1.6211074590682983, Val Accuracy: 43.599998474121094


In [0]:
input_shape = (32, 32, 3)
hidden_size, num_classes = 4000, 10
learning_rate = 1e-2

def three_layer_model_init_fn():
    return three_layer_fc_functional(input_shape, hidden_size, num_classes)

def optimizer_init_fn():
    return tf.keras.optimizers.SGD(learning_rate=learning_rate)

train_part34(three_layer_model_init_fn, optimizer_init_fn)

Iteration 0, Epoch 1, Loss: 2.308790922164917, Accuracy: 9.375, Val Loss: 2.3053598403930664, Val Accuracy: 10.699999809265137
Iteration 100, Epoch 1, Loss: 2.2764015197753906, Accuracy: 12.12871265411377, Val Loss: 2.2551395893096924, Val Accuracy: 16.599998474121094
Iteration 200, Epoch 1, Loss: 2.260119676589966, Accuracy: 14.692163467407227, Val Loss: 2.236701011657715, Val Accuracy: 18.399999618530273
Iteration 300, Epoch 1, Loss: 2.250509023666382, Accuracy: 15.993563652038574, Val Loss: 2.2268240451812744, Val Accuracy: 20.0
Iteration 400, Epoch 1, Loss: 2.2429234981536865, Accuracy: 17.101776123046875, Val Loss: 2.2217795848846436, Val Accuracy: 21.0
Iteration 500, Epoch 1, Loss: 2.2383415699005127, Accuracy: 17.521207809448242, Val Loss: 2.2170510292053223, Val Accuracy: 21.5
Iteration 600, Epoch 1, Loss: 2.2346527576446533, Accuracy: 17.95185089111328, Val Loss: 2.211547374725342, Val Accuracy: 21.799999237060547
Iteration 700, Epoch 1, Loss: 2.231525182723999, Accuracy: 18.2

# Part V: Tuning
In this section, you are asked to experiment with different dense/fully connnected architectures, activation functions, weight initializations, hyperparameters, optimizers, and regularization approaches to train models on the CIFAR-10 dataset. You can use the built-in train function, the `train_part34` function from above, or implement your own training loop.

Describe what you did at the end of the notebook.

### Things to experiment with:
- **Network architectures**: The network above has two layers of trainable parameters. Can you do better with a deeper network? Or maybe with a wider network? Try five different architectures and observe the performance on the validation data. Use the architectures in combinations with other hyperparameters, as outlines below. Discuss your findings.
- **Activation functions**: In your networks, use five different activation functions, such as ReLU, leaky ReLU, parametric ReLU, ELU, MaxOut, or tanh to gain practical insights into their ability to improve accuracy. 
- **Weight initialization**: Corresponding to your activation functions, use different weight initialization schemes. Discuss your findings. What happens if you use the zero_weight initialization? 
- **Batch normalization**: Try adding batch normalization. Do your networks train faster? Does the accuracy improve?
- **Optimizers**: Use different optimizers, including SGD, SGD with momentum, RMSprop and Adam. Use the optimizers with and without batch normalization to observe what optimizers benefit more from batch normalization, or different weight initializations schemes and what optimizers are more robust to initialization/normalization. 
- **Regularization**: Compare L2 weight regularization, with dropout, batch normalization, and data augmentation. Discuss your findings.  
- **Model Ensemble**: Construct a model ensemble using some of your best hyperparameters as identified before, and compare the accuracy of the model assemble with the accuracy of your best individual model (based on the validation dataset). 


### NOTE: Batch Normalization / Dropout
When you are using Batch Normalization and Dropout, remember to pass `is_training=True` if you use the `train_part34()` function. BatchNorm and Dropout layers have different behaviors at training and inference time. `training` is a specific keyword argument reserved for this purpose in any `tf.keras.Model`'s `call()` function. Read more about this here : https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/BatchNormalization#methods
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/Dropout#methods

### Tips for training
For each network architecture that you try, you should tune the learning rate and other hyperparameters. When doing this there are a couple important things to keep in mind: 

- If the parameters are working well, you should see improvement within a few hundred iterations
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and save your test set for evaluating your architecture on the best parameters as selected by the validation set.



In [0]:
# Fully connected 3 layer network#1 starts

In [17]:
 """ Custom network with three fully connected layers. 
    Activation Functions Used: Relu, Softmax, Sigmoid
    Optimizers Used: SGD
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: No
    Regularization Usage: No
"""
class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(num_classes, activation='sigmoid',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 2.3004648685455322, Accuracy: 9.375, Val Loss: 2.3054885864257812, Val Accuracy: 10.0
Iteration 700, Epoch 1, Loss: 2.2931909561157227, Accuracy: 12.73849868774414, Val Loss: 2.28623104095459, Val Accuracy: 12.800000190734863
Iteration 1400, Epoch 2, Loss: 2.278656244277954, Accuracy: 14.62352466583252, Val Loss: 2.2751588821411133, Val Accuracy: 12.600000381469727
Iteration 2100, Epoch 3, Loss: 2.26861834526062, Accuracy: 14.927504539489746, Val Loss: 2.2670528888702393, Val Accuracy: 13.40000057220459
Iteration 2800, Epoch 4, Loss: 2.2591516971588135, Accuracy: 15.308150291442871, Val Loss: 2.2568225860595703, Val Accuracy: 14.0
Iteration 3500, Epoch 5, Loss: 2.2487266063690186, Accuracy: 17.176774978637695, Val Loss: 2.2472569942474365, Val Accuracy: 16.100000381469727
Iteration 4200, Epoch 6, Loss: 2.241701364517212, Accuracy: 17.516002655029297, Val Loss: 2.240373134613037, Val Accuracy: 17.5
Iteration 4900, Epoch 7, Loss: 2.2363786697387695, Accuracy: 

In [19]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, LeakyReLu, PreLU
    Optimizers Used: RMSProp
    Learning Rate = 0.003
    Weight Initializer = glorot_normal (Xavier)
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_normal(seed=None)
        self.fc1 = tf.keras.layers.Dense(192, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(192, activation=tf.keras.layers.LeakyReLU(),
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(192, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.RMSprop(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 11.420839309692383, Accuracy: 1.5625, Val Loss: 7.956015586853027, Val Accuracy: 9.09999942779541
Iteration 700, Epoch 1, Loss: 4.972878456115723, Accuracy: 16.56785011291504, Val Loss: 5.257494926452637, Val Accuracy: 8.699999809265137
Iteration 1400, Epoch 2, Loss: 5.2580695152282715, Accuracy: 9.97047233581543, Val Loss: 5.257494926452637, Val Accuracy: 8.699999809265137
Iteration 2100, Epoch 3, Loss: 5.257570743560791, Accuracy: 9.943431854248047, Val Loss: 5.257494926452637, Val Accuracy: 8.699999809265137
Iteration 2800, Epoch 4, Loss: 5.257495880126953, Accuracy: 9.872017860412598, Val Loss: 5.257494926452637, Val Accuracy: 8.699999809265137
Iteration 3500, Epoch 5, Loss: 5.2574849128723145, Accuracy: 9.950657844543457, Val Loss: 5.257494926452637, Val Accuracy: 8.699999809265137
Iteration 4200, Epoch 6, Loss: 5.257476329803467, Accuracy: 9.880391120910645, Val Loss: 5.257494926452637, Val Accuracy: 8.699999809265137
Iteration 4900, Epoch 7, Loss: 5.2

In [20]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: SGD with momentum
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate, momentum = 0.1) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 5.074756622314453, Accuracy: 0.0, Val Loss: 5.029781818389893, Val Accuracy: 0.7000000476837158
Iteration 700, Epoch 1, Loss: 3.0958364009857178, Accuracy: 20.564373016357422, Val Loss: 2.169178009033203, Val Accuracy: 29.600000381469727
Iteration 1400, Epoch 2, Loss: 2.0062339305877686, Accuracy: 32.46309280395508, Val Loss: 1.902441382408142, Val Accuracy: 33.89999771118164
Iteration 2100, Epoch 3, Loss: 1.8462144136428833, Accuracy: 36.06930923461914, Val Loss: 1.8116933107376099, Val Accuracy: 36.89999771118164
Iteration 2800, Epoch 4, Loss: 1.7679942846298218, Accuracy: 38.54373550415039, Val Loss: 1.7625174522399902, Val Accuracy: 38.400001525878906
Iteration 3500, Epoch 5, Loss: 1.7198361158370972, Accuracy: 40.224544525146484, Val Loss: 1.723604440689087, Val Accuracy: 39.39999771118164
Iteration 4200, Epoch 6, Loss: 1.6853551864624023, Accuracy: 41.45468521118164, Val Loss: 1.6874781847000122, Val Accuracy: 41.60000228881836
Iteration 4900, Epoch 7,

In [21]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: ELU, ReLU, Softmax
    Optimizers Used: Nadam
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Nadam(learning_rate, beta_1=0.9, beta_2=0.999)

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 6.019235610961914, Accuracy: 0.0, Val Loss: 4.85754919052124, Val Accuracy: 5.600000381469727
Iteration 700, Epoch 1, Loss: 1.869362473487854, Accuracy: 37.905670166015625, Val Loss: 1.6481701135635376, Val Accuracy: 46.0
Iteration 1400, Epoch 2, Loss: 1.5243102312088013, Accuracy: 46.66584777832031, Val Loss: 1.5237269401550293, Val Accuracy: 48.79999923706055
Iteration 2100, Epoch 3, Loss: 1.403150200843811, Accuracy: 50.917179107666016, Val Loss: 1.5113362073898315, Val Accuracy: 48.400001525878906
Iteration 2800, Epoch 4, Loss: 1.3204530477523804, Accuracy: 53.66550827026367, Val Loss: 1.4689565896987915, Val Accuracy: 48.599998474121094
Iteration 3500, Epoch 5, Loss: 1.254278302192688, Accuracy: 56.078372955322266, Val Loss: 1.4901604652404785, Val Accuracy: 50.599998474121094
Iteration 4200, Epoch 6, Loss: 1.205810785293579, Accuracy: 57.5850715637207, Val Loss: 1.4509872198104858, Val Accuracy: 51.5
Iteration 4900, Epoch 7, Loss: 1.1536744832992554, A

In [22]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(num_classes, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 128, 128, 10

model1 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 5.934971809387207, Accuracy: 0.0, Val Loss: 5.023471832275391, Val Accuracy: 4.400000095367432
Iteration 700, Epoch 1, Loss: 1.9073517322540283, Accuracy: 37.776390075683594, Val Loss: 1.7194112539291382, Val Accuracy: 41.5
Iteration 1400, Epoch 2, Loss: 1.5460330247879028, Accuracy: 46.14419174194336, Val Loss: 1.5238200426101685, Val Accuracy: 47.400001525878906
Iteration 2100, Epoch 3, Loss: 1.4272615909576416, Accuracy: 50.08787536621094, Val Loss: 1.5060251951217651, Val Accuracy: 46.39999771118164
Iteration 2800, Epoch 4, Loss: 1.3508049249649048, Accuracy: 52.75844955444336, Val Loss: 1.4731191396713257, Val Accuracy: 48.5
Iteration 3500, Epoch 5, Loss: 1.2877105474472046, Accuracy: 54.83409881591797, Val Loss: 1.4372961521148682, Val Accuracy: 50.5
Iteration 4200, Epoch 6, Loss: 1.2307240962982178, Accuracy: 57.10916519165039, Val Loss: 1.523315668106079, Val Accuracy: 50.0
Iteration 4900, Epoch 7, Loss: 1.1784148216247559, Accuracy: 58.9139328002929

In [48]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(256, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(256, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(256, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 256, 256, 10

model2 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

<__main__.CustomNet object at 0x7fba568814e0>
Iteration 0, Epoch 1, Loss: 6.017521858215332, Accuracy: 0.0, Val Loss: 5.75267219543457, Val Accuracy: 8.299999237060547


KeyboardInterrupt: ignored

In [24]:
""" Custom network with three fully connected layers and weights initialized to zero. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = zero
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.Zeros()
        self.fc1 = tf.keras.layers.Dense(256, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(256, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(256, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 5.545177459716797, Accuracy: 6.25, Val Loss: 5.543257713317871, Val Accuracy: 11.90000057220459
Iteration 700, Epoch 1, Loss: 4.91896915435791, Accuracy: 10.103869438171387, Val Loss: 4.322484016418457, Val Accuracy: 9.800000190734863
Iteration 1400, Epoch 2, Loss: 3.7813146114349365, Accuracy: 9.84990119934082, Val Loss: 3.394547939300537, Val Accuracy: 9.800000190734863
Iteration 2100, Epoch 3, Loss: 3.0459656715393066, Accuracy: 9.902240753173828, Val Loss: 2.863523244857788, Val Accuracy: 9.800000190734863
Iteration 2800, Epoch 4, Loss: 2.683000326156616, Accuracy: 9.862698554992676, Val Loss: 2.608543634414673, Val Accuracy: 9.800000190734863
Iteration 3500, Epoch 5, Loss: 2.5164992809295654, Accuracy: 9.564502716064453, Val Loss: 2.485023021697998, Val Accuracy: 10.699999809265137
Iteration 4200, Epoch 6, Loss: 2.434187889099121, Accuracy: 9.775101661682129, Val Loss: 2.419881582260132, Val Accuracy: 7.90000057220459
Iteration 4900, Epoch 7, Loss: 2.38

In [0]:
# Fully connected 3 layer network#1 ends

In [0]:
# Fully connected 4 layer network#2 starts

In [74]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, tanh, Softmax, Sigmoid
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(32, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(32, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(64, activation='sigmoid',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.15327262878418, Accuracy: 3.125, Val Loss: 4.136752605438232, Val Accuracy: 5.200000286102295
Iteration 700, Epoch 1, Loss: 3.532177448272705, Accuracy: 10.995452880859375, Val Loss: 3.063830614089966, Val Accuracy: 7.800000190734863
Iteration 1400, Epoch 2, Loss: 2.780595541000366, Accuracy: 9.858044624328613, Val Loss: 2.616070508956909, Val Accuracy: 7.800000190734863
Iteration 2100, Epoch 3, Loss: 2.507383346557617, Accuracy: 9.942680358886719, Val Loss: 2.4549179077148438, Val Accuracy: 7.800000190734863
Iteration 2800, Epoch 4, Loss: 2.4077513217926025, Accuracy: 9.759374618530273, Val Loss: 2.387640953063965, Val Accuracy: 7.800000190734863
Iteration 3500, Epoch 5, Loss: 2.3634157180786133, Accuracy: 9.847719192504883, Val Loss: 2.3544113636016846, Val Accuracy: 11.200000762939453
Iteration 4200, Epoch 6, Loss: 2.340364456176758, Accuracy: 9.951332092285156, Val Loss: 2.335966110229492, Val Accuracy: 7.800000190734863
Iteration 4900, Epoch 7, Loss: 

In [0]:
""" Custom network with four fully connected layers and weights initialized to zero. 
    Activation Functions Used: Relu, tanh, Softmax, Sigmoid
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.zeros()
        self.fc1 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(32, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(32, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(64, activation='sigmoid',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

In [49]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 32, 32, 10

model3 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.138031959533691, Accuracy: 1.5625, Val Loss: 4.124605178833008, Val Accuracy: 1.3000000715255737
Iteration 700, Epoch 1, Loss: 1.9835296869277954, Accuracy: 32.883827209472656, Val Loss: 1.7953952550888062, Val Accuracy: 40.400001525878906
Iteration 1400, Epoch 2, Loss: 1.69623863697052, Accuracy: 40.27996826171875, Val Loss: 1.6975334882736206, Val Accuracy: 43.20000076293945
Iteration 2100, Epoch 3, Loss: 1.6118003129959106, Accuracy: 43.15751647949219, Val Loss: 1.750809907913208, Val Accuracy: 42.599998474121094
Iteration 2800, Epoch 4, Loss: 1.5469318628311157, Accuracy: 45.39687728881836, Val Loss: 1.7856608629226685, Val Accuracy: 43.20000076293945
Iteration 3500, Epoch 5, Loss: 1.4930534362792969, Accuracy: 47.513710021972656, Val Loss: 1.7507803440093994, Val Accuracy: 44.20000076293945
Iteration 4200, Epoch 6, Loss: 1.4522528648376465, Accuracy: 48.98394775390625, Val Loss: 1.760956883430481, Val Accuracy: 45.89999771118164
Iteration 4900, Epoch 

In [27]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: SGD
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform(xavier)
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(num_classes, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(num_classes, activation='relu',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 96, 96, 10

model4 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.629169940948486, Accuracy: 0.0, Val Loss: 4.784705638885498, Val Accuracy: 1.3000000715255737
Iteration 700, Epoch 1, Loss: 3.175558090209961, Accuracy: 17.981008529663086, Val Loss: 2.185238838195801, Val Accuracy: 28.5
Iteration 1400, Epoch 2, Loss: 2.0409128665924072, Accuracy: 29.781003952026367, Val Loss: 1.9500231742858887, Val Accuracy: 33.39999771118164
Iteration 2100, Epoch 3, Loss: 1.89655339717865, Accuracy: 33.460567474365234, Val Loss: 1.8785152435302734, Val Accuracy: 35.10000228881836
Iteration 2800, Epoch 4, Loss: 1.8248779773712158, Accuracy: 35.744903564453125, Val Loss: 1.837938666343689, Val Accuracy: 36.599998474121094
Iteration 3500, Epoch 5, Loss: 1.7790374755859375, Accuracy: 37.39273452758789, Val Loss: 1.8019938468933105, Val Accuracy: 37.400001525878906
Iteration 4200, Epoch 6, Loss: 1.7467286586761475, Accuracy: 38.53605270385742, Val Loss: 1.7713929414749146, Val Accuracy: 39.5
Iteration 4900, Epoch 7, Loss: 1.7236790657043457,

In [28]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, softmax
    Optimizers Used: SGD
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform(xavier)
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(96, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.622900009155273, Accuracy: 0.0, Val Loss: 4.652817249298096, Val Accuracy: 0.7000000476837158
Iteration 700, Epoch 1, Loss: 3.3350508213043213, Accuracy: 16.17778205871582, Val Loss: 2.2686514854431152, Val Accuracy: 26.399999618530273
Iteration 1400, Epoch 2, Loss: 2.081228256225586, Accuracy: 28.67618179321289, Val Loss: 1.9813309907913208, Val Accuracy: 31.799999237060547
Iteration 2100, Epoch 3, Loss: 1.9220768213272095, Accuracy: 32.74934005737305, Val Loss: 1.8951126337051392, Val Accuracy: 33.29999923706055
Iteration 2800, Epoch 4, Loss: 1.8471931219100952, Accuracy: 35.08635711669922, Val Loss: 1.8455675840377808, Val Accuracy: 36.19999694824219
Iteration 3500, Epoch 5, Loss: 1.7960790395736694, Accuracy: 36.813499450683594, Val Loss: 1.8083993196487427, Val Accuracy: 36.19999694824219
Iteration 4200, Epoch 6, Loss: 1.7603392601013184, Accuracy: 38.072776794433594, Val Loss: 1.7709484100341797, Val Accuracy: 38.5
Iteration 4900, Epoch 7, Loss: 1.73

In [29]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, softmax
    Optimizers Used: SGD
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform (xavier)
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(128, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(128, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.953373432159424, Accuracy: 0.0, Val Loss: 4.910431385040283, Val Accuracy: 0.30000001192092896
Iteration 700, Epoch 1, Loss: 3.648277759552002, Accuracy: 16.79074478149414, Val Loss: 2.5129098892211914, Val Accuracy: 26.0
Iteration 1400, Epoch 2, Loss: 2.1723668575286865, Accuracy: 29.19537353515625, Val Loss: 2.046302318572998, Val Accuracy: 30.599998474121094
Iteration 2100, Epoch 3, Loss: 1.9551620483398438, Accuracy: 32.58457946777344, Val Loss: 1.9382052421569824, Val Accuracy: 33.19999694824219
Iteration 2800, Epoch 4, Loss: 1.8666714429855347, Accuracy: 34.9931640625, Val Loss: 1.872639536857605, Val Accuracy: 34.79999923706055
Iteration 3500, Epoch 5, Loss: 1.8077040910720825, Accuracy: 36.72411346435547, Val Loss: 1.825153112411499, Val Accuracy: 35.29999923706055
Iteration 4200, Epoch 6, Loss: 1.7668704986572266, Accuracy: 38.00960159301758, Val Loss: 1.7848596572875977, Val Accuracy: 37.5
Iteration 4900, Epoch 7, Loss: 1.7364652156829834, Accura

In [30]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, tanh, PRrelu, sigmoid
    Optimizers Used: SGD with momentum
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(128, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(128, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(128, activation='sigmoid',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate, momentum = 0.1) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.872550964355469, Accuracy: 0.0, Val Loss: 4.887304782867432, Val Accuracy: 0.20000000298023224
Iteration 700, Epoch 1, Loss: 4.674384117126465, Accuracy: 6.4439191818237305, Val Loss: 4.497355937957764, Val Accuracy: 13.799999237060547
Iteration 1400, Epoch 2, Loss: 4.359328746795654, Accuracy: 17.40895652770996, Val Loss: 4.2574238777160645, Val Accuracy: 17.5
Iteration 2100, Epoch 3, Loss: 4.144274711608887, Accuracy: 17.286357879638672, Val Loss: 4.062170505523682, Val Accuracy: 16.899999618530273
Iteration 2800, Epoch 4, Loss: 3.9162659645080566, Accuracy: 16.47614288330078, Val Loss: 3.8149335384368896, Val Accuracy: 16.899999618530273
Iteration 3500, Epoch 5, Loss: 3.556414842605591, Accuracy: 15.224541664123535, Val Loss: 3.400240421295166, Val Accuracy: 13.0
Iteration 4200, Epoch 6, Loss: 2.9416260719299316, Accuracy: 11.986186027526855, Val Loss: 2.7763848304748535, Val Accuracy: 10.59999942779541
Iteration 4900, Epoch 7, Loss: 2.518076181411743, 

In [0]:
# Fully connected 4 layer network#2 ends.

In [0]:
# Fully connected 6 layer network#3 begins

In [35]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, relu, relu, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(128, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)        
        self.fc6 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 6.257875442504883, Accuracy: 0.0, Val Loss: 5.689838886260986, Val Accuracy: 0.20000000298023224
Iteration 700, Epoch 1, Loss: 1.8572686910629272, Accuracy: 36.274070739746094, Val Loss: 1.6541603803634644, Val Accuracy: 44.10000228881836
Iteration 1400, Epoch 2, Loss: 1.5752373933792114, Accuracy: 44.288875579833984, Val Loss: 1.5404925346374512, Val Accuracy: 45.0
Iteration 2100, Epoch 3, Loss: 1.4780240058898926, Accuracy: 47.86906814575195, Val Loss: 1.506411075592041, Val Accuracy: 46.39999771118164
Iteration 2800, Epoch 4, Loss: 1.3960193395614624, Accuracy: 50.42557144165039, Val Loss: 1.4578404426574707, Val Accuracy: 49.599998474121094
Iteration 3500, Epoch 5, Loss: 1.3328073024749756, Accuracy: 53.18578338623047, Val Loss: 1.4350227117538452, Val Accuracy: 50.5
Iteration 4200, Epoch 6, Loss: 1.2761518955230713, Accuracy: 55.327659606933594, Val Loss: 1.4524667263031006, Val Accuracy: 50.80000305175781
Iteration 4900, Epoch 7, Loss: 1.21780431270599

In [75]:
""" Custom network with six fully connected layers with weights initialized to zero. 
    Activation Functions Used: Relu, relu, PRrelu, relu, relu, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.zeros()
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(128, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)        
        self.fc6 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.852030277252197, Accuracy: 6.25, Val Loss: 4.850188255310059, Val Accuracy: 11.90000057220459
Iteration 700, Epoch 1, Loss: 4.272616386413574, Accuracy: 10.103869438171387, Val Loss: 3.74444842338562, Val Accuracy: 9.800000190734863
Iteration 1400, Epoch 2, Loss: 3.317512273788452, Accuracy: 9.818612098693848, Val Loss: 3.033723831176758, Val Accuracy: 9.800000190734863
Iteration 2100, Epoch 3, Loss: 2.800137996673584, Accuracy: 9.948192596435547, Val Loss: 2.682596206665039, Val Accuracy: 9.800000190734863
Iteration 2800, Epoch 4, Loss: 2.566826820373535, Accuracy: 9.868749618530273, Val Loss: 2.519131898880005, Val Accuracy: 9.800000190734863
Iteration 3500, Epoch 5, Loss: 2.458301544189453, Accuracy: 9.613163948059082, Val Loss: 2.437211751937866, Val Accuracy: 10.699999809265137
Iteration 4200, Epoch 6, Loss: 2.4021129608154297, Accuracy: 9.789105415344238, Val Loss: 2.3921749591827393, Val Accuracy: 10.699999809265137
Iteration 4900, Epoch 7, Loss: 2.

In [36]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(num_classes, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(num_classes, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(num_classes, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dense(num_classes, activation='sigmoid',
                                   kernel_initializer=initializer)        
        self.fc6 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                   kernel_initializer=initializer)        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 128, 128, 10

model5 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.740220069885254, Accuracy: 6.25, Val Loss: 4.549184322357178, Val Accuracy: 10.199999809265137
Iteration 700, Epoch 1, Loss: 1.8214665651321411, Accuracy: 36.6507682800293, Val Loss: 1.6466152667999268, Val Accuracy: 42.5
Iteration 1400, Epoch 2, Loss: 1.5620239973068237, Accuracy: 44.78838348388672, Val Loss: 1.5124034881591797, Val Accuracy: 46.599998474121094
Iteration 2100, Epoch 3, Loss: 1.46746826171875, Accuracy: 47.97616195678711, Val Loss: 1.479024052619934, Val Accuracy: 46.099998474121094
Iteration 2800, Epoch 4, Loss: 1.401035189628601, Accuracy: 50.279571533203125, Val Loss: 1.4699703454971313, Val Accuracy: 48.0
Iteration 3500, Epoch 5, Loss: 1.3516278266906738, Accuracy: 52.23112106323242, Val Loss: 1.4272321462631226, Val Accuracy: 50.400001525878906
Iteration 4200, Epoch 6, Loss: 1.3136874437332153, Accuracy: 53.714622497558594, Val Loss: 1.411895990371704, Val Accuracy: 50.19999694824219
Iteration 4900, Epoch 7, Loss: 1.2758690118789673, 

In [37]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(hidden_size, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(num_classes, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(num_classes, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(num_classes, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dense(num_classes, activation='sigmoid',
                                   kernel_initializer=initializer)        
        self.fc6 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                   kernel_initializer=initializer)        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 96, 96, 10

model6 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 4.362520217895508, Accuracy: 6.25, Val Loss: 4.269597053527832, Val Accuracy: 8.699999809265137
Iteration 700, Epoch 1, Loss: 1.8376331329345703, Accuracy: 36.124732971191406, Val Loss: 1.6089344024658203, Val Accuracy: 43.39999771118164
Iteration 1400, Epoch 2, Loss: 1.568851351737976, Accuracy: 44.44635772705078, Val Loss: 1.529409646987915, Val Accuracy: 45.69999694824219
Iteration 2100, Epoch 3, Loss: 1.4876595735549927, Accuracy: 47.31985855102539, Val Loss: 1.521875262260437, Val Accuracy: 45.10000228881836
Iteration 2800, Epoch 4, Loss: 1.4268298149108887, Accuracy: 49.58995819091797, Val Loss: 1.478678584098816, Val Accuracy: 48.10000228881836
Iteration 3500, Epoch 5, Loss: 1.3775806427001953, Accuracy: 51.56965255737305, Val Loss: 1.4256869554519653, Val Accuracy: 50.099998474121094
Iteration 4200, Epoch 6, Loss: 1.3351408243179321, Accuracy: 52.49747085571289, Val Loss: 1.4241743087768555, Val Accuracy: 50.400001525878906
Iteration 4900, Epoch 7, L

In [0]:
# Fully connected 6 layer network#3 ends

In [38]:
""" At this point, I have 9 different architectures with varying number of layers and width. The different kinds of models are:
1. Three layer network with 42, 192 and 128 widths. 
2. Four layer network with 32, 96, 128 widths.
3. Six layer network with 96 and 128 widths.


Four Different Optimizers have been used:
1. SGD
2. SGD with Momentum
3. Adam
4. RMSProp

Five different Activation function have been used:
1. ReLU
2. PrELU
3. Tanh
4. Softmax
5. Sigmoid

Three different Initializers have been used:
1. Zeros
2. Xavier
3. he_uniform

"""

' At this point, I have 9 different architectures with varying number of layers and width. The different kinds of models are:\n1. Three layer network with 42, 192 and 128 widths. \n2. Four layer network with 32, 96, 128 widths.\n3. Six layer network with 96 and 128 widths.\n\n\nFour Different Optimizers have been used:\n1. SGD\n2. SGD with Momentum\n3. Adam\n4. RMSProp\n\nFive different Activation function have been used:\n1. ReLU\n2. PrELU\n3. Tanh\n4. Softmax\n5. Sigmoid\n\nThree different Initializers have been used:\n1. Zeros\n2. Xavier\n3. he_uniform\n\n'

In [0]:
# The next step in the process is to experiment with batch normalization and regularization usages. 
# The idea is to apply batch normalization and regularization to two of the best networks identified in each of the three layer and six layer networks and experiment

In [0]:
# Application of batch normalization and regularization usage on fully connected 3 layer network begins

In [39]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: SGD with momentum
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform
    Batch Normalization Usage: Yes
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.BatchNormalization(axis = 1)
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.BatchNormalization(axis = 1)
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc6 = tf.keras.layers.BatchNormalization(axis = 1)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc4(x)
        x = self.fc2(x)
        x = self.fc5(x)
        x = self.fc3(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate, momentum = 0.1) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 14.460289001464844, Accuracy: 0.0, Val Loss: 4.903382301330566, Val Accuracy: 0.30000001192092896
Iteration 700, Epoch 1, Loss: 4.43472146987915, Accuracy: 4.874732494354248, Val Loss: 2.959465503692627, Val Accuracy: 9.300000190734863
Iteration 1400, Epoch 2, Loss: 2.9847371578216553, Accuracy: 10.625, Val Loss: 2.8424923419952393, Val Accuracy: 12.899999618530273
Iteration 2100, Epoch 3, Loss: 2.75673508644104, Accuracy: 14.685853004455566, Val Loss: 2.6703648567199707, Val Accuracy: 17.200000762939453
Iteration 2800, Epoch 4, Loss: 2.5931546688079834, Accuracy: 17.088096618652344, Val Loss: 2.5262579917907715, Val Accuracy: 18.700000762939453
Iteration 3500, Epoch 5, Loss: 2.4586844444274902, Accuracy: 18.367420196533203, Val Loss: 2.4074325561523438, Val Accuracy: 20.0
Iteration 4200, Epoch 6, Loss: 2.354135274887085, Accuracy: 19.246967315673828, Val Loss: 2.318803310394287, Val Accuracy: 21.299999237060547
Iteration 490

In [40]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: SGD with momentum
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform
    Batch Normalization Usage: No
    Regularization Usage: Yes
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate, momentum = 0.1) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 5.338575839996338, Accuracy: 0.0, Val Loss: 5.148134708404541, Val Accuracy: 0.0
Iteration 700, Epoch 1, Loss: 3.077404499053955, Accuracy: 21.45149803161621, Val Loss: 2.1859166622161865, Val Accuracy: 28.5
Iteration 1400, Epoch 2, Loss: 1.9976201057434082, Accuracy: 32.760826110839844, Val Loss: 1.941796898841858, Val Accuracy: 33.099998474121094
Iteration 2100, Epoch 3, Loss: 1.845123291015625, Accuracy: 36.33018493652344, Val Loss: 1.8538987636566162, Val Accuracy: 35.10000228881836
Iteration 2800, Epoch 4, Loss: 1.7697933912277222, Accuracy: 38.2424201965332, Val Loss: 1.8027914762496948, Val Accuracy: 36.5
Iteration 3500, Epoch 5, Loss: 1.7222458124160767, Accuracy: 39.938499450683594, Val Loss: 1.7633614540100098, Val Accuracy: 38.5
Iteration 4200, Epoch 6, Loss: 1.6880912780761719, Accuracy: 41.05037307739258, Val Loss: 1.7259232997894287, Val Accuracy: 40.900001525878906
Iteration 4900, Epoch 7, Loss: 1.6625773906707

In [41]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: SGD with momentum
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Dropout
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dropout(0.2)
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dropout(0.2)
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc6 = tf.keras.layers.Dropout(0.2)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc4(x)
        x = self.fc2(x)
        x = self.fc5(x)
        x = self.fc3(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate, momentum = 0.1) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 7.430489540100098, Accuracy: 0.0, Val Loss: 5.105891704559326, Val Accuracy: 0.4000000059604645
Iteration 700, Epoch 1, Loss: 5.9045538902282715, Accuracy: 14.481544494628906, Val Loss: 2.350713014602661, Val Accuracy: 28.0
Iteration 1400, Epoch 2, Loss: 4.899555683135986, Accuracy: 23.28740119934082, Val Loss: 2.017662286758423, Val Accuracy: 32.60000228881836
Iteration 2100, Epoch 3, Loss: 4.708881855010986, Accuracy: 25.991323471069336, Val Loss: 1.921962022781372, Val Accuracy: 35.10000228881836
Iteration 2800, Epoch 4, Loss: 4.58647346496582, Accuracy: 27.568960189819336, Val Loss: 1.8640259504318237, Val Accuracy: 36.39999771118164
Iteration 3500, Epoch 5, Loss: 4.5348968505859375, Accuracy: 28.804346084594727, Val Loss: 1.8229739665985107, Val Accuracy: 36.89999771118164
Iteration 4200, Epoch 6, Loss: 4.474116325378418, Accuracy: 30.403470993041992, Val Loss: 1.79880952835083, Val Accuracy: 37.79999923706055
Iteration 

In [42]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: SGD with momentum
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Data Augmentation
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate, momentum = 0.1) 

X_train, y_train = load_datagen(X_train, y_train)
train_dset = Dataset(X_train, y_train, batch_size=64, shuffle=True)
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

Iteration 0, Epoch 1, Loss: 5.008494853973389, Accuracy: 0.0, Val Loss: 5.275371074676514, Val Accuracy: 0.30000001192092896
Iteration 700, Epoch 1, Loss: 3.8247485160827637, Accuracy: 16.240192413330078, Val Loss: 2.4506404399871826, Val Accuracy: 26.30000114440918
Iteration 1400, Epoch 2, Loss: 2.4668073654174805, Accuracy: 26.821273803710938, Val Loss: 2.0642549991607666, Val Accuracy: 31.299999237060547
Iteration 2100, Epoch 3, Loss: 2.1641080379486084, Accuracy: 29.968584060668945, Val Loss: 1.9843767881393433, Val Accuracy: 32.5
Iteration 2800, Epoch 4, Loss: 2.0472772121429443, Accuracy: 31.831249237060547, Val Loss: 1.9495902061462402, Val Accuracy: 33.599998474121094
Iteration 3500, Epoch 5, Loss: 1.9813458919525146, Accuracy: 33.06148910522461, Val Loss: 1.9231754541397095, Val Accuracy: 34.900001525878906
Iteration 4200, Epoch 6, Loss: 1.9375848770141602, Accuracy: 33.62790298461914, Val Loss: 1.8862206935882568, Val Accuracy: 37.599998474121094
Iteration 4900, Epoch 7, Loss

In [43]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: Yes
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.BatchNormalization(axis = 1)        
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.BatchNormalization(axis = 1)        
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc6 = tf.keras.layers.BatchNormalization(axis = 1)        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc4(x)
        x = self.fc2(x)
        x = self.fc5(x)
        x = self.fc3(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 15.501960754394531, Accuracy: 1.5625, Val Loss: 7.0485029220581055, Val Accuracy: 0.5
Iteration 700, Epoch 1, Loss: 7.164695739746094, Accuracy: 12.388551712036133, Val Loss: 4.198223114013672, Val Accuracy: 21.100000381469727
Iteration 1400, Epoch 2, Loss: 3.196993589401245, Accuracy: 22.434444427490234, Val Loss: 3.3213348388671875, Val Accuracy: 17.600000381469727
Iteration 2100, Epoch 3, Loss: 3.1271371841430664, Accuracy: 22.255290985107422, Val Loss: 3.2736854553222656, Val Accuracy: 19.200000762939453
Iteration 2800, Epoch 4, Loss: 2.8874690532684326, Accuracy: 22.881250381469727, Val Loss: 2.930332899093628, Val Accuracy: 22.899999618530273
Iteration 3500, Epoch 5, Loss: 2.6119842529296875, Accuracy: 29.102914810180664, Val Loss: 2.797142744064331, Val Accuracy: 28.799999237060547
Iteration 4200, Epoch 6, Loss: 2.670226573944092, Accuracy: 24.786544799804688, Val Loss: 2.829084873199463, Val Accuracy: 25.0
Iteration 4

In [44]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: No
    Regularization Usage: Yes
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 5.681793212890625, Accuracy: 0.0, Val Loss: 5.520319938659668, Val Accuracy: 1.4000000953674316
Iteration 700, Epoch 1, Loss: 1.9762439727783203, Accuracy: 34.83193588256836, Val Loss: 1.8685003519058228, Val Accuracy: 44.10000228881836
Iteration 1400, Epoch 2, Loss: 1.6178958415985107, Accuracy: 43.68345642089844, Val Loss: 1.7776645421981812, Val Accuracy: 44.29999923706055
Iteration 2100, Epoch 3, Loss: 1.4768613576889038, Accuracy: 48.66071319580078, Val Loss: 1.7747807502746582, Val Accuracy: 45.5
Iteration 2800, Epoch 4, Loss: 1.3615370988845825, Accuracy: 52.88125228881836, Val Loss: 1.993119716644287, Val Accuracy: 44.10000228881836
Iteration 3500, Epoch 5, Loss: 1.2567222118377686, Accuracy: 56.33300018310547, Val Loss: 2.0021939277648926, Val Accuracy: 45.79999923706055
Iteration 4200, Epoch 6, Loss: 1.1709330081939697, Accuracy: 59.27680587768555, Val Loss: 2.0948681831359863, Val Accuracy: 44.900001525878906
Itera

In [45]:
""" Custom network with three fully connected layers. 
    Activation Functions Used: Relu, ReLU, Softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = he_uniform
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Dropout
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.he_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dropout(0.2)       
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dropout(0.2)       
        self.fc3 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc6 = tf.keras.layers.Dropout(0.2)       
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc4(x)
        x = self.fc2(x)
        x = self.fc5(x)
        x = self.fc3(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 7.477950096130371, Accuracy: 0.0, Val Loss: 5.167136192321777, Val Accuracy: 2.8999998569488525
Iteration 700, Epoch 1, Loss: 4.810964584350586, Accuracy: 26.673948287963867, Val Loss: 1.788588047027588, Val Accuracy: 40.79999923706055
Iteration 1400, Epoch 2, Loss: 4.476006507873535, Accuracy: 32.07561111450195, Val Loss: 1.801803708076477, Val Accuracy: 41.70000076293945
Iteration 2100, Epoch 3, Loss: 4.3686041831970215, Accuracy: 34.182098388671875, Val Loss: 1.7802538871765137, Val Accuracy: 43.599998474121094
Iteration 2800, Epoch 4, Loss: 4.4042463302612305, Accuracy: 35.015625, Val Loss: 1.8047302961349487, Val Accuracy: 43.20000076293945
Iteration 3500, Epoch 5, Loss: 4.224153995513916, Accuracy: 36.15040588378906, Val Loss: 1.8850326538085938, Val Accuracy: 43.0
Iteration 4200, Epoch 6, Loss: 4.254215717315674, Accuracy: 37.149932861328125, Val Loss: 1.8703242540359497, Val Accuracy: 45.599998474121094
Iteration 4900

In [0]:
# Application of batch normalization and regularization usage on fully connected 3 layer network ends

In [0]:
# Application of batch normalization and regularization usage on fully connected 4 layer network begins

In [46]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: Yes
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.BatchNormalization(axis = 1) 
        self.fc2 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc6 = tf.keras.layers.BatchNormalization(axis = 1) 
        self.fc3 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc7 = tf.keras.layers.BatchNormalization(axis = 1) 
        self.fc4 = tf.keras.layers.Dense(64, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc8 = tf.keras.layers.BatchNormalization(axis = 1) 
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc5(x)
        x = self.fc2(x)
        x = self.fc6(x)
        x = self.fc3(x)
        x = self.fc7(x)
        x = self.fc4(x)
        x = self.fc8(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 11.263551712036133, Accuracy: 0.0, Val Loss: 4.1783013343811035, Val Accuracy: 3.0


KeyboardInterrupt: ignored

In [0]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Dropout
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dropout(0.2)
        self.fc2 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc6 = tf.keras.layers.Dropout(0.2)
        self.fc3 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc7 = tf.keras.layers.Dropout(0.2)
        self.fc4 = tf.keras.layers.Dense(64, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc8 = tf.keras.layers.Dropout(0.2)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc5(x)
        x = self.fc2(x)
        x = self.fc6(x)
        x = self.fc3(x)
        x = self.fc7(x)
        x = self.fc4(x)
        x = self.fc8(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

In [0]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes -- L2
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc2 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc3 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc4 = tf.keras.layers.Dense(64, activation='softmax',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

In [0]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Data Augmentation
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer
        self.fc3 = tf.keras.layers.Dense(32, activation='relu',
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(64, activation='softmax',
                                   kernel_initializer=initializer)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train = load_datagen(X_train, y_train)
train_dset = Dataset(X_train, y_train, batch_size=64, shuffle=True)                                         

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

In [0]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: SGD
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform(xavier)
    Batch Normalization Usage: Yes
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.BatchNormalization(axis = 1)         
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc6 = tf.keras.layers.BatchNormalization(axis = 1)         
        self.fc3 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc7 = tf.keras.layers.BatchNormalization(axis = 1)         
        self.fc4 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc8 = tf.keras.layers.BatchNormalization(axis = 1) 
        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc5(x)        
        x = self.fc2(x)
        x = self.fc6(x)        
        x = self.fc3(x)
        x = self.fc7(x)        
        x = self.fc4(x)
        x = self.fc8(x)        
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

In [0]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: SGD
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform(xavier)
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Dropout
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dropout(0.2)         
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc6 = tf.keras.layers.Dropout(0.2)
        self.fc3 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc7 = tf.keras.layers.Dropout(0.2)
        self.fc4 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer)
        self.fc8 = tf.keras.layers.Dropout(0.2)
        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc5(x)        
        x = self.fc2(x)
        x = self.fc6(x)        
        x = self.fc3(x)
        x = self.fc7(x)        
        x = self.fc4(x)
        x = self.fc8(x)        
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

In [0]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: SGD
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform(xavier)
    Batch Normalization Usage: No
    Regularization Usage: Yes -- L2
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc3 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc4 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

In [0]:
""" Custom network with four fully connected layers. 
    Activation Functions Used: Relu, relu, relu, softmax
    Optimizers Used: SGD
    Learning Rate = 0.003
    Weight Initializer = glorot_uniform(xavier)
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Data Augmentation
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc3 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc4 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 
  
X_train, y_train = load_datagen(X_train, y_train)
train_dset = Dataset(X_train, y_train, batch_size=64, shuffle=True)                                           

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

In [0]:
# Application of batch normalization and regularization usage on fully connected 4 layer network ends

In [0]:
# Application of batch normalization and regularization usage on fully connected 6 layer network begins

In [0]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Data Augmentation
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(128, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(128, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dense(128, activation='sigmoid',
                                   kernel_initializer=initializer)        
        self.fc6 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer)        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train = load_datagen(X_train, y_train)
train_dset = Dataset(X_train, y_train, batch_size=64, shuffle=True)

train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

Iteration 0, Epoch 1, Loss: 5.1438093185424805, Accuracy: 0.0, Val Loss: 4.908463478088379, Val Accuracy: 0.10000000149011612
Iteration 700, Epoch 1, Loss: 1.911417841911316, Accuracy: 33.00642013549805, Val Loss: 1.6266423463821411, Val Accuracy: 43.39999771118164
Iteration 1400, Epoch 2, Loss: 1.6126383543014526, Accuracy: 42.717369079589844, Val Loss: 1.5668234825134277, Val Accuracy: 45.0
Iteration 2100, Epoch 3, Loss: 1.4854590892791748, Accuracy: 47.60527038574219, Val Loss: 1.5635242462158203, Val Accuracy: 46.20000076293945
Iteration 2800, Epoch 4, Loss: 1.3739393949508667, Accuracy: 51.64374542236328, Val Loss: 1.5371068716049194, Val Accuracy: 47.79999923706055
Iteration 3500, Epoch 5, Loss: 1.2752476930618286, Accuracy: 55.6654167175293, Val Loss: 1.5476415157318115, Val Accuracy: 46.70000076293945
Iteration 4200, Epoch 6, Loss: 1.1919894218444824, Accuracy: 58.99077606201172, Val Loss: 1.532097578048706, Val Accuracy: 48.79999923706055
Iteration 4900, Epoch 7, Loss: 1.12042

In [0]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: Yes
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc7 =  tf.keras.layers.BatchNormalization(axis = 1)
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc8 =  tf.keras.layers.BatchNormalization(axis = 1)
        self.fc3 = tf.keras.layers.Dense(96, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc9 =  tf.keras.layers.BatchNormalization(axis = 1)
        self.fc4 = tf.keras.layers.Dense(96, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc10 =  tf.keras.layers.BatchNormalization(axis = 1)
        self.fc5 = tf.keras.layers.Dense(96, activation='sigmoid',
                                   kernel_initializer=initializer)        
        self.fc11 =  tf.keras.layers.BatchNormalization(axis = 1)
        self.fc6 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer)   
        self.fc12 =  tf.keras.layers.BatchNormalization(axis = 1)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc7(x)
        x = self.fc2(x)
        x = self.fc8(x)
        x = self.fc3(x)
        x = self.fc9(x)
        x = self.fc4(x)
        x = self.fc10(x)
        x = self.fc5(x)
        x = self.fc11(x)
        x = self.fc6(x)
        x = self.fc12(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

Iteration 0, Epoch 1, Loss: 12.905860900878906, Accuracy: 3.125, Val Loss: 4.785928726196289, Val Accuracy: 0.0
Iteration 700, Epoch 1, Loss: 7.190889358520508, Accuracy: 14.635342597961426, Val Loss: 5.972823619842529, Val Accuracy: 17.5
Iteration 1400, Epoch 2, Loss: 4.3861517906188965, Accuracy: 17.682085037231445, Val Loss: 4.4350128173828125, Val Accuracy: 20.80000114440918
Iteration 2100, Epoch 3, Loss: 2.8762364387512207, Accuracy: 21.191234588623047, Val Loss: 2.857100486755371, Val Accuracy: 23.80000114440918
Iteration 2800, Epoch 4, Loss: 2.6665308475494385, Accuracy: 22.378231048583984, Val Loss: 2.6825778484344482, Val Accuracy: 28.799999237060547
Iteration 3500, Epoch 5, Loss: 2.386935234069824, Accuracy: 22.951231002807617, Val Loss: 2.224994421005249, Val Accuracy: 20.5
Iteration 4200, Epoch 6, Loss: 2.066650390625, Accuracy: 25.1768856048584, Val Loss: 2.027719259262085, Val Accuracy: 27.599998474121094
Iteration 4900, Epoch 7, Loss: 2.148189067840576, Accuracy: 21.0758

In [0]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc2 = tf.keras.layers.Dense(128, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc3 = tf.keras.layers.Dense(128, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc4 = tf.keras.layers.Dense(128, activation='tanh',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc5 = tf.keras.layers.Dense(128, activation='sigmoid',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))        
        self.fc6 = tf.keras.layers.Dense(128, activation='softmax',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

Iteration 0, Epoch 1, Loss: 4.937735557556152, Accuracy: 0.0, Val Loss: 4.757192611694336, Val Accuracy: 7.0
Iteration 700, Epoch 1, Loss: 1.8446135520935059, Accuracy: 35.78593063354492, Val Loss: 1.6341135501861572, Val Accuracy: 44.400001525878906
Iteration 1400, Epoch 2, Loss: 1.5908197164535522, Accuracy: 43.68848419189453, Val Loss: 1.5549242496490479, Val Accuracy: 45.69999694824219
Iteration 2100, Epoch 3, Loss: 1.5030704736709595, Accuracy: 47.004066467285156, Val Loss: 1.4803909063339233, Val Accuracy: 47.900001525878906
Iteration 2800, Epoch 4, Loss: 1.4402287006378174, Accuracy: 49.22030258178711, Val Loss: 1.4516634941101074, Val Accuracy: 47.900001525878906
Iteration 3500, Epoch 5, Loss: 1.3890063762664795, Accuracy: 50.790191650390625, Val Loss: 1.4422556161880493, Val Accuracy: 50.099998474121094
Iteration 4200, Epoch 6, Loss: 1.3575228452682495, Accuracy: 52.046836853027344, Val Loss: 1.4154597520828247, Val Accuracy: 50.30000305175781
Iteration 4900, Epoch 7, Loss: 1.

In [0]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc7 =  tf.keras.layers.Dropout(0.2)
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc8 =  tf.keras.layers.Dropout(0.2)
        self.fc3 = tf.keras.layers.Dense(96, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc9 =  tf.keras.layers.Dropout(0.2)
        self.fc4 = tf.keras.layers.Dense(96, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc10 =  tf.keras.layers.Dropout(0.2)
        self.fc5 = tf.keras.layers.Dense(96, activation='sigmoid',
                                   kernel_initializer=initializer)        
        self.fc11 =  tf.keras.layers.Dropout(0.2)
        self.fc6 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer)   
        self.fc12 =  tf.keras.layers.Dropout(0.2)
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc7(x)
        x = self.fc2(x)
        x = self.fc8(x)
        x = self.fc3(x)
        x = self.fc9(x)
        x = self.fc4(x)
        x = self.fc10(x)
        x = self.fc5(x)
        x = self.fc11(x)
        x = self.fc6(x)
        x = self.fc12(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

Iteration 0, Epoch 1, Loss: 6.434884548187256, Accuracy: 0.0, Val Loss: 4.456222057342529, Val Accuracy: 0.0
Iteration 700, Epoch 1, Loss: 4.688473224639893, Accuracy: 24.382577896118164, Val Loss: 1.7971866130828857, Val Accuracy: 36.39999771118164
Iteration 1400, Epoch 2, Loss: 4.4786481857299805, Accuracy: 30.939960479736328, Val Loss: 1.7016263008117676, Val Accuracy: 40.900001525878906
Iteration 2100, Epoch 3, Loss: 4.4306159019470215, Accuracy: 33.416629791259766, Val Loss: 1.6937204599380493, Val Accuracy: 41.20000076293945
Iteration 2800, Epoch 4, Loss: 4.371032238006592, Accuracy: 34.38431930541992, Val Loss: 1.690187931060791, Val Accuracy: 41.400001525878906
Iteration 3500, Epoch 5, Loss: 4.310122489929199, Accuracy: 35.33323669433594, Val Loss: 1.6263463497161865, Val Accuracy: 41.29999923706055
Iteration 4200, Epoch 6, Loss: 4.354306697845459, Accuracy: 35.827999114990234, Val Loss: 1.6064870357513428, Val Accuracy: 45.10000228881836
Iteration 4900, Epoch 7, Loss: 4.424750

In [0]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: Yes
    Regularization Usage: No
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',                                       
                                   kernel_initializer=initializer)
        self.fc7 =  tf.keras.layers.BatchNormalization(axis = 1)
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc8 =  tf.keras.layers.BatchNormalization(axis = 1)

        self.fc3 = tf.keras.layers.Dense(96, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc9 =  tf.keras.layers.BatchNormalization(axis = 1)
        
        self.fc4 = tf.keras.layers.Dense(96, activation='tanh', 
                                   kernel_initializer=initializer)
        self.fc10 =  tf.keras.layers.BatchNormalization(axis = 1)
        
        self.fc5 = tf.keras.layers.Dense(96, activation='sigmoid',
                                   kernel_initializer=initializer)
        self.fc11 =  tf.keras.layers.BatchNormalization(axis = 1)
        
        self.fc6 = tf.keras.layers.Dense(96, activation='softmax',                                      
                                   kernel_initializer=initializer)
        self.fc12 =  tf.keras.layers.BatchNormalization(axis = 1)
        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc7(x)
        x = self.fc2(x)
        x = self.fc8(x)
        x = self.fc3(x)
        x = self.fc9(x)        
        x = self.fc4(x)
        x = self.fc10(x)
        x = self.fc5(x)
        x = self.fc11(x)
        x = self.fc6(x)  
        x = self.fc12(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

Iteration 0, Epoch 1, Loss: 13.219505310058594, Accuracy: 1.5625, Val Loss: 4.530873775482178, Val Accuracy: 0.10000000149011612
Iteration 700, Epoch 1, Loss: 7.949471473693848, Accuracy: 14.325517654418945, Val Loss: 5.933352470397949, Val Accuracy: 14.0
Iteration 1400, Epoch 2, Loss: 4.8681254386901855, Accuracy: 20.01722526550293, Val Loss: 5.538185119628906, Val Accuracy: 13.899999618530273
Iteration 2100, Epoch 3, Loss: 3.6516146659851074, Accuracy: 17.35775375366211, Val Loss: 3.7411880493164062, Val Accuracy: 10.899999618530273
Iteration 2800, Epoch 4, Loss: 3.0616986751556396, Accuracy: 17.647241592407227, Val Loss: 2.7608256340026855, Val Accuracy: 19.69999885559082
Iteration 3500, Epoch 5, Loss: 2.937623977661133, Accuracy: 20.215961456298828, Val Loss: 2.9398903846740723, Val Accuracy: 12.300000190734863
Iteration 4200, Epoch 6, Loss: 2.1914775371551514, Accuracy: 19.604951858520508, Val Loss: 2.188457489013672, Val Accuracy: 19.400001525878906
Iteration 4900, Epoch 7, Loss:

In [77]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',                                       
                                   kernel_initializer=initializer)
        self.fc7 =  tf.keras.layers.Dropout(0.2)
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc8 =  tf.keras.layers.Dropout(0.2)

        self.fc3 = tf.keras.layers.Dense(96, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc9 =  tf.keras.layers.Dropout(0.2)
        
        self.fc4 = tf.keras.layers.Dense(96, activation='tanh', 
                                   kernel_initializer=initializer)
        self.fc10 =  tf.keras.layers.Dropout(0.2)
        
        self.fc5 = tf.keras.layers.Dense(96, activation='sigmoid',
                                   kernel_initializer=initializer)
        self.fc11 =  tf.keras.layers.Dropout(0.2)
        
        self.fc6 = tf.keras.layers.Dense(96, activation='softmax',                                      
                                   kernel_initializer=initializer)
        self.fc12 =  tf.keras.layers.Dropout(0.2)
        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc7(x)
        x = self.fc2(x)
        x = self.fc8(x)
        x = self.fc3(x)
        x = self.fc9(x)        
        x = self.fc4(x)
        x = self.fc10(x)
        x = self.fc5(x)
        x = self.fc11(x)
        x = self.fc6(x)  
        x = self.fc12(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 7.134275913238525, Accuracy: 0.0, Val Loss: 4.6618852615356445, Val Accuracy: 0.0
Iteration 700, Epoch 1, Loss: 4.712298393249512, Accuracy: 22.416635513305664, Val Loss: 1.8301550149917603, Val Accuracy: 34.900001525878906
Iteration 1400, Epoch 2, Loss: 4.504629611968994, Accuracy: 30.14836311340332, Val Loss: 1.7332100868225098, Val Accuracy: 39.79999923706055
Iteration 2100, Epoch 3, Loss: 4.438045978546143, Accuracy: 32.5589714050293, Val Loss: 1.6576486825942993, Val Accuracy: 43.5
Iteration 2800, Epoch 4, Loss: 4.357651710510254, Accuracy: 34.06875228881836, Val Loss: 1.649827480316162, Val Accuracy: 41.80000305175781
Iteration 3500, Epoch 5, Loss: 4.396771430969238, Accuracy: 35.38539505004883, Val Loss: 1.672469139099121, Val Accuracy: 43.5
Iteration 4200, Epoch 6, Loss: 4.3443121910095215, Accuracy: 35.864925384521484, Val Loss: 1.592858076095581, Val Accuracy: 45.0
Iteration 4900, Epoch 7, Loss: 4.430395603179932, A

In [78]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc3 = tf.keras.layers.Dense(96, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc4 = tf.keras.layers.Dense(96, activation='tanh',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))
        self.fc5 = tf.keras.layers.Dense(96, activation='sigmoid',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))        
        self.fc6 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer, kernel_regularizer = regularizers.l2(0.01))        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
  
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=False)

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 4.866191864013672, Accuracy: 0.0, Val Loss: 4.631097316741943, Val Accuracy: 0.0
Iteration 700, Epoch 1, Loss: 1.9377410411834717, Accuracy: 32.56062698364258, Val Loss: 1.641908884048462, Val Accuracy: 43.20000076293945
Iteration 1400, Epoch 2, Loss: 1.629835844039917, Accuracy: 42.35755157470703, Val Loss: 1.569594383239746, Val Accuracy: 46.29999923706055
Iteration 2100, Epoch 3, Loss: 1.504233717918396, Accuracy: 46.75374984741211, Val Loss: 1.5463755130767822, Val Accuracy: 45.89999771118164
Iteration 2800, Epoch 4, Loss: 1.4029607772827148, Accuracy: 50.937496185302734, Val Loss: 1.5323801040649414, Val Accuracy: 46.70000076293945
Iteration 3500, Epoch 5, Loss: 1.3180736303329468, Accuracy: 53.742061614990234, Val Loss: 1.5497726202011108, Val Accuracy: 47.5
Iteration 4200, Epoch 6, Loss: 1.2424312829971313, Accuracy: 57.03125, Val Loss: 1.5363303422927856, Val Accuracy: 47.70000076293945
Iteration 4900, Epoch 7, Loss: 

In [79]:
""" Custom network with six fully connected layers. 
    Activation Functions Used: Relu, relu, PRrelu, tanh, sigmoid, softmax
    Optimizers Used: Adam
    Learning Rate = 0.003
    Weight Initializer = xavier
    Batch Normalization Usage: No
    Regularization Usage: Yes -- Data Augmentation
"""

class CustomNet(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(CustomNet, self).__init__()        
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
       

        initializer = tf.initializers.glorot_uniform(seed=None)
        self.fc1 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc2 = tf.keras.layers.Dense(96, activation='relu',
                                   kernel_initializer=initializer)
        self.fc3 = tf.keras.layers.Dense(96, activation=tf.keras.layers.PReLU(),
                                   kernel_initializer=initializer)
        self.fc4 = tf.keras.layers.Dense(96, activation='tanh',
                                   kernel_initializer=initializer)
        self.fc5 = tf.keras.layers.Dense(96, activation='sigmoid',
                                   kernel_initializer=initializer)        
        self.fc6 = tf.keras.layers.Dense(96, activation='softmax',
                                   kernel_initializer=initializer)        
        self.flatten = tf.keras.layers.Flatten()
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################      
        
    def call(self, x, training=False):
        ############################################################################
        # TODO: Construct a model that performs well on CIFAR-10                   #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = self.fc6(x)
        return x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                            END OF YOUR CODE                              #
        ############################################################################
        

device = '/device:CPU:0'   # Change this to a CPU/GPU as you wish!
#device = '/cpu:0'        # Change this to a CPU/GPU as you wish!
print_every = 700
num_epochs = 10
input_size, hidden_size, num_classes = 50, 42, 10

model = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return CustomNet(hidden_size, num_classes)

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

X_train, y_train = load_datagen(X_train, y_train)
train_dset = Dataset(X_train, y_train, batch_size=64, shuffle=True)
  
train_part34(model_init_fn, optimizer_init_fn, num_epochs=num_epochs, is_training=True)

Iteration 0, Epoch 1, Loss: 4.4269585609436035, Accuracy: 1.5625, Val Loss: 4.265326023101807, Val Accuracy: 3.5
Iteration 700, Epoch 1, Loss: 1.9244894981384277, Accuracy: 32.353336334228516, Val Loss: 1.6823586225509644, Val Accuracy: 40.20000076293945
Iteration 1400, Epoch 2, Loss: 1.640328049659729, Accuracy: 41.67241668701172, Val Loss: 1.5962867736816406, Val Accuracy: 45.599998474121094
Iteration 2100, Epoch 3, Loss: 1.5141018629074097, Accuracy: 46.38447952270508, Val Loss: 1.563576579093933, Val Accuracy: 45.79999923706055
Iteration 2800, Epoch 4, Loss: 1.4098576307296753, Accuracy: 50.95000457763672, Val Loss: 1.5241942405700684, Val Accuracy: 44.400001525878906
Iteration 3500, Epoch 5, Loss: 1.3240249156951904, Accuracy: 53.91166305541992, Val Loss: 1.5048496723175049, Val Accuracy: 47.60000228881836
Iteration 4200, Epoch 6, Loss: 1.2543821334838867, Accuracy: 56.220115661621094, Val Loss: 1.5359407663345337, Val Accuracy: 48.69999694824219
Iteration 4900, Epoch 7, Loss: 1.1

In [0]:
# Application of batch normalization and regularization usage on fully connected 6 layer network ends.

In [73]:
""" Model Ensemble 


From the models trained above, I identified six of the best models.
It turned out that there were two models from each of the 3, 4 and 6 layer networks.
For the model ensemble, I use models with no batch normalization and regularization since their usage did not prove to make the models any better.
In order to compute the decision, I use the statistical mode and take the most frequently occuring decision as the output.

"""

X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()

# model# 1
input_size, hidden_size, num_classes = 128, 128, 10

model1 = CustomNet(hidden_size, num_classes) 

def model_init_fn():    
    return model1

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

  
train_part34(model_init_fn, optimizer_init_fn, num_epochs= num_epochs, is_training=False)

pred1 = model1.predict(X_test)

# model# 2
input_size, hidden_size, num_classes = 256, 256, 10

model2 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return model2

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs= num_epochs, is_training=False)

pred2 = model2.predict(X_test)

# model# 3
input_size, hidden_size, num_classes = 32, 32, 10

model3 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return model3

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs= num_epochs, is_training=False)

pred3 = model3.predict(X_test)


# model# 4
input_size, hidden_size, num_classes = 96, 96, 10

model4 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return model4

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.SGD(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs= num_epochs, is_training=False)

pred4 = model4.predict(X_test)

 
# model# 5
input_size, hidden_size, num_classes = 128, 128, 10

model5 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return model5

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

train_part34(model_init_fn, optimizer_init_fn, num_epochs= num_epochs, is_training=False)

pred5 = model5.predict(X_test)


#model# 6
input_size, hidden_size, num_classes = 96, 96, 10

model6 = CustomNet(hidden_size, num_classes)

def model_init_fn():
    return model6

def optimizer_init_fn():
    learning_rate = 1e-3
    return tf.keras.optimizers.Adam(learning_rate) 

  
train_part34(model_init_fn, optimizer_init_fn, num_epochs= num_epochs, is_training=False)

pred6 = model6.predict(X_test)


final_pred = np.array([])
for i in range(0,len(X_test)):
    final_pred = np.append(final_pred, stats.mode([pred1[i], pred2[i], pred3[i], pred4[i], pred5[i], pred6[i]]))

print(final_pred)        

X_trian shape (50000, 32, 32, 3)
Iteration 0, Epoch 1, Loss: 2.378326654434204, Accuracy: 9.375, Val Loss: 2.406445264816284, Val Accuracy: 15.600000381469727
Iteration 700, Epoch 1, Loss: 1.8162766695022583, Accuracy: 35.96647644042969, Val Loss: 1.706996202468872, Val Accuracy: 44.20000076293945
Iteration 1400, Epoch 2, Loss: 1.5996094942092896, Accuracy: 43.53312301635742, Val Loss: 1.669964075088501, Val Accuracy: 45.29999923706055
Iteration 2100, Epoch 3, Loss: 1.4700086116790771, Accuracy: 48.0296516418457, Val Loss: 1.7992178201675415, Val Accuracy: 43.29999923706055
Iteration 2800, Epoch 4, Loss: 1.3608421087265015, Accuracy: 51.78750228881836, Val Loss: 1.850010633468628, Val Accuracy: 44.900001525878906
Iteration 3500, Epoch 5, Loss: 1.2544454336166382, Accuracy: 55.365909576416016, Val Loss: 1.833878993988037, Val Accuracy: 46.900001525878906
Iteration 4200, Epoch 6, Loss: 1.1603496074676514, Accuracy: 58.952354431152344, Val Loss: 1.9971764087677002, Val Accuracy: 45.0
Iter

[0.01618557 0.00836718 0.05860651 ... 1.         1.         1.        ]


## Test set 

Now that we've gotten a result we're happy with, we test our final model on the test set (which you should store in best_model). Think about how this compares to your validation set accuracy.

In [76]:
best_model = model5
learning_rate = 1e-3

best_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss='sparse_categorical_crossentropy',
              metrics=[tf.keras.metrics.sparse_categorical_accuracy])
test_loss, test_accuracy = best_model.evaluate(X_test, y_test)
print('test_loss: {}, test_accuracy: {} '.format(test_loss, test_accuracy))

test_loss: 2.7035632587432863, test_accuracy: 0.4453999996185303 
