# What's this TensorFlow business?

You've written a lot of code in this assignment to provide a whole host of neural network functionality. Dropout, Batch Norm, and 2D convolutions are some of the workhorses of deep learning in computer vision. You've also worked hard to make your code efficient and vectorized.

For the last part of this assignment, though, we're going to leave behind your beautiful codebase and instead migrate to one of two popular deep learning frameworks: in this instance, TensorFlow (or PyTorch, if you switch over to that notebook).

#### What is it?
TensorFlow is a system for executing computational graphs over Tensor objects, with native support for performing backpropagation for its Variables. In it, we work with Tensors, which are n-dimensional arrays analogous to the numpy ndarray.

#### Why?

* Our code will now run on GPUs! Much faster training. Writing your own modules to run on GPUs is beyond the scope of this class, unfortunately.
* We want you to be ready to use one of these frameworks for your project so you can experiment more efficiently than if you were writing every feature you want to use by hand. 
* We want you to stand on the shoulders of giants! TensorFlow and PyTorch are both excellent frameworks that will make your lives a lot easier, and now that you understand their guts, you are free to use them :) 
* We want you to be exposed to the sort of deep learning code you might run into in academia or industry. 

## How will I learn TensorFlow?

TensorFlow has many excellent tutorials available, including those from [Google themselves](https://www.tensorflow.org/get_started/get_started).

Otherwise, this notebook will walk you through much of what you need to do to train models in TensorFlow. See the end of the notebook for some links to helpful tutorials if you want to learn more or need further clarification on topics that aren't fully explained here.


# Table of Contents

This notebook has 5 parts. We will walk through TensorFlow at three different levels of abstraction, which should help you better understand it and prepare you for working on your project.

1. Preparation: load the CIFAR-10 dataset.
2. Barebone TensorFlow: we will work directly with low-level TensorFlow graphs. 
3. Keras Model API: we will use `tf.keras.Model` to define arbitrary neural network architecture. 
4. Keras Sequential API: we will use `tf.keras.Sequential` to define a linear feed-forward network very conveniently. 
5. CIFAR-10 open-ended challenge: please implement your own network to get as high accuracy as possible on CIFAR-10. You can experiment with any layer, optimizer, hyperparameters or other advanced features. 

Here is a table of comparison:

| API           | Flexibility | Convenience |
|---------------|-------------|-------------|
| Barebone      | High        | Low         |
| `tf.keras.Model`     | High        | Medium      |
| `tf.keras.Sequential` | Low         | High        |

# Part I: Preparation

First, we load the CIFAR-10 dataset. This might take a few minutes to download the first time you run it, but after that the files should be cached on disk and loading should be faster.

In previous parts of the assignment we used CS231N-specific code to download and read the CIFAR-10 dataset; however, the `tf.keras.datasets` package in TensorFlow provides prebuilt utility functions for loading many common datasets.

For the purposes of this assignment we will still write our own code to preprocess the data and iterate through it in minibatches. The `tf.data` package in TensorFlow provides tools for automating this process, but working with it adds extra complication and is beyond the scope of this notebook. However, `tf.data` can be much more efficient than the simple approach used here, so you should consider using it for your project.

In [1]:
import os
import tensorflow as tf
import numpy as np
import math
import timeit
import matplotlib.pyplot as plt

tf.logging.set_verbosity(tf.logging.INFO)

%matplotlib inline


In [7]:
def load_cifar10(num_training=49000, num_validation=1000, num_test=10000):
    """
    Fetch the CIFAR-10 dataset from the web and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 dataset and use appropriate data types and shapes
    cifar10 = tf.keras.datasets.cifar10.load_data()
    (X_train, y_train), (X_test, y_test) = cifar10
    X_train = np.asarray(X_train, dtype=np.float32)
    y_train = np.asarray(y_train, dtype=np.int32).flatten()
    X_test = np.asarray(X_test, dtype=np.float32)
    y_test = np.asarray(y_test, dtype=np.int32).flatten()

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean pixel and divide by std
    mean_pixel = X_train.mean(axis=(0, 1, 2), keepdims=True)
    std_pixel = X_train.std(axis=(0, 1, 2), keepdims=True)
    X_train = (X_train - mean_pixel) / std_pixel
    X_val = (X_val - mean_pixel) / std_pixel
    X_test = (X_test - mean_pixel) / std_pixel

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
NHW = (0, 1, 2)
X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape, y_train.dtype)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,) int32
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)
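
The normalization step in `load_cifar10` reduces over the batch, height, and width axes, which leaves one mean and one standard deviation per color channel. The following is a minimal NumPy sketch of that step on synthetic data; the array `X_fake` and the seed are assumptions made for illustration and are not CIFAR-10 values.

```python
import numpy as np

# Synthetic stand-in for a batch of images with layout (N, H, W, C).
# The pixel values are random; they are NOT real CIFAR-10 data.
rng = np.random.RandomState(0)
X_fake = rng.uniform(0, 255, size=(100, 32, 32, 3)).astype(np.float32)

# Reduce over batch, height, and width; keepdims=True keeps the result
# broadcastable against the (N, H, W, C) array.
mean_pixel = X_fake.mean(axis=(0, 1, 2), keepdims=True)
std_pixel = X_fake.std(axis=(0, 1, 2), keepdims=True)
X_norm = (X_fake - mean_pixel) / std_pixel

print(mean_pixel.shape)              # (1, 1, 1, 3): one statistic per channel
print(X_norm.mean(axis=(0, 1, 2)))   # approximately 0 for each channel
print(X_norm.std(axis=(0, 1, 2)))    # approximately 1 for each channel
```

Note that the notebook computes these statistics on the training split only and reuses them for the validation and test splits, so no information from held-out data leaks into preprocessing.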


### Preparation: Dataset object

For our own convenience we'll define a lightweight `Dataset` class which lets us iterate over data and labels. This is not the most flexible or most efficient way to iterate through data, but it will serve our purposes.

In [8]:
class Dataset(object):
    def __init__(self, X, y, batch_size, shuffle=False):
        """
        Construct a Dataset object to iterate over data X and labels y
        
        Inputs:
        - X: Numpy array of data, of any shape
        - y: Numpy array of labels, of any shape but with y.shape[0] == X.shape[0]
        - batch_size: Integer giving number of elements per minibatch
        - shuffle: (optional) Boolean, whether to shuffle the data on each epoch
        """
        assert X.shape[0] == y.shape[0], 'Got different numbers of data and labels'
        self.X, self.y = X, y
        self.batch_size, self.shuffle = batch_size, shuffle

    def __iter__(self):
        N, B = self.X.shape[0], self.batch_size
        idxs = np.arange(N)
        if self.shuffle:
            np.random.shuffle(idxs)
        # Index through idxs so that shuffling actually changes the batch contents
        return iter((self.X[idxs[i:i+B]], self.y[idxs[i:i+B]]) for i in range(0, N, B))


train_dset = Dataset(X_train, y_train, batch_size=64, shuffle=True)
val_dset = Dataset(X_val, y_val, batch_size=64, shuffle=False)
test_dset = Dataset(X_test, y_test, batch_size=64)

In [9]:
# We can iterate through a dataset like this:
for t, (x, y) in enumerate(train_dset):
    print(t, x.shape, y.shape)
    if t > 5: break

0 (64, 32, 32, 3) (64,)
1 (64, 32, 32, 3) (64,)
2 (64, 32, 32, 3) (64,)
3 (64, 32, 32, 3) (64,)
4 (64, 32, 32, 3) (64,)
5 (64, 32, 32, 3) (64,)
6 (64, 32, 32, 3) (64,)
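
One way to iterate in shuffled order while keeping data and labels aligned is to index both arrays with the same permuted index array. The sketch below uses a hypothetical generator, `iterate_minibatches`, and toy arrays; neither is part of the assignment code. It also shows that the final batch can be smaller than `batch_size` when the dataset size is not a multiple of it.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, shuffle=False, seed=None):
    """Yield (X_batch, y_batch) pairs. Hypothetical helper for illustration."""
    N = X.shape[0]
    idxs = np.arange(N)
    if shuffle:
        # Permute the indices once per pass over the data.
        np.random.RandomState(seed).shuffle(idxs)
    for i in range(0, N, batch_size):
        batch = idxs[i:i + batch_size]
        # The same index array selects rows from both X and y.
        yield X[batch], y[batch]

# Toy data where row i of X holds the value i and its label is also i.
X_toy = np.arange(10).reshape(10, 1)
y_toy = np.arange(10)

batches = list(iterate_minibatches(X_toy, y_toy, batch_size=4, shuffle=True, seed=0))
print([xb.shape[0] for xb, _ in batches])  # [4, 4, 2]
```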


In [10]:
# Set up some global variables
USE_GPU = False

if USE_GPU:
    device = '/device:GPU:0'
else:
    device = '/cpu:0'

# Constant to control how often we print when training models
print_every = 100

print('Using device: ', device)

Using device:  /cpu:0


In [13]:
from cs231n.my_cnn_tf.my_cnn import my_initial_model

def test_my_initial_model():
    tf.reset_default_graph()
    
    params = {}
    params['conv_sizes'] = [3]
    params['channels'] = [16]
    params['conv_wt_std'] = 5e-2
    params['fc_sizes'] = [200]
    params['num_classes'] = 10
    params['dropout_p'] = 0.5
    params['weight_decay_reg'] = 0.1
    params['is_training'] = 1
    
    model = my_initial_model
    with tf.device(device):
        x = tf.zeros((64, 32, 32, 3))
        scores = model(x, params)
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores)
        print(scores_np.shape)

test_my_initial_model()

NameError: name 'tf' is not defined

In [104]:
def loss_func(logits, labels):
    # Calculate the average cross entropy loss across the batch.
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, 
                                                                   name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)

    # The total loss is defined as the cross entropy loss plus all of the weight
    # decay terms (L2 loss).
    return tf.add_n(tf.get_collection('losses'), name='total_loss')
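
To make the quantity returned by `loss_func` concrete, here is a NumPy sketch of the same computation: mean sparse softmax cross entropy plus a weight-decay term. The weight matrix `W`, the coefficient `reg`, and the form `0.5 * reg * sum(W ** 2)` are assumptions for illustration (the factor 0.5 mirrors the convention of `tf.nn.l2_loss`); the actual weight-decay terms are added to the `'losses'` collection inside the model code, which is not shown in this notebook.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross entropy for integer class labels."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(logits.shape[0]), labels]

# All-zero logits give a uniform prediction over 10 classes.
logits = np.zeros((4, 10))
labels = np.array([0, 3, 5, 9])
data_loss = sparse_softmax_cross_entropy(logits, labels).mean()

# Hypothetical weight-decay term (assumed form, see the text above).
W = np.ones((2, 2))
reg = 0.1
weight_decay = 0.5 * reg * np.sum(W ** 2)

total_loss = data_loss + weight_decay
print(data_loss)   # ln(10), about 2.3026
print(total_loss)  # ln(10) + 0.2
```

A uniform prediction over 10 classes gives a cross entropy of ln(10), roughly 2.30, which is a useful sanity check for the first loss values of an untrained CIFAR-10 classifier.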

In [122]:
def _train_step(total_loss, training_params, global_step):
    # Variables that affect learning rate.
    num_batches_per_epoch = training_params['ex_per_epoch'] / training_params['batch_size']
    decay_steps = int(num_batches_per_epoch * training_params['num_epochs_per_decay'])
    
    # Decay the learning rate exponentially based on the number of steps.
    learning_rate = tf.train.exponential_decay(training_params['initial_lr'],
                               global_step,
                               decay_steps,
                               training_params['lr_decay'],
                               staircase=False)
    
    # Update.
    opt = training_params['optimizer'](learning_rate)
    train_op = opt.minimize(total_loss, global_step=global_step)
    
    return train_op
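
The schedule produced by `tf.train.exponential_decay` follows `initial_lr * decay_rate ** (global_step / decay_steps)`; with `staircase=True` the exponent is floored to an integer. The plain-Python sketch below reproduces that formula using the values passed to `train` later in this notebook (49000 examples, batch size 128, `initial_lr=1e-5`, `lr_decay=0.1`). The function name `exponential_decay` here is a local stand-in, not the TensorFlow op.

```python
def exponential_decay(initial_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Plain-Python version of the exponential decay formula."""
    if staircase:
        # Integer division: the rate drops in discrete jumps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / float(decay_steps)
    return initial_lr * decay_rate ** exponent

# Same arithmetic as _train_step, with one epoch per decay period.
ex_per_epoch, batch_size, num_epochs_per_decay = 49000, 128, 1
decay_steps = int(ex_per_epoch / float(batch_size) * num_epochs_per_decay)
print(decay_steps)  # 382

for step in (0, 191, 382, 764):
    print(step, exponential_decay(1e-5, step, decay_steps, 0.1))
```

With these settings the learning rate shrinks by a factor of 10 roughly once per epoch, which is an aggressive schedule for a starting rate as small as 1e-5.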

In [179]:
def _check_accuracy(sess, dset, x, scores, is_training=None):
    num_correct, num_samples = 0, 0
    for x_batch, y_batch in dset:
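        feed_dict = {x: x_batch}
        # Only feed is_training when a tensor was passed; None is not a valid feed key
        if is_training is not None:
            feed_dict[is_training] = 0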
        scores_np = sess.run(scores, feed_dict=feed_dict)
        y_pred = scores_np.argmax(axis=1)
        num_samples += x_batch.shape[0]
        num_correct += (y_pred == y_batch).sum()
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
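
`_check_accuracy` accumulates counts of correct predictions across batches instead of averaging per-batch accuracies. The two differ whenever the last batch is smaller than the others. Below is a small NumPy sketch of the counting approach; the function `accuracy_over_batches` and the toy scores are made up for illustration.

```python
import numpy as np

def accuracy_over_batches(batches):
    """Accumulate correct/total counts over (scores, labels) batches."""
    num_correct, num_samples = 0, 0
    for scores, labels in batches:
        preds = scores.argmax(axis=1)  # predicted class = highest score
        num_correct += int((preds == labels).sum())
        num_samples += labels.shape[0]
    return num_correct, num_samples

# Two toy batches of unequal size (2 examples, then 1 example).
toy_batches = [
    (np.array([[0.1, 0.9], [0.8, 0.2]]), np.array([1, 1])),
    (np.array([[0.3, 0.7]]), np.array([1])),
]

num_correct, num_samples = accuracy_over_batches(toy_batches)
print('Got %d / %d correct (%.2f%%)'
      % (num_correct, num_samples, 100.0 * num_correct / num_samples))
# Got 2 / 3 correct (66.67%)
```

Averaging the per-batch accuracies here would give (0.5 + 1.0) / 2 = 75%, which overweights the single example in the short batch.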

In [198]:
def train(batch_size=64, num_epochs=10, optimizer=tf.train.AdamOptimizer, initial_lr=1e-3, lr_decay=0.5, print_every=100):
    
    # Loading Data
    train_dset = Dataset(X_train, y_train, batch_size=batch_size, shuffle=True)
    val_dset = Dataset(X_val, y_val, batch_size=batch_size, shuffle=False)
    test_dset = Dataset(X_test, y_test, batch_size=batch_size)
    
    params = {}
    params['conv_sizes'] = [3]
    params['channels'] = [16]
    params['conv_wt_std'] = 5e-2
    params['fc_sizes'] = [16*16*16, 200]
    params['num_classes'] = 10
    params['dropout_p'] = 0.5
    params['weight_decay_reg'] = 0.1
    params['is_training'] = 1
    
    training_params = {}
    training_params['num_epochs'] = num_epochs
    training_params['ex_per_epoch'] = train_dset.y.shape[0]
    training_params['batch_size'] = batch_size
    training_params['num_epochs_per_decay'] = 1
    training_params['num_batches_per_epoch'] = int(training_params['ex_per_epoch'] / training_params['batch_size'])
    training_params['num_ex_per_decay'] = training_params['ex_per_epoch'] * training_params['num_epochs_per_decay']
    training_params['initial_lr'] = initial_lr
    training_params['lr_decay'] = lr_decay
    training_params['optimizer'] = optimizer
    training_params['print_every'] = print_every
    training_params['logs_path'] = '/media/advait/DATA/Advait/Handouts_and_assignments/Ubuntu-Documents/CNN-Stanford/spring1718_assignment2_v2/TestLog'
    
    
    tf.reset_default_graph()
    with tf.device('/cpu:0'):
        x = tf.placeholder(tf.float32, [None, 32, 32, 3])
        y = tf.placeholder(tf.int32, [None])
        # global_step = tf.Variable(0, trainable=False)
        global_step = tf.train.get_or_create_global_step()
    
        # Build a Graph that computes the logits predictions from the
        # inference model.
        logits = my_initial_model(x, params)

        # Calculate loss.
        loss = loss_func(logits, y)

        # Build a Graph that trains the model with one batch of examples and
        # updates the model parameters.
        training_op = _train_step(loss, training_params, global_step)
        
        # check_accuracy
        acc_op = tf.metrics.accuracy(labels=y, predictions=tf.argmax(logits, 1))
        # acc_train_op = _check_accuracy(logits, y)
        # correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(labels, 1))
        # accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        
        # Summary ops
        tf.summary.scalar("loss", loss)
        
        merged_summary_op = tf.summary.merge_all()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        
        # Writer that logs summaries and the graph for TensorBoard
        summary_writer = tf.summary.FileWriter(training_params['logs_path'], graph=tf.get_default_graph())
        
        loss_hist = []
        av_loss = 0.0
        # Step counter
        t = 0
        for epoch in range(training_params['num_epochs']):
            print('Starting epoch %d' % epoch)
            for x_np, y_np in train_dset:
                feed_dict = {x: x_np, y: y_np}
                loss_val, _, summary = sess.run([loss, training_op, merged_summary_op], feed_dict=feed_dict)
                av_loss += loss_val
                
                if t % training_params['print_every'] == 0:
                    # Add to TensorBoard; t already counts steps across epochs
                    summary_writer.add_summary(summary, t)
                    print("Loss after %d iterations:" % t,  "{:.9f}".format(av_loss / training_params['print_every']))
                    print("Training Acc: ")
                    _check_accuracy(sess, train_dset, x, logits, is_training=tf.constant(0))
                    print("Validation Acc: ")
                    _check_accuracy(sess, val_dset, x, logits, is_training=tf.constant(0))
                    # print("training accuracy: %f" % acc_op.eval(feed_dict={x:train_dset.X, y:train_dset.y}))
                    loss_hist.append(av_loss / training_params['print_every'])
                    av_loss = 0.0
                t += 1
            

In [199]:
train(batch_size=128, num_epochs=6, optimizer=tf.train.RMSPropOptimizer, initial_lr=1e-5, lr_decay=0.1, print_every=100)

Starting epoch 0
('Loss after 0 iterations:', '0.023617277')
Training Acc: 
Got 5321 / 49000 correct (10.86%)
Validation Acc: 
Got 106 / 1000 correct (10.60%)
('Loss after 100 iterations:', '2.335769985')
Training Acc: 
Got 6089 / 49000 correct (12.43%)
Validation Acc: 
Got 137 / 1000 correct (13.70%)
('Loss after 200 iterations:', '2.247766931')
Training Acc: 
Got 10481 / 49000 correct (21.39%)
Validation Acc: 
Got 236 / 1000 correct (23.60%)
('Loss after 300 iterations:', '2.160851319')
Training Acc: 
Got 12064 / 49000 correct (24.62%)
Validation Acc: 
Got 252 / 1000 correct (25.20%)
Starting epoch 1
('Loss after 400 iterations:', '2.110767949')
Training Acc: 
Got 12700 / 49000 correct (25.92%)
Validation Acc: 
Got 252 / 1000 correct (25.20%)
('Loss after 500 iterations:', '2.086508203')
Training Acc: 
Got 13089 / 49000 correct (26.71%)
Validation Acc: 
Got 279 / 1000 correct (27.90%)
('Loss after 600 iterations:', '2.070028788')
Training Acc: 
Got 13289 / 49000 correct (27.12%)
Vali

KeyboardInterrupt: 
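
In the log above, the loss reported after 0 iterations (about 0.0236) is roughly 1/100 of the later values. This appears to come from the training loop dividing the accumulated loss by `print_every` even though only a single step has run at that point. One possible fix, sketched below in plain Python, is to divide by the number of steps actually accumulated. The helper `windowed_means` and the constant losses are hypothetical and only illustrate the bookkeeping.

```python
def windowed_means(losses, print_every):
    """Return (step, mean_loss) pairs, averaging over the steps seen since the last report."""
    reports = []
    total, count = 0.0, 0
    for t, loss in enumerate(losses):
        total += loss
        count += 1
        if t % print_every == 0:
            # Divide by the real number of accumulated steps, not print_every.
            reports.append((t, total / count))
            total, count = 0.0, 0
    return reports

# Made-up constant losses so the expected averages are obvious.
fake_losses = [2.0] * 250
print(windowed_means(fake_losses, print_every=100))
# [(0, 2.0), (100, 2.0), (200, 2.0)]
```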