# Saving and restoring states

In this notebook we see how to save and restore the variables of a computational graph. This is useful, for example, when working with networks that take weeks to train.
We can also save and load only a chosen subset of the network variables. This is useful for transfer learning, where a network is partially initialized with the weights of a previously trained network and then trained with different final layers.

The network implemented here is the simple one used in the CNN notebook.

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
import os.path
import math
BATCH_SIZE = 100
RESTORE_OPT = True
EPOCHS = 1000

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
import tensorflow as tf

def conv_weights(filters_size, channels_size, name):
    shape = filters_size + channels_size
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1), name=name)

def conv(x, W, stride, name):
    strides_shape = [1, stride, stride, 1]
    return tf.nn.conv2d(x, W, strides_shape, padding='SAME', name=name)

def pool(x, size, stride, name):
    pool_shape = [1] + size + [1]
    strides_shape = [1, stride, stride, 1]
    return tf.nn.max_pool(x, pool_shape, strides_shape, padding='SAME', name=name)

In [3]:
x = tf.placeholder(tf.float32, [None, 28 * 28], name='input_images')
y_ = tf.placeholder(tf.float32, [None, 10], name='labels')
input_images = tf.reshape(x, [-1, 28, 28, 1])

In [4]:
W1 = conv_weights([3, 3], [1, 32], 'L1_weights')
b1 = tf.Variable(tf.constant(0.1, shape=[32]), name='L1_biases')
c1 = conv(input_images, W1, stride=2, name='L1_conv')
h1 = tf.nn.relu(tf.nn.bias_add(c1, b1), name='L1_ReLU')
p1 = pool(h1, size=[2, 2], stride=2, name='L1_pool')

In [5]:
W2 = tf.Variable(tf.truncated_normal([7 * 7 * 32, 10], stddev=1 / math.sqrt(7 * 7 * 32)), name='L2_weights')
b2 = tf.Variable(tf.constant(0.1, shape=[10]), name='L2_biases')
p1_flat = tf.reshape(p1, [-1, 7 * 7 * 32])


In [6]:
logits = tf.matmul(p1_flat, W2) + b2
y = tf.nn.softmax(logits, name='softmax')

In [7]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]), name='cross_entropy')
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1), name='correct_prediction')
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='train_accuracy')

sess = tf.InteractiveSession()


In [None]:
save_dir = 'check/'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

In [8]:
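# Save and restore only the first-layer variables; the final layer is re-initialized below.
saver = tf.train.Saver([W1, b1])
if os.path.exists("check/checkpoint") and RESTORE_OPT:
    saver.restore(sess, "check/checkp")
    print("Model restored")
    # W2 and b2 are not in the checkpoint, so they must be initialized explicitly.
    init_new_vars_op = tf.variables_initializer([W2, b2])
    sess.run(init_new_vars_op)
else:
    print("New model initialized")
    init = tf.global_variables_initializer()
    sess.run(init)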



Model restored


To save our data we need a tf.train.Saver object. When the Saver is constructed without arguments, its save method, which we run later, saves every variable in the graph.
Here we show how to save only the variables belonging to the first layer, by passing them to the constructor.
The same holds for initializers: tf.global_variables_initializer acts on all the variables in the graph, while tf.variables_initializer acts only on the variables passed to it.
It is good practice to check for an existing checkpoint before initializing the network.
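
The partial save and restore described above can be sketched in isolation as follows. This is a minimal sketch, assuming a TensorFlow 2.x installation where the TF1-style API used in this notebook is reached through tf.compat.v1. The variables `kept` and `head` and the function `partial_checkpoint_demo` are hypothetical stand-ins for the first-layer and final-layer variables; they are not part of the notebook's network.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # assumption: TF1-style graph API accessed through TF 2.x


def partial_checkpoint_demo():
    """Save one variable, then restore it while re-initializing the other."""
    ckpt_path = os.path.join(tempfile.mkdtemp(), "demo_ckpt")

    with tf.Graph().as_default():
        # Hypothetical variables: `kept` plays the role of the first layer,
        # `head` the role of a final layer that is not checkpointed.
        kept = tf.Variable([1.0, 2.0], name="kept")
        head = tf.Variable([0.0, 0.0], name="head")
        saver = tf1.train.Saver(var_list=[kept])  # only `kept` is saved

        with tf1.Session() as sess:
            sess.run(tf1.global_variables_initializer())
            # Pretend that training changed both variables.
            sess.run([tf1.assign(kept, [10.0, 20.0]),
                      tf1.assign(head, [5.0, 5.0])])
            saver.save(sess, ckpt_path)

        with tf1.Session() as sess:
            saver.restore(sess, ckpt_path)               # restores `kept` only
            sess.run(tf1.variables_initializer([head]))  # `head` starts fresh
            return sess.run(kept).tolist(), sess.run(head).tolist()


kept_val, head_val = partial_checkpoint_demo()
print(kept_val, head_val)
```

In the second session `kept` comes back with its trained values, while `head` returns to its initial values because it was never written to the checkpoint.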

In [10]:
for i in range(EPOCHS):
    batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE)
    _ = sess.run([train_step], feed_dict={x: batch_xs, y_: batch_ys})     
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
# Save the variables to disk.
save_path = saver.save(sess, "check/checkp")
print("Model saved in file: %s" % save_path)


0.9737
Model saved in file: check/checkp


The call to saver.save is what actually writes the state to disk, at the specified path.

In [None]:
sess.close()

# Ex 2
Now let's combine what we have learnt here and in the "Dataset enrichment" notebook.
Let's create a network and feed it with data from the CIFAR-10 dataset (using only 2-3 classes to save training time and see results faster). As the network architecture you can use the same one as in the "Dataset enrichment" notebook, or create your own.

Our goal in this exercise is to test whether the dataset enrichment operations bring an actual improvement. To do so we will:

1) Run the network on the CIFAR-10 data, without preprocessing, and store the network configuration that performs best on the test set after some epochs of training. To do so, write some code that:
- evaluates the accuracy on the test set every 100 iterations, and keeps the maximum accuracy in memory;
- saves the current variables whenever the network improves on the test set;
- stops the training phase when the network has not improved for 400 iterations.

This technique is called "Early stopping" and is used to prevent overfitting.
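
The control flow of the three steps above can be sketched in plain Python. This is only a sketch under stated assumptions: `train_with_early_stopping` and its callbacks `train_step`, `evaluate` and `save` are hypothetical names, and the accuracies are simulated numbers rather than real results.

```python
def train_with_early_stopping(train_step, evaluate, save,
                              eval_every=100, patience=400, max_iters=100000):
    """Train, checkpoint on improvement, stop when test accuracy stalls.

    train_step, evaluate and save are hypothetical callbacks supplied by
    the caller; evaluate must return the current test accuracy.
    """
    best_acc = float("-inf")
    best_iter = 0
    for i in range(1, max_iters + 1):
        train_step()
        if i % eval_every == 0:
            acc = evaluate()
            if acc > best_acc:
                # New best on the test set: remember it and save the variables.
                best_acc, best_iter = acc, i
                save()
            elif i - best_iter >= patience:
                # No improvement for `patience` iterations: stop training.
                break
    return best_acc, best_iter, i


# Simulated test accuracies, one per evaluation (made-up numbers).
fake_accuracies = iter([0.5, 0.6, 0.7, 0.65, 0.69, 0.68, 0.66, 0.9])
saved = []
result = train_with_early_stopping(
    train_step=lambda: None,
    evaluate=lambda: next(fake_accuracies),
    save=lambda: saved.append("checkpoint"),
)
print(result, len(saved))  # (0.7, 300, 700) 3
```

In this notebook's setting, `train_step` would wrap `sess.run(train_step, ...)`, `evaluate` would wrap `sess.run(accuracy, ...)` on the test set, and `save` would wrap `saver.save(sess, ...)`.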

2) Run the same network, but instead of initializing it randomly, start from the configuration of the previously trained network. This time, train the network with the enriched dataset and check whether the test accuracy has improved.

