## Model Training Example
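This notebook demonstrates how to wrap a TensorFlow model in a class and how to save and load the model's weights through that class.

It implements a class that stores the basic skeleton of a TensorFlow model (graph, session, checkpointing, and TensorBoard summaries) plus a single training-step method; it leaves out prediction methods. The code targets the TensorFlow 1.x API (`tf.Session`, `tf.placeholder`, `tf.contrib`).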

In [1]:
import tensorflow as tf
import numpy as np

---
### Generate Data

In [2]:
from sklearn import datasets

In [3]:
iris_dataset = datasets.load_iris()

In [4]:
iris_dataset.data.shape

(150, 4)

In [5]:
iris_dataset.target.shape

(150,)

### Model Class

We can use classes to encapsulate TensorFlow models. The skeleton below shows one way of doing so.

The model class has methods, attributes, and properties that capture both the graph and the TensorFlow session.

#### TensorFlow Graph

A TensorFlow graph is a computational graph of TensorFlow operations. It defines the computation and how operations and tensors relate to one another, but it does not perform the computation or store the values of variables. Both of those happen within a TensorFlow session.

#### TensorFlow Session

A TensorFlow session is the context in which variables take on values and computations are run. In a sense the graph is stateless: state is stored in sessions. Variables must therefore be initialized within a session; saving a model's weights means saving the variable values held by a session, and loading weights means loading them into a session. The session also runs the computations, so training steps must be run in the session as well.

A session is instantiated with a graph, typically the current default graph, and can only run computations on that graph.
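
The split between a stateless graph and a stateful session can be illustrated without TensorFlow at all. The following is a plain-Python analogy, not TensorFlow code: `ToyGraph` and `ToySession` are hypothetical names invented for this sketch. The graph only records variable initializers and how operations are computed, while each session keeps its own copy of the variable values.

```python
class ToyGraph:
    """Describes variables and operations; stores no values."""

    def __init__(self):
        self.initializers = {}   # variable name -> function returning the initial value
        self.operations = {}     # operation name -> function of the variable values

    def add_variable(self, name, initializer):
        self.initializers[name] = initializer

    def add_operation(self, name, fn):
        self.operations[name] = fn


class ToySession:
    """Holds variable values for one graph and runs its operations."""

    def __init__(self, graph):
        self.graph = graph
        self.values = {}         # state lives here, not in the graph

    def initialize(self):
        for name, initializer in self.graph.initializers.items():
            self.values[name] = initializer()

    def assign(self, name, value):
        if name not in self.graph.initializers:
            raise KeyError("variable %r is not part of this session's graph" % name)
        self.values[name] = value

    def run(self, op_name):
        return self.graph.operations[op_name](self.values)


graph = ToyGraph()
graph.add_variable("w", lambda: 0.0)
graph.add_operation("double_w", lambda values: 2 * values["w"])

# Two sessions share one graph but keep independent state.
session_a = ToySession(graph)
session_b = ToySession(graph)
session_a.initialize()
session_b.initialize()

session_a.assign("w", 3.0)
print(session_a.run("double_w"))  # 6.0
print(session_b.run("double_w"))  # 0.0, session_b never saw the assignment
```

In this analogy, saving a model corresponds to writing out `session_a.values`, and loading corresponds to filling the `values` of a fresh session built on the same graph.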

In [6]:
class DeepLearningModel():
    def __init__(self):
        return
    
    def gen_uniform_random_weights(self, k_out, k_in, scale, dtype=np.float32):
        """
        Returns weights of shape (k_in, k_out) initialized between [-scale, scale]
        """
        return ((np.random.rand(k_in, k_out) * 2 - 1) * scale).astype(dtype)

    def gen_random_weights_tanh(self, k_out, k_in, dtype=np.float32):
        scale = (6. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_random_weights_sigmoid(self, k_out, k_in, dtype=np.float32):
        scale = 4. * (6. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_random_weights_reLu(self, k_out, k_in, dtype=np.float32):
        scale = (2. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_biases(self, k, dtype=np.float32):
        """
        Initialize biases as zero.
        """
        return np.zeros((k, ), dtype=dtype)
    
    def clip_gradient(self, grad, magnitude=1.0):
        """Return the gradient clipped elementwise to [-magnitude, magnitude]."""
        magnitude = abs(magnitude)
        return tf.maximum(tf.minimum(grad, magnitude), - magnitude)
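
As a quick sanity check on the helpers above, the sketch below recomputes the uniform bounds for a layer with 4 inputs and 10 outputs and applies the same clipping rule to plain numbers. It uses only the standard library; `uniform_bound`, `sample_weights`, and `clip` are hypothetical stand-ins written for this example, not methods of the class.

```python
import math
import random


def uniform_bound(k_in, k_out, numerator=6.0, gain=1.0):
    """Bound `scale` such that weights are drawn uniformly from [-scale, scale)."""
    return gain * math.sqrt(numerator / (k_in + k_out))


def sample_weights(k_in, k_out, scale, rng):
    """Plain-Python analogue of gen_uniform_random_weights: a k_in x k_out nested list."""
    return [[(rng.random() * 2 - 1) * scale for _ in range(k_out)] for _ in range(k_in)]


def clip(value, magnitude=1.0):
    """Same rule as clip_gradient, applied to a single number."""
    magnitude = abs(magnitude)
    return max(min(value, magnitude), -magnitude)


k_in, k_out = 4, 10
tanh_scale = uniform_bound(k_in, k_out)                  # sqrt(6 / 14)
sigmoid_scale = uniform_bound(k_in, k_out, gain=4.0)     # 4 * sqrt(6 / 14)
relu_scale = uniform_bound(k_in, k_out, numerator=2.0)   # sqrt(2 / 14)

rng = random.Random(0)
weights = sample_weights(k_in, k_out, tanh_scale, rng)

print(tanh_scale, sigmoid_scale, relu_scale)
# Every sampled weight falls inside the half-open interval [-scale, scale).
print(all(-tanh_scale <= w < tanh_scale for row in weights for w in row))
print([clip(g) for g in (-3.0, -0.5, 0.25, 7.0)])  # [-1.0, -0.5, 0.25, 1.0]
```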

In [15]:
cell = tf.contrib.rnn.BasicRNNCell(10)

In [16]:
cell.weights

[]

In [18]:
cell.variables

[]

In [9]:
class RecurrentNetworkModel(DeepLearningModel):
    """
    Tutorial Model
    """
    
    def __init__(self, num_layers, k_rnncell, k_input, time_steps):
        """
        args:
            num_layers: number of stacked recurrent layers
            k_rnncell: number of units in each recurrent cell
            k_input: dimensionality of the input at each time step
            time_steps: number of time steps in each input sequence
        """
        self._graph = None
        self._session = None
        self.num_layers = num_layers
        self.k_rnncell = k_rnncell
        self.k_input = k_input
        self.time_steps = time_steps
        
        self._merged_training_summary = None
        self._merged_validation_summary = None
        self._tensorboard_writer = None
    
    
    def load_model(self, model_filename):
        with self.graph.as_default():
            model_saver = tf.train.Saver()
        
        self._session = tf.Session(graph=self.graph)
        model_saver.restore(self._session, model_filename)
        return
    
    
    def save_model(self, model_filename):
        with self.graph.as_default():
            model_saver = tf.train.Saver()
            
        model_saver.save(self.session, model_filename)
        
    def create_graph(self):
        self.cells = {}
                        
        self._graph = tf.Graph()
        with self._graph.as_default():
            with tf.name_scope("inputs"):
                self.data = tf.placeholder(tf.float32, shape=(None, self.time_steps, self.k_input))
            with tf.name_scope("recurrent_layers"):
                cells = []
                for layer in range(1, self.num_layers + 1):
                    cell = tf.contrib.rnn.BasicRNNCell(self.k_rnncell)
                    cells.append(cell)
                rnn_cells = tf.contrib.rnn.MultiRNNCell(cells)
            
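            # NOTE: the cost and optimization blocks below still reference attributes
            # (self.W, self.Y, self.Z, self.learning_rate, self.adam_*) that this class
            # never defines, so building the graph raises the AttributeError shown further down.
            with tf.name_scope("cost"):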
                with tf.name_scope("regularization"):
                    self.L2_reg = tf.placeholder(tf.float32, name="L2_reg")
                    for layer in range(1, self.num_layers + 1):
                        if layer == 1:
                            k_in = self.k_input
                            self.cost_L2 = self.L2_reg * tf.reduce_mean(tf.square(self.W[(layer, layer - 1)]))
                        else:
                            k_in = self.k_hidden
                            self.cost_L2 = self.cost_L2 + self.L2_reg * tf.reduce_mean(tf.square(self.W[(layer, layer - 1)]))
                    self.cost_L2 = tf.identity(self.cost_L2, 'cost_L2_regularization')
                
                with tf.name_scope("error"):
                    self.cross_entropy_error = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.Y, logits=self.Z["softmax"]))
                self.total_cost = tf.add(self.cost_L2, self.cross_entropy_error)
            
            with tf.name_scope("optimization"):
                self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=self.adam_beta1, beta2=self.adam_beta2, epsilon=self.adam_epsilon)
                self.grads_and_vars = self.optimizer.compute_gradients(self.total_cost)                    
                self.clipped_grads_and_vars = [(self.clip_gradient(gv[0]), gv[1]) for gv in self.grads_and_vars]
                self.update_op = self.optimizer.apply_gradients(self.clipped_grads_and_vars)
                
            self.init_op = tf.global_variables_initializer()
        return
            
    def create_tensorboard_summaries(self):
        with self.graph.as_default():
            with tf.name_scope("summaries"):
                tf.summary.scalar('cross_entropy_error', self.cross_entropy_error, collections=['train'])
                tf.summary.scalar('cost_L2_regularization', self.cost_L2, collections=['train'])

                tf.summary.scalar('cross_entropy_error_validation', self.cross_entropy_error, collections=['validation'])

                for layer in range(1, self.num_layers + 1):
                    tf.summary.histogram("W_%i_%i" % (layer, layer - 1), self.W[(layer, layer - 1)], collections=['train'])
                    tf.summary.histogram("b_%i" % (layer, ), self.b[layer], collections=['train'])
                for layer in ["softmax"]:
                    tf.summary.histogram("W_%s_%i" % (layer, self.num_layers), self.W[(layer, self.num_layers)], collections=['train'])
                    tf.summary.histogram("b_%s" % (layer, ), self.b[layer], collections=['train'])

                self._merged_training_summary = tf.summary.merge_all(key='train')
                self._merged_validation_summary = tf.summary.merge_all(key='validation')
        return
    
    def create_tensorboard_writer(self, tensorboard_directory="./"):
        """Create the FileWriter. It takes the graph directly and does not require a session."""
        self._tensorboard_writer = tf.summary.FileWriter(tensorboard_directory, graph=self.graph)
    
    def write_graph(self):
        self.tensorboard_writer.add_graph(self.graph)
        return
    
    def init_model(self, adam_beta1=0.9, adam_beta2=0.999):
        self.session.run(self.init_op, 
                         feed_dict={
                             self.adam_beta1: adam_beta1,
                             self.adam_beta2: adam_beta2
                         })
    
    def train_model(self, X, Y, learning_rate=1e-2, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, 
                    input_dropout_keep_prob=1.0, hidden_dropout_keep_prob=1.0,
                    L2_reg=1e-4):
        """
        Run one optimization step on the batch (X, Y).

        learning_rate: A Tensor or a floating point value. The learning rate.
        adam_beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
        adam_beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
        adam_epsilon: A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
        L2_reg: Weight of the L2 regularization term in the total cost.
        """
        self.session.run(self.update_op,
                         feed_dict = {
                             self.X: X,
                             self.Y: Y,
                             self.learning_rate: learning_rate,
                             self.adam_beta1: adam_beta1,
                             self.adam_beta2: adam_beta2,
                             self.adam_epsilon: adam_epsilon,
                             self.input_dropout_keep_prob: input_dropout_keep_prob,
                             self.hidden_dropout_keep_prob: hidden_dropout_keep_prob,
                             self.L2_reg: L2_reg
                         })
        return
    
    def write_validation_summary(self, X, Y, step,
                                 L2_reg=1e-4):
        summary = self.session.run(self.merged_validation_summary,
                                   feed_dict = {
                                       self.X: X,
                                       self.Y: Y,
                                       self.input_dropout_keep_prob: 1.0,
                                       self.hidden_dropout_keep_prob: 1.0,
                                       self.L2_reg: L2_reg
                                   })
        self.tensorboard_writer.add_summary(summary, step)
        return
    
    def write_training_summary(self, X, Y, step,
                               L2_reg=1e-4):
        summary = self.session.run(self.merged_training_summary,
                                   feed_dict = {
                                       self.X: X,
                                       self.Y: Y,
                                       self.input_dropout_keep_prob: 1.0,
                                       self.hidden_dropout_keep_prob: 1.0,
                                       self.L2_reg: L2_reg
                                   })
        self.tensorboard_writer.add_summary(summary, step)
        return
    
    
    @property
    def graph(self):
        if self._graph is None:
            self.create_graph()
        return self._graph
    
    @property
    def session(self):
        if self._session is None:
            self._session = tf.Session(graph=self.graph)
        return self._session
    
    @property
    def merged_training_summary(self):
        if self._merged_training_summary is None:
            self.create_tensorboard_summaries()
        return self._merged_training_summary
    
    @property
    def merged_validation_summary(self):
        if self._merged_validation_summary is None:
            self.create_tensorboard_summaries()
        return self._merged_validation_summary
    
    @property
    def tensorboard_writer(self):
        if self._tensorboard_writer is None:
            self.create_tensorboard_writer()
        return self._tensorboard_writer

In [10]:
num_layers = 2
k_rnncell = 10
k_input = 4
time_steps = 10

In [12]:
# Create a model instance with 2 recurrent layers of 10 units each.

model_a = RecurrentNetworkModel(num_layers,  k_rnncell, k_input, time_steps)

In [13]:
model_a.graph

AttributeError: RecurrentNetworkModel instance has no attribute 'W'

### Lazy Properties

The class doesn't actually create the graph or session until the `graph` and `session` properties are first accessed. The `@property` methods above create the graph or session if none exists yet, and return the cached object on later accesses.
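
The lazy-property pattern can be seen in isolation in the sketch below. `LazyModel` is a hypothetical stand-in that uses plain dictionaries instead of a `tf.Graph` and `tf.Session`, and `build_log` exists only to show when each object gets built.

```python
class LazyModel:
    """Minimal stand-in for the model class: resources are built on first access."""

    def __init__(self):
        self._graph = None
        self._session = None
        self.build_log = []          # records what was built, for illustration only

    def create_graph(self):
        self.build_log.append("graph")
        self._graph = {"name": "toy graph"}   # placeholder object, not a tf.Graph

    @property
    def graph(self):
        if self._graph is None:
            self.create_graph()
        return self._graph

    @property
    def session(self):
        if self._session is None:
            # Accessing self.graph here builds the graph first if needed.
            graph = self.graph
            self.build_log.append("session")
            self._session = {"graph": graph}
        return self._session


model = LazyModel()
print(model._graph is None)      # True: nothing is built at construction time
model.session                    # triggers graph creation, then session creation
print(model.build_log)           # ['graph', 'session']
model.graph                      # already cached, nothing is rebuilt
print(model.build_log)           # ['graph', 'session']
```

One consequence of this design is that the order of first access does not matter: asking for the session first still builds the graph, because the `session` property reads `self.graph`.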

In [None]:
model_a._graph is None

In [None]:
model_a._session is None

In [None]:
model_a.graph

In [None]:
model_a.session

In [None]:
model_a.W

In [None]:
model_a.b

In [None]:
model_a.merged_training_summary

In [None]:
model_a.merged_validation_summary

In [None]:
model_a.merged_training_summary

In [None]:
model_a.merged_validation_summary

In [None]:
model_a.init_model()

In [None]:
model_a.session.run(model_a.W)

In [None]:
model_a.session.run(model_a.b)

In [None]:
model_a.create_tensorboard_writer("./tensorboard/model_a")

In [None]:
model_a.write_graph()

### Assignment Operations

The class adds some convenience functions for assigning weights. TensorFlow changes the value of a variable by running an assignment operation, so each function combines a placeholder with an assignment operation: the new value is fed to the placeholder, and running the operation copies it into the variable.
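
A minimal sketch of the placeholder-plus-assign pattern described above, written against `tf.compat.v1` so that it should also run under TensorFlow 2.x. The names `assign_W_op` and `new_W_value` and the free-standing `assign_W` helper are assumptions modelled on the calls in the cells below; the class's actual implementation is not shown in this notebook.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TensorFlow 1.x-style API; also available under TensorFlow 2.x

graph = tf.Graph()
with graph.as_default():
    # A variable standing in for one weight matrix of the model.
    W = tf1.get_variable("W_1_0", shape=(4, 10), dtype=tf.float32,
                         initializer=tf1.zeros_initializer())
    # The placeholder carries the new value into the graph at run time.
    new_W_value = tf1.placeholder(tf.float32, shape=(4, 10), name="new_W_value")
    # The assignment operation copies the placeholder's value into the variable.
    assign_W_op = tf1.assign(W, new_W_value)
    init_op = tf1.global_variables_initializer()


def assign_W(session, value):
    """Hypothetical helper: run the assignment op, feeding `value` to the placeholder."""
    session.run(assign_W_op, feed_dict={new_W_value: value})


with tf1.Session(graph=graph) as session:
    session.run(init_op)
    before = session.run(W)
    assign_W(session, np.ones((4, 10), dtype=np.float32))
    after = session.run(W)

print(before.sum(), after.sum())  # 0.0 40.0
```

Creating the placeholder and the assignment operation once, at graph-construction time, keeps repeated assignments from adding new nodes to the graph on every call.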

In [None]:
model_a.W

In [None]:
model_a.W.keys()

In [None]:
model_a.new_W_value

In [None]:
model_a.assign_W((1, 0), np.ones((4, 10)))

In [None]:
model_a.assign_W(('softmax', 2), np.ones((10, 3)))

In [None]:
model_a.assign_W((2, 1), np.ones((10, 10)))

In [None]:
model_a.session.run(model_a.W)

In [None]:
model_a.b

In [None]:
model_a.new_b_value

In [None]:
model_a.assign_b(1, np.ones((10,)))

In [None]:
model_a.assign_b('softmax', np.ones((3, )))

In [None]:
model_a.assign_b(2, np.ones((10,)))

In [None]:
model_a.session.run(model_a.b)

In [None]:
model_a.save_model("./saved_model/test_saved_model.cpkt")

### Load the saved model into a second model

In [None]:
model_b = RecurrentNetworkModel(num_layers, k_rnncell, k_input, time_steps)

In [None]:
model_b.load_model("./saved_model/test_saved_model.cpkt")

In [None]:
model_b.W

In [None]:
model_b.session.run(model_b.W)

## Model Training

In [None]:
model_c = RecurrentNetworkModel(num_layers, k_rnncell, k_input, time_steps)

In [None]:
model_c.init_model()

In [None]:
model_c.session.run(model_c.b)

In [None]:
model_c.session.run(model_c.W)

In [None]:
iris_dataset.data

In [None]:
iris_dataset.target

In [None]:
model_c.create_tensorboard_writer("./tensorboard/model_c")

In [None]:
model_c.write_graph()

In [None]:
model_c.write_training_summary(iris_dataset.data, iris_dataset.target, 0)

In [None]:
model_c.write_training_summary(iris_dataset.data, iris_dataset.target, 10)

In [None]:
model_c.write_training_summary(iris_dataset.data, iris_dataset.target, 20)

In [None]:
model_c.tensorboard_writer.flush()

In [None]:
model_c.tensorboard_writer.close()

In [None]:
model_c.session.close()