## Model Example
This notebook builds a recurrent neural network model in TensorFlow, unrolling the recurrent connections manually over a fixed number of time steps.

In [1]:
import tensorflow as tf
import numpy as np

### Model Class

We can use classes to encapsulate TensorFlow models. The skeleton below shows one way to do this.

The model class has methods, variables, and properties that capture both the graph and the TensorFlow session.

#### TensorFlow Graph

A TensorFlow graph is a computational graph of TensorFlow operations. It defines the computation and how the operations and tensors relate to each other, but it does not run the computation or store the values of the variables. Both of those happen within the TensorFlow session.

#### TensorFlow Session

A TensorFlow session is the context in which values for TensorFlow variables are instantiated and computations are run. When you save a model's weights, you are saving the weights held by the session. When you load a model's weights, you load them into a session. Variable initialization also has to happen within a session. In a way, the graph is stateless: state is stored in sessions. The session also runs the computations, so training steps have to be run in the session as well.

A session is instantiated with a graph, typically the current default graph. A session can only run computations on the graph it is tied to.

In [2]:
class DeepLearningModel():
    def __init__(self):
        return
    
    def gen_uniform_random_weights(self, k_out, k_in, scale, dtype=np.float32):
        """
        Returns weights of shape (k_in, k_out) initialized between [-scale, scale]
        """
        return ((np.random.rand(k_in, k_out) * 2 - 1) * scale).astype(dtype)

    def gen_random_weights_tanh(self, k_out, k_in, dtype=np.float32):
        scale = (6. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_random_weights_sigmoid(self, k_out, k_in, dtype=np.float32):
        scale = 4. * (6. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_random_weights_reLu(self, k_out, k_in, dtype=np.float32):
        scale = (2. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_biases(self, k, dtype=np.float32):
        """
        Initialize biases as zero.
        """
        return np.zeros((k, ), dtype=dtype)
    
    def clip_gradient(self, grad, magnitude=1.0):
        """Returns the gradient clipped element-wise to the range [-magnitude, magnitude]."""
        magnitude = abs(magnitude)
        return tf.maximum(tf.minimum(grad, magnitude), - magnitude)
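
As a quick sanity check on the initializers above, here is a minimal NumPy-only sketch of the tanh case. It assumes the same uniform scheme with bound `sqrt(6 / (k_in + k_out))`. The function name `uniform_tanh_init` and its `seed` parameter are illustrative and not part of the notebook's class.

```python
import numpy as np

def uniform_tanh_init(k_in, k_out, seed=0, dtype=np.float32):
    """Hypothetical helper: weights of shape (k_in, k_out), uniform in [-scale, scale]."""
    scale = (6.0 / (k_in + k_out)) ** 0.5
    rng = np.random.RandomState(seed)  # seeded only to make the sketch repeatable
    weights = (rng.rand(k_in, k_out) * 2 - 1) * scale
    return weights.astype(dtype), scale

# Same sizes as the first recurrent layer later in the notebook:
# 12-dimensional embeddings feeding 7 recurrent units.
W, scale = uniform_tanh_init(k_in=12, k_out=7)
print(W.shape)  # (12, 7)
```

Note that the returned shape is `(k_in, k_out)`, which is what lets the model compute `X @ W` with `X` of shape `(batch, k_in)`.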

In [3]:
class RecurrentNetworkModel(DeepLearningModel):
    """
    Tutorial Model
    """
    
    def __init__(self, num_layers, k_rnncell, k_input_embedding, number_data_embeddings, time_steps):
        """
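        args:
            num_layers: number of recurrent layers
            k_rnncell: number of units in each recurrent layer
            k_input_embedding: dimensionality of the input embeddings
            number_data_embeddings: number of rows in the embedding table (vocabulary size)
            time_steps: number of time steps the network is unrolled over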
        """
        self._graph = None
        self._session = None
        self.num_layers = num_layers
        self.k_rnncell = k_rnncell
        self.k_input_embedding = k_input_embedding
        self.time_steps = time_steps
        self.number_data_embeddings = number_data_embeddings
        
        self._merged_training_summary = None
        self._merged_validation_summary = None
    
    
    def load_model(self, model_filename):
        with self.graph.as_default():
            model_saver = tf.train.Saver()
        
        self._session = tf.Session(graph=self.graph)
        model_saver.restore(self._session, model_filename)
        return
    
    
    def save_model(self, model_filename):
        with self.graph.as_default():
            model_saver = tf.train.Saver()
            
        model_saver.save(self.session, model_filename)
        
    def create_graph(self):
        self.cells = {}
        self.W = {}
        self.U = {}
        self.b = {}
        self.Z = {}
        self.A = {}
        self.A_initial = {}
        self.Z_series = {}
        self.A_series = {}
        
        self._graph = tf.Graph()
        
        with self._graph.as_default():
            with tf.name_scope("inputs"):
                self.X = tf.placeholder(tf.int32, shape=(None, self.time_steps))
            
            with tf.name_scope("embeddings"):
                self.data_embeddings = tf.Variable(np.random.rand(self.number_data_embeddings, self.k_input_embedding).astype(np.float32), dtype=tf.float32, name="data_embeddings")
                self.data_embedding_lookup = tf.nn.embedding_lookup(self.data_embeddings, self.X)
                self.A_series[0] = tf.unstack(self.data_embedding_lookup, axis = 1)
            
            with tf.name_scope("recurrent_layers"):
                for layer in range(1, self.num_layers + 1):
                    if layer == 1:
                        self.W[(layer, layer - 1)] = tf.Variable(self.gen_random_weights_tanh(self.k_rnncell, self.k_input_embedding), dtype=tf.float32, name="W_%i_%i" % (layer, layer - 1))
                    else:
                        self.W[(layer, layer - 1)] = tf.Variable(self.gen_random_weights_tanh(self.k_rnncell, self.k_rnncell), dtype=tf.float32, name="W_%i_%i" % (layer, layer - 1))
                    
                    self.U[layer] = tf.Variable(self.gen_random_weights_tanh(self.k_rnncell, self.k_rnncell), dtype=tf.float32, name="U_%i" % layer)
                    self.b[layer] = tf.Variable(self.gen_biases(self.k_rnncell), dtype=tf.float32, name="b_%i" % layer)
                    
                    self.Z_series[layer] = []
                    self.A_series[layer] = []
                    self.A_initial[layer] = tf.placeholder(tf.float32, shape=(None, self.k_rnncell), name="A%i_initial" % layer)
                    
                    for t, X in enumerate(self.A_series[layer - 1]):
                        Z = tf.add(tf.matmul(X, self.W[(layer, layer - 1)]), self.b[layer])
                        if t == 0:
                            Z = tf.add(Z, tf.matmul(self.A_initial[layer], self.U[layer]), name="Z%i_t%i" % (layer, t))
                        else:
                            Z = tf.add(Z, tf.matmul(self.A_series[layer][t-1], self.U[layer]), name="Z%i_t%i" % (layer, t))
                        A = tf.nn.tanh(Z, name="A%i_t%i" % (layer, t))
                        self.Z_series[layer].append(Z)
                        self.A_series[layer].append(A)
                
                    self.Z[layer] = tf.stack(self.Z_series[layer], axis=1)
                    self.A[layer] = tf.stack(self.A_series[layer], axis=1)
                
            self.init_op = tf.global_variables_initializer()
        return
            

    def write_graph(self):
        self.tensorboard_writer.add_graph(self.graph)
        return
    
    def init_model(self, adam_beta1=0.9, adam_beta2=0.999):
        self.session.run(self.init_op)
    
    @property
    def graph(self):
        if self._graph is None:
            self.create_graph()
        return self._graph
    
    @property
    def session(self):
        if self._session is None:
            self._session = tf.Session(graph=self.graph)
        return self._session
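
The unrolled recurrence built in `create_graph` can be sketched in plain NumPy, which may make the per-time-step computation easier to follow. This is only an illustration under stated assumptions: the helper `unrolled_rnn_layer` and the random weights are hypothetical, and the sketch covers a single layer rather than the full stacked model.

```python
import numpy as np

def unrolled_rnn_layer(inputs, W, U, b, a_initial):
    """Hypothetical helper: one tanh recurrent layer, unrolled over time.

    inputs: list of arrays, one per time step, each of shape (batch, k_in)
    Returns Z and A stacked to shape (batch, time_steps, k_cell).
    """
    z_series, a_series = [], []
    a_prev = a_initial
    for x_t in inputs:
        # Z_t = X_t W + b + A_{t-1} U, then A_t = tanh(Z_t)
        z_t = np.dot(x_t, W) + b + np.dot(a_prev, U)
        a_t = np.tanh(z_t)
        z_series.append(z_t)
        a_series.append(a_t)
        a_prev = a_t
    return np.stack(z_series, axis=1), np.stack(a_series, axis=1)

rng = np.random.RandomState(0)
batch, time_steps, k_in, k_cell = 4, 10, 12, 7
inputs = [rng.rand(batch, k_in) for _ in range(time_steps)]
W = rng.uniform(-0.5, 0.5, size=(k_in, k_cell))
U = rng.uniform(-0.5, 0.5, size=(k_cell, k_cell))
b = np.zeros(k_cell)

# Run once from a zero initial state and once from a random one.
Z_zero, A_zero = unrolled_rnn_layer(inputs, W, U, b, np.zeros((batch, k_cell)))
Z_rand, A_rand = unrolled_rnn_layer(inputs, W, U, b, rng.rand(batch, k_cell))
print(A_zero.shape)  # (4, 10, 7)
```

Because the initial state enters the first time step through `U`, `A_zero` and `A_rand` differ even though the inputs and weights are identical.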

In [17]:
minibatch_size = 4
num_layers = 2
k_rnncell = 7
k_input_embedding = 12
number_data_embeddings = 100
time_steps = 10

In [5]:
# Create a model instance with 2 recurrent layers of 7 units each.

model_a = RecurrentNetworkModel(num_layers,  k_rnncell, k_input_embedding, number_data_embeddings, time_steps)

In [6]:
model_a.graph

<tensorflow.python.framework.ops.Graph at 0x114f64710>

In [7]:
model_a.data_embedding_lookup

<tf.Tensor 'embeddings/embedding_lookup:0' shape=(?, 10, 12) dtype=float32>

In [9]:
model_a.init_model()

In [10]:
model_a.data_embedding_lookup.get_shape()

TensorShape([Dimension(None), Dimension(10), Dimension(12)])

In [11]:
model_a.session.run(model_a.data_embeddings)

array([[  2.23273933e-01,   1.18024603e-01,   1.26306370e-01, ...,
          6.58715516e-02,   3.10788095e-01,   4.05555546e-01],
       [  2.84150183e-01,   3.31155390e-01,   4.59628224e-01, ...,
          8.67651165e-01,   4.93068516e-01,   6.49731278e-01],
       [  3.47267777e-01,   9.88703310e-01,   5.08674920e-01, ...,
          1.97113469e-01,   9.73267794e-01,   1.16282761e-01],
       ..., 
       [  2.77832031e-01,   6.67628050e-01,   2.48437300e-01, ...,
          5.29849887e-01,   9.81992960e-01,   8.52635950e-02],
       [  8.81436586e-01,   8.95896554e-01,   9.86733556e-01, ...,
          3.82750732e-04,   9.67845559e-01,   5.98244667e-01],
       [  5.37714481e-01,   6.66888431e-02,   9.68286932e-01, ...,
          4.85483497e-01,   3.28924477e-01,   2.23316774e-01]], dtype=float32)

In [12]:
data = np.random.randint(0, number_data_embeddings, size=(minibatch_size, time_steps))

In [18]:
A1_initial = np.zeros((minibatch_size, k_rnncell), dtype=np.float32)
A2_initial = np.zeros((minibatch_size, k_rnncell), dtype=np.float32)

In [19]:
model_a.session.run(model_a.data_embedding_lookup,
                    feed_dict={model_a.X: data}).shape

(4, 10, 12)
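
To see where the `(4, 10, 12)` shape comes from, the lookup can be imitated with NumPy integer indexing. This is a rough sketch, assuming `tf.nn.embedding_lookup` on a single embedding table behaves like row indexing; the variable names below are illustrative stand-ins, not the model's tensors.

```python
import numpy as np

rng = np.random.RandomState(0)
number_data_embeddings, k_input_embedding = 100, 12
minibatch_size, time_steps = 4, 10

# Stand-ins for model_a.data_embeddings and the integer input fed to model_a.X.
embeddings = rng.rand(number_data_embeddings, k_input_embedding).astype(np.float32)
ids = rng.randint(0, number_data_embeddings, size=(minibatch_size, time_steps))

# Each integer id selects one row of the embedding table.
looked_up = embeddings[ids]
print(looked_up.shape)  # (4, 10, 12)

# Splitting along the time axis mirrors tf.unstack(..., axis=1):
# one (batch, embedding) array per time step.
per_step = [looked_up[:, t, :] for t in range(time_steps)]
print(len(per_step), per_step[0].shape)  # 10 (4, 12)
```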

In [20]:
model_a.session.run(model_a.Z[1],
                    feed_dict={model_a.X: data,
                              model_a.A_initial[1]: A1_initial,
                              model_a.A_initial[2]: A2_initial}).shape

(4, 10, 7)

In [23]:
model_a.session.run(model_a.Z[2],
                    feed_dict={model_a.X: data,
                              model_a.A_initial[1]: A1_initial,
                              model_a.A_initial[2]: A2_initial}).shape

(4, 10, 7)

In [24]:
%%time
model_a.session.run(model_a.A[2],
                    feed_dict={model_a.X: data,
                              model_a.A_initial[1]: A1_initial,
                              model_a.A_initial[2]: A2_initial}).shape

CPU times: user 10.4 ms, sys: 1.23 ms, total: 11.6 ms
Wall time: 9.9 ms


(4, 10, 7)

In [45]:
%%time
model_a.session.run(model_a.A[2],
                    feed_dict={model_a.X: data,
                              model_a.A_initial[1]: A1_initial,
                              model_a.A_initial[2]: A2_initial})

CPU times: user 5.31 ms, sys: 1.05 ms, total: 6.36 ms
Wall time: 2.23 ms


array([[[  2.38828242e-01,   1.64421603e-01,  -4.21284944e-01, ...,
           4.73785520e-01,   1.45027801e-01,  -6.33360207e-01],
        [  6.17669940e-01,  -1.13969566e-02,   1.12746105e-01, ...,
           2.06384912e-01,   3.83218862e-02,  -1.79678991e-01],
        [  3.26760530e-01,  -5.75415373e-01,   7.78281569e-01, ...,
           5.69413006e-01,  -2.63058037e-01,  -4.00189608e-01],
        ..., 
        [  4.48622257e-01,   1.89298496e-01,   3.41163486e-01, ...,
          -4.43991572e-01,   7.30797946e-01,  -5.44195175e-01],
        [ -3.49161476e-01,  -6.88485980e-01,   4.06309925e-02, ...,
           2.25180626e-01,   2.43149146e-01,  -3.81700516e-01],
        [ -7.56358564e-01,  -3.68954450e-01,  -5.10366261e-02, ...,
          -5.89900196e-01,   3.74844611e-01,   3.34033251e-01]],

       [[  7.78789520e-01,   1.17408812e-01,  -4.52179760e-02, ...,
           3.93389165e-01,   3.39728296e-01,  -7.11556852e-01],
        [  7.45175183e-01,   4.44641471e-01,  -3.55250873e-0

---
Changing the initial state changes the hidden activations:

In [48]:
A1_initial = np.random.rand(minibatch_size, k_rnncell).astype(np.float32)
A2_initial = np.random.rand(minibatch_size, k_rnncell).astype(np.float32)

In [49]:
%%time
model_a.session.run(model_a.A[2],
                    feed_dict={model_a.X: data,
                              model_a.A_initial[1]: A1_initial,
                              model_a.A_initial[2]: A2_initial})

CPU times: user 5.41 ms, sys: 1.46 ms, total: 6.88 ms
Wall time: 2.23 ms


array([[[  4.90554005e-01,  -2.65483409e-01,  -2.94894785e-01, ...,
           9.20996904e-01,  -4.32067335e-01,  -4.64350432e-01],
        [  5.78043818e-01,   2.54621208e-01,   5.29456973e-01, ...,
           4.99930173e-01,  -2.26490363e-01,   2.74755180e-01],
        [  6.88004017e-01,  -6.39707565e-01,   7.87295520e-01, ...,
           8.66000414e-01,  -4.79904532e-01,  -6.52879298e-01],
        ..., 
        [  5.84563196e-01,   9.72419530e-02,   2.95505494e-01, ...,
          -4.71625715e-01,   7.53913522e-01,  -4.51628566e-01],
        [ -5.15302956e-01,  -6.49502277e-01,   1.78956628e-01, ...,
           2.47091055e-01,   2.50953227e-01,  -4.09029365e-01],
        [ -6.86422825e-01,  -4.50241059e-01,  -1.75621718e-01, ...,
          -6.15710676e-01,   4.17529911e-01,   3.41995299e-01]],

       [[  6.29157245e-01,  -1.15184955e-01,  -5.15222433e-04, ...,
           9.44855630e-01,  -7.47872233e-01,  -7.27635920e-01],
        [  8.34035993e-01,   4.46208745e-01,   2.55712092e-0

In [37]:
minibatch_size = 100

In [38]:
data = np.random.randint(0, number_data_embeddings, size=(minibatch_size, time_steps))

In [39]:
A1_initial = np.zeros((minibatch_size, k_rnncell), dtype=np.float32)
A2_initial = np.zeros((minibatch_size, k_rnncell), dtype=np.float32)

In [40]:
%%time
model_a.session.run(model_a.A[2],
                    feed_dict={model_a.X: data,
                              model_a.A_initial[1]: A1_initial,
                              model_a.A_initial[2]: A2_initial}).shape

CPU times: user 1.93 ms, sys: 933 µs, total: 2.86 ms
Wall time: 1.1 ms


(100, 10, 7)

In [41]:
minibatch_size = 1000

In [42]:
data = np.random.randint(0, number_data_embeddings, size=(minibatch_size, time_steps))

In [43]:
A1_initial = np.zeros((minibatch_size, k_rnncell), dtype=np.float32)
A2_initial = np.zeros((minibatch_size, k_rnncell), dtype=np.float32)

In [44]:
%%time
model_a.session.run(model_a.A[2],
                    feed_dict={model_a.X: data,
                              model_a.A_initial[1]: A1_initial,
                              model_a.A_initial[2]: A2_initial}).shape

CPU times: user 5.71 ms, sys: 1.61 ms, total: 7.32 ms
Wall time: 2.58 ms


(1000, 10, 7)

---
The cells above show how to obtain the RNN outputs. From here, it is fairly straightforward to attach an output model to the final RNN state or to the full sequence of RNN outputs.
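
One possible output model is a softmax layer, sketched here in NumPy rather than TensorFlow. Everything in it is an assumption for illustration: `W_out`, `b_out`, and `k_softmax` are hypothetical names, and `rnn_outputs` is a random stand-in for the array returned by running `model_a.A[2]`.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.RandomState(0)
minibatch_size, time_steps, k_rnncell, k_softmax = 4, 10, 7, 5

# Random stand-in with the same shape and range as the tanh activations of the last layer.
rnn_outputs = np.tanh(rng.randn(minibatch_size, time_steps, k_rnncell))

# Hypothetical output-layer parameters.
W_out = rng.uniform(-0.1, 0.1, size=(k_rnncell, k_softmax))
b_out = np.zeros(k_softmax)

# Option 1: one prediction per sequence, from the final time step only.
final_state = rnn_outputs[:, -1, :]
probs_final = softmax(np.dot(final_state, W_out) + b_out)
print(probs_final.shape)  # (4, 5)

# Option 2: one prediction per time step, from the full output sequence.
probs_all = softmax(np.dot(rnn_outputs, W_out) + b_out)
print(probs_all.shape)  # (4, 10, 5)
```

The first option suits sequence classification, while the second suits per-step prediction such as next-token modelling. Which one applies depends on the task, which the notebook leaves open.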