## Word2Vec SkipGram Model Example
This notebook implements a skip-gram Word2Vec model using the TensorFlow 1.x graph-and-session API.

In [1]:
import tensorflow as tf
import numpy as np

### Model Class

We can use classes to encapsulate TensorFlow models. The skeleton below shows one way of doing this.

The model class has methods, attributes, and properties that capture both the graph and the TensorFlow session.

#### TensorFlow Graph

A TensorFlow graph is a computational graph of TensorFlow operations. It defines the computation and how the operations and tensors relate to one another, but it doesn't actually do the computation or store the values of the variables. All of that magic happens within the TensorFlow session.

#### TensorFlow Session

A TensorFlow session is the context in which values for TensorFlow variables are instantiated and computations are run. So if you are saving a model's weights, you are actually saving the weights held by the session, and if you are loading a model's weights, you need to load them into a session. Variable initialization also has to happen within a session. In a way, the graph is stateless: state is stored in sessions. The session also takes care of running computations, so training steps need to be run in the session too.

A session is instantiated with a graph, typically the current default graph. A session is only able to run computations on the graph that is tied to the session.
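
To make the graph/session split concrete without depending on a particular TensorFlow version, here is a toy analogy in plain Python. `ToyGraph` and `ToySession` are hypothetical names invented for this sketch and are not TensorFlow classes. The only point is that the graph records what to compute, while each session owns its own variable values.

```python
class ToyGraph:
    """Records variables and operations; holds no live values."""
    def __init__(self):
        self.variables = {}   # name -> initial value (a recipe, not state)
        self.ops = {}         # name -> (function, input names)

    def variable(self, name, initial_value):
        self.variables[name] = initial_value
        return name

    def op(self, name, fn, *inputs):
        self.ops[name] = (fn, inputs)
        return name


class ToySession:
    """Owns variable state and evaluates ops for exactly one graph."""
    def __init__(self, graph):
        self.graph = graph
        self.state = {}

    def initialize(self):
        # Copy the initial values into this session's private state.
        self.state = dict(self.graph.variables)

    def assign(self, name, value):
        self.state[name] = value

    def run(self, name, feed=None):
        feed = feed or {}
        if name in feed:
            return feed[name]
        if name in self.state:
            return self.state[name]
        if name in self.graph.variables:
            raise RuntimeError("variable %r is not initialized in this session" % name)
        fn, inputs = self.graph.ops[name]
        return fn(*(self.run(i, feed) for i in inputs))


graph = ToyGraph()
w = graph.variable("w", 2.0)
y = graph.op("y", lambda x, w: x * w, "x", "w")   # "x" is fed at run time

session_a = ToySession(graph)
session_b = ToySession(graph)
session_a.initialize()
session_b.initialize()
session_a.assign("w", 10.0)   # changes state in session_a only

result_a = session_a.run(y, feed={"x": 3.0})   # uses w = 10.0
result_b = session_b.run(y, feed={"x": 3.0})   # uses w = 2.0
```

Both sessions share one graph, yet they return different results because the variable values live in the sessions rather than in the graph.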

In [2]:
class DeepLearningModel():
    def __init__(self):
        return
    
    def gen_uniform_random_weights(self, k_out, k_in, scale, dtype=np.float32):
        """
        Returns weights of shape (k_in, k_out) initialized between [-scale, scale]
        """
        return ((np.random.rand(k_in, k_out) * 2 - 1) * scale).astype(dtype)

    def gen_random_weights_tanh(self, k_out, k_in, dtype=np.float32):
        scale = (6. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_random_weights_sigmoid(self, k_out, k_in, dtype=np.float32):
        scale = 4. * (6. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_random_weights_reLu(self, k_out, k_in, dtype=np.float32):
        scale = (2. / (k_in + k_out)) ** .5
        return self.gen_uniform_random_weights(k_out, k_in, scale, dtype=dtype)

    def gen_biases(self, k, dtype=np.float32):
        """
        Initialize biases as zero.
        """
        return np.zeros((k, ), dtype=dtype)
    
    def clip_gradient(self, grad, magnitude=1.0):
        """Returns the gradient clipped element-wise to the range [-magnitude, magnitude]."""
        magnitude = abs(magnitude)
        return tf.maximum(tf.minimum(grad, magnitude), - magnitude)
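
As a quick check of the helper methods above, the following NumPy-only sketch reproduces the uniform initializer and the clipping rule outside the class. The standalone function names (`uniform_weights`, `clip`) and the layer sizes are introduced here for illustration; the scale formulas are copied from the class.

```python
import numpy as np

def uniform_weights(k_out, k_in, scale, dtype=np.float32):
    # Same arithmetic as gen_uniform_random_weights: map [0, 1) onto [-scale, scale).
    return ((np.random.rand(k_in, k_out) * 2 - 1) * scale).astype(dtype)

def clip(grad, magnitude=1.0):
    # NumPy counterpart of clip_gradient: element-wise clamp to [-magnitude, magnitude].
    magnitude = abs(magnitude)
    return np.maximum(np.minimum(grad, magnitude), -magnitude)

k_in, k_out = 300, 100                        # hypothetical layer sizes
tanh_scale = (6.0 / (k_in + k_out)) ** 0.5    # scale used by gen_random_weights_tanh
sigmoid_scale = 4.0 * tanh_scale              # scale used by gen_random_weights_sigmoid

weights = uniform_weights(k_out, k_in, tanh_scale)
clipped = clip(np.array([-3.0, -0.5, 0.0, 0.25, 7.0]))
```

Note the argument order: the helpers take `k_out` first but return an array of shape `(k_in, k_out)`.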

---
There are several options for the cost function:

- Negative Sampling
- NCE Loss
- Sampled Softmax
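
For reference, the negative-sampling objective as usually written for word2vec takes the log of each sigmoid: for a center vector `v_c`, a true context vector `u_o`, and sampled vectors `u_k`, the per-example loss is `-log σ(u_o · v_c) - Σ_k log σ(-u_k · v_c)`. The NumPy sketch below computes that quantity for a minibatch. The function name and argument layout are assumptions made for this illustration, and bias terms are left out.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def negative_sampling_loss(center, outer, negatives):
    """
    center:    (batch, k)     center-word vectors
    outer:     (batch, k)     true context-word vectors
    negatives: (batch, n, k)  sampled negative vectors
    Returns the mean loss over the batch.
    """
    positive_score = np.sum(center * outer, axis=1)                    # (batch,)
    negative_score = np.sum(center[:, None, :] * negatives, axis=2)    # (batch, n)
    per_example = -(log_sigmoid(positive_score)
                    + np.sum(log_sigmoid(-negative_score), axis=1))
    return per_example.mean()

# With all-zero vectors every sigmoid is 0.5, so the loss is (1 + n) * log(2).
batch, n, k = 3, 8, 2
loss_at_zero = negative_sampling_loss(np.zeros((batch, k)),
                                      np.zeros((batch, k)),
                                      np.zeros((batch, n, k)))
```

Because each term is a log-probability, this loss is bounded below by zero, which makes progress during training easier to read.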

In [3]:
class SkipGramModel(DeepLearningModel):
    """
    Tutorial Model
    """
    
    def __init__(self, k_embedding, n_embeddings, n_negative_sample):
        """
        args:
            k_embedding: dimensionality of each embedding vector
            n_embeddings: number of embeddings (vocabulary size)
            n_negative_sample: number of negative samples per training example
        """
        self._graph = None
        self._session = None
        self.k_embedding = k_embedding
        self.n_embeddings = n_embeddings
        self.n_negative_sample = n_negative_sample
        
        self._merged_training_summary = None
        self._merged_validation_summary = None
    
    
    def load_model(self, model_filename):
        with self.graph.as_default():
            model_saver = tf.train.Saver()
        
        self._session = tf.Session(graph=self.graph)
        model_saver.restore(self._session, model_filename)
        return
    
    
    def save_model(self, model_filename):
        with self.graph.as_default():
            model_saver = tf.train.Saver()
            
        model_saver.save(self.session, model_filename)
        
    def create_graph(self):
        self.cells = {}
                        
        self._graph = tf.Graph()
        with self._graph.as_default():
            with tf.name_scope("inputs"):
                self.X_center_word = tf.placeholder(tf.int32, shape=(None,), name="X_center_word")
                self.X_outer_word = tf.placeholder(tf.int32, shape=(None,), name="X_outer_word")
                self.X_negative_sample = tf.placeholder(tf.int32, shape=(None, self.n_negative_sample), name="X_negative_sample")
                self.learning_rate = tf.placeholder(tf.float32, shape=())
            
            with tf.name_scope("embeddings"):
                self.center_word_embeddings = tf.Variable(self.gen_uniform_random_weights(self.k_embedding, self.n_embeddings, .01), dtype=tf.float32, name="center_word_embeddings")
                self.outer_word_embeddings = tf.Variable(self.gen_uniform_random_weights(self.k_embedding, self.n_embeddings, .01), dtype=tf.float32, name="outer_word_embeddings")
                self.outer_word_biases = tf.Variable(np.zeros(self.n_embeddings, dtype=np.float32), name="outer_word_biases")
                
                self.center_word_embedding_lookup = tf.nn.embedding_lookup(self.center_word_embeddings, self.X_center_word)
                self.center_word_embedding_lookup_expanded = tf.expand_dims(tf.nn.embedding_lookup(self.center_word_embeddings, self.X_center_word), axis = 1)
                self.outer_word_embedding_lookup = tf.nn.embedding_lookup(self.outer_word_embeddings, self.X_outer_word)
                self.outer_word_bias_lookup = tf.nn.embedding_lookup(self.outer_word_biases, self.X_outer_word)
                self.negative_sample_embedding_lookup = tf.nn.embedding_lookup(self.outer_word_embeddings, self.X_negative_sample)
                self.negative_sample_bias_lookup = tf.nn.embedding_lookup(self.outer_word_biases, self.X_negative_sample)
                
            
            with tf.name_scope("loss"):
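                # Score of the true (center, outer) pair: sigmoid(u_o . v_c + b_o), shape (batch,)
                self.objective_target = tf.sigmoid(tf.reduce_sum(tf.multiply(self.center_word_embedding_lookup, self.outer_word_embedding_lookup), axis = 1) + self.outer_word_bias_lookup)
                # Debugging tensor: raw negated dot products for the negative samples, shape (batch, n_negative_sample)
                self.test = tf.reduce_sum(tf.multiply(- self.center_word_embedding_lookup_expanded, self.negative_sample_embedding_lookup), axis=2)
                # Sum over the negative samples of sigmoid(-u_k . v_c + b_k), shape (batch,)
                self.objective_negative_sample = tf.reduce_sum(tf.sigmoid(tf.reduce_sum(tf.multiply(- self.center_word_embedding_lookup_expanded, self.negative_sample_embedding_lookup), axis=2) + self.negative_sample_bias_lookup), axis=1)
                # Note: this sums sigmoids directly; the standard negative-sampling objective sums log-sigmoids.
                self.loss = tf.reduce_mean(- (self.objective_target + self.objective_negative_sample))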
            
            
            with tf.name_scope("optimization"):
                self.optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)
                self.grads_and_vars = self.optimizer.compute_gradients(self.loss)                    
                self.clipped_grads_and_vars = [(self.clip_gradient(gv[0]), gv[1]) for gv in self.grads_and_vars]
                self.update_op = self.optimizer.apply_gradients(self.clipped_grads_and_vars)

            self.init_op = tf.global_variables_initializer()
        return
            

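    def write_graph(self):
        # Assumes self.tensorboard_writer (e.g. a tf.summary.FileWriter) has been attached elsewhere;
        # this class does not create it.
        self.tensorboard_writer.add_graph(self.graph)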
        return
    
    def init_model(self):
        self.session.run(self.init_op)
        
    def train_model(self, center_word, outer_word, negative_samples, learning_rate):
        self.session.run(self.update_op, feed_dict={self.X_center_word: center_word, 
                                                    self.X_outer_word: outer_word,
                                                    self.X_negative_sample: negative_samples,
                                                    self.learning_rate: learning_rate})
    
    @property
    def graph(self):
        if self._graph is None:
            self.create_graph()
        return self._graph
    
    @property
    def session(self):
        if self._session is None:
            self._session = tf.Session(graph=self.graph)
        return self._session
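
The reason `center_word_embedding_lookup_expanded` exists is broadcasting: the center vectors have shape `(batch, k_embedding)` while the negative samples have shape `(batch, n_negative_sample, k_embedding)`. The NumPy sketch below mirrors the `expand_dims` / `multiply` / `reduce_sum` chain used for `self.test`, with random stand-in arrays in place of the embedding lookups.

```python
import numpy as np

batch, n_negative, k = 3, 8, 2
rng = np.random.RandomState(0)
center = rng.uniform(-0.01, 0.01, size=(batch, k))                  # like center_word_embedding_lookup
negatives = rng.uniform(-0.01, 0.01, size=(batch, n_negative, k))   # like negative_sample_embedding_lookup

center_expanded = np.expand_dims(center, axis=1)        # (batch, 1, k)
# Broadcasting repeats each center vector across the n_negative axis.
scores = np.sum(-center_expanded * negatives, axis=2)   # (batch, n_negative)

# The same scores computed one pair at a time, for comparison.
scores_loop = np.array([[-np.dot(center[b], negatives[b, j])
                         for j in range(n_negative)]
                        for b in range(batch)])
```

Without the extra axis, `(batch, k)` and `(batch, n_negative, k)` would not line up for element-wise multiplication.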

In [4]:
k_embedding = 2
n_embeddings = 100
n_negative_sample = 8

learning_rate = .1
minibatch_size = 3

In [5]:
# Create a skip-gram model with 2-dimensional embeddings, 100 embeddings, and 8 negative samples per example.

model = SkipGramModel(k_embedding, n_embeddings, n_negative_sample)

In [6]:
model.graph

<tensorflow.python.framework.ops.Graph at 0x1160985d0>

In [7]:
model.init_model()

In [8]:
center_word = np.random.randint(0, n_embeddings, size=(minibatch_size, ))
target_word = np.random.randint(0, n_embeddings, size=(minibatch_size, ))
negative_samples = np.random.randint(0, n_embeddings, size=(minibatch_size, n_negative_sample))
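
The cell above feeds random indices, which is enough to exercise the graph. With a real corpus, each (center, outer) pair would come from a sliding window over a sequence of word ids. A minimal sketch of that step follows. `skipgram_pairs`, the window size, the example ids, and the uniform negative sampler are assumptions for illustration and are not part of the notebook; word2vec implementations commonly draw negatives from a smoothed unigram distribution instead.

```python
import numpy as np

def skipgram_pairs(token_ids, window=2):
    """Return arrays (centers, outers) for every pair within `window` positions."""
    centers, outers = [], []
    for i, center in enumerate(token_ids):
        lo = max(0, i - window)
        hi = min(len(token_ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                centers.append(center)
                outers.append(token_ids[j])
    return np.array(centers, dtype=np.int32), np.array(outers, dtype=np.int32)

token_ids = [4, 7, 1, 9]   # hypothetical sentence encoded as word ids
pair_center, pair_outer = skipgram_pairs(token_ids, window=1)

# Uniform negatives, matching the random sampling used in the notebook cell.
vocab_size, n_negative = 100, 8
pair_negatives = np.random.randint(0, vocab_size,
                                   size=(len(pair_center), n_negative))
```

Arrays built this way have the same shapes as `center_word`, `target_word`, and `negative_samples`, so they could be fed to the same placeholders.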

In [9]:
model.session.run(model.center_word_embedding_lookup, feed_dict={model.X_center_word: center_word})

array([[ 0.0065446 , -0.00950536],
       [ 0.00822226, -0.00507754],
       [-0.00338755, -0.00528418]], dtype=float32)

In [10]:
model.session.run(model.outer_word_embedding_lookup, feed_dict={model.X_outer_word: target_word})

array([[-0.00790009,  0.00605009],
       [-0.00185049, -0.00462213],
       [ 0.00253445, -0.00010422]], dtype=float32)

In [11]:
model.session.run(model.negative_sample_embedding_lookup, feed_dict={model.X_negative_sample: negative_samples}).shape

(3, 8, 2)

In [12]:
model.session.run(model.negative_sample_bias_lookup, feed_dict={model.X_negative_sample: negative_samples})

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)

In [13]:
model.session.run(model.objective_target, feed_dict={model.X_center_word: center_word, 
                                                     model.X_outer_word: target_word,
                                                     model.X_negative_sample: negative_samples})

array([ 0.4999727 ,  0.50000209,  0.49999797], dtype=float32)

In [14]:
model.session.run(model.objective_negative_sample, feed_dict={model.X_center_word: center_word, 
                                                              model.X_outer_word: target_word,
                                                              model.X_negative_sample: negative_samples})

array([ 3.99995136,  4.00007486,  3.99999094], dtype=float32)

In [15]:
model.session.run(model.loss, feed_dict={model.X_center_word: center_word, 
                                         model.X_outer_word: target_word,
                                         model.X_negative_sample: negative_samples})

-4.4999967
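
The value of roughly -4.5 can be checked by hand. The embeddings are initialized within ±0.01 and the biases are zero, so every dot product is tiny and every sigmoid is close to 0.5. The target term is then about 0.5, the negative-sample term about 8 × 0.5 = 4, and the loss about -(0.5 + 4). The NumPy sketch below repeats the graph's arithmetic on random stand-in vectors, not the notebook's actual weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

batch, n_negative, k = 3, 8, 2
rng = np.random.RandomState(0)
center = rng.uniform(-0.01, 0.01, size=(batch, k))
outer = rng.uniform(-0.01, 0.01, size=(batch, k))
negatives = rng.uniform(-0.01, 0.01, size=(batch, n_negative, k))

# Same formulas as objective_target, objective_negative_sample, and loss (biases are zero).
objective_target = sigmoid(np.sum(center * outer, axis=1))
objective_negative = np.sum(sigmoid(np.sum(-center[:, None, :] * negatives, axis=2)), axis=1)
loss = np.mean(-(objective_target + objective_negative))
```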

In [16]:
model.session.run(model.test, feed_dict={model.X_center_word: center_word, 
                                         model.X_outer_word: target_word,
                                         model.X_negative_sample: negative_samples})

array([[ -1.06750333e-04,  -3.34709657e-05,   9.25614877e-05,
         -7.78011672e-05,  -6.72503447e-05,  -1.48597010e-05,
         -2.35932239e-05,   3.65822634e-05],
       [  1.01353304e-04,   7.95730884e-05,   2.18993137e-05,
          6.24393579e-05,   4.58735376e-05,  -1.35509563e-05,
          3.40296720e-05,  -3.29307622e-05],
       [  5.20781396e-06,  -5.51463745e-06,  -7.88088000e-05,
          3.43019337e-06,   2.34720519e-05,   2.09616555e-05,
          5.20781396e-06,  -9.92318382e-06]], dtype=float32)

In [17]:
model.session.run(model.update_op, feed_dict={model.X_center_word: center_word, 
                                              model.X_outer_word: target_word,
                                              model.X_negative_sample: negative_samples,
                                              model.learning_rate: learning_rate})

In [18]:
model.session.run(model.test, feed_dict={model.X_center_word: center_word, 
                                         model.X_outer_word: target_word,
                                         model.X_negative_sample: negative_samples})

array([[ -1.03215454e-04,  -3.08869203e-05,   9.19990125e-05,
         -7.57898524e-05,  -6.60512596e-05,  -1.33271278e-05,
         -1.92401858e-05,   3.82480794e-05],
       [  1.04952516e-04,   8.25340103e-05,   2.34966028e-05,
          6.47334527e-05,   4.78789043e-05,  -1.30141998e-05,
          3.62582941e-05,  -3.27494417e-05],
       [  9.29315320e-06,  -2.55616214e-06,  -7.71898267e-05,
          4.36364917e-06,   2.47927328e-05,   2.31342838e-05,
          9.29315320e-06,  -7.69893086e-06]], dtype=float32)

In [19]:
model.session.run(model.loss, feed_dict={model.X_center_word: center_word, 
                                         model.X_outer_word: target_word,
                                         model.X_negative_sample: negative_samples})

-4.5256953

In [20]:
model.session.run(model.update_op, feed_dict={model.X_center_word: center_word, 
                                              model.X_outer_word: target_word,
                                              model.X_negative_sample: negative_samples,
                                              model.learning_rate: learning_rate})

In [21]:
model.session.run(model.test, feed_dict={model.X_center_word: center_word, 
                                         model.X_outer_word: target_word,
                                         model.X_negative_sample: negative_samples})

array([[ -9.98076939e-05,  -2.83791323e-05,   9.14367993e-05,
         -7.38938688e-05,  -6.49650829e-05,  -1.18637618e-05,
         -1.49468906e-05,   3.98861921e-05],
       [  1.08689419e-04,   8.56162951e-05,   2.51806559e-05,
          6.71317102e-05,   5.00161696e-05,  -1.24178350e-05,
          3.86047905e-05,  -3.25174296e-05],
       [  1.33104113e-05,   3.86628017e-07,  -7.56306472e-05,
          5.29165754e-06,   2.61199821e-05,   2.53099806e-05,
          1.33104113e-05,  -5.42772614e-06]], dtype=float32)

In [22]:
model.session.run(model.loss, feed_dict={model.X_center_word: center_word, 
                                         model.X_outer_word: target_word,
                                         model.X_negative_sample: negative_samples})

-4.5513883

Woot! It's learning! The loss drops from -4.4999967 to -4.5256953 to -4.5513883 over the two updates.

---