# Manage Experiments
- More word2vec
- tf.train.Saver
- tf.summary
- Randomization
- Data Readers

In [1]:
import tensorflow as tf

In [2]:
x = tf.Variable(2.0)
y = 2.0 * (x**3)
z = 3.0 + y**2

grad_z = tf.gradients(z, [x,y])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('tmp/CS20SI5/', sess.graph)
    print(sess.run(grad_z))

[768.0, 32.0]
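
As a sanity check on these gradients, the chain rule gives dz/dy = 2y and dz/dx = 2y · 6x², so at x = 2 (where y = 16) the expected values are 32 and 768. The sketch below confirms this in plain Python with central finite differences. The helper names are illustrative and not part of TensorFlow.

```python
def y_of_x(x):
    return 2.0 * x ** 3


def z_of_y(y):
    return 3.0 + y ** 2


def central_diff(f, t, eps=1e-6):
    """Approximate f'(t) with a symmetric finite difference."""
    return (f(t + eps) - f(t - eps)) / (2.0 * eps)


x0 = 2.0
y0 = y_of_x(x0)

# Analytic gradients from the chain rule.
dz_dy = 2.0 * y0
dz_dx = dz_dy * 6.0 * x0 ** 2

# Numerical estimates of the same two derivatives.
num_dz_dx = central_diff(lambda x: z_of_y(y_of_x(x)), x0)
num_dz_dy = central_diff(z_of_y, y0)

print(dz_dx, dz_dy)  # 768.0 32.0
```

The order matches `tf.gradients(z, [x, y])`: the first entry is the derivative with respect to `x`, the second with respect to `y`.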


## Need models to be reusable

- Q: How do we make our model easy to reuse?
- A: Take advantage of Python's object-oriented programming (OOP) and wrap the model in a class.

In [None]:
class SkipGramModel:
    def __init__(self, params):
        pass
    def _create_placeholder(self):
        '''Step 1: define the placeholder for input and output'''
        pass
    def _create_embedding(self):
        '''Step 2: define weights. In word2vec, it is actually the weights that we care about. '''
        pass
    
    def _create_loss(self):
        '''Step 3 + 4: define the inference + the loss function'''
        pass
    
    def _create_optimizer(self):
        '''Step 5: define optimizer'''
        pass
    
    def _create_summaries(self):
        '''Define the summaries to visualize in TensorBoard'''
        with tf.name_scope('summaries'):
            tf.summary.scalar('loss', self.loss)
            tf.summary.scalar('accuracy', self.accuracy)
            tf.summary.histogram('histogram loss', self.loss)
            # There are several summaries, so merge them into one op to make them easier to manage.
            self.summary_op = tf.summary.merge_all()

## Save sessions, not graphs
`tf.train.Saver.save(sess, save_path, global_step=None...)`
- Save the parameters every N steps; each saved step is a checkpoint
- Restore the variables later from a checkpoint

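Training can crash unexpectedly, or you may stop it deliberately. Deep learning runs tend to be long, so to avoid losing the work already done, save the model parameters periodically during training.

For example, to save the parameters every 1000 training steps:

First, define `global_step`: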

```
self.global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

# update global_step during training
self.optimizer = tf.train.GradientDescentOptimizer(self.learning_rate).minimize(self.loss, global_step = self.global_step)
```

During training, save the model parameters at the current `global_step` to the specified checkpoint file.
```
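# define the model (the graph must exist before the Saver is created)

# create a saver object
saver = tf.train.Saver()

# launch a session to compute the graph
with tf.Session() as sess:
    # initialize all variables before training
    sess.run(tf.global_variables_initializer())

    # actual training loop
    for step in range(training_steps):
        sess.run([model.optimizer])

        # save a checkpoint every 1000 steps
        if (step + 1) % 1000 == 0:
            saver.save(sess, 'checkpoint_directory/model_name', global_step=model.global_step)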
```



After running the program, the saved checkpoint files appear in the specified directory, as shown below:

![CheckpointsFiles](CheckpointsFiles.JPG)

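Having saved the parameters at a checkpoint, you will eventually want to load them again, either to reuse the model or to continue training. Restore them as follows: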

```
saver.restore(sess, 'checkpoints/skip-gram-10000')
```

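Better still, have the program look for the latest checkpoint file. If one exists, restore it and continue training; if not, train from scratch. In both cases keep saving checkpoints during training, which avoids _redundant repeated work_.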
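
The resume-or-start-fresh logic can be sketched without TensorFlow. In TensorFlow 1.x the lookup is typically done with `tf.train.get_checkpoint_state` or `tf.train.latest_checkpoint` followed by `saver.restore`. The version below is only a library-free illustration of the control flow: JSON files stand in for real checkpoints, a counter stands in for the optimizer step, and `latest_checkpoint` and `train` are hypothetical helpers.

```python
import json
import os
import re
import tempfile


def latest_checkpoint(directory, prefix):
    """Return (path, step) for the highest-numbered '<prefix>-<step>.json', or (None, 0)."""
    pattern = re.compile(re.escape(prefix) + r"-(\d+)\.json$")
    best_path, best_step = None, 0
    if os.path.isdir(directory):
        for name in os.listdir(directory):
            match = pattern.match(name)
            if match and int(match.group(1)) > best_step:
                best_path = os.path.join(directory, name)
                best_step = int(match.group(1))
    return best_path, best_step


def train(directory, total_steps, save_every=1000, prefix="model"):
    """Train up to total_steps, resuming from the newest checkpoint if one exists."""
    os.makedirs(directory, exist_ok=True)
    path, start_step = latest_checkpoint(directory, prefix)
    if path is not None:
        with open(path) as f:
            state = json.load(f)   # resume: reload the saved parameters
    else:
        state = {"w": 0.0}         # no checkpoint found: start from scratch
    for step in range(start_step, total_steps):
        state["w"] += 1.0          # stand-in for one optimizer update
        if (step + 1) % save_every == 0:
            ckpt = os.path.join(directory, "%s-%d.json" % (prefix, step + 1))
            with open(ckpt, "w") as f:
                json.dump(state, f)
    return start_step, state


with tempfile.TemporaryDirectory() as ckpt_dir:
    first_start, first_state = train(ckpt_dir, total_steps=2000)
    second_start, second_state = train(ckpt_dir, total_steps=3000)
    print(first_start, second_start)  # 0 2000
```

The second call finds the step-2000 file and continues from there instead of repeating the first 2000 updates.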

By default, `saver.save()` stores all variables in the computational graph.

You can still choose which variables to store by passing a custom list or dict.

```
# variables that are used in the computational graph
v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')

# - method one: pass variables as a dict:
saver = tf.train.Saver({'v1':v1, 'v2':v2})

# - method two: pass variables as a list:
saver = tf.train.Saver([v1, v2])
```

__Note:__

Saving and restoring cover only the variables and their values at that checkpoint, not the whole computational graph, so you have to rebuild the model before restoring.

## Visualize our summary statistics during our training

Compared with using `matplotlib` to plot how variables change during training, __TensorBoard__ offers far more complete visualization and is well worth using.
- Step 1: create summaries
- Step 2: run them
- Step 3: write summaries to file
- Step 4: see summaries on TensorBoard

```
def _create_summaries(self):
    with tf.name_scope('summaries'):
        tf.summary.scalar('loss', self.loss)
        tf.summary.scalar('accuracy', self.accuracy)
        tf.summary.histogram('histogram loss', self.loss)
        # images can be logged as well: tf.summary.image(name, tensor, max_outputs=3, collections=None)
        self.summary_op = tf.summary.merge_all()
```
Like any other op, `summary_op` has to be run in the session:

```
loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op], feed_dict = feed_dict)
```
Then write the result to file:
```
writer.add_summary(summary, global_step = step)
```

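Deep learning models have many hyperparameters to tune, and TensorBoard can display several runs side by side for comparison. For example, with `learning_rate` set to 0.5 or 1.0, write the summaries to `improved_graph/lr0.5` and `improved_graph/lr1.0` respectively. The two runs can then be compared directly, as shown below: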

![Learning Rate selection](LR1.JPG) ![Learning Rate Comparison](LR2.JPG)
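
One way to keep such runs apart is to derive the log directory from the hyperparameter value, so each run writes to its own subdirectory. A minimal sketch, assuming a hypothetical helper `run_log_dir` whose result would be passed to `tf.summary.FileWriter`:

```python
import os


def run_log_dir(base_dir, learning_rate):
    """Build a per-run log directory name such as '<base_dir>/lr0.5'."""
    return os.path.join(base_dir, "lr" + str(learning_rate))


for lr in (0.5, 1.0):
    # In the real training script this path would be given to the summary writer.
    print(run_log_dir("improved_graph", lr))
```

Pointing TensorBoard at the parent directory (`tensorboard --logdir improved_graph`) then shows each subdirectory as a separate run.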


## Control Randomization
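When is randomization needed?
- Initializing the weight matrices.
- Shuffling the training samples to form batches.

This raises a problem: if you need to reproduce a model's good result, how do you control these random elements?
A model that "randomly" produced a good result once but can never reproduce it is not convincing.
- Operation-level random seed
- Sessions keep track of random state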

In [16]:
import tensorflow as tf
# a = tf.Variable(tf.truncated_normal( (-1.0, 1.0), stddev=0.1, seed=0) )
a = tf.truncated_normal([1], seed=3)
with tf.Session() as sess:
    print(sess.run(a))

[ 1.06166208]
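
The same idea applies to the batch-shuffling case listed above. The sketch below uses Python's standard `random` module rather than TensorFlow, so it is an analogy to an operation-level seed, not the TensorFlow mechanism itself. `make_batches` is a hypothetical helper.

```python
import random


def make_batches(samples, batch_size, seed):
    """Shuffle a copy of samples with a private seeded generator, then split into batches."""
    rng = random.Random(seed)   # private generator: the global random state is untouched
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


data = list(range(10))
run_a = make_batches(data, batch_size=4, seed=3)
run_b = make_batches(data, batch_size=4, seed=3)

print(run_a == run_b)  # True: same seed, same batches
```

Fixing the seed makes the batch order repeatable across runs, which is what allows a good result to be reproduced.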


## Data Readers