# Structure your model
- Overall structure of a model in TensorFlow
- word2vec
- Name scope
- Embedding visualization

## Phase 1: Assemble graph
1. Define placeholders for input and output
2. Define the weights
3. Define the inference model - forward propogation work through
4. Define loss function
5. Define optimizer

## Phase 2: Training the model
![TrainingLoop](TrainingLoop.JPG)

## Embedding Lookup

In [None]:
# tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None, validate_indices=True, max_norm=None)

## NCE Loss

In [None]:
# tf.nn.nce_loss(weights, biases, labels, inputs, num_sampled, num_classes)

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from process_data import process_data
VOCAB_SIZE = 50000
BATCH_SIZE = 128
EMBED_SIZE = 128 # dimension of the word embedding vectors
SKIP_WINDOW = 1 # the context window
NUM_SAMPLED = 64 # Number of negative examples to sample.
LEARNING_RATE = 1.0
NUM_TRAIN_STEPS = 20000
SKIP_STEP = 2000 # how many steps to skip before reporting the loss

In [2]:
with tf.name_scope('data'):
    center_word = tf.placeholder(tf.int32, [BATCH_SIZE], name='center_words')
    y = tf.placeholder(tf.int32, [BATCH_SIZE, SKIP_WINDOW], name='target_words')

embed_matrix = tf.get_variable(
                 "WordEmbedding", [VOCAB_SIZE, EMBED_SIZE],
                  tf.float32,
                  initializer=tf.random_uniform_initializer(-1.0, 1.0))

embed = tf.nn.embedding_lookup(embed_matrix, center_word, name='embed')

nce_weight = tf.get_variable('nce_weight', [VOCAB_SIZE, EMBED_SIZE],
                             initializer=tf.truncated_normal_initializer(
                                           stddev=1.0 / (EMBED_SIZE**0.5)))

nce_bias = tf.get_variable('nce_bias', [VOCAB_SIZE], 
                          initializer=tf.zeros_initializer())

nce_loss = tf.nn.nce_loss(nce_weight, nce_bias, y, embed,
                         NUM_SAMPLED,
                         VOCAB_SIZE)
loss = tf.reduce_mean(nce_loss, 0)

optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)

batch_gen = process_data(VOCAB_SIZE, BATCH_SIZE, SKIP_WINDOW)
# word2vec(batch_gen)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    total_loss = 0.0 
    # we use this to calculate the average loss in the last SKIP_STEP steps0
    writer = tf.summary.FileWriter('tmp/', sess.graph)
    for index in range(NUM_TRAIN_STEPS):
        centers, targets = next(batch_gen)
        train_dict = {center_word: centers, y: targets}
        _, loss_batch = sess.run([optimizer, loss], feed_dict=train_dict)
        total_loss += loss_batch
        if (index + 1) % SKIP_STEP == 0:
            print('Average loss at step {}: {:5.1f}'.format(
               index, total_loss / SKIP_STEP))
        total_loss = 0.0
    writer.close()
    
    

Dataset ready
Average loss at step 1999:   0.0
Average loss at step 3999:   0.0
Average loss at step 5999:   0.0
Average loss at step 7999:   0.0
Average loss at step 9999:   0.0
Average loss at step 11999:   0.0
Average loss at step 13999:   0.0
Average loss at step 15999:   0.0
Average loss at step 17999:   0.0
Average loss at step 19999:   0.0


## TensorBoard Visualization
TensorBoard has two kinds of edges:
- solid lines. represent __data flow__ edges
- dotted lines. represent __control dependence__ edges.

## Why should we still learn gradients
> Well, maybe. But for now, Tensorflow can take gradients for us, but it can't give us __intuition__ about what function to use. 
It doesn't tell us if a function will suffer from exploding or vanishing gradients. 
We still need to know about gradients to get an understanding of why a model works while another doesn't.

我个人觉得，无论做得简单抑或复杂的模型或数据，一定要从零基础培养自己对模型和数据的 __直觉-intuition__ 。这个词儿也在 Andrew Ng 《Deep Learning Specialization》 课程中被反反复复提及、强调。

有许多外行人，或者许多想入行但受限于许多标题党文章荼毒的初学者，会天真的以为做神经网络或深度学习，不过是调调参数、增加下网络层次 __而已__ 的事情。 许多人甚至对神经网络或深度学习嗤之以鼻，或者他们已经从各类的文章里面看到了 神经网络的 “三起三落”，会相当有 _高见地_ 认为神经网络或深度学习一定也会在三五年之内成为过江之鲫，大家不再提及。 

不仅仅是神经网络或深度学习， 哪怕人类知识框架中的任何一个领域或分支，在历史的长河中都会如“长江后浪推前浪”。再轰动再炒作，神经网络或深度学习也仅仅是一个普通的学科领域或分支， 也会有冷却的时候。

这确实是看待问题的一个境界，但是相对而言较为初级；

更进一步，我们应该看到神经网络或深度学习为什么“又”火了起来，在这一波浪潮中，都解决了哪些经典问题，带来什么样的实际应用提升。如果有所能力，成为其中的弄潮儿。

而再进一步，也许如Hilton，自己亲手建立起了深度学习的这套框架，自己又再次开始看到其中的问题，开始进一步探索。

而往往民科或无知的人，往往跳过第二重境界，无知的以为自己处在第三重境界。

我觉得一个很好的评判标准：__自我审视下，在进阶的道路上究竟下了多大的努力，究竟深入理解了什么。__