forked from nlintz/TensorFlow-Tutorials
-
Notifications
You must be signed in to change notification settings - Fork 0
/
tensorflow_LSTM_RNN.txt
146 lines (105 loc) · 5.33 KB
/
tensorflow_LSTM_RNN.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
//===
from tensorflow.models.rnn import rnn, rnn_cell
or
tf.nn.rnn , tf.nn.rnn_cell
lstm cell size === number of memory cells
time step size === rolled out count of iterations === number of loops
x_0, x_1, ... , x_t, x_{t+1}, ... x_end
//=== cell.state_size
http://stackoverflow.com/questions/36732877/about-lstm-cell-state-size
To compute an LSTM step you need to apply four functions each with dimension hidden_size:
- input gate
- output gate
- forget gate
- hidden state
so you need hidden_size * 4 = 200 * 4 = 800
the 30 comes from your batch size, as you are processing 30 samples per batch.
and that is your cell memory consumption (30 x 800)
--> FICO
F: forget gate, W_f
I: input gate, W_i
C: cell gate, W_c
O: output gate, W_o
[Q] difference between sigmoid and tanh?
[Q] W_f, W_i, W_c, W_o are shared(the same value) among all cells ?
//===
tensorflow tutorial + LSTM
* https://www.tensorflow.org/versions/r0.10/tutorials/recurrent/index.html
* [2015.08.27] http://colah.github.io/posts/2015-08-Understanding-LSTMs/
* [2014] http://arxiv.org/pdf/1409.2329v5.pdf
(how to correctly apply dropout to LSTMs)
* http://deeplearning.net/tutorial/lstm.html
//=== class tf.nn.rnn_cell.LSTMCell
https://www.tensorflow.org/versions/r0.10/api_docs/python/rnn_cell.html#BasicLSTMCell
The peephole implementation is based on:
* https://research.google.com/pubs/archive/43905.pdf
Hasim Sak, Andrew Senior, and Francoise Beaufays. "Long short-term memory recurrent neural network architectures for large scale acoustic modeling." INTERSPEECH, 2014.
The class uses optional peep-hole connections, optional cell clipping, and an optional projection layer.
//===
Recurrent neural networks address this issue.
They are networks with loops in them, allowing information to persist.
This chain-like nature reveals that recurrent neural networks are intimately related to
sequences and lists.
They’re the natural architecture of neural network to use for such data.
Andrej Karpathy’s excellent blog post,
he Unreasonable Effectiveness of Recurrent Neural Networks.
But they really are pretty amazing.
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
“LSTMs,” a very special kind of recurrent neural network which works,
for many tasks, much much better than the standard version
“long-term dependencies.”
LSTMs don’t have this problem!?
1997,
"Long Short Term Memory networks"
– usually just called “LSTMs” – are a special kind of RNN,
capable of learning long-term dependencies.
They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized
by many people in following work...
LSTMs are explicitly designed to avoid the long-term dependency problem.
*** Remembering information for long periods of time is practically their default behavior,
LSTMs also have this chain like structure, but the repeating module has a different structure.
Instead of having a single neural network layer, there are four,
interacting in a very special way.
A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU,
introduced by Cho, et al. (2014).
It combines the forget and input gates into a single “update gate.”
It also merges the cell state and hidden state, and makes some other changes.
//=== data for tensorflow rnn tutorial
http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
https://www.tensorflow.org/versions/r0.10/tutorials/recurrent/index.html
the core of the model consists of an LSTM cell that processes one word at a time and computes probabilities of the possible continuations of the sentence. The memory state of the network is initialized with a vector of zeros and gets updated after reading each word. Also, for computational reasons, we will process data in mini-batches of size batch_size.
The basic pseudocode looks as follows:
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
loss = 0.0
for current_batch_of_words in words_in_dataset:
# The value of state is updated after processing each batch of words.
output, state = lstm(current_batch_of_words, state)
# The LSTM output can be used to make next word predictions
logits = tf.matmul(output, softmax_w) + softmax_b
probabilities = tf.nn.softmax(logits)
loss += loss_function(probabilities, target_words)
*** truncated backpropagation
A simplified version of the code for the graph creation for truncated backpropagation:
# Placeholder for the inputs in a given iteration.
words = tf.placeholder(tf.int32, [batch_size, num_steps])
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])
for i in range(num_steps):
# The value of state is updated after processing each batch of words.
output, state = lstm(words[:, i], state)
# The rest of the code.
# ...
final_state = state
And this is how to implement an iteration over the whole dataset:
# A numpy array holding the state of LSTM after each batch of words.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
numpy_state, current_loss = session.run([final_state, loss],
# Initialize the LSTM state from the previous iteration.
feed_dict={initial_state: numpy_state, words: current_batch_of_words})
total_loss += current_loss
Inputs