The code-base contains NumPy implementations of two sequence model architectures: a vanilla Recurrent Neural Network (RNN) and a vanilla Long Short-Term Memory (LSTM) network. This repository is for anyone who wants to know what happens under the hood of these architectures.
Particular care is given to the feed-forward and back-propagation passes of both architectures. The derivations are unrolled as far as possible to make them easy to follow.
The goal is to tackle the problem of character generation using RNNs and LSTMs. While tackling the problem, we also look into the gradient flow of the two architectures. Later, an experiment is run to show how well the models understand context.
The input is a sequence of characters and the output is the immediate next character in the sequence. The image below demonstrates the approach: the characters in the sequence are `H`, `E`, `L`, `L` and the next character is `O`. Note that the next character could just as well have been a `,` or simply a `\n`; the character that is generated largely depends on the context of the sequence. A well-trained model generates characters that fit the context.
Character-level language model
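As a concrete illustration, here is a minimal NumPy sketch of how such input/target pairs can be built for character-level modelling. The variable names and the toy corpus are illustrative, not the repository's actual code; the target sequence is simply the input sequence shifted by one character.

```python
import numpy as np

text = "HELLO"                                 # toy corpus, for illustration only
chars = sorted(set(text))                      # vocabulary: ['E', 'H', 'L', 'O']
char_to_ix = {c: i for i, c in enumerate(chars)}

inputs  = [char_to_ix[c] for c in text[:-1]]   # H, E, L, L
targets = [char_to_ix[c] for c in text[1:]]    # E, L, L, O

def one_hot(index, vocab_size):
    """Encode a character index as a one-hot column vector."""
    x = np.zeros((vocab_size, 1))
    x[index] = 1.0
    return x

xs = [one_hot(i, len(chars)) for i in inputs]  # sequence fed to the model
```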
We look into the recurrence formulas for both architectures; a short NumPy sketch of both steps is given after the formulas.
Recurrence formula of RNN
Recurrence formula of LSTM
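For reference, the two recurrence formulas can be sketched in NumPy as follows. This is a minimal illustration, assuming a single concatenated weight matrix for the four LSTM gates; the weight names (`Wxh`, `Whh`, `W`, ...) are placeholders and not necessarily those used in the repository.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x, h_prev, Wxh, Whh, bh):
    """Vanilla RNN recurrence: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + b_h)."""
    return np.tanh(Wxh @ x + Whh @ h_prev + bh)

def lstm_step(x, h_prev, c_prev, W, b):
    """Vanilla LSTM recurrence. W maps [h_{t-1}; x_t] to the four stacked gates."""
    z = W @ np.vstack((h_prev, x)) + b   # gate pre-activations, shape (4H, 1)
    H = h_prev.shape[0]
    f = sigmoid(z[0 * H:1 * H])          # forget gate
    i = sigmoid(z[1 * H:2 * H])          # input gate
    g = np.tanh(z[2 * H:3 * H])          # candidate cell state
    o = sigmoid(z[3 * H:4 * H])          # output gate
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c
```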
We look into the backpropagation formulas for both architectures; a sketch of the backward pass through a single RNN step follows below.
Backpropagation in RNN
Backpropagation in LSTM
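To give a flavour of the derivations, here is a minimal sketch of the backward pass through one vanilla RNN step, assuming the forward step `h = tanh(Wxh @ x + Whh @ h_prev + bh)` from the sketch above; the names are again placeholders. The LSTM backward pass applies the same chain rule through the gates and the cell state and is unrolled in the repository's derivations.

```python
import numpy as np

def rnn_step_backward(dh, x, h, h_prev, Whh):
    """Backprop through one step h = tanh(Wxh @ x + Whh @ h_prev + bh).

    dh is the gradient of the loss w.r.t. h, including the gradient
    flowing back from future time steps (backpropagation through time).
    """
    dhraw = (1.0 - h * h) * dh      # through tanh: d tanh(a)/da = 1 - tanh(a)^2
    dWxh = dhraw @ x.T              # gradient w.r.t. input-to-hidden weights
    dWhh = dhraw @ h_prev.T         # gradient w.r.t. hidden-to-hidden weights
    dbh = dhraw                     # gradient w.r.t. the bias
    dh_prev = Whh.T @ dhraw         # gradient passed to the previous time step
    return dWxh, dWhh, dbh, dh_prev
```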