Recurrent Neural Networks (RNNs) are neural networks designed to handle sequential data such as text, audio, and time series. They are used in a variety of applications, including natural language processing, speech recognition, and time series analysis.
- d2l.ai
- Understanding LSTM Networks
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Understanding self-attention in RNN
- Visualizing memorization in RNNs
- RNN Cheatsheet
- Implementation of common models
We will implement common RNN architectures from scratch in PyTorch for sentiment analysis tasks.
- RNNs capture sequential information by carrying a hidden state across time steps; self-attention can be layered on top of RNN outputs (see the reference above)
- An RNN cell maintains a single hidden state and produces one output per time step
- At each step the RNN takes the current input and the previous hidden state, and returns an output and the updated hidden state
- Sequence elements are fed to the RNN one at a time, with the hidden state carried over to the next step
- Gradient clipping (nn.utils.clip_grad_norm_) is used to avoid exploding gradients; see the sketch below
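A minimal sketch of the step interface described above, using PyTorch's built-in `nn.RNN`; the sizes here are arbitrary assumptions:

```python
import torch
import torch.nn as nn

# Vanilla RNN: takes input + previous hidden state, returns output + new hidden state.
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 10, 32)   # (batch, seq_len, input_size)
h0 = torch.zeros(1, 8, 64)   # (num_layers, batch, hidden_size)
out, hn = rnn(x, h0)         # out: output at every step, hn: final hidden state

# Gradient clipping: call after backward(), before optimizer.step().
loss = out.sum()             # placeholder loss for illustration
loss.backward()
nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
```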
- Multiple RNN layers: stacking RNN layers on top of each other (num_layers in PyTorch)
- Bidirectional RNN: processes the sequence in both directions, capturing context from the past and the future; see the sketch below
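A sketch of both extensions with assumed sizes; note how the output and hidden-state shapes change:

```python
import torch
import torch.nn as nn

# 2-layer bidirectional RNN (num_layers stacks layers, bidirectional adds a reverse pass).
rnn = nn.RNN(input_size=32, hidden_size=64, num_layers=2,
             bidirectional=True, batch_first=True)
x = torch.randn(8, 10, 32)
out, hn = rnn(x)
print(out.shape)  # (8, 10, 128): forward and backward outputs concatenated
print(hn.shape)   # (4, 8, 64): num_layers * num_directions final hidden states
```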
- Gradient clipping: avoids exploding gradients
- Weight initialization: helps the model converge faster
- Dropout regularization: reduces overfitting
- Pretrained embeddings (e.g. GloVe): give better word representations than training embeddings from scratch, especially on small datasets
- Packed sequences for variable-length inputs: let the RNN skip padded positions instead of computing over them (nn.utils.rnn.pack_padded_sequence); these tricks are combined in the sketch below
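A sketch of a small sentiment classifier combining these tricks; the sizes, names, and the random stand-in for a pretrained embedding matrix are all assumptions:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class SentimentRNN(nn.Module):
    def __init__(self, pretrained, hidden_size=64, num_classes=2):
        super().__init__()
        # Pretrained embeddings (e.g. GloVe); freeze=False allows fine-tuning.
        self.embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.rnn = nn.RNN(pretrained.size(1), hidden_size, batch_first=True)
        self.dropout = nn.Dropout(0.5)        # regularization
        self.fc = nn.Linear(hidden_size, num_classes)
        # Xavier initialization on the recurrent weight matrices.
        for name, p in self.rnn.named_parameters():
            if "weight" in name:
                nn.init.xavier_uniform_(p)

    def forward(self, tokens, lengths):
        emb = self.embedding(tokens)
        # Pack so the RNN skips padded positions.
        packed = pack_padded_sequence(emb, lengths, batch_first=True,
                                      enforce_sorted=False)
        _, hn = self.rnn(packed)
        return self.fc(self.dropout(hn[-1]))  # classify from final hidden state

# Usage with a random stand-in "pretrained" matrix (vocab=100, dim=50):
model = SentimentRNN(torch.randn(100, 50))
tokens = torch.randint(0, 100, (4, 12))       # padded token ids
lengths = torch.tensor([12, 9, 7, 5])         # true sequence lengths
logits = model(tokens, lengths)
print(logits.shape)                           # (4, 2)
```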
- LSTM has 3 gates: forget, input, and output
- LSTM has 2 states: a cell state and a hidden state
- The cell state lets the LSTM capture long-term dependencies
- The gating mechanism mitigates vanishing gradients
- LSTMs typically converge faster than vanilla RNNs on long sequences; see the sketch below
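A minimal sketch with assumed sizes showing the LSTM's two returned states:

```python
import torch
import torch.nn as nn

# LSTM returns the per-step outputs plus a (hidden state, cell state) pair.
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 10, 32)
out, (hn, cn) = lstm(x)      # hn: final hidden state, cn: final cell state
print(out.shape)             # (8, 10, 64)
print(hn.shape, cn.shape)    # (1, 8, 64) each
```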
- GRU has 2 gates: reset and update
- GRU has 1 state: the hidden state (no separate cell state)
- Like the LSTM, the GRU captures long-term dependencies
- The gating mechanism mitigates vanishing gradients
- GRUs typically converge faster than vanilla RNNs
- GRU and LSTM perform similarly in practice, but the GRU has fewer parameters; see the sketch below
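A minimal sketch with assumed sizes; the parameter counts illustrate the GRU's smaller footprint (3 stacked weight blocks vs. the LSTM's 4):

```python
import torch
import torch.nn as nn

# GRU keeps a single hidden state; there is no separate cell state.
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 10, 32)
out, hn = gru(x)
print(out.shape)  # (8, 10, 64)
print(hn.shape)   # (1, 8, 64)

# GRU vs. LSTM parameter count at the same sizes.
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
print(sum(p.numel() for p in gru.parameters()))   # 18816
print(sum(p.numel() for p in lstm.parameters()))  # 25088
```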