Recurrent Neural Networks (RNNs) are neural networks designed to handle sequential data such as text, audio, and time series. They are used in a variety of applications, including natural language processing, speech recognition, and time series analysis.
- d2l.ai
- Understanding LSTM Networks
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Understanding self-attention in RNN
- Visualizing memorization in RNNs
- RNN Cheatsheet
- Implementation of common models
We will implement common RNN architectures from scratch in PyTorch for sentiment analysis tasks.
- RNNs capture sequential information by carrying a hidden state across time steps; self-attention can be layered on top of RNN outputs (see the reference above)
- An RNN cell maintains a single hidden state and produces one output per time step
- At each step the RNN takes the current input and the previous hidden state, and returns an output and the updated hidden state
- Sequence elements are fed to the RNN one at a time, with the hidden state carried over to the next step
- Gradient clipping (nn.utils.clip_grad_norm_) is used to avoid exploding gradients; see the sketch below
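A minimal sketch of the step interface described above, using PyTorch's built-in `nn.RNN`; the sizes here are arbitrary assumptions:

```python
import torch
import torch.nn as nn

# Vanilla RNN: takes input + previous hidden state, returns output + new hidden state.
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 10, 32)   # (batch, seq_len, input_size)
h0 = torch.zeros(1, 8, 64)   # (num_layers, batch, hidden_size)
out, hn = rnn(x, h0)         # out: output at every step, hn: final hidden state

# Gradient clipping: call after backward(), before optimizer.step().
loss = out.sum()             # placeholder loss for illustration
loss.backward()
nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
```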
- Multiple RNN layers: stacking RNN layers on top of each other (num_layers in PyTorch)
- Bidirectional RNN: processes the sequence in both directions, capturing context from the past and the future; see the sketch below
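A sketch of both extensions with assumed sizes; note how the output and hidden-state shapes change:

```python
import torch
import torch.nn as nn

# 2-layer bidirectional RNN (num_layers stacks layers, bidirectional adds a reverse pass).
rnn = nn.RNN(input_size=32, hidden_size=64, num_layers=2,
             bidirectional=True, batch_first=True)
x = torch.randn(8, 10, 32)
out, hn = rnn(x)
print(out.shape)  # (8, 10, 128): forward and backward outputs concatenated
print(hn.shape)   # (4, 8, 64): num_layers * num_directions final hidden states
```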
- Gradient clipping: avoids exploding gradients
- Weight initialization: helps the model converge faster
- Dropout regularization: reduces overfitting
- Pretrained embeddings (e.g. GloVe): give better word representations than training embeddings from scratch, especially on small datasets
- Packed sequences for variable-length inputs: let the RNN skip padded positions instead of computing over them (nn.utils.rnn.pack_padded_sequence); these tricks are combined in the sketch below
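A sketch of a small sentiment classifier combining these tricks; the sizes, names, and the random stand-in for a pretrained embedding matrix are all assumptions:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class SentimentRNN(nn.Module):
    def __init__(self, pretrained, hidden_size=64, num_classes=2):
        super().__init__()
        # Pretrained embeddings (e.g. GloVe); freeze=False allows fine-tuning.
        self.embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.rnn = nn.RNN(pretrained.size(1), hidden_size, batch_first=True)
        self.dropout = nn.Dropout(0.5)        # regularization
        self.fc = nn.Linear(hidden_size, num_classes)
        # Xavier initialization on the recurrent weight matrices.
        for name, p in self.rnn.named_parameters():
            if "weight" in name:
                nn.init.xavier_uniform_(p)

    def forward(self, tokens, lengths):
        emb = self.embedding(tokens)
        # Pack so the RNN skips padded positions.
        packed = pack_padded_sequence(emb, lengths, batch_first=True,
                                      enforce_sorted=False)
        _, hn = self.rnn(packed)
        return self.fc(self.dropout(hn[-1]))  # classify from final hidden state

# Usage with a random stand-in "pretrained" matrix (vocab=100, dim=50):
model = SentimentRNN(torch.randn(100, 50))
tokens = torch.randint(0, 100, (4, 12))       # padded token ids
lengths = torch.tensor([12, 9, 7, 5])         # true sequence lengths
logits = model(tokens, lengths)
print(logits.shape)                           # (4, 2)
```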
- LSTM has 3 gates: forget, input, and output
- LSTM has 2 states: a cell state and a hidden state
- The cell state lets the LSTM capture long-term dependencies
- The gating mechanism mitigates vanishing gradients
- LSTMs typically converge faster than vanilla RNNs on long sequences; see the sketch below
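A minimal sketch with assumed sizes showing the LSTM's two returned states:

```python
import torch
import torch.nn as nn

# LSTM returns the per-step outputs plus a (hidden state, cell state) pair.
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 10, 32)
out, (hn, cn) = lstm(x)      # hn: final hidden state, cn: final cell state
print(out.shape)             # (8, 10, 64)
print(hn.shape, cn.shape)    # (1, 8, 64) each
```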
- GRU has 2 gates: reset and update
- GRU has 1 state: the hidden state (no separate cell state)
- Like the LSTM, the GRU captures long-term dependencies
- The gating mechanism mitigates vanishing gradients
- GRUs typically converge faster than vanilla RNNs
- GRU and LSTM perform similarly in practice, but the GRU has fewer parameters; see the sketch below
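A minimal sketch with assumed sizes; the parameter counts illustrate the GRU's smaller footprint (3 stacked weight blocks vs. the LSTM's 4):

```python
import torch
import torch.nn as nn

# GRU keeps a single hidden state; there is no separate cell state.
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 10, 32)
out, hn = gru(x)
print(out.shape)  # (8, 10, 64)
print(hn.shape)   # (1, 8, 64)

# GRU vs. LSTM parameter count at the same sizes.
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
print(sum(p.numel() for p in gru.parameters()))   # 18816
print(sum(p.numel() for p in lstm.parameters()))  # 25088
```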