Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential data such as text or audio. They maintain a hidden state that carries information across timesteps, which makes them well suited to tasks like natural language processing, speech recognition, and time series analysis.

Reading List

Implementations

We will be implementing common RNN architectures for sentiment analysis tasks, both from scratch and with PyTorch's built-in modules.

  • RNN from scratch

    • RNN captures sequential information by carrying a recurrent hidden state from one timestep to the next (a minimal sketch follows below)

    Figure: RNN
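
A minimal sketch of that recurrence, not the repo's actual code: one tanh cell unrolled over a batch of embedded sequences. The sizes (input 100, hidden 128) and the class name are placeholder assumptions.

```python
import torch
import torch.nn as nn

class RNNCellScratch(nn.Module):
    """One step of the recurrence: h_t = tanh(W_xh x_t + W_hh h_prev + b)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W_xh = nn.Linear(input_size, hidden_size)
        self.W_hh = nn.Linear(hidden_size, hidden_size)

    def forward(self, x_t, h_prev):
        return torch.tanh(self.W_xh(x_t) + self.W_hh(h_prev))

# Unroll the cell over a batch of embedded sequences: (batch, seq_len, input_size)
cell = RNNCellScratch(input_size=100, hidden_size=128)
x = torch.randn(32, 20, 100)
h = torch.zeros(32, 128)                # initial hidden state
for t in range(x.size(1)):
    h = cell(x[:, t, :], h)             # hidden state carries information forward
```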

  • RNN using PyTorch

    • RNN maintains 1 hidden state and produces 1 output per timestep
    • At each timestep the RNN takes the current input and the previous hidden state, and returns an output and an updated hidden state
    • Inputs are fed to the RNN one timestep at a time, and the hidden state is carried over to the next timestep
    • Gradient clipping is used to avoid exploding gradients (nn.utils.clip_grad_norm_); see the sketch below

    Figure: RNN
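
A sketch of how these pieces fit together for sentiment analysis with PyTorch's built-in nn.RNN. The class name SentimentRNN, the vocabulary size, and all dimensions are illustrative assumptions; nn.RNN and nn.utils.clip_grad_norm_ are the real PyTorch APIs named above.

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_size=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        embedded = self.embedding(tokens)       # (batch, seq_len, embed_dim)
        output, hidden = self.rnn(embedded)     # hidden: (1, batch, hidden_size)
        return self.fc(hidden.squeeze(0))       # classify from the final hidden state

model = SentimentRNN(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (32, 20)))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (32,)))
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip exploding gradients
```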

  • Deep RNN

    • Multiple RNN layers stacked on top of each other (deep RNN)
    • Bidirectional RNN - Helps in capturing context from both directions (a configuration sketch follows below)

    Figures: Deep RNN, Bi-RNN
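
A short configuration sketch, assuming the same placeholder sizes as above: num_layers stacks RNN layers, and bidirectional=True adds the reverse-direction pass.

```python
import torch
import torch.nn as nn

# num_layers stacks RNN layers on top of each other; bidirectional=True adds a
# second RNN that reads the sequence right-to-left, so each timestep sees
# context from both directions.
rnn = nn.RNN(input_size=100, hidden_size=128, num_layers=2,
             bidirectional=True, batch_first=True, dropout=0.3)
x = torch.randn(32, 20, 100)
output, hidden = rnn(x)
# output concatenates the forward and backward directions: (32, 20, 2 * 128);
# hidden stacks num_layers * num_directions states: (4, 32, 128).
fc = nn.Linear(2 * 128, 2)  # a downstream classifier sees 2 * hidden_size features
```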

  • Optimized Training RNN

    • Gradient Clipping - Helps in avoiding exploding gradients
    • Weight Initialization - Helps the model converge faster
    • Dropout Regularization - Helps in avoiding overfitting
    • Pretrained Embeddings - Give the model better word representations to start from
    • Packed Sequences for variable-length inputs - Let the RNN skip computation on padding tokens (see the sketch below)
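
A sketch combining these techniques, again with placeholder sizes and random stand-in data; the pretrained tensor here substitutes for real GloVe-style vectors.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Stand-in for real pretrained vectors (e.g. GloVe); shape (vocab, embed_dim)
pretrained = torch.randn(10_000, 100)
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

rnn = nn.RNN(100, 128, batch_first=True)
for name, param in rnn.named_parameters():
    if 'weight' in name:
        nn.init.xavier_uniform_(param)   # weight initialization for faster convergence

tokens = torch.randint(0, 10_000, (32, 20))   # padded batch: (batch, max_len)
lengths = torch.randint(1, 21, (32,))         # true lengths before padding
packed = pack_padded_sequence(embedding(tokens), lengths,
                              batch_first=True, enforce_sorted=False)
packed_out, hidden = rnn(packed)              # recurrence skips the padding steps
output, _ = pad_packed_sequence(packed_out, batch_first=True)
```
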
  • Optimized Training LSTM

    • LSTM has 3 gates - Forget, Input, Output
    • LSTM has 2 states - Cell state and Hidden state
    • LSTM helps in capturing long-term dependencies
    • The additive cell-state update helps in avoiding vanishing gradients
    • LSTM typically converges faster than a vanilla RNN (see the sketch below)

    Figure: LSTM
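
A minimal sketch of the LSTM's two-state interface, with the same placeholder sizes:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=128, batch_first=True)
x = torch.randn(32, 20, 100)
# nn.LSTM returns the per-timestep outputs plus both states: hidden (short-term)
# and cell (the long-term memory that eases gradient flow across timesteps).
output, (hidden, cell) = lstm(x)
print(output.shape)   # torch.Size([32, 20, 128])
print(hidden.shape)   # torch.Size([1, 32, 128])
print(cell.shape)     # torch.Size([1, 32, 128])
```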

  • GRU

    • GRU has 2 gates - Reset, Update
    • GRU has 1 state - Hidden state (no separate cell state)
    • GRU helps in capturing long-term dependencies
    • GRU helps in avoiding vanishing gradients
    • GRU typically converges faster than a vanilla RNN
    • GRU and LSTM are similar in performance, but GRU has fewer parameters (see the sketch below)

    Figure: GRU
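
And the corresponding GRU sketch, with the same placeholder sizes:

```python
import torch
import torch.nn as nn

# Same interface as nn.RNN: a single hidden state, no separate cell state.
gru = nn.GRU(input_size=100, hidden_size=128, batch_first=True)
x = torch.randn(32, 20, 100)
output, hidden = gru(x)
# GRU stores 3 weight blocks per layer (reset, update, candidate) vs the
# LSTM's 4, which is why it has fewer parameters at the same hidden size.
```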