positional-encoding

Learnable Positional Encoding with PyTorch

Overview

This project demonstrates the implementation of a learnable positional encoding method using PyTorch. Positional encoding is critical for sequential data processing in Transformer models since they lack an inherent sense of order. Instead of fixed encodings (like sine and cosine), this solution leverages learnable embeddings, which adapt to the data distribution and potentially enhance performance.

The project also includes a basic Transformer-based model applied to a dummy dataset for training and evaluation.

Problem Statement

In Transformer architectures, stacking self-attention layers with positional encoding can introduce the following challenges:

  1. Loss of positional information in deeper layers.
  2. Difficulty optimizing long sequences or large datasets due to computational costs.
  3. The rigidity of fixed positional encodings in handling diverse sequence lengths or structures.

This project explores a learnable positional encoding method to mitigate these issues and integrates it into a Transformer model.

Architecture

The solution consists of:

  1. Learnable Positional Encoding Layer:

     Adds position-specific embeddings to input sequences and can generalize better than fixed sine/cosine encodings.

  2. Transformer Encoder:

     Stacked self-attention layers for sequence modeling.

  3. Fully Connected Layer:

     Aggregates sequence-level information for classification.

Main Components

Learnable Positional Encoding:

A layer that adds position-specific embeddings to the input sequence. Implemented using PyTorch nn.Parameter.
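A minimal sketch of such a layer, assuming inputs shaped `(batch, seq_len, d_model)`; the class name and initialization scheme are illustrative, not necessarily the repository's exact code:

```python
import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    """Adds a trainable embedding for each position to the input sequence."""

    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        # One trainable vector per position, registered via nn.Parameter
        # so it is updated by the optimizer like any other weight.
        self.pos_embedding = nn.Parameter(torch.zeros(1, max_len, d_model))
        nn.init.normal_(self.pos_embedding, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); slice in case seq_len < max_len.
        return x + self.pos_embedding[:, : x.size(1), :]
```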

Transformer Model:

Composed of multiple self-attention layers for sequence representation. Uses the learnable positional encoding layer before passing data to attention layers.
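One way to wire this together, building on the layer sketched above and PyTorch's built-in `nn.TransformerEncoder`; the head count, layer count, and mean pooling over the sequence are assumptions chosen for illustration:

```python
class SimpleTransformerClassifier(nn.Module):
    """Learnable positional encoding -> stacked self-attention -> linear head."""

    def __init__(self, d_model: int = 16, max_len: int = 10,
                 num_classes: int = 2, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.pos_encoding = LearnablePositionalEncoding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pos_encoding(x)   # add position-specific embeddings
        x = self.encoder(x)        # self-attention over the sequence
        x = x.mean(dim=1)          # aggregate sequence-level information
        return self.classifier(x)  # class logits
```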

Dummy Dataset:

Randomly generated data simulates sequential inputs of length 10 with 16 features per step.
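Such data could be generated along these lines; the batch size and number of classes are placeholders assumed for illustration:

```python
batch_size, seq_len, d_model, num_classes = 32, 10, 16, 2  # 32 and 2 are assumed

inputs = torch.randn(batch_size, seq_len, d_model)     # random "sequences"
labels = torch.randint(0, num_classes, (batch_size,))  # random class labels
```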

Training:

The model is trained on the dummy dataset for 5 epochs with cross-entropy loss and the Adam optimizer.
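A training loop consistent with that description, using the classes and tensors sketched above; the learning rate is an assumption:

```python
model = SimpleTransformerClassifier(d_model=16, max_len=10, num_classes=num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is illustrative

for epoch in range(5):
    optimizer.zero_grad()
    logits = model(inputs)            # (batch, num_classes)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch + 1}: loss = {loss.item():.4f}")
```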

Results

The project successfully demonstrates:

  - Integration of learnable positional encodings in a Transformer model.

  - Training on a dummy dataset with minimal overfitting.
