Transformer Implementation in PyTorch

This project implements a complete Transformer model from scratch using PyTorch, following the DataCamp tutorial "Building a Transformer with PyTorch".

Project Structure

transformer_pytorch/
├── transformer/
│   ├── __init__.py
│   ├── model.py          # Complete Transformer implementation
│   ├── attention.py      # Multi-Head Attention mechanism  
│   ├── feedforward.py    # Position-wise Feed-Forward Network
│   ├── positional.py     # Positional Encoding
│   ├── encoder.py        # Encoder Layer
│   └── decoder.py        # Decoder Layer
├── train.py              # Training script
├── demo.py               # Example usage
├── requirements.txt      # Dependencies
└── README.md            # This file

Features

  • Complete Transformer architecture implementation
  • Multi-Head Attention mechanism
  • Position-wise Feed-Forward Networks
  • Positional Encoding with sinusoidal functions (a sketch follows this list)
  • Encoder and Decoder blocks with residual connections
  • Layer normalization and dropout for regularization
  • Training loop with sample data
  • Model evaluation capabilities
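
The sinusoidal positional encoding mentioned above can be sketched roughly as follows. This is a minimal, standalone version; the class and argument names are assumptions and may differ from what transformer/positional.py in this repository actually uses.

import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    # Illustrative sketch of sinusoidal positional encoding; names and details
    # are assumptions, not necessarily identical to transformer/positional.py.
    def __init__(self, d_model, max_seq_length=100):
        super().__init__()
        pe = torch.zeros(max_seq_length, d_model)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine
        self.register_buffer("pe", pe.unsqueeze(0))    # shape: (1, max_seq_length, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for the first seq_len positions
        return x + self.pe[:, :x.size(1)]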

Installation

pip install -r requirements.txt

Usage

Basic Training

python train.py

Demo

python demo.py

Model Architecture

The Transformer consists of the following components (a sketch of how they fit together in one encoder block follows this list):

  1. Multi-Head Attention: Captures dependencies across different positions
  2. Feed-Forward Networks: Position-wise fully connected layers
  3. Positional Encoding: Provides sequence order context
  4. Layer Normalization: Stabilizes training
  5. Residual Connections: Help train deeper networks
  6. Dropout: Prevents overfitting
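
As a rough illustration of how the attention, feed-forward, residual, normalization, and dropout pieces combine in a single encoder block, here is a sketch. For brevity it uses PyTorch's built-in nn.MultiheadAttention, whereas this repository implements the attention mechanism from scratch in attention.py; the class and argument names below are illustrative, not taken from encoder.py.

import torch
import torch.nn as nn

class EncoderLayerSketch(nn.Module):
    # One encoder block: self-attention and a position-wise feed-forward network,
    # each wrapped in dropout, a residual connection, and layer normalization.
    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Multi-head self-attention, then residual connection and layer norm
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Position-wise feed-forward network, then residual connection and layer norm
        x = self.norm2(x + self.dropout(self.ff(x)))
        return x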

Hyperparameters

Parameter       Default  Description
d_model         512      Model embedding dimension
num_heads       8        Number of attention heads
num_layers      6        Number of encoder/decoder layers
d_ff            2048     Feed-forward network dimension
dropout         0.1      Dropout rate
max_seq_length  100      Maximum sequence length
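
These defaults would typically be passed to the model constructor roughly as follows. This is only a sketch: the exact class name and signature in transformer/model.py may differ, and the vocabulary sizes and import path below are assumptions made for illustration.

import torch
from transformer.model import Transformer  # assumed import path, based on the project structure above

# Hypothetical vocabulary sizes for a toy example
src_vocab_size, tgt_vocab_size = 5000, 5000

model = Transformer(
    src_vocab_size=src_vocab_size,
    tgt_vocab_size=tgt_vocab_size,
    d_model=512,
    num_heads=8,
    num_layers=6,
    d_ff=2048,
    max_seq_length=100,
    dropout=0.1,
)

# Dummy batches of token ids with shape (batch_size, seq_len)
src = torch.randint(1, src_vocab_size, (2, 20))
tgt = torch.randint(1, tgt_vocab_size, (2, 20))
out = model(src, tgt)  # expected output shape: (2, 20, tgt_vocab_size)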

Based on

  • DataCamp Tutorial: "Building a Transformer with PyTorch"
  • Original Paper: "Attention is All You Need" (Vaswani et al., 2017)
