This project implements a complete Transformer model from scratch using PyTorch, following the DataCamp tutorial "Building a Transformer with PyTorch".
## Project Structure

    transformer_pytorch/
    ├── transformer/
    │   ├── __init__.py
    │   ├── model.py         # Complete Transformer implementation
    │   ├── attention.py     # Multi-Head Attention mechanism
    │   ├── feedforward.py   # Position-wise Feed-Forward Network
    │   ├── positional.py    # Positional Encoding
    │   ├── encoder.py       # Encoder Layer
    │   └── decoder.py       # Decoder Layer
    ├── train.py             # Training script
    ├── demo.py              # Example usage
    ├── requirements.txt     # Dependencies
    └── README.md            # This file
## Features

- Complete Transformer architecture implementation
- Multi-Head Attention mechanism
- Position-wise Feed-Forward Networks
- Positional Encoding with sinusoidal functions
- Encoder and Decoder blocks with residual connections
- Layer normalization and dropout for regularization (see the encoder-layer sketch after this list)
- Training loop with sample data
- Model evaluation capabilities
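To show how these pieces fit together, here is a minimal, self-contained sketch of a single encoder block. So that it runs on its own, it substitutes PyTorch's built-in `nn.MultiheadAttention` and an `nn.Sequential` feed-forward stack; in this project the modules from `attention.py` and `feedforward.py` take their place, and the class and argument names below are illustrative rather than taken from the repository.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder block: self-attention and a position-wise feed-forward
    network, each wrapped in a residual connection, dropout, and layer norm.
    nn.MultiheadAttention is used here only to keep the sketch self-contained;
    the repository's own attention module would be dropped in instead."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        # Residual connection + layer norm around the self-attention sublayer
        attn_out, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Residual connection + layer norm around the feed-forward sublayer
        x = self.norm2(x + self.dropout(self.feed_forward(x)))
        return x

# Quick shape check with random embeddings: (batch=2, seq_len=10, d_model=512)
layer = EncoderLayer()
print(layer(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])
```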
## Installation

Install the dependencies:

    pip install -r requirements.txt

## Usage

Train the model on sample data (a sketch of the core training loop follows below):

    python train.py

Run the example usage demo:

    python demo.py
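The following sketch shows roughly what the training script does: build the model and optimize it on randomly generated token sequences with cross-entropy loss. The `Transformer` import path, class name, and constructor signature here are assumptions for illustration and may differ from the actual code in `transformer/model.py`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

from transformer.model import Transformer  # assumed class name and location

# Hypothetical vocabulary sizes; the remaining hyperparameters mirror the
# defaults in the configuration table below.
src_vocab_size, tgt_vocab_size = 5000, 5000
model = Transformer(src_vocab_size, tgt_vocab_size,
                    d_model=512, num_heads=8, num_layers=6,
                    d_ff=2048, max_seq_length=100, dropout=0.1)

# Random token ids stand in for a real parallel corpus (index 0 = padding).
src = torch.randint(1, src_vocab_size, (64, 100))
tgt = torch.randint(1, tgt_vocab_size, (64, 100))

criterion = nn.CrossEntropyLoss(ignore_index=0)  # skip padding positions
optimizer = optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.98), eps=1e-9)

model.train()
for epoch in range(10):
    optimizer.zero_grad()
    # Teacher forcing: feed the target without its last token,
    # predict the target shifted one position to the left.
    output = model(src, tgt[:, :-1])
    loss = criterion(output.reshape(-1, tgt_vocab_size), tgt[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch + 1}: loss = {loss.item():.4f}")
```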
## Architecture

The Transformer is built from the following components:

- Multi-Head Attention: Captures dependencies between different positions in the sequence
- Feed-Forward Networks: Position-wise fully connected layers applied independently at each position
- Positional Encoding: Injects sequence-order information using sinusoidal functions (see the sketch after this list)
- Layer Normalization: Stabilizes training
- Residual Connections: Help train deeper networks
- Dropout: Prevents overfitting
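As one concrete example, the sinusoidal positional encoding can be implemented as below. This is a minimal sketch of the standard formulation from the paper, where even embedding dimensions use sine and odd dimensions use cosine; the class name and the exact structure of `positional.py` in this repository may differ.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings.
    Wavelengths form a geometric progression, so each position gets a
    unique pattern and relative offsets are easy for the model to learn."""

    def __init__(self, d_model, max_seq_length=100):
        super().__init__()
        # Precompute the encoding table once (assumes an even d_model).
        pe = torch.zeros(max_seq_length, d_model)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        # Registered as a buffer: saved with the model but not trained.
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for each position.
        return x + self.pe[:, :x.size(1)]
```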
## Configuration

| Parameter | Default | Description |
|---|---|---|
| `d_model` | 512 | Model embedding dimension |
| `num_heads` | 8 | Number of attention heads |
| `num_layers` | 6 | Number of encoder/decoder layers |
| `d_ff` | 2048 | Feed-forward network dimension |
| `dropout` | 0.1 | Dropout rate |
| `max_seq_length` | 100 | Maximum sequence length |
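These defaults can be overridden when constructing the model. One constraint worth noting: `d_model` must be divisible by `num_heads`, since multi-head attention splits the embedding into `num_heads` slices of size `d_model // num_heads`. The snippet below reuses the hypothetical constructor signature from the training sketch above.

```python
from transformer.model import Transformer  # assumed class name and location

# Hypothetical smaller configuration for quick CPU experiments.
# 128 // 4 = 32 dimensions per attention head.
small_model = Transformer(src_vocab_size=5000, tgt_vocab_size=5000,
                          d_model=128, num_heads=4, num_layers=2,
                          d_ff=512, max_seq_length=100, dropout=0.1)
```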
## References

- DataCamp Tutorial: "Building a Transformer with PyTorch"
- Original Paper: "Attention Is All You Need" (Vaswani et al., 2017)