This repository contains my personal journey and hands-on, from-the-ground-up implementations of the Transformer, one of the most revolutionary architectures in modern Deep Learning and Natural Language Processing.
The concepts and structure are inspired by the book "Building Transformers From Scratch" by Jason Brownlee, available at machinelearningmastery.com, which provides a clear and progressive path to mastering this essential architecture in AI.
- Introduction to Attention and Transformer Models
- Understanding Encoders and Decoders in Transformers
- Tokenizers in Language Models (BPE sketch below):
  - Byte-Pair Encoding (BPE)
  - WordPiece
  - SentencePiece and Unigram
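As a taste of the tokenizer chapters, here is a minimal sketch of training a BPE tokenizer with the Hugging Face `tokenizers` library; the toy corpus and `vocab_size` are illustrative, not values from the notebooks:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Toy corpus and vocabulary size, purely for illustration
corpus = ["the cat sat on the mat", "the dog sat on the log"]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=100, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(corpus, trainer)

encoding = tokenizer.encode("the cat sat on the log")
print(encoding.tokens)  # learned subword pieces
print(encoding.ids)     # their integer ids
```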
- Word Embeddings (sketch below):
  - Word2Vec implementations with Gensim and PyTorch
  - Embeddings in Transformer models
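Inside the Transformer, embeddings are just a learned lookup table. A minimal PyTorch sketch (sizes are illustrative), including the √d_model scaling used in the original paper:

```python
import math

import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512              # illustrative sizes
embedding = nn.Embedding(vocab_size, d_model)  # one learned vector per token id

token_ids = torch.tensor([[5, 42, 7, 0]])      # (batch, seq_len) of token ids
x = embedding(token_ids) * math.sqrt(d_model)  # scaling from "Attention Is All You Need"
print(x.shape)                                 # torch.Size([1, 4, 512])
```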
- Positional Encodings (sinusoidal sketch below):
  - Sinusoidal Positional Encodings
  - Learned Positional Encodings
  - Rotary Positional Encodings (RoPE)
  - Relative Positional Encodings
  - YaRN for larger context windows
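A minimal sketch of the sinusoidal variant (the other schemes get their own notebooks); the sequence length and model dimension below are arbitrary:

```python
import math

import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=512)
print(pe.shape)  # torch.Size([128, 512])
```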
- Attention Mechanisms (scaled dot-product sketch below):
  - Multi-Head Attention (MHA)
  - Grouped-Query Attention (GQA)
  - Multi-Query Attention (MQA)
  - Multi-Head Latent Attention (MLA)
  - Attention Masking
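All of these variants build on scaled dot-product attention. A minimal sketch with an optional causal mask (the shapes are illustrative):

```python
import math

import torch

def scaled_dot_product_attention(q, k, v, causal: bool = False):
    """q, k, v: (batch, heads, seq_len, head_dim). Returns output and attention weights."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (batch, heads, q_len, k_len)
    if causal:
        q_len, k_len = scores.shape[-2:]
        mask = torch.triu(torch.ones(q_len, k_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))       # hide future positions
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

q = k = v = torch.randn(1, 8, 16, 64)  # illustrative: 8 heads, seq_len 16, head_dim 64
out, attn = scaled_dot_product_attention(q, k, v, causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```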
- Normalization Techniques (RMSNorm sketch below):
  - Layer Normalization
  - RMS Normalization
  - Adaptive Layer Norm
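A minimal RMS normalization sketch (the `eps` value and tensor sizes are illustrative):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMS normalization: rescale by the root mean square, with no mean subtraction."""
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(d_model))  # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

x = torch.randn(2, 16, 512)
print(RMSNorm(512)(x).shape)  # torch.Size([2, 16, 512])
```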
- Feed-Forward Networks (SwiGLU sketch below):
  - Linear layers and activation functions
  - SwiGLU and variants
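A minimal SwiGLU feed-forward sketch in the LLaMA style; the projection names `w1`/`w2`/`w3` and the hidden size are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Gated feed-forward block: (SiLU(x W1) * (x W3)) W2."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(d_model, d_hidden, bias=False)  # value projection
        self.w2 = nn.Linear(d_hidden, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

x = torch.randn(2, 16, 512)
print(SwiGLUFeedForward(512, 1536)(x).shape)  # torch.Size([2, 16, 512])
```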
- Advanced Architectures (pre-norm sketch below):
  - Mixture of Experts (MoE)
  - Skip Connections
  - Pre-norm vs Post-norm architectures
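A minimal pre-norm block sketch using `nn.MultiheadAttention`, showing where the skip connections and norms sit; post-norm would instead apply each norm after the residual addition. Sizes are illustrative:

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Pre-norm Transformer block: normalize *before* each sub-layer, then add the residual."""
    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # skip connection around attention
        x = x + self.ff(self.norm2(x))                     # skip connection around the FFN
        return x

x = torch.randn(2, 16, 512)
print(PreNormBlock(512, 8, 2048)(x).shape)  # torch.Size([2, 16, 512])
```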
- Plain Seq2Seq Model for Language Translation (LSTM-based)
- Seq2Seq Model with Attention for Language Translation
- Full Transformer Model for Language Translation (Encoder-Decoder)
- Decoder-Only Transformer Model for Text Generation (toy greedy-decoding sketch below)
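The notebooks build these models in full; as a toy illustration of the decoder-only setup only, here is a compact sketch with a greedy generation loop. Positional encodings are omitted for brevity and all sizes are illustrative, so this is not the notebook implementation:

```python
import torch
import torch.nn as nn

class TinyDecoderLM(nn.Module):
    """Toy decoder-only LM: embedding + causally masked Transformer layers + LM head.
    Positional encodings are omitted to keep the sketch short."""
    def __init__(self, vocab_size: int, d_model: int = 128, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True, norm_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        h = self.blocks(self.embed(ids), mask=causal)  # each position sees only the past
        return self.lm_head(h)                         # (batch, seq_len, vocab_size)

@torch.no_grad()
def greedy_generate(model: nn.Module, ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    """Append the most likely next token, one step at a time."""
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                  # logits for the last position only
        next_id = logits.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
    return ids

model = TinyDecoderLM(vocab_size=100)
print(greedy_generate(model, torch.tensor([[1, 2, 3]]), max_new_tokens=5))
```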
Transformers have revolutionized AI by:
- Enabling large language models like GPT, BERT, LLaMA, and Claude
- Processing sequences in parallel (unlike RNNs)
- Capturing long-range dependencies through self-attention
- Scaling efficiently to billions of parameters
- Transferring across domains: NLP, Computer Vision, Audio, and more
Understanding Transformers from scratch is essential for:
- Building custom AI models
- Fine-tuning pre-trained models
- Understanding state-of-the-art architectures
- Optimizing inference and training performance
- PyTorch - Deep Learning framework
- tokenizers - Hugging Face tokenizers library
- torch.nn.functional - Functional neural network operations (included with PyTorch)
- requests - Data downloading
- tqdm - Progress bars
- matplotlib - Visualization
Each notebook is self-contained and can be run independently. Start with the overview chapters to understand the fundamentals, then explore the building blocks before diving into complete model implementations.
✅ From-scratch implementations - No black boxes, understand every component
✅ Progressive learning path - Build from basics to complex architectures
✅ Modern techniques - RoPE, GQA, MoE, RMSNorm, and more
✅ Complete working models - Translation and text generation systems
✅ Well-commented code - Clear explanations in Spanish and English
✅ Production-ready patterns - Best practices for model design
- Start with the Overview - Understand what Transformers are and why they work
- Master the Building Blocks - Learn each component in isolation
- Build Complete Models - Combine components into working systems
- Experiment and Extend - Modify architectures and explore variations
Book: Building Transformers From Scratch
Author: Jason Brownlee
Website: https://machinelearningmastery.com
Paper: "Attention Is All You Need" (Vaswani et al., 2017)
This repository is for educational purposes. The implementations are based on the book by Jason Brownlee and academic papers.
Feel free to open issues or submit pull requests if you find any bugs or have suggestions for improvements!
- Jason Brownlee for the excellent book and structured approach
- The original Transformer paper authors (Vaswani et al.)
- The open-source ML community

Happy Learning! 🚀
Building AI models one transformer layer at a time...