
Building Transformers From Scratch 🤖

This repository documents my personal journey through hands-on, from-scratch implementations of the Transformer, one of the most revolutionary architectures in modern Deep Learning and Natural Language Processing.

The concepts and structure are inspired by the book "Building Transformers From Scratch" by Jason Brownlee, available at machinelearningmastery.com, which provides a clear and progressive path to mastering this essential architecture in AI.


🧠 What's included in this repository?

Part I: Overview

  • Introduction to Attention and Transformer Models
  • Understanding Encoders and Decoders in Transformers

Part II: Building Blocks of Transformer Models

  • Tokenizers in Language Models (see the BPE sketch after this list):
    • Byte-Pair Encoding (BPE)
    • WordPiece
    • SentencePiece and Unigram
  • Word Embeddings:
    • Word2Vec implementations with Gensim and PyTorch
    • Embeddings in Transformer models
  • Positional Encodings (sinusoidal sketch after this list):
    • Sinusoidal Positional Encodings
    • Learned Positional Encodings
    • Rotary Positional Encodings (RoPE)
    • Relative Positional Encodings
    • YaRN for larger context windows
  • Attention Mechanisms (MHA sketch after this list):
    • Multi-Head Attention (MHA)
    • Grouped-Query Attention (GQA)
    • Multi-Query Attention (MQA)
    • Multi-Head Latent Attention (MLA)
    • Attention Masking
  • Normalization Techniques (RMSNorm sketch after this list):
    • Layer Normalization
    • RMS Normalization
    • Adaptive Layer Norm
  • Feed-Forward Networks (SwiGLU sketch after this list):
    • Linear layers and activation functions
    • SwiGLU and variants
  • Advanced Architectures:
    • Mixture of Experts (MoE)
    • Skip Connections
    • Pre-norm vs Post-norm architectures
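
To make the building blocks above concrete, here are a few minimal sketches. First, tokenization: a byte-pair-encoding tokenizer trained with the Hugging Face tokenizers library listed under Tools. The toy corpus, vocabulary size, and special tokens are illustrative placeholders, not values from the book.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Toy corpus; a real run would stream a large text dataset.
corpus = ["the quick brown fox", "the quick brown dog", "jumps over the lazy dog"]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# vocab_size and special_tokens are placeholder choices.
trainer = BpeTrainer(vocab_size=300, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

print(tokenizer.encode("the quick fox").tokens)
```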
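Next, positional encodings: a minimal sketch of the fixed sinusoidal encodings from "Attention Is All You Need". The function name and shapes are my own choices, and it assumes an even d_model.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sin/cos position encodings (Vaswani et al., 2017); d_model must be even."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # Frequencies decay geometrically across the even dimensions.
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even indices get sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd indices get cosine
    return pe                                     # (seq_len, d_model)
```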
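For attention, a from-scratch sketch of multi-head self-attention with an optional causal mask, covering both the MHA and attention-masking items. Class and argument names are illustrative, not the book's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Multi-head self-attention with an optional causal mask."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projections
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, causal: bool = True) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split the model dimension into heads: (B, num_heads, T, head_dim).
        q, k, v = (t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        if causal:
            # Mask future positions so each token attends only to the past.
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device),
                              diagonal=1)
            scores = scores.masked_fill(mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return self.out(out)
```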
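For normalization, a compact sketch of RMS normalization: it rescales by the root mean square of the features and skips LayerNorm's mean subtraction and bias, as in LLaMA-style models.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMS normalization: rescale by the root mean square, no mean subtraction."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)
```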
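Finally, a sketch of a SwiGLU feed-forward layer: one linear projection is gated by the SiLU of another before projecting back down. The hidden size is a free design choice left as a parameter here.

```python
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated feed-forward: down(silu(gate(x)) * up(x))."""

    def __init__(self, d_model: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, hidden, bias=False)
        self.up = nn.Linear(d_model, hidden, bias=False)
        self.down = nn.Linear(hidden, d_model, bias=False)

    def forward(self, x):
        # SiLU-gated projection, as popularized by PaLM/LLaMA-style models.
        return self.down(F.silu(self.gate(x)) * self.up(x))
```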

Part III: Building Complete Models

  • Plain Seq2Seq Model for Language Translation (LSTM-based)
  • Seq2Seq Model with Attention for Language Translation
  • Full Transformer Model for Language Translation (Encoder-Decoder)
  • Decoder-Only Transformer Model for Text Generation (block sketch below)
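
To show how the pieces combine, here is a minimal pre-norm decoder-only block reusing the MultiHeadAttention, RMSNorm, and SwiGLU sketches above. It is only one layer: a complete model would add token embeddings, positional information, a stack of these blocks, a final norm, and an output projection.

```python
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Pre-norm decoder block: normalize, transform, then add the residual."""

    def __init__(self, d_model: int, num_heads: int, ffn_hidden: int):
        super().__init__()
        self.norm1 = RMSNorm(d_model)
        self.attn = MultiHeadAttention(d_model, num_heads)
        self.norm2 = RMSNorm(d_model)
        self.ffn = SwiGLU(d_model, ffn_hidden)

    def forward(self, x):
        x = x + self.attn(self.norm1(x), causal=True)  # masked self-attention
        x = x + self.ffn(self.norm2(x))                # position-wise feed-forward
        return x
```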

💡 Why are Transformers important in Machine Learning?

Transformers have revolutionized AI by:

  • Enabling large language models like GPT, BERT, LLaMA, and Claude
  • Processing sequences in parallel (unlike RNNs)
  • Capturing long-range dependencies through self-attention
  • Scaling efficiently to billions of parameters
  • Transferring across domains: NLP, Computer Vision, Audio, and more

Understanding Transformers from scratch is essential for:

  • Building custom AI models
  • Fine-tuning pre-trained models
  • Understanding state-of-the-art architectures
  • Optimizing inference and training performance

🛠️ Tools & Libraries

  • PyTorch - Deep Learning framework
  • tokenizers - Hugging Face tokenizers library
  • torch.nn.functional - Neural network operations
  • requests - Data downloading
  • tqdm - Progress bars
  • matplotlib - Visualization

📂 Repository Structure

Each notebook is self-contained and can be run independently. Start with the overview chapters to understand the fundamentals, then explore the building blocks before diving into complete model implementations.


📊 Key Features

  • From-scratch implementations - No black boxes, understand every component
  • Progressive learning path - Build from basics to complex architectures
  • Modern techniques - RoPE, GQA, MoE, RMSNorm, and more
  • Complete working models - Translation and text generation systems
  • Well-commented code - Clear explanations in Spanish and English
  • Production-ready patterns - Best practices for model design


🎯 Learning Path

  1. Start with the Overview - Understand what Transformers are and why they work
  2. Master the Building Blocks - Learn each component in isolation
  3. Build Complete Models - Combine components into working systems
  4. Experiment and Extend - Modify architectures and explore variations

🌐 References

Book: Building Transformers From Scratch
Author: Jason Brownlee
Website: https://machinelearningmastery.com
Paper: "Attention Is All You Need" (Vaswani et al., 2017)


📝 License

This repository is for educational purposes. The implementations are based on the book by Jason Brownlee and academic papers.


🤝 Contributing

Feel free to open issues or submit pull requests if you find any bugs or have suggestions for improvements!


🌟 Acknowledgments

  • Jason Brownlee for the excellent book and structured approach
  • The original Transformer paper authors (Vaswani et al.)
  • The open-source ML community

Happy Learning! 🚀

Building AI models one transformer layer at a time...
