This Jupyter Notebook implements a basic Transformer model from scratch using PyTorch to perform English-to-French translation. It's designed as a foundational project for understanding the inner workings of transformer-based architectures.
- Tokenization using NLTK
- Custom vocabulary generation
- Manual implementation of:
  - Scaled Dot-Product Attention (sketched after this list)
  - Multi-Head Attention
  - Positional Encoding
  - Encoder and Decoder blocks
- Basic training loop
- Translation inference on toy data
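For orientation, here is a minimal sketch of the scaled dot-product attention step that the multi-head attention and encoder/decoder layers build on. The function name, tensor shapes, and optional mask argument are illustrative assumptions, not necessarily the notebook's exact code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k) -- assumed shapes
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) to keep softmax gradients stable
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are hidden from attention
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights
```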
A small toy dataset of English-French sentence pairs is included directly in the notebook:
```python
data = [
    ("I am a student", "Je suis un étudiant"),
    ("She is reading a book", "Elle lit un livre"),
    ("The sun is shining", "Le soleil brille"),
    ("He loves football", "Il aime le football"),
]
```
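To give a feel for the tokenization and vocabulary-generation features listed above, here is one way the vocabularies could be built from these pairs. The special tokens and helper name are assumptions, not necessarily what the notebook uses:

```python
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

def build_vocab(sentences, language='english'):
    # Reserve indices for padding, unknowns, and sequence boundaries (assumed convention)
    vocab = {'<pad>': 0, '<unk>': 1, '<sos>': 2, '<eos>': 3}
    for sent in sentences:
        for tok in word_tokenize(sent.lower(), language=language):
            vocab.setdefault(tok, len(vocab))
    return vocab

src_vocab = build_vocab([en for en, fr in data])
tgt_vocab = build_vocab([fr for en, fr in data], language='french')
```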
Install the required libraries before running:
```bash
pip install torch nltk tqdm
```
The notebook also downloads NLTK tokenizers automatically:
```python
nltk.download('punkt')
```
This implementation includes:
- Positional Encoding
- Multi-Head Attention from scratch
- Encoder & Decoder layers
- Final Linear + Softmax projection
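As one example of these components, a sinusoidal positional encoding module (following "Attention Is All You Need") might look like the sketch below; the class name and buffer handling are illustrative assumptions rather than the notebook's exact code:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        # Geometric progression of frequencies across the embedding dimensions
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for each position
        return x + self.pe[:, :x.size(1)]
```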
- Open the notebook in Jupyter or Colab.
- Run all cells sequentially.
- Observe model training on the toy dataset.
- Check translation predictions like:
Input: I am a student
Prediction: Je suis un étudiant
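Under the hood, a prediction like this is typically produced by a greedy decoding loop. The sketch below assumes a `model(src, tgt)` call that returns per-position logits, plus the `<sos>`/`<eos>` entries from the vocabulary sketch earlier; these interfaces are assumptions about the notebook, not guaranteed:

```python
import torch

def greedy_translate(model, src_ids, tgt_vocab, max_len=20):
    inv_tgt = {i: w for w, i in tgt_vocab.items()}
    ys = [tgt_vocab['<sos>']]  # start-of-sequence token (assumed)
    model.eval()
    with torch.no_grad():
        for _ in range(max_len):
            out = model(torch.tensor([src_ids]), torch.tensor([ys]))
            next_id = out[0, -1].argmax().item()  # most likely next token
            if next_id == tgt_vocab['<eos>']:
                break
            ys.append(next_id)
    return ' '.join(inv_tgt[i] for i in ys[1:])
```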
- Use a larger dataset (e.g., Tatoeba, Europarl)
- Implement subword tokenization (BPE, WordPiece)
- Add batching and masking (one possible mask helper is sketched after this list)
- Train on GPU for better performance
- Support more languages (e.g., English-German)
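For the batching-and-masking item, a common starting point is a padding mask for the encoder plus a combined padding-and-causal mask for the decoder, as in this sketch; the pad id and tensor shapes are assumptions:

```python
import torch

def make_masks(src, tgt, pad_id=0):
    # src, tgt: (batch, seq_len) of token ids; pad_id is an assumed convention
    src_mask = (src != pad_id).unsqueeze(1).unsqueeze(2)  # (B, 1, 1, S): hide padding
    tgt_pad = (tgt != pad_id).unsqueeze(1).unsqueeze(2)   # (B, 1, 1, T)
    seq_len = tgt.size(1)
    # Lower-triangular matrix so each position can only attend to itself and the past
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return src_mask, tgt_pad & causal  # broadcasts to (B, 1, T, T)
```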
MIT License.
Feel free to use, modify, and share!
Author: Jebin Jolly Abraham