This Jupyter Notebook implements a basic Transformer model from scratch using PyTorch to perform English-to-French translation. It's designed as a foundational project for understanding the inner workings of transformer-based architectures.
- Tokenization using NLTK
- Custom vocabulary generation
- Manual implementation of:
  - Scaled Dot-Product Attention (sketched after this list)
  - Multi-Head Attention
  - Positional Encoding
  - Encoder and Decoder blocks
- Basic training loop
- Translation inference on toy data
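For orientation, here is a minimal sketch of the scaled dot-product attention step that the multi-head attention and encoder/decoder layers build on. The function name, tensor shapes, and optional mask argument are illustrative assumptions, not necessarily the notebook's exact code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k) -- assumed shapes
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) to keep softmax gradients stable
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are hidden from attention
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights
```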
A small toy dataset of English-French sentence pairs is included directly in the notebook:
```python
data = [
    ("I am a student", "Je suis un étudiant"),
    ("She is reading a book", "Elle lit un livre"),
    ("The sun is shining", "Le soleil brille"),
    ("He loves football", "Il aime le football"),
]
```
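To give a feel for the tokenization and vocabulary-generation features listed above, here is one way the vocabularies could be built from these pairs. The special tokens and helper name are assumptions, not necessarily what the notebook uses:

```python
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

def build_vocab(sentences, language='english'):
    # Reserve indices for padding, unknowns, and sequence boundaries (assumed convention)
    vocab = {'<pad>': 0, '<unk>': 1, '<sos>': 2, '<eos>': 3}
    for sent in sentences:
        for tok in word_tokenize(sent.lower(), language=language):
            vocab.setdefault(tok, len(vocab))
    return vocab

src_vocab = build_vocab([en for en, fr in data])
tgt_vocab = build_vocab([fr for en, fr in data], language='french')
```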
Install the required libraries before running:
```bash
pip install torch nltk tqdm
```
The notebook also downloads NLTK tokenizers automatically:
```python
nltk.download('punkt')
```
This implementation includes:
- Positional Encoding
- Multi-Head Attention from scratch
- Encoder & Decoder layers
- Final Linear + Softmax projection
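As one example of these components, a sinusoidal positional encoding module (following "Attention Is All You Need") might look like the sketch below; the class name and buffer handling are illustrative assumptions rather than the notebook's exact code:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        # Geometric progression of frequencies across the embedding dimensions
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for each position
        return x + self.pe[:, :x.size(1)]
```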
- Open the notebook in Jupyter or Colab.
- Run all cells sequentially.
- Observe model training on the toy dataset.
- Check translation predictions like:
Input: I am a student
Prediction: Je suis un étudiant
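Under the hood, a prediction like this is typically produced by a greedy decoding loop. The sketch below assumes a `model(src, tgt)` call that returns per-position logits, plus the `<sos>`/`<eos>` entries from the vocabulary sketch earlier; these interfaces are assumptions about the notebook, not guaranteed:

```python
import torch

def greedy_translate(model, src_ids, tgt_vocab, max_len=20):
    inv_tgt = {i: w for w, i in tgt_vocab.items()}
    ys = [tgt_vocab['<sos>']]  # start-of-sequence token (assumed)
    model.eval()
    with torch.no_grad():
        for _ in range(max_len):
            out = model(torch.tensor([src_ids]), torch.tensor([ys]))
            next_id = out[0, -1].argmax().item()  # most likely next token
            if next_id == tgt_vocab['<eos>']:
                break
            ys.append(next_id)
    return ' '.join(inv_tgt[i] for i in ys[1:])
```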
- Use a larger dataset (e.g., Tatoeba, Europarl)
- Implement subword tokenization (BPE, WordPiece)
- Add batching and masking (one possible mask helper is sketched after this list)
- Train on GPU for better performance
- Support more languages (e.g., English-German)
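For the batching-and-masking item, a common starting point is a padding mask for the encoder plus a combined padding-and-causal mask for the decoder, as in this sketch; the pad id and tensor shapes are assumptions:

```python
import torch

def make_masks(src, tgt, pad_id=0):
    # src, tgt: (batch, seq_len) of token ids; pad_id is an assumed convention
    src_mask = (src != pad_id).unsqueeze(1).unsqueeze(2)  # (B, 1, 1, S): hide padding
    tgt_pad = (tgt != pad_id).unsqueeze(1).unsqueeze(2)   # (B, 1, 1, T)
    seq_len = tgt.size(1)
    # Lower-triangular matrix so each position can only attend to itself and the past
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return src_mask, tgt_pad & causal  # broadcasts to (B, 1, T, T)
```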
MIT License.
Feel free to use, modify, and share!
Author: Jebin Jolly Abraham