JebinAbraham/Transformer-based-Language-Transilation

Transformer-based Language Translation

This Jupyter Notebook implements a basic Transformer model from scratch in PyTorch to perform English-to-French translation. It's designed as a foundational project for understanding the inner workings of transformer-based architectures.


📚 Features

  • Tokenization using NLTK
  • Custom vocabulary generation
  • Manual implementation of:
    • Scaled Dot-Product Attention
    • Multi-Head Attention
    • Positional Encoding
    • Encoder and Decoder blocks
  • Basic training loop
  • Translation inference on toy data
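As a rough illustration of the first item in the manual-implementation list, scaled dot-product attention can be sketched as below. This is a minimal version under stated assumptions, not the notebook's own code: the function name, argument names, and the boolean-mask convention (`True` means "may attend") are all chosen here for illustration.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Return softmax(q @ k^T / sqrt(d_k)) @ v and the attention weights.

    q, k, v: tensors of shape (..., seq_len, d_k).
    mask: optional boolean tensor broadcastable to the score shape;
          positions marked False are hidden (assumed convention).
    """
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep softmax well-behaved.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

torch.manual_seed(0)
x = torch.randn(1, 4, 8)  # (batch, seq_len, d_k), random stand-in for embeddings
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)
```

Multi-head attention applies this same function in parallel to several lower-dimensional projections of `q`, `k`, and `v`, then concatenates the results.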

📦 Dataset

A small toy dataset of English-French sentence pairs is included directly in the notebook:

data = [
    ("I am a student", "Je suis un étudiant"),
    ("She is reading a book", "Elle lit un livre"),
    ("The sun is shining", "Le soleil brille"),
    ("He loves football", "Il aime le football"),
]
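To show how a custom vocabulary might be built from these pairs, here is a small sketch. It makes several assumptions that are not taken from the notebook: the special-token names, the lowercasing step, and the helper names `tokenize`, `build_vocab`, and `encode` are hypothetical, and plain whitespace splitting stands in for the NLTK tokenizer so the example runs without extra downloads.

```python
data = [
    ("I am a student", "Je suis un étudiant"),
    ("She is reading a book", "Elle lit un livre"),
    ("The sun is shining", "Le soleil brille"),
    ("He loves football", "Il aime le football"),
]

# Hypothetical special tokens; the notebook may use different names or ids.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]

def tokenize(sentence):
    # Stand-in for NLTK tokenization: lowercase and split on whitespace.
    return sentence.lower().split()

def build_vocab(sentences):
    # Special tokens get the first ids; words are numbered in order of first appearance.
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for sentence in sentences:
        for tok in tokenize(sentence):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def encode(sentence, vocab):
    # Map tokens to ids, falling back to <unk>, and wrap with <sos>/<eos>.
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(sentence)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]

src_vocab = build_vocab(en for en, _ in data)
tgt_vocab = build_vocab(fr for _, fr in data)

print(len(src_vocab), len(tgt_vocab))
print(encode("I am a student", src_vocab))
```

Keeping separate source and target vocabularies means each embedding table only needs to cover the words of its own language.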

🚀 Requirements

Install the required libraries before running:

pip install torch nltk tqdm

The notebook also downloads the NLTK Punkt tokenizer data automatically:

nltk.download('punkt')

🧠 Model Overview

This implementation includes:

  • Positional Encoding
  • Multi-Head Attention from scratch
  • Encoder & Decoder layers
  • Final Linear + Softmax projection
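For the positional encoding component, one common choice is the fixed sinusoidal scheme from the original Transformer paper. The sketch below assumes that scheme and an even `d_model`; the function name and arguments are illustrative, and the notebook may implement it differently (for example as an `nn.Module`).

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) table of fixed sinusoidal position encodings.

    Assumes d_model is even: even columns hold sines, odd columns hold cosines.
    """
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    # Frequencies 1 / 10000^(2i / d_model), computed in log space for stability.
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pe = sinusoidal_positional_encoding(max_len=10, d_model=16)
print(pe.shape)
```

The table is typically added to the token embeddings before the first encoder or decoder layer, so that the otherwise order-agnostic attention layers can tell positions apart.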

๐Ÿƒโ€โ™‚๏ธ How to Run

  1. Open the notebook in Jupyter or Colab.
  2. Run all cells sequentially.
  3. Observe model training on the toy dataset.
  4. Check translation predictions like:
Input: I am a student
Prediction: Je suis un étudiant
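Predictions like the one above are usually produced token by token. The following is a hedged sketch of greedy decoding, not the notebook's inference code: `greedy_decode`, the `next_token_logits` callback, and the token ids are hypothetical, and a canned scorer stands in for a trained model so the loop can run on its own.

```python
import torch

def greedy_decode(next_token_logits, sos_id, eos_id, max_len=20):
    """Repeatedly pick the highest-scoring next token until <eos> or max_len.

    next_token_logits: callable taking the 1-D tensor of tokens generated so far
    and returning a 1-D tensor of vocabulary scores (hypothetical interface).
    """
    tokens = [sos_id]
    for _ in range(max_len):
        logits = next_token_logits(torch.tensor(tokens))
        next_id = int(torch.argmax(logits))
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the leading <sos>

# Stand-in for a trained model: always favours a fixed sequence of ids ending in <eos>=2.
canned = [4, 5, 6, 7, 2]

def fake_scorer(prefix):
    logits = torch.zeros(10)
    logits[canned[len(prefix) - 1]] = 1.0
    return logits

result = greedy_decode(fake_scorer, sos_id=1, eos_id=2)
print(result)
```

With a real model, the callback would run the decoder over the prefix (plus the encoded source sentence) and return the scores for the last position; the resulting ids are then mapped back to words through the target vocabulary.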

🔮 Improvements & Ideas

  • Use a larger dataset (e.g., Tatoeba, Europarl)
  • Implement subword tokenization (BPE, WordPiece)
  • Add batching and masking
  • Train on a GPU for faster training
  • Support more languages (e.g., English-German)
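The batching-and-masking idea from the list above could look roughly like this. It is a sketch under assumptions: padding id `0`, the helper names `padding_mask` and `causal_mask`, and the `True` means "may attend" convention are illustrative choices, not something the notebook defines.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # assumed padding id

def padding_mask(token_ids, pad_id=PAD_ID):
    # (batch, seq_len) -> (batch, 1, 1, seq_len); True where a real token is present.
    return (token_ids != pad_id).unsqueeze(1).unsqueeze(2)

def causal_mask(size):
    # Lower-triangular (size, size) mask: position i may attend to positions <= i.
    return torch.tril(torch.ones(size, size, dtype=torch.bool))

# Two encoded sentences of different lengths, padded into one batch tensor.
sequences = [torch.tensor([1, 4, 5, 2]), torch.tensor([1, 6, 2])]
batch = pad_sequence(sequences, batch_first=True, padding_value=PAD_ID)

pad = padding_mask(batch)            # hides padded positions
causal = causal_mask(batch.size(1))  # hides future positions in the decoder
combined = pad & causal              # broadcasts to (batch, 1, seq_len, seq_len)

print(batch)
print(combined.shape)
```

The encoder would only need the padding mask, while decoder self-attention would use the combined mask so that a position can see neither padding nor later tokens.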

📜 License

MIT License.
Feel free to use, modify, and share!


✍️ Author: Jebin Jolly Abraham

About

Implementation note on Language Translation Transformer Model
