
WoC: Hinglish transformer

Gautam edited this page Jan 26, 2023 · 5 revisions

Product Explanation

Build a Hugging Face pipeline to train and fine-tune a transformer that translates Hinglish sentences (text mixing Hindi and English words, written in Roman/Latin script) into English. A typical input is a Hindi sentence transliterated into Roman script with a few English nouns. The task involves generating synthetic Hinglish-English sentence pairs using the 'Bhashini' Hindi <-> English translation models. The transformer should be available both as a hosted API service for streaming translation and as a downloadable model for batch translation of tabular data.
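The synthetic-pair step could look roughly like the sketch below, assuming English source sentences are translated into Hindi with a Bhashini English-to-Hindi model and the Devanagari output is then transliterated into Roman script. Neither the Bhashini call nor the transliterator is specified on this page, so both are passed in as plain callables (`translate_en_to_hi` and `transliterate_to_roman` are hypothetical names). The function only does the bookkeeping: cleaning inputs, dropping empty outputs, and removing exact duplicate pairs.

```python
from typing import Callable, Iterable, List, Tuple


def build_synthetic_pairs(
    english_sentences: Iterable[str],
    translate_en_to_hi: Callable[[str], str],      # hypothetical hook for a Bhashini En->Hi model
    transliterate_to_roman: Callable[[str], str],  # hypothetical Devanagari->Roman transliterator
) -> List[Tuple[str, str]]:
    """Return (hinglish, english) pairs built from English source sentences."""
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for english in english_sentences:
        english = english.strip()
        if not english:
            continue  # skip blank source lines
        hindi = translate_en_to_hi(english)                # Hindi in Devanagari script
        hinglish = transliterate_to_roman(hindi).strip()   # same Hindi in Roman script
        if not hinglish or (hinglish, english) in seen:
            continue  # drop empty outputs and exact duplicate pairs
        seen.add((hinglish, english))
        pairs.append((hinglish, english))
    return pairs


if __name__ == "__main__":
    # Toy dictionaries standing in for the real models, for illustration only.
    en_to_hi = {"The delivery was late.": "डिलीवरी देर से आई।"}
    hi_to_roman = {"डिलीवरी देर से आई।": "delivery der se aayi."}
    print(build_synthetic_pairs(
        ["The delivery was late.", "", "The delivery was late."],
        en_to_hi.__getitem__,
        hi_to_roman.__getitem__,
    ))
```

A plain transliteration may render English loanwords in Hindi spelling, so keeping the English nouns that real Hinglish text contains would need an extra step that this sketch does not attempt.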

End Goal: Feedback data received in Hinglish should be translated into English.
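For the batch use case, a minimal sketch of translating a Hinglish feedback column in a CSV export is shown below. It uses only the standard library `csv` module; `translate_batch` is a hypothetical hook standing in for the downloadable model and is assumed to return one English string per input, in the same order. The `<column>_en` output column name is likewise an illustrative choice.

```python
import csv
import io
from typing import Callable, List


def translate_feedback_csv(
    csv_text: str,
    column: str,
    translate_batch: Callable[[List[str]], List[str]],  # hypothetical model hook
    batch_size: int = 32,
) -> str:
    """Translate one Hinglish column and append it as '<column>_en'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    out_column = f"{column}_en"
    fieldnames = list(reader.fieldnames or []) + [out_column]

    # Send the model fixed-size batches instead of the whole column at once.
    texts = [row[column] for row in rows]
    translations: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        outputs = translate_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("translate_batch must return one output per input")
        translations.extend(outputs)

    # Write the original columns plus the new English column.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, english in zip(rows, translations):
        row[out_column] = english
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "id,feedback\n1,app bahut slow hai\n"
    # Upper-casing is a placeholder for real translation.
    print(translate_feedback_csv(sample, "feedback", lambda b: [s.upper() for s in b]))
```

Fixed-size batches keep memory use bounded on large exports, and the length check catches a model wrapper that silently drops inputs.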

Learning Path

  • Create a corpus of Hinglish-English sentence pairs
  • Create a pipeline for training a transformer on the corpus
  • Create a fine-tuning pipeline for a pre-trained model
  • Create a deployment setup
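Between the corpus and training steps, the pairs have to be split and serialized. The sketch below assumes pairs are `(hinglish, english)` tuples. It removes exact duplicates before a seeded shuffle so that no pair lands in both train and test, then writes JSON Lines records in the `{"translation": {...}}` layout used in the Hugging Face translation examples (worth checking against the version in use). The `hi_en` language key is a hypothetical label, and the split fractions and seed are arbitrary defaults.

```python
import json
import random
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (hinglish, english)


def split_pairs(
    pairs: Sequence[Pair],
    valid_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 13,
) -> Dict[str, List[Pair]]:
    """Deduplicate, shuffle deterministically, and split into three sets."""
    unique = list(dict.fromkeys(pairs))  # drop exact duplicates, keep first order
    random.Random(seed).shuffle(unique)  # same seed -> same split every run
    n_test = int(len(unique) * test_frac)
    n_valid = int(len(unique) * valid_frac)
    return {
        "test": unique[:n_test],
        "validation": unique[n_test:n_test + n_valid],
        "train": unique[n_test + n_valid:],
    }


def to_jsonl(pairs: Sequence[Pair], src_key: str = "hi_en", tgt_key: str = "en") -> str:
    """Serialize pairs as one {"translation": {...}} JSON object per line."""
    lines = [
        json.dumps({"translation": {src_key: src, tgt_key: tgt}}, ensure_ascii=False)
        for src, tgt in pairs
    ]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    corpus = [(f"hinglish {i}", f"english {i}") for i in range(10)]
    splits = split_pairs(corpus)
    print({name: len(rows) for name, rows in splits.items()})
    print(to_jsonl(splits["validation"]), end="")
```

Deduplicating before the split, rather than per split, is what prevents a repeated synthetic sentence from inflating test scores.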

Issues can be raised here

Category          Rating
Difficulty        Medium
Risk/Exploratory  High
Core Development  Python, PyTorch
Skills            NLP
Mentors           Gautam Rajeev
Project size
