# Project: PolyGlot AI - Multilingual Translation Service
**Student Name:** Shivansh Jindal
**Roll Number:** iitpr_ai_25010104
**Module:** E - AI Applications

---

## 1. Problem Definition & Objective
### a. Selected Project Track
**AI Applications – Open Project** (Language Translation Service)

### b. Problem Statement
In a linguistically diverse world, effective communication is often hindered by language barriers. While major global languages (English, French, Russian) have robust digital support, regional Indian dialects—specifically **Haryanvi**—lack accessible, high-quality translation tools. This creates a digital divide for speakers of these dialects.

### c. Real-world Relevance
* **Education:** Helps students from rural Haryana access global content in their native dialect.
* **Tourism:** Helps travelers in North India communicate with locals who may not speak Hindi or English fluently.
* **Cultural Preservation:** Documenting and digitizing Haryanvi vocabulary and usage helps preserve the dialect in the AI age.

## 2. Data Understanding & Preparation
### a. Dataset Source
Since this is a web-based translation service, we utilize a **Hybrid Data Approach**:
1.  **Public API Data:** For major languages (English, French, Russian, Hindi, Punjabi), we rely on Google Translate's neural machine translation, accessed through the `deep-translator` library.
2.  **Synthetic Rule-Base:** For **Haryanvi**, due to the lack of a public API, we utilize a rule-based mapping system derived from Hindi linguistic patterns.

### b. Data Loading & Exploration
In this project, "Data Loading" consists of fetching dynamic text strings from the API. We process text in standard **UTF-8** format to ensure support for Devanagari (Hindi/Haryanvi) and Cyrillic (Russian) scripts.
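
As a small illustration of why a single Unicode encoding matters here, the sketch below encodes a Devanagari and a Cyrillic string as UTF-8 and decodes them back. The sample strings and the helper name `utf8_roundtrip` are illustrative assumptions, not part of the project code.

```python
# Illustrative sample strings (assumptions, not project data).
samples = {
    "Devanagari": "नमस्ते",
    "Cyrillic": "Привет",
}

def utf8_roundtrip(text: str) -> tuple[int, int, bool]:
    """Return (code points, UTF-8 byte count, whether decoding restores the text)."""
    encoded = text.encode("utf-8")
    return len(text), len(encoded), encoded.decode("utf-8") == text

for script, text in samples.items():
    chars, nbytes, ok = utf8_roundtrip(text)
    print(f"{script}: {chars} code points, {nbytes} bytes, round-trip ok: {ok}")
```

Devanagari code points take three bytes each in UTF-8 and Cyrillic ones take two, so byte length and character count differ; any length limit should be checked against the unit the API actually counts.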

### c. Preprocessing
* **Input Normalization:** Trimming whitespace and handling punctuation.
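* **Language Mapping:** Mapping the user's selection to a language code the API accepts before the API call. Haryanvi (ISO 639-3 code 'bgc') is not handled by the translation step, so it is mapped to Hindi ('hi') and converted to Haryanvi in post-processing.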
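
The two preprocessing steps could be sketched as follows. This is a minimal sketch under assumptions: the helper names `normalize_input` and `to_api_code` and the `API_CODE` table are hypothetical, and punctuation handling is left out because the report does not specify it. Unicode NFC normalization is an added suggestion, included because some Devanagari letters (for example ड़) can be encoded in more than one way, which would otherwise make dictionary lookups miss.

```python
import unicodedata

# Hypothetical table: user-facing codes that must be rerouted before the API call.
# 'bgc' (Haryanvi) goes through Hindi, as described above.
API_CODE = {"bgc": "hi"}

def normalize_input(text: str) -> str:
    """Apply NFC normalization, collapse whitespace runs, and trim both ends."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())

def to_api_code(lang: str) -> str:
    """Return the code to send to the API; unmapped codes pass through unchanged."""
    return API_CODE.get(lang, lang)

print(repr(normalize_input("  Hello,   how are you?  ")))
print(to_api_code("bgc"), to_api_code("fr"))
```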

```python
# Install necessary libraries
!pip install deep-translator
```

## 3. Model / System Design
### a. AI Technique
We use a **Hybrid AI Pipeline**:
* **Neural Machine Translation (NMT):** Used for standard languages (En, Fr, Ru, Hi, Pa). An NMT model estimates the probability of a target-language word sequence given the source sentence and outputs a high-probability translation.
* **Rule-Based Logic (Heuristic AI):** Used for Haryanvi. Since Haryanvi is a dialect of Hindi, we translate to Hindi first, then apply morphological transformations (e.g., changing auxiliary verbs like 'hai' to 'sai').

### b. Architecture Pipeline
1.  **Input:** User text + Source Lang + Target Lang.
2.  **Preprocessor:** Detects whether Haryanvi is the source or target language and maps it to Hindi for the API call.
3.  **Core Translation Engine:** Calls Google Translate (through `deep-translator`) for the base translation.
4.  **Post-Processor:** If Target == Haryanvi, apply lexical substitution rules.
5.  **Output:** Final translated string.
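
The five stages above can be sketched as a single function with the translation backend passed in as a parameter, so the pipeline can be exercised without network access. This is an illustrative sketch, not the project's implementation: `run_pipeline`, `Backend`, `RULES`, and `fake_backend` are hypothetical names, and `RULES` holds only one of the substitution rules.

```python
from typing import Callable

# A backend takes (text, source_code, target_code) and returns translated text.
Backend = Callable[[str, str, str], str]

# Illustrative subset of the Hindi -> Haryanvi substitution rules.
RULES = {"है": "सै"}

def run_pipeline(text: str, source: str, target: str, backend: Backend,
                 rules: dict = RULES) -> str:
    # Stages 1-2: clean the input and route Haryanvi ('bgc') through Hindi ('hi').
    cleaned = text.strip()
    api_source = "hi" if source == "bgc" else source
    api_target = "hi" if target == "bgc" else target

    # Stage 3: base translation, skipped when both sides map to the same language.
    if api_source == api_target:
        translated = cleaned
    else:
        translated = backend(cleaned, api_source, api_target)

    # Stage 4: lexical substitution, only when the requested target is Haryanvi.
    if target == "bgc":
        for hindi, haryanvi in rules.items():
            translated = translated.replace(hindi, haryanvi)

    # Stage 5: final string.
    return translated

def fake_backend(text: str, source: str, target: str) -> str:
    """Stand-in for the real API: always returns a fixed Hindi sentence."""
    return "यह किताब है"

print(run_pipeline("This is a book", "en", "bgc", fake_backend))
```

In the actual system the backend would wrap `GoogleTranslator(source=..., target=...).translate(text)`; injecting it keeps the rule-based stage testable on its own.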

### c. Justification
Building a custom Transformer model for Haryanvi would require a large parallel corpus, which is not publicly available. A hybrid API + rule-based approach is therefore a pragmatic choice for a functional prototype.

```python
from deep_translator import GoogleTranslator

# --- Core Translation Logic ---
def polyglot_translate(text, source='auto', target='en'):
    """
    Translates text between English, Hindi, Punjabi, French, Russian, and Haryanvi.
    Returns the translated string, or an "Error: ..." string if translation fails.
    """

    # 1. Handle Haryanvi Logic (Map to Hindi for API)
    api_source = 'hi' if source == 'bgc' else source
    api_target = 'hi' if target == 'bgc' else target

    try:
        # 2. Call the NMT API. When both sides map to the same language
        #    (e.g. Hindi -> Haryanvi), only the rules below are needed.
        if api_source == api_target:
            translated_text = text
        else:
            translator = GoogleTranslator(source=api_source, target=api_target)
            translated_text = translator.translate(text)

        # 3. Post-Processing for Haryanvi (Rule-Based)
        # Replacing common Hindi markers with Haryanvi dialect markers.
        # Matching is by plain substring, so the rules are deliberately coarse.
        if target == 'bgc':
            substitutions = {
                'है': 'सै',       # is
                'मैं': 'मन्नै',     # I
                'मेरा': 'म्हारे',    # my
                'क्या': 'के',      # what
                'नहीं': 'ना',      # no
                'लड़का': 'छोरा',    # boy
                'लड़की': 'छोरी',    # girl
            }
            for hindi_word, haryanvi_word in substitutions.items():
                translated_text = translated_text.replace(hindi_word, haryanvi_word)

        return translated_text

    except Exception as e:
        return f"Error: {e}"

# --- Quick check that the cell loaded ---
print("System Ready. Testing Modules...")
```

## 5. Evaluation & Analysis
### a. Sample Outputs & Predictions
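We test the system on three language pairs: English to French, English to Punjabi, and Hindi to Haryanvi. All of these languages belong to the Indo-European family, but the targets cover a Romance language, an Indo-Aryan language supported by the API, and an Indo-Aryan dialect handled by the rule-based step.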

```python
# Test Case 1: English to French (Romance target)
text1 = "Hello, how are you?"
res1 = polyglot_translate(text1, 'en', 'fr')
print(f"En -> Fr: {res1}")

# Test Case 2: English to Punjabi (Indo-Aryan target)
text2 = "Where are you going?"
res2 = polyglot_translate(text2, 'en', 'pa')
print(f"En -> Pa: {res2}")

# Test Case 3: Hindi to Haryanvi (Dialect Adaptation)
text3 = "मेरा नाम शिवंश है और मैं लड़का हूँ"
# Expectation: 'mera' -> 'mhare', 'hai' -> 'sai', 'main' -> 'mannai', 'ladka' -> 'chora'
res3 = polyglot_translate(text3, 'hi', 'bgc')
print(f"Hi -> Haryanvi: {res3}")
```

### c. Performance Analysis
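* **Accuracy:** Expected to be high for API-supported languages (Fr, Ru, Hi, Pa), since these rely directly on Google Translate; no quantitative metric was computed in this prototype.
* **Haryanvi Limitations:** The rule-based approach handles basic sentences well but fails on complex grammar or context-dependent words. Because the rules match substrings, they can also alter longer words that merely contain a listed form. It is a "lexical" translation rather than a "semantic" one.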
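
One possible refinement of the lexical step, sketched here as an assumption rather than as part of the submitted system, is to substitute whole tokens instead of substrings. A plain substring replacement of 'है' also rewrites the first syllable of 'हैरान' ("surprised"); splitting on spaces and comparing each token, minus attached punctuation, avoids that. The names `substitute_tokens`, `RULES`, and `PUNCT` are hypothetical, and `RULES` holds only two of the rules.

```python
import unicodedata

# Illustrative subset of the Hindi -> Haryanvi table.
RULES = {"है": "सै", "नहीं": "ना"}

# Punctuation that may be attached to a word; \u0964 is the Devanagari danda.
PUNCT = "\u0964,.?!\"'"

def _nfc(s: str) -> str:
    """Normalize to NFC so equivalent Devanagari encodings compare equal."""
    return unicodedata.normalize("NFC", s)

NORMALIZED_RULES = {_nfc(k): _nfc(v) for k, v in RULES.items()}

def substitute_tokens(text: str, rules: dict = NORMALIZED_RULES) -> str:
    """Replace only tokens that match a rule exactly, keeping attached punctuation."""
    out = []
    for token in _nfc(text).split(" "):
        core = token.strip(PUNCT)          # the word without surrounding punctuation
        if core in rules:
            token = token.replace(core, rules[core], 1)
        out.append(token)
    return " ".join(out)

sentence = "वह हैरान है।"
print("substring:", sentence.replace("है", "सै"))
print("token:    ", substitute_tokens(sentence))
```

Whitespace tokenization is used instead of a regex word boundary because Python's `\b` treats Devanagari vowel signs as non-word characters, which would split words in the wrong places. Token matching still cannot resolve context-dependent words, so the limitation noted above remains.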

## 6. Ethical Considerations & Responsible AI
### a. Bias and Fairness
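NMT models often reflect training data biases. For instance, Hindi verbs carry gender agreement, so when an English sentence such as "I am going" gives no indication of the speaker's gender, the Hindi translation often defaults to the masculine form.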
### b. Responsible Use
This tool is a prototype. Users are advised not to use it for critical medical or legal translations, where dialect nuances could change the meaning of a statement.

## 7. Conclusion & Future Scope
### a. Summary
We built a browser-based translation prototype covering six languages (English, Hindi, Punjabi, French, Russian, and Haryanvi). The system is a first step toward bridging the gap between high-resource languages (English) and low-resource dialects (Haryanvi).

### b. Future Improvements
1.  **Fine-tuning LLMs:** Fine-tune an open model such as Llama 3 or Mistral on Haryanvi text for better grammar support.
2.  **Voice Support:** Add Speech-to-Text (STT) and Text-to-Speech (TTS) for accessibility.