# Intent-Aware Machine Translation Using LLMs

This project focuses on translating non-English sentences into English while preserving the **original intent, emotion, and confidence level**.  
It leverages **OpenAI GPT models** for both **intent extraction** and **candidate translation generation**, followed by an **intent-aware selection pipeline**.


## 1. Problem Definition & Objective

### Selected Project Track
- NLP / LLM-based Translation

### Problem Statement
Traditional machine translation often fails to preserve the speaker's **intent, emotion, and confidence**.  
A subtle shift in tone can change what a sentence means: "ठीक है, देखेंगे" can come out as the noncommittal "Alright, we'll see." or as the firmer "Sure, we will check it out." This matters in **customer support, chatbots, and cross-cultural communication**.

### Objective
- Generate multiple English translations of a sentence.
- Extract the source intent (speech act, emotion, confidence).
- Select the translation **closest to the original intent**.

### Real-World Motivation
- Customer support automation.
- Cross-lingual sentiment-sensitive translation.
- Human-like chatbot responses.


In [46]:
# Load environment variables (such as the OpenAI API key) from a local .env file
from dotenv import load_dotenv
import os
import json

load_dotenv()

# Import project modules
from intent_extractor import extract_intent
from candidate_generator import generate_candidates
from evaluator import select_best


## 2. System Architecture

The pipeline consists of:

1. **Input Sentence** (non-English text)
2. **Intent Extraction** — Extract speech act, emotion, confidence using GPT.
3. **Candidate Generation** — Generate 5 distinct English translations.
4. **Intent-Aware Selection** — Score each candidate based on **intent similarity**.
5. **Output** — Return the best translation and its confidence.
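
Step 4 can be sketched as a weighted field match between the source intent and each candidate's extracted intent. This is only an illustration: the real `select_best` lives in `evaluator` and is not shown in this notebook. The names `WEIGHTS`, `intent_similarity`, and `pick_best` are hypothetical, and the weights are an assumption, chosen because they happen to reproduce the 0.5 score in the sample run further down (which may be a coincidence).

```python
# Hypothetical field weights; the real evaluator's weights are not shown here.
WEIGHTS = {"speech_act": 0.5, "emotion": 0.3, "confidence_level": 0.2}


def intent_similarity(source, candidate, weights=WEIGHTS):
    """Sum the weights of the fields on which both intents agree exactly."""
    return sum(
        weight
        for field, weight in weights.items()
        if source.get(field) == candidate.get(field)
    )


def pick_best(candidates, candidate_intents, source_intent):
    """Return (translation, score) for the highest-scoring candidate.

    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    scored = [
        (text, intent_similarity(source_intent, intent))
        for text, intent in zip(candidates, candidate_intents)
    ]
    return max(scored, key=lambda pair: pair[1])


# Intents copied from the debug output of the sample run in this notebook.
source = {"speech_act": "agreement", "emotion": "neutral", "confidence_level": "medium"}
candidates = [
    "Alright, we'll see.",
    "Okay, let's see.",
    "Fine, we'll take a look.",
    "Sure, we will check it out.",
    "Alright, we'll figure it out.",
]
intents = [
    {"speech_act": "promise", "emotion": "neutral", "confidence_level": "medium"},
    {"speech_act": "request", "emotion": "neutral", "confidence_level": "medium"},
    {"speech_act": "commitment", "emotion": "neutral", "confidence_level": "high"},
    {"speech_act": "commitment", "emotion": "neutral", "confidence_level": "high"},
    {"speech_act": "promise", "emotion": "determination", "confidence_level": "high"},
]

best, score = pick_best(candidates, intents, source)
print(best, round(score, 2))  # Alright, we'll see. 0.5
```

Under these assumed weights no candidate matches the source speech act, so the top score comes only from the emotion and confidence-level fields.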

### ASCII Diagram



      +----------------+
      | Input Sentence |
      +----------------+
                |
                v
      +----------------+
      | Extract Intent |
      +----------------+
                |
                v
      +----------------+
      | Generate 5     |
      | Candidates     |
      +----------------+
                |
                v
      +----------------+
      | Intent Scoring |
      +----------------+
                |
                v
     +------------------+
     | Best Translation |
     +------------------+


In [47]:
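def normalize_intent(intent):
    """Return a copy of intent with missing or empty fields set to "unknown"."""
    fields = ("speech_act", "emotion", "confidence_level")
    normalized = dict(intent or {})  # copy so the caller's dict is not mutated
    for k in fields:
        if normalized.get(k) in (None, ""):
            normalized[k] = "unknown"
    return normalized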

def intent_aware_translate(text):
    # 1. Extract source intent
    src_intent = extract_intent(text)
    src_intent = normalize_intent(src_intent)
    
    # 2. Generate translation candidates
    candidates = generate_candidates(text)
    
    # 3. Select the best candidate based on intent similarity
    final_translation, confidence = select_best(candidates, src_intent)
    
    return {
        "input": text,
        "detected_intent": src_intent,
        "final_translation": final_translation,
        "confidence": round(confidence, 2),
        "candidates": candidates
    }
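
`extract_intent` and `generate_candidates` come from project modules that are not shown in this notebook. Judging only by the debug output of the runs below, the model replies with a JSON object for intent and a numbered list for candidates. A minimal sketch of parsing those two shapes might look like this; `parse_intent_response` and `parse_candidates` are hypothetical names, and the sketch makes no API calls.

```python
import json
import re

INTENT_FIELDS = ("speech_act", "emotion", "confidence_level")


def parse_intent_response(raw):
    """Extract a JSON object from a model reply; use "unknown" for anything missing."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    try:
        data = json.loads(match.group(0)) if match else {}
    except json.JSONDecodeError:
        data = {}
    return {f: str(data.get(f) or "unknown").strip().lower() for f in INTENT_FIELDS}


def parse_candidates(raw, limit=5):
    """Split a numbered list ("1. text") into a list of translation strings."""
    items = []
    for line in raw.splitlines():
        m = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if m:
            items.append(m.group(1).strip())
    return items[:limit]


# Reply shapes copied from the debug output in this notebook.
raw_intent = """{
  "speech_act": "agreement",
  "emotion": "neutral",
  "confidence_level": "medium"
}"""
raw_candidates = """ 1. Alright, we'll see.
2. Okay, let's see.
3. Fine, we'll take a look.
4. Sure, we will check it out.
5. Alright, we'll figure it out."""

print(parse_intent_response(raw_intent))
print(parse_candidates(raw_candidates))
```

A reply with no complete JSON object falls back to `"unknown"` for every field, which lines up with what `normalize_intent` expects.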


In [48]:
# Example non-English sentence
sentence = "ठीक है, देखेंगे"

result = intent_aware_translate(sentence)
result



[DEBUG] GPT Intent Response:
 {
  "speech_act": "agreement",
  "emotion": "neutral",
  "confidence_level": "medium"
}

[DEBUG] GPT Candidate Response:
 1. Alright, we'll see.
2. Okay, let's see.
3. Fine, we'll take a look.
4. Sure, we will check it out.
5. Alright, we'll figure it out.

[DEBUG] GPT Intent Response:
 {
  "speech_act": "promise",
  "emotion": "neutral",
  "confidence_level": "medium"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "request",
  "emotion": "neutral",
  "confidence_level": "medium"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "commitment",
  "emotion": "neutral",
  "confidence_level": "high"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "commitment",
  "emotion": "neutral",
  "confidence_level": "high"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "promise",
  "emotion": "determination",
  "confidence_level": "high"
}


{'input': 'ठीक है, देखेंगे',
 'detected_intent': {'speech_act': 'agreement',
  'emotion': 'neutral',
  'confidence_level': 'medium'},
 'final_translation': "Alright, we'll see.",
 'confidence': 0.5,
 'candidates': ["Alright, we'll see.",
  "Okay, let's see.",
  "Fine, we'll take a look.",
  'Sure, we will check it out.',
  "Alright, we'll figure it out."]}

In [49]:
test_sentences = [
    "मुझे बहुत गुस्सा आ रहा है",
    "क्या आप मेरी मदद कर सकते हैं?",
    "यह सही है, धन्यवाद"
]

results = [intent_aware_translate(s) for s in test_sentences]

for r in results:
    print("\nInput:", r["input"])
    print("Detected Intent:", r["detected_intent"])
    print("Candidates:", r["candidates"])
    print("Best Translation:", r["final_translation"])
    print("Confidence:", r["confidence"])
    print("-"*60)



[DEBUG] GPT Intent Response:
 {
  "speech_act": "expression",
  "emotion": "anger",
  "confidence_level": "high"
}

[DEBUG] GPT Candidate Response:
 1. I am really angry.
2. I'm feeling very frustrated.
3. I'm getting really upset.
4. I'm quite furious right now.
5. I'm extremely irritated.

[DEBUG] GPT Intent Response:
 {
  "speech_act": "expression",
  "emotion": "anger",
  "confidence_level": "high"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "expressing emotion",
  "emotion": "frustration",
  "confidence_level": "high"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "expressing frustration",
  "emotion": "upset",
  "confidence_level": "high"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "expressing emotion",
  "emotion": "anger",
  "confidence_level": "high"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "expression",
  "emotion": "irritation",
  "confidence_level": "high"
}

[DEBUG] GPT Intent Response:
 {
  "speech_act": "request",
  "emotion": "hopeful",
  "c

## 3. Ethical Considerations

1. **Bias & Fairness**
   - GPT may misclassify intent or emotion for minority languages.
   - Translations may reflect cultural or gender biases.

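2. **Dataset Limitations**
   - No labeled dataset was used; intent labels and translations are entirely LLM-generated, so there is no ground truth to score against.
   - Output quality depends on the GPT model's training data.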

3. **Responsible AI**
   - Sensitive content must be reviewed by humans.
   - Avoid using this system for critical legal/medical translations.


## 4. Conclusion & Future Scope

### Summary
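- The pipeline generates five candidate translations per sentence and returns the one whose extracted speech act, emotion, and confidence level best match the source.
- Intent-aware scoring is meant to improve on taking a single direct translation; this notebook does not yet measure that improvement against a baseline.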
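
One caveat is visible in the debug output for the anger example above: the model returns free-form speech-act labels such as "expression", "expressing emotion", and "expressing frustration" for near-identical sentences. If the evaluator compares labels as exact strings (an assumption, since its code is not shown), these would count as mismatches. A possible mitigation is to map labels onto a small canonical set before scoring. The table and the function name below are hypothetical; the category names follow the usual speech-act taxonomy.

```python
# Hypothetical synonym table; extend it as new labels show up in the logs.
CANONICAL_SPEECH_ACTS = {
    "expression": "expressive",
    "expressing emotion": "expressive",
    "expressing frustration": "expressive",
    "promise": "commissive",
    "commitment": "commissive",
    "request": "directive",
}


def canonical_speech_act(label):
    """Lowercase, collapse whitespace, then map known synonyms to one label."""
    key = " ".join(str(label).lower().split())
    return CANONICAL_SPEECH_ACTS.get(key, key)  # unknown labels pass through


# Labels returned for the five candidates of the anger example.
labels = [
    "expression",
    "expressing emotion",
    "expressing frustration",
    "expressing emotion",
    "expression",
]
print({canonical_speech_act(label) for label in labels})  # {'expressive'}
```

Merging labels this way trades precision for recall: "promise" and "commitment" become indistinguishable, which may or may not be acceptable for a given use case.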

### Future Improvements
- Use a **larger, labeled dataset** to fine-tune the intent classifier.
- Extend to **multilingual translation** beyond Hindi → English.
- Integrate **human-in-the-loop feedback** for higher reliability.
- Add **GUI or API** for real-world deployment.
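
As a starting point for the human-in-the-loop item, results could be routed to a reviewer whenever the intent-match score is low or the detected intent has unresolved fields. This is a sketch under stated assumptions: `needs_human_review` and `REVIEW_THRESHOLD` are hypothetical, and the 0.6 cut-off is a placeholder, not a tuned value. The input has the same shape as the dictionary returned by `intent_aware_translate`.

```python
REVIEW_THRESHOLD = 0.6  # hypothetical cut-off; tune it against human judgments


def needs_human_review(result, threshold=REVIEW_THRESHOLD):
    """Flag a result whose score is below threshold or whose intent has unknown fields."""
    low_score = result["confidence"] < threshold
    has_unknown = "unknown" in result["detected_intent"].values()
    return low_score or has_unknown


# Result copied from the sample run in this notebook (candidates omitted).
example = {
    "input": "ठीक है, देखेंगे",
    "detected_intent": {
        "speech_act": "agreement",
        "emotion": "neutral",
        "confidence_level": "medium",
    },
    "final_translation": "Alright, we'll see.",
    "confidence": 0.5,
}
print(needs_human_review(example))  # True
```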
