# AI ENGINEER ASSIGNMENT CODE

This notebook has two sections. Section 1 translates a Hindi conversation to English, then summarizes the English translation and runs sentiment analysis on it. Section 2 runs sentiment analysis directly on the Hindi text, using different models through the Hugging Face API that are meant to handle Hindi as well.

## SECTION 1: HINDI-TO-ENGLISH TRANSLATION, SUMMARIZATION AND SENTIMENT ANALYSIS

In [1]:
import transformers

In [10]:
conversation_hindi = """Recovery Agent (RA): नमस्ते श्री कुमार, मैं एक्स वाई जेड फाइनेंस से बोल रहा हूं। आपके लोन के बारे में बात करनी थी।
Borrower (B): हां, बोलिए। क्या बात है?
RA: सर, आपका पिछले महीने का EMI अभी तक नहीं आया है। क्या कोई समस्या है?
B: हां, थोड़ी दिक्कत है। मेरी नौकरी चली गई है और मैं नया काम ढूंढ रहा हूं।
RA: ओह, यह तो बुरा हुआ। लेकिन सर, आपको समझना होगा कि लोन का भुगतान समय पर करना बहुत जरूरी है।
B: मैं समझता हूं, लेकिन अभी मेरे पास पैसे नहीं हैं। क्या कुछ समय मिल सकता है?
RA: हम समझते हैं आपकी स्थिति। क्या आप अगले हफ्ते तक कुछ भुगतान कर सकते हैं?
B: मैं कोशिश करूंगा, लेकिन पूरा EMI नहीं दे पाऊंगा। क्या आधा भुगतान चलेगा?
RA: ठीक है, आधा भुगतान अगले हफ्ते तक कर दीजिए। बाकी का क्या प्लान है आपका?
B: मुझे उम्मीद है कि अगले महीने तक मुझे नया काम मिल जाएगा। तब मैं बाकी बकाया चुका दूंगा।
RA: ठीक है। तो हम ऐसा करते हैं - आप अगले हफ्ते तक आधा EMI जमा कर दीजिए, और अगले महीने के 15 तारीख तक बाकी का भुगतान कर दीजिए। क्या यह आपको स्वीकार है?
B: हां, यह ठीक रहेगा। मैं इस प्लान का पालन करने की पूरी कोशिश करूंगा।
RA: बहुत अच्छा। मैं आपको एक SMS भेज रहा हूं जिसमें भुगतान की डिटेल्स होंगी। कृपया इसका पालन करें और समय पर भुगतान करें।
B: ठीक है, धन्यवाद आपके समझने के लिए।
RA: आपका स्वागत है। अगर कोई और सवाल हो तो मुझे बताइएगा। अलविदा।
B: अलविदा।
"""

### TRANSLATION
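
Each line of the conversation carries a speaker label (`Recovery Agent (RA)`, `Borrower (B)`, `RA`, `B`) followed by the utterance. The sketch below separates the two so that labels can be kept out of any later processing. The helper `parse_turns` and the alias table are hypothetical and not part of the original notebook; the `Borrower` alias is included because the translated output uses that form.

```python
# Hypothetical helper, not part of the original notebook.
# Maps the label variants seen in the conversation to a short speaker tag.
SPEAKER_ALIASES = {
    "Recovery Agent (RA)": "RA",
    "Borrower (B)": "B",
    "Borrower": "B",
    "RA": "RA",
    "B": "B",
}

def parse_turns(conversation):
    """Return a list of (speaker, utterance) tuples, skipping blank lines."""
    turns = []
    for line in conversation.split("\n"):
        if not line.strip():
            continue
        # Split on the first colon only; the utterance may contain more colons.
        label, sep, utterance = line.partition(":")
        speaker = SPEAKER_ALIASES.get(label.strip())
        if not sep or speaker is None:
            # Keep unlabelled lines instead of silently dropping them.
            turns.append((None, line.strip()))
        else:
            turns.append((speaker, utterance.strip()))
    return turns

sample = "Recovery Agent (RA): नमस्ते\nBorrower (B): हां, बोलिए। क्या बात है?\n\nRA: सर"
for speaker, utterance in parse_turns(sample):
    print(speaker, "|", utterance)
```

With the turns separated, only the utterance needs to be passed to the translation model, and the speaker tag can be re-attached afterwards.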

In [2]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Use Facebook's mBART-50 translation model: the conversation has to be translated to English before it can be summarized.

model_name = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

# To avoid exceeding the input token length, split the conversation into segments at each newline.
# After skipping empty strings, translate each segment into English and append it to translated_conv_segments,
# which is later joined into full_translation.

conv_segments = conversation_hindi.split("\n")
translated_conv_segments = []

tokenizer.src_lang = "hi_IN"  # the source text is Hindi; set once, outside the loop

for segment in conv_segments:
    if segment.strip():  # skip empty lines
        encoded_hi = tokenizer(segment, return_tensors="pt")  # tokenize the Hindi segment
        # force the decoder to start with the English language code
        generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
        translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]  # decode the generated tokens for this segment
        translated_conv_segments.append(translation)

full_translation = "\n".join(translated_conv_segments)

print("Full Translation:\n", full_translation)

Full Translation:
 Recovery Agent (RA): Hello Mr. Kumar, I'm talking to X Y Z Finance. I had to talk about your loan.
Borrower: Yes, speak. What's the matter?
RA: Sir, your last month's EMI hasn' t arrived yet. Is there a problem?
B: Yeah, I'm having a little trouble. My job' s gone and I'm looking for a new job.
RA: Oh, that's bad. But sir, you have to understand that it' s very important to pay the loan on time.
B: I understand, but I don 't have the money yet. Can I have some time?
RA: We understand your situation. Can you make some payments by next week?
B: I 'll try, but I can' t give the full EMI. Will half the payment go?
RA: Okay, make half the payment by next week. What's your plan for the rest?
B: I hope to get a new job by next month. Then I 'll pay the rest.
RA: All right. So we 'll do that - you deposit half of the EMI up to the next week, and pay the rest up to the 15th of next month. Do you accept that?
B: Yeah, it 'll be fine. I' ll do my best to follow this plan.
RA: V
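
The translated text contains stray spaces around apostrophes (`hasn' t`, `don 't`, `I 'll`). A minimal post-processing sketch that rejoins such contractions is shown below. The helper `fix_contractions` is hypothetical, and its suffix list is an assumption that covers only common English contractions; it may misfire on unusual inputs such as plural possessives followed by a short word.

```python
import re

# Hypothetical post-processing helper, not part of the original notebook.
# Matches a word character, an apostrophe with an optional space on either side,
# and a common contraction suffix that ends at a word boundary.
_CONTRACTION = re.compile(r"(\w)\s?'\s?(t|s|ll|re|ve|d|m)\b")

def fix_contractions(text):
    """Remove stray spaces around apostrophes in English contractions."""
    return _CONTRACTION.sub(r"\1'\2", text)

print(fix_contractions("B: I 'll try, but I can' t give the full EMI."))
```

Applying this to `full_translation` before summarization and sentiment analysis would give both models cleaner input, although whether it changes their output has not been checked here.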

### CONVERSATION SUMMARIZATION

In [7]:
from transformers import pipeline

# Use the pipeline's default DistilBART summarizer, which is sufficiently good at summarizing English text.
summarizer = pipeline("summarization")
summary = summarizer(full_translation, max_length=200, min_length=110, do_sample=False) 
print("\nSummary:\n", summary[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.



Summary:
  Borrower's last month's EMI hasn't arrived yet, says X Y Z Finance . Recovery Agent (RA) asks borrower to make payments by next week . He says he hopes to get a new job by next month, then pay the rest of the money . The borrower says he is 'having a little trouble' because he can't give the full EMI . The debt collector says it is important to pay the loan on time and deposit half of the EMI up to the next week, and pay rest of next month .
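
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so follows; the model names and revisions are copied from the pipeline warnings in this notebook, while the mapping `PINNED_MODELS` and the helper `build_pipeline` are hypothetical names introduced for illustration.

```python
# Model names and revisions are copied from the pipeline warnings in this notebook.
# PINNED_MODELS and build_pipeline are hypothetical names, not part of the original code.
PINNED_MODELS = {
    "summarization": {
        "model": "sshleifer/distilbart-cnn-12-6",
        "revision": "a4f8f3e",
    },
    "sentiment-analysis": {
        "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
        "revision": "af0f99b",
    },
}

def build_pipeline(task):
    """Create a pipeline for `task` with its model and revision pinned."""
    # Imported lazily so the mapping above can be inspected without loading transformers.
    from transformers import pipeline
    return pipeline(task, **PINNED_MODELS[task])

# Usage (downloads the model on first call):
# summarizer = build_pipeline("summarization")
```

Pinning keeps the results reproducible if the library's default model for a task changes in a later release.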


### SENTIMENT ANALYSIS OF THE CONVERSATION

In [4]:
# Stick to the pipeline's default sentiment analysis model.
sentiment_analyzer_instance = pipeline("sentiment-analysis")

# Extract the sentiment of the translated text line by line.
conversation_lines = full_translation.split("\n")
conversation_sentiments = []

for line in conversation_lines:
    if line.strip():
        sentiment = sentiment_analyzer_instance(line)
        conversation_sentiments.append((line.strip(), sentiment))

for line, sentiment in conversation_sentiments:
    print(f"Text: {line}")
    #printing sentiments line-by-line
    print(f"Sentiment: {sentiment}\n")


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Text: Recovery Agent (RA): Hello Mr. Kumar, I'm talking to X Y Z Finance. I had to talk about your loan.
Sentiment: [{'label': 'NEGATIVE', 'score': 0.8278596997261047}]

Text: Borrower: Yes, speak. What's the matter?
Sentiment: [{'label': 'NEGATIVE', 'score': 0.9892157316207886}]

Text: RA: Sir, your last month's EMI hasn' t arrived yet. Is there a problem?
Sentiment: [{'label': 'NEGATIVE', 'score': 0.9990589022636414}]

Text: B: Yeah, I'm having a little trouble. My job' s gone and I'm looking for a new job.
Sentiment: [{'label': 'NEGATIVE', 'score': 0.9995023012161255}]

Text: RA: Oh, that's bad. But sir, you have to understand that it' s very important to pay the loan on time.
Sentiment: [{'label': 'NEGATIVE', 'score': 0.6003400683403015}]

Text: B: I understand, but I don 't have the money yet. Can I have some time?
Sentiment: [{'label': 'NEGATIVE', 'score': 0.9988842606544495}]

Text: RA: We understand your situation. Can you make some payments by next week?
Sentiment: [{'label': 
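
The line-by-line results can be condensed into one number per speaker. The sketch below assumes the same `(line, [{"label": ..., "score": ...}])` shape that `conversation_sentiments` has above, and treats a `NEGATIVE` score as a negative number. The helper names and the alias table are hypothetical, and the sample values are made up for illustration; they are not results from the notebook.

```python
from collections import defaultdict

# Hypothetical alias table: the translated text uses both long and short speaker labels.
SPEAKER_ALIASES = {"Recovery Agent (RA)": "RA", "Borrower": "B"}

def signed_score(prediction):
    """Map a POSITIVE/NEGATIVE prediction to [-1, 1]: POSITIVE -> +score, NEGATIVE -> -score."""
    sign = 1.0 if prediction["label"] == "POSITIVE" else -1.0
    return sign * prediction["score"]

def mean_sentiment_by_speaker(conversation_sentiments):
    """Average the signed scores of each speaker's lines."""
    scores_by_speaker = defaultdict(list)
    for line, predictions in conversation_sentiments:
        # The speaker label is everything before the first colon.
        label = line.split(":", 1)[0].strip()
        speaker = SPEAKER_ALIASES.get(label, label)
        scores_by_speaker[speaker].append(signed_score(predictions[0]))
    return {s: sum(v) / len(v) for s, v in scores_by_speaker.items()}

# Made-up sample values, for illustration only.
sample = [
    ("RA: first line", [{"label": "NEGATIVE", "score": 0.5}]),
    ("B: second line", [{"label": "NEGATIVE", "score": 1.0}]),
    ("RA: third line", [{"label": "POSITIVE", "score": 0.5}]),
]
print(mean_sentiment_by_speaker(sample))
```

A per-speaker average is a coarse summary; it hides how sentiment changes over the course of the call.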

## SECTION 2: SENTIMENT ANALYSIS DIRECTLY ON THE HINDI TEXT, AS OUTPUT BY 2 DIFFERENT MODELS

### SENTIMENT ANALYSIS USING THE nlptown MULTILINGUAL BERT MODEL

In [11]:
sentiment_analyzer_1 = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment")

conversation_lines = conversation_hindi.split("\n")
sentiments = []

#the below mapping is done to ensure interpretable results.
star_to_sentiment = {
    "1 star": "very negative",
    "2 stars": "negative",
    "3 stars": "neutral",
    "4 stars": "positive",
    "5 stars": "very positive"
}

# Analyze sentiment for each line
for line in conversation_lines:
    #ignoring any potential empty lines in the conv
    if line.strip():
        sentiment = sentiment_analyzer_1(line)
        sentiment_label = star_to_sentiment[sentiment[0]['label']]
        sentiments.append((line.strip(), sentiment_label, sentiment[0]['score']))

for line, sentiment, score in sentiments:
    print(f"Text: {line}")
    print(f"Sentiment: {sentiment} (Score: {score})\n")
    #prints sentiments detected line by line

Text: Recovery Agent (RA): नमस्ते श्री कुमार, मैं एक्स वाई जेड फाइनेंस से बोल रहा हूं। आपके लोन के बारे में बात करनी थी।
Sentiment: positive (Score: 0.2646923065185547)

Text: Borrower (B): हां, बोलिए। क्या बात है?
Sentiment: very negative (Score: 0.38013380765914917)

Text: RA: सर, आपका पिछले महीने का EMI अभी तक नहीं आया है। क्या कोई समस्या है?
Sentiment: very negative (Score: 0.48783984780311584)

Text: B: हां, थोड़ी दिक्कत है। मेरी नौकरी चली गई है और मैं नया काम ढूंढ रहा हूं।
Sentiment: negative (Score: 0.3589804470539093)

Text: RA: ओह, यह तो बुरा हुआ। लेकिन सर, आपको समझना होगा कि लोन का भुगतान समय पर करना बहुत जरूरी है।
Sentiment: neutral (Score: 0.4536367952823639)

Text: B: मैं समझता हूं, लेकिन अभी मेरे पास पैसे नहीं हैं। क्या कुछ समय मिल सकता है?
Sentiment: negative (Score: 0.4188579320907593)

Text: RA: हम समझते हैं आपकी स्थिति। क्या आप अगले हफ्ते तक कुछ भुगतान कर सकते हैं?
Sentiment: neutral (Score: 0.3204396963119507)

Text: B: मैं कोशिश करूंगा, लेकिन पूरा EMI नहीं दे पाऊंगा

### SENTIMENT ANALYSIS USING THE emotion-english-distilroberta-base MODEL

In [12]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

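# Note: this checkpoint is an English emotion classifier, so its predictions on Hindi text should be treated with caution.
model_name = "j-hartmann/emotion-english-distilroberta-base"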
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

sentiment_analyzer_2 = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

conversation_lines = conversation_hindi.split("\n")
sentiments = []

for line in conversation_lines:
    if line.strip():  # Skip empty lines, which would otherwise give garbage output
        sentiment = sentiment_analyzer_2(line)
        sentiments.append((line.strip(), sentiment))

for line, sentiment in sentiments:
    print(f"Text: {line}")
    print(f"Sentiment: {sentiment}\n")

Text: Recovery Agent (RA): नमस्ते श्री कुमार, मैं एक्स वाई जेड फाइनेंस से बोल रहा हूं। आपके लोन के बारे में बात करनी थी।
Sentiment: [{'label': 'neutral', 'score': 0.6516580581665039}]

Text: Borrower (B): हां, बोलिए। क्या बात है?
Sentiment: [{'label': 'neutral', 'score': 0.759221076965332}]

Text: RA: सर, आपका पिछले महीने का EMI अभी तक नहीं आया है। क्या कोई समस्या है?
Sentiment: [{'label': 'neutral', 'score': 0.7611526846885681}]

Text: B: हां, थोड़ी दिक्कत है। मेरी नौकरी चली गई है और मैं नया काम ढूंढ रहा हूं।
Sentiment: [{'label': 'neutral', 'score': 0.7226436734199524}]

Text: RA: ओह, यह तो बुरा हुआ। लेकिन सर, आपको समझना होगा कि लोन का भुगतान समय पर करना बहुत जरूरी है।
Sentiment: [{'label': 'neutral', 'score': 0.7633628249168396}]

Text: B: मैं समझता हूं, लेकिन अभी मेरे पास पैसे नहीं हैं। क्या कुछ समय मिल सकता है?
Sentiment: [{'label': 'neutral', 'score': 0.7557825446128845}]

Text: RA: हम समझते हैं आपकी स्थिति। क्या आप अगले हफ्ते तक कुछ भुगतान कर सकते हैं?
Sentiment: [{'label': 'neu
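
Every line shown above is labelled `neutral`, and the model name contains `english`, which suggests the model may not be reading the Hindi text meaningfully. A simple guard is to check the script of the input before sending it to a model intended for English. The sketch below is a hypothetical helper, not part of the original notebook, and the 0.5 threshold is an arbitrary assumption.

```python
def devanagari_ratio(text):
    """Fraction of alphabetic characters in the Devanagari block (U+0900 to U+097F)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    devanagari = sum(1 for ch in letters if "\u0900" <= ch <= "\u097f")
    return devanagari / len(letters)

def is_mostly_devanagari(text, threshold=0.5):
    """Return True when more than `threshold` of the letters are Devanagari."""
    return devanagari_ratio(text) > threshold

line = "B: हां, थोड़ी दिक्कत है।"
if is_mostly_devanagari(line):
    print("Hindi input detected: use a model that supports Hindi for this line.")
```

The check only looks at the script, so it cannot tell Hindi apart from other languages written in Devanagari.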