# Task 3: Multi-lingual Sentiment Analysis (English vs Hindi)

## Objective
To build an end-to-end NLP application that performs sentiment analysis on
English and Hindi text using pre-trained Hugging Face transformer models,
and compares their outputs.


In [1]:
!pip install transformers sentencepiece torch




In [5]:
from transformers import pipeline
test = pipeline("sentiment-analysis")
test("I love this project")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f.
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.999884843826294}]
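The warning above recommends naming the model and revision explicitly instead of relying on the task default. The sketch below assumes that the model id and the short revision hash `714eb0f` printed in that warning are the values to pin. `ENGLISH_MODEL`, `ENGLISH_REVISION`, and `load_english_classifier` are hypothetical names introduced here. `pipeline` accepts `model` and `revision` keyword arguments.

```python
# Model id and revision copied from the warning printed by the cell above.
ENGLISH_MODEL = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
ENGLISH_REVISION = "714eb0f"


def load_english_classifier():
    """Build the English sentiment pipeline with an explicit model and revision."""
    # Imported inside the function so the constants can be inspected without
    # transformers installed; calling this function downloads the weights.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=ENGLISH_MODEL,
        revision=ENGLISH_REVISION,
    )
```

Usage: `load_english_classifier()("I love this project")`. Pinning the revision means later reruns load the same weights even if the model repository is updated.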

In [3]:
!pip install transformers torch sentencepiece --no-cache-dir




This notebook compares sentiment analysis results for English and Hindi text using Hugging Face transformer models.


In [7]:
from transformers import pipeline

english_sa = pipeline(
    "sentiment-analysis",
    # DistilBERT fine-tuned on SST-2; predicts POSITIVE or NEGATIVE only.
    model="distilbert-base-uncased-finetuned-sst-2-english"
)


english_texts = [
    "I love this project",
    "This movie was terrible",
    "The product quality is average"
]

for text in english_texts:
    print(text, "→", english_sa(text))


I love this project → [{'label': 'POSITIVE', 'score': 0.999884843826294}]
This movie was terrible → [{'label': 'NEGATIVE', 'score': 0.9996950626373291}]
The product quality is average → [{'label': 'NEGATIVE', 'score': 0.9847309589385986}]
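The loop above calls the pipeline once per sentence. A text-classification pipeline also accepts a list of strings and returns one `{'label', 'score'}` dict per input, so all sentences can be classified in a single call. The sketch below uses a hypothetical helper, `classify_batch`, that works with any callable following that list-in, list-out contract. To avoid downloading a model, it is exercised with a stand-in classifier that replays the scores recorded above.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Dict[str, object]


def classify_batch(
    classifier: Callable[[List[str]], List[Prediction]],
    texts: List[str],
) -> List[Tuple[str, str, float]]:
    """Classify all texts in one call and pair each text with its top label and score."""
    predictions = classifier(texts)  # one dict per input text
    return [
        (text, str(p["label"]), float(p["score"]))
        for text, p in zip(texts, predictions)
    ]


# Stand-in for english_sa that replays the outputs recorded above (no model download).
RECORDED = {
    "I love this project": {"label": "POSITIVE", "score": 0.999884843826294},
    "This movie was terrible": {"label": "NEGATIVE", "score": 0.9996950626373291},
    "The product quality is average": {"label": "NEGATIVE", "score": 0.9847309589385986},
}


def replay_classifier(texts: List[str]) -> List[Prediction]:
    return [RECORDED[t] for t in texts]


for text, label, score in classify_batch(replay_classifier, list(RECORDED)):
    print(f"{text} → {label} ({score:.4f})")
```

In the notebook itself, pass `english_sa` in place of `replay_classifier`.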


In [8]:
from transformers import pipeline

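hindi_sa = pipeline(
    "sentiment-analysis",
    # Multilingual BERT fine-tuned to predict 1-5 star review ratings.
    model="nlptown/bert-base-multilingual-uncased-sentiment"
)

hindi_texts = [
    "यह फिल्म बहुत अच्छी है",     # "This movie is very good"
    "यह प्रोजेक्ट बहुत खराब है",  # "This project is very bad"
    "मुझे यह पसंद नहीं आया"       # "I did not like this"
]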

for text in hindi_texts:
    print(text, "->", hindi_sa(text))


यह फिल्म बहुत अच्छी है -> [{'label': '5 stars', 'score': 0.4249618649482727}]
यह प्रोजेक्ट बहुत खराब है -> [{'label': '5 stars', 'score': 0.33137762546539307}]
मुझे यह पसंद नहीं आया -> [{'label': '1 star', 'score': 0.46384197473526}]
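The Hindi model returns star ratings, while the English model returns POSITIVE or NEGATIVE, so the two outputs cannot be compared directly. The sketch below collapses star labels into a coarse sentiment. It assumes a mapping of 1-2 stars to NEGATIVE, 3 to NEUTRAL, and 4-5 to POSITIVE; this convention is chosen here, not defined by the model. `stars_to_sentiment` is a hypothetical helper, and the inputs are the outputs recorded above.

```python
from typing import Dict


def stars_to_sentiment(prediction: Dict[str, object]) -> str:
    """Collapse an nlptown star label ('1 star' ... '5 stars') into a coarse sentiment.

    Assumed convention: 1-2 stars -> NEGATIVE, 3 -> NEUTRAL, 4-5 -> POSITIVE.
    """
    stars = int(str(prediction["label"]).split()[0])  # '5 stars' -> 5
    if stars <= 2:
        return "NEGATIVE"
    if stars == 3:
        return "NEUTRAL"
    return "POSITIVE"


# Outputs recorded in the cell above.
hindi_results = [
    ("यह फिल्म बहुत अच्छी है", {"label": "5 stars", "score": 0.4249618649482727}),
    ("यह प्रोजेक्ट बहुत खराब है", {"label": "5 stars", "score": 0.33137762546539307}),
    ("मुझे यह पसंद नहीं आया", {"label": "1 star", "score": 0.46384197473526}),
]

for text, pred in hindi_results:
    print(text, "->", stars_to_sentiment(pred), f"(top score {pred['score']:.2f})")
```

After mapping, the second sentence ("This project is very bad") comes out POSITIVE, which makes that misclassification easy to spot next to the English results.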


## Task 3: Multilingual Sentiment Analysis – Explanation

In this task, sentiment analysis was performed using Hugging Face pretrained
transformer models.

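For English text, a DistilBERT model fine-tuned on SST-2 (sentences from movie reviews) was used to classify
sentiment as POSITIVE or NEGATIVE. It has no neutral class, so every input is assigned one of these two labels.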

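For Hindi text, `nlptown/bert-base-multilingual-uncased-sentiment` was used. It is built on multilingual BERT,
which was pretrained on text in many languages including Hindi. However, its sentiment fine-tuning used product
reviews in six languages (English, Dutch, German, French, Spanish, and Italian), so applying it to Hindi relies
on cross-lingual transfer. It predicts a rating from 1 to 5 stars rather than a positive/negative label.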

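The results are mixed. The English model labeled the two clearly polar sentences correctly with scores above
0.999. Because it has no neutral class, it also labeled "The product quality is average" as NEGATIVE with a score
of 0.985. The multilingual model labeled "यह प्रोजेक्ट बहुत खराब है" ("This project is very bad") as 5 stars. Its
top score for every Hindi sentence was below 0.5, so its Hindi predictions are low-confidence and not reliable.
The two models also use different label sets (POSITIVE/NEGATIVE versus 1 to 5 stars), so their outputs must be
mapped to a common scheme before they can be compared directly.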
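The accuracy of each model can be checked against the recorded outputs. The sketch below scores each prediction against a hand-assigned reference label, which is an assumption based on each sentence's plain meaning. The English references are positive, negative, and neutral. The Hindi references follow the translations "This movie is very good", "This project is very bad", and "I did not like this". `coarse_label` and `accuracy` are hypothetical helpers, and star labels use the same assumed 1-2 / 3 / 4-5 split as above. With three sentences per language, the numbers are illustrative, not a benchmark.

```python
from typing import Dict, List, Tuple


def coarse_label(label: str) -> str:
    """Normalize English (POSITIVE/NEGATIVE) and nlptown star labels to one scheme."""
    if label in ("POSITIVE", "NEGATIVE"):
        return label
    stars = int(label.split()[0])  # assumed mapping: 1-2 neg, 3 neutral, 4-5 pos
    if stars <= 2:
        return "NEGATIVE"
    if stars == 3:
        return "NEUTRAL"
    return "POSITIVE"


# (predicted label recorded above, hand-assigned reference label)
RESULTS: Dict[str, List[Tuple[str, str]]] = {
    "English": [("POSITIVE", "POSITIVE"), ("NEGATIVE", "NEGATIVE"), ("NEGATIVE", "NEUTRAL")],
    "Hindi": [("5 stars", "POSITIVE"), ("5 stars", "NEGATIVE"), ("1 star", "NEGATIVE")],
}


def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of predictions whose normalized label matches the reference."""
    correct = sum(coarse_label(pred) == ref for pred, ref in pairs)
    return correct / len(pairs)


for language, pairs in RESULTS.items():
    print(f"{language}: {accuracy(pairs):.2f} on {len(pairs)} sentences")
```

Each model misses one of its three sentences, but for different reasons: the English model lacks a neutral class, while the Hindi prediction contradicts the sentence's meaning.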
