<a href="https://colab.research.google.com/github/Sathwik612/BMSCE/blob/main/Translation%26_Inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#Necessary Modules!!!
!pip install googletrans==4.0.0-rc1 better_profanity nltk
!pip install wordfilter



Language translation


In [11]:
import nltk
from googletrans import Translator
from better_profanity import profanity
from wordfilter import Wordfilter
from nltk.corpus import wordnet

# Download required NLTK dataset
nltk.download('wordnet')

# Initialize word filter (for multiple languages)
word_filter = Wordfilter()

# Extend profanity filter with Indian slang
CUSTOM_PROFANITY = [
    "chutiya", "bhosdike", "gaandu", "madarchod", "behnchod",
    "harami", "randi", "lund", "chodu", "kamina", "kutti"
]
profanity.add_censor_words(CUSTOM_PROFANITY)

class LanguageTranslator:
    def __init__(self):
        self.translator = Translator()

    def is_slang_or_profanity(self, text):
        """
        Detects slang or profanity in any language.
        :param text: Input text.
        :return: Boolean (True if slang/profanity detected, False otherwise)
        """
        words = text.split()
        for word in words:
            if not wordnet.synsets(word):
                return True  # If no dictionary meaning, assume slang
            if profanity.contains_profanity(word):
                return True  # If flagged as profane, return True
            if word_filter.blacklisted(word):
                return True  # If word is blacklisted, return True
        return False

    def detect_and_translate(self, text):
        """
        Detects language, translates to English, and flags slang/profanity.
        :param text: User input.
        :return: Dictionary with detected language, confidence, translation, and slang detection.
        """
        detected = self.translator.detect(text)

        # Handle NoneType confidence score
        confidence_score = round(detected.confidence, 2) if detected.confidence else "Unknown"

        translated_text = text  # Default to original text

        if detected.lang != 'en':
            translated = self.translator.translate(text, dest='en')
            translated_text = translated.text  # Standard translation

        response = {
            "detected_language": detected.lang,
            "confidence": confidence_score,
            "original_text": text,
            "translated_text": translated_text,
            "contains_slang_or_profanity": self.is_slang_or_profanity(text)
        }

        return response

    def process_for_summarizer(self, text):
        """
        Translates text to English before sending it to the summarizer.
        :param text: Input text.
        :return: Translated text (ready for summarization).
        """
        result = self.detect_and_translate(text)
        print(f"Processed Text for Summarization: {result['translated_text']}")
        return result['translated_text']  # Return translated text only

# Example usage (for testing)
if __name__ == "__main__":
    translator = LanguageTranslator()
    sample_text = input("Enter text: ")
    translated_text = translator.process_for_summarizer(sample_text)
    print(f"Final Output (For Summarization): {translated_text}")


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Enter text: "madarchod kya kar raha hai
Processed Text for Summarization: "What is Madarachod doing
Final Output (For Summarization): "What is Madarachod doing


Inference drawer!!!!!


In [13]:
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification
from scipy.special import softmax

class ElectionSentimentAnalyzer:
    def __init__(self, model_name="cardiffnlp/twitter-roberta-base-sentiment"):
        """
        Initializes the sentiment analyzer using a transformer model.
        Default: RoBERTa trained on Twitter sentiment analysis.
        """
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
        self.model = RobertaForSequenceClassification.from_pretrained(model_name)
        self.labels = ["Negative", "Neutral", "Positive"]

    def analyze_tweet_sentiment(self, text):
        """
        Analyzes sentiment of an election-related tweet.
        :param text: Summarized election-related tweet.
        :return: Sentiment label and confidence scores.
        """
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, padding=True)
        with torch.no_grad():
            outputs = self.model(**inputs)
            scores = outputs.logits.numpy()[0]
            scores = softmax(scores)

        sentiment = self.labels[scores.argmax()]
        confidence = round(scores.max(), 2)

        return {
            "sentiment": sentiment,
            "confidence": confidence,
            "scores": {self.labels[i]: round(float(scores[i]), 2) for i in range(len(scores))}
        }

# Example usage (for testing)
if __name__ == "__main__":
    analyzer = ElectionSentimentAnalyzer()
    sample_tweet = input("Enter summarized election tweet: ")
    result = analyzer.analyze_tweet_sentiment(sample_tweet)
    print(f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']})")


vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/499M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Enter summarized election tweet: "Mixed reactions to the ruling party's economic policy—some praise it, others call it unfair. Protests expected."


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Sentiment: Neutral (Confidence: 0.47999998927116394)
