## ✅ Section 1: Dataset Preparation
Plan: Create a labeled dataset for stance classification using the uploaded CSV.

Inputs: the H/T column (homophobic/transphobic text) and the CS column (counter-speech)

Output: stance_data with two columns: text and label (HATE or COUNTER)

✔️ Status: ✅ Done
📁 Dataset ready for training and filtering

## ✅ Section 2: Train Stance Classifier (STA Model)
Plan: Fine-tune xlm-roberta-base to classify any text as HATE (0) or COUNTER (1)

Uses Hugging Face Trainer

Trains on stance_data split into train/test

Saves model to ./stance_classifier_xlmroberta

✔️ Status: ✅ Done
🤖 Trained model available for embedding extraction

## ✅ Section 3: Semantic Retrieval (SEM)
Plan: Retrieve candidate counter-speech using LaBSE + FAISS

Create embeddings for all CS

At inference, embed the hate-speech (HS) input → retrieve the top-k most similar CS

✔️ Status: ✅ Done
🔎 Retrieval working based on semantic similarity
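
The retrieval cell further down builds a FAISS `IndexFlatL2` (Euclidean distance), while this plan talks about semantic similarity. The two agree when the embeddings are unit length, since then ||a - b||² = 2 - 2·cos(a, b). The sketch below checks that on toy vectors in plain Python. It assumes the embeddings are L2-normalized; if they are not, normalizing them before indexing would be needed for the rankings to match. The vectors and helper names are illustrative only.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity, computed on normalized copies of the inputs."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy "embeddings" (hypothetical values), all scaled to unit length
query = l2_normalize([1.0, 2.0, 0.5])
candidates = [l2_normalize(v) for v in ([1.0, 2.1, 0.4], [-1.0, 0.3, 2.0], [0.9, 1.5, 1.0])]

# Smallest L2 distance first vs. largest cosine similarity first
by_l2 = sorted(range(len(candidates)), key=lambda i: squared_l2(query, candidates[i]))
by_cos = sorted(range(len(candidates)), key=lambda i: -cosine(query, candidates[i]))
print(by_l2, by_cos)
```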

## 🔄 Section 4: Stance Filtering (STA)
Plan: Rank retrieved CS candidates by stance dissimilarity to input HS

Load trained XLM-RoBERTa

Extract [CLS] embeddings for HS and each CS

Compute cosine similarity → retain lowest similarity = most opposing stance

🧠 Purpose: Ensures retrieved CS holds the opposite view to HS
📤 Output: Ranked CS list based on stance gap

✔️ Status: ✅ Code provided (ready to plug into SEM results)

## ⏳ Section 5: Fitness Filtering (FIT)
Plan: Rank CS candidates by how fluently they follow the HS prompt

Use mT5 or mBART

Compute perplexity of:
"HS. However, I disagree. CS"

Retain fluent, natural completions

🧠 Purpose: Selects counter-speech that flows well in context
📤 Output: CS ranked by fluency

⏳ Status: Prototype in the cells below (perplexity ranking with google/mt5-small); not finalized

## ⏳ Section 6: Counter Speech Generation
Plan: Use mT5 or mBART to generate new counter-speech:

Input: HS + top-k retrieved CS

Prompt: "However, I disagree. ..." + CS context

Output: New counter-speech

🧠 Purpose: Create dynamic, contextual counter-responses

⏳ Status: Prototypes in the cells below (Zephyr, mBART, KunoRZN); not finalized

## ⏳ Section 7: Evaluation
Plan: Automatically or manually score counter-speech responses on:

| Metric | Method |
| --- | --- |
| Relevance | Cosine similarity between HS and CS |
| Countering | Check whether CS contradicts HS (via classifier) |
| Fluency | Perplexity or grammar-model score |
| Toxicity | Optional: use a multilingual toxicity model |

⏳ Status: Planned — to be added after generation
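
As a placeholder for this step, here is a minimal sketch of how per-response scores for the four metrics could be collected and averaged. The scores themselves would come from models that are not shown here. The record layout, the `METRICS` tuple, and `summarize` are hypothetical names introduced for illustration, and the numbers are made up.

```python
from statistics import mean

# Metric names mirror the table above; None marks a metric that was not scored
METRICS = ("relevance", "countering", "fluency", "toxicity")

def summarize(records):
    """Average each metric over the records that report a value for it."""
    summary = {}
    for metric in METRICS:
        values = [r[metric] for r in records if r.get(metric) is not None]
        summary[metric] = mean(values) if values else None
    return summary

# Hypothetical scores for two generated responses
records = [
    {"relevance": 0.5, "countering": 1.0, "fluency": 20.0, "toxicity": None},
    {"relevance": 0.75, "countering": 0.0, "fluency": 40.0, "toxicity": None},
]
print(summarize(records))
```

Keeping unscored metrics as `None` rather than 0 avoids pulling an average down for a metric that was simply skipped, such as the optional toxicity check.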



In [None]:
!pip install transformers accelerate datasets scikit-learn
!pip install sentence-transformers faiss-cpu


In [None]:
import pandas as pd

# Load the dataset
file_path = "GPT_Combined_Original_and_Generated_HT_CS_Dataset - Combined_Original_and_Generated_HT_CS_Dataset.csv.csv"
df = pd.read_csv(file_path)

# Prepare dataset for stance classification (label H/T as "HATE" and CS as "COUNTER")
stance_data = pd.DataFrame({
    "text": df["H/T"].astype(str).tolist() + df["CS"].astype(str).tolist(),
    "label": ["HATE"] * len(df) + ["COUNTER"] * len(df)
})

# Shuffle and reset index
stance_data = stance_data.sample(frac=1.0, random_state=42).reset_index(drop=True)



In [None]:
stance_data.head()

Unnamed: 0,text,label
0,"അവരെ കുറ്റപ്പെടുത്തുന്നതിനു മുൻപ്, അവരുടെ ജീവി...",COUNTER
1,"അവരെ തടയണം, പിടികൂടണം. ഇത്തരം സ്വാതന്ത്ര്യം അപ...",HATE
2,വിവിധത്വം അംഗീകരിക്കാൻ നമ്മൾ പഠിക്കേണ്ട സമയം ക...,COUNTER
3,"വ്യത്യസ്തതകൾ സമൂഹത്തിന്റെ സൗന്ദര്യമാണ്, അത് മാ...",COUNTER
4,ഇവരോട് മാന്യതയില്ലാതെ പെരുമാറുന്നത് നമുക്ക് വേ...,COUNTER




In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from datasets import Dataset
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification, Trainer, TrainingArguments
from transformers import DataCollatorWithPadding
import numpy as np
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

import os
os.environ["WANDB_DISABLED"] = "true"

# Step 1: Load and split the data
df = pd.read_csv(file_path)
data = pd.DataFrame({
    "text": df["H/T"].astype(str).tolist() + df["CS"].astype(str).tolist(),
    "label": [0] * len(df) + [1] * len(df)  # 0 = HATE, 1 = COUNTER
})
data = data.sample(frac=1.0, random_state=42).reset_index(drop=True)
# Fix the seed so the train/test split is reproducible across runs
train_df, test_df = train_test_split(data, test_size=0.2, stratify=data["label"], random_state=42)

# Step 2: Tokenize
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True)

train_ds = Dataset.from_pandas(train_df)
test_ds = Dataset.from_pandas(test_df)
train_ds = train_ds.map(tokenize_function, batched=True)
test_ds = test_ds.map(tokenize_function, batched=True)

# Step 3: Model and Trainer
model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=2)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    acc = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, predictions, average='binary')
    return {"accuracy": acc, "precision": precision, "recall": recall, "f1": f1}

training_args = TrainingArguments(
    output_dir="./stance_model",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=0.01,
    save_strategy="epoch",
    logging_dir="./logs",
    logging_steps=10,
    report_to="none",  # disable W&B and other logging integrations
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

# Step 4: Train
trainer.train()

# Step 5: Save the model
model.save_pretrained("./stance_classifier_xlmroberta")
tokenizer.save_pretrained("./stance_classifier_xlmroberta")



Map:   0%|          | 0/8160 [00:00<?, ? examples/s]

Map:   0%|          | 0/2040 [00:00<?, ? examples/s]


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0001,0.017722,0.996078,0.999014,0.993137,0.996067
2,0.0001,0.014268,0.996078,1.0,0.992157,0.996063
3,0.0,0.016835,0.995588,0.999013,0.992157,0.995573
4,0.0,0.018843,0.997059,1.0,0.994118,0.99705


('./stance_classifier_xlmroberta/tokenizer_config.json',
 './stance_classifier_xlmroberta/special_tokens_map.json',
 './stance_classifier_xlmroberta/sentencepiece.bpe.model',
 './stance_classifier_xlmroberta/added_tokens.json')
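
The scores above are close to perfect from the first epoch. The retrieval output later in this notebook returns the same counter-speech sentence at several dataset indices, which suggests the CSV contains repeated texts. It may therefore be worth checking whether identical texts land on both sides of the split, since that would inflate the test scores. The helper below is a minimal sketch of such a check. `overlap_fraction` and `normalize` are hypothetical names, not part of the notebook.

```python
def normalize(text):
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return " ".join(str(text).split())

def overlap_fraction(train_texts, test_texts):
    """Fraction of test texts that also occur (after normalization) in train."""
    test_texts = list(test_texts)
    if not test_texts:
        return 0.0
    train_seen = {normalize(t) for t in train_texts}
    hits = sum(1 for t in test_texts if normalize(t) in train_seen)
    return hits / len(test_texts)

# Toy example: one of the two test texts is a whitespace variant of a train text
print(overlap_fraction(["a b", "c"], ["a  b", "d"]))
```

With the notebook's variables this could be called as `overlap_fraction(train_df["text"], test_df["text"])`. If the fraction is high, deduplicating the texts before splitting would give a more trustworthy estimate.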



In [None]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss

# Load your dataset (same one used earlier)
df = pd.read_csv(file_path)

# Step 1: Extract counter-speech sentences
cs_list = df["CS"].astype(str).tolist()

# Step 2: Load LaBSE model
embedder = SentenceTransformer("sentence-transformers/LaBSE")

# Step 3: Compute embeddings for all counter-speech sentences
cs_embeddings = embedder.encode(cs_list, convert_to_numpy=True, show_progress_bar=True)

# Step 4: Build FAISS index
index = faiss.IndexFlatL2(cs_embeddings.shape[1])  # L2 = Euclidean
index.add(cs_embeddings)

# Step 5: Define function to retrieve top-k CS responses for a given HS
def retrieve_topk_semantic(hs_text, top_k=5):
    hs_embedding = embedder.encode([hs_text], convert_to_numpy=True)
    distances, indices = index.search(hs_embedding, top_k)
    retrieved = [cs_list[i] for i in indices[0]]
    return retrieved

# 🔍 Example: Retrieve for a sample hate speech input
# (English gloss of the Malayalam input: "They have no right to marry")
hs_sample = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
top_cs_candidates = retrieve_topk_semantic(hs_sample, top_k=5)

# Print retrieved CS
for i, cs in enumerate(top_cs_candidates, 1):
    print(f"{i}. {cs}")



1. ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.
2. ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.
3. ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.
4. ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.
5. ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.
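
All five results above are the same sentence, so the top-k list carries only one distinct candidate. One way to handle this, sketched below, is to fetch more neighbours than needed and keep only the first hit for each distinct text. `unique_topk` is a hypothetical helper, and the toy data stands in for `cs_list` and the `indices[0]` row returned by `index.search`.

```python
def unique_topk(ranked_indices, texts, k):
    """Walk indices in ranked order, keeping the first hit for each distinct text."""
    seen, kept = set(), []
    for idx in ranked_indices:
        if idx < 0:  # FAISS pads with -1 when fewer than the requested neighbours exist
            continue
        text = texts[idx]
        if text in seen:
            continue
        seen.add(text)
        kept.append((text, int(idx)))
        if len(kept) == k:
            break
    return kept

# Toy stand-ins: three copies of "a" in the corpus, ranked ahead of "b" and "c"
texts = ["a", "a", "b", "a", "c"]
ranked = [1, 0, 2, 3, 4]
print(unique_topk(ranked, texts, k=2))
```

In the notebook this could be used by calling `index.search(hs_embedding, 50)` (the over-fetch size of 50 is an arbitrary assumption) and passing `indices[0]` and `cs_list` to the helper. Deduplicating `cs_list` before building the index is an alternative with the same effect.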


In [None]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss
import torch
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
from sklearn.metrics.pairwise import cosine_similarity

# === Load Dataset ===
file_path = "GPT_Combined_Original_and_Generated_HT_CS_Dataset - Combined_Original_and_Generated_HT_CS_Dataset.csv.csv"
df = pd.read_csv(file_path)
cs_list = df["CS"].astype(str).tolist()

# === Section 3: Semantic Retrieval Setup ===
embedder = SentenceTransformer("sentence-transformers/LaBSE")
cs_embeddings = embedder.encode(cs_list, convert_to_numpy=True, show_progress_bar=True)
index = faiss.IndexFlatL2(cs_embeddings.shape[1])
index.add(cs_embeddings)

def retrieve_topk_semantic(hs_text, top_k=10):
    hs_embedding = embedder.encode([hs_text], convert_to_numpy=True)
    distances, indices = index.search(hs_embedding, top_k)
    retrieved = [(cs_list[i], i) for i in indices[0]]
    return retrieved

# === Section 4: Stance Filtering Setup ===
model_path = "./stance_classifier_xlmroberta"
stance_tokenizer = XLMRobertaTokenizer.from_pretrained(model_path)
stance_model = XLMRobertaForSequenceClassification.from_pretrained(model_path)
stance_model.eval()
stance_model.to("cuda" if torch.cuda.is_available() else "cpu")

def get_cls_embedding(text):
    inputs = stance_tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(stance_model.device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = stance_model.base_model(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze().cpu().numpy()

def rank_by_stance(hs_text, retrieved_cs, top_k=5):
    hs_vec = get_cls_embedding(hs_text)
    cs_vecs = [get_cls_embedding(cs) for cs, _ in retrieved_cs]
    sims = cosine_similarity([hs_vec], cs_vecs)[0]
    sorted_indices = np.argsort(sims)  # Low = more opposing
    results = []
    for i in sorted_indices[:top_k]:
        results.append({
            "candidate_cs": retrieved_cs[i][0],
            "stance_similarity": sims[i],
            "index_in_dataset": retrieved_cs[i][1]
        })
    return pd.DataFrame(results)

# === Example Usage ===
hs_input = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
sem_candidates = retrieve_topk_semantic(hs_input, top_k=10)
ranked_cs = rank_by_stance(hs_input, sem_candidates, top_k=5)

print(ranked_cs)



                                        candidate_cs  stance_similarity  \
0  ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണ...           0.980236   
1  ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണ...           0.980236   
2  ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണ...           0.980237   
3  ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണ...           0.980237   
4  ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണ...           0.980237   

   index_in_dataset  
0              1048  
1              1046  
2              1008  
3              1013  
4              1032  
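
The `stance_similarity` values above all sit near 0.98, so sorting by them barely separates the candidates. One alternative, in line with the "via classifier" idea in Section 7, is to score each candidate by the fine-tuned classifier's probability for the COUNTER label (1). The sketch below shows only the scoring and ranking step in plain Python. It assumes the logits would come from something like `stance_model(**inputs).logits`, and the logit values used here are hypothetical. This is not what the cell above does.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_by_counter_probability(candidates, logits_per_candidate, counter_label=1):
    """Sort candidates by P(COUNTER), highest first."""
    scored = [
        (cs, softmax(logits)[counter_label])
        for cs, logits in zip(candidates, logits_per_candidate)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical [HATE, COUNTER] logits for two candidate texts
ranked = rank_by_counter_probability(["x", "y"], [[2.0, 0.0], [0.0, 2.0]])
for text, prob in ranked:
    print(text, round(prob, 3))
```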


In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import numpy as np
import pandas as pd  # used by rank_by_fitness to build the result table
import math

# Load a multilingual seq2seq model (mT5) for fluency/perplexity scoring
model_name = "google/mt5-small"  # Replace with a fine-tuned model if available
gen_tokenizer = AutoTokenizer.from_pretrained(model_name)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
gen_model.eval()
gen_model.to("cuda" if torch.cuda.is_available() else "cpu")


# Compute perplexity of a full prompt
def compute_perplexity(prompt):
    encodings = gen_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True, max_length=128)
    input_ids = encodings.input_ids.to(gen_model.device)
    with torch.no_grad():
        outputs = gen_model(input_ids=input_ids, labels=input_ids)
        loss = outputs.loss
    return math.exp(loss.item()) if loss is not None else float("inf")

# Fitness filter: rank CS by lowest perplexity with HS context
def rank_by_fitness(hs_text, cs_candidates, top_k=3):
    results = []
    for cs in cs_candidates:
        # Join HS and CS with a Malayalam connector meaning roughly "I oppose that:"
        prompt = f"{hs_text} അതിനെ എതിർക്കുന്നു: {cs}"
        print(f"\nPrompt for PPL: {prompt}")
        ppl = compute_perplexity(prompt)
        results.append((cs, ppl))
    results.sort(key=lambda x: x[1])  # sort by PPL
    return pd.DataFrame(results[:top_k], columns=["candidate_cs", "perplexity"])




In [None]:
# Assume ranked_cs from Section 4 (STA output)
cs_inputs = ranked_cs["candidate_cs"].tolist()
hs_input = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"

fitness_ranked = rank_by_fitness(hs_input, cs_inputs, top_k=3)
print(fitness_ranked)





Prompt for PPL: അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല അതിനെ എതിർക്കുന്നു: ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.

Prompt for PPL: അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല അതിനെ എതിർക്കുന്നു: ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.

Prompt for PPL: അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല അതിനെ എതിർക്കുന്നു: ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.

Prompt for PPL: അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല അതിനെ എതിർക്കുന്നു: ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.

Prompt for PPL: അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല അതിനെ എതിർക്കുന്നു: ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണമെന്ന് തീരുമാനിക്കാനുള്ള അവകാശം ഉണ്ട്.
                                        candidate_cs    perplexity
0  ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണ...  1.697433e+14
1  ഓരോ വ്യക്തിക്കും സ്വന്തം ജീവിതം എങ്ങനെ നയിക്കണ...  1.69743
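
For reference, perplexity is the exponential of the mean per-token negative log-likelihood, which is what `math.exp(loss)` computes when `loss` is a mean cross-entropy. The plain-Python sketch below makes that definition explicit and shows how padded positions could be excluded with a mask. The token log-probabilities are hypothetical values, not model output.

```python
import math

def perplexity(token_log_probs, mask=None):
    """exp(mean negative log-likelihood) over the unmasked token positions."""
    if mask is None:
        mask = [1] * len(token_log_probs)
    kept = [lp for lp, m in zip(token_log_probs, mask) if m]
    if not kept:
        return float("inf")
    return math.exp(-sum(kept) / len(kept))

# A model that assigns probability 1/4 to every token has perplexity 4
uniform = [math.log(0.25)] * 3
print(perplexity(uniform))

# The last position is treated as padding and left out of the average
print(perplexity([math.log(0.5), math.log(0.5), math.log(0.001)], mask=[1, 1, 0]))
```

A single very unlikely token can dominate the average, so values on the order of 1e14 usually point to a mismatch between the model and the scoring setup rather than to a meaningful fluency difference between candidates.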

## Generation

In [None]:
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# # Load mT5 (or any multilingual seq2seq model)
# gen_model_name = "google/mt5-small"
# gen_tokenizer = AutoTokenizer.from_pretrained(gen_model_name)
# gen_model = AutoModelForSeq2SeqLM.from_pretrained(gen_model_name)
# gen_model.to(device)
# gen_model.eval()

# # Function to generate counter-speech
# def generate_counter_speech(hs_text, top_cs_list, max_new_tokens=75):
#     context = " ".join([f"{cs}" for cs in top_cs_list])
#     prompt = f"Generate a respectful response that disagrees with the following hate speech:\n\"{hs_text}\"\nHere are some positive counterpoints:\n{context}\nResponse:"

#     inputs = gen_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True, max_length=512).to(device)
#     output = gen_model.generate(
#         **inputs,
#         max_new_tokens=max_new_tokens,
#         do_sample=True,
#         top_p=0.9,
#         no_repeat_ngram_size=2
#     )
#     return gen_tokenizer.decode(output[0], skip_special_tokens=True)



In [None]:
# from transformers import AutoTokenizer, AutoModelForCausalLM
# import torch

# # Load tokenizer and model
# tokenizer = AutoTokenizer.from_pretrained("VinkuraAI/KunoRZN-Llama-3-3B")
# model = AutoModelForCausalLM.from_pretrained(
#     "VinkuraAI/KunoRZN-Llama-3-3B",
#     torch_dtype=torch.float16,
#     device_map="auto"
#     # attn_implementation="flash_attention_2"
# )

# # Sample input
# hs_text = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
# top_cs = [
#     "ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്.",
#     "അവർക്കും മനുഷ്യത്വം ഉണ്ട്.",
#     "പ്രീതി ഒരു കുറ്റമല്ല."
# ]

# # Construct chat-style message
# context = " ".join(top_cs)
# messages = [
#     {"role": "system", "content": "You are a respectful assistant that generates counter-speech against harmful or hateful messages, especially in Malayalam and Manglish."},
#     {"role": "user", "content": f"Hate speech: {hs_text}\nCounterpoints: {context}\nPlease respond with a calm and respectful counter-speech."}
# ]

# # Tokenize and generate
# input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to(model.device)
# output_ids = model.generate(input_ids, max_new_tokens=150, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)

# # Decode and print response
# response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
# print("Generated Counter-Speech:\n", response)


In [None]:
# from transformers import AutoTokenizer, AutoModelForCausalLM
# import torch


# # Load tokenizer and model
# model_name = "NousResearch/Meta-Llama-3-8B-Instruct"
# tokenizer = AutoTokenizer.from_pretrained(model_name)
# model = AutoModelForCausalLM.from_pretrained(
#     model_name,
#     torch_dtype=torch.float16,
#     device_map="auto"
# )
# model.eval()

# # Hate speech input and top-k retrieved CS (from Section 5)
# hs_input = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
# top_cs = [
#     "ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്.",
#     "അവർക്കും മനുഷ്യത്വം ഉണ്ട്.",
#     "പ്രീതി ഒരു കുറ്റമല്ല."
# ]

# # Format into a chat message
# context = " ".join(top_cs)
# system_message = "You are a helpful assistant that generates respectful, multilingual counter-speech against hate or discrimination."
# user_message = f"Hate speech: \"{hs_input}\"\nSupporting counterpoints: {context}\nNow respond with a respectful, fluent counter-speech."

# # Tokenize and generate
# messages = [
#     {"role": "system", "content": system_message},
#     {"role": "user", "content": user_message}
# ]

# input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to(model.device)
# output_ids = model.generate(
#     input_ids,
#     max_new_tokens=200,
#     temperature=0.7,
#     top_p=0.95,
#     repetition_penalty=1.1,
#     do_sample=True,
#     eos_token_id=tokenizer.eos_token_id
# )

# response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
# print("🗣️ Generated Counter-Speech:\n", response)


## Zephyr Model

In [None]:
# from transformers import AutoTokenizer, AutoModelForCausalLM
# import torch

# # Load Zephyr 7B model (lightweight and instruction-tuned)
# zephyr_model_name = "HuggingFaceH4/zephyr-7b-beta"
# zephyr_tokenizer = AutoTokenizer.from_pretrained(zephyr_model_name)
# zephyr_model = AutoModelForCausalLM.from_pretrained(
#     zephyr_model_name,
#     torch_dtype=torch.float16,
#     device_map="auto"
# )
# zephyr_model.eval()

# # Function to generate counter-speech using Zephyr
# def generate_counter_speech_zephyr(hs_text, top_cs_list, max_new_tokens=150):
#     context = " ".join(top_cs_list)

#     messages = [
#         {
#             "role": "system",
#             "content": "You are a respectful and multilingual assistant who writes calm, fluent counter-speech against hateful or discriminatory statements, especially in Malayalam and Manglish. Your tone should be empathetic, polite, and constructive."
#         },
#         {
#             "role": "user",
#             "content": f"""
# Hateful Statement (in Malayalam):
# \"{hs_text}\"

# Supporting Counterpoints:
# {context}

# Now write a fluent, respectful, polite, non-aggressive and informed response in Malayalam that opposes the hateful statement using the above counterpoints.
# """
#         }
#     ]

#     # Apply chat template and generate response
#     input_ids = zephyr_tokenizer.apply_chat_template(
#         messages,
#         tokenize=True,
#         add_generation_prompt=True,
#         return_tensors='pt'
#     ).to(zephyr_model.device)

#     output_ids = zephyr_model.generate(
#         input_ids,
#         max_new_tokens=max_new_tokens,
#         temperature=0.7,
#         top_p=0.95,
#         do_sample=True,
#         repetition_penalty=1.1,
#         eos_token_id=zephyr_tokenizer.eos_token_id
#     )


#     response = zephyr_tokenizer.decode(output_ids[0], skip_special_tokens=True)
#     return response


from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load Zephyr 7B model (lightweight and instruction-tuned)
zephyr_model_name = "HuggingFaceH4/zephyr-7b-beta"
zephyr_tokenizer = AutoTokenizer.from_pretrained(zephyr_model_name)
zephyr_model = AutoModelForCausalLM.from_pretrained(
    zephyr_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
zephyr_model.eval()

# Function to generate counter-speech using Zephyr
def generate_counter_speech_zephyr(hs_text, top_cs_list, max_new_tokens=150, use_manglish=True):
    context = " ".join(top_cs_list)

    # Language control: Malayalam or Manglish instruction
    language_instruction = (
        "Now write a respectful counter-speech in Manglish (Malayalam using English letters)."
        if use_manglish else
        "Now write a respectful counter-speech in Malayalam script."
    )

    messages = [
        {
            "role": "system",
            "content": (
                "You are a respectful and multilingual assistant who writes calm, fluent counter-speech "
                "against hateful or discriminatory statements, especially in Malayalam and Manglish. "
                "Your tone should be empathetic, polite, and constructive."
            )
        },
        {
            "role": "user",
            "content": f"""
Hateful Statement (Malayalam):
\"{hs_text}\"

Supporting Counterpoints:
{context}

Write a fluent, respectful, polite, non-aggressive and informed response that opposes the hateful statement using the above counterpoints. {language_instruction}

"""
        }
    ]

    # Tokenize using chat template
    inputs = zephyr_tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors='pt'
    ).to(zephyr_model.device)

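    # Single unpadded prompt: attend to every token. Comparing against
    # pad_token_id would also mask any prompt token that shares the pad id.
    attention_mask = torch.ones_like(inputs)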

    # Generate response with proper config
    output_ids = zephyr_model.generate(
        input_ids=inputs,
        attention_mask=attention_mask,
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.1,
        pad_token_id=zephyr_tokenizer.eos_token_id,
        eos_token_id=zephyr_tokenizer.eos_token_id
    )

    # Decode only the newly generated tokens, skipping the prompt
    generated_ids = output_ids[0][inputs.shape[-1]:]
    response = zephyr_tokenizer.decode(generated_ids, skip_special_tokens=True).strip()
    return response



KeyboardInterrupt: 

In [None]:
# Example HS and filtered CS from Section 5
hs_input_zephyr = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
top_cs_list_zephyr = [
    "ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്.",
    "അവർക്കും മനുഷ്യത്വം ഉണ്ട്.",
    "പ്രീതി ഒരു കുറ്റമല്ല."
]

# Generate the response
final_response_zephyr = generate_counter_speech_zephyr(hs_input_zephyr, top_cs_list_zephyr)
print("🗣️ Generated Counter-Speech:\n", final_response_zephyr)


## MBART

In [None]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import torch

mbart_model_name = "facebook/mbart-large-50-many-to-many-mmt"
mbart_tokenizer = MBart50TokenizerFast.from_pretrained(mbart_model_name)
mbart_model = MBartForConditionalGeneration.from_pretrained(mbart_model_name)
mbart_model.to("cuda" if torch.cuda.is_available() else "cpu")

# Source and target language are both Malayalam
mbart_tokenizer.src_lang = "ml_IN"
forced_bos_token_id = mbart_tokenizer.lang_code_to_id["ml_IN"]  # Target = Malayalam



In [None]:
def generate_mbart_counterspeech_malayalam(hs_text, top_cs_list, max_length=128):
    context = " ".join(top_cs_list)

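    # Instruction-style Malayalam prompt: the hateful statement, the supporting
    # points, then a request to write a polite reply opposing the statement
    prompt = (
        f"ദ്വേഷപരമായ പ്രസ്താവന: \"{hs_text}\"\n"
        f"പിന്തുണയുള്ള വാക്കുകള്‍: {context}\n"
        f"ഈ പ്രസ്താവനയ്ക്ക് എതിരായ മാന്യമായ മറുപടി എഴുതുക."
    )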

    inputs = mbart_tokenizer(prompt, return_tensors="pt").to(mbart_model.device)
    forced_bos_token_id = mbart_tokenizer.lang_code_to_id["ml_IN"]

    outputs = mbart_model.generate(
        **inputs,
        forced_bos_token_id=forced_bos_token_id,
        max_new_tokens=max_length,
        do_sample=True,
        temperature=0.85,
        top_p=0.95,
        repetition_penalty=1.1,
        no_repeat_ngram_size=2
    )

    return mbart_tokenizer.decode(outputs[0], skip_special_tokens=True).strip()


In [None]:
hs_input = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
top_cs_list = [
    "ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്.",
    "അവർക്കും മനുഷ്യത്വം ഉണ്ട്.",
    "പ്രീതി ഒരു കുറ്റമല്ല."
]

response = generate_mbart_counterspeech_malayalam(hs_input, top_cs_list)
print("🗣️ Generated Counter-Speech (Malayalam):\n", response)


🗣️ Generated Counter-Speech (Malayalam):
 വേഷികമായ പ്രസ്താവന: "അവര് ക്ക് വിവാഹം ചെയ്യാന് അവകാശമില്ല" Supporting Words: Everyone has the right to lead their lives.


## VinkuraAI/KunoRZN-Llama-3-3B


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load KunoRZN model and tokenizer
kunoRZN_model_id = "VinkuraAI/KunoRZN-Llama-3-3B"
kunoRZN_tokenizer = AutoTokenizer.from_pretrained(kunoRZN_model_id)
kunoRZN_model = AutoModelForCausalLM.from_pretrained(
    kunoRZN_model_id,
    torch_dtype=torch.float16,    # Use float16 for efficiency
    device_map="auto"             # Automatically map to GPU/CPU
)
kunoRZN_model.eval()


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 3072, padding_idx=128004)
    (layers): ModuleList(
      (0-27): 28 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (k_proj): Linear(in_features=3072, out_features=1024, bias=False)
          (v_proj): Linear(in_features=3072, out_features=1024, bias=False)
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=3072, out_features=8192, bias=False)
          (up_proj): Linear(in_features=3072, out_features=8192, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((3072,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((3072,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((3072,), eps=1e

In [None]:
def generate_counterspeech_kuno(hs_text, top_cs_list, max_new_tokens=250):
    context = " ".join(top_cs_list)

    # Updated prompt with clear instruction on tone
    messages = [
        {
            "role": "system",
            "content": (
                "You are a knowledgeable and respectful assistant. Your job is to respond to hateful or "
                "discriminatory statements with calm, polite, well-informed counter-speech in Malayalam. "
                "Always use a non-aggressive and empathetic tone."
            )
        },
        {
            "role": "user",
            "content": f"""
ദ്വേഷപരമായ പ്രസ്താവന:
\"{hs_text}\"

പിന്തുണയുള്ള വാക്കുകൾ:
{context}

ഈ പ്രസ്താവനയ്ക്ക് എതിരായ ശാന്തമായ, മാന്യമായ, അക്രമരഹിതമായ മറുപടി മലയാളത്തിൽ എഴുതുക. വസ്തുനിഷ്ഠമായതും വിവരസമൃദ്ധമായതുമാകണം.
"""
        }
    ]

    input_ids = kunoRZN_tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(kunoRZN_model.device)

    output_ids = kunoRZN_model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.1,
        eos_token_id=kunoRZN_tokenizer.eos_token_id
    )

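    # Decode only the newly generated tokens. With skip_special_tokens=True the
    # role markers are stripped, so splitting the full text on "<|assistant|>"
    # never matches and the prompt would be returned along with the reply.
    new_tokens = output_ids[0][input_ids.shape[-1]:]
    response = kunoRZN_tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    return response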



In [None]:
def generate_counterspeech_kuno1(hs_text, top_cs_list, max_new_tokens=250):
    context = " ".join(top_cs_list)

    # Clear and culturally aligned instruction
    messages = [
        {
            "role": "system",
            "content": (
                "നീ കരുതലോടെയും മാന്യതയോടെയും സംസാരിക്കുന്ന മലയാള സഹായിയാണ്. "
                "വിദ്വേഷപ്രകാരമായ പ്രസ്താവനകൾക്ക് നമ്രതയും ബുദ്ധിമുട്ടുകളെ മനസ്സിലാക്കുന്ന മനോഭാവവുമുള്ള മറുപടികളാണ് നീ നൽകേണ്ടത്. "
                "മറുപടി ശാന്തമായതും മാന്യമായതും ആയിരിക്കണം."
            )
        },
        {
            "role": "user",
            "content": f"""
വിദ്വേഷ പ്രസ്താവനം:
\"{hs_text}\"

സഹായകമായ ആശയങ്ങൾ:
{context}

ഈ പ്രസ്താവനയ്ക്ക് എതിരായ, വിവരസമൃദ്ധമായ, ശാന്തമായ മലയാള മറുപടി എഴുതുക. മറുപടി വ്യക്തതയോടെയും മാന്യതയോടെയും തീർച്ചയായിരിക്കണം.
"""
        }
    ]

    # Tokenize and generate
    input_ids = kunoRZN_tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(kunoRZN_model.device)

    output_ids = kunoRZN_model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.1,
        eos_token_id=kunoRZN_tokenizer.eos_token_id
    )

    # Extract only the assistant's output by slicing off the prompt tokens
    # before decoding. String markers such as "<|assistant|>" are removed by
    # skip_special_tokens=True, so they cannot be used to split the text.
    new_tokens = output_ids[0][input_ids.shape[-1]:]
    response = kunoRZN_tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    return response



In [None]:
hs_input = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
top_cs_list = [
    "ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്.",
    "അവർക്കും മനുഷ്യത്വം ഉണ്ട്.",
    "പ്രീതി ഒരു കുറ്റമല്ല."
]

output = generate_counterspeech_kuno1(hs_input, top_cs_list)
print("🗣️ Generated Counter-Speech:\n", output)


🗣️ Generated Counter-Speech:
 നീ一个 കരുതലോടെയും മാന്യതയോടെയും സംസാരിക്കുന്ന മലയാള സഹായിയാണ്. വിദ്വേഷപ്രകാരമായ പ്രസ്താവനകൾക്ക് നമ്രതയും ബുദ്ധിമുട്ടുകളെ മനസ്സിലാക്കുന്ന മനോഭാവവുമുള്ള മറുപടികളാണ് നീ നൽകേണ്ടത്. മറുപടി ശാന്തമായതും മാന്യമായതും ആയിരിക്കണം.user

വിദ്വേഷ പ്രസ്താവനം:
"അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"

സഹായകമായ ആശയങ്ങൾ:
ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്. അവർക്കും മനുഷ്യത്വം ഉണ്ട്. പ്രീതി ഒരു കുറ്റമല്ല.

ഈ പ്രസ്താവനയ്ക്ക് എതിരായ, വിവരസമൃദ്ധമായ, ശാന്തമായ മലയാള മറുപടി എഴുതുക. മറുപടി വ്യക്തതയോടെയും മാന്യതയോടെയും തീർച്ചയായിരിക്കണം.assistant

അവിടെയുള്ള പ്രസ്താവനയെ മനസ്സിലാക്കി മറുപടി അനുവദിക്കുന്നുണ്ട്. 

"ഞങ്ങളുടെ ജീവിതത്തിൽ വിവാഹം നടത്തുന്നത് മനുഷ്യത്വത്തിന്റെ അടിസ്ഥാനത്തിൽ ആണ്. അവർക്കും തങ്ങള


## OpenChat

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load OpenChat model
open_model_name = "OpenChat/openchat-3.5-1210"
open_tokenizer = AutoTokenizer.from_pretrained(open_model_name)
open_model = AutoModelForCausalLM.from_pretrained(
    open_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
open_model.eval()

# Function to generate counter-speech using OpenChat
def generate_counter_speech_openchat(hs_text, top_cs_list, max_new_tokens=150):
    context = " ".join(top_cs_list)

    messages = [
        {
            "role": "system",
            "content": "You are a polite, multilingual assistant that writes calm and respectful counter-speech in response to hateful or discriminatory statements. Use Malayalam or Manglish appropriately."
        },
        {
            "role": "user",
            "content": f"""
Hateful Statement:
\"{hs_text}\"

Supporting Counterpoints:
{context}

Now write a respectful and fluent counter-speech that responds to the hateful statement using the counterpoints.
"""
        }
    ]

    # Apply chat template and generate
    input_ids = open_tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors='pt'
    ).to(open_model.device)

    output_ids = open_model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.1,
        eos_token_id=open_tokenizer.eos_token_id
    )

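    # Decode only the tokens generated after the prompt so the system and user turns are not echoed
    new_tokens = output_ids[0][input_ids.shape[-1]:]
    response = open_tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    return response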





In [None]:
hs_input = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
top_cs = [
    "ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്.",
    "അവർക്കും മനുഷ്യത്വം ഉണ്ട്.",
    "പ്രീതി ഒരു കുറ്റമല്ല."
]

response = generate_counter_speech_openchat(hs_input, top_cs)
print("🗣️ Generated Counter-Speech:\n", response)


🗣️ Generated Counter-Speech:
 GPT4 Correct System: You are a polite, multilingual assistant that writes calm and respectful counter-speech in response to hateful or discriminatory statements. Use Malayalam or Manglish appropriately. GPT4 Correct User: 
Hateful Statement:
"അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"

Supporting Counterpoints:
ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്. അവർക്കും മനുഷ്യത്വം ഉണ്ട്. പ്രീതി ഒരു കുറ്റമല്ല.

Now write a respectful and fluent counter-speech that responds to the hateful statement using the counterpoints.
 GPT4 Correct Assistant: സന്തോഷത്തിന്റെ അവധികമായ പങ്കാടിയുടെ അഭിപ്രായവും എഴുപത്‌ചേര്‍ന്നതി


In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

gen_tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
gen_model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
gen_model.eval()
gen_model.to("cuda" if torch.cuda.is_available() else "cpu")


## Evaluation Metrics

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Relevance (semantic similarity between HS and generated CS)
def compute_relevance(hs_text, generated_cs):
    hs_vec = embedder.encode([hs_text])
    gen_vec = embedder.encode([generated_cs])
    return cosine_similarity(hs_vec, gen_vec)[0][0]

# Stance score (SROC) from the fine-tuned stance classifier; also returns P(COUNTER)
def predict_stance(text):
    inputs = stance_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(stance_model.device)
    with torch.no_grad():
        logits = stance_model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    label = torch.argmax(probs, dim=-1).item()
    return "COUNTER" if label == 1 else "HATE", probs[0][1].item()


# Fluency (perplexity under the KunoRZN causal LM; lower is better)
def compute_fluency(text):
    inputs = kunoRZN_tokenizer(text, return_tensors="pt", truncation=True).to(kunoRZN_model.device)
    with torch.no_grad():
        outputs = kunoRZN_model(input_ids=inputs["input_ids"], labels=inputs["input_ids"])
        loss = outputs.loss
    return np.exp(loss.item())
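
For a causal language model, the `loss` returned when `labels` are passed is the mean cross-entropy per predicted token, so `np.exp(loss)` is the perplexity of `text` under that model. The toy below works through that arithmetic by hand. `perplexity_from_nll` and the sample numbers are illustrative assumptions and are not part of the notebook.

```python
import math

def perplexity_from_nll(token_nll):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_nll:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nll) / len(token_nll))

# A model that assigns probability 1/4 to every token has NLL = ln 4 per token,
# so its perplexity is 4 regardless of sequence length.
uniform_ppl = perplexity_from_nll([math.log(4)] * 6)

# More confident predictions (higher token probabilities) give a lower perplexity.
confident_ppl = perplexity_from_nll([-math.log(p) for p in (0.9, 0.8, 0.95)])

print(uniform_ppl, confident_ppl)
```

A perplexity close to 1 on text that the same model has just generated mostly shows that the model finds its own output predictable, so it is probably a weak fluency signal on its own.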



In [None]:
hs_input = "അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"
generated_cs = output
cleaned_cs = generated_cs.strip()
if "system" in cleaned_cs.lower():
    cleaned_cs = cleaned_cs.split("user")[-1].strip()
print(cleaned_cs)

# print("🔹 Relevance:", compute_relevance(hs_input, generated_cs))
# stance_label, stance_score = predict_stance(generated_cs)
# print("🔹 Stance:", stance_label, "(confidence =", stance_score, ")")
# print("🔹 Fluency (PPL):", compute_fluency(generated_cs))



ദ്വേഷപരമായ പ്രസ്താവന:
"അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"

പിന്തുണയുള്ള വാക്കുകൾ:
ഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്. അവർക്കും മനുഷ്യത്വം ഉണ്ട്. പ്രീതി ഒരു കുറ്റമല്ല.

ഈ പ്രസ്താവനയ്ക്ക് എതിരായ ശാന്തമായ, മാന്യമായ, അക്രമരഹിതമായ മറുപടി മലയാളത്തിൽ എഴുതുക. വസ്തുനിഷ്ഠമായതും വിവരസമൃദ്ധമായതുമാകണം.assistant

ഞാൺ കരുതുക്കുന്നുവെ, എല്ലാ വ്യക്തിയും അവരുടെ ജീവിതം നയിക്കാൻ അവകാശമുണ്ട്. അവർക്കും സ്വന്തമായ ഭാവനകൾ, അഭിപ്രായങ്ങൾ, പ്രതിജ്ഞകൾ ഉണ്ടായിരിക്കുകയും അവ അവരു
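
The text printed above still contains the user prompt: splitting on `user` only drops the system message, and the actual reply begins after the trailing `assistant` marker. A minimal sketch of a stricter string-level cleaner follows, assuming the decoded text has the `system ... user ... assistant ...` layout seen above. `extract_assistant_reply` is a hypothetical helper, and it would mis-split a reply that itself contains the word `assistant`.

```python
def extract_assistant_reply(decoded: str, marker: str = "assistant") -> str:
    """Return the text after the last role marker, or the whole text if none is found."""
    head, sep, tail = decoded.rpartition(marker)
    # rpartition returns ("", "", decoded) when the marker is absent
    return tail.strip() if sep else decoded.strip()

# Hypothetical decoded text in the same layout as the notebook output
sample = (
    "system\n\nBe polite.user\n\n"
    "Hateful statement: ...\n\nWrite a calm reply.assistant\n\n"
    "Everyone has the right to live their own life."
)
reply = extract_assistant_reply(sample)
no_marker = extract_assistant_reply("  plain text  ")
print(reply)
```

Slicing the generated token ids at the prompt length before decoding avoids the problem at the source; the string version is mainly useful for text that has already been decoded.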


In [None]:
def evaluate_generated_output(hs_text, generated_cs):
    rel = compute_relevance(hs_text, generated_cs)
    stance, sroc = predict_stance(generated_cs)
    fluency = compute_fluency(generated_cs)

    return {
        "HS": hs_text,
        "Generated_CS": generated_cs,
        "Relevance": rel,
        "SROC_Prob": sroc,
        "SROC_Label": stance,
        "Fluency": fluency
    }
metrics = evaluate_generated_output(hs_input, output)
print(metrics)

{'HS': 'അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല', 'Generated_CS': 'system\n\nYou are a knowledgeable and respectful assistant. Your job is to respond to hateful or discriminatory statements with calm, polite, well-informed counter-speech in Malayalam. Always use a non-aggressive and empathetic tone.user\n\nദ്വേഷപരമായ പ്രസ്താവന:\n"അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല"\n\nപിന്തുണയുള്ള വാക്കുകൾ:\nഓരോർക്കും തങ്ങളുടെ ജീവിതം നയിക്കാൻ അവകാശം ഉണ്ട്. അവർക്കും മനുഷ്യത്വം ഉണ്ട്. പ്രീതി ഒരു കുറ്റമല്ല.\n\nഈ പ്രസ്താവനയ്ക്ക് എതിരായ ശാന്തമായ, മാന്യമായ, അക്രമരഹിതമായ മറുപടി മലയാളത്തിൽ എഴുതുക. വസ്തുനിഷ്ഠമായതും വിവരസമൃദ്ധമായതുമാകണം.assistant\n\nഞാൺ കരുതുക്കുന്നുവെ, എല്ലാ വ്യക്തിയും അവരുടെ ജീവിതം നയിക്കാൻ അവകാശമുണ്ട്. അവർക്കും സ്വന്തമായ ഭാവനകൾ, അഭിപ്രായങ്ങൾ, പ്രതിജ്ഞകൾ ഉണ്ടായിരിക്കുകയും അവ അവരു', 'Relevance': np.float32(0.35689977), 'SROC_Prob': 0.9999895095825195, 'SROC_Label': 'COUNTER', 'Fluency': np.float64(2.081253377018097)}
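
`SROC_Prob` above is the softmax probability of class 1 (COUNTER) computed from the stance classifier's two logits. The scored `Generated_CS` includes the echoed system and user prompt, so the 0.99999 describes the whole string rather than the reply alone. The sketch below shows the logits-to-probability step in plain Python; the logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ("HATE", "COUNTER")  # index 0 and 1, matching predict_stance

def stance_from_logits(logits):
    """Return (label, P(COUNTER)) for a pair [hate_logit, counter_logit]."""
    probs = softmax(logits)
    label = LABELS[probs.index(max(probs))]
    return label, probs[1]

label, p_counter = stance_from_logits([-5.0, 6.5])   # hypothetical logits
hate_label, p_counter_low = stance_from_logits([2.0, -1.0])
print(label, p_counter)
```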


In [None]:
# Re-import required libraries after kernel reset
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

# Wrap single strings in a list so the loop below can zip over them;
# inputs that are already lists are used as-is (otherwise the names would be undefined).
hs_examples = [hs_input] if isinstance(hs_input, str) else hs_input
cs_generated = [output] if isinstance(output, str) else output
training_cs = [top_cs_list] if isinstance(top_cs_list, str) else top_cs_list


# Load LaBSE for embedding-based relevance and novelty
embedder = SentenceTransformer("sentence-transformers/LaBSE")

def compute_relevance(hs_text, generated_cs):
    hs_vec = embedder.encode([hs_text])
    gen_vec = embedder.encode([generated_cs])
    return cosine_similarity(hs_vec, gen_vec)[0][0]

def compute_novelty(generated_cs, all_train_cs):
    gen_vec = embedder.encode([generated_cs])
    train_vecs = embedder.encode(all_train_cs)
    sims = cosine_similarity(gen_vec, train_vecs)[0]
    return 1 - np.max(sims)  # Higher = more novel

# Prepare results
results = []
for hs, cs in zip(hs_examples, cs_generated):
    relevance = compute_relevance(hs, cs)
    novelty = compute_novelty(cs, training_cs)
    results.append({
        "hate_speech": hs,
        "generated_cs": cs,
        "relevance": relevance,
        "novelty": novelty
    })

df_eval = pd.DataFrame(results)
print(df_eval)



                              hate_speech  \
0  അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല   

                                        generated_cs  relevance   novelty  
0  നീ一个 കരുതലോടെയും മാന്യതയോടെയും സംസാരിക്കുന്ന മ...    0.34249  0.587758  
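
The novelty score is one minus the highest cosine similarity between the generated text and any reference counter-speech: 0 means the text coincides in embedding space with some reference, and larger values mean it is further from all of them. In this cell the references are the three counterpoints in `top_cs_list`, not the full training set. The toy below uses hand-written 2-D vectors in place of LaBSE embeddings; the vectors and helper names are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def novelty(gen_vec, ref_vecs):
    """1 - max cosine similarity to any reference (higher = more novel)."""
    return 1.0 - max(cosine(gen_vec, r) for r in ref_vecs)

refs = [[1.0, 0.0], [0.0, 1.0]]
copy_score = novelty([2.0, 0.0], refs)   # same direction as refs[0]
diag_score = novelty([1.0, 1.0], refs)   # 45 degrees from both references
print(copy_score, diag_score)
```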


In [None]:
import pandas as pd
import numpy as np
import torch
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/LaBSE")


# Wrap single strings in a list; keep lists as they are
hs_examples = [hs_input] if isinstance(hs_input, str) else hs_input
cs_generated = [cleaned_cs] if isinstance(cleaned_cs, str) else cleaned_cs
training_cs = [top_cs_list] if isinstance(top_cs_list, str) else top_cs_list

#  Metric Functions
def compute_relevance(hs_text, generated_cs):
    hs_vec = embedder.encode([hs_text])
    gen_vec = embedder.encode([generated_cs])
    return cosine_similarity(hs_vec, gen_vec)[0][0]

def compute_novelty(generated_cs, all_train_cs):
    gen_vec = embedder.encode([generated_cs])
    train_vecs = embedder.encode(all_train_cs)
    sims = cosine_similarity(gen_vec, train_vecs)[0]
    return 1 - np.max(sims)

def predict_stance(text):
    inputs = stance_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(stance_model.device)
    with torch.no_grad():
        logits = stance_model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    label = torch.argmax(probs, dim=-1).item()
    return "COUNTER" if label == 1 else "HATE", probs[0][1].item()

def compute_fluency(text):
    inputs = kunoRZN_tokenizer(text, return_tensors="pt", truncation=True).to(kunoRZN_model.device)
    with torch.no_grad():
        outputs = kunoRZN_model(input_ids=inputs["input_ids"], labels=inputs["input_ids"])
        loss = outputs.loss
    return np.exp(loss.item())

# ✅ Evaluation Loop
results = []
for hs, cs in zip(hs_examples, cs_generated):
    relevance = compute_relevance(hs, cs)
    novelty = compute_novelty(cs, training_cs)
    stance_label, stance_conf = predict_stance(cs)
    fluency = compute_fluency(cs)

    results.append({
        "hate_speech": hs,
        "generated_cs": cs,
        "relevance": relevance,
        "novelty": novelty,
        "stance": stance_label,
        "stance_confidence": stance_conf,
        "fluency_ppl": fluency
    })

df_eval = pd.DataFrame(results)
# print(df_eval)
df_eval.head()

|   | hate_speech | generated_cs | relevance | novelty | stance | stance_confidence | fluency_ppl |
|---|---|---|---|---|---|---|---|
| 0 | അവർക്ക് വിവാഹം കഴിക്കാനുള്ള അവകാശമില്ല | ദ്വേഷപരമായ പ്രസ്താവന:\n"അവർക്ക് വിവാഹം കഴിക്കാ... | 0.535429 | 0.443676 | COUNTER | 0.99999 | 1.785187 |


In [None]:
!zip -r stance_classifier_xlmroberta.zip stance_classifier_xlmroberta


  adding: stance_classifier_xlmroberta/ (stored 0%)
  adding: stance_classifier_xlmroberta/sentencepiece.bpe.model (deflated 49%)
  adding: stance_classifier_xlmroberta/model.safetensors (deflated 29%)
  adding: stance_classifier_xlmroberta/tokenizer_config.json (deflated 76%)
  adding: stance_classifier_xlmroberta/config.json (deflated 51%)
  adding: stance_classifier_xlmroberta/special_tokens_map.json (deflated 52%)


In [None]:
from google.colab import files
files.download("stance_classifier_xlmroberta.zip")


In [None]:
# ReZG Logic Chatbot (Multiple Interactions)

import torch
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import (
    XLMRobertaTokenizer, XLMRobertaForSequenceClassification,
    AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM
)
from sklearn.metrics.pairwise import cosine_similarity
import math
import faiss

# Load data and models
file_path = "GPT_Combined_Original_and_Generated_HT_CS_Dataset - Combined_Original_and_Generated_HT_CS_Dataset.csv.csv"
df = pd.read_csv(file_path)
cs_list = df["CS"].astype(str).tolist()

embedder = SentenceTransformer("sentence-transformers/LaBSE")
cs_embeddings = embedder.encode(cs_list, convert_to_numpy=True, show_progress_bar=True)
index = faiss.IndexFlatL2(cs_embeddings.shape[1])
index.add(cs_embeddings)

model_path = "stance_classifier_xlmroberta"
stance_tokenizer = XLMRobertaTokenizer.from_pretrained(model_path)
stance_model = XLMRobertaForSequenceClassification.from_pretrained(model_path)
stance_model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

gen_tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
gen_model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
gen_model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

chat_tokenizer = AutoTokenizer.from_pretrained("OpenChat/openchat-3.5-1210")
chat_model = AutoModelForCausalLM.from_pretrained(
    "OpenChat/openchat-3.5-1210",
    torch_dtype=torch.float16,
    device_map="auto"
)
chat_model.eval()

def retrieve_topk_semantic(hs_text, top_k=10):
    hs_embedding = embedder.encode([hs_text], convert_to_numpy=True)
    distances, indices = index.search(hs_embedding, top_k)
    return [cs_list[i] for i in indices[0]]

def get_cls_embedding(text):
    inputs = stance_tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(stance_model.device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = stance_model.base_model(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze().cpu().numpy()

def rank_by_stance(hs_text, candidates, top_k=5):
    hs_vec = get_cls_embedding(hs_text)
    cs_vecs = [get_cls_embedding(cs) for cs in candidates]
    sims = cosine_similarity([hs_vec], cs_vecs)[0]
    ranked = [x for _, x in sorted(zip(sims, candidates), key=lambda x: x[0])][:top_k]  # most opposing
    return ranked

def compute_perplexity(prompt):
    encodings = gen_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True, max_length=128)
    input_ids = encodings.input_ids.to(gen_model.device)
    with torch.no_grad():
        outputs = gen_model(input_ids=input_ids, labels=input_ids)
        loss = outputs.loss
    return math.exp(loss.item()) if loss is not None else float("inf")

def rank_by_fitness(hs_text, cs_candidates, top_k=3):
    scored = [(cs, compute_perplexity(f"{hs_text} അതിനെ എതിർക്കുന്നു: {cs}")) for cs in cs_candidates]
    return [x for x, _ in sorted(scored, key=lambda x: x[1])[:top_k]]

def generate_counter_speech(hs_text, top_cs_list):
    context = " ".join(top_cs_list)
    messages = [
        {"role": "system", "content": "You are a polite, multilingual assistant that writes calm and respectful counter-speech in response to hateful or discriminatory statements. Use Malayalam or Manglish appropriately."},
        {"role": "user", "content": f"""
Hateful Statement:
\"{hs_text}\"

Supporting Counterpoints:
{context}

Now write a respectful and fluent counter-speech that responds to the hateful statement using the counterpoints.
"""}]
    input_ids = chat_tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to(chat_model.device)
    output_ids = chat_model.generate(input_ids, max_new_tokens=150, temperature=0.7, top_p=0.95, do_sample=True, repetition_penalty=1.1, eos_token_id=chat_tokenizer.eos_token_id)
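    # Decode only the tokens generated after the prompt so the system and user turns are not echoed
    return chat_tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()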

# === Multi-turn chatbot ===
print("\n🔁 ReZG Counter-Speech Chatbot (type 'quit' to exit)")

while True:
    hs_input = input("\n👤 Enter a hate statement in Malayalam:\n> ").strip()
    if hs_input.lower() == "quit":
        print("👋 Goodbye!")
        break

    try:
        sem_candidates = retrieve_topk_semantic(hs_input)
        stance_ranked = rank_by_stance(hs_input, sem_candidates)
        fitness_top = rank_by_fitness(hs_input, stance_ranked)
        cn = generate_counter_speech(hs_input, fitness_top)

        print("\n🤖 Counter-Speech Response:\n")
        print(cn)
    except Exception as e:
        print(f"⚠️ Error: {e}")


OSError: stance_classifier_xlmroberta is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
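
The `OSError` above means `from_pretrained` found no local folder named `stance_classifier_xlmroberta` in the working directory and could not resolve the name as a Hub model id either. This typically happens in a fresh runtime where the folder saved during training is no longer present. A minimal sketch of a pre-flight check follows, assuming the archive created earlier with `zip -r` has been uploaded next to the notebook. `ensure_model_dir` is a hypothetical helper, not part of transformers.

```python
import os
import zipfile

def ensure_model_dir(model_dir="stance_classifier_xlmroberta", archive=None):
    """Return model_dir if it exists locally, unpacking <model_dir>.zip first if needed."""
    if os.path.isdir(model_dir):
        return model_dir
    archive = archive or model_dir.rstrip("/\\") + ".zip"
    if os.path.isfile(archive):
        # `zip -r name.zip name` stores entries under "name/", so extracting
        # into the parent directory recreates the folder in place.
        parent = os.path.dirname(os.path.abspath(model_dir))
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(parent)
    if not os.path.isdir(model_dir):
        raise FileNotFoundError(
            f"{model_dir!r} not found and could not be restored from {archive!r}"
        )
    return model_dir

# Intended use before loading (left as a comment because it needs the real files):
# model_path = ensure_model_dir("stance_classifier_xlmroberta")
# stance_model = XLMRobertaForSequenceClassification.from_pretrained(model_path)
```

Failing early with a `FileNotFoundError` keeps the cause visible before the larger generation models are loaded.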

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

# !cp -r stance_classifier_xlmroberta /content/drive/MyDrive/

MessageError: Error: credential propagation was unsuccessful