# SUBTASK 2 : Polarization Type Classification

Overview: This notebook classifies the type (target) of polarization expressed in social media texts. Each text can be assigned one or more of the following categories:
1. Political/ideological polarization
2. Racial or ethnic polarization
3. Religious polarization
4. Gender polarization
5. Other

### Experiment 1: Robust Baseline
- Establishes a stable training pipeline by fixing critical data issues (NaN handling, a custom 80/20 train/validation split).
- Implements Focal Loss with inverse-frequency class weights to keep the model from ignoring minority classes.

### Experiment 2: Proposal-Aligned ("Turbo")
- Implements the core "Learning by Contrast" novelty: Supervised Contrastive Learning (SCL) with a projection head, optimized jointly with the classifier (Loss = Focal + 0.5 * SCL).
- Applies Dynamic Thresholding to automatically select the best probability cutoff for each class.

### Experiment 3: Scale Comparison & Inference
- Benchmarks XLM-R Base against XLM-R Large (using gradient accumulation for memory efficiency).
- Includes a multilingual inference pipeline to visualize model confidence on real-world text.

# 1. EXPERIMENTATION 1 - INITIAL CODE TRAINING

## 1. Loading Necessary Libraries

In [2]:
# ============================================================================
# FINAL SUBTASK 2 CODE: PROPOSAL-ALIGNED + VALID SPLIT FIX
# ============================================================================

# 1. INSTALL LIBRARIES
!pip install transformers datasets scikit-learn pandas numpy torch accelerate -q

# 2. IMPORTS
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from transformers import XLMRobertaTokenizer, XLMRobertaModel, get_linear_schedule_with_warmup
from torch.optim import AdamW
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from tqdm.auto import tqdm
import warnings
import zipfile
import glob
import os

## 2. Configuration and Data Loading

In [3]:
warnings.filterwarnings('ignore')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# 3. LOAD DATA (ONLY TRAIN FOLDER)
from google.colab import files

if os.path.exists("dataset_extracted"):
    print("Data folder found. Skipping upload.")
else:
    print("Please upload your ZIP file containing the 'train' and 'dev' folders:")
    uploaded = files.upload()
    if len(uploaded) > 0:
        zip_filename = list(uploaded.keys())[0]
        print(f"Extracting {zip_filename}...")
        with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
            zip_ref.extractall("dataset_extracted")
    else:
        raise ValueError("No file uploaded.")

def load_folder_csvs(folder_path):
    all_files = glob.glob(os.path.join(folder_path, "**/*.csv"), recursive=True)
    if not all_files: return pd.DataFrame()
    df_list = []
    for f in all_files:
        try:
            df = pd.read_csv(f)
            df.columns = [c.lower() for c in df.columns]
            df_list.append(df)
        except Exception as e:
            # Report unreadable files instead of silently skipping them
            print(f"Skipping {f}: {e}")
    return pd.concat(df_list, axis=0, ignore_index=True) if df_list else pd.DataFrame()

print("\nLoading Training Data...")
# We ONLY load the train folder because the dev folder has no labels
train_dirs = glob.glob("dataset_extracted/**/train", recursive=True)
full_train_df = load_folder_csvs(train_dirs[0]) if train_dirs else load_folder_csvs("dataset_extracted")

print(f"Total labeled data loaded: {len(full_train_df)} rows")

Using device: cuda
Please upload your ZIP file containing the 'train' and 'dev' folders:


Saving polar_data.zip to polar_data.zip
Extracting polar_data.zip...

Loading Training Data...
Total labeled data loaded: 29987 rows


## 3. Data Split, Cleaning and Weight Allocation

In [6]:
# 4. PREPROCESS & SPLIT
POTENTIAL_LABELS = [
    'political', 'racial/ethnic', 'religious', 'gender/sexual', 'other',
    'stereotype', 'vilification', 'dehumanization',
    'extreme_language', 'lack_of_empathy', 'invalidation'
]

# Identify active labels
TYPE_COLUMNS = [c for c in POTENTIAL_LABELS if c in full_train_df.columns]
NUM_LABELS = len(TYPE_COLUMNS)
print(f"\nActive Labels ({NUM_LABELS}): {TYPE_COLUMNS}")

# Fill NaNs with 0
full_train_df[TYPE_COLUMNS] = full_train_df[TYPE_COLUMNS].fillna(0)

# Clean Text
def preprocess_text(text):
    if pd.isna(text): return ""
    return ' '.join([word for word in str(text).split() if not word.startswith('http')]).strip()

full_train_df['text_clean'] = full_train_df['text'].apply(preprocess_text)

# Filter for Polarization=1
if 'polarization' in full_train_df.columns:
    print("Filtering for polarization=1...")
    full_train_df = full_train_df[full_train_df['polarization'] == 1].reset_index(drop=True)

# *** THE FIX: CREATE OUR OWN TRAIN/DEV SPLIT ***
print("Splitting data into 80% Train and 20% Validation...")
train_df, dev_df = train_test_split(full_train_df, test_size=0.2, random_state=42)
train_df = train_df.reset_index(drop=True)
dev_df = dev_df.reset_index(drop=True)

print(f"Final Train shape: {train_df.shape}")
print(f"Final Dev shape:   {dev_df.shape}")

# 5. CLASS WEIGHTS
print("\nCalculating Class Weights...")
train_labels = train_df[TYPE_COLUMNS].values.astype(np.float32)
dev_labels = dev_df[TYPE_COLUMNS].values.astype(np.float32)

pos_weights = []
for i in range(NUM_LABELS):
    pos_c = train_labels[:, i].sum()
    neg_c = len(train_labels) - pos_c
    weight = (neg_c / pos_c) if pos_c > 0 else 1.0
    pos_weights.append(weight)

pos_weights = torch.tensor(pos_weights, dtype=torch.float32).to(device)

# 6. LOSS FUNCTIONS
class SupervisedContrastiveLoss(nn.Module):
    def __init__(self, temperature=0.07):
        super().__init__()
        self.temperature = temperature
    def forward(self, features, labels):
        features = F.normalize(features, dim=1)
        similarity_matrix = torch.matmul(features, features.T)
        labels_dot = torch.matmul(labels, labels.T)
        mask = (labels_dot > 0).float()
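        # Zero the diagonal so a sample is never contrasted with itself; build the index on the same device as the mask
        logits_mask = torch.scatter(torch.ones_like(mask), 1, torch.arange(mask.shape[0], device=mask.device).view(-1, 1), 0)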
        mask = mask * logits_mask
        exp_logits = torch.exp(similarity_matrix / self.temperature) * logits_mask
        log_prob = similarity_matrix / self.temperature - torch.log(exp_logits.sum(1, keepdim=True) + 1e-6)
        sum_mask = mask.sum(1)
        sum_mask[sum_mask == 0] = 1
        return -(mask * log_prob).sum(1) / sum_mask

class FocalLoss(nn.Module):
    def __init__(self, pos_weight, alpha=0.25, gamma=2.0):
        super().__init__()
        self.pos_weight = pos_weight
        self.alpha = alpha
        self.gamma = gamma
    def forward(self, inputs, targets):
        bce = F.binary_cross_entropy_with_logits(inputs, targets, pos_weight=self.pos_weight, reduction='none')
        pt = torch.exp(-bce)
        return (self.alpha * (1-pt)**self.gamma * bce).mean()



Active Labels (5): ['political', 'racial/ethnic', 'religious', 'gender/sexual', 'other']
Splitting data into 80% Train and 20% Validation...
Final Train shape: (23989, 8)
Final Dev shape:   (5998, 8)

Calculating Class Weights...
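
The weighting and focal-loss steps above can be traced by hand on a toy label matrix. The sketch below is plain Python rather than PyTorch, and the helper names `positive_class_weights` and `focal_bce` are hypothetical. It assumes the same negative-to-positive count ratio and the same `alpha * (1 - pt)**gamma * bce` form as the cell above, applied to a single logit.

```python
import math

def positive_class_weights(label_rows):
    """Return the neg/pos count ratio per label column (1.0 if a column has no positives)."""
    n_rows = len(label_rows)
    n_cols = len(label_rows[0])
    weights = []
    for col in range(n_cols):
        pos = sum(row[col] for row in label_rows)
        neg = n_rows - pos
        weights.append(neg / pos if pos > 0 else 1.0)
    return weights

def focal_bce(logit, target, pos_weight=1.0, alpha=0.25, gamma=2.0):
    """Focal-modulated binary cross-entropy for one logit/target pair."""
    p = 1.0 / (1.0 + math.exp(-logit))
    # Weighted BCE: pos_weight scales only the positive-target term.
    bce = -(pos_weight * target * math.log(p) + (1 - target) * math.log(1 - p))
    pt = math.exp(-bce)  # close to 1 when the example is already classified well
    return alpha * (1 - pt) ** gamma * bce

# Toy multi-hot labels: column 0 is common, column 1 is rare, column 2 never occurs.
toy_labels = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
weights = positive_class_weights(toy_labels)
easy = focal_bce(logit=4.0, target=1)    # confident and correct
hard = focal_bce(logit=-4.0, target=1)   # confident and wrong
```

Rare columns receive larger weights, and the `(1 - pt)**gamma` factor shrinks the loss of the easy example far more than that of the hard one.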


## 4. Model Architecture and Training

In [7]:

# 7. MODEL
class ContrastiveXLMR(nn.Module):
    def __init__(self, num_labels):
        super().__init__()
        self.roberta = XLMRobertaModel.from_pretrained('xlm-roberta-base')
        self.dropout = nn.Dropout(0.2)
        self.projection = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 128))
        self.classifier = nn.Sequential(
            nn.Linear(768, 1536), nn.LayerNorm(1536), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(1536, num_labels)
        )
    def forward(self, input_ids, attention_mask):
        out = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0, :]
        return self.classifier(self.dropout(pooled)), self.projection(pooled)

# 8. TRAINING
class PolarDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.texts = texts; self.labels = labels; self.tokenizer = tokenizer
    def __len__(self): return len(self.texts)
    def __getitem__(self, idx):
        enc = self.tokenizer(str(self.texts[idx]), padding='max_length', truncation=True, max_length=128, return_tensors='pt')
        return {'input_ids': enc['input_ids'][0], 'attention_mask': enc['attention_mask'][0], 'labels': torch.tensor(self.labels[idx])}

BATCH_SIZE = 16
EPOCHS = 8
tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
train_loader = DataLoader(PolarDataset(train_df['text_clean'].values, train_labels, tokenizer), batch_size=BATCH_SIZE, shuffle=True)
dev_loader = DataLoader(PolarDataset(dev_df['text_clean'].values, dev_labels, tokenizer), batch_size=BATCH_SIZE, shuffle=False)

model = ContrastiveXLMR(NUM_LABELS).to(device)
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
criterion_cls = FocalLoss(pos_weights)
criterion_scl = SupervisedContrastiveLoss(temperature=0.1)
scheduler = get_linear_schedule_with_warmup(optimizer, 0, len(train_loader)*EPOCHS)

print("\nStarting Training...")
best_f1 = 0
for epoch in range(EPOCHS):
    model.train()
    loss_sum = 0
    for batch in tqdm(train_loader, desc=f"Epoch {epoch+1}"):
        ids, mask, lbls = batch['input_ids'].to(device), batch['attention_mask'].to(device), batch['labels'].to(device)
        optimizer.zero_grad()
        logits, feats = model(ids, mask)
        loss = criterion_cls(logits, lbls) + (0.5 * criterion_scl(feats, lbls).mean())
        loss.backward()
        optimizer.step()
        scheduler.step()
        loss_sum += loss.item()

    model.eval()
    all_preds, all_lbls = [], []
    with torch.no_grad():
        for batch in dev_loader:
            ids, mask = batch['input_ids'].to(device), batch['attention_mask'].to(device)
            logits, _ = model(ids, mask)
            all_preds.append((torch.sigmoid(logits) > 0.4).float().cpu().numpy())
            all_lbls.append(batch['labels'].cpu().numpy())

    if len(all_preds) > 0:
        macro_f1 = f1_score(np.vstack(all_lbls), np.vstack(all_preds), average='macro', zero_division=0)
        print(f"Epoch {epoch+1} Loss: {loss_sum/len(train_loader):.4f} | Dev Macro F1: {macro_f1:.4f}")
        if macro_f1 > best_f1:
            best_f1 = macro_f1
            torch.save(model.state_dict(), 'best_model.pt')
            print("✓ Saved Best Model")

print(f"\nFinal Best Macro F1: {best_f1:.4f}")



Starting Training...


Epoch 1 Loss: 0.7004 | Dev Macro F1: 0.4733
✓ Saved Best Model
Epoch 2 Loss: 0.6211 | Dev Macro F1: 0.5283
✓ Saved Best Model
Epoch 3 Loss: 0.5938 | Dev Macro F1: 0.5112
Epoch 4 Loss: 0.5680 | Dev Macro F1: 0.5214
Epoch 5 Loss: 0.5421 | Dev Macro F1: 0.5864
✓ Saved Best Model
Epoch 6 Loss: 0.5200 | Dev Macro F1: 0.5672
Epoch 7 Loss: 0.5035 | Dev Macro F1: 0.5897
✓ Saved Best Model
Epoch 8 Loss: 0.4873 | Dev Macro F1: 0.5939
✓ Saved Best Model

Final Best Macro F1: 0.5939
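
In the training loop, `SupervisedContrastiveLoss` treats two samples as positives when their multi-hot label vectors share at least one active label (`labels_dot > 0`), and it excludes each sample's pairing with itself. A minimal plain-Python sketch of that pairing rule follows; the helper name `positive_pair_mask` and the toy batch are illustrative assumptions, not part of the notebook.

```python
def positive_pair_mask(label_rows):
    """mask[i][j] = 1 if samples i and j (i != j) share at least one active label."""
    n = len(label_rows)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a sample is never its own positive
            shared = sum(a * b for a, b in zip(label_rows[i], label_rows[j]))
            mask[i][j] = 1 if shared > 0 else 0
    return mask

# Toy batch using the first three label columns only.
batch_labels = [
    [1, 0, 0],  # political
    [1, 1, 0],  # political + racial/ethnic
    [0, 0, 1],  # religious
]
mask = positive_pair_mask(batch_labels)
```

The third sample has no positive partner in this batch. In the loss above, such a row has an all-zero mask, so its contrastive term evaluates to zero.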


# 2. EXPERIMENTATION 2: CODE WITH EXTENDED TRAINING EPOCHS, GRADIENT ACCUMULATION AND DYNAMIC THRESHOLDING

## 1. Model and Optimal Threshold Setup

In [8]:
# ============================================================================
# ENHANCED TRAINING: Gradient Accumulation + Dynamic Thresholds
# ============================================================================
from transformers import get_cosine_schedule_with_warmup

# 1. CONFIGURATION (UPDATED)
# ============================================================================
BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2  # Effective batch size = 32 per optimizer step (the SCL term still compares only the 16 samples in each mini-batch)
EPOCHS = 12           # Increased from 8 to 12
LR = 2e-5

# 2. MODEL & OPTIMIZER SETUP
# ============================================================================
model = ContrastiveXLMR(NUM_LABELS).to(device)
optimizer = AdamW(model.parameters(), lr=LR, weight_decay=0.01)

# Switch to Cosine Scheduler (Better for longer training)
total_steps = len(train_loader) * EPOCHS // GRAD_ACCUM_STEPS
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=int(0.1*total_steps), num_training_steps=total_steps)

# Loss Functions
criterion_cls = FocalLoss(pos_weights)
criterion_scl = SupervisedContrastiveLoss(temperature=0.1)

# 3. HELPER: FIND OPTIMAL THRESHOLDS
# ============================================================================
def optimize_thresholds(y_true, y_probs):
    """Finds the best F1 threshold for each class independently."""
    best_thresholds = []
    best_f1s = []

    # Iterate over each class column
    for i in range(y_true.shape[1]):
        best_t = 0.5
        best_f1 = 0.0

        # Scan thresholds from 0.20 in steps of 0.05 (np.arange excludes the 0.80 endpoint, barring float rounding)
        for t in np.arange(0.2, 0.8, 0.05):
            preds = (y_probs[:, i] > t).astype(int)
            score = f1_score(y_true[:, i], preds, zero_division=0)
            if score > best_f1:
                best_f1 = score
                best_t = t

        best_thresholds.append(best_t)
        best_f1s.append(best_f1)

    return best_thresholds, best_f1s
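
The per-class search in `optimize_thresholds` can be followed on a single toy class. The sketch below is plain Python with a hand-written F1 instead of `sklearn.metrics.f1_score`; the names `f1_binary` and `best_threshold` and the six made-up probabilities are assumptions for illustration only.

```python
def f1_binary(y_true, y_pred):
    """F1 for one class from 0/1 lists; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, probs, candidates):
    """Return (threshold, f1) for the first candidate that reaches the highest F1."""
    best_t, best_f1 = 0.5, 0.0
    for t in candidates:
        preds = [1 if p > t else 0 for p in probs]
        score = f1_binary(y_true, preds)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1

y_true = [1, 1, 0, 0, 1, 0]
probs = [0.90, 0.65, 0.60, 0.30, 0.35, 0.10]
candidates = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
threshold, score = best_threshold(y_true, probs, candidates)
```

Here a cutoff of 0.3 recovers all three positives at the cost of one false positive, which beats the default 0.5 for this toy class.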


## 2. Training Loop

In [10]:
# 4. TRAINING LOOP
# ============================================================================
print(f"\nStarting Enhanced Training (Epochs: {EPOCHS}, Eff. Batch: {BATCH_SIZE*GRAD_ACCUM_STEPS})...")
best_macro_f1 = 0
best_thresholds = [0.5] * NUM_LABELS # Default

for epoch in range(EPOCHS):
    model.train()
    loss_sum = 0
    optimizer.zero_grad()

    # --- TRAINING PHASE ---
    for step, batch in enumerate(tqdm(train_loader, desc=f"Epoch {epoch+1}")):
        ids = batch['input_ids'].to(device)
        mask = batch['attention_mask'].to(device)
        lbls = batch['labels'].to(device)

        # Forward
        logits, feats = model(ids, mask)

        # Combined Loss
        loss = criterion_cls(logits, lbls) + (0.5 * criterion_scl(feats, lbls).mean())

        # Normalize loss for accumulation
        loss = loss / GRAD_ACCUM_STEPS
        loss.backward()

        # Step only after accumulation
        if (step + 1) % GRAD_ACCUM_STEPS == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()

        loss_sum += loss.item() * GRAD_ACCUM_STEPS



Starting Enhanced Training (Epochs: 12, Eff. Batch: 32)...
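
The scheduler length follows from the configuration: 23989 training rows at batch size 16 give 1500 batches per epoch, and two batches are accumulated per optimizer step. The sketch below reproduces that arithmetic and the general shape of a cosine schedule with linear warmup. It is a hand-written approximation for illustration, not the `transformers` implementation, and the function names are hypothetical.

```python
import math

def optimizer_steps(batches_per_epoch, epochs, accum_steps):
    """Scheduler length, computed the same way as `total_steps` in the configuration cell."""
    return batches_per_epoch * epochs // accum_steps

def cosine_with_warmup(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear warmup to 1.0, then half-cosine decay to 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

total = optimizer_steps(batches_per_epoch=1500, epochs=12, accum_steps=2)
warmup = int(0.1 * total)

peak = cosine_with_warmup(warmup, warmup, total)                      # end of warmup
midpoint = cosine_with_warmup((warmup + total) // 2, warmup, total)   # halfway through decay
final = cosine_with_warmup(total, warmup, total)                      # last step
```

Under these assumptions the run takes 9000 optimizer steps, of which the first 900 are warmup.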



## 3. Evaluation Phase

In [11]:


    # --- EVALUATION PHASE (indented to run inside the `for epoch` loop above) ---
    model.eval()
    all_probs, all_lbls = [], []

    with torch.no_grad():
        for batch in dev_loader:
            ids = batch['input_ids'].to(device)
            mask = batch['attention_mask'].to(device)

            logits, _ = model(ids, mask)
            # Store probabilities (0.0 to 1.0), NOT hard predictions yet
            probs = torch.sigmoid(logits)

            all_probs.append(probs.cpu().numpy())
            all_lbls.append(batch['labels'].cpu().numpy())

    if len(all_probs) > 0:
        y_probs = np.vstack(all_probs)
        y_true = np.vstack(all_lbls)

        # DYNAMIC THRESHOLDING: Calculate best threshold for THIS epoch
        current_thresholds, class_f1s = optimize_thresholds(y_true, y_probs)

        # Apply optimal thresholds to get predictions
        y_pred = np.zeros_like(y_probs)
        for i, t in enumerate(current_thresholds):
            y_pred[:, i] = (y_probs[:, i] > t).astype(int)

        macro_f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
        avg_loss = loss_sum / len(train_loader)

        print(f"Epoch {epoch+1} | Loss: {avg_loss:.4f} | Dev Macro F1: {macro_f1:.4f} ⭐")
        print(f"   -> Opt Thresholds: {['{:.2f}'.format(t) for t in current_thresholds]}")

        # Save Best
        if macro_f1 > best_macro_f1:
            best_macro_f1 = macro_f1
            best_thresholds = current_thresholds
            torch.save(model.state_dict(), 'best_model_turbo.pt')
            print("   ✓ Saved New Best Model")
    else:
        print("Warning: Dev set evaluation failed.")

print(f"\nFinal Best Macro F1: {best_macro_f1:.4f}")
print("Optimal Thresholds found:", best_thresholds)

Epoch 12 | Loss: 0.4150 | Dev Macro F1: 0.6164 ⭐
   -> Opt Thresholds: ['0.50', '0.60', '0.70', '0.75', '0.55']
   ✓ Saved New Best Model

Final Best Macro F1: 0.6164
Optimal Thresholds found: [np.float64(0.49999999999999994), np.float64(0.5999999999999999), np.float64(0.7), np.float64(0.7499999999999998), np.float64(0.5499999999999999)]
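
The reported Dev Macro F1 is the unweighted mean of the per-class F1 scores after each class is binarized with its own cutoff. A plain-Python sketch of that computation on a toy two-class problem is shown below; the helper names and the numbers are illustrative assumptions, and the notebook itself relies on `sklearn.metrics.f1_score` with `average='macro'`.

```python
def f1_binary(y_true, y_pred):
    """F1 for one class from 0/1 lists; returns 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true_rows, prob_rows, thresholds):
    """Binarize each class with its own threshold, then average the per-class F1 scores."""
    scores = []
    for c, cutoff in enumerate(thresholds):
        truth = [row[c] for row in y_true_rows]
        preds = [1 if row[c] > cutoff else 0 for row in prob_rows]
        scores.append(f1_binary(truth, preds))
    return sum(scores) / len(scores), scores

y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
probs = [[0.80, 0.20], [0.60, 0.70], [0.55, 0.65], [0.10, 0.62]]
macro, per_class = macro_f1(y_true, probs, thresholds=[0.5, 0.68])
```

Because every class counts equally in the average, a weak rare class lowers the macro score as much as a weak frequent one.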


# BOOTSTRAPPING CODE

In [12]:
import numpy as np
from sklearn.utils import resample
from sklearn.metrics import f1_score

def run_bootstrap_on_test(all_probs_list, all_lbls_list, thresholds, n_iterations=1000):
    print(f"Starting Bootstrapping on Test Data (n={n_iterations})...")

    # 1. Ensure data is in numpy format
    if isinstance(all_probs_list, list):
        y_probs = np.vstack(all_probs_list)
        y_true = np.vstack(all_lbls_list)
    else:
        y_probs = all_probs_list
        y_true = all_lbls_list

    # 2. Convert Probabilities to Binary Predictions using your Best Thresholds
    # We do this ONCE before resampling to save time
    print(f"Applying thresholds: {['{:.2f}'.format(t) for t in thresholds]}")
    y_pred_binary = np.zeros_like(y_probs)
    for i, t in enumerate(thresholds):
        y_pred_binary[:, i] = (y_probs[:, i] > t).astype(int)

    boot_scores = []
    n_samples = len(y_true)
    # Create an array of indices [0, 1, 2, ... N-1] to resample
    indices = np.arange(n_samples)

    # 3. Bootstrap Loop
    for i in range(n_iterations):
        # Resample indices with replacement
        # This creates a "virtual" test set of the same size
        sample_indices = resample(indices, replace=True)

        y_true_sample = y_true[sample_indices]
        y_pred_sample = y_pred_binary[sample_indices]

        # Calculate Macro F1 for this iteration
        score = f1_score(y_true_sample, y_pred_sample,
                         average='macro', zero_division=0)
        boot_scores.append(score)

    # 4. Calculate Confidence Intervals (95%)
    alpha = 0.95
    p_lower = ((1.0 - alpha) / 2.0) * 100  # 2.5th percentile
    p_upper = (alpha + ((1.0 - alpha) / 2.0)) * 100 # 97.5th percentile

    mean_score = np.mean(boot_scores)
    lower_bound = np.percentile(boot_scores, p_lower)
    upper_bound = np.percentile(boot_scores, p_upper)

    # 5. Output Results
    print("\n" + "="*40)
    print(f"FINAL BOOTSTRAP RESULTS (Macro F1)")
    print("="*40)
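    # Score on the unresampled data, computed rather than hard-coded
    original_score = f1_score(y_true, y_pred_binary, average='macro', zero_division=0)
    print(f"Original Score: {original_score:.4f}")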
    print(f"Bootstrap Mean: {mean_score:.4f}")
    print(f"95% CI:         [{lower_bound:.4f} - {upper_bound:.4f}]")
    print("="*40)

    print("\nLaTeX Table Format:")
    print(f"{mean_score:.3f} ({lower_bound:.3f}--{upper_bound:.3f})")

# --- EXECUTE ---
# This uses the variables currently in your memory
run_bootstrap_on_test(all_probs, all_lbls, best_thresholds)

Starting Bootstrapping on Test Data (n=1000)...
Applying thresholds: ['0.50', '0.60', '0.70', '0.75', '0.55']

FINAL BOOTSTRAP RESULTS (Macro F1)
Original Score: 0.6164
Bootstrap Mean: 0.6162
95% CI:         [0.5996 - 0.6310]

LaTeX Table Format:
0.616 (0.600--0.631)
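
The percentile bootstrap used above can be reduced to a few lines. The sketch below is a simplified stand-in: it resamples a list of per-example 0/1 correctness flags and uses plain accuracy as the statistic instead of macro F1, with Python's `random` module instead of `sklearn.utils.resample`. The names `percentile` and `bootstrap_ci`, the fixed seed, and the toy data are all assumptions for illustration.

```python
import random

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def bootstrap_ci(values, statistic, n_iterations=1000, alpha=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` over `values`."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_iterations):
        # Draw n items with replacement: a "virtual" evaluation set of the same size.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        stats.append(statistic(sample))
    stats.sort()
    lower = percentile(stats, (1 - alpha) / 2 * 100)            # 2.5th percentile
    upper = percentile(stats, (alpha + (1 - alpha) / 2) * 100)  # 97.5th percentile
    return lower, upper

def mean(xs):
    return sum(xs) / len(xs)

correct = [1] * 70 + [0] * 30   # toy data: 70 of 100 examples correct
lower, upper = bootstrap_ci(correct, mean)
```

The interval brackets the observed value of 0.70 and narrows as the evaluation set grows.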


# 3. MULTILINGUAL INFERENCE DEMO

In [13]:
# ============================================================================
# MULTILINGUAL INFERENCE DEMO
# ============================================================================

def predict_polarization(text, model, tokenizer, threshold=0.4):
    model.eval()
    # 1. Clean & Tokenize
    clean_text = ' '.join([word for word in text.split() if not word.startswith('http')]).strip()
    encoded = tokenizer(clean_text, return_tensors='pt', padding='max_length', truncation=True, max_length=128)

    input_ids = encoded['input_ids'].to(device)
    attention_mask = encoded['attention_mask'].to(device)

    # 2. Predict
    with torch.no_grad():
        logits, _ = model(input_ids, attention_mask)
        probs = torch.sigmoid(logits).cpu().numpy()[0]

    # 3. Pretty Print
    print(f"Input: \"{text}\"")
    print(f"{'Label':<20} | {'Conf':<8} | {'Prediction'}")
    print("-" * 45)

    active_labels = []
    for idx, label in enumerate(TYPE_COLUMNS):
        score = probs[idx]
        if score > threshold:
            print(f"{label.title():<20} | {score:.4f}   | ✅ YES")
            active_labels.append(label)
        else:
            # Uncomment below if you want to see all probabilities
            # print(f"{label.title():<20} | {score:.4f}   | ❌")
            pass

    if not active_labels:
        print(">> No polarization detected (or low confidence)")
    print("\n" + "="*50 + "\n")

# ==========================================
# TEST CASES (Multilingual)
# ==========================================
examples = [
    # 1. English (Political)
    "The radical left is destroying our country and ruining our traditional values.",

    # 2. Spanish (Gender/Sexual) -> "Feminists hate men and want to destroy the family."
    "Las feministas odian a los hombres y quieren destruir la familia tradicional.",

    # 3. French (Racial/Ethnic) -> "These immigrants refuse to adapt to our culture."
    "Ces immigrants refusent de s'adapter à notre culture et volent nos emplois.",

    # 4. Hindi (Religious) -> "Our religion is in danger, we must fight them."
    "हमारा धर्म खतरे में है, हमें उनसे लड़ना होगा और उन्हें बाहर निकालना होगा।",

    # 5. Amharic (Political/Ethnic - common in your dataset)
    # "This group is an enemy of the people."
    "ይህ ቡድን የህዝብ ጠላት ነው እናም መወገድ አለበት።"
]

print("RUNNING MULTILINGUAL PREDICTIONS...\n")
for text in examples:
    predict_polarization(text, model, tokenizer)

RUNNING MULTILINGUAL PREDICTIONS...

Input: "The radical left is destroying our country and ruining our traditional values."
Label                | Conf     | Prediction
---------------------------------------------
Political            | 0.9326   | ✅ YES


Input: "Las feministas odian a los hombres y quieren destruir la familia tradicional."
Label                | Conf     | Prediction
---------------------------------------------
Political            | 0.8778   | ✅ YES
Religious            | 0.4233   | ✅ YES
Gender/Sexual        | 0.9864   | ✅ YES


Input: "Ces immigrants refusent de s'adapter à notre culture et volent nos emplois."
Label                | Conf     | Prediction
---------------------------------------------
Racial/Ethnic        | 0.9727   | ✅ YES


Input: "हमारा धर्म खतरे में है, हमें उनसे लड़ना होगा और उन्हें बाहर निकालना होगा।"
Label                | Conf     | Prediction
---------------------------------------------
Political            | 0.8660   | ✅ YES
Racial/Eth
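
The demo applies a single cutoff of 0.4 to every label. As a hedged sketch of how the per-class cutoffs printed in Experiment 2 could be applied at inference instead, the plain-Python helper below compares the two decision rules. The function `decide_labels` and the logits are made up for illustration and are not model output.

```python
import math

def decide_labels(logits, label_names, thresholds):
    """Return [(label, probability)] for labels whose sigmoid score exceeds its own cutoff."""
    active = []
    for logit, name, cutoff in zip(logits, label_names, thresholds):
        prob = 1.0 / (1.0 + math.exp(-logit))
        if prob > cutoff:
            active.append((name, round(prob, 4)))
    return active

label_names = ['political', 'racial/ethnic', 'religious', 'gender/sexual', 'other']
tuned = [0.50, 0.60, 0.70, 0.75, 0.55]   # cutoffs printed by the Experiment 2 run
logits = [2.0, 0.3, 0.5, 1.5, -1.0]      # hypothetical logits, not model output

with_fixed = decide_labels(logits, label_names, [0.4] * len(label_names))
with_tuned = decide_labels(logits, label_names, tuned)
```

With these made-up scores, the fixed cutoff flags four labels while the tuned cutoffs keep only the two most confident ones.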

# 4. CODE FOR COMPARISON OF BASE MODEL AND LARGE MODEL

## 1. Large Model Setup

In [14]:
# ============================================================================
# EXPERIMENT 3: XLM-ROBERTA-LARGE COMPARISON
# ============================================================================
import gc

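# 1. CLEANUP to free VRAM from previous model
for name in ('model', 'optimizer', 'scheduler'):
    globals().pop(name, None)  # drop each reference if it exists; missing names are ignored
gc.collect()                   # release the Python objects first
torch.cuda.empty_cache()       # then hand cached GPU memory back to the driver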

print("\n" + "="*60)
print("STARTING LARGE MODEL EXPERIMENT (XLM-ROBERTA-LARGE)")
print("="*60)

# 2. CONFIG FOR LARGE MODEL (Reduced Batch Size to avoid OOM)
LARGE_BATCH_SIZE = 4       # Reduced from 16 to 4 to fit in memory
LARGE_GRAD_ACCUM = 8       # Increased to maintain effective batch size of 32
LARGE_EPOCHS = 6           # Reduced slightly as large models learn faster
LARGE_LR = 1e-5            # Lower learning rate for stability



STARTING LARGE MODEL EXPERIMENT (XLM-ROBERTA-LARGE)


## 2. Large Model Architecture

In [15]:
# 3. DEFINE LARGE MODEL ARCHITECTURE
class ContrastiveXLMR_Large(nn.Module):
    def __init__(self, num_labels):
        super().__init__()
        # Load the LARGE version
        self.roberta = XLMRobertaModel.from_pretrained('xlm-roberta-large')
        # Enable gradient checkpointing once, at construction, to save memory during training
        self.roberta.gradient_checkpointing_enable()
        self.dropout = nn.Dropout(0.2)

        # Projection Head (Input size is now 1024 for Large)
        self.projection = nn.Sequential(
            nn.Linear(1024, 1024),
            nn.ReLU(),
            nn.Linear(1024, 128)
        )

        # Classification Head (Input size 1024)
        self.classifier = nn.Sequential(
            nn.Linear(1024, 2048), nn.LayerNorm(2048), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(2048, num_labels)
        )

    def forward(self, input_ids, attention_mask):
        outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first token (<s>) representation as the sentence embedding
        pooled_output = outputs.last_hidden_state[:, 0, :]

        proj_features = self.projection(pooled_output)
        logits = self.classifier(self.dropout(pooled_output))

        return logits, proj_features

# 4. INITIALIZE
print("Loading xlm-roberta-large... (This may take a minute)")
tokenizer_large = XLMRobertaTokenizer.from_pretrained('xlm-roberta-large')
model_large = ContrastiveXLMR_Large(NUM_LABELS).to(device)

# Re-create DataLoaders with smaller batch size for Large model
train_loader_large = DataLoader(
    PolarDataset(train_df['text_clean'].values, train_labels, tokenizer_large),
    batch_size=LARGE_BATCH_SIZE,
    shuffle=True
)
dev_loader_large = DataLoader(
    PolarDataset(dev_df['text_clean'].values, dev_labels, tokenizer_large),
    batch_size=LARGE_BATCH_SIZE,
    shuffle=False
)

optimizer_large = AdamW(model_large.parameters(), lr=LARGE_LR, weight_decay=0.01)
total_steps = len(train_loader_large) * LARGE_EPOCHS // LARGE_GRAD_ACCUM
scheduler_large = get_cosine_schedule_with_warmup(optimizer_large, int(0.1*total_steps), total_steps)


Loading xlm-roberta-large... (This may take a minute)



## 3. Training Loop

In [17]:

# 5. TRAINING LOOP (LARGE)
best_f1_large = 0
print(f"\nStarting Training (Effective Batch Size: {LARGE_BATCH_SIZE*LARGE_GRAD_ACCUM})...")

for epoch in range(LARGE_EPOCHS):
    model_large.train()
    loss_sum = 0
    optimizer_large.zero_grad()

    for step, batch in enumerate(tqdm(train_loader_large, desc=f"Epoch {epoch+1}")):
        ids = batch['input_ids'].to(device)
        mask = batch['attention_mask'].to(device)
        lbls = batch['labels'].to(device)

        logits, feats = model_large(ids, mask)

        # Loss Calculation
        loss = criterion_cls(logits, lbls) + (0.5 * criterion_scl(feats, lbls).mean())
        loss = loss / LARGE_GRAD_ACCUM
        loss.backward()

        if (step + 1) % LARGE_GRAD_ACCUM == 0:
            torch.nn.utils.clip_grad_norm_(model_large.parameters(), 1.0)
            optimizer_large.step()
            scheduler_large.step()
            optimizer_large.zero_grad()

        loss_sum += loss.item() * LARGE_GRAD_ACCUM


Starting Training (Effective Batch Size: 32)...
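
Dividing each mini-batch loss by the number of accumulation steps makes the summed gradients match the gradient of one larger batch, provided the loss is a per-sample average. The plain-Python sketch below checks this on a one-parameter least-squares problem; the toy model, the data, and the name `grad_mse` are assumptions for illustration.

```python
import math

def grad_mse(w, xs, ys):
    """d/dw of mean((w * x - y)^2) over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

full_batch_grad = grad_mse(w, xs, ys)

accum_steps = 2
micro = len(xs) // accum_steps
accumulated = 0.0
for k in range(accum_steps):
    chunk_x = xs[k * micro:(k + 1) * micro]
    chunk_y = ys[k * micro:(k + 1) * micro]
    # Dividing by accum_steps mirrors `loss = loss / LARGE_GRAD_ACCUM` in the loop above.
    accumulated += grad_mse(w, chunk_x, chunk_y) / accum_steps

# 23989 training rows at batch size 4 do not divide evenly into groups of 8 batches.
batches_per_epoch = math.ceil(23989 / 4)
leftover_batches = batches_per_epoch % 8
```

Two caveats apply to the loop as written. The equivalence covers only the averaged focal term, since the contrastive term is computed within each mini-batch of 4. Also, the gradients from the last few batches of each epoch never reach an optimizer step, because `zero_grad()` is called at the start of the next epoch.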



## 4. Evaluation Phase

In [None]:



    # Evaluation (indented to run inside the `for epoch` loop above)
    model_large.eval()
    all_probs, all_lbls = [], []
    with torch.no_grad():
        for batch in dev_loader_large:
            ids = batch['input_ids'].to(device)
            mask = batch['attention_mask'].to(device)
            logits, _ = model_large(ids, mask)
            all_probs.append(torch.sigmoid(logits).cpu().numpy())
            all_lbls.append(batch['labels'].cpu().numpy())

    if len(all_probs) > 0:
        y_probs = np.vstack(all_probs)
        y_true = np.vstack(all_lbls)

        # Optimize Thresholds
        current_thresholds, _ = optimize_thresholds(y_true, y_probs)
        y_pred = np.zeros_like(y_probs)
        for i, t in enumerate(current_thresholds):
            y_pred[:, i] = (y_probs[:, i] > t).astype(int)

        macro_f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
        print(f"Epoch {epoch+1} | Loss: {loss_sum/len(train_loader_large):.4f} | Dev Macro F1: {macro_f1:.4f}")

        if macro_f1 > best_f1_large:
            best_f1_large = macro_f1
            torch.save(model_large.state_dict(), 'best_model_large.pt')
            print("✓ Saved Best Large Model")

# 6. FINAL COMPARISON TABLE
print("\n" + "="*60)
print("FINAL EXPERIMENTAL RESULTS (BASE VS LARGE)")
print("="*60)
print(f"{'Model':<20} | {'Macro F1':<15}")
print("-" * 40)
# Note: best_macro_f1 comes from the previous experiment (Base model)
# If you didn't run the base model experiment in this session, replace with your recorded number.
base_score = best_macro_f1 if 'best_macro_f1' in globals() else 0.0
print(f"{'XLM-R Base':<20} | {base_score:.4f}")
print(f"{'XLM-R Large':<20} | {best_f1_large:.4f}")
print("-" * 40)

if base_score > 0:
    improvement = ((best_f1_large - base_score) / base_score) * 100
    print(f"Improvement: {improvement:.2f}%")


Epoch 1 | Loss: 0.2943 | Dev Macro F1: 0.5506
✓ Saved Best Large Model
Epoch 2 | Loss: 0.2141 | Dev Macro F1: 0.6442
✓ Saved Best Large Model
Epoch 3 | Loss: 0.1896 | Dev Macro F1: 0.6516
✓ Saved Best Large Model
Epoch 4 | Loss: 0.1729 | Dev Macro F1: 0.6661
✓ Saved Best Large Model
Epoch 5 | Loss: 0.1610 | Dev Macro F1: 0.6667
✓ Saved Best Large Model
Epoch 6 | Loss: 0.1552 | Dev Macro F1: 0.6675
✓ Saved Best Large Model

FINAL EXPERIMENTAL RESULTS (BASE VS LARGE)
Model                | Macro F1       
----------------------------------------
XLM-R Base           | 0.6510
XLM-R Large          | 0.6675
----------------------------------------
Improvement: 2.53%
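
The improvement figure is a relative change, not a difference in F1 points. The short sketch below reproduces it from the two scores in the table; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(base, new):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100

base_f1, large_f1 = 0.6510, 0.6675   # values from the comparison table
rel = relative_improvement(base_f1, large_f1)
abs_points = (large_f1 - base_f1) * 100
```

The relative gain rounds to 2.53%, which corresponds to an absolute gain of 1.65 Macro F1 points.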
