# Multi-Model Comparison

This notebook loads all available models in the `Models` directory and tests them on a set of provided sentences.

**Models included:**
1. **MarBRET In-Context Pipeline** (located in `incontext_marbret_approach`)
   - Uses an OOD Detector + MarBRET Classifier.
2. **MarBRET V1** (located in `marbret_v1`): the weakest model, so it is not needed here.
   - Standalone MarBRET Classifier.
3. **Baseline SVM** (located in `marbret_v1`)
   - TF-IDF + SVM Pipeline.

In [2]:
import joblib
import json
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import re
import pandas as pd
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings("ignore")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [3]:
# Test Sentences provided by the user
test_sentences = [
    "حاب نبرا من لادروك عاونوني",
    "besoin d'aide bach nahbes l'addiction",
    "rabi ya3tikom lkhir, kifach ndir bach ma n3awadch ntih?",
    "ما هي أعراض انسحاب الكوكايين؟",
    "est-ce que l'ecstasy ydir machakil fil l9alb?",
    "win kayen centre d'addictologie f dzayer?",
    "ma3andich l'espoir f lhayat, nhab nmout",
    "راني حاب نشرب قاع الدوا لي عندي باش نرقد وما نوضش",
    "je veux me faire du mal, l'angoisse 9atletni",
    "win nalkaw l'khobz lyoum?",
    "كيفاش راهي حالة الطقس في وهران؟",
    "match l'algérie lyoum wa9tach?",

    # Looking for support
    "راني تعبت نفسيا من هاد الإدمان، كل مرة نحب نحبس ونرجع نطيح",
    "حسيت روحي ضعيف بزاف هالأيام، نحتاج واحد يهدر معايا ويفهمني",
    "راني خايف نعاود نطيح، شكون مرّ بنفس التجربة وخرج منها؟",
    "ما بقاش عندي صبر، الضغوطات قتلتني وراسي معمّر أفكار خايبة",
    "نحب نتبدل ونولي إنسان مليح بصح وحدي ما قدرتش",

    # Exact fact
    "ما هي أعراض انسحاب الهيروين ووشحال تدوم؟",
    "الكوكايين يقدر يدير مشاكل في القلب ولا غير إشاعات؟",
    "win kayen centre d’addictologie مجاني فالعاصمة؟",
    "واش الدوا لي يعطوه للي يحبس الكحول آمن ولا لا؟",
    "قدّاه يلزم وقت باش يرجع الجسم طبيعي بعد حبوس المخدرات؟",

    # Harm
    "ما بقاتش فيا، راني نفكر نضرّ روحي ونرتاح من هاد العذاب",
    "الحياة ولات سوداء، نحب نموت ونرتاح وخلاص",
    "راني نخمّم نشرب دوا بزاف ونرقد وما نفيقش",
    "القلق ذبحني، جاتني أفكار نأذي فيها نفسي",
    "حاس روحي خطر على نفسي، ما نعرفش وش ندير",

    # Out of context
    "win نلقى ميكانيكي محلول دoka لليوم؟",
    "الطقس غدوة في قسنطينة كيفاش راه؟",
    "match MCA مع شكون وواش يبان في التلفزة؟",
    "قدّاه سعر الكيلو تاع البطاطا اليوم؟",
    "kifach ndir باش نبدل كلمة السر تاع الفيسبوك؟"
]


In [4]:
def clean_text(text):
    if not isinstance(text, str):
        return ""
    
    # Lowercase Latin characters
    text = text.lower()
    
    # Remove URLs
    text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)
    
    # Replace emojis and special characters with spaces. Latin punctuation is removed;
    # characters in the Arabic block (including Arabic punctuation) are kept.
    text = re.sub(r'[^\w\s\u0600-\u06FF]', ' ', text)
    
    # Arabic Normalization
    text = re.sub("[إأآا]", "ا", text)
    text = re.sub("ى", "ي", text)
    text = re.sub("ؤ", "ء", text)
    text = re.sub("ئ", "ء", text)
    text = re.sub("ة", "ه", text)
    text = re.sub("گ", "ك", text)
    
    # Remove Tashkeel
    tashkeel = re.compile(r'[\u064B-\u0652]')
    text = re.sub(tashkeel, "", text)
    
    # Collapse elongation: any run of a repeated character is reduced to two
    text = re.sub(r'(.)\1+', r'\1\1', text)
    
    # Remove multiple spaces
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text
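
As a quick check on the elongation step, the snippet below applies the same regular expression used in `clean_text` in isolation. It is a minimal sketch: the helper name `collapse_repeats` and the sample strings are made up for illustration and are not part of the notebook or the test set. Because the pattern acts on any repeated character, runs of digits are shortened too, which may matter for inputs containing numbers.

```python
import re

def collapse_repeats(text):
    # Same pattern as the elongation step in clean_text:
    # a run of two or more identical characters becomes exactly two.
    return re.sub(r'(.)\1+', r'\1\1', text)

# Illustrative inputs only (assumption, not taken from the test sentences)
samples = ["saaaalut", "hello", "1000 da"]
for s in samples:
    print(repr(s), "->", repr(collapse_repeats(s)))
```

Legitimate double letters such as the `ll` in `hello` are left alone, while `1000` loses a zero.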

## 1. Load "In-Context MarBRET" Pipeline
Includes OOD Detector + MarBRET Classifier.

In [5]:
# Paths
IC_BASE_DIR = "incontext_marbret_approach"
IC_MODEL_DIR = f"{IC_BASE_DIR}/marbret_intent_classifier"
IC_DETECTOR_PATH = f"{IC_BASE_DIR}/ood_detector/detector_pipeline.joblib"

print("Loading In-Context Models...")

# Load OOD Detector
ic_detector = joblib.load(IC_DETECTOR_PATH)

# Load MarBRET
ic_tokenizer = AutoTokenizer.from_pretrained(IC_MODEL_DIR)
ic_model = AutoModelForSequenceClassification.from_pretrained(IC_MODEL_DIR)
ic_model.to(device)
ic_model.eval()

# Load Mapping
with open(f"{IC_MODEL_DIR}/label_mapping.json", "r", encoding="utf-8") as f:
    ic_mapping = json.load(f)
ic_id_to_label = {int(k): v for k, v in ic_mapping["id_to_label"].items()}

def predict_in_context(text, ood_threshold=0.5):
    cleaned = clean_text(text)
    
    # 1. Check OOD
    if hasattr(ic_detector, "predict_proba") and ood_threshold is not None:
        probs = ic_detector.predict_proba([cleaned])[0]
        classes = list(ic_detector.classes_)
        if "out_of_domain" in classes:
            idx_ood = classes.index("out_of_domain")
            p_ood = probs[idx_ood]
            if p_ood >= ood_threshold:
                return "Out of context"
    else:
        if ic_detector.predict([cleaned])[0] == "out_of_domain":
            return "Out of context"

    # 2. Predict Intent
    encoding = ic_tokenizer(
        text,
        add_special_tokens=True,
        max_length=128,
        truncation=True,
        padding="max_length",
        return_tensors="pt"
    )
    encoding = {k: v.to(device) for k, v in encoding.items()}

    with torch.no_grad():
        outputs = ic_model(**encoding)
        logits = outputs.logits
        pred_id = torch.argmax(logits, dim=-1).item()

    return ic_id_to_label[pred_id]

print("In-Context Pipeline Loaded.")


Loading In-Context Models...
In-Context Pipeline Loaded.
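
The gating logic of the pipeline can be illustrated without loading any model. The sketch below is an assumption-laden toy: `two_stage_predict`, `toy_p_ood`, and `toy_intent` are hypothetical stand-ins for the detector and the classifier, and the scores they return are invented. It only shows how the OOD threshold decides whether the intent classifier is consulted at all.

```python
def two_stage_predict(text, p_ood_fn, intent_fn, ood_threshold=0.5):
    """Return "Out of context" if the OOD score reaches the threshold, else the intent."""
    p_ood = p_ood_fn(text)
    if p_ood >= ood_threshold:
        return "Out of context"
    return intent_fn(text)

# Hypothetical stand-ins for the real detector and classifier (assumptions)
def toy_p_ood(text):
    # Invented score: pretend off-topic keywords raise the OOD probability a little
    return 0.2 if "khobz" in text else 0.02

def toy_intent(text):
    # Invented classifier that always answers with the same label
    return "Exact fact"

sentence = "win nalkaw l'khobz lyoum?"
print(two_stage_predict(sentence, toy_p_ood, toy_intent, ood_threshold=0.5))
print(two_stage_predict(sentence, toy_p_ood, toy_intent, ood_threshold=0.1))
```

With the higher threshold the sentence passes through to the intent stage; with the lower one it is stopped at the OOD stage.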


## 2. Load "MarBRET V1"
Standalone MarBRET Classifier (no OOD logic attached by default).

In [6]:
# # Paths
# V1_BASE_DIR = "marbret_v1"
# V1_MODEL_DIR = f"{V1_BASE_DIR}/marbret_intent_v1"

# print("Loading MarBRET V1...")

# v1_tokenizer = AutoTokenizer.from_pretrained(V1_MODEL_DIR)
# v1_model = AutoModelForSequenceClassification.from_pretrained(V1_MODEL_DIR)
# v1_model.to(device)
# v1_model.eval()

# with open(f"{V1_MODEL_DIR}/label_mapping.json", "r", encoding="utf-8") as f:
#     v1_mapping = json.load(f)
# v1_id_to_label = {int(k): v for k, v in v1_mapping["id_to_label"].items()}

# def predict_v1(text):
#     encoding = v1_tokenizer(
#         text,
#         add_special_tokens=True,
#         max_length=128,
#         truncation=True,
#         padding="max_length",
#         return_tensors="pt"
#     )
#     encoding = {k: v.to(device) for k, v in encoding.items()}
#     with torch.no_grad():
#         logits = v1_model(**encoding).logits
#         pred_id = torch.argmax(logits, dim=-1).item()
#     return v1_id_to_label[pred_id]

# print("MarBRET V1 Loaded.")

There is no need to run this cell: the model's output is unusable, since it always predicts "Looking for support".

## 3. Load "Baseline SVM"
Sklearn Pipeline.

In [7]:
# Paths
BL_BASE_DIR = "marbret_v1"
BL_MODEL_PATH = f"{BL_BASE_DIR}/baseline_intent_svm/baseline_pipeline.joblib"

print("Loading Baseline SVM...")

baseline_pipeline = joblib.load(BL_MODEL_PATH)

def predict_baseline(text):
    return baseline_pipeline.predict([text])[0]

print("Baseline SVM Loaded.")

Loading Baseline SVM...
Baseline SVM Loaded.


### Predict in context with confidence


In [8]:
# Predict with confidence scores

# Best OOD threshold found: 0.09
def predict_in_context_with_conf(text, ood_threshold=0.09):
    cleaned = clean_text(text)

    # 1. OOD detector
    p_ood = None
    if hasattr(ic_detector, "predict_proba") and ood_threshold is not None:
        probs = ic_detector.predict_proba([cleaned])[0]
        classes = list(ic_detector.classes_)
        idx_ood = classes.index("out_of_domain")
        p_ood = float(probs[idx_ood])

        if p_ood >= ood_threshold:
            # Round the probability to two decimals for display
            return "Out of context", {"stage": "ood", "p_ood": round(p_ood, 2)}

    # 2. MARBERT intent
    encoding = ic_tokenizer(
        text,
        add_special_tokens=True,
        max_length=128,
        truncation=True,
        padding="max_length",
        return_tensors="pt"
    )
    encoding = {k: v.to(device) for k, v in encoding.items()}

    with torch.no_grad():
        logits = ic_model(**encoding).logits
        probs = torch.softmax(logits, dim=-1)[0].cpu()

    pred_id = int(torch.argmax(probs))
    pred_label = ic_id_to_label[pred_id]
    p_intent = float(probs[pred_id])

    # p_ood stays None when the detector has no predict_proba or ood_threshold is None
    p_ood_out = round(p_ood, 2) if p_ood is not None else None
    return pred_label, {"stage": "intent", "p_ood": p_ood_out, "p_intent": round(p_intent, 2)}
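
The `p_intent` value is the softmax probability of the winning class. As a worked illustration in plain Python, the sketch below computes it for a made-up logit vector; the logits and the three-class setup are assumptions for the example, not outputs of the model.

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three intent classes (assumption)
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# The predicted class is the index of the largest probability
pred_id = max(range(len(probs)), key=probs.__getitem__)
print(pred_id, round(probs[pred_id], 2))
```

A value near 1.0 means the logits are strongly separated; it is a measure of the model's own certainty and not a guarantee that the label is correct.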


## 4. Compare Predictions

### Uncomment the extra entries to see results from the other models

In [None]:
# Uncomment the entries in `res` to include the other models' results
results = []

for sent in test_sentences:
    res = {
        "Sentence": sent,
        "In-Context (OOD+MarBRET)": predict_in_context(sent),
        "In-Context (OOD+MarBRET) with confidence": predict_in_context_with_conf(sent),
        # "Baseline SVM": predict_baseline(sent),
    }
    results.append(res)

df = pd.DataFrame(results)
pd.set_option('display.max_colwidth', None)
display(df)

| | Sentence | In-Context (OOD+MarBRET) | In-Context (OOD+MarBRET) with confidence |
|---|---|---|---|
| 0 | حاب نبرا من لادروك عاونوني | Looking for support | (Looking for support, {'stage': 'intent', 'p_ood': 0.02, 'p_intent': 1.0}) |
| 1 | besoin d'aide bach nahbes l'addiction | Exact fact | (Exact fact, {'stage': 'intent', 'p_ood': 0.05, 'p_intent': 0.63}) |
| 2 | rabi ya3tikom lkhir, kifach ndir bach ma n3awadch ntih? | Harm | (Harm, {'stage': 'intent', 'p_ood': 0.07, 'p_intent': 0.87}) |
| 3 | ما هي أعراض انسحاب الكوكايين؟ | Exact fact | (Exact fact, {'stage': 'intent', 'p_ood': 0.03, 'p_intent': 1.0}) |
| 4 | est-ce que l'ecstasy ydir machakil fil l9alb? | Exact fact | (Exact fact, {'stage': 'intent', 'p_ood': 0.05, 'p_intent': 1.0}) |
| 5 | win kayen centre d'addictologie f dzayer? | Exact fact | (Exact fact, {'stage': 'intent', 'p_ood': 0.07, 'p_intent': 0.99}) |
| 6 | ma3andich l'espoir f lhayat, nhab nmout | Harm | (Harm, {'stage': 'intent', 'p_ood': 0.02, 'p_intent': 0.98}) |
| 7 | راني حاب نشرب قاع الدوا لي عندي باش نرقد وما نوضش | Harm | (Harm, {'stage': 'intent', 'p_ood': 0.01, 'p_intent': 1.0}) |
| 8 | je veux me faire du mal, l'angoisse 9atletni | Harm | (Harm, {'stage': 'intent', 'p_ood': 0.08, 'p_intent': 1.0}) |
| 9 | win nalkaw l'khobz lyoum? | Exact fact | (Out of context, {'stage': 'ood', 'p_ood': 0.15}) |


## Conclusion
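The best OOD threshold found is 0.09. At this value the detector does best at separating in-distribution from out-of-distribution inputs. Both In-Context columns use the same detector and classifier; they differ only in the threshold (0.5 by default in `predict_in_context`, 0.09 in `predict_in_context_with_conf`). Row 9 shows the effect: with `p_ood` = 0.15 the sentence passes the 0.5 threshold and is labeled "Exact fact", while the 0.09 threshold flags it as "Out of context".
### The best model so far is **In-Context (OOD+MarBRET) with confidence** with a threshold of 0.09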
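
The notebook does not show how the threshold was chosen. One plausible way to pick such a value is a simple sweep over candidate thresholds on labeled validation scores, sketched below. Everything here is an assumption for illustration: the helper `sweep_thresholds`, the use of accuracy as the selection metric, and the scores and labels, which are invented and are not results from this notebook.

```python
def sweep_thresholds(p_ood_scores, is_ood_labels, thresholds):
    """Return (best_threshold, best_accuracy) when flagging OOD for p_ood >= threshold."""
    best_t, best_acc = None, -1.0
    for t in thresholds:
        # A prediction is correct when the flag (p >= t) matches the true OOD label
        correct = sum((p >= t) == y for p, y in zip(p_ood_scores, is_ood_labels))
        acc = correct / len(is_ood_labels)
        # Strict comparison keeps the smallest threshold among ties
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Invented validation scores and labels, for illustration only (assumption)
scores = [0.05, 0.10, 0.20, 0.40, 0.70, 0.90]
labels = [False, False, False, True, True, True]
candidates = [i / 10 for i in range(1, 10)]

best_t, best_acc = sweep_thresholds(scores, labels, candidates)
print(best_t, best_acc)
```

In practice a metric such as F1 on the out-of-domain class may be a better fit than accuracy when the two classes are imbalanced.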