# **Evasion Detection Notebook**

# **1. Objectives**

The purpose of this notebook is to contain the evasion detection pipeline. 
1. **Baseline Evasion score** (rule-based) is made up of three components:
- **Cosine similarity**- similarity of the question and answer, lower similarity = more evasive
- **Numeric specificity check**- does the question require a number, if so does the answer contain a number?, e.g. requests for financial data
- **Evasive phrases**- does the answer contain evasive phrases?, presence = more evasive

2. **LLM evasion score** (RoBERTa-MNLI) uses entailment/neutral/contradiction between the question and answer
- Lower entailment (and higher neutral + contradiction) = more evasive
  
3. **Blended evasion score** combines both scores including a weight for the LLM component
- Rationale is that baseline enforces precision while the LLM will capture semantics

# **1. Set up Workspace**

In [33]:
# Import libraries
# Core python
import os
import numpy as np
import pandas as pd
import re
import json
import pathlib
from pathlib import Path
from typing import List, Dict, Any 
import csv
import math
from collections import Counter

# NLP & Summarisation
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
import transformers, datasets, inspect
from llama_cpp import Llama 
import torch
import torch.nn.functional as F

# Retrieval
from sentence_transformers import SentenceTransformer 

# ML
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GroupShuffleSplit

# Visualisations
import matplotlib.pyplot as plt
import seaborn as sns 

# Set SEED.
SEED = 42

# **2. Data Preprocessing**

In [2]:
# Load dataset
all_jpm_2023_2025 = pd.read_csv('../data/processed/jpm/all_jpm_2023_2025.csv')

# View dataset.
display(all_jpm_2023_2025.head())

# Number of rows.
print('Number of rows:', all_jpm_2023_2025.shape[0])

Unnamed: 0,section,question_number,answer_number,speaker_name,role,company,content,year,quarter,is_pleasantry,source_pdf
0,presentation,,,Jeremy Barnum,Chief Financial Officer,JPMorganChase,"Thanks, and good morning, everyone. The presen...",2023,Q1,False,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...
1,qa,,,Steven Chubak,analyst,Wolfe Research LLC,"Hey, good morning.",2023,Q1,True,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...
2,qa,,,Jeremy Barnum,Chief Financial Officer,JPMorgan Chase & Co.,"Good morning, Steve.",2023,Q1,True,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...
3,qa,1.0,,Steven Chubak,analyst,Wolfe Research LLC,"So, Jamie, I was actually hoping to get your p...",2023,Q1,False,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...
4,qa,1.0,1.0,Jamie Dimon,Chairman & Chief Executive Officer,JPMorgan Chase & Co.,"Well, I think you were already kind of complet...",2023,Q1,False,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...


Number of rows: 1411


In [3]:
# Remove pleasantries.
all_jpm_2023_2025_cleaned = all_jpm_2023_2025[all_jpm_2023_2025['is_pleasantry'] == False]
print('Number of rows:', all_jpm_2023_2025_cleaned.shape[0])

Number of rows: 1241


In [4]:
# Check content column.
print('Number of rows with no content:', all_jpm_2023_2025_cleaned['content'].isna().sum())

Number of rows with no content: 23


In [5]:
# Drop rows with no content.
all_jpm_2023_2025_cleaned = all_jpm_2023_2025_cleaned.dropna(subset=['content'])

In [6]:
# Check content column.
print('Number of rows with no content:', all_jpm_2023_2025_cleaned['content'].isna().sum())

Number of rows with no content: 0


In [7]:
# View roles.
all_jpm_2023_2025_cleaned['role'].unique()

array(['Chief Financial Officer', 'analyst',
       'Chairman & Chief Executive Officer',
       'And then some. Theres a lot of value added.', 'Okay',
       "We're fundamentally", 'Thanks', 'Almost no chance.'], dtype=object)

- Some text has leaked into role column.

In [8]:
# View rows with invalid roles. 
valid_roles = 'analyst', 'Chief Financial Officer', 'Chairman & Chief Executive Officer'
invalid_roles_df = all_jpm_2023_2025_cleaned[~all_jpm_2023_2025_cleaned['role'].isin(valid_roles)]
invalid_roles_df.head(10)

Unnamed: 0,section,question_number,answer_number,speaker_name,role,company,content,year,quarter,is_pleasantry,source_pdf
305,qa,22.0,4.0,"Chief Financial Officer, JPMorganChase",And then some. Theres a lot of value added.,JPMorganChase,"Yeah. And obviously, I mean, we're not going t...",2025,Q2,False,data/raw/jpm/.ipynb_checkpoints/jpm-2q25-earni...
309,qa,23.0,3.0,"Chief Financial Officer, JPMorganChase",Okay,there you have it.,"But it's not like I thought it would do badly,...",2025,Q2,False,data/raw/jpm/.ipynb_checkpoints/jpm-2q25-earni...
650,qa,10.0,3.0,Who knows how important politics are in all th...,We're fundamentally,"as I said, I think on the press call, happy to...",little bit cautious about the pull-forward dyn...,2024,Q1,False,data/raw/jpm/jpm-1q24-earnings-call-transcript...
924,qa,8.0,2.0,"Chief Financial Officer, JPMorgan Chase & Co.",Thanks,Glenn.,"Operator: Next, we'll go to the line of Matt O...",2024,Q2,False,data/raw/jpm/jpm-2q24-earnings-call-transcript...
1059,qa,22.0,4.0,"Chief Financial Officer, JPMorganChase",And then some. Theres a lot of value added.,JPMorganChase,"Yeah. And obviously, I mean, we're not going t...",2025,Q2,False,data/raw/jpm/jpm-2q25-earnings-call-transcript...
1063,qa,23.0,3.0,"Chief Financial Officer, JPMorganChase",Okay,there you have it.,"But it's not like I thought it would do badly,...",2025,Q2,False,data/raw/jpm/jpm-2q25-earnings-call-transcript...
1274,qa,23.0,1.0,"Chairman & Chief Executive Officer, JPMorgan C...",Almost no chance.,JPMorganChase,"Well, but having – it's very important. While ...",2024,Q3,False,data/raw/jpm/jpm-3q24-earnings-conference-call...


In [9]:
# Input the correct role information.
all_jpm_2023_2025_cleaned.loc[[305, 309, 924, 1059, 1063], 'role'] = 'Chief Financial Officer'
all_jpm_2023_2025_cleaned.loc[[1274], 'role'] = 'Chairman & Chief Executive Officer'

# Drop nonsence row.
all_jpm_2023_2025_cleaned = all_jpm_2023_2025_cleaned.drop(index=650)

In [10]:
# Check the roles have been updated.
all_jpm_2023_2025_cleaned['role'].unique()

array(['Chief Financial Officer', 'analyst',
       'Chairman & Chief Executive Officer'], dtype=object)

In [11]:
# Normalise role names.
role_map = {
    'analyst': 'analyst',
    'Chief Financial Officer': 'banker',
    'Chairman & Chief Executive Officer': 'banker'
}

# Map roles.
all_jpm_2023_2025_cleaned['role_normalised'] = all_jpm_2023_2025_cleaned['role'].map(role_map)

In [12]:
# View dataset.
display(all_jpm_2023_2025_cleaned.head())
print('Number of rows:', all_jpm_2023_2025_cleaned.shape[0])

Unnamed: 0,section,question_number,answer_number,speaker_name,role,company,content,year,quarter,is_pleasantry,source_pdf,role_normalised
0,presentation,,,Jeremy Barnum,Chief Financial Officer,JPMorganChase,"Thanks, and good morning, everyone. The presen...",2023,Q1,False,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...,banker
3,qa,1.0,,Steven Chubak,analyst,Wolfe Research LLC,"So, Jamie, I was actually hoping to get your p...",2023,Q1,False,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...,analyst
4,qa,1.0,1.0,Jamie Dimon,Chairman & Chief Executive Officer,JPMorgan Chase & Co.,"Well, I think you were already kind of complet...",2023,Q1,False,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...,banker
5,qa,1.0,1.0,Steven Chubak,analyst,Wolfe Research LLC,Got it. And just in terms of appetite for the ...,2023,Q1,False,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...,analyst
6,qa,1.0,2.0,Jamie Dimon,Chairman & Chief Executive Officer,JPMorgan Chase & Co.,"Oh, yeah.",2023,Q1,False,data/raw/jpm/.ipynb_checkpoints/jpm-1q23-earni...,banker


Number of rows: 1217


In [13]:
# Save the cleaned dataset.
all_jpm_2023_2025_cleaned.to_csv('../data/processed/jpm/cleaned/all_jpm_2023_2025_cleaned') 

In [14]:
# Helper function to remove duplicates within questions and answers. 
def clean_repeats(text):
    if not isinstance(text, str):
        return text

    # 1) Normalize whitespace
    t = ' '.join(text.split()).strip()
    if not t:
        return t

    # 2) If the whole-string is a back-to-back duplicate (A+A) = keep first half
    mid = len(t) // 2
    if len(t) % 2 == 0 and t[:mid] == t[mid:]:
        t = t[:mid]

    # 3) Collapse immediate repeated token spans (n-grams)
    toks = t.split()
    out = []
    i = 0
    while i < len(toks):
        matched = False
        max_span = min(50, len(toks) - i)  # cap span to remaining length
        for n in range(max_span, 4, -1):  # try longer spans first: 50..5
            if i + 2*n <= len(toks) and toks[i:i+n] == toks[i+n:i+2*n]:
                out.extend(toks[i:i+n])  # keep one copy
                i += 2*n                # skip the duplicate block
                matched = True
                break
        if not matched:
            out.append(toks[i])
            i += 1
    t = ' '.join(out)

    # 4) Remove duplicate sentences globally (order-preserving)
    sents = re.split(r'(?<=[.!?])\s+', t)
    seen = set()
    uniq = []
    for s in sents:
        s_norm = s.strip()
        if not s_norm:
            continue
        key = ' '.join(s_norm.lower().split())
        if key not in seen:
            seen.add(key)
            uniq.append(s_norm)
    return ' '.join(uniq)

In [15]:
# Function to convert datasets into question and answer pairs.
def create_qa_pairs(df, min_answer_words=30):
    # Keep only the Q&A section.
    qa_df = df[df['section'].astype(str).str.lower() == 'qa'].copy()

    # Split into roles.
    analyst_rows = qa_df[qa_df['role_normalised'] == 'analyst'].copy()
    banker_rows  = qa_df[qa_df['role_normalised'] == 'banker' ].copy()

    # Keys to keep quarters separated
    key_q = ['year', 'quarter', 'question_number']

    # Build full question text per (year, quarter, question_number)
    question_text_map = (
        analyst_rows
        .groupby(key_q, dropna=False)['content']
        .apply(lambda parts: clean_repeats(' '.join(parts.astype(str))))
        .rename('question')
        .reset_index()
    )

    # Ensure bankers have an answer_number — sequential per (year, quarter, question_number) if missing
    if 'answer_number' not in banker_rows.columns or banker_rows['answer_number'].isna().any():
        banker_rows = banker_rows.sort_index().copy()
        banker_rows['answer_number'] = (
            banker_rows
            .groupby(key_q, dropna=False)
            .cumcount() + 1
        )

    # Combine multiple banker utterances belonging to the same answer
    banker_answers = (
        banker_rows
        .groupby(key_q + ['answer_number'], dropna=False)
        .agg({
            'content':        lambda parts: clean_repeats(' '.join(parts.astype(str))),
            'speaker_name':   'first',
            'role':           'first',
            'role_normalised':'first',
            'source_pdf':     'first'
        })
        .rename(columns={'content': 'answer'})
        .reset_index()
    )

    # Merge question text back onto each answer row
    qa_pairs = banker_answers.merge(
        question_text_map,
        on=key_q,
        how='left',
        validate='many_to_one'
    )

    # Order columns for readability
    column_order = [
        'year', 'quarter', 'question_number', 'answer_number',
        'question', 'answer',
        'speaker_name', 'role', 'role_normalised',
        'source_pdf'
    ]
    qa_pairs = qa_pairs.reindex(columns=[c for c in column_order if c in qa_pairs.columns])

    # Sort and reset index.
    qa_pairs = qa_pairs.sort_values(['year', 'quarter', 'question_number', 'answer_number']).reset_index(drop=True)

    # Drop duplicate answers.
    qa_pairs = qa_pairs.drop_duplicates(subset=['answer'])

    # Drop short answers below threshold to ensure quality answers.
    qa_pairs = qa_pairs[qa_pairs['answer'].astype(str).str.split().str.len() >= int(min_answer_words)]

    return qa_pairs

In [16]:
# Create q&A pairs.
all_jpm_2023_2025_qa = create_qa_pairs(all_jpm_2023_2025_cleaned)

In [17]:
# View number of examples.
print('Number of examples:', all_jpm_2023_2025_qa.shape[0])

Number of examples: 309


In [18]:
# Split into prediction set and validation/training/test set.
jpm_2025_predict_qa = all_jpm_2023_2025_qa[all_jpm_2023_2025_qa['year'] == 2025]
jpm_2023_2024_qa = all_jpm_2023_2025_qa[all_jpm_2023_2025_qa['year'].isin([2023, 2024])]

# Save the datasets.
jpm_2025_predict_qa.to_csv('../data/processed/jpm/cleaned/jpm_2025_predict_qa.csv') 
jpm_2023_2024_qa.to_csv('../data/processed/jpm/cleaned/jpm_2023_2024_qa.csv')  

The jpm_2023_2024_qa dataset was then manually labelled according to whether the banker's answer was deemed 'Direct' or 'Evasive'. The label was appended by a new column 'label'.

In [19]:
# Load the labelled dataset.
jpm_2023_2024_qa_labelled = pd.read_csv('../data/processed/jpm/cleaned/jpm_2023_2024_qa_labelled.csv')

# View the dataset.
jpm_2023_2024_qa_labelled = jpm_2023_2024_qa_labelled.drop('Unnamed: 0', axis=1)
jpm_2023_2024_qa_labelled.head()

Unnamed: 0,year,quarter,question_number,answer_number,question,answer,speaker_name,role,role_normalised,source_pdf,label
0,2023,Q4,1.0,1.0,Good morning. Thanks for all the comments on t...,"Yeah. Matt, not particularly updating. I think...",Jeremy Barnum,Chief Financial Officer,banker,data/raw/jpm/jpm-4q23-earnings-call-transcript...,Direct
1,2023,Q4,2.0,1.0,"Okay. And then just separately, you bought bac...",Yeah. Good question. And I think you framed it...,Jeremy Barnum,Chief Financial Officer,banker,data/raw/jpm/jpm-4q23-earnings-call-transcript...,Direct
2,2023,Q4,3.0,1.0,"Thanks. Jeremy, could you give a little more c...","Yeah. Actually, John, this quarter, that's all...",Jeremy Barnum,Chief Financial Officer,banker,data/raw/jpm/jpm-4q23-earnings-call-transcript...,Direct
3,2023,Q4,4.0,1.0,"Okay. And then, just to follow up on the NII, ...","Sure. Yeah, happy to do that, John. So, I thin...",Jeremy Barnum,Chief Financial Officer,banker,data/raw/jpm/jpm-4q23-earnings-call-transcript...,Direct
4,2023,Q4,5.0,1.0,Hey. Good morning. Maybe just to follow up in ...,Yeah. Both good questions. So let's do reprice...,Jeremy Barnum,Chief Financial Officer,banker,data/raw/jpm/jpm-4q23-earnings-call-transcript...,Direct


In [20]:
# Function to split into test, training and validation datasets, preserve number of evasive cases per set.
def train_val_test(df, group_key, test_fraction, val_fraction, random_state):

    # Split test from full data.
    gss1 = GroupShuffleSplit(n_splits=1, test_size=test_fraction, random_state=random_state)
    idx_trainval, idx_test = next(gss1.split(df, groups=df[group_key]))
    train_and_val = df.iloc[idx_trainval].reset_index(drop=True)
    test_set = df.iloc[idx_test].reset_index(drop=True)

    # Split VAL from the remaining data (val is relative to full size)
    val_fraction_of_remaining = val_fraction / (1.0 - test_fraction)
    gss2 = GroupShuffleSplit(n_splits=1, test_size=val_fraction_of_remaining, random_state=random_state + 1)
    idx_train, idx_val = next(gss2.split(train_and_val, groups=train_and_val[group_key]))
    train_set = train_and_val.iloc[idx_train].reset_index(drop=True)
    val_set = train_and_val.iloc[idx_val].reset_index(drop=True)

    return train_set, val_set, test_set


In [21]:
# Make a group key so answers for the same question are not split between datasets.
jpm_2023_2024_qa_labelled['group_key'] = (
    jpm_2023_2024_qa_labelled["year"].astype(str) + "_" +
    jpm_2023_2024_qa_labelled["quarter"].astype(str) + "_" +
    jpm_2023_2024_qa_labelled["question_number"].astype(str)
)

In [22]:
# Split into test, training and validation datasets.
jpm_train, jpm_val, jpm_test = train_val_test(
    jpm_2023_2024_qa_labelled,
    group_key='group_key',
    test_fraction=0.30,
    val_fraction=0.20,
    random_state=SEED
)

In [23]:
# View the split. 
print(f'Number of training examples: {jpm_train.shape[0]} (evasive: {jpm_train[jpm_train["label"] == "Evasive"].shape[0]})')
print(f'Number of validation examples: {jpm_val.shape[0]} (evasive: {jpm_val[jpm_val["label"] == "Evasive"].shape[0]})')
print(f'Number of test examples: {jpm_test.shape[0]} (evasive: {jpm_test[jpm_test["label"] == "Evasive"].shape[0]})')

Number of training examples: 107 (evasive: 22)
Number of validation examples: 43 (evasive: 11)
Number of test examples: 65 (evasive: 9)


In [24]:
# Save the datasets.
jpm_train.to_csv('../data/processed/jpm/cleaned/jpm_train.csv') 
jpm_val.to_csv('../data/processed/jpm/cleaned/jpm_val.csv') 
jpm_test.to_csv('../data/processed/jpm/cleaned/jpm_test.csv') 

# **3. Rule-based Baseline**

## **3.1 Set-up**

In [25]:
# List of evasive phrases
EVASIVE_PHRASES = [
    r"\btoo early\b",
    r"\bcan't (?:comment|share|discuss)\b",
    r"\bwon't (?:comment|share|provide)\b",
    r"\bno (?:update|comment)\b",
    r"\bwe (?:don't|do not) (?:break out|provide guidance)\b",
    r"\bnot (?:going to|able to) (?:comment|share|provide)\b",
    r"\bwe'll (?:come back|circle back)\b",
    r"\bnot something we disclose\b",
    r"\bas (?:we|I) (?:said|mentioned)\b",
    r"\bgenerally speaking\b",
    r"\bit's premature\b",
    r"\bit's difficult to say\b",
    r"\bI (?:wouldn't|won't) want to (?:speculate|get into)\b",
    r"\bI (?:think|guess|suppose)\b",
    r"\bkind of\b",
    r"\bsort of\b",
    r"\baround\b",
    r"\broughly\b",
    r"\bwe (?:prefer|plan) not to\b",
    r"\bwe're not prepared to\b",
]

# List of words that suggest the answer needs specific financial numbers to properly answer the question.
SPECIFICITY_TRIGGERS = [
    "how much","how many","what is","what are","when","which","where","who","why",
    "range","guidance","margin","capex","opex","revenue","sales","eps","ebitda",
    "timeline","date","target","growth","update","split","dividend","cost","price",
    "units","volumes","gross","net","tax","percentage","utilization","order book"
]

NUMERIC_PATTERN = r"(?:\d+(?:\.\d+)?%|\b\d{1,3}(?:,\d{3})*(?:\.\d+)?\b|£|\$|€)"

## **3.2 Functions**

In [26]:
# Function to calculate cosine similarity between question and answers.
def cosine_sim(q, a):
    vec = TfidfVectorizer(stop_words='english').fit_transform([q, a]) # converts text to vectors 
    sim = float(cosine_similarity(vec[0], vec[1])[0, 0]) # calculate the cosine similarity between the two vectors

    return sim

In [27]:
# Function to compute baseline evasion score.
def baseline_evasion_score(q, a):
    # 1. Cosine similarity
    sim = cosine_sim(q, a) # calculates cosine similarity using previous function
    sim_component = (1 - sim) * 45 # less similar the answer is, the bigger the contribution to the evasion score, scaled by 45

    # 2. Numerical specificity- Does the question require and answer with financial figures/ a specific answer?
    needs_num = any(t in q.lower() for t in SPECIFICITY_TRIGGERS) # true if the question requires a numeric/ specific answer
    has_num = bool(re.search(NUMERIC_PATTERN, a)) # true if the answer includes a number 
    numeric_component = 25 if needs_num and not has_num else 0 # score of 25 if the question needs a number but the answer doesn't give one

    # 3. Evasive phrases- does the answer contain evasive phrases?
    phrase_hits = sum(len(re.findall(p, a.lower())) for p in EVASIVE_PHRASES) # counts how many times an evasive phrase appears in the answer
    phrase_component = min(3, phrase_hits) * 8 # max of 3 hits counted, each hit = 8 points 

    # Final evasion score.
    score = min(100, sim_component + numeric_component + phrase_component) # adds components together and caps score at 100
    
    return score, sim, phrase_hits, needs_num, has_num

# **4. LLM**

## **4.1 Training**

Small, lightweight models were selected for this to prevent memory overload and long training times.

In [66]:
# =========================
# Evasion Detection Trainer (F1-optimized threshold)
# =========================

import os
import random
import numpy as np
import torch
import torch.nn.functional as F

from datasets import Dataset
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    Trainer, TrainingArguments, EarlyStoppingCallback
)
from sklearn.metrics import (
    precision_recall_fscore_support, accuracy_score, roc_auc_score, confusion_matrix
)
from sklearn.utils.class_weight import compute_class_weight


# ===========
# Config
# ===========
SEED = 42
MAX_LEN = 256
NUM_EPOCHS = 10
BATCH_TRAIN = 8
BATCH_EVAL = 16
USE_FOCAL = False          # set True to enable focal loss
FOCAL_GAMMA = 2.0
MINORITY_WEIGHT_MULT = 2.0 # try 2.0–3.0 if collapse persists

# Keep EVERYTHING out of your repo
os.environ["HF_HOME"] = os.path.expanduser("~/hf_cache")
os.environ["HF_DATASETS_CACHE"] = os.path.expanduser("~/hf_datasets")
RUNS_DIR   = os.path.expanduser("~/ml_runs/evasion")
MODELS_DIR = os.path.expanduser("~/trained_models/evasion")
os.makedirs(RUNS_DIR, exist_ok=True)
os.makedirs(MODELS_DIR, exist_ok=True)


# ===========
# Utilities
# ===========
def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.backends.mps.is_available():
        try:
            torch.mps.manual_seed(seed)  # ignore if not present
        except Exception:
            pass

def prepare_text(df):
    # helps model separate Q from A
    return (df["question"].fillna("") + " </s> " + df["answer"].fillna(""))

label_map = {'direct': 0, 'evasive': 1}
def normalise_labels(s):
    return s.astype(str).str.lower().map(label_map)

def choose_threshold_by_f1_strict(y_true, prob_pos, min_pos=1, max_pos=None):
    """
    Choose threshold maximizing F1 on VAL using *strict* rules:
      - Sweep unique probability cutpoints (midpoints between sorted unique probs).
      - Skip degenerate thresholds that predict all-positives or all-negatives.
      - Tie-breakers: higher precision, then threshold closer to 0.5.
    """
    n = len(prob_pos)
    if max_pos is None:
        max_pos = n - 1

    uniq = np.unique(prob_pos)
    if len(uniq) == 1:
        cuts = [uniq[0]]  # model is flat; only one option
    else:
        cuts = list((uniq[:-1] + uniq[1:]) / 2.0)
        cuts = [max(0.0, float(uniq[0] - 1e-6))] + cuts + [min(1.0, float(uniq[-1] + 1e-6))]

    best = {"thr": 0.5, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for thr in cuts:
        pred1 = (prob_pos >= thr).astype(int)
        pos = int(pred1.sum())
        if pos < min_pos or pos > max_pos:
            continue  # skip degenerate operating points

        pr, rc, f1, _ = precision_recall_fscore_support(
            y_true, pred1, average="binary", pos_label=1, zero_division=0
        )
        if (f1 > best["f1"] or
            (f1 == best["f1"] and pr > best["precision"]) or
            (f1 == best["f1"] and pr == best["precision"] and abs(thr - 0.5) < abs(best["thr"] - 0.5))):
            best = {"thr": float(thr), "precision": float(pr), "recall": float(rc), "f1": float(f1)}
    return best


# ===========================
# Custom Trainer (weighted/focal)
# ===========================
class WeightedTrainer(Trainer):
    def __init__(self, class_weights=None, use_focal=False, gamma=2.0, **kwargs):
        super().__init__(**kwargs)
        self._class_weights = None
        if class_weights is not None:
            self._class_weights = torch.tensor(class_weights, dtype=torch.float)
        self.use_focal = use_focal
        self.gamma = gamma

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")

        if self._class_weights is not None and self._class_weights.device != logits.device:
            self._class_weights = self._class_weights.to(logits.device)

        if not self.use_focal:
            loss_fct = torch.nn.CrossEntropyLoss(weight=self._class_weights)
            loss = loss_fct(logits, labels)
        else:
            # Focal Loss
            log_probs = F.log_softmax(logits, dim=-1)
            probs = torch.exp(log_probs)
            pt = probs.gather(1, labels.unsqueeze(1)).squeeze(1)      # p_t
            focal_factor = (1 - pt) ** self.gamma
            nll = F.nll_loss(log_probs, labels, reduction='none', weight=self._class_weights)
            loss = (focal_factor * nll).mean()

        return (loss, outputs) if return_outputs else loss


# ===========================
# Main training function
# ===========================
def train_model(model_name, jpm_train, jpm_val, jpm_test,
                max_len=MAX_LEN, freeze_backbone=False, seed=SEED,
                use_focal=USE_FOCAL, focal_gamma=FOCAL_GAMMA,
                minority_weight_mult=MINORITY_WEIGHT_MULT):

    set_seed(seed)

    # --- Prepare splits (expects: question, answer, label) ---
    for _, df in [('train', jpm_train), ('val', jpm_val), ('test', jpm_test)]:
        df['text'] = prepare_text(df)
        df['labels'] = normalise_labels(df['label'])
        df.dropna(subset=['labels'], inplace=True)
        df['labels'] = df['labels'].astype('int64')

    train_ds = Dataset.from_pandas(jpm_train[["text","labels"]], preserve_index=False)
    val_ds   = Dataset.from_pandas(jpm_val[["text","labels"]],   preserve_index=False)
    test_ds  = Dataset.from_pandas(jpm_test[["text","labels"]],  preserve_index=False)

    # --- Tokenization ---
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
    def tokenize(batch):
        return tokenizer(batch['text'], truncation=True, padding='max_length', max_length=max_len)

    enc_train = train_ds.map(tokenize, batched=True, remove_columns=["text"])
    enc_val   = val_ds.map(tokenize,   batched=True, remove_columns=["text"])
    enc_test  = test_ds.map(tokenize,  batched=True, remove_columns=["text"])

    # --- Model ---
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2, problem_type='single_label_classification'
    )

    if freeze_backbone:
        for name, p in model.named_parameters():
            if 'classifier' not in name and 'score' not in name:
                p.requires_grad = False

    # --- Class weights (amp minority) ---
    y = np.array(train_ds['labels'])
    cw = compute_class_weight(class_weight="balanced", classes=np.array([0,1]), y=y)
    cw[1] = cw[1] * float(minority_weight_mult)
    class_weights = cw

    # --- Training args ---
    args = TrainingArguments(
        output_dir=os.path.join(RUNS_DIR, model_name.replace('/','_')),
        eval_strategy="epoch",
        save_strategy="epoch",
        save_total_limit=2,
        logging_steps=50,
        learning_rate=3e-5 if "roberta" in model_name else 4e-5,
        weight_decay=0.01,
        warmup_ratio=0.06,
        gradient_accumulation_steps=1,
        per_device_train_batch_size=BATCH_TRAIN,
        per_device_eval_batch_size=BATCH_EVAL,
        num_train_epochs=NUM_EPOCHS,
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        greater_is_better=True,
        fp16=False,                      # MPS doesn't use fp16
        report_to="none",
        remove_unused_columns=False,
        seed=seed,
        dataloader_pin_memory=False,
    )

    # --- Metrics (argmax) for trainer/early stopping ---
    def compute_metrics(p):
        preds = p.predictions.argmax(-1)
        labels = p.label_ids
        acc = accuracy_score(labels, preds)
        pr, rc, f1, _ = precision_recall_fscore_support(labels, preds, average="binary", pos_label=1, zero_division=0)
        try:
            proba = torch.softmax(torch.tensor(p.predictions), dim=-1).numpy()[:,1]
            auc = roc_auc_score(labels, proba)
        except Exception:
            auc = float('nan')
        return {"accuracy": acc, "precision": pr, "recall": rc, "f1": f1, "auc": auc}

    trainer = WeightedTrainer(
        model=model,
        args=args,
        train_dataset=enc_train,
        eval_dataset=enc_val,
        compute_metrics=compute_metrics,
        class_weights=class_weights,
        use_focal=use_focal,
        gamma=focal_gamma,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
    )

    # --- Train ---
    print(f"\nTraining {model_name}")
    trainer.train()

    # --- Pick threshold on VAL (strict F1; avoid degenerate all-pos/neg) ---
    val_pred = trainer.predict(enc_val)
    val_probs = torch.softmax(torch.tensor(val_pred.predictions), dim=-1).numpy()[:, 1]
    val_y = val_pred.label_ids

    best = choose_threshold_by_f1_strict(val_y, val_probs, min_pos=1, max_pos=len(val_y)-1)
    print(f"Chosen threshold (F1-optimized): {best}")

    # Sanity: show how many positives at chosen threshold
    val_pos = int((val_probs >= best["thr"]).sum())
    print(f"VAL predicted positives at thr={best['thr']:.3f}: {val_pos}/{len(val_y)}")

    # --- Evaluate on TEST with chosen threshold ---
    test_pred = trainer.predict(enc_test)
    test_probs = torch.softmax(torch.tensor(test_pred.predictions), dim=-1).numpy()[:, 1]
    test_y = test_pred.label_ids
    test_pred1 = (test_probs >= best["thr"]).astype(int)

    tpr, trc, tf1, _ = precision_recall_fscore_support(test_y, test_pred1, average="binary", pos_label=1, zero_division=0)
    tacc = accuracy_score(test_y, test_pred1)
    cm = confusion_matrix(test_y, test_pred1).tolist()

    print("VAL (argmax):", trainer.evaluate())
    print("TEST (thresholded):",
          {"accuracy": tacc, "precision": tpr, "recall": trc, "f1": tf1,
           "threshold": best["thr"], "confusion_matrix": cm})
    print(f"TEST predicted positives at thr={best['thr']:.3f}: {int(test_pred1.sum())}/{len(test_y)}")

    # --- Save final model OUTSIDE the repo ---
    save_path = os.path.join(MODELS_DIR, f"{model_name.replace('/','_')}_direct_evasive")
    os.makedirs(save_path, exist_ok=True)
    trainer.save_model(save_path)
    tokenizer.save_pretrained(save_path)
    print(f"Model saved to: {save_path}")

    return {"val_threshold": best, "test_metrics": {"accuracy": tacc, "precision": tpr, "recall": trc, "f1": tf1}}, (model, tokenizer)


# ===========================
# Run (expects jpm_train/jpm_val/jpm_test to exist with columns: question, answer, label)
# ===========================
results = {}
for model_name in ['distilroberta-base', 'microsoft/deberta-v3-small']:
    res, _ = train_model(
        model_name=model_name,
        jpm_train=jpm_train,
        jpm_val=jpm_val,
        jpm_test=jpm_test,
        max_len=MAX_LEN,
        freeze_backbone=False,
        seed=SEED,
        use_focal=USE_FOCAL,
        focal_gamma=FOCAL_GAMMA,
        minority_weight_mult=MINORITY_WEIGHT_MULT
    )
    results[model_name] = res

print("\nSummary:", results)


Map: 100%|██████████| 107/107 [00:00<00:00, 2442.70 examples/s]
Map: 100%|██████████| 43/43 [00:00<00:00, 4259.99 examples/s]
Map: 100%|██████████| 65/65 [00:00<00:00, 4053.43 examples/s]
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Training distilroberta-base


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1,Auc
1,No log,0.720518,0.744186,0.0,0.0,0.0,0.471591
2,No log,0.655463,0.255814,0.255814,1.0,0.407407,0.5
3,No log,0.642544,0.255814,0.255814,1.0,0.407407,0.579545
4,0.699600,0.620321,0.255814,0.255814,1.0,0.407407,0.616477
5,0.699600,0.621463,0.255814,0.255814,1.0,0.407407,0.585227


Chosen threshold (F1-optimized): {'thr': 0.5493720173835754, 'precision': 0.2682926829268293, 'recall': 1.0, 'f1': 0.4230769230769231}
VAL predicted positives at thr=0.549: 41/43


VAL (argmax): {'eval_loss': 0.6554630398750305, 'eval_accuracy': 0.2558139534883721, 'eval_precision': 0.2558139534883721, 'eval_recall': 1.0, 'eval_f1': 0.4074074074074074, 'eval_auc': 0.5, 'eval_runtime': 0.4895, 'eval_samples_per_second': 87.839, 'eval_steps_per_second': 6.128, 'epoch': 5.0}
TEST (thresholded): {'accuracy': 0.23076923076923078, 'precision': 0.15254237288135594, 'recall': 1.0, 'f1': 0.2647058823529412, 'threshold': 0.5493720173835754, 'confusion_matrix': [[6, 50], [0, 9]]}
TEST predicted positives at thr=0.549: 59/65
Model saved to: /Users/laurenbrixey/trained_models/evasion/distilroberta-base_direct_evasive


Map: 100%|██████████| 107/107 [00:00<00:00, 2709.61 examples/s]
Map: 100%|██████████| 43/43 [00:00<00:00, 4008.87 examples/s]
Map: 100%|██████████| 65/65 [00:00<00:00, 3278.69 examples/s]
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-small and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Training microsoft/deberta-v3-small


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1,Auc
1,No log,0.694409,0.744186,0.0,0.0,0.0,0.488636
2,No log,0.649149,0.255814,0.255814,1.0,0.407407,0.528409
3,No log,0.633938,0.255814,0.255814,1.0,0.407407,0.491477
4,0.707300,0.688178,0.255814,0.255814,1.0,0.407407,0.426136
5,0.707300,0.664632,0.232558,0.238095,0.909091,0.377358,0.482955


Chosen threshold (F1-optimized): {'thr': 0.5717650651931763, 'precision': 0.30303030303030304, 'recall': 0.9090909090909091, 'f1': 0.45454545454545453}
VAL predicted positives at thr=0.572: 33/43


VAL (argmax): {'eval_loss': 0.6491491794586182, 'eval_accuracy': 0.2558139534883721, 'eval_precision': 0.2558139534883721, 'eval_recall': 1.0, 'eval_f1': 0.4074074074074074, 'eval_auc': 0.5284090909090908, 'eval_runtime': 0.9301, 'eval_samples_per_second': 46.231, 'eval_steps_per_second': 3.225, 'epoch': 5.0}
TEST (thresholded): {'accuracy': 0.5230769230769231, 'precision': 0.19444444444444445, 'recall': 0.7777777777777778, 'f1': 0.3111111111111111, 'threshold': 0.5717650651931763, 'confusion_matrix': [[27, 29], [2, 7]]}
TEST predicted positives at thr=0.572: 36/65
Model saved to: /Users/laurenbrixey/trained_models/evasion/microsoft_deberta-v3-small_direct_evasive

Summary: {'distilroberta-base': {'val_threshold': {'thr': 0.5493720173835754, 'precision': 0.2682926829268293, 'recall': 1.0, 'f1': 0.4230769230769231}, 'test_metrics': {'accuracy': 0.23076923076923078, 'precision': 0.15254237288135594, 'recall': 1.0, 'f1': 0.2647058823529412}}, 'microsoft/deberta-v3-small': {'val_threshold'

In [None]:
# import os, numpy as np, torch
# import torch.nn.functional as F
# from datasets import Dataset
# from transformers import (
#     AutoTokenizer, AutoModelForSequenceClassification,
#     Trainer, TrainingArguments, EarlyStoppingCallback
# )
# from sklearn.metrics import precision_recall_fscore_support, accuracy_score, roc_auc_score
# from sklearn.utils.class_weight import compute_class_weight

# # --- 1) Keep EVERYTHING out of your repo ------------------------------------
# os.environ["HF_HOME"] = os.path.expanduser("~/hf_cache")               # models/tokenizers cache
# os.environ["HF_DATASETS_CACHE"] = os.path.expanduser("~/hf_datasets")  # datasets cache

# RUNS_DIR   = os.path.expanduser("~/ml_runs/evasion")                    # trainer checkpoints
# MODELS_DIR = os.path.expanduser("~/trained_models/evasion")             # final saved models
# os.makedirs(RUNS_DIR, exist_ok=True)
# os.makedirs(MODELS_DIR, exist_ok=True)

# # --- 2) Data prep ------------------------------------------------------------
# def prepare_text(df):
#     return (df["question"].fillna("") + " </s> " + df["answer"].fillna(""))

# label_map = {'direct': 0, 'evasive': 1}
# def normalise_labels(s):
#     return s.astype(str).str.lower().map(label_map)

# for split_name, df in [('train', jpm_train), ('val', jpm_val), ('test', jpm_test)]:
#     df['text'] = prepare_text(df)
#     df['labels'] = normalise_labels(df['label'])
#     df.dropna(subset=['labels'], inplace=True)
#     df['labels'] = df['labels'].astype('int64')

# jpm_train_llm = Dataset.from_pandas(jpm_train[["text","labels"]], preserve_index=False)
# jpm_val_llm   = Dataset.from_pandas(jpm_val[["text","labels"]],   preserve_index=False)
# jpm_test_llm  = Dataset.from_pandas(jpm_test[["text","labels"]],  preserve_index=False)

# device = 'mps' if torch.backends.mps.is_available() else 'cpu'

# # --- 3) Trainer with weighted CE or focal loss ------------------------------
# class WeightedTrainer(Trainer):
#     def __init__(self, class_weights=None, use_focal=False, gamma=2.0, **kwargs):
#         super().__init__(**kwargs)
#         self._class_weights = None
#         if class_weights is not None:
#             self._class_weights = torch.tensor(class_weights, dtype=torch.float)
#         self.use_focal = use_focal
#         self.gamma = gamma

#     def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
#         labels = inputs.get("labels")
#         outputs = model(**inputs)
#         logits = outputs.get("logits")

#         if self._class_weights is not None and self._class_weights.device != logits.device:
#             self._class_weights = self._class_weights.to(logits.device)

#         if not self.use_focal:
#             loss_fct = torch.nn.CrossEntropyLoss(weight=self._class_weights)
#             loss = loss_fct(logits, labels)
#         else:
#             log_probs = F.log_softmax(logits, dim=-1)
#             probs = torch.exp(log_probs)
#             pt = probs.gather(1, labels.unsqueeze(1)).squeeze(1)      # p_t
#             focal_factor = (1 - pt) ** self.gamma
#             nll = F.nll_loss(log_probs, labels, reduction='none', weight=self._class_weights)
#             loss = (focal_factor * nll).mean()

#         return (loss, outputs) if return_outputs else loss

# # --- 4) Training loop with fixes + thresholding -----------------------------
# def train_model(model_name, max_len=256, freeze_backbone=False, seed=42, use_focal=False):
#     tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

#     def tokenize(batch):
#         return tokenizer(
#             batch['text'],
#             truncation=True,
#             padding='max_length',
#             max_length=max_len
#         )

#     enc_train = jpm_train_llm.map(tokenize, batched=True, remove_columns=["text"])
#     enc_val   = jpm_val_llm.map(tokenize,   batched=True, remove_columns=["text"])
#     enc_test  = jpm_test_llm.map(tokenize,  batched=True, remove_columns=["text"])

#     model = AutoModelForSequenceClassification.from_pretrained(
#         model_name, num_labels=2, problem_type='single_label_classification'
#     )

#     if freeze_backbone:
#         for name, p in model.named_parameters():
#             if 'classifier' not in name and 'score' not in name:
#                 p.requires_grad = False

#     # Class weights (make minority class hit harder)
#     y = np.array(jpm_train_llm['labels'])
#     cw = compute_class_weight(class_weight="balanced", classes=np.array([0,1]), y=y)
#     cw[1] = cw[1] * 2.0  # try 2.0–3.0 if needed
#     class_weights = cw

#     args = TrainingArguments(
#         output_dir=os.path.join(RUNS_DIR, model_name.replace('/','_')),
#         eval_strategy="epoch",
#         save_strategy="epoch",
#         save_total_limit=2,
#         logging_steps=50,
#         learning_rate=3e-5 if "roberta" in model_name else 4e-5,
#         weight_decay=0.01,
#         warmup_ratio=0.06,
#         gradient_accumulation_steps=1,
#         per_device_train_batch_size=8,
#         per_device_eval_batch_size=16,
#         num_train_epochs=10,
#         load_best_model_at_end=True,
#         metric_for_best_model="f1",
#         greater_is_better=True,
#         fp16=False,                         # MPS doesn't use fp16
#         report_to="none",
#         remove_unused_columns=False,
#         seed=seed,
#         dataloader_pin_memory=False,
#     )

#     def compute_metrics(p):
#         preds = p.predictions.argmax(-1)
#         labels = p.label_ids
#         acc = accuracy_score(labels, preds)
#         pr, rc, f1, _ = precision_recall_fscore_support(labels, preds, average="binary", pos_label=1, zero_division=0)
#         try:
#             proba = torch.softmax(torch.tensor(p.predictions), dim=-1).numpy()[:,1]
#             auc = roc_auc_score(labels, proba)
#         except Exception:
#             auc = float('nan')
#         return {"accuracy": acc, "precision": pr, "recall": rc, "f1": f1, "auc": auc}

#     trainer = WeightedTrainer(
#         model=model,
#         args=args,
#         train_dataset=enc_train,
#         eval_dataset=enc_val,
#         compute_metrics=compute_metrics,
#         class_weights=class_weights,
#         use_focal=use_focal,
#         callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
#     )

#     trainer.train()

#     # --- Post-training: choose threshold on VAL to maximize recall (then f1) ---
#     val_pred = trainer.predict(enc_val)
#     val_probs = torch.softmax(torch.tensor(val_pred.predictions), dim=-1).numpy()[:,1]
#     val_y = val_pred.label_ids

#     best = {"thr":0.5, "recall":-1.0, "precision":0.0, "f1":0.0}
#     for thr in np.linspace(0.1, 0.9, 17):
#         pred1 = (val_probs >= thr).astype(int)
#         pr, rc, f1, _ = precision_recall_fscore_support(val_y, pred1, average="binary", pos_label=1, zero_division=0)
#         if (rc > best["recall"]) or (rc == best["recall"] and f1 > best["f1"]):
#             best = {"thr":float(thr), "recall":float(rc), "precision":float(pr), "f1":float(f1)}
#     print(f"Chosen threshold (val-optimized): {best}")

#     # --- Evaluate on TEST with chosen threshold ---
#     test_pred = trainer.predict(enc_test)
#     test_probs = torch.softmax(torch.tensor(test_pred.predictions), dim=-1).numpy()[:,1]
#     test_y = test_pred.label_ids
#     test_pred1 = (test_probs >= best["thr"]).astype(int)
#     tpr, trc, tf1, _ = precision_recall_fscore_support(test_y, test_pred1, average="binary", pos_label=1, zero_division=0)
#     tacc = accuracy_score(test_y, test_pred1)
#     print("VAL (argmax):", trainer.evaluate())
#     print("TEST (thresholded):", {"accuracy": tacc, "precision": tpr, "recall": trc, "f1": tf1, "threshold": best["thr"]})

#     # --- Save final model OUTSIDE the repo ---
#     save_path = os.path.join(MODELS_DIR, f"{model_name.replace('/','_')}_direct_evasive")
#     os.makedirs(save_path, exist_ok=True)
#     trainer.save_model(save_path)
#     tokenizer.save_pretrained(save_path)
#     print(f"Model saved to: {save_path}")

#     return model, tokenizer

# # --- 5) Train the models -----------------------------------------------------
# models = [
#     'distilroberta-base',
#     'microsoft/deberta-v3-small'
# ]

# for m in models:
#     print(f"\nTraining {m}")
#     _ = train_model(m, max_len=256, freeze_backbone=False, seed=42, use_focal=False)



Training distilroberta-base


Map: 100%|██████████| 107/107 [00:00<00:00, 2367.94 examples/s]
Map: 100%|██████████| 43/43 [00:00<00:00, 1868.50 examples/s]
Map: 100%|██████████| 65/65 [00:00<00:00, 1881.11 examples/s]
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1,Auc
1,No log,0.665799,0.255814,0.255814,1.0,0.407407,0.519886
2,No log,0.647968,0.255814,0.255814,1.0,0.407407,0.551136
3,No log,0.642327,0.255814,0.255814,1.0,0.407407,0.514205
4,0.694800,0.635433,0.255814,0.255814,1.0,0.407407,0.548295


Chosen threshold (val-optimized): {'thr': 0.1, 'recall': 1.0, 'precision': 0.2558139534883721, 'f1': 0.4074074074074074}


VAL (argmax): {'eval_loss': 0.6657994985580444, 'eval_accuracy': 0.2558139534883721, 'eval_precision': 0.2558139534883721, 'eval_recall': 1.0, 'eval_f1': 0.4074074074074074, 'eval_auc': 0.5198863636363635, 'eval_runtime': 0.4842, 'eval_samples_per_second': 88.804, 'eval_steps_per_second': 6.196, 'epoch': 4.0}
TEST (thresholded): {'accuracy': 0.13846153846153847, 'precision': 0.13846153846153847, 'recall': 1.0, 'f1': 0.24324324324324326, 'threshold': 0.1}
Model saved to: /Users/laurenbrixey/trained_models/evasion/distilroberta-base_direct_evasive

Training microsoft/deberta-v3-small


Map: 100%|██████████| 107/107 [00:00<00:00, 4098.06 examples/s]
Map: 100%|██████████| 43/43 [00:00<00:00, 4341.72 examples/s]
Map: 100%|██████████| 65/65 [00:00<00:00, 4086.05 examples/s]
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-small and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1,Auc
1,No log,0.697952,0.744186,0.0,0.0,0.0,0.4375
2,No log,0.646467,0.255814,0.255814,1.0,0.407407,0.414773
3,No log,0.634962,0.255814,0.255814,1.0,0.407407,0.434659
4,0.707200,0.629265,0.255814,0.255814,1.0,0.407407,0.465909
5,0.707200,0.661363,0.255814,0.255814,1.0,0.407407,0.414773


Chosen threshold (val-optimized): {'thr': 0.1, 'recall': 1.0, 'precision': 0.2558139534883721, 'f1': 0.4074074074074074}


VAL (argmax): {'eval_loss': 0.6464669704437256, 'eval_accuracy': 0.2558139534883721, 'eval_precision': 0.2558139534883721, 'eval_recall': 1.0, 'eval_f1': 0.4074074074074074, 'eval_auc': 0.4147727272727273, 'eval_runtime': 0.9102, 'eval_samples_per_second': 47.241, 'eval_steps_per_second': 3.296, 'epoch': 5.0}
TEST (thresholded): {'accuracy': 0.13846153846153847, 'precision': 0.13846153846153847, 'recall': 1.0, 'f1': 0.24324324324324326, 'threshold': 0.1}
Model saved to: /Users/laurenbrixey/trained_models/evasion/microsoft_deberta-v3-small_direct_evasive


## **4.2 Functions for pipeline**

In [None]:
def model_max_len(tokenizer, model):
    m = getattr(tokenizer, "model_max_length", None)
    if m is None or m == int(1e30):
        m = getattr(getattr(model, "config", None), "max_position_embeddings", 512)
    return int(m or 512)

def token_len(tokenizer, text):
    return len(tokenizer.encode(text, add_special_tokens=False))

def compute_answer_budget(tokenizer, model, question, hyp_max_tokens, q_cap=128, safety_margin=12):
    max_len = model_max_len(tokenizer, model)            # usually 512
    specials = tokenizer.num_special_tokens_to_add(pair=True)
    q_tokens = min(token_len(tokenizer, question), q_cap)
    budget = max_len - specials - q_tokens - hyp_max_tokens - safety_margin
    return max(32, budget)

def chunk_answer_for_pair(tokenizer, answer, answer_budget, stride_tokens=128):
    """
    Chunk the ANSWER using tokenizer.tokenize (no model max-length checks),
    then stitch back to text with convert_tokens_to_string.
    """
    toks = tokenizer.tokenize(answer)  # <-- avoids the max-length warning
    if len(toks) <= answer_budget:
        return [answer]

    chunks, i = [], 0
    while i < len(toks):
        window_tokens = toks[i:i+answer_budget]
        window_text = tokenizer.convert_tokens_to_string(window_tokens)
        chunks.append(window_text)
        if i + answer_budget >= len(toks):
            break
        i += max(1, answer_budget - stride_tokens)
    return chunks

def pair_logits_chunks(model, tokenizer, device, premise, hypothesis, max_length=None, stride=128):
    if max_length is None:
        max_length = model_max_len(tokenizer, model)

    enc = tokenizer(
        premise,
        hypothesis,
        return_tensors='pt',
        truncation='only_first',          # split/truncate Q+A only
        max_length=max_length,
        stride=stride,
        return_overflowing_tokens=True,
        padding='max_length'              # <-- add this
    )

    # keep only keys the model expects
    input_names = set(getattr(tokenizer, "model_input_names",
                              ["input_ids", "attention_mask", "token_type_ids"]))

    def to_batch(enc_dict, i=None):
        batch = {}
        for k, v in enc_dict.items():
            if k in input_names and isinstance(v, torch.Tensor):
                batch[k] = (v[i:i+1] if i is not None else v).to(device)
        return batch

    # single chunk
    if enc["input_ids"].shape[0] == 1:
        batch = to_batch(enc)
        with torch.no_grad():
            logits = model(**batch).logits
        return [logits.squeeze(0)]

    # multiple overflowed chunks
    logits_list = []
    n = enc["input_ids"].shape[0]
    for i in range(n):
        batch = to_batch(enc, i)
        with torch.no_grad():
            out = model(**batch).logits
        logits_list.append(out.squeeze(0))
    return logits_list

def get_label_idx(model, name, default):
    id2label = getattr(model.config, "id2label", {})
    if id2label:
        for k, v in id2label.items():
            if name in str(v).lower():
                return int(k)
    return default

def p_entail_from_logits(logits, model, temperature=1.0):
    nlab = logits.shape[-1]
    ent_i = get_label_idx(model, "entail", 2 if nlab==3 else 1)
    probs = torch.softmax(logits / float(temperature), dim=-1)
    return float(probs[ent_i])

# --- your templates (unchanged) ---
DIRECT_TEMPLATES = [
    "The answer gives a direct and specific response to the question.",
    "The answer addresses the question explicitly and concretely.",
    "The answer responds directly with actionable specifics.",
]
EVASIVE_TEMPLATES = [
    "The answer avoids giving a direct response to the question.",
    "The answer is evasive or deflects without specifics.",
    "The answer sidesteps the question and withholds details.",
]

def llm_evasion_score(question, answer, model, tokenizer, device, temperature=2.0, stride=128):
    max_len = model_max_len(tokenizer, model)
    n_dir, n_eva = len(DIRECT_TEMPLATES), len(EVASIVE_TEMPLATES)

    p_ent_direct_list, p_ent_evasive_list = [], []

    premise = f"Q: {question}\nA: {answer}"

    # Collect P(entailment) for DIRECT hypotheses (over chunks), then mean over templates
    for h in DIRECT_TEMPLATES:
        logits_chunks = pair_logits_chunks(model, tokenizer, device, premise, h, max_length=max_len, stride=stride)
        # For each chunk, compute P(entail); take the max across chunks (recall-friendly)
        pents = [p_entail_from_logits(log, model, temperature) for log in logits_chunks]
        p_ent_direct_list.append(max(pents))

    # Same for EVASIVE hypotheses
    for h in EVASIVE_TEMPLATES:
        logits_chunks = pair_logits_chunks(model, tokenizer, device, premise, h, max_length=max_len, stride=stride)
        pents = [p_entail_from_logits(log, model, temperature) for log in logits_chunks]
        p_ent_evasive_list.append(max(pents))

    # Mean over templates
    p_ent_direct  = float(torch.tensor(p_ent_direct_list).mean())
    p_ent_evasive = float(torch.tensor(p_ent_evasive_list).mean())

    # Neutral-aware normalization (don’t force a 2-class softmax over logits)
    denom = p_ent_evasive + p_ent_direct + 1e-9
    p_evasive = float(p_ent_evasive / denom)
    p_direct  = 1.0 - p_evasive

    return {
        'p_direct': p_direct,
        'p_evasive': p_evasive,
        'p_ent_direct': p_ent_direct,
        'p_ent_evasive': p_ent_evasive
    }

# **5. Evasion Detection Pipeline**

## **5.1 Functions**

In [None]:
# Function to compute blended evasion score and return all scores.
def compute_all_evasion_scores(q, a, *, models_and_tokenizers=models_and_tokenizers, device, LLM_WEIGHT=0.30):
    
    # Compute baseline evasion score.
    base_score, _, _, _, _ = baseline_evasion_score(q, a)

    # Individual LLM scores.
    llm_scores = {}
    for name, (m, t) in models_and_tokenizers.items():
        scores = llm_evasion_score(q, a, m, t, device)
        llm_scores[name] = float(100.0 * scores['p_evasive'])

    # Ensemble LLM score.
    llm_avg = float(np.mean(list(llm_scores.values()))) if llm_scores else 0.0

    # Compute blended score.
    blended_score = float(np.clip((1.0 - LLM_WEIGHT) * base_score + LLM_WEIGHT * llm_avg, 0.0, 100.0))

    return {
        'baseline': base_score,
        'llm_individual': llm_scores,
        'llm_avg': llm_avg,
        'blended': blended_score
        }

In [None]:
# Function to label 'Direct' or 'Evasive' based on the score.
def label_from_score(score, threshold):
    return 'Evasive' if score >= threshold else 'Direct'

In [None]:
# Evasion Pipeline.
def evasion_pipeline(df, models_and_tokenizers, device, LLM_WEIGHT, EVASION_THRESHOLD_BASE, EVASION_THRESHOLD_LLM, EVASION_THRESHOLD_BLENDED):

    records = []

    for _, row in df.iterrows():
        q, a = str(row['question']), str(row['answer'])
        output = compute_all_evasion_scores(q=q, a=a, LLM_WEIGHT=LLM_WEIGHT, models_and_tokenizers=models_and_tokenizers, device=device)

        pred_base = label_from_score(output['baseline'], EVASION_THRESHOLD_BASE)
        pred_llm_avg = label_from_score(output['llm_avg'], EVASION_THRESHOLD_LLM)
        pred_blended = label_from_score(output['blended'], EVASION_THRESHOLD_BLENDED)

        record = {
            'question_number': row.get('question_number'),
            'question': q,
            'answer': a,

            # Evasion Scores
            'evasion_score_baseline': int(output['baseline']),
            'evasion_score_llm_avg': int(output['llm_avg']),
            "evasion_score_blended": int(output['blended']),

            # Predicted labels.
            'prediction_baseline': pred_base,
            'prediction_llm_avg': pred_llm_avg,
            'prediction_blended': pred_blended,
        }

        for model_name, score in output['llm_individual'].items():
            record[f'evasion_score_{model_name}'] = int(score)
            record[f'prediction_{model_name}'] = label_from_score(score, EVASION_THRESHOLD_LLM)

        records.append(record)

    return pd.DataFrame(records)

## **5.2 Threshold Tuning & Model Selection**

In [None]:
# Perform an initial run with preliminary threshold values.
LLM_WEIGHT = 0.30
EVASION_THRESHOLD_BASE = 30.0
EVASION_THRESHOLD_LLM = 30.0
EVASION_THRESHOLD_BLENDED = 30.0

jpm_val_qa_scores = evasion_pipeline(
    jpm_val_qa_labelled, 
    models_and_tokenizers, 
    device, 
    LLM_WEIGHT, 
    EVASION_THRESHOLD_BASE, 
    EVASION_THRESHOLD_LLM, 
    EVASION_THRESHOLD_BLENDED
    )

In [None]:
# View the results and reappend the label.
jpm_val_qa_scores['label'] = jpm_val_qa_labelled['label'].values
jpm_val_qa_scores.head()

In [None]:
# Function to extract ground truth (1 = Evasive, 0 = Direct)
def extract_y_true(df):
    return (df['label'].astype(str).str.strip().str.lower() == 'evasive').astype(int).values

In [None]:
# Function calculate metrics for each threshold.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def tune_threshold(df, score_col, thr_grid):
    y_true = extract_y_true(df)                     # get true labels
    scores = df[score_col].astype(float).values     # get raw evasion scores 

    rows = []
    for thr in thr_grid:
        y_pred = (scores >= thr).astype(int) # label response evasive (1) if score is higher than threshold

        precision = precision_score(y_true, y_pred, zero_division=0)
        recall = recall_score(y_true, y_pred, zero_division=0)
        f1 = f1_score(y_true, y_pred, zero_division=0)
        accuracy = accuracy_score(y_true, y_pred)

        rows.append({
            'threshold': float(thr),
            'precision': precision,
            'recall': recall,
            'f1': f1,
            'accuracy': accuracy
        })
    
    results = pd.DataFrame(rows).sort_values(
        by=['f1', 'recall'],
        ascending=[False, False]
        ).reset_index(drop=True)
    
    return results

In [None]:
# Define threshold ranges around current thresholds.
thr_base_grid = np.arange(40, 85, 5)
thr_llm_grid = np.arange(35, 85, 5)
thr_blend_grid = np.arange(40, 85, 5)

In [None]:
# Baseline / blended / avg LLM 
base_results = tune_threshold(jpm_val_qa_scores, 'evasion_score_baseline', thr_base_grid)
llm_avg_results = tune_threshold(jpm_val_qa_scores, 'evasion_score_llm_avg', thr_llm_grid)
blend_results = tune_threshold(jpm_val_qa_scores, 'evasion_score_blended', thr_blend_grid)

# Individual LLM models
roberta_results = tune_threshold(jpm_val_qa_scores, 'evasion_score_roberta', thr_llm_grid)
deberta_results = tune_threshold(jpm_val_qa_scores, 'evasion_score_deberta', thr_llm_grid)
zs_deberta_results = tune_threshold(jpm_val_qa_scores, 'evasion_score_zs_deberta', thr_llm_grid)

In [None]:
# Extract the best thresholds based on recall.
best_base_thr = base_results.loc[0, 'threshold']
best_avg_llm_thr = llm_avg_results.loc[0, 'threshold']
best_blend_thr = blend_results.loc[0, 'threshold']

best_roberta_thr = roberta_results.loc[0, 'threshold']
best_deberta_thr = deberta_results.loc[0, 'threshold']
best_zs_derberta_thr = zs_deberta_results.loc[0, 'threshold']

print('Best Baseline Threshold:', best_base_thr)
print('Best avg LLM Threshold:', best_avg_llm_thr)
print('Best Blended Threshold', best_base_thr)

print('Best roberta Threshold:', best_roberta_thr)
print('Best deberta Threshold:', best_deberta_thr)
print('Best zs deberta Threshold', best_zs_derberta_thr)

Best Baseline Threshold: 40.0
Best avg LLM Threshold: 50.0
Best Blended Threshold 40.0
Best roberta Threshold: 35.0
Best deberta Threshold: 60.0
Best zs deberta Threshold 55.0


In [None]:
# Inspect trade-offs.
print('\nTop 5 baseline configs:\n', base_results.head())
print('\nTop 5 llm configs:\n', llm_avg_results.head())
print('\nTop 5 blended configs:\n', blend_results.head())

print('\nTop 5 roberta configs:\n', roberta_results.head())
print('\nTop 5 deberta configs:\n', deberta_results.head())
print('\nTop 5 zs deberta configs:\n', zs_deberta_results.head())


Top 5 baseline configs:
    threshold  precision    recall        f1  accuracy
0       40.0   0.208955  1.000000  0.345679  0.258741
1       45.0   0.183486  0.714286  0.291971  0.321678
2       65.0   0.209677  0.464286  0.288889  0.552448
3       70.0   0.236842  0.321429  0.272727  0.664336
4       55.0   0.177778  0.571429  0.271186  0.398601

Top 5 llm configs:
    threshold  precision    recall        f1  accuracy
0       50.0   0.211538  0.785714  0.333333  0.384615
1       35.0   0.198529  0.964286  0.329268  0.230769
2       40.0   0.198413  0.892857  0.324675  0.272727
3       55.0   0.215190  0.607143  0.317757  0.489510
4       45.0   0.198198  0.785714  0.316547  0.335664

Top 5 blended configs:
    threshold  precision    recall        f1  accuracy
0       40.0   0.201439  1.000000  0.335329  0.223776
1       65.0   0.250000  0.464286  0.325000  0.622378
2       45.0   0.193548  0.857143  0.315789  0.272727
3       50.0   0.183486  0.714286  0.291971  0.321678
4       55

## **5.2 Optimised Evaluation**

## **5.3 2025 Predictions**