In [1]:
!ls

initial_benchmarks.ipynb    initial_train_finbert.ipynb


### <u>1. Imports</u>

**System & Data**: `pathlib` for handling file paths, `gc` for memory management, `pandas` for loading our data, and `scikit-learn` for splitting it.

**Machine Learning**: `torch` for the base models and `onnxruntime` for inference and its quantisation tools.

**Hugging Face**: `transformers` for loading our pre-trained models and tokenisers.

We also import Python's `logging` library and raise the level of the `onnxruntime` logger to `ERROR`. This is meant to keep routine Python-side warnings from cluttering the output during the quantisation process.

In [71]:
# Standard & System
import gc
from pathlib import Path
import pandas as pd
from sklearn.model_selection import train_test_split

# Data & ML
import numpy as np
import torch
import onnxruntime as ort
from onnxruntime.quantization import quantize_static, CalibrationDataReader, QuantType

# Hugging Face
from transformers import AutoTokenizer, AutoModelForSequenceClassification

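# Suppress ONNX Runtime logging
import logging

# Python-side loggers (e.g. those used by the quantisation tools)
logging.getLogger("onnxruntime").setLevel(logging.ERROR)
# The native runtime has its own severity setting: 3 = ERROR
ort.set_default_logger_severity(3)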

### <u>2. Configuration & Model Discovery</u>

This section sets up key configuration variables and prepares all the assets needed for the main loop.

`BASE_DIR`: The root folder where your models are stored.

`ONNX_OPSET_VERSION`: We set this to 17, a relatively recent opset chosen so that operators used by newer model architectures can be exported.

`DATA_FILE_PATH`, `RANDOM_SEED`, `TEST_SIZE`: Parameters to ensure we can reliably load and split our financial text data to get a consistent calibration set.
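
To illustrate why a fixed seed gives a consistent calibration set, here is a minimal pure-Python sketch of a seeded, stratified split. It is not how `train_test_split` is implemented; the helper `stratified_split` and the toy rows are hypothetical and only show that the same seed reproduces the same test rows while keeping label proportions.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_size, seed):
    """Split (label, text) rows into train/test, keeping label proportions."""
    rng = random.Random(seed)  # a private, seeded generator makes the split repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[0]].append(row)

    train, test = [], []
    for label in sorted(by_label):  # sorted so the iteration order is deterministic
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data: 8 positive and 4 negative sentences (hypothetical)
rows = [("positive", f"p{i}") for i in range(8)] + [("negative", f"n{i}") for i in range(4)]
train_a, test_a = stratified_split(rows, test_size=0.25, seed=42)
train_b, test_b = stratified_split(rows, test_size=0.25, seed=42)
```

Both calls return identical splits, which is the property the notebook relies on when it re-creates the test set with `RANDOM_SEED`.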

In [72]:
# Model & ONNX Configuration
BASE_DIR = Path("models")
ONNX_OPSET_VERSION = 17

# Data & Split Configuration 
DATA_FILE_PATH = Path("data/FinancialPhraseBank/all-data.csv")
RANDOM_SEED = 42
TEST_SIZE = 0.25 # 25% for the test set


def is_valid_model_dir(d: Path) -> bool:
    """Checks if a directory contains a valid Hugging Face model."""
    config_exists = (d / "config.json").exists()
    model_file_exists = (d / "pytorch_model.bin").exists() or (d / "model.safetensors").exists()
    return config_exists and model_file_exists


def prepare_calibration_data(data_path, test_size, random_seed, num_samples=100):
    """Loads, splits, and samples the data to create a calibration set."""
    print(f"Loading data from {data_path}...")
    df = pd.read_csv(
        data_path,
        header=None,
        names=['sentiment', 'text'],
        encoding='latin-1')

    # Split data to get the test set
    _, test_df = train_test_split(
        df, test_size=test_size, random_state=random_seed, stratify=df['sentiment'])

    # Sample the calibration set from the test data
    calibration_df = test_df.sample(n=num_samples, random_state=random_seed)
    print(f"✅ Created a calibration dataset with {len(calibration_df)} samples.")
    return calibration_df


# Find all valid model directories
model_dirs = [d for d in BASE_DIR.iterdir() if d.is_dir() and is_valid_model_dir(d)]
print(f"✅ Found {len(model_dirs)} valid models.")

# Call the function to prepare data
calibration_df = prepare_calibration_data(DATA_FILE_PATH, TEST_SIZE, RANDOM_SEED)


✅ Found 5 valid models.
Loading data from data/FinancialPhraseBank/all-data.csv...
✅ Created a calibration dataset with 100 samples.


### <u>3. ONNX Helper Class & Export Function</u>

Here we define two helper classes for our pipeline.

`ONNXExportWrapper` is a small `torch.nn.Module` that wraps our Hugging Face model. Its only job is to make the model return a plain logits tensor rather than a Hugging Face output object, which keeps the output of the exported ONNX graph simple.

`TextCalibrationDataReader` is an essential class for performing static quantisation. Its role is to feed our calibration data to the ONNX Runtime tool. It's built to be robust:

- It inspects the ONNX model file to find out exactly which inputs it needs (e.g., `input_ids`, `attention_mask`).
- It tokenises the text from our calibration dataframe.
- It then provides this data one sample at a time, ensuring the dictionary it yields perfectly matches the model's required inputs. This prevents errors when quantising models with different architectures.
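
The reader protocol described above can be sketched without ONNX Runtime installed. The class below is a hypothetical stand-in (`ListCalibrationReader` is not part of any library) that works on plain lists: it drops tokenizer outputs the model does not declare, hands out one sample per `get_next()` call, and returns `None` once the data is exhausted.

```python
class ListCalibrationReader:
    """Stand-in for a calibration reader: one sample dict per call, then None."""

    def __init__(self, tokenized, model_input_names):
        # Keep only the keys the (hypothetical) model declares as inputs.
        self.feed = {k: v for k, v in tokenized.items() if k in model_input_names}
        self.num_samples = len(next(iter(self.feed.values())))
        self.index = 0

    def get_next(self):
        if self.index >= self.num_samples:
            return None  # signals the end of the calibration data
        i = self.index
        self.index += 1
        # Slicing i:i+1 keeps a leading batch dimension of size 1.
        return {name: rows[i:i + 1] for name, rows in self.feed.items()}


# Toy "tokenizer output" for two sentences; token_type_ids is not a model input here.
tokenized = {
    "input_ids": [[101, 7, 102], [101, 9, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
    "token_type_ids": [[0, 0, 0], [0, 0, 0]],
}
reader = ListCalibrationReader(tokenized, {"input_ids", "attention_mask"})
first = reader.get_next()
second = reader.get_next()
done = reader.get_next()
```

Filtering by the model's declared input names is what lets one reader serve architectures with and without `token_type_ids`.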

In [73]:
class ONNXExportWrapper(torch.nn.Module):
    """A wrapper to ensure model output is a simple tensor for ONNX compatibility."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        outputs = self.model(
            input_ids=input_ids, attention_mask=attention_mask, return_dict=False
        )
        return outputs[0]


class TextCalibrationDataReader(CalibrationDataReader):
    """A robust data reader that adapts to the model's specific inputs."""
    def __init__(self, data_df: pd.DataFrame, tokenizer, onnx_model_path: Path):
        self.tokenizer = tokenizer
        self.data_list = data_df["text"].tolist()
        self.index = 0

        # Find the model's required inputs
        session = ort.InferenceSession(str(onnx_model_path), providers=["CPUExecutionProvider"])
        model_inputs = {inp.name for inp in session.get_inputs()}  # 'inp' avoids shadowing the built-in input()

        # Tokenize all data and filter to only include the model's inputs
        tokenized_data = self.tokenizer(
            self.data_list, padding="max_length", truncation=True, max_length=128, return_tensors="np"
        )
        self.feed = {
            key: tokenized_data[key] for key in tokenized_data if key in model_inputs
        }
        self.input_names = list(self.feed.keys())

    def get_next(self):
        if self.index >= len(self.data_list):
            return None

        item = {name: self.feed[name][self.index:self.index+1] for name in self.input_names}
        self.index += 1
        return item

### <u>4. Main Processing & Export Loop</u>

This is the main loop where everything comes together. For clarity, the `export_model_to_onnx` function is defined here as well. For each model found, the loop performs the following steps, skipping any step whose output file already exists:

**Step 1: Export to ONNX**

It first checks if a standard ONNX version of the model already exists. If it doesn't, it loads the full PyTorch model and tokeniser, exports the model to the ONNX format using our helper function, and then clears the large model from memory to conserve resources.

**Step 2: Static Quantisation**

Next, it checks whether a statically quantised model already exists. If not, it uses our `TextCalibrationDataReader` to feed the calibration data into the `quantize_static` function. This tool runs the calibration samples through the model to record activation ranges, then writes a quantised `.onnx` file with 8-bit integer weights and activations.
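
As a rough illustration of what "8-bit integer weights and activations" means, the sketch below maps a calibrated float range onto int8 with a scale and zero point, then maps the integers back. It assumes a simple asymmetric (affine) scheme; the function names are hypothetical, and ONNX Runtime's own calibrators and quantisation options are not reproduced here.

```python
def quant_params(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point that map [x_min, x_max] onto the int8 range."""
    # The range is widened to include 0.0 so that zero maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Float -> int8, clamping values that fall outside the calibrated range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """int8 -> approximate float."""
    return (q - zero_point) * scale


# Pretend calibration observed activations in [-1.0, 3.0] (hypothetical numbers).
scale, zero_point = quant_params(-1.0, 3.0)
values = [-1.0, 0.0, 0.5, 3.0]
restored = [dequantize(quantize(v, scale, zero_point), scale, zero_point) for v in values]
```

Each restored value differs from the original by at most one quantisation step (`scale`), which is why the calibration range matters: a wider range means a larger step and coarser values.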

In [74]:
## 🔁 Main Processing Loop

def export_model_to_onnx(model, tokenizer, onnx_path: Path, opset_version: int):
    """Exports a PyTorch model to the ONNX format."""
    print("   - Wrapping model for ONNX export...")
    wrapped_model = ONNXExportWrapper(model)
    wrapped_model.eval()
    dummy_input = tokenizer("This is a sample sentence.", return_tensors="pt")
    print(f"   - 🚀 Exporting to ONNX (Opset {opset_version})...")
    torch.onnx.export(
        model=wrapped_model,
        args=(dummy_input["input_ids"], dummy_input["attention_mask"]),
        f=str(onnx_path), input_names=["input_ids", "attention_mask"], output_names=["output"],
        dynamic_axes={
            "input_ids": {0: "batch_size", 1: "sequence_length"},
            "attention_mask": {0: "batch_size", 1: "sequence_length"},
            "output": {0: "batch_size"},
        },
        opset_version=opset_version, do_constant_folding=True,
    )
    print(f"   - ✅ Model successfully exported to {onnx_path.name}")

for model_dir in model_dirs:
    print("-" * 70)
    print(f"⏳ Processing model: {model_dir.name}")

    onnx_dir = model_dir / "onnx"
    onnx_dir.mkdir(exist_ok=True)
    onnx_model_path = onnx_dir / "model.onnx"
    quantised_model_path = onnx_dir / "model-static-quant.onnx"

    # --- Step 1: Export to ONNX if needed ---
    if not onnx_model_path.exists():
        print("   - 📦 ONNX model not found. Starting export...")
        try:
            tokenizer = AutoTokenizer.from_pretrained(model_dir)
            model = AutoModelForSequenceClassification.from_pretrained(model_dir)
            export_model_to_onnx(model, tokenizer, onnx_model_path, ONNX_OPSET_VERSION)
            del model, tokenizer
            gc.collect()
        except Exception as e:
            print(f"   - ❌ Export failed for {model_dir.name}: {e}")
            continue
    else:
        print("   - ✅ Standard ONNX model already exists.")

    # --- Step 2: Perform Static Quantisation if needed ---
    if onnx_model_path.exists() and not quantised_model_path.exists():
        print(f"   - ⚖️ Performing static quantisation for {onnx_model_path.name}...")
        try:
            tokenizer = AutoTokenizer.from_pretrained(model_dir)
            calibration_data_reader = TextCalibrationDataReader(calibration_df, tokenizer, onnx_model_path)
            quantize_static(
                model_input=onnx_model_path,
                model_output=quantised_model_path,
                calibration_data_reader=calibration_data_reader,
                weight_type=QuantType.QInt8,
            )
            print(f"   - ✅ Statically quantised model saved to {quantised_model_path.name}")
        except Exception as e:
            print(f"   - ❌ Static quantisation failed for {model_dir.name}: {e}")
    elif quantised_model_path.exists():
        print("   - ✅ Statically quantised model already exists.")

print("-" * 70)
print("🎉 All models have been processed.")

----------------------------------------------------------------------
⏳ Processing model: all-MiniLM-L6-v2-financial-sentiment
   - ✅ Standard ONNX model already exists.
   - ✅ Statically quantised model already exists.
----------------------------------------------------------------------
⏳ Processing model: distilbert-financial-sentiment
   - ✅ Standard ONNX model already exists.
   - ✅ Statically quantised model already exists.
----------------------------------------------------------------------
⏳ Processing model: finbert-tone-financial-sentiment
   - ✅ Standard ONNX model already exists.
   - ✅ Statically quantised model already exists.
----------------------------------------------------------------------
⏳ Processing model: tinybert-financial-classifier
   - ✅ Standard ONNX model already exists.
   - ✅ Statically quantised model already exists.
----------------------------------------------------------------------
⏳ Processing model: mobilebert-uncased-financial-sentiment
   

### <u> 1. Setup & Configuration </u>

This first cell handles the initial setup. We import all necessary libraries, configure logging to provide clean output, and define the dataclasses that hold our configuration settings (`BenchmarkConfig`) and store the final measurements (`BenchmarkResult`).

In [75]:
import gc
import logging
import platform
import statistics
import time
from contextlib import contextmanager
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import numpy as np
import onnxruntime as ort
import pandas as pd
import psutil
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer

# Configure logging for clear output in the notebook
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


@dataclass
class BenchmarkConfig:
    """Configuration for the entire benchmarking run."""
    benchmark_iterations: int = 100
    warmup_iterations: int = 20
    batch_sizes: Optional[List[int]] = None  # filled in by __post_init__ when omitted
    accuracy_sample_size: int = 500
    test_csv_path: Optional[str] = None
    device_mode: str = "auto"  # "auto", "cpu", or "gpu"

    def __post_init__(self):
        if self.batch_sizes is None:
            self.batch_sizes = [1, 2, 4, 8]

@dataclass
class BenchmarkResult:
    """A structured class to hold results from a single benchmark run."""
    model: str
    batch_size: int
    avg_latency_ms: float
    p95_latency_ms: float
    throughput_samples_per_sec: float
    peak_memory_mb: float
    model_size_mb: float
    provider: str
    accuracy: Optional[float] = None
    f1_score: Optional[float] = None

    def to_dict(self) -> Dict:
        return asdict(self)

### <u> 2. Component: Hardware & Model Loading </u>

These classes handle the initialisation of the ONNX Runtime session. `ExecutionProviderManager` picks execution providers from a platform-specific preference list, keeping only those that are actually available (for example, `CUDAExecutionProvider` ahead of `CPUExecutionProvider` on Linux and Windows), and `ModelLoader` creates the inference session.
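
The preference-list idea can be tried in isolation. This is a minimal sketch in which the available providers and the platform name are passed in as plain arguments (the function `choose_providers` is hypothetical), so it runs without ONNX Runtime installed.

```python
from typing import List


def choose_providers(available: List[str], system: str) -> List[str]:
    """Return the preferred providers that are actually available, in priority order."""
    if system == "Darwin":
        preferences = ["CPUExecutionProvider", "CoreMLExecutionProvider"]
    else:
        preferences = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # Filtering keeps the preference order and silently drops missing providers.
    return [p for p in preferences if p in available]


linux_gpu = choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"], "Linux")
linux_cpu_only = choose_providers(["CPUExecutionProvider"], "Linux")
mac = choose_providers(["CoreMLExecutionProvider", "CPUExecutionProvider"], "Darwin")
```

Because the result is filtered against what is installed, a machine without CUDA simply falls back to the CPU provider rather than failing.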

In [76]:
class ExecutionProviderManager:
    """Manages ONNX execution providers based on platform and preferences."""
    @staticmethod
    def get_execution_providers(mode: str = "auto") -> List[str]:
        available = ort.get_available_providers()
        if mode == "cpu":
            # Explicit CPU request: skip accelerators entirely
            preferences = ["CPUExecutionProvider"]
        elif platform.system() == "Darwin":
            # macOS: prioritise CPU, with CoreML as a secondary provider
            preferences = ["CPUExecutionProvider", "CoreMLExecutionProvider"]
        else:
            # Linux/Windows ("auto" or "gpu"): prioritise CUDA, fall back to CPU
            preferences = ["CUDAExecutionProvider", "CPUExecutionProvider"]

        chosen = [p for p in preferences if p in available]
        logger.info(f"Available providers: {available}. Mode '{mode}' selected: {chosen}")
        return chosen

class ModelLoader:
    """Handles loading an ONNX model into an inference session."""
    @staticmethod
    def load_onnx_session(onnx_path: Path, providers: List[str]) -> ort.InferenceSession:
        opts = ort.SessionOptions()
        opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
        logger.info(f"Creating ONNX session for {onnx_path.name} with providers: {providers}")
        return ort.InferenceSession(str(onnx_path), providers=providers, sess_options=opts)

### <u> 3. Component: Data Handling </u>

The `DataProcessor` class is responsible for all data-related tasks. It can load the test dataset from a CSV file and prepare text inputs by tokenising them into the format required by the models.
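
To make "the format required by the models" concrete, the sketch below pads and truncates lists of token ids to a fixed length and builds the matching attention mask. It is a simplification with a hypothetical helper (`pad_batch`); a real tokenizer also handles special tokens during truncation, which is ignored here.

```python
from typing import Dict, List


def pad_batch(token_ids: List[List[int]], max_length: int, pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad or truncate every sequence to max_length and build the attention mask."""
    input_ids, attention_mask = [], []
    for ids in token_ids:
        ids = ids[:max_length]              # truncate sequences that are too long
        n_pad = max_length - len(ids)       # number of padding positions needed
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# One short and one long sequence (toy token ids)
batch = pad_batch([[101, 2023, 102], [101, 1, 2, 3, 4, 5, 102]], max_length=5)
```

Every row ends up with the same length, so the batch can be stacked into a single rectangular array before it is passed to the inference session.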

In [77]:
class DataProcessor:
    """Handles data preprocessing and batch preparation."""
    def __init__(self, tokenizer, max_length: int = 128):
        self.tokenizer, self.max_length = tokenizer, max_length
        self.example_inputs = ["Stocks surged after the company reported record earnings."]

    def prepare_batch_inputs(self, texts: List[str]) -> Dict[str, np.ndarray]:
        encoding = self.tokenizer(
            texts, return_tensors="np", max_length=self.max_length,
            padding="max_length", truncation=True
        )
        # Cast every input to int64, the integer type the exported models expect
        return {k: v.astype(np.int64) for k, v in encoding.items()}

    def load_test_dataset(self, csv_path: Path) -> Tuple[List[str], List[int]]:
        df = pd.read_csv(csv_path, names=["label", "text"], encoding="latin1")
        df = df.dropna(subset=["label", "text"])
        df["label"] = df["label"].astype('category').cat.codes
        _, test_df = train_test_split(df, test_size=0.25, random_state=42, stratify=df["label"])
        logger.info(f"Loaded {len(df)} rows; using {len(test_df)} for accuracy testing.")
        return test_df["text"].tolist(), test_df["label"].astype(int).tolist()

### <u> 4. Component: Performance Measurement </u>

These classes handle the actual performance measurements. `PerformanceMonitor` provides tools to track memory usage and model file size, while `LatencyBenchmarker` first warms the session up and then times many inference iterations to measure latency.
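
The warm-up-then-measure pattern is independent of ONNX Runtime. As a small sketch, the hypothetical helper below times any zero-argument callable with `time.perf_counter`, discarding the warm-up calls so that one-off start-up costs do not skew the numbers.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], warmup: int, iterations: int) -> List[float]:
    """Run fn untimed `warmup` times, then return `iterations` timings in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not recorded
    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return timings_ms


calls = []  # records every invocation so we can count them
timings = time_calls(lambda: calls.append(sum(range(1000))), warmup=3, iterations=10)
```

Here the callable ran 13 times in total, but only the last 10 runs contribute a timing.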

In [78]:
class PerformanceMonitor:
    """Monitors system performance during benchmarking."""
    @staticmethod
    def measure_memory_usage() -> float:
        """Returns the current process's memory usage in MB."""
        return psutil.Process().memory_info().rss / (1024**2)

    @staticmethod
    def get_model_size_mb(onnx_path: Path) -> float:
        """Returns the model's file size in MB."""
        return onnx_path.stat().st_size / (1024**2)

class LatencyBenchmarker:
    """Handles the details of latency benchmarking."""
    def __init__(self, config: BenchmarkConfig):
        self.config = config
    
    def warmup_session(self, session: ort.InferenceSession, inputs: Dict[str, np.ndarray]):
        logger.info(f"🔥 Warming up with {self.config.warmup_iterations} iterations...")
        for _ in range(self.config.warmup_iterations):
            session.run(None, inputs)
    
    def measure_latency(self, session: ort.InferenceSession, inputs: Dict[str, np.ndarray]) -> List[float]:
        logger.info(f"⏱️  Running latency benchmark ({self.config.benchmark_iterations} iterations)...")
        times = []
        for _ in range(self.config.benchmark_iterations):
            start = time.perf_counter()
            session.run(None, inputs)
            times.append((time.perf_counter() - start) * 1000) # Append time in ms
        return times

### <u> 5. Component: Accuracy Evaluation </u>

The `AccuracyEvaluator` class is dedicated to measuring the model's correctness. It runs the model on the test dataset to calculate the accuracy and the weighted F1 score.
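
For readers who want to see what the two metrics compute, here is a hand-rolled sketch on toy labels. The notebook itself uses scikit-learn; these functions are hypothetical re-implementations of accuracy and support-weighted F1 for illustration only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with each class weighted by its number of true samples."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy three-class example (labels are arbitrary integers)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
```

In this example four of six predictions are correct, and the per-class F1 scores (0.5, 0.8 and 2/3) are averaged with equal weights because each class has two true samples.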

In [79]:
class AccuracyEvaluator:
    """Handles model accuracy and F1 score evaluation."""
    def __init__(self, session: ort.InferenceSession, data_processor: DataProcessor):
        self.session, self.data_processor = session, data_processor
    
    def evaluate(self, texts: List[str], labels: List[int], batch_size: int, max_samples: int):
        num_samples = min(len(texts), max_samples)
        eval_texts, eval_labels = texts[:num_samples], labels[:num_samples]
        
        logger.info(f"🎯 Evaluating accuracy on {num_samples} samples (batch size: {batch_size})...")
        all_predictions = []
        for i in range(0, num_samples, batch_size):
            batch_texts = eval_texts[i: i + batch_size]
            inputs = self.data_processor.prepare_batch_inputs(batch_texts)
            
            # Filter inputs to only what the model actually needs
            model_inputs = {inp.name for inp in self.session.get_inputs()}
            valid_inputs = {k: v for k, v in inputs.items() if k in model_inputs}
            
            outputs = self.session.run(None, valid_inputs)
            all_predictions.extend(np.argmax(outputs[0], axis=1))
        
        accuracy = accuracy_score(eval_labels, all_predictions)
        f1 = f1_score(eval_labels, all_predictions, average="weighted")
        logger.info(f"   -> Accuracy: {accuracy:.2%}, F1 Score: {f1:.2%}")
        return accuracy, f1

### <u> 6. Main Orchestrator Class </u>

This is the main orchestrator class. `ONNXModelBenchmarker` initialises all the helper components and contains the primary `benchmark_model` method, which executes the full benchmark pipeline for a single model and batch size.
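
The summary statistics the orchestrator reports can be worked through on a handful of made-up timings. This sketch uses only the standard library; the `percentile` helper is hypothetical and uses linear interpolation, which is the default method of `np.percentile`.

```python
import statistics
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Linearly interpolated percentile of a list of numbers."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100      # fractional index into the sorted data
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


times_ms = [10.0, 12.0, 11.0, 13.0, 50.0]    # made-up latencies with one slow outlier
batch_size = 4

avg_ms = statistics.mean(times_ms)           # 19.2 ms
p95_ms = percentile(times_ms, 95)            # 42.6 ms
throughput = 1000 * batch_size / avg_ms      # samples per second
```

The single 50 ms outlier barely moves the median but dominates the 95th percentile, which is why tail latency is reported alongside the mean.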

In [80]:
class ONNXModelBenchmarker:
    """Orchestrates all components to run a benchmark for a single model."""
    def __init__(self, config: BenchmarkConfig, tokenizer):
        self.config = config
        self.data_processor = DataProcessor(tokenizer, max_length=128)
        self.latency_benchmarker = LatencyBenchmarker(config)
    
    def benchmark_model(self, model_name: str, onnx_path: Path, batch_size: int) -> Optional[BenchmarkResult]:
        try:
            logger.info(f"\n{'='*60}\n🚀 BENCHMARKING: {model_name} | Batch Size: {batch_size}\n{'='*60}")
            providers = ExecutionProviderManager.get_execution_providers(self.config.device_mode)
            session = ModelLoader.load_onnx_session(onnx_path, providers)

            # Prepare inputs and filter to what the model actually needs
            inputs = self.data_processor.prepare_batch_inputs(self.data_processor.example_inputs * batch_size)
            model_inputs = {inp.name for inp in session.get_inputs()}
            valid_inputs = {k: v for k, v in inputs.items() if k in model_inputs}

            mem_before = PerformanceMonitor.measure_memory_usage()
            self.latency_benchmarker.warmup_session(session, valid_inputs)
            times = self.latency_benchmarker.measure_latency(session, valid_inputs)
            mem_after = PerformanceMonitor.measure_memory_usage()
            
            avg_latency = statistics.mean(times)
            p95_latency = np.percentile(times, 95)
            
            accuracy, f1 = None, None
            if self.config.test_csv_path:
                evaluator = AccuracyEvaluator(session, self.data_processor)
                texts, labels = self.data_processor.load_test_dataset(Path(self.config.test_csv_path))
                accuracy, f1 = evaluator.evaluate(texts, labels, batch_size, self.config.accuracy_sample_size)

            return BenchmarkResult(
                model=model_name, batch_size=batch_size,
                avg_latency_ms=avg_latency, p95_latency_ms=p95_latency,
                throughput_samples_per_sec=(1000 * batch_size) / avg_latency if avg_latency > 0 else 0,
                peak_memory_mb=mem_after,  # RSS sampled after the run; an approximation, not a true peak
                model_size_mb=PerformanceMonitor.get_model_size_mb(onnx_path),
                provider=session.get_providers()[0],
                accuracy=accuracy, f1_score=f1
            )
        except Exception as e:
            logger.error(f"❌ Benchmark failed for {model_name} (batch {batch_size}): {e}", exc_info=True)
            return None

### <u> 7. Results Management </u>

The `ResultsManager` class handles the final reporting. It prints a summary table to the console and saves the full, detailed results to a CSV file.
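
The dataclass-to-CSV step can be shown on its own. The sketch below uses the standard `csv` module instead of pandas, and `Row` is a hypothetical, trimmed-down stand-in for `BenchmarkResult` with made-up values.

```python
import csv
import io
from dataclasses import asdict, dataclass


@dataclass
class Row:  # hypothetical stand-in with only three of the result fields
    model: str
    batch_size: int
    avg_latency_ms: float


rows = [Row("model-a", 1, 3.21), Row("model-a", 4, 9.87)]

# Write to an in-memory buffer; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(asdict(rows[0])), lineterminator="\n")
writer.writeheader()
writer.writerows(asdict(r) for r in rows)
csv_text = buffer.getvalue()
```

Converting each result with `asdict` means the column names always follow the dataclass fields, so adding a field later extends the CSV without further changes.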

In [81]:
class ResultsManager:
    """Manages benchmark results and reporting."""
    @staticmethod
    def save_results(results: List[BenchmarkResult], output_dir: Path = Path("results2")):
        if not results: return
        output_dir.mkdir(exist_ok=True)
        df = pd.DataFrame([r.to_dict() for r in results])
        df.to_csv(output_dir / "benchmark_results_debugging.csv", index=False)
        logger.info(f"💾 Results saved to '{output_dir.resolve()}'")

    @staticmethod
    def print_summary(results: List[BenchmarkResult]):
        if not results: return
        df = pd.DataFrame([r.to_dict() for r in results])
        summary_cols = ["model", "batch_size", "provider", "avg_latency_ms", "p95_latency_ms", "accuracy"]
        print("\n" + "="*80 + "\n📊 BENCHMARK SUMMARY\n" + "="*80)
        print(df[summary_cols].to_string(index=False, float_format="%.2f"))
        print("="*80)

### <u> 8. Benchmark Runner Functions </u>

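These two functions drive the whole process. `discover_models` scans each model folder's `onnx/` subdirectory for a quantised model named `model-quantised.onnx` (the lookup for the standard `model.onnx` is currently commented out), and `run_full_benchmark` iterates through the models found, loading each model's tokenizer, calling the orchestrator, and collecting the results. Note that `model-quantised.onnx` is not the filename written by the export loop above (`model-static-quant.onnx`), so only models that already have a file under the former name are benchmarked.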
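
The discovery step can be tried against a throwaway directory tree. This is a minimal sketch with a hypothetical helper (`find_quantised`); the folder names are invented, and the filename defaults to the one the notebook's discovery function looks for.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_quantised(models_dir: Path, filename: str = "model-quantised.onnx") -> List[Tuple[str, Path]]:
    """Return (display name, path) for every model folder containing onnx/<filename>."""
    found = []
    for model_dir in sorted(models_dir.iterdir()):
        candidate = model_dir / "onnx" / filename
        if model_dir.is_dir() and candidate.exists():
            found.append((f"{model_dir.name}-quant", candidate))
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # model-a has a quantised file; model-b only has an empty onnx/ folder.
    (root / "model-a" / "onnx").mkdir(parents=True)
    (root / "model-a" / "onnx" / "model-quantised.onnx").touch()
    (root / "model-b" / "onnx").mkdir(parents=True)
    names = [name for name, _ in find_quantised(root)]
```

Only `model-a` is reported, which shows how a model whose quantised file is missing or named differently drops out of the benchmark without an error.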

In [82]:
def discover_models(models_dir: str) -> List[Tuple[str, Path]]:
    """Discover all available ONNX models, preferring statically quantised versions."""
    valid_models = []
    logger.info(f"--- Step 1: Discovering models in '{models_dir}' ---")
    for model_dir in Path(models_dir).iterdir():
        if not model_dir.is_dir() or not (model_dir / "onnx").exists(): continue
        
        # Prioritise the statically quantised model
        quant_path = model_dir / "onnx" / "model-quantised.onnx"
        if quant_path.exists():
            valid_models.append((f"{model_dir.name}-quant", quant_path))
            logger.info(f"  ✓ Found Quantised: {model_dir.name} ({quant_path.name})")
        
        # # Also add the standard model if it exists
        # standard_path = model_dir / "onnx" / "model.onnx"
        # if standard_path.exists():
        #     valid_models.append((model_dir.name, standard_path))
        #     logger.info(f"  ✓ Found Standard: {model_dir.name} ({standard_path.name})")

    if not valid_models: logger.warning("No valid .onnx models found.")
    return valid_models

def run_full_benchmark(models_dir: str, config: BenchmarkConfig):
    """Run the full benchmark suite on all discovered models."""
    all_results = []
    
    valid_models = discover_models(models_dir)
    if not valid_models: return

    logger.info("\n--- Step 2: Running all benchmarks ---")
    for model_name, onnx_path in valid_models:
        # Load the tokenizer that belongs to this specific model
        model_dir_path = onnx_path.parent.parent # Navigate up from 'model.onnx' -> 'onnx' -> model folder
        logger.info(f"Loading specific tokenizer from: {model_dir_path}")
        tokenizer = AutoTokenizer.from_pretrained(model_dir_path)
        
        # Create a new benchmarker instance with the correct tokenizer
        benchmarker = ONNXModelBenchmarker(config, tokenizer)
        
        for batch_size in config.batch_sizes:
            result = benchmarker.benchmark_model(model_name, onnx_path, batch_size)
            if result: all_results.append(result)
    
    if all_results:
        logger.info("\n--- Step 3: Reporting results ---")
        ResultsManager.print_summary(all_results)
        ResultsManager.save_results(all_results)

### <u> 9. Execute the Benchmark </u>

This final cell is the "run" button. It defines the configuration for this run and then calls `run_full_benchmark` to kick off the entire process; each model's tokenizer is loaded inside that function.

In [83]:
# --- Define configuration and run the benchmark ---
MODELS_DIRECTORY = "models"
ACCURACY_DATASET_PATH = "data/FinancialPhraseBank/all-data.csv"

logger.info("--- Initialising Benchmark ---")

# Create the configuration for this run
benchmark_config = BenchmarkConfig(
    batch_sizes=[1, 4, 8],
    test_csv_path=ACCURACY_DATASET_PATH
)

# Run the benchmark; each model's tokenizer is loaded inside run_full_benchmark
run_full_benchmark(MODELS_DIRECTORY, benchmark_config)

logger.info("\n--- Benchmark Finished ---")

2025-07-28 09:28:51,555 - INFO - --- Initialising Benchmark ---
2025-07-28 09:28:51,557 - INFO - --- Step 1: Discovering models in 'models' ---
2025-07-28 09:28:51,558 - INFO -   ✓ Found Quantised: all-MiniLM-L6-v2-financial-sentiment (model-quantised.onnx)
2025-07-28 09:28:51,559 - INFO -   ✓ Found Quantised: distilbert-financial-sentiment (model-quantised.onnx)
2025-07-28 09:28:51,560 - INFO -   ✓ Found Quantised: finbert-tone-financial-sentiment (model-quantised.onnx)
2025-07-28 09:28:51,561 - INFO -   ✓ Found Quantised: tinybert-financial-classifier (model-quantised.onnx)
2025-07-28 09:28:51,561 - INFO -   ✓ Found Quantised: mobilebert-uncased-financial-sentiment (model-quantised.onnx)
2025-07-28 09:28:51,562 - INFO - 
--- Step 2: Running all benchmarks ---
2025-07-28 09:28:51,562 - INFO - Loading specific tokenizer from: models/all-MiniLM-L6-v2-financial-sentiment
2025-07-28 09:28:51,666 - INFO - 
🚀 BENCHMARKING: all-MiniLM-L6-v2-financial-sentiment-quant | Batch Size: 1
2025-07-2

KeyboardInterrupt: 