<a href="https://colab.research.google.com/github/MMillward2012/deepmind_internship/blob/main/notebooks/7_benchmarks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Import packages

In [45]:
# %cd ..
!ls

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


README.md        models           results
data             notebooks        src
figures          requirements.txt venv-py311


In [32]:
import os
import time
import numpy as np
import pandas as pd
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
# The export below goes through torch.onnx.export, so the deprecated transformers.onnx helpers are not imported.
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType
import torch


In [33]:
BASE_DIR = Path("models")
ONNX_OPSET = 17  # same opset the export call uses; >= 14 is required for scaled_dot_product_attention

In [34]:
def is_valid_model_dir(d):
    return (d / "config.json").exists() and ((d / "pytorch_model.bin").exists() or (d / "model.safetensors").exists())

In [35]:
model_dirs = [d for d in BASE_DIR.iterdir() if d.is_dir() and is_valid_model_dir(d)]
print("Found valid models:", [m.name for m in model_dirs])

Found valid models: ['all-MiniLM-L6-v2-financial-sentiment', 'distilbert-financial-sentiment', 'finbert-tone-financial-sentiment', 'SmolLM2-360M-Instruct-financial-sentiment', 'tinybert-financial-classifier', 'mobilebert-uncased-financial-sentiment']
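
To see what the validity check accepts and rejects, here is a small self-contained sketch that applies the same rule to a throwaway directory. The folder names `good-model` and `no-weights`, the stray `.DS_Store` file and the helper `find_model_dirs` are made up for illustration; only the rule itself (a `config.json` plus either `pytorch_model.bin` or `model.safetensors`) comes from the cells above.

```python
import tempfile
from pathlib import Path


def is_valid_model_dir(d: Path) -> bool:
    """Same rule as above: a config file plus one of the two weight formats."""
    has_config = (d / "config.json").exists()
    has_weights = (d / "pytorch_model.bin").exists() or (d / "model.safetensors").exists()
    return has_config and has_weights


def find_model_dirs(base: Path) -> list:
    """Hypothetical helper: names of valid model folders, ignoring stray files."""
    return sorted(d.name for d in base.iterdir() if d.is_dir() and is_valid_model_dir(d))


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Illustrative layout: one complete model, one folder without weights, one stray file
    (base / "good-model").mkdir()
    (base / "good-model" / "config.json").write_text("{}")
    (base / "good-model" / "model.safetensors").write_bytes(b"")
    (base / "no-weights").mkdir()
    (base / "no-weights" / "config.json").write_text("{}")
    (base / ".DS_Store").write_text("")
    found = find_model_dirs(base)

print(found)
```

Only the folder that has both a config and a weights file is returned; the stray file never reaches the check because of the `is_dir()` filter.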


In [36]:
class ONNXExportWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        # Call model with return_dict=False to get a tuple output
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask, return_dict=False)
        # Return only the logits tensor (usually first element)
        return outputs[0]


In [37]:
def export_to_onnx(model_dir, onnx_path):
    print("🔍 Loading model and tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    wrapped_model = ONNXExportWrapper(model)  # Wrap the model here

    dummy_input = tokenizer("This company is doing great!", return_tensors="pt")

    print("🚀 Exporting to ONNX...")
    torch.onnx.export(
        wrapped_model,
        (dummy_input["input_ids"], dummy_input["attention_mask"]),
        str(onnx_path),
        input_names=["input_ids", "attention_mask"],
        output_names=["output"],
        dynamic_axes={
            "input_ids": {0: "batch_size", 1: "sequence_length"},
            "attention_mask": {0: "batch_size", 1: "sequence_length"},
            "output": {0: "batch_size"},
        },
        opset_version=17,  # Use >=14 due to scaled_dot_product_attention operator support
        do_constant_folding=True,
    )
    print(f"✅ Exported to {onnx_path}")


In [38]:
results = []

for model_dir in model_dirs:
    print(f"\n⏳ Processing {model_dir.name}...")
    
    onnx_dir = model_dir / "onnx"
    onnx_dir.mkdir(exist_ok=True)
    onnx_model_path = onnx_dir / "model.onnx"
    quantised_model_path = onnx_dir / "model_quantized.onnx"  # filename written by the quantization step

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    # Export ONNX if not already done
    if not onnx_model_path.exists():
        print("📦 Exporting to ONNX...")
        export_to_onnx(model_dir, onnx_model_path)
    else:
        print("✅ ONNX already exists.")


⏳ Processing all-MiniLM-L6-v2-financial-sentiment...
✅ ONNX already exists.

⏳ Processing distilbert-financial-sentiment...
✅ ONNX already exists.

⏳ Processing finbert-tone-financial-sentiment...
📦 Exporting to ONNX...
🔍 Loading model and tokenizer...
🚀 Exporting to ONNX...
✅ Exported to models/finbert-tone-financial-sentiment/onnx/model.onnx

⏳ Processing SmolLM2-360M-Instruct-financial-sentiment...
✅ ONNX already exists.

⏳ Processing tinybert-financial-classifier...
✅ ONNX already exists.

⏳ Processing mobilebert-uncased-financial-sentiment...
✅ ONNX already exists.


In [51]:
import os
from onnxruntime.quantization import quantize_dynamic, QuantType

models_dir = "models"  # root directory containing model subfolders

def quantize_all_models(models_root):
    for model_name in os.listdir(models_root):
        model_path = os.path.join(models_root, model_name, "onnx", "model.onnx")
        
        if not os.path.isfile(model_path):
            print(f"[SKIP] No ONNX model found for {model_name} at expected path: {model_path}")
            continue
        
        quantized_model_path = os.path.join(models_root, model_name, "onnx", "model_quantized.onnx")
        print(f"[PROCESSING] Quantizing model '{model_name}'")
        
        try:
            quantize_dynamic(
                model_input=model_path,
                model_output=quantized_model_path,
                weight_type=QuantType.QInt8
            )
            print(f"[SUCCESS] Saved quantized model: {quantized_model_path}")
        except Exception as e:
            print(f"[ERROR] Failed to quantize {model_name}: {e}")

if __name__ == "__main__":
    quantize_all_models(models_dir)


[SKIP] No ONNX model found for .DS_Store at expected path: models/.DS_Store/onnx/model.onnx
[PROCESSING] Quantizing model 'all-MiniLM-L6-v2-financial-sentiment'




[SUCCESS] Saved quantized model: models/all-MiniLM-L6-v2-financial-sentiment/onnx/model_quantized.onnx
[PROCESSING] Quantizing model 'distilbert-financial-sentiment'




[SUCCESS] Saved quantized model: models/distilbert-financial-sentiment/onnx/model_quantized.onnx
[PROCESSING] Quantizing model 'finbert-tone-financial-sentiment'




[SUCCESS] Saved quantized model: models/finbert-tone-financial-sentiment/onnx/model_quantized.onnx
[SKIP] No ONNX model found for .gitkeep at expected path: models/.gitkeep/onnx/model.onnx
[PROCESSING] Quantizing model 'SmolLM2-360M-Instruct-financial-sentiment'




[SUCCESS] Saved quantized model: models/SmolLM2-360M-Instruct-financial-sentiment/onnx/model_quantized.onnx
[PROCESSING] Quantizing model 'tinybert-financial-classifier'




[SUCCESS] Saved quantized model: models/tinybert-financial-classifier/onnx/model_quantized.onnx
[PROCESSING] Quantizing model 'mobilebert-uncased-financial-sentiment'




[SUCCESS] Saved quantized model: models/mobilebert-uncased-financial-sentiment/onnx/model_quantized.onnx
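
A quick way to check what dynamic quantization bought is to compare file sizes before and after. The sketch below is an illustration only: `size_reduction` is a hypothetical helper, and the two files are zero-filled placeholders rather than real ONNX graphs, so the 4:1 ratio it prints says nothing about the models in this notebook.

```python
import tempfile
from pathlib import Path


def size_mb(path: Path) -> float:
    """File size in MiB, the same unit the benchmark uses for model_size_mb."""
    return path.stat().st_size / (1024 ** 2)


def size_reduction(original: Path, quantized: Path) -> dict:
    """Hypothetical helper: compare an FP32 export with its INT8 counterpart."""
    orig_mb = size_mb(original)
    quant_mb = size_mb(quantized)
    return {
        "original_mb": orig_mb,
        "quantized_mb": quant_mb,
        "ratio": orig_mb / quant_mb,
        "saved_pct": 100.0 * (1.0 - quant_mb / orig_mb),
    }


with tempfile.TemporaryDirectory() as tmp:
    onnx_dir = Path(tmp)
    # Placeholder files standing in for model.onnx (4 MiB) and model_quantized.onnx (1 MiB)
    (onnx_dir / "model.onnx").write_bytes(b"\0" * (4 * 1024 * 1024))
    (onnx_dir / "model_quantized.onnx").write_bytes(b"\0" * (1 * 1024 * 1024))
    report = size_reduction(onnx_dir / "model.onnx", onnx_dir / "model_quantized.onnx")

print(report)
```

Pointing the two paths at a real `models/<name>/onnx/` folder would give the actual reduction for that model.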


In [52]:
import time
import onnxruntime as ort
import psutil
from transformers import AutoTokenizer

EXAMPLE_INPUT = "Stocks surged after the company reported record earnings."
MAX_LENGTH = 128
BENCHMARK_ITERATIONS = 100

def benchmark_onnx_model(onnx_path, tokenizer):
    # Load ONNX model session
    sess = ort.InferenceSession(str(onnx_path), providers=["CPUExecutionProvider"])

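    # Resident memory (RSS) of the whole Python process after session creation.
    # This includes the interpreter, imported libraries and anything loaded earlier
    # in the notebook, so it is not the footprint of this model alone.
    process = psutil.Process()
    memory_mb = process.memory_info().rss / 1024 / 1024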

    # Prepare input tokens
    inputs = tokenizer(EXAMPLE_INPUT, return_tensors="np", max_length=MAX_LENGTH, padding="max_length", truncation=True)

    # Warm-up
    for _ in range(10):
        sess.run(None, {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]})

    # Measure latency over multiple iterations
    times = []
    for _ in range(BENCHMARK_ITERATIONS):
        start = time.perf_counter()  # monotonic, high-resolution clock for short intervals
        sess.run(None, {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]})
        times.append((time.perf_counter() - start) * 1000)  # milliseconds

    avg_latency = sum(times) / len(times)
    p99_latency = sorted(times)[int(len(times) * 0.99) - 1]

    # Calculate throughput: predictions per second (using avg latency)
    throughput = 1000 / avg_latency

    # Model size in MB
    model_size_mb = onnx_path.stat().st_size / (1024 * 1024)

    return {
        "avg_latency_ms": avg_latency,
        "p99_latency_ms": p99_latency,
        "memory_mb": memory_mb,
        "model_size_mb": model_size_mb,
        "throughput_preds_per_sec": throughput
    }
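
The p99 line in the function above indexes the sorted timings at `int(len(times) * 0.99) - 1`. For 100 samples that is index 98, the 99th smallest value, which matches the nearest-rank definition of a percentile. The sketch below spells that definition out on synthetic latencies; `nearest_rank_percentile` is a name chosen for this example, not something the notebook defines.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest sample such that at least pct% of the samples are <= it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Synthetic latencies 1.0 .. 100.0 ms, so each value equals its own rank
latencies = [float(i) for i in range(1, 101)]

p50 = nearest_rank_percentile(latencies, 50)
p99 = nearest_rank_percentile(latencies, 99)
# The expression used in the benchmark cell, applied to the same data
same_as_cell = sorted(latencies)[int(len(latencies) * 0.99) - 1]

print(p50, p99, same_as_cell)
```

With only 100 iterations the p99 rests on the two slowest samples, so a single slow outlier run can move it noticeably.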


In [54]:
results = []

for model_dir in BASE_DIR.iterdir():
    if not model_dir.is_dir() or model_dir.name == ".gitkeep":
        continue

    onnx_path = model_dir / "onnx" / "model_quantized.onnx"
    if onnx_path.exists():
        print(f"Benchmarking model: {model_dir.name}")
        tokenizer = AutoTokenizer.from_pretrained(model_dir)
        result = benchmark_onnx_model(onnx_path, tokenizer)
        result["model"] = model_dir.name
        results.append(result)
    else:
        print(f"ONNX model not found for {model_dir.name}")

import pandas as pd
df = pd.DataFrame(results)
df = df.sort_values("avg_latency_ms").reset_index(drop=True)

print(df)


Benchmarking model: all-MiniLM-L6-v2-financial-sentiment
Benchmarking model: distilbert-financial-sentiment
Benchmarking model: finbert-tone-financial-sentiment
Benchmarking model: SmolLM2-360M-Instruct-financial-sentiment
Benchmarking model: tinybert-financial-classifier
Benchmarking model: mobilebert-uncased-financial-sentiment
   avg_latency_ms  p99_latency_ms    memory_mb  model_size_mb  \
0        9.539635       14.684200   929.234375      13.909289   
1       18.591957       26.837826   684.359375      21.980877   
2       55.933518       66.138983   930.906250      25.459671   
3       60.804486       65.684080   710.593750      64.228925   
4      121.273766      125.671864   737.656250     105.492896   
5      446.809196      501.713276  1150.609375     347.381740   

   throughput_preds_per_sec                                      model  
0                104.825812              tinybert-financial-classifier  
1                 53.786699       all-MiniLM-L6-v2-financial-senti
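
The throughput column is derived from the mean latency as `1000 / avg_latency_ms`. As a sanity check, the sketch below recomputes it for the two rows whose throughput is visible in the printed output; `throughput_per_sec` is a hypothetical helper name, and the numbers are copied from the table above.

```python
def throughput_per_sec(avg_latency_ms: float, batch_size: int = 1) -> float:
    """Predictions per second implied by a mean latency in milliseconds."""
    return 1000.0 * batch_size / avg_latency_ms


# (avg_latency_ms, reported throughput) for rows 0 and 1 of the printed DataFrame
visible_rows = [
    (9.539635, 104.825812),
    (18.591957, 53.786699),
]

recomputed = [throughput_per_sec(latency) for latency, _ in visible_rows]

for (latency, reported), value in zip(visible_rows, recomputed):
    print(f"{latency:.6f} ms -> {value:.3f} preds/sec (reported {reported:.3f})")
```

Because this figure is just the reciprocal of single-request latency, it describes sequential batch-size-1 inference and not what batching or concurrent sessions could reach.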

In [55]:
import time
import onnxruntime as ort
import psutil
from transformers import AutoTokenizer
from pathlib import Path
import numpy as np

EXAMPLE_INPUT = "Stocks surged after the company reported record earnings."
MAX_LENGTH = 128
BENCHMARK_ITERATIONS = 100
WARMUP_ITERATIONS = 10

In [56]:
import time
import gc
import statistics
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
import onnxruntime as ort
import psutil
from transformers import AutoTokenizer
from pathlib import Path
import numpy as np
import pandas as pd

# Configuration
EXAMPLE_INPUTS = [
    "Stocks surged after the company reported record earnings.",
    "The weather forecast predicts heavy rain throughout the weekend.",
    "Scientists have discovered a new species of deep-sea creature.",
    "Technology companies are investing heavily in artificial intelligence research.",
    "The local community center will host a charity fundraising event next month."
]
MAX_LENGTH = 128
BENCHMARK_ITERATIONS = 100
WARMUP_ITERATIONS = 20
BATCH_SIZES = [1, 4, 8, 16]  # Test different batch sizes

@dataclass
class BenchmarkResult:
    model: str
    batch_size: int
    avg_latency_ms: float
    p50_latency_ms: float
    p95_latency_ms: float
    p99_latency_ms: float
    std_latency_ms: float
    min_latency_ms: float
    max_latency_ms: float
    memory_delta_mb: float
    peak_memory_mb: float
    model_size_mb: float
    throughput_samples_per_sec: float
    tokens_per_sec: float
    cpu_utilization_avg: float
    gpu_available: bool
    provider: str
    session_creation_time_ms: float

@contextmanager
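def cpu_monitor():
    """Monitor CPU usage during execution"""
    process = psutil.Process()
    cpu_percentages = []
    
    def sample_cpu():
        cpu_percentages.append(process.cpu_percent())
    
    # Prime the counter without recording it: the first cpu_percent() call on a
    # Process object has no previous reference point and always returns 0.0
    process.cpu_percent()
    yield cpu_percentages
    # Final sample: CPU usage since the priming call above
    sample_cpu()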

def get_optimal_providers():
    """Get the best available ONNX Runtime execution providers"""
    available_providers = ort.get_available_providers()
    
    preferred_providers = [
        "CUDAExecutionProvider",
        "ROCMExecutionProvider",
        "OpenVINOExecutionProvider",
        "CoreMLExecutionProvider",
        "CPUExecutionProvider"
    ]
    
    for provider in preferred_providers:
        if provider in available_providers:
            return [provider]
    return ["CPUExecutionProvider"]

def load_onnx_session(onnx_path: Path, providers: Optional[List[str]] = None) -> Tuple[ort.InferenceSession, float]:
    """Load ONNX session with timing"""
    if providers is None:
        providers = get_optimal_providers()
    
    start_time = time.perf_counter()
    session_options = ort.SessionOptions()
    session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    session_options.enable_mem_pattern = True
    session_options.enable_cpu_mem_arena = True
    
    session = ort.InferenceSession(
        str(onnx_path),
        providers=providers,
        sess_options=session_options
    )
    creation_time = (time.perf_counter() - start_time) * 1000
    return session, creation_time

def measure_memory_usage() -> float:
    """Get current memory usage in MB"""
    process = psutil.Process()
    return process.memory_info().rss / (1024 ** 2)

def prepare_batch_inputs(tokenizer, texts: List[str], max_length: int = MAX_LENGTH) -> Dict[str, np.ndarray]:
    """Prepare batched inputs for inference"""
    encoded = tokenizer(
        texts,
        return_tensors="np",
        max_length=max_length,
        padding="max_length",
        truncation=True
    )
    return {
        "input_ids": encoded["input_ids"].astype(np.int64),
        "attention_mask": encoded["attention_mask"].astype(np.int64)
    }

def warmup_session(session: ort.InferenceSession, inputs: Dict[str, np.ndarray], iterations: int = WARMUP_ITERATIONS):
    """Warm up the session with multiple iterations"""
    print(f"  Warming up for {iterations} iterations...")
    for i in range(iterations):
        session.run(None, inputs)
        if i % 5 == 0:
            gc.collect()

def measure_latency_detailed(session: ort.InferenceSession, inputs: Dict[str, np.ndarray],
                             iterations: int = BENCHMARK_ITERATIONS) -> Tuple[List[float], float]:
    """Measure latency with detailed statistics and CPU monitoring"""
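    times = []
    # Reuse one Process handle: cpu_percent() reports usage since the previous
    # call on the same object, so a fresh psutil.Process() would always give 0.0
    process = psutil.Process()
    process.cpu_percent()  # prime the counter; this first value is discarded
    
    gc.collect()
    with cpu_monitor() as cpu_samples:
        for i in range(iterations):
            if i % 25 == 0 and i > 0:
                gc.collect()
            
            start = time.perf_counter()
            session.run(None, inputs)
            end = time.perf_counter()
            
            times.append((end - start) * 1000)  # ms
            
            if i % 10 == 0:
                cpu_samples.append(process.cpu_percent())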
    
    avg_cpu = statistics.mean(cpu_samples) if cpu_samples else 0.0
    return times, avg_cpu

def calculate_detailed_stats(times: List[float], batch_size: int, num_tokens: int) -> Dict[str, float]:
    """Calculate latency statistics"""
    mean_ms = statistics.mean(times)
    
    # np.percentile interpolates between ranks, so p95/p99 stay below the maximum
    # unless the slowest samples really are that close together
    return {
        'avg_latency_ms': mean_ms,
        'p50_latency_ms': float(np.percentile(times, 50)),
        'p95_latency_ms': float(np.percentile(times, 95)),
        'p99_latency_ms': float(np.percentile(times, 99)),
        'std_latency_ms': statistics.stdev(times) if len(times) > 1 else 0.0,
        'min_latency_ms': min(times),
        'max_latency_ms': max(times),
        'throughput_samples_per_sec': (1000 * batch_size) / mean_ms,
        'tokens_per_sec': (1000 * num_tokens) / mean_ms
    }

def get_model_size_mb(onnx_path: Path) -> float:
    """Get model size in megabytes"""
    return onnx_path.stat().st_size / (1024 ** 2)

def benchmark_onnx_model(onnx_path: Path, tokenizer, batch_size: int = 1) -> BenchmarkResult:
    """Comprehensive benchmark of ONNX model"""
    providers = get_optimal_providers()
    session, creation_time = load_onnx_session(onnx_path, providers)
    
    test_texts = (EXAMPLE_INPUTS * ((batch_size // len(EXAMPLE_INPUTS)) + 1))[:batch_size]
    inputs = prepare_batch_inputs(tokenizer, test_texts)
    # Counts every position in the batch, including padding up to MAX_LENGTH
    total_tokens = inputs["input_ids"].size
    
    memory_before = measure_memory_usage()
    peak_memory = memory_before
    
    warmup_session(session, inputs)
    
    print(f"  Running {BENCHMARK_ITERATIONS} benchmark iterations...")
    times, avg_cpu = measure_latency_detailed(session, inputs)
    
    memory_after = measure_memory_usage()
    peak_memory = max(peak_memory, memory_after)
    
    stats = calculate_detailed_stats(times, batch_size, total_tokens)
    model_size_mb = get_model_size_mb(onnx_path)
    
    return BenchmarkResult(
        model=onnx_path.parent.parent.name,
        batch_size=batch_size,
        memory_delta_mb=memory_after - memory_before,
        peak_memory_mb=peak_memory,
        model_size_mb=model_size_mb,
        cpu_utilization_avg=avg_cpu,
        gpu_available="CUDA" in providers[0] or "ROCM" in providers[0],
        provider=providers[0],
        session_creation_time_ms=creation_time,
        **stats
    )

def save_detailed_results(results: List[BenchmarkResult], output_dir: Path = Path("benchmark_results")):
    """Save results in CSV, JSON and print summary"""
    output_dir.mkdir(exist_ok=True)
    
    df = pd.DataFrame([result.__dict__ for result in results])
    df.to_csv(output_dir / "benchmark_results.csv", index=False)
    df.to_json(output_dir / "benchmark_results.json", indent=2)
    
    summary_cols = [
        'model', 'batch_size', 'avg_latency_ms', 'p99_latency_ms',
        'throughput_samples_per_sec', 'memory_delta_mb', 'provider'
    ]
    summary_df = df[summary_cols].sort_values(['model', 'batch_size'])
    
    print("\n" + "="*80)
    print("BENCHMARK SUMMARY")
    print("="*80)
    print(summary_df.to_string(index=False, float_format='%.2f'))
    
    print("\n" + "="*80)
    print("TOP PERFORMERS")
    print("="*80)
    
    for batch_size in df['batch_size'].unique():
        batch_df = df[df['batch_size'] == batch_size]
        fastest = batch_df.loc[batch_df['avg_latency_ms'].idxmin()]
        highest_throughput = batch_df.loc[batch_df['throughput_samples_per_sec'].idxmax()]
        
        print(f"\nBatch Size {batch_size}:")
        print(f"  Fastest: {fastest['model']} ({fastest['avg_latency_ms']:.2f} ms)")
        print(f"  Highest Throughput: {highest_throughput['model']} ({highest_throughput['throughput_samples_per_sec']:.2f} samples/sec)")

def main():
    """Main benchmarking function"""
    base_dir = Path("models/")  # Update to your models directory
    
    if not base_dir.exists():
        print(f"Models directory '{base_dir}' not found!")
        return
    
    results = []
    
    print("Available ONNX Runtime Providers:", ort.get_available_providers())
    print("Using providers:", get_optimal_providers())
    print(f"System Memory: {psutil.virtual_memory().total / (1024**3):.1f} GB")
    print(f"CPU Count: {psutil.cpu_count()}")
    print()
    
    model_dirs = [d for d in base_dir.iterdir() if d.is_dir()]
    
    for model_dir in model_dirs:
        onnx_path = model_dir / "onnx" / "model.onnx"
        if not onnx_path.exists():
            print(f"⚠️  ONNX model not found in {model_dir}")
            continue
        
        print(f"🔍 Benchmarking {model_dir.name}...")
        try:
            tokenizer = AutoTokenizer.from_pretrained(model_dir)
            
            for batch_size in BATCH_SIZES:
                print(f"  Batch size: {batch_size}")
                result = benchmark_onnx_model(onnx_path, tokenizer, batch_size)
                results.append(result)
                
        except Exception as e:
            print(f"❌ Error benchmarking {model_dir.name}: {e}")
            continue
        
        print(f"✅ Completed {model_dir.name}\n")
    
    if results:
        save_detailed_results(results)
        print(f"\n📊 Benchmarked {len(set(r.model for r in results))} models with {len(results)} configurations")
        print("📁 Detailed results saved to 'benchmark_results/' directory")
    else:
        print("❌ No models were successfully benchmarked!")

if __name__ == "__main__":
    main()
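
`get_optimal_providers()` returns a single provider name. ONNX Runtime accepts a priority-ordered list, so a common pattern is to list the accelerator first and `CPUExecutionProvider` last, and to keep a switch that pins the CPU when results need to be comparable across machines. The sketch below shows that selection logic in plain Python so it runs without ONNX Runtime installed; `pick_providers` and `force_cpu` are hypothetical names, the `available` list is copied from this run's output, and nothing here is claimed to resolve the CoreML errors logged in that output.

```python
CPU = "CPUExecutionProvider"

# Same preference order as get_optimal_providers() above
PREFERRED = [
    "CUDAExecutionProvider",
    "ROCMExecutionProvider",
    "OpenVINOExecutionProvider",
    "CoreMLExecutionProvider",
    CPU,
]


def pick_providers(available, preferred=PREFERRED, force_cpu=False):
    """Hypothetical helper: best available provider followed by a CPU fallback."""
    if force_cpu:
        return [CPU]
    for name in preferred:
        if name in available:
            # Avoid listing the CPU provider twice when it is the best match
            return [name] if name == CPU else [name, CPU]
    return [CPU]


# Provider list reported by ort.get_available_providers() in this notebook's run
available = ["CoreMLExecutionProvider", "AzureExecutionProvider", "CPUExecutionProvider"]

chosen = pick_providers(available)
cpu_only = pick_providers(available, force_cpu=True)

print(chosen)
print(cpu_only)
```

Either list can be passed as the `providers` argument of `ort.InferenceSession`. The earlier quantized benchmark pinned `CPUExecutionProvider`, so `force_cpu=True` would be the setting that keeps the two benchmarks comparable.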


Available ONNX Runtime Providers: ['CoreMLExecutionProvider', 'AzureExecutionProvider', 'CPUExecutionProvider']
Using providers: ['CoreMLExecutionProvider']
System Memory: 8.0 GB
CPU Count: 8

🔍 Benchmarking all-MiniLM-L6-v2-financial-sentiment...
  Batch size: 1


2025-07-23 14:41:45.264360 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:bert.embeddings.word_embeddings.weight, shape: {30522,384}
2025-07-23 14:41:45.266209 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 56 number of nodes in the graph: 326 number of nodes supported by CoreML: 164
2025-07-23 14:41:45.307797 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:45.330732 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:45.342956 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:45.386036 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 1

  Warming up for 20 iterations...


Context leak detected, msgtracer returned -1


  Running 100 benchmark iterations...
  Batch size: 4


2025-07-23 14:41:56.480928 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:bert.embeddings.word_embeddings.weight, shape: {30522,384}
2025-07-23 14:41:56.481638 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 56 number of nodes in the graph: 326 number of nodes supported by CoreML: 164
2025-07-23 14:41:56.498365 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:56.507886 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:56.516653 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:56.530566 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 1

  Warming up for 20 iterations...
❌ Error benchmarking all-MiniLM-L6-v2-financial-sentiment: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running 6255025888846516209_CoreML_6255025888846516209_4 node. Name:'CoreMLExecutionProvider_6255025888846516209_CoreML_6255025888846516209_4_4' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
🔍 Benchmarking distilbert-financial-sentiment...
  Batch size: 1


2025-07-23 14:41:57.443806 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:distilbert.embeddings.word_embeddings.weight, shape: {30522,768}
2025-07-23 14:41:57.444543 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 57 number of nodes in the graph: 313 number of nodes supported by CoreML: 163
2025-07-23 14:41:57.454384 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:57.471073 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:57.480905 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:41:57.490246 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/m

  Warming up for 20 iterations...


Context leak detected, msgtracer returned -1


  Running 100 benchmark iterations...
  Batch size: 4


2025-07-23 14:42:19.896340 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:distilbert.embeddings.word_embeddings.weight, shape: {30522,768}
2025-07-23 14:42:19.897037 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 57 number of nodes in the graph: 313 number of nodes supported by CoreML: 163
2025-07-23 14:42:19.907973 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:42:19.924388 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:42:19.934555 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:42:19.944686 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/m

  Warming up for 20 iterations...
❌ Error benchmarking distilbert-financial-sentiment: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running 10752758088624767045_CoreML_10752758088624767045_5 node. Name:'CoreMLExecutionProvider_10752758088624767045_CoreML_10752758088624767045_5_5' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
🔍 Benchmarking finbert-tone-financial-sentiment...
  Batch size: 1
  Warming up for 20 iterations...


2025-07-23 14:42:21.203424 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:model.bert.embeddings.word_embeddings.weight, shape: {30873,768}
2025-07-23 14:42:21.204680 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 110 number of nodes in the graph: 614 number of nodes supported by CoreML: 320
2025-07-23 14:42:21.258415 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:42:21.270074 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:42:21.283114 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:42:21.300021 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/

  Running 100 benchmark iterations...
  Batch size: 4


2025-07-23 14:43:01.019265 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:model.bert.embeddings.word_embeddings.weight, shape: {30873,768}
2025-07-23 14:43:01.020502 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 110 number of nodes in the graph: 614 number of nodes supported by CoreML: 320
2025-07-23 14:43:01.075661 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:43:01.087142 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:43:01.096245 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:43:01.110966 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/

  Warming up for 20 iterations...
❌ Error benchmarking finbert-tone-financial-sentiment: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running 17604349712411484192_CoreML_17604349712411484192_4 node. Name:'CoreMLExecutionProvider_17604349712411484192_CoreML_17604349712411484192_4_4' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
🔍 Benchmarking SmolLM2-360M-Instruct-financial-sentiment...
  Batch size: 1
  Warming up for 20 iterations...


2025-07-23 14:43:03.968900 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:model.model.embed_tokens.weight, shape: {49153,960}
2025-07-23 14:43:03.970236 [W:onnxruntime:, helper.cc:89 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/model/Slice_4_output_0, shape: {0}
2025-07-23 14:43:03.975078 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 259 number of nodes in the graph: 3730 number of nodes supported by CoreML: 1297
2025-07-23 14:43:05.037023 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:43:05.050609 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:43:05.070816 [W:onnxruntime:, mode

  Running 100 benchmark iterations...
  Batch size: 4


2025-07-23 14:46:44.775072 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:model.model.embed_tokens.weight, shape: {49153,960}
2025-07-23 14:46:44.775902 [W:onnxruntime:, helper.cc:89 IsInputSupported] CoreML does not support shapes with dimension values of 0. Input:/model/model/Slice_4_output_0, shape: {0}
2025-07-23 14:46:44.779788 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 259 number of nodes in the graph: 3730 number of nodes supported by CoreML: 1297
2025-07-23 14:46:45.881144 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:45.895716 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:45.917652 [W:onnxruntime:, mode

  Warming up for 20 iterations...
❌ Error benchmarking SmolLM2-360M-Instruct-financial-sentiment: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running 9273022787927528322_CoreML_9273022787927528322_4 node. Name:'CoreMLExecutionProvider_9273022787927528322_CoreML_9273022787927528322_4_4' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
🔍 Benchmarking tinybert-financial-classifier...
  Batch size: 1


2025-07-23 14:46:52.437953 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:model.bert.embeddings.word_embeddings.weight, shape: {30522,312}
2025-07-23 14:46:52.438631 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 38 number of nodes in the graph: 230 number of nodes supported by CoreML: 112
2025-07-23 14:46:52.451649 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:52.461800 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:52.471770 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:52.487075 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/m

  Warming up for 20 iterations...
  Running 100 benchmark iterations...
  Batch size: 4


2025-07-23 14:46:58.046828 [W:onnxruntime:, helper.cc:83 IsInputSupported] CoreML does not support input dim > 16384. Input:model.bert.embeddings.word_embeddings.weight, shape: {30522,312}
2025-07-23 14:46:58.047344 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 38 number of nodes in the graph: 230 number of nodes supported by CoreML: 112
2025-07-23 14:46:58.061322 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:58.071564 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:58.080655 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:58.095112 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/m

  Warming up for 20 iterations...
❌ Error benchmarking tinybert-financial-classifier: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running 14308768299282785986_CoreML_14308768299282785986_4 node. Name:'CoreMLExecutionProvider_14308768299282785986_CoreML_14308768299282785986_4_4' Status Message: Error executing model: Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
🔍 Benchmarking mobilebert-uncased-financial-sentiment...
  Batch size: 1


2025-07-23 14:46:58.830154 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 363 number of nodes in the graph: 1685 number of nodes supported by CoreML: 1140
2025-07-23 14:46:59.779232 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:59.796751 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:59.806601 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:59.861746 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureOptimizationHints
2025-07-23 14:46:59.870276 [W:onnxruntime:, model.mm:552 LoadModel] iOS 17.4+/macOS 14.4+ or later is required to ConfigureO

  Warming up for 20 iterations...


Context leak detected, msgtracer returned -1


: 
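
In the log above, several models complete batch size 1 and then fail during warm-up at batch size 4. Because the `try` in `main()` wraps the whole batch-size loop, that one failure also skips batch sizes 8 and 16 for the same model. One possible arrangement, sketched below, moves the `try` inside the loop and records failures per batch size. `fake_benchmark` is a stand-in that fails for any batch size above 1 with a made-up error message and latency, and `run_all` is a hypothetical helper, so the printed values are not measurements.

```python
def fake_benchmark(batch_size: int) -> float:
    """Stand-in for a real benchmark call; fails for batch sizes above 1."""
    if batch_size > 1:
        raise RuntimeError(f"provider failed for batch size {batch_size}")
    return 12.5  # made-up latency in ms


def run_all(batch_sizes, benchmark):
    """Run every batch size independently and collect successes and failures."""
    results, failures = {}, {}
    for batch_size in batch_sizes:
        try:
            results[batch_size] = benchmark(batch_size)
        except Exception as exc:
            # Record the failure and continue with the remaining batch sizes
            failures[batch_size] = str(exc)
    return results, failures


results, failures = run_all([1, 4, 8, 16], fake_benchmark)

print("succeeded:", results)
print("failed:", failures)
```

Keeping the failures in a dictionary makes it straightforward to report which provider and batch-size combinations were skipped alongside the successful rows.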