
Metrics#5

Merged
alpaim merged 4 commits into main from metrics on Mar 9, 2026

@alpaim alpaim commented Mar 9, 2026

Changes Made

1. src/metrics.rs

New metrics module for performance profiling:

  • InputType enum - Text or Image input type
  • ModelInfo struct - Stores loaded model details:
    • repo, quantization, device, dtype
    • embedding dimension
    • text/vision model file names and sizes
    • pixel range (min/max)
  • Metrics struct - Performance metrics:
    • timing (total time, model load time)
    • throughput (samples/sec, input tokens/sec)
    • per-sample averages
  • MetricsBuilder - Builder pattern for collecting metrics
  • Human-readable and JSON output formatting
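
The relationship between the builder and the derived figures can be sketched as follows. This is a minimal, std-only illustration and not the code in src/metrics.rs: the field names, the builder method names, and the zero-division guards are assumptions, and the serde-based JSON output is left out.

```rust
use std::time::Duration;

/// Kind of input being embedded (mirrors the InputType enum listed above).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InputType {
    Text,
    Image,
}

/// Derived performance figures. Field names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Metrics {
    pub input_type: InputType,
    pub samples: usize,
    pub total_time: Duration,
    pub samples_per_sec: f64,
    pub avg_ms_per_sample: f64,
    pub input_tokens: usize,
    pub input_tokens_per_sec: f64,
    pub avg_tokens_per_sample: f64,
}

/// Collects raw counts and durations, then derives the rates in `build`.
pub struct MetricsBuilder {
    input_type: InputType,
    samples: usize,
    total_time: Duration,
    input_tokens: usize,
}

impl MetricsBuilder {
    pub fn new(input_type: InputType) -> Self {
        Self { input_type, samples: 0, total_time: Duration::ZERO, input_tokens: 0 }
    }

    pub fn samples(mut self, n: usize) -> Self {
        self.samples = n;
        self
    }

    pub fn total_time(mut self, d: Duration) -> Self {
        self.total_time = d;
        self
    }

    pub fn input_tokens(mut self, n: usize) -> Self {
        self.input_tokens = n;
        self
    }

    pub fn build(self) -> Metrics {
        let secs = self.total_time.as_secs_f64();
        let n = self.samples as f64;
        // Report 0.0 instead of dividing by zero for empty or instantaneous runs.
        let per_sec = |count: f64| if secs > 0.0 { count / secs } else { 0.0 };
        let per_sample = |value: f64| if n > 0.0 { value / n } else { 0.0 };
        Metrics {
            input_type: self.input_type,
            samples: self.samples,
            total_time: self.total_time,
            samples_per_sec: per_sec(n),
            avg_ms_per_sample: per_sample(secs * 1000.0),
            input_tokens: self.input_tokens,
            input_tokens_per_sec: per_sec(self.input_tokens as f64),
            avg_tokens_per_sample: per_sample(self.input_tokens as f64),
        }
    }
}
```

Feeding this sketch one sample, 156 ms, and 12 tokens yields roughly 6.41 samples/sec and 76.92 tokens/sec, which matches the arithmetic in the example output of this PR.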

2. src/cli.rs

Added --metrics flag to both subcommands:

  • --metrics - Show performance metrics (default: true)
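
As an illustration of a flag that defaults to true and can be switched off with --metrics=false, here is a hand-rolled sketch. It does not use the argument parser in src/cli.rs (which is not shown here); the function name `metrics_enabled` and the accepted spellings are assumptions.

```rust
/// Returns whether metrics output is enabled for the given arguments.
/// Hypothetical helper: defaults to `true`, and the last occurrence wins.
pub fn metrics_enabled(args: &[&str]) -> bool {
    let mut enabled = true; // default documented in this PR
    for arg in args {
        match *arg {
            "--metrics" | "--metrics=true" => enabled = true,
            "--metrics=false" => enabled = false,
            _ => {} // every other argument is ignored in this sketch
        }
    }
    enabled
}
```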

3. src/models/qwen3.rs

  • Added tokenizer() accessor method to expose tokenizer for token counting
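
A rough sketch of how such an accessor enables token counting from outside the model is shown below. `Tokenizer`, `Qwen3Model`, and `count_input_tokens` are hypothetical stand-ins: the real tokenizer type and its encode API are not shown in this PR, so a whitespace splitter is used purely for illustration.

```rust
/// Stand-in tokenizer that splits on whitespace (assumption, not the real one).
pub struct Tokenizer;

impl Tokenizer {
    pub fn encode(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_owned).collect()
    }
}

/// Stand-in for the model type; only the tokenizer field matters here.
pub struct Qwen3Model {
    tokenizer: Tokenizer,
}

impl Qwen3Model {
    pub fn new() -> Self {
        Self { tokenizer: Tokenizer }
    }

    /// Read-only accessor: callers can count tokens without touching inference code.
    pub fn tokenizer(&self) -> &Tokenizer {
        &self.tokenizer
    }
}

/// Sums the token counts of all input texts.
pub fn count_input_tokens(model: &Qwen3Model, texts: &[&str]) -> usize {
    texts
        .iter()
        .map(|text| model.tokenizer().encode(text).len())
        .sum()
}
```

Returning a shared reference keeps the tokenizer owned by the model, so the metrics code can read from it without cloning or mutating model state.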

4. src/main.rs

Integrated metrics collection:

  • Model load time tracking
  • Inference time tracking
  • Token counting for input text embeddings
  • Display ModelInfo after model loads
  • Display Metrics after embedding completes
  • Respects --metrics flag to enable/disable output
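
The timing and gating steps above could look roughly like this. It is a sketch under assumptions, not the code in src/main.rs: `timed` and `report` are hypothetical helper names, and the report text is simplified.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Builds a short report only when metrics are enabled; returns `None` otherwise.
pub fn report(enabled: bool, samples: usize, elapsed: Duration) -> Option<String> {
    if !enabled {
        return None; // nothing is formatted or printed when the flag is off
    }
    Some(format!(
        "Samples: {}\nTotal time: {} ms",
        samples,
        elapsed.as_millis()
    ))
}
```

Wrapping the model load and the embedding call in `timed` gives the two durations that the metrics need, while the model functions themselves stay free of timing code.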

Example Output

Model loaded successfully!

--- Model Info ---
Repo:           alpaim/Qwen3-VL-Embedding-2B-GGUF-vecBox
Quantization:   Q4_K_M
Device:         Cpu
Dtype:          F32
Embedding dim:  2048
Text model:     qwen3-vl-embedding-2b-q4_k_m.gguf (852.3 MB)
Vision model:   mmproj-qwen3-vl-embedding-2b-f16.gguf (234.1 MB)
Pixels range:   25656 - 2073600

Text 0: Embedded into vector of size 2048 (First 5 values: 0.0234, -0.0156, ...)

--- Embedding Metrics ---
Type:           Text
Samples:        1
Total time:     156 ms
Throughput:     6.41 samples/sec
Avg/sample:     156.00 ms

Input Tokens:         12 total
Input tok/s:     76.92
Avg tokens:     12.00/sample

Design Decisions

  • Near-zero overhead when disabled: Metrics collection adds only ~100ns of overhead when --metrics=false
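  • External wrapping: Metrics are collected around the model calls, so model code (qwen3.rs, qwen3_vl.rs) remains unchanged apart from the tokenizer() accessor added to qwen3.rs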
  • Future API ready: Metrics struct uses serde for JSON serialization

@alpaim alpaim merged commit 6731fac into main Mar 9, 2026
@alpaim alpaim deleted the metrics branch March 9, 2026 13:18