[AAASM-195] ✅ (bench): Add Python-side performance benchmarks for latency contracts by Chisanan232 · Pull Request #21 · ai-agent-assembly/python-sdk

Chisanan232 · 2026-05-01T10:24:31Z

Description

Add a comprehensive Python-side performance benchmark suite to verify the <2ms per-call (AAASM-45) and <50ms detection (AAASM-47) latency contracts. This establishes measurable baselines so future regressions are detectable.

Type of Change

Breaking Changes

Does this PR introduce any breaking changes?

No
Yes (please describe below)

Related Issues

Related JIRA ticket: AAASM-195
Related stories: AAASM-45 (<2ms per-call), AAASM-47 (<50ms detection)

What Changed

Added pytest-benchmark to dev dependencies
Created test/bench/ directory with shared fixtures and latency contract constants
Added benchmark pytest marker to pytest.ini
Benchmarks for all 6 adapter hook register/unregister cycles (LangChain, LangGraph, CrewAI, Pydantic AI, OpenAI Agents, MCP)
Benchmarks for AdapterRegistry.auto_detect() scaling with 0/1/2/4 frameworks
Benchmark for init_assembly() cold-start time
Conditional benchmark for report_llm_call() PyO3 round-trip (skips when native module not built)
Latency contract enforcement tests using time.perf_counter_ns() with P50/P95/P99 percentile reporting
Baseline results documented in test/bench/BASELINE.md
CI workflow (.github/workflows/benchmarks.yml) for automated regression detection
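For readers unfamiliar with percentile-based contract checks, the following is a minimal sketch of how a `time.perf_counter_ns()` measurement with P50/P95/P99 reporting might look. It is not the code in `test/bench/test_latency_contracts.py`; the names `measure_ns`, `percentiles`, `hypothetical_hot_path`, and `PER_CALL_CONTRACT_NS` are assumptions made for illustration.

```python
import statistics
import time

# Assumed constant: the <2ms per-call contract (AAASM-45) in nanoseconds.
PER_CALL_CONTRACT_NS = 2_000_000


def measure_ns(func, iterations=1000):
    """Return one wall-clock duration per call, in nanoseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    return samples


def percentiles(samples):
    """Return the P50, P95 and P99 of the samples."""
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def hypothetical_hot_path():
    # Stand-in for a patched adapter call; not the SDK's real code.
    return sum(range(10))


def check_per_call_contract():
    """Fail if the P99 of the stand-in call exceeds the assumed contract."""
    report = percentiles(measure_ns(hypothetical_hot_path))
    assert report["p99"] < PER_CALL_CONTRACT_NS, report
    return report
```

Asserting on P99 rather than the mean keeps a handful of slow outliers from hiding behind a fast average, which is presumably why the contract tests report percentiles.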

Baseline Results

All benchmarks pass well within contract thresholds:

Adapter hooks: 0.6–2.7µs mean (contract: <2ms)
Detection: ~1.3ms mean (contract: <50ms)
Cold start: ~1.5ms mean
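As a rough worked check of "well within", the headroom implied by the figures above can be computed directly. This is only arithmetic on the reported means; the helper name `headroom` is hypothetical, and a mean is not a P99, so the real margin against a percentile-based contract may be smaller.

```python
def headroom(contract_us, measured_us):
    """Return how many times the measured latency fits into the contract."""
    return contract_us / measured_us


# Slowest reported adapter-hook mean (2.7µs) against the 2ms contract.
adapter_headroom = headroom(2_000, 2.7)     # roughly 740x

# Reported detection mean (~1.3ms) against the 50ms contract.
detection_headroom = headroom(50_000, 1_300)  # roughly 38x
```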

Testing

Describe the testing performed for this PR:

Unit tests added/updated
Integration tests added/updated
Manual testing performed
No tests required (explain why)

Run benchmarks: pytest test/bench/ --benchmark-only
Run contract tests: pytest test/bench/test_latency_contracts.py --benchmark-disable

Checklist

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov · 2026-05-01T10:26:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


Benchmark the governance interception overhead on each tool/function call when hooks are active (the hot path). Covers all 6 adapters: CrewAI, LangChain, LangGraph, Pydantic AI, OpenAI Agents, MCP. Addresses AAASM-195 AC1: per-call overhead of each framework adapter hook. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace register/unregister cycle measurement with actual per-call patched function overhead for the <2ms P99 contract. Each adapter now benchmarks its real hot path: CrewAI BaseTool.run(), LangChain callback dispatch, LangGraph wrapped node, and async adapters (Pydantic AI, OpenAI Agents, MCP) measured inside event loops. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
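The difference between timing a register/unregister cycle and timing the patched call itself can be sketched as follows. This is an illustrative stand-in, not the SDK's adapter code: `HypotheticalTool`, `apply_hook`, and `mean_call_ns` are invented names, and the hook here only appends to a list.

```python
import functools
import time


class HypotheticalTool:
    """Stand-in for a framework class whose method gets patched."""

    def run(self, value):
        return value * 2


def apply_hook(cls, on_call):
    """Patch cls.run with a wrapper that calls on_call first; return a revert function."""
    original = cls.run

    @functools.wraps(original)
    def patched(self, *args, **kwargs):
        on_call(cls.__name__, args)  # the interception step being measured
        return original(self, *args, **kwargs)

    cls.run = patched

    def revert():
        cls.run = original

    return revert


def mean_call_ns(func, iterations=10_000):
    """Return the mean duration of func() in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        func()
    return (time.perf_counter_ns() - start) / iterations


events = []
revert = apply_hook(HypotheticalTool, lambda name, args: events.append(name))
tool = HypotheticalTool()
try:
    # Time the hot path: calls made while the hook is active.
    patched_ns = mean_call_ns(lambda: tool.run(21))
finally:
    revert()

# Time the same call with the hook removed to estimate the added overhead.
baseline_ns = mean_call_ns(lambda: tool.run(21))
overhead_ns = patched_ns - baseline_ns
```

Patching happens once, outside the timed loop, so the measurement reflects what every intercepted call pays rather than the one-time cost of installing the hook.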

Document patched-call benchmark results for all 6 adapters. Sync adapters take ~1–2µs per call and async adapters ~30–40µs (the async figure includes event-loop scheduling overhead from the benchmark harness). All are well under the 2ms contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
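To show why async figures pick up scheduling overhead, here is a minimal sketch of timing a coroutine inside a running event loop. It assumes nothing about the real adapters: `hypothetical_async_hook` and `measure_async_ns` are made-up names, and the coroutine simply yields once to the loop.

```python
import asyncio
import time


async def hypothetical_async_hook(payload):
    """Stand-in for an async patched call; yields once to the event loop."""
    await asyncio.sleep(0)
    return payload


async def measure_async_ns(coro_func, iterations=1000):
    """Await coro_func repeatedly inside one loop and record each duration."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        await coro_func("x")
        samples.append(time.perf_counter_ns() - start)
    return samples


# One event loop for the whole run, so loop startup is not counted per call.
samples = asyncio.run(measure_async_ns(hypothetical_async_hook))
```

Each sample still includes the time the loop needs to suspend and resume the coroutine, which is a plausible source of the gap between the sync and async numbers.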

sonarqubecloud · 2026-05-01T10:46:16Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


Chisanan232 and others added 12 commits May 1, 2026 18:03

🔧 (deps): Add pytest-benchmark to dev dependency group

bc9f686

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

✨ (bench): Create test/bench directory with shared benchmark fixtures

1ba549b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

🔧 (config): Add benchmark pytest marker to pytest.ini

afcebfd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

✅ (bench): Add per-adapter hook apply/revert latency benchmarks

8c8a209

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

✅ (bench): Add AdapterRegistry.auto_detect() scaling benchmarks

2bfb94f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

✅ (bench): Add init_assembly() cold-start benchmark

c225149

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

✅ (bench): Add report_llm_call() PyO3 round-trip benchmark

98d7dcd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

✅ (bench): Add latency contract enforcement tests with P50/P95/P99

382da71

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

📝 (bench): Document initial benchmark baseline results

c6b8beb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

🔧 (ci): Add benchmark CI workflow for performance regression detection

e10cac6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

🚨 (bench): Narrow mypy type-ignore comments to method-assign

db691c7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

🚨 (bench): Apply linter import sorting and formatting fixes

727ed4d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Chisanan232 and others added 3 commits May 1, 2026 18:44

Chisanan232 merged commit 330a813 into master May 1, 2026
23 checks passed

Chisanan232 deleted the v0.0.0/AAASM-195/add_python_benchmarks branch May 1, 2026 10:49

Chisanan232 merged 15 commits into master from v0.0.0/AAASM-195/add_python_benchmarks
