@hammadtq
Collaborator

What’s new & why
  • Token Quota
    Re-implemented streaming-safe accounting; mid-stream breach detection & rollback; optional limit via MAX_TOKENS_PER_MIN helper

  • Usage Metering
    Prometheus & OpenMeter now behind USAGE_METERING extra; fail-fast error when USAGE_METERING=openmeter but OPENMETER_API_KEY is missing

  • Gateway/Main parity
    Unified middleware order, CORS placement, /a2a prefix, memory endpoint fields, lifespan startup/shutdown logic

  • CLI UX
    attach-gateway prints friendly emoji error instead of raw traceback when required env vars are missing

  • Packaging
    Added logs.py to wheel via py_modules; split extras: memory, quota, usage; removed hard pin for Weaviate from core deps (optional)

  • Docs / CI
    Updated README examples & pytest matrix; new tests for OpenMeter/Prometheus fallbacks
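
For reviewers who want a mental model of the Token Quota item, here is a minimal sketch of streaming-safe accounting with mid-stream breach detection and rollback. It is not the code in this PR: `SlidingWindowQuota`, `stream_with_quota`, `get_token_limit`, and `QuotaExceeded` are hypothetical names, `len` stands in for a real tokenizer, and "rollback" is assumed to mean refunding the chunks already charged for the aborted stream. Only the MAX_TOKENS_PER_MIN variable comes from the description above.

```python
import os
import time
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Mapping, Optional, Tuple

Event = Tuple[float, int]  # (monotonic timestamp, tokens charged)


class QuotaExceeded(Exception):
    """Raised when the next chunk would push usage past the limit."""


def get_token_limit(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Hypothetical helper: None (no limit) when MAX_TOKENS_PER_MIN is unset."""
    source = os.environ if env is None else env
    raw = source.get("MAX_TOKENS_PER_MIN")
    return int(raw) if raw else None


class SlidingWindowQuota:
    """Hypothetical token counter over a rolling window (60 s by default)."""

    def __init__(self, limit: Optional[int], window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._events: Deque[Event] = deque()

    def used(self, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        # Evict charges that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()
        return sum(tokens for _, tokens in self._events)

    def charge(self, tokens: int) -> Optional[Event]:
        """Record a charge and return it, or return None if it would breach."""
        now = time.monotonic()
        if self.limit is not None and self.used(now) + tokens > self.limit:
            return None
        event = (now, tokens)
        self._events.append(event)
        return event

    def rollback(self, events: Iterable[Event]) -> None:
        """Refund the given charges; ones that already aged out are skipped."""
        for event in events:
            try:
                self._events.remove(event)
            except ValueError:
                pass


def stream_with_quota(
    chunks: Iterable[str],
    quota: SlidingWindowQuota,
    count_tokens: Callable[[str], int] = len,  # stand-in for a tokenizer
) -> Iterator[str]:
    """Yield chunks until one would breach the quota, then refund and raise."""
    charged: List[Event] = []
    for chunk in chunks:
        event = quota.charge(count_tokens(chunk))
        if event is None:
            quota.rollback(charged)
            raise QuotaExceeded("token quota exceeded mid-stream")
        charged.append(event)
        yield chunk
```

In this sketch a caller would build the quota with `SlidingWindowQuota(get_token_limit())` and translate `QuotaExceeded` into the 429 response; with the variable unset the limit is `None` and every chunk passes through.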

• All unit tests pass (pytest -q)
• make lint clean
• Docs updated (README.md, CHANGELOG.md)
• Built wheel installs with pip install --no-deps dist/*.whl
• Tested fresh venv install from TestPyPI (0.3.7)
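
The fail-fast behaviour listed under Usage Metering can be pictured with the sketch below. It is an illustration under assumptions, not the shipped code: `resolve_metering_backend` and `MeteringConfigError` are hypothetical names, and the only facts taken from the description are the USAGE_METERING and OPENMETER_API_KEY variables and the rule that `openmeter` without a key is an error.

```python
import os
from typing import Mapping, Optional


class MeteringConfigError(RuntimeError):
    """Hypothetical error type for an unusable metering configuration."""


def resolve_metering_backend(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the selected backend name, or None when metering is off.

    Raises at startup rather than on the first request when OpenMeter is
    selected but no API key is configured.
    """
    source = os.environ if env is None else env
    backend = source.get("USAGE_METERING", "").strip().lower()
    if not backend:
        return None  # metering disabled
    if backend == "openmeter" and not source.get("OPENMETER_API_KEY"):
        raise MeteringConfigError(
            "USAGE_METERING=openmeter requires OPENMETER_API_KEY to be set"
        )
    return backend
```

Calling a check like this during startup surfaces a missing key immediately, and a CLI entry point could catch the error and print a short message instead of a traceback.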

hammadtq added 30 commits July 14, 2025 23:02
…ing-hooks

Implement usage accounting hooks
…rometheus

Add metrics helper and usage extra
…leware-for-fastapi

Fix quota streaming tail handling
…env-vars

Make token limit optional via shared helper
…ta-exceeded

Fix 429 response when quota hit during stream
…ng-window-accounting

Fix token quota rollback for streaming
Quota middleware v2 – streaming-safe token accounting + OpenMeter direct HTTP
@runnerelectrode
Collaborator

LGTM

@hammadtq hammadtq merged commit a5e8267 into main Jul 25, 2025

4 participants