Release 0.3.7 – Quota, Usage Metering & Parity patch #48
…ing-hooks Implement usage accounting hooks
…rometheus Add metrics helper and usage extra
…upport Add real OpenMeter metering
…leware-for-fastapi Fix quota streaming tail handling
…trics Fix final prompt token accounting
…env-vars Make token limit optional via shared helper
…ta-exceeded Fix 429 response when quota hit during stream
…ng-window-accounting Fix token quota rollback for streaming
…-exceed Handle quota breach mid-stream
Quota middleware v2 – streaming-safe token accounting + OpenMeter direct HTTP
Fix/memory events endpoint
Fix CORS OPTIONS Requests Blocked by JWT Middleware
HashamUlHaq
approved these changes
Jul 25, 2025
Collaborator
LGTM
runnerelectrode
approved these changes
Jul 25, 2025
Token Quota
• Re-implemented streaming-safe accounting
• Mid-stream breach detection & rollback
• Optional limit via MAX_TOKENS_PER_MIN helper
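To illustrate the Token Quota items above, here is a minimal sketch of per-chunk accounting over a sliding window. It is not the middleware from this PR: `SlidingWindowQuota`, `QuotaExceeded`, `token_limit_from_env`, and the one-token-per-word cost are all hypothetical, and "rollback" is read here as refunding whatever the aborted stream had already charged, which is only one possible interpretation. The only name taken from the PR is `MAX_TOKENS_PER_MIN`.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Iterable, Iterator, List, Mapping, Optional


def token_limit_from_env(env: Mapping[str, str]) -> Optional[int]:
    """Return the per-minute limit, or None (no limit) when the variable is unset."""
    raw = env.get("MAX_TOKENS_PER_MIN", "").strip()
    return int(raw) if raw else None


class QuotaExceeded(Exception):
    """Raised when a chunk would push a user past the window limit."""


@dataclass
class _Charge:
    at: float    # clock reading when the charge was recorded
    tokens: int  # tokens charged; set to 0 when rolled back


class SlidingWindowQuota:
    """Hypothetical in-memory quota; a real deployment would likely use shared storage."""

    def __init__(self, limit: Optional[int], window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._events: Dict[str, Deque[_Charge]] = {}

    def used(self, user: str) -> int:
        """Drop charges older than the window and return the remaining total."""
        events = self._events.setdefault(user, deque())
        cutoff = self.clock() - self.window_s
        while events and events[0].at <= cutoff:
            events.popleft()
        return sum(charge.tokens for charge in events)

    def metered_stream(self, user: str, chunks: Iterable[str]) -> Iterator[str]:
        """Yield chunks while charging for each one; refund this stream on breach."""
        mine: List[_Charge] = []
        for chunk in chunks:
            cost = len(chunk.split())  # assumption: one token per word
            if self.limit is not None and self.used(user) + cost > self.limit:
                for charge in mine:    # rollback: zero out this stream's charges
                    charge.tokens = 0
                raise QuotaExceeded(f"{user} exceeded {self.limit} tokens per window")
            charge = _Charge(self.clock(), cost)
            self._events.setdefault(user, deque()).append(charge)
            mine.append(charge)
            yield chunk
```

Charging per chunk, rather than once after the response ends, is what lets a breach be detected while the stream is still open. When the limit is `None`, the check is skipped and the stream is only counted.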
Usage Metering
• Prometheus & OpenMeter now behind USAGE_METERING extra
• Fail-fast error when USAGE_METERING=openmeter but OPENMETER_API_KEY is missing
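The fail-fast rule above can be sketched as a small startup check. This is an illustration under assumptions, not the code in this PR: `resolve_usage_backend`, `MeteringConfigError`, and the `"disabled"` fallback value are hypothetical names, and only `USAGE_METERING`, `openmeter`, and `OPENMETER_API_KEY` come from the description.

```python
from typing import Mapping


class MeteringConfigError(RuntimeError):
    """Raised at startup when the metering configuration is incomplete."""


def resolve_usage_backend(env: Mapping[str, str]) -> str:
    """Return the selected backend name, failing fast on a missing OpenMeter key."""
    backend = env.get("USAGE_METERING", "").strip().lower()
    if backend == "openmeter" and not env.get("OPENMETER_API_KEY"):
        # Fail at startup instead of silently dropping usage events later.
        raise MeteringConfigError(
            "USAGE_METERING=openmeter requires OPENMETER_API_KEY to be set"
        )
    # Assumption: an unset variable means metering is switched off.
    return backend or "disabled"
```

Calling a check like this once during application startup surfaces the misconfiguration before any request is served.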
Gateway/Main parity
Unified middleware order, CORS placement, /a2a prefix, memory endpoint fields, lifespan startup/shutdown logic
CLI UX
attach-gateway prints friendly emoji error instead of raw traceback when required env vars are missing
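A rough sketch of the CLI behaviour described above: check the environment up front and print one readable line instead of letting an exception surface as a traceback. The variable names in `REQUIRED_VARS`, the helper names, the emoji, and the message wording are placeholders chosen for illustration; the PR description does not list the real required variables or the exact output.

```python
import sys
from typing import List, Mapping, Optional, Sequence

# Placeholder names only; the real required variables are not listed in this PR text.
REQUIRED_VARS = ("EXAMPLE_ISSUER", "EXAMPLE_AUDIENCE")


def missing_env(env: Mapping[str, str], required: Sequence[str]) -> List[str]:
    """Return the required variables that are unset or empty, in declared order."""
    return [name for name in required if not env.get(name)]


def friendly_error(missing: Sequence[str]) -> Optional[str]:
    """Build a one-line message, or None when nothing is missing."""
    if not missing:
        return None
    return "⚠️  Missing required environment variables: " + ", ".join(missing)


def main(env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS) -> int:
    """Return a process exit code; print the friendly message to stderr on failure."""
    message = friendly_error(missing_env(env, required))
    if message is not None:
        print(message, file=sys.stderr)
        return 1
    return 0
```

An entry point could then end with `sys.exit(main(os.environ))` (after importing `os`), so the non-zero exit code is preserved for scripts while users see a single line.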
Packaging
• Added logs.py to wheel via py_modules
• Split extras: memory, quota, usage
• Removed hard pin for Weaviate from core deps (optional)
Docs / CI
Updated README examples & pytest matrix; new tests for OpenMeter/Prometheus fallbacks
• All unit tests pass (pytest -q)
• make lint clean
• Docs updated (README.md, CHANGELOG.md)
• Built wheel installs with pip install --no-deps dist/*.whl
• Tested fresh venv install from TestPyPI (0.3.7)