
Conversation

@hammadtq
Collaborator

@hammadtq hammadtq commented Jul 21, 2025

@HashamUlHaq could you review when you have a chance? In particular:
Scope of the patch

  • Full rewrite of middleware/quota.py
  • Uses a sliding-window meter store (InMemoryMeterStore + RedisMeterStore)
  • Streaming-aware quota enforcement via the internal _Streamer helper
  • Robust tail-parsing in finalize() (see the sketch after this list) for:
      ◦ SSE frames (data: ...)
      ◦ [DONE] sentinels
      ◦ plain JSON bodies
  • Prometheus usage counters are now updated once with the canonical numbers taken from the LLM response.
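
For reference, a minimal sketch of the kind of tail parsing finalize() does (the helper name and exact handling below are illustrative, not the shipped code):

```python
import json

def _parse_tail(tail: str) -> dict | None:
    """Best-effort extraction of the final `usage` object from a response tail.

    Covers the three shapes listed above: SSE frames (`data: ...`),
    the `[DONE]` sentinel, and plain JSON bodies.
    """
    usage = None
    for line in tail.splitlines():
        line = line.strip()
        if line.startswith("data:"):            # SSE frame prefix
            line = line[len("data:"):].strip()
        if not line or line == "[DONE]":        # blank line or end-of-stream sentinel
            continue
        try:
            payload = json.loads(line)          # frame payload or plain JSON body
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict):
            usage = payload.get("usage") or usage   # keep the last usage block seen
    return usage
```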

@hammadtq hammadtq marked this pull request as draft July 21, 2025 21:41
@hammadtq hammadtq requested a review from HashamUlHaq July 23, 2025 02:50
@hammadtq hammadtq marked this pull request as ready for review July 23, 2025 02:53
@hammadtq
Collaborator Author

Pushed OpenMeter HTTP refactor & README. Ready for another look—thanks!

@hammadtq hammadtq changed the title Quota middleware v2 — streaming-safe token accounting Quota middleware v2 – streaming-safe token accounting **+ OpenMeter direct HTTP** Jul 23, 2025
@hammadtq hammadtq changed the title Quota middleware v2 – streaming-safe token accounting **+ OpenMeter direct HTTP** Quota middleware v2 – streaming-safe token accounting + OpenMeter direct HTTP Jul 23, 2025
Collaborator

@HashamUlHaq HashamUlHaq left a comment

Thanks for the feature!
Following is my analysis:

  1. Pre-request: Estimate with tiktoken

Before the request is sent to the LLM backend, the middleware:

  • Parses the prompt/messages.
  • Uses tiktoken (or a fallback) to estimate the number of input tokens (tokens_in); see the sketch after this list.
  • Checks if this estimated usage would exceed the user’s quota.
  • If over quota: The request is rejected immediately.
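
A minimal sketch of this pre-request estimate, assuming an OpenAI-style messages list (the helper name, meter, and quota_limit below are hypothetical, not the middleware's actual API):

```python
import tiktoken

def estimate_tokens_in(messages: list[dict], model: str = "gpt-3.5-turbo") -> int:
    """Rough pre-request estimate of tokens_in from the prompt/messages."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")   # fallback encoding
    return sum(len(enc.encode(m.get("content", "") or "")) for m in messages)

# Hypothetical pre-request check: reject immediately if the estimate
# would push the user over quota.
# if meter.used(user_id) + estimate_tokens_in(messages) > quota_limit:
#     raise QuotaExceeded()
```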
  2. Post-request: Canonical count from LLM response

After the LLM backend (vLLM, Ollama, OpenAI, etc.) returns a response:

  • The middleware inspects the response for a usage field.
  • If present, it uses the prompt_tokens and completion_tokens from the LLM’s own metrics as the canonical count for both incoming and outgoing tokens.
  • If not present, it falls back to counting tokens in the output using tiktoken.
  • These canonical numbers are what get reported to the usage metering backend (Prometheus, OpenMeter, etc.); a sketch of this selection logic follows.
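
A sketch of that selection logic (the function and argument names are illustrative, not the middleware's actual API):

```python
def canonical_counts(response_json: dict, output_text: str,
                     tokens_in_estimate: int, enc) -> tuple[int, int]:
    """Return (tokens_in, tokens_out) to report to the metering backend.

    Prefers the backend's own `usage` field; otherwise keeps the pre-request
    tokens_in estimate and re-counts only the output locally.
    """
    usage = response_json.get("usage") or {}
    if "prompt_tokens" in usage and "completion_tokens" in usage:
        return usage["prompt_tokens"], usage["completion_tokens"]
    # Fallback: tokens_in stays as estimated, only tokens_out is re-counted.
    return tokens_in_estimate, len(enc.encode(output_text))
```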

Good thing:
In the fallback method, the middleware does NOT recalculate the input tokens (tokens_in)—it only recalculates the output tokens (tokens_out) if the LLM response does not provide a usage field.

Suggestions:

  1. The fallback method is len(list(text.encode())), which counts UTF-8 bytes and can inflate the numbers significantly. A more accurate approach might be len(text) // 4 (roughly four characters per token); see the comparison below.
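
A quick comparison of the two fallbacks on a plain-ASCII string (the exact numbers vary with the text):

```python
text = "Streaming-safe token accounting for quota middleware."

byte_count = len(list(text.encode()))   # current fallback: counts UTF-8 bytes
heuristic  = len(text) // 4             # suggested: ~4 characters per token

print(byte_count, heuristic)            # 53 vs 13 for this sentence
```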

Other suggestions (not as important):

  1. Set a maximum request size to avoid excessive memory or CPU usage during tokenization; a sketch of such a guard follows.
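
Assuming the middleware is Starlette/FastAPI-based (an assumption on my part), such a guard could look roughly like this, with MAX_REQUEST_BYTES as an illustrative limit:

```python
from starlette.requests import Request
from starlette.responses import JSONResponse

MAX_REQUEST_BYTES = 1_000_000  # illustrative cap; tune per deployment

async def reject_oversized(request: Request) -> JSONResponse | None:
    """Return a 413 response for oversized bodies before any tokenization runs."""
    body = await request.body()
    if len(body) > MAX_REQUEST_BYTES:
        return JSONResponse({"error": "request body too large"}, status_code=413)
    return None
```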

@hammadtq hammadtq merged commit e80c891 into dev Jul 23, 2025
2 checks passed
