There are a few important things that explain why you're not seeing any 429 responses:

1. The ai-rate-limiting plugin requires ai-proxy or ai-proxy-multi
The plugin only activates when ctx.picked_ai_instance_name is set, and that field is populated by the ai-proxy or ai-proxy-multi plugin [1]. Your config uses a plain upstream with proxy-rewrite, so the field is never set and the ai-rate-limiting plugin never triggers. For the plugin to work, route the request through ai-proxy (or ai-proxy-multi).
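
As a rough illustration of the point above, here is a sketch that builds a route body pairing ai-proxy with ai-rate-limiting instead of a plain upstream plus proxy-rewrite. The field names (provider, auth, options, limit, time_window, limit_strategy, rejected_code) are my recollection of the plugin schemas, and the URI, model name, and limits are made-up placeholders, so check everything against the docs for your APISIX version before using it.

```python
import json


def build_route(api_key: str) -> dict:
    """Build a hypothetical APISIX route body that sends traffic through ai-proxy.

    All values are illustrative assumptions, not taken from the original config.
    """
    return {
        "uri": "/v1/chat/completions",  # placeholder URI
        "plugins": {
            # ai-proxy picks the AI instance, which is what lets
            # ai-rate-limiting activate at all.
            "ai-proxy": {
                "provider": "openai",
                "auth": {"header": {"Authorization": f"Bearer {api_key}"}},
                "options": {"model": "gpt-4"},  # placeholder model
            },
            # Assumed schema: token budget per time window, in seconds.
            "ai-rate-limiting": {
                "limit": 300,
                "time_window": 60,
                "limit_strategy": "total_tokens",
                "rejected_code": 429,
            },
        },
    }


route = build_route("YOUR_API_KEY")
print(json.dumps(route, indent=2))
```

The printed JSON could be used as the body of an Admin API route update. The point is only that ai-proxy sits on the same route as ai-rate-limiting, with no plain upstream doing the forwarding.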

2. Token counting happens after the LLM responds, not before
The plugin uses a two-phase approach: a 1-token pre-flight check in the access phase, then the real token deduction in the log phase after parsing the LLM respon…
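
To make the two-phase behavior concrete, here is a toy model of it. This is not APISIX code: the class and function names are hypothetical, the window never resets, and I am assuming the pre-flight step only checks that at least 1 token remains without deducting anything.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Toy token budget for a single window (hypothetical, not APISIX code)."""
    limit: int
    used: int = 0

    def access_phase(self) -> int:
        # Pre-flight check: admit the request if at least 1 token remains.
        # The real cost of the request is not known yet.
        return 200 if self.limit - self.used >= 1 else 429

    def log_phase(self, total_tokens: int) -> None:
        # After the LLM has responded, deduct the usage it reported.
        self.used += total_tokens


def simulate(limit: int, usages: list[int]) -> list[int]:
    """Return the status code each request would receive, in order."""
    budget = TokenBudget(limit=limit)
    statuses = []
    for tokens in usages:
        status = budget.access_phase()
        statuses.append(status)
        if status == 200:
            # Only requests that reached the LLM consume tokens.
            budget.log_phase(tokens)
    return statuses


print(simulate(limit=100, usages=[60, 60, 60]))  # [200, 200, 429]
```

In this model the second request pushes usage to 120 against a limit of 100 but still succeeds, because its cost is only known after the response. The 429 shows up on the next request, once the budget is already exhausted.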

@dosubot
Answer selected by SylvainVerdy
Category
Q&A