The Problem (Context)
Currently, the ai-rate-limiting plugin stores token usage and quotas in local shared memory (SHM). In a modern Kubernetes deployment with multiple APISIX replicas, the token counters are isolated per Pod.
Why this is breaking AI Gateway features:
Inconsistent Fallback: If a user makes a request that consumes 2500 tokens on Pod A (where the limit is 50), the next request routed to Pod B or Pod C will bypass the fallback logic because their local counters are still at 0.
Quota Multiplier: The effective limit becomes limit * number_of_replicas, making fine-grained AI cost control impossible.
Header Drift: The X-Rate-Limit-Remaining headers change inconsistently depending on which Pod the Load Balancer hits.
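The three symptoms above can be reproduced with a small simulation. This is only a sketch: LocalTokenCounter and simulate are hypothetical names, and it assumes round-robin load balancing and that token usage is recorded after the upstream responds (which is why one request can overshoot the limit).

```python
from itertools import cycle


class LocalTokenCounter:
    """Hypothetical stand-in for a per-Pod counter kept in local shared memory."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allow(self):
        # Assumption: the token cost is only known after the response,
        # so the check can only look at usage recorded so far.
        return self.used < self.limit

    def record(self, tokens):
        self.used += tokens


def simulate(num_pods, limit, requests, tokens_per_request):
    """Return how many requests are admitted under round-robin routing."""
    pods = [LocalTokenCounter(limit) for _ in range(num_pods)]
    admitted = 0
    for pod, _ in zip(cycle(pods), range(requests)):
        if pod.allow():
            pod.record(tokens_per_request)
            admitted += 1
    return admitted


if __name__ == "__main__":
    # One replica: only the first 2500-token request gets through.
    print(simulate(num_pods=1, limit=50, requests=6, tokens_per_request=2500))
    # Three replicas: each Pod admits one request before its own counter trips.
    print(simulate(num_pods=3, limit=50, requests=6, tokens_per_request=2500))
```

With a single replica one request is admitted; with three replicas three are admitted, which matches the limit * number_of_replicas effect described above.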
Comparison with existing plugins
Standard plugins like limit-count already support a policy: redis option to synchronize state across a cluster. The ai-rate-limiting plugin lacks this critical "Cloud Native" feature.
Proposed Solution
Add a policy field to the ai-rate-limiting plugin configuration, allowing users to point to a Redis instance for global token counting.
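To illustrate what global token counting could look like, here is a minimal sketch of a fixed-window counter kept in a shared store. It is not the plugin's implementation: SharedTokenLimiter is a hypothetical name, and FakeRedis is an in-memory stand-in that mimics only the GET, INCRBY, and EXPIRE commands so the example runs without a server (a real Redis client would return strings from GET).

```python
import time


class FakeRedis:
    """In-memory stand-in for the few Redis commands used below."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at or None)
        self._clock = clock

    def _live(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= self._clock():
            del self._data[key]  # key has expired
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return entry[0] if entry else None

    def incrby(self, key, amount):
        entry = self._live(key)
        value = (entry[0] if entry else 0) + amount
        self._data[key] = (value, entry[1] if entry else None)
        return value

    def expire(self, key, seconds):
        entry = self._live(key)
        if entry:
            self._data[key] = (entry[0], self._clock() + seconds)


class SharedTokenLimiter:
    """Hypothetical limiter: every Pod reads and writes the same counter."""

    def __init__(self, store, limit, time_window):
        self.store = store
        self.limit = limit
        self.time_window = time_window

    def remaining(self, key):
        return max(self.limit - (self.store.get(key) or 0), 0)

    def allow(self, key):
        return self.remaining(key) > 0

    def record(self, key, tokens):
        used = self.store.incrby(key, tokens)
        if used == tokens:
            # First write in this window: start the expiry timer.
            self.store.expire(key, self.time_window)


if __name__ == "__main__":
    store = FakeRedis()
    pod_a = SharedTokenLimiter(store, limit=50, time_window=60)
    pod_b = SharedTokenLimiter(store, limit=50, time_window=60)
    pod_a.record("consumer-1", 2500)
    print(pod_b.allow("consumer-1"), pod_b.remaining("consumer-1"))
```

Because both limiter instances share one store, usage recorded on one Pod is visible on the other, so the remaining value reported to clients no longer depends on which Pod handled the request.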
Current Workaround (Suboptimal)
Currently, users have to write complex serverless-functions to manually increment Redis keys in the log phase and check them in the access phase, which defeats the purpose of having a dedicated AI plugin.