Feature Request / Issue: Global State Support for ai-rate-limiting (Redis Integration) #13076

mohamedDev · 2026-03-05T20:43:55Z

The Problem (Context)
Currently, the ai-rate-limiting plugin stores token usage and quotas in local shared memory (SHM). In a modern Kubernetes deployment with multiple APISIX replicas, the token counters are isolated per Pod.
Why this breaks AI Gateway features:
Inconsistent Fallback: If a user makes a request that consumes 2500 tokens on Pod A (where the limit is 50), Pod A's counter is exhausted, but the next request routed to Pod B or Pod C bypasses the fallback logic because their local counters are still at 0.

Quota Multiplier: The effective limit becomes limit * number_of_replicas, making fine-grained AI cost control impossible.

Header Drift: The X-Rate-Limit-Remaining headers report a different value depending on which Pod the load balancer routes the request to.
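
To make the quota multiplier concrete, here is a minimal simulation. It is not APISIX code: `LocalCounter` and `admitted_tokens` are hypothetical names, the numbers are illustrative, and the load balancer is assumed to be plain round-robin.

```python
from itertools import cycle


class LocalCounter:
    """Stand-in for one Pod's shared-memory token counter (hypothetical)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def allow(self, tokens: int) -> bool:
        # Reject once the quota is used up; otherwise charge the tokens.
        if self.used >= self.limit:
            return False
        self.used += tokens
        return True


def admitted_tokens(counters, requests: int, tokens_per_request: int) -> int:
    """Round-robin requests over the counters; return total tokens admitted."""
    total = 0
    pods = cycle(counters)
    for _ in range(requests):
        if next(pods).allow(tokens_per_request):
            total += tokens_per_request
    return total


LIMIT, REPLICAS = 100, 3

# Per-Pod state: every replica enforces its own copy of the limit.
isolated = admitted_tokens([LocalCounter(LIMIT) for _ in range(REPLICAS)], 60, 10)

# Global state: all replicas consult the same counter.
shared_counter = LocalCounter(LIMIT)
shared = admitted_tokens([shared_counter] * REPLICAS, 60, 10)

print(isolated, shared)  # 300 100
```

With isolated counters the cluster admits 300 tokens against a configured limit of 100, which is the limit * number_of_replicas effect described above.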

Comparison with existing plugins
Standard plugins like limit-count already support a policy: redis option to synchronize state across a cluster. The ai-rate-limiting plugin lacks this critical "Cloud Native" feature.
Proposed Solution
Add a policy field to the ai-rate-limiting plugin configuration, allowing users to point to a Redis instance for global token counting.
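
As a rough illustration of what this could look like, the sketch below builds a plugin config as a Python dict and prints it as JSON. The `policy` and `redis_*` fields are hypothetical for ai-rate-limiting: they do not exist in the plugin today and are borrowed from limit-count's naming. The host is a placeholder, and the limit and window values are illustrative.

```python
import json

# HYPOTHETICAL: `policy` and the `redis_*` fields are NOT part of
# ai-rate-limiting today; they mirror limit-count's option names to show
# what the requested feature could look like.
proposed_ai_rate_limiting = {
    "limit": 50,                 # tokens per window (illustrative)
    "time_window": 60,           # seconds (illustrative)
    "policy": "redis",           # assumed new field; "local" would stay the default
    "redis_host": "redis.example.internal",  # placeholder address
    "redis_port": 6379,
    "redis_timeout": 1000,       # milliseconds, following limit-count
}

print(json.dumps({"plugins": {"ai-rate-limiting": proposed_ai_rate_limiting}}, indent=2))
```

Reusing limit-count's field names would let existing Redis settings carry over unchanged, but the final schema is up to the maintainers.
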
Current Workaround (Suboptimal)
Today, users have to write custom serverless-functions code that manually increments Redis keys in the log phase and checks them in the access phase, which defeats the purpose of having a dedicated AI plugin.
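
The logic such a workaround has to reproduce looks roughly like the sketch below. It is a Python illustration of the two-phase pattern, not the Lua that serverless-functions would actually run. A plain dict stands in for Redis, and the function names, key format, and fixed-window scheme are all assumptions.

```python
from collections import defaultdict

LIMIT = 50          # tokens per window (illustrative)
TIME_WINDOW = 60    # seconds (illustrative)

# Stand-in for Redis: one dict shared by every "Pod". A real deployment would
# use INCRBY on a Redis key and EXPIRE it after TIME_WINDOW seconds.
store: defaultdict[str, int] = defaultdict(int)


def window_key(consumer: str, now: float) -> str:
    """Fixed-window key: all requests in the same window share one counter."""
    return f"ai-tokens:{consumer}:{int(now // TIME_WINDOW)}"


def access_phase(consumer: str, now: float) -> bool:
    """Before proxying: allow only if the shared counter is below the limit."""
    return store[window_key(consumer, now)] < LIMIT


def log_phase(consumer: str, now: float, total_tokens: int) -> None:
    """After the response: add the tokens the LLM reported to the counter."""
    store[window_key(consumer, now)] += total_tokens


# Request 1 lands on Pod A and consumes 2500 tokens.
first_allowed = access_phase("alice", now=0.0)
log_phase("alice", now=0.0, total_tokens=2500)

# Request 2 lands on Pod B in the same window and sees the shared counter.
second_allowed = access_phase("alice", now=10.0)

# A request in the next window starts from a fresh counter.
third_allowed = access_phase("alice", now=61.0)

print(first_allowed, second_allowed, third_allowed)  # True False True
```

Because token usage is only known after the response, the check and the increment happen in different phases, so concurrent requests can still overshoot the limit slightly. A built-in policy would face the same constraint but would spare every user from maintaining this code.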