
feat(ai-rate-limiting): add expression-based limit strategy #13191

Merged
nic-6443 merged 7 commits into apache:master from nic-6443:feat/ai-rate-limiting-expression
Apr 10, 2026

Conversation

nic-6443 (Member) commented Apr 9, 2026

Description

This PR adds the expression limit strategy to the ai-rate-limiting plugin.

Expression strategy

The expression limit strategy allows defining rate limit groups using lua-resty-expr expressions. Each group can have its own count, time_window, and matching expression. When a request matches multiple groups, the first matching group is used. If no group matches, the request is passed through without rate limiting.

This enables fine-grained AI token rate limiting based on request attributes (headers, query params, variables, etc.).

Example config

{
  "limit_strategy": "expression",
  "cost_expr": "input_tokens + completion_tokens",
  "limit_groups": [
    {
      "expression": [["http_x_model", "==", "gpt-4"]],
      "count": 500,
      "time_window": 60
    },
    {
      "expression": [["http_x_model", "==", "gpt-3.5"]],
      "count": 1000,
      "time_window": 60
    }
  ]
}
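The first-match semantics described above can be sketched with lua-resty-expr's `resty.expr.v1` API (`expr.new` and `eval` are the library's real entry points; the surrounding function and its names are assumptions for illustration, not the plugin's actual code):

```lua
-- Sketch of first-match group selection over limit_groups (as in the
-- config above). In practice the expressions would be compiled once at
-- plugin init, not per request; shown inline here for brevity.
local expr = require("resty.expr.v1")

local function find_matching_group(limit_groups, vars)
    for _, group in ipairs(limit_groups) do
        local ex, err = expr.new(group.expression)
        if not ex then
            return nil, err
        end
        if ex:eval(vars) then
            -- the first matching group wins
            return group
        end
    end
    -- no group matched: the request passes through without rate limiting
    return nil
end
```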

Checklist

  • Expression strategy implementation with schema, compile, eval functions
  • Handle edge cases: negative, NaN, Inf token usage
  • Safe expression environment (upstream usage keys cannot shadow built-in functions)
  • 13 test cases covering expression rate limiting scenarios

Note on remaining header accuracy

The X-AI-RateLimit-Remaining header currently shows a value that is off by 1 (e.g., 499 instead of 500) due to the limit-count module deducting cost during the access-phase dry-run peek. This will be fixed in a follow-up PR after apisix-build-tools#455 merges and a new apisix-runtime is released with lua-resty-limit-traffic v1.2.0, which supports cost=0 for non-deducting peeks.

Add a new 'expression' option for the limit_strategy field in
ai-rate-limiting plugin, allowing users to define custom Lua arithmetic
expressions for dynamic token cost calculation.

When limit_strategy is set to 'expression', the plugin evaluates the
user-defined cost_expr against the raw LLM API usage response fields
(e.g., input_tokens, cache_creation_input_tokens, output_tokens).
Missing variables default to 0, and safe math functions (abs, ceil,
floor, max, min) are available.

This enables use cases like:
- Cache-aware billing: input_tokens + cache_creation_input_tokens
- Weighted costs: input_tokens + cache_read_input_tokens * 0.1 + output_tokens
- Provider-specific fields: any numeric field from the raw usage response
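The sandbox semantics described above (missing variables default to 0, only a handful of safe math functions are exposed, and usage fields cannot shadow them) can be sketched in plain Lua. This is an illustrative reconstruction, not the plugin's actual implementation; `eval_cost_expr` and its error strings are assumed names:

```lua
-- Safe functions exposed to cost_expr; looked up before usage fields so a
-- usage key named "abs" or "floor" cannot shadow them.
local SAFE_FUNCS = {
    abs = math.abs, ceil = math.ceil,
    floor = math.floor, max = math.max, min = math.min,
}

local function eval_cost_expr(cost_expr, raw_usage)
    -- Environment resolving every free variable in the expression:
    -- safe functions first, then numeric usage fields, then 0.
    local env = setmetatable({}, {
        __index = function(_, key)
            if SAFE_FUNCS[key] then
                return SAFE_FUNCS[key]
            end
            local v = tonumber(raw_usage and raw_usage[key])
            return v or 0   -- missing variables default to 0
        end,
    })

    -- load with an explicit env (Lua 5.2+ / LuaJIT extension) so the
    -- chunk cannot reach real globals.
    local chunk, err = load("return " .. cost_expr, "cost_expr", "t", env)
    if not chunk then
        return nil, err
    end

    local ok, res = pcall(chunk)
    if not ok or type(res) ~= "number" then
        return nil, "expression did not yield a number"
    end
    -- reject non-finite results (NaN / inf)
    if res ~= res or res == math.huge or res == -math.huge then
        return nil, "non-finite expression result"
    end
    -- clamp negative results to 0 instead of crediting tokens
    if res < 0 then
        res = 0
    end
    return res
end
```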
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Apr 9, 2026
…alculation

The open-source limit-count module includes the peek cost (1) in the
remaining header during dry_run access phase, unlike the enterprise
limit-count-advanced module. Adjust all expected remaining values
by -1 to match this behavior.
@nic-6443 nic-6443 requested a review from Copilot April 9, 2026 12:11

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an expression-based token cost strategy to the ai-rate-limiting plugin so users can compute rate-limit cost from provider-specific usage fields via a Lua arithmetic expression.

Changes:

  • Extends limit_strategy with "expression" and adds cost_expr to the plugin schema.
  • Introduces sandboxed compilation/evaluation of expressions against ctx.llm_raw_usage.
  • Adds a dedicated test suite covering schema validation and (non-)streaming Anthropic scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.

Files changed:

  • apisix/plugins/ai-rate-limiting.lua — Adds expression strategy, schema field, and runtime expression evaluation for token-cost calculation.
  • t/plugin/ai-rate-limiting-expression.t — Adds integration tests validating expression config and Anthropic streaming/non-streaming behavior.


nic-6443 added 3 commits April 9, 2026 20:25
- Prevent raw usage fields from shadowing safe math functions
  (e.g., a field named 'math' or 'abs' from LLM response)
- Reject non-finite values (NaN/inf) from expression results
- Clamp negative expression results to 0 instead of crediting tokens
- Add test for negative expression result (cache_read > input)
When expression evaluates to a negative value that gets clamped to 0,
calling rate_limit() with cost=0 triggers an assertion failure in
resty.limit.count's incoming function. Skip the call entirely when
used_tokens is 0 since there's nothing to deduct.
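The zero-cost guard described in this commit message might look like the following sketch (`rate_limit` is the function name mentioned in the discussion; the wrapper and its signature are assumptions for illustration):

```lua
-- Sketch: skip the limiter entirely when there is nothing to deduct.
-- resty.limit.count's incoming() ends up doing dict:incr(key, cost, ...)
-- and a zero cost trips its assertion, so cost=0 must never reach it.
local function deduct_tokens(conf, ctx, used_tokens)
    if used_tokens <= 0 then
        return   -- nothing to deduct; avoid the cost=0 assertion failure
    end
    return rate_limit(conf, ctx, used_tokens)
end
```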
nic-6443 (Member, Author) commented Apr 9, 2026

Fixed the CI failure in TEST 13: when the expression evaluates to a negative value that gets clamped to 0, calling rate_limit() with cost=0 triggers an assertion failure in resty.limit.count's incoming function (dict:incr(key, 0, ...)). The fix skips the rate_limit() call entirely when used_tokens == 0 since there's nothing to deduct.

…tics

The lua-resty-limit-traffic library is being upgraded from v1.0.0 to v1.2.0
in the apisix-runtime build. Key library change: incoming_new() now counts UP
(returns consumed) instead of DOWN (returns remaining).

Changes:
- limit-count-local.lua: Convert consumed return value to remaining
  (remaining = limit - consumed), matching the enterprise limit-count-advanced
  module. When commit=false (dry_run), pass cost=0 to the library so it reads
  current state without deducting, eliminating the off-by-1 in remaining header.
- limit-count/init.lua: Add dry_run rejection check inside local-policy branch
  only (not redis, which always commits and has no dry_run support).
- ai-rate-limiting-expression.t: Revert remaining header expectations to match
  enterprise values now that dry_run shows accurate remaining.
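The consumed-to-remaining conversion described above can be sketched as follows. The exact `incoming_new()` signature in lua-resty-limit-traffic v1.2.0 is an assumption here (only its count-up behavior and the cost=0 peek are stated in this PR), as are the surrounding variable names:

```lua
-- Sketch of adapting to the v1.2.0 count-up semantics (assumed signature).
-- When commit is false (dry_run peek), pass cost=0 so the library reads
-- current state without deducting, fixing the off-by-1 remaining header.
local cost = commit and used_tokens or 0
local delay, consumed = lim:incoming_new(key, commit, cost)
if not delay then
    return nil, consumed   -- "rejected" or an error string
end
-- incoming_new() now returns the consumed count, so derive remaining
-- the way the enterprise limit-count-advanced module reports it.
local remaining = limit - consumed
```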
@nic-6443 nic-6443 changed the title feat(ai-rate-limiting): add expression-based limit strategy feat(ai-rate-limiting): add expression-based limit strategy and adapt limit-count for limit-traffic v1.2.0 Apr 10, 2026
@nic-6443 nic-6443 changed the title feat(ai-rate-limiting): add expression-based limit strategy and adapt limit-count for limit-traffic v1.2.0 feat(ai-rate-limiting): add expression-based limit strategy Apr 10, 2026
@nic-6443 nic-6443 merged commit ac99cd8 into apache:master Apr 10, 2026
23 checks passed
@nic-6443 nic-6443 deleted the feat/ai-rate-limiting-expression branch April 10, 2026 09:39