
feat(gateway): enforce agent decision budgets #10

Open
JOY (JOY) wants to merge 4 commits into dev from feat/gateway-agent-budget

Conversation

@JOY
Contributor

Summary

  • Enforce per-player /v1/agent/decide request rate limits from LLM_RATE_LIMIT_PER_PLAYER_PER_MIN.
  • Enforce per-player daily token budget from LLM_TOKEN_BUDGET_PER_PLAYER_DAY before calling the decider/model.
  • Return structured 429 responses, including Retry-After for minute rate limits.
  • Keep deterministic fallback/local behavior available by allowing zero or negative config values to disable limits.

Fixes #6

Verification

  • go test -count=1 ./...
  • go vet ./...
  • go build -trimpath -o %TEMP%\second-spawn-gateway-rate-limit-test.exe .

@JOY JOY (JOY) force-pushed the feat/gateway-agent-budget branch from 66bb738 to 90c7eeb on May 16, 2026 at 17:11
@JOY
Contributor Author

Local code-review skill pass

Target: PR #10 feat(gateway): enforce agent decision budgets

Engine Specialist Findings: N/A - no engine specialist configured

.Codex/docs/technical-preferences.md is not present in this worktree, so engine-specialist routing is skipped per the local code-review skill.

Testability: TESTABLE

  • HTTP tests cover the allowed first request, same-player minute rate limit, structured 429 reason, and Retry-After header.
  • Budget tests assert a daily token budget block happens before the decider/model is called.
  • Unit coverage verifies minute windows and UTC daily budget windows reset.

ADR Compliance: NO ADRS FOUND

No ADR reference was found in the implementation file headers or relevant commit messages. The change stays within the accepted gateway boundary: rate and token budget checks happen before LLM/provider execution, while authoritative gameplay state remains outside the gateway.

Standards Compliance: PASS

  • Config values are sourced from existing gateway config fields.
  • Dependencies remain injectable through NewWithDependencies.
  • Limiter state is isolated behind a small helper type and guarded by a mutex.
  • The implementation does not add Unity/client-side provider keys or game-state mutation.

Architecture: CLEAN

  • /v1/agent/decide now enforces request and budget gates before calling the decider, preventing avoidable provider calls once a player is capped.
  • Zero or negative config values disable enforcement, preserving deterministic local fallback behavior for development.
  • The limiter is intentionally in-memory for the prototype gateway process.

SOLID: COMPLIANT

The limiter has one reason to change, and server construction still supports test doubles for the store and decider.

Game-Specific Concerns

Positive Observations

Required Changes

None from local code-review.

Suggestions

  • Track distributed/shared limiter storage before running more than one gateway instance.
  • Later, replace rough request-size token estimates with actual provider usage accounting once provider responses expose usage consistently.

Verdict: APPROVED WITH SUGGESTIONS

Gemini (@gemini-code-assist) review
Codex (@codex) review

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements per-player rate limiting and daily token budgeting for agent decisions by introducing an agentDecisionLimiter and updating the handleAgentDecide endpoint. The implementation includes in-memory usage tracking, a Retry-After header for rate-limited requests, and comprehensive unit tests. Feedback identifies a potential memory leak in the player state map, performance concerns regarding token estimation via JSON serialization, and the redundancy of the maxInt helper function given Go 1.21's built-in max function.

I am having trouble creating individual review comments, so the feedback is listed below.

backend/gateway/internal/server/server.go (195)

high

The players map in agentDecisionLimiter grows indefinitely as new players interact with the gateway. This will lead to a memory leak in a long-running process. Consider using a cache with an eviction policy (e.g., an LRU cache or a map with TTL) to manage player states. Since the project already has a Redis configuration, leveraging Redis for distributed rate limiting would be a more robust solution.

backend/gateway/internal/server/server.go (279)

medium

Go 1.21 introduced the built-in max function, making the custom maxInt helper redundant. You can use the built-in function directly.

TokenBudgetRemaining: max(tokenBudget-state.tokensUsed, 0),

backend/gateway/internal/server/server.go (289)

medium

Calling json.Marshal(req) on every request to estimate the token count is computationally expensive, especially as the AgentContext (which includes character memories) grows. Since this is only a rough estimate, consider calculating the length based on the sizes of the individual fields or using a more efficient estimation heuristic that avoids full JSON serialization.

backend/gateway/internal/server/server.go (302-307)

medium

This helper function is redundant as Go 1.21+ provides a built-in max function. It can be removed after updating the call site.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements a rate limiting and token budgeting system for agent decisions, introducing an agentDecisionLimiter to track per-player usage. The feedback highlights a high-severity security vulnerability where player IDs are retrieved from the request body instead of authenticated sessions, as well as a potential memory leak in the unbounded player state map. Additionally, the reviewer suggested optimizing the token estimation logic and replacing a custom maxInt utility with Go 1.21's built-in max function.

mu sync.Mutex
cfg *config.Config
now func() time.Time
players map[string]*agentDecisionLimitState

high

The players map grows indefinitely as new player IDs are encountered, which will lead to a memory leak over time. Since player state is only relevant for the current day/minute, consider using a cache with expiration (like an LRU cache) or implementing a periodic cleanup routine to remove stale entries.

}
req.Allowed = ensureStopAllowed(req.Allowed)
// TODO(#6): enforce per-player decision rate limits and daily token budgets here.
if allowed, result := s.limiter.Allow(req.Context.Player.PlayerID, estimateAgentDecisionTokens(req)); !allowed {

security-high high

The PlayerID used for rate limiting and budgeting is taken directly from the request body (req.Context.Player.PlayerID). This allows a malicious user to impersonate another player by providing their ID, potentially exhausting their budget or bypassing their own limits. The PlayerID should be derived from the authenticated user session (e.g., from the Supabase JWT) rather than the client-provided JSON body.

}

playerID = normalizeLimitPlayerID(playerID)
tokenEstimate = maxInt(tokenEstimate, 1)

medium

Use the built-in max function introduced in Go 1.21 instead of the custom maxInt utility.

Suggested change
tokenEstimate = maxInt(tokenEstimate, 1)
tokenEstimate = max(tokenEstimate, 1)

TokenEstimate: tokenEstimate,
TokenBudgetPerDay: tokenBudget,
TokenBudgetUsedToday: s.tokensUsed,
TokenBudgetRemaining: maxInt(tokenBudget-s.tokensUsed, 0),

medium

Use the built-in max function introduced in Go 1.21 instead of the custom maxInt utility.

Suggested change
TokenBudgetRemaining: maxInt(tokenBudget-s.tokensUsed, 0),
TokenBudgetRemaining: max(tokenBudget-s.tokensUsed, 0),

Comment on lines +142 to +147
func maxInt(a int, b int) int {
	if a > b {
		return a
	}
	return b
}

medium

This utility function is redundant as Go 1.21+ provides a built-in max function. After updating the call sites, this definition can be removed.

Comment on lines +179 to +191
func estimateAgentDecisionTokens(req agent.DecisionRequest) int {
	payload, err := json.Marshal(req)
	if err != nil {
		return agentDecisionOutputTokenReserve
	}
	// Rough English/JSON estimate: four bytes per token plus the configured
	// maximum completion reserve used by the model-backed decision path.
	estimate := len(payload)/4 + agentDecisionOutputTokenReserve
	if estimate < agentDecisionOutputTokenReserve {
		return agentDecisionOutputTokenReserve
	}
	return estimate
}

medium

The estimateAgentDecisionTokens function has a few areas for improvement:

  1. Marshaling the entire request back to JSON is inefficient since it was just unmarshaled. This happens on every request, including those that are rate-limited.
  2. The check if estimate < agentDecisionOutputTokenReserve is redundant because len(payload)/4 is always non-negative.

Consider using the original request body length if available, or a simpler heuristic.

@JOY
Contributor Author

Delta after Gemini review on PR #10:

  • Verified the high-priority concern before changing code. The reviewer was correct that /v1/agent/decide budget enforcement keyed off the request body when no trusted identity was wired.
  • Added a local Supabase-compatible HS256 JWT verifier in the gateway auth package and wired /v1/agent/decide to prefer the verified sub as the limiter key when auth is configured.
  • Kept dev/test fallback behavior when no JWT verifier is configured, so local prototype flows remain usable.
  • Added regression coverage proving two requests with the same trusted auth subject but different body player_id values still hit the same rate-limit key.
  • Added missing-auth 401 coverage when a verifier is configured.
  • Added direct HS256 verifier tests for valid, invalid-signature, and expired tokens.
  • Addressed the unbounded limiter map by pruning stale player limiter state after a daily-budget-scale TTL.
  • Replaced the custom maxInt helper with Go's built-in max.
  • Replaced per-request json.Marshal(req) token estimation with a lightweight field-size heuristic.

Verification on latest head 61bab25:

  • go test -count=1 ./... in backend/gateway passed.
  • go vet ./... in backend/gateway passed.
  • go build -trimpath in backend/gateway passed.
  • Changed-file em-dash/en-dash scan passed.
  • git diff --check passed.

Gemini (@gemini-code-assist) please delta re-review latest commit 61bab25.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements JWT-based authentication for the gateway and introduces per-player rate limiting and token budget enforcement for agent decisions. The changes include a new HS256 JWT verifier, a rate-limiting service, and updates to the server handler to integrate these features. The reviewer highlighted a potential performance bottleneck in the rate limiter's pruning logic, which currently operates with O(N) complexity under a global lock, and suggested improving the maintainability of the token estimation logic, which currently relies on manual field-by-field summation.

l.mu.Lock()
defer l.mu.Unlock()

l.pruneExpired(now)

high

The pruneExpired method is called inside a global mutex lock on every request. This method iterates over the entire players map, resulting in O(N) complexity under lock. As the number of unique players grows, this will become a significant performance bottleneck and increase latency for all users. Consider pruning periodically in a background goroutine or using a data structure that supports efficient TTL-based eviction.

Comment on lines +215 to +247
func estimateAgentDecisionTokens(req agent.DecisionRequest) int {
	chars := len(req.Context.Player.PlayerID) +
		len(req.Context.Player.DisplayName) +
		len(req.Context.Body.BodyID) +
		len(req.Context.Body.ArchetypeID) +
		len(req.Context.Body.VisualPrefabKey) +
		len(req.Context.Body.Equipment.PrimaryWeapon) +
		len(req.Context.Body.Cultivation.Tier) +
		len(req.Context.Body.AgentPolicy.Mode) +
		len(req.Context.Body.Soul.Name) +
		len(req.Context.Body.Soul.CoreDrive) +
		len(req.Context.Body.Soul.Temperament) +
		len(req.Context.Body.Soul.CombatStyle) +
		len(req.Context.Body.Soul.SocialStyle) +
		len(req.Context.Body.Soul.PlayerNotes) +
		len(req.WorldSnapshot.ZoneID)
	for _, goal := range req.Context.Body.Soul.LongTermGoals {
		chars += len(goal)
	}
	for _, boundary := range req.Context.Body.Soul.MoralBoundaries {
		chars += len(boundary)
	}
	for _, memory := range req.Context.Body.Memory {
		chars += len(memory.ID) + len(memory.Kind) + len(memory.Summary) + 16
	}
	chars += len(req.Allowed) * 12
	chars += len(req.WorldSnapshot.NearbyTargets) * 48
	chars += len(req.WorldSnapshot.NearbyObjects) * 40

	// Rough English/JSON estimate: four characters per token plus the
	// completion reserve used by the model-backed decision path.
	return max(chars/4+agentDecisionOutputTokenReserve, agentDecisionOutputTokenReserve)
}

medium

The estimateAgentDecisionTokens function manually accesses a large number of fields from the DecisionRequest and WorldSnapshot structs. This approach is fragile and likely to break or become inaccurate as the domain model evolves. Consider using a more maintainable approach, such as implementing a TokenEstimator interface on the request objects or using a more generalized heuristic that doesn't require manual field-by-field summation.

@JOY
Contributor Author

Delta after Gemini review on limiter maintenance:

  • Accepted the pruning concern. The limiter no longer scans the full player map on every request.
  • Added lastPruned and only prunes expired limiter state at most once per minute while preserving the 25-hour stale-state TTL.
  • Kept the token estimate as the lightweight field-size heuristic introduced in the prior delta. It is intentionally approximate until provider usage accounting is wired; moving it to a shared estimator can happen when more LLM endpoints need the same policy.

Verification on latest head bf15f18:

  • go test -count=1 ./... in backend/gateway passed.
  • go vet ./... in backend/gateway passed.
  • go build -trimpath in backend/gateway passed.
  • Changed-file em-dash/en-dash scan passed.
  • git diff --check passed.

Gemini (@gemini-code-assist) please delta re-review latest commit bf15f18.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a local HS256 JWT verifier for Supabase-compatible authentication and an agentDecisionLimiter to enforce per-player rate limits and daily token budgets. The limiter is integrated into the agent decision endpoint to prevent abuse and manage LLM costs. Feedback focuses on performance and scalability issues within the limiter, specifically the O(N) pruning operation performed under a global mutex and the limitations of in-memory state for multi-instance deployments. Additionally, a security concern was raised regarding the potential for Denial of Service attacks via unauthenticated, oversized JWT headers.

Comment on lines +98 to +110
func (l *agentDecisionLimiter) pruneExpiredIfDue(now time.Time) {
	if !l.lastPruned.IsZero() && now.Sub(l.lastPruned) < time.Minute {
		return
	}
	l.lastPruned = now

	cutoff := now.Add(-agentDecisionLimitStateTTL)
	for playerID, state := range l.players {
		if !state.lastSeen.IsZero() && state.lastSeen.Before(cutoff) {
			delete(l.players, playerID)
		}
	}
}

medium

The pruneExpiredIfDue function iterates over the entire players map while holding a global mutex. In a production environment with a large number of unique players over a 25-hour window, this O(N) operation will cause significant latency spikes for the request that triggers the pruning (once per minute). Consider moving the pruning logic to a background goroutine or using a more efficient data structure for TTL management (e.g., a linked list or a specialized cache library) to avoid blocking the request path.

Comment on lines +22 to +28
type agentDecisionLimiter struct {
	mu         sync.Mutex
	cfg        *config.Config
	now        func() time.Time
	lastPruned time.Time
	players    map[string]*agentDecisionLimitState
}

medium

The current limiter implementation is entirely in-memory. While this works for a single-instance deployment, it will not correctly enforce budgets or rate limits across multiple gateway instances. Given that the configuration already includes a RedisURL, consider transitioning this state to Redis to support horizontal scaling and persistence across restarts.
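One way to prepare for that transition without committing to Redis yet is a storage seam: the handler depends on a small interface, the single-instance build plugs in an in-memory implementation, and a Redis-backed one (e.g. INCR with EXPIRE on per-player keys) can replace it later without touching the endpoint. A sketch under those assumptions, with hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// limiterStore is a hypothetical seam between the decide handler and the
// limiter state; a Redis-backed implementation would satisfy the same
// interface for multi-instance deployments.
type limiterStore interface {
	// AddTokens adds to a player's daily counter and returns the new total.
	AddTokens(playerID string, tokens int) (total int, err error)
}

// memoryStore is the single-process implementation.
type memoryStore struct {
	mu     sync.Mutex
	totals map[string]int
}

func (s *memoryStore) AddTokens(playerID string, tokens int) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.totals[playerID] += tokens
	return s.totals[playerID], nil
}

// overBudget reports whether this request pushed the player past a
// positive daily budget; non-positive budgets disable enforcement.
func overBudget(store limiterStore, playerID string, tokens, budget int) bool {
	total, err := store.AddTokens(playerID, tokens)
	return err == nil && budget > 0 && total > budget
}

func main() {
	store := &memoryStore{totals: map[string]int{}}
	fmt.Println(overBudget(store, "p1", 600, 1000), overBudget(store, "p1", 600, 1000))
}
```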

Comment on lines +104 to +106
if err := decodeJWTPart(parts[0], &header); err != nil {
return Identity{}, ErrInvalidJWT
}

security-medium medium

The JWT header is decoded and its algorithm is checked before the signature is verified. While this is necessary to determine the verification method, the decodeJWTPart function uses json.Unmarshal on unauthenticated input. A maliciously crafted JWT with an extremely large header could lead to excessive memory consumption or a Denial of Service (DoS) before signature verification occurs. Consider adding a maximum length check for the JWT string or its parts before processing.

@JOY
Contributor Author

Clean delta summary after Gemini review on oversized JWT/header DoS and multi-instance limiter tracking:

  • Accepted the oversized unauthenticated JWT concern. The gateway now caps Authorization header size, bearer token size, and individual JWT part size before base64/JSON parsing or signature verification.
  • Added auth unit tests for oversized bearer extraction and oversized JWT parts.
  • Added an endpoint regression test proving /v1/agent/decide returns 401 for oversized bearer input before the decider/model path is called.
  • Opened tracking issue #13 (Track distributed agent decision limiter storage) for moving the in-memory limiter to Redis or another shared store before running multiple gateway instances.
  • Added TODO(#13) at the in-memory limiter state so the production-scaling boundary is visible in code.

Verification on latest head 0ac2fa1:

  • go test -count=1 ./... in backend/gateway passed.
  • go vet ./... in backend/gateway passed.
  • go build -trimpath in backend/gateway passed.
  • Changed-file em-dash/en-dash scan passed.
  • git diff --check passed.

Gemini (@gemini-code-assist) please delta re-review latest commit 0ac2fa1.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements JWT-based authentication for Supabase and introduces a per-player rate limiter and token budget tracker for agent decisions. The feedback highlights that the manual token estimation logic is fragile and should be replaced with a more robust JSON-based approach. Additionally, the state pruning mechanism in the limiter is noted as a potential performance bottleneck due to its O(N) complexity while holding a global mutex, suggesting a move to background processing or a TTL cache.

Comment on lines +215 to +247
func estimateAgentDecisionTokens(req agent.DecisionRequest) int {
	chars := len(req.Context.Player.PlayerID) +
		len(req.Context.Player.DisplayName) +
		len(req.Context.Body.BodyID) +
		len(req.Context.Body.ArchetypeID) +
		len(req.Context.Body.VisualPrefabKey) +
		len(req.Context.Body.Equipment.PrimaryWeapon) +
		len(req.Context.Body.Cultivation.Tier) +
		len(req.Context.Body.AgentPolicy.Mode) +
		len(req.Context.Body.Soul.Name) +
		len(req.Context.Body.Soul.CoreDrive) +
		len(req.Context.Body.Soul.Temperament) +
		len(req.Context.Body.Soul.CombatStyle) +
		len(req.Context.Body.Soul.SocialStyle) +
		len(req.Context.Body.Soul.PlayerNotes) +
		len(req.WorldSnapshot.ZoneID)
	for _, goal := range req.Context.Body.Soul.LongTermGoals {
		chars += len(goal)
	}
	for _, boundary := range req.Context.Body.Soul.MoralBoundaries {
		chars += len(boundary)
	}
	for _, memory := range req.Context.Body.Memory {
		chars += len(memory.ID) + len(memory.Kind) + len(memory.Summary) + 16
	}
	chars += len(req.Allowed) * 12
	chars += len(req.WorldSnapshot.NearbyTargets) * 48
	chars += len(req.WorldSnapshot.NearbyObjects) * 40

	// Rough English/JSON estimate: four characters per token plus the
	// completion reserve used by the model-backed decision path.
	return max(chars/4+agentDecisionOutputTokenReserve, agentDecisionOutputTokenReserve)
}

medium

The current manual token estimation is fragile and omits several fields from the DecisionRequest (e.g., ReincarnationLore, Characteristics, AgentPolicy details, and most WorldSnapshot scalar fields). This leads to underestimating the actual token usage. A more robust and maintainable approach is to marshal the request to JSON and use its length as the character count, which directly reflects the payload size sent to the LLM provider.

func estimateAgentDecisionTokens(req agent.DecisionRequest) int {
	payload, _ := json.Marshal(req)
	return max(len(payload)/4+agentDecisionOutputTokenReserve, agentDecisionOutputTokenReserve)
}

Comment on lines +100 to +112
func (l *agentDecisionLimiter) pruneExpiredIfDue(now time.Time) {
	if !l.lastPruned.IsZero() && now.Sub(l.lastPruned) < time.Minute {
		return
	}
	l.lastPruned = now

	cutoff := now.Add(-agentDecisionLimitStateTTL)
	for playerID, state := range l.players {
		if !state.lastSeen.IsZero() && state.lastSeen.Before(cutoff) {
			delete(l.players, playerID)
		}
	}
}

medium

The pruneExpiredIfDue function performs an O(N) scan over the entire players map while holding the global mu mutex. As the number of unique players grows, this will cause significant latency spikes for the request that triggers the pruning (once per minute). Consider performing the pruning in a background goroutine or using a data structure that supports more efficient expiration (like a TTL cache) to avoid blocking the request path.
