feat(gateway): enforce agent decision budgets #10
Force-pushed from 66bb738 to 90c7eeb.
Local code-review skill pass

- Target: PR #10
- Engine Specialist Findings: N/A - no engine specialist configured
- Testability: TESTABLE
- ADR Compliance: NO ADRS FOUND. No ADR reference was found in the implementation file headers or relevant commit messages. The change stays within the accepted gateway boundary: rate and token budget checks happen before LLM/provider execution, while authoritative gameplay state remains outside the gateway.
- Standards Compliance: PASS
- Architecture: CLEAN
- SOLID: COMPLIANT. The limiter has one reason to change, and server construction still supports test doubles for the store and decider.
- Game-Specific Concerns
- Positive Observations
- Required Changes: none from local code-review.
- Suggestions
- Verdict: APPROVED WITH SUGGESTIONS

Gemini (@gemini-code-assist) review
Code Review
This pull request implements per-player rate limiting and daily token budgeting for agent decisions by introducing an agentDecisionLimiter and updating the handleAgentDecide endpoint. The implementation includes in-memory usage tracking, a Retry-After header for rate-limited requests, and comprehensive unit tests. Feedback identifies a potential memory leak in the player state map, performance concerns regarding token estimation via JSON serialization, and the redundancy of the maxInt helper function given Go 1.21's built-in max function.
backend/gateway/internal/server/server.go (195)
The players map in agentDecisionLimiter grows indefinitely as new players interact with the gateway. This will lead to a memory leak in a long-running process. Consider using a cache with an eviction policy (e.g., an LRU cache or a map with TTL) to manage player states. Since the project already has a Redis configuration, leveraging Redis for distributed rate limiting would be a more robust solution.
backend/gateway/internal/server/server.go (279)
Go 1.21 introduced the built-in max function, making the custom maxInt helper redundant. You can use the built-in function directly.
```go
TokenBudgetRemaining: max(tokenBudget-state.tokensUsed, 0),
```
backend/gateway/internal/server/server.go (289)
Calling json.Marshal(req) on every request to estimate the token count is computationally expensive, especially as the AgentContext (which includes character memories) grows. Since this is only a rough estimate, consider calculating the length based on the sizes of the individual fields or using a more efficient estimation heuristic that avoids full JSON serialization.
backend/gateway/internal/server/server.go (302-307)
This helper function is redundant as Go 1.21+ provides a built-in max function. It can be removed after updating the call site.
Code Review
This pull request implements a rate limiting and token budgeting system for agent decisions, introducing an agentDecisionLimiter to track per-player usage. The feedback highlights a high-severity security vulnerability where player IDs are retrieved from the request body instead of authenticated sessions, as well as a potential memory leak in the unbounded player state map. Additionally, the reviewer suggested optimizing the token estimation logic and replacing a custom maxInt utility with Go 1.21's built-in max function.
```go
mu      sync.Mutex
cfg     *config.Config
now     func() time.Time
players map[string]*agentDecisionLimitState
```
The players map grows indefinitely as new player IDs are encountered, which will lead to a memory leak over time. Since player state is only relevant for the current day/minute, consider using a cache with expiration (like an LRU cache) or implementing a periodic cleanup routine to remove stale entries.
```go
}
req.Allowed = ensureStopAllowed(req.Allowed)
// TODO(#6): enforce per-player decision rate limits and daily token budgets here.
if allowed, result := s.limiter.Allow(req.Context.Player.PlayerID, estimateAgentDecisionTokens(req)); !allowed {
```
The PlayerID used for rate limiting and budgeting is taken directly from the request body (req.Context.Player.PlayerID). This allows a malicious user to impersonate another player by providing their ID, potentially exhausting their budget or bypassing their own limits. The PlayerID should be derived from the authenticated user session (e.g., from the Supabase JWT) rather than the client-provided JSON body.
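A hedged sketch of the fix: resolve the limiter key from the verified identity and treat the body field only as a consistency check. The names below are illustrative, not the PR's actual auth API:

```go
package main

import "errors"

// limiterPlayerID picks the ID used for rate limiting and budgeting.
// authenticatedSubject would come from the verified JWT (e.g. the Supabase
// `sub` claim), never from client-supplied JSON.
func limiterPlayerID(authenticatedSubject, bodyPlayerID string) (string, error) {
	if authenticatedSubject == "" {
		return "", errors.New("missing authenticated identity")
	}
	// Reject mismatches rather than silently overriding, so impersonation
	// attempts surface as errors instead of charging the wrong budget.
	if bodyPlayerID != "" && bodyPlayerID != authenticatedSubject {
		return "", errors.New("player ID does not match session")
	}
	return authenticatedSubject, nil
}
```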
```go
}

playerID = normalizeLimitPlayerID(playerID)
tokenEstimate = maxInt(tokenEstimate, 1)
```

```go
TokenEstimate:        tokenEstimate,
TokenBudgetPerDay:    tokenBudget,
TokenBudgetUsedToday: s.tokensUsed,
TokenBudgetRemaining: maxInt(tokenBudget-s.tokensUsed, 0),
```
```go
func maxInt(a int, b int) int {
	if a > b {
		return a
	}
	return b
}
```
```go
func estimateAgentDecisionTokens(req agent.DecisionRequest) int {
	payload, err := json.Marshal(req)
	if err != nil {
		return agentDecisionOutputTokenReserve
	}
	// Rough English/JSON estimate: four bytes per token plus the configured
	// maximum completion reserve used by the model-backed decision path.
	estimate := len(payload)/4 + agentDecisionOutputTokenReserve
	if estimate < agentDecisionOutputTokenReserve {
		return agentDecisionOutputTokenReserve
	}
	return estimate
}
```
The estimateAgentDecisionTokens function has a few areas for improvement:

1. Marshaling the entire request back to JSON is inefficient since it was just unmarshaled. This happens on every request, including those that are rate-limited.
2. The check `if estimate < agentDecisionOutputTokenReserve` is redundant because `len(payload)/4` is always non-negative.

Consider using the original request body length if available, or a simpler heuristic.
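One way to act on the body-length suggestion, sketched with illustrative names (the handler would pass `http.Request.ContentLength` or the byte count it already read while decoding):

```go
package main

// estimateFromBodyLen reuses the raw request size instead of re-marshaling
// the decoded struct. A negative length means the size is unknown (chunked
// transfer), so only the completion reserve is charged.
func estimateFromBodyLen(contentLength int64, outputReserve int) int {
	if contentLength < 0 {
		return outputReserve
	}
	// Rough English/JSON estimate: four bytes per token plus the reserve.
	return int(contentLength)/4 + outputReserve
}
```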
Delta after Gemini review on PR #10:
Verification on latest head
Gemini (@gemini-code-assist) please delta re-review latest commit 61bab25.
Code Review
This pull request implements JWT-based authentication for the gateway and introduces per-player rate limiting and token budget enforcement for agent decisions. The changes include a new HS256 JWT verifier, a rate-limiting service, and updates to the server handler to integrate these features. The reviewer highlighted a potential performance bottleneck in the rate limiter's pruning logic, which currently operates with O(N) complexity under a global lock, and suggested improving the maintainability of the token estimation logic, which currently relies on manual field-by-field summation.
```go
l.mu.Lock()
defer l.mu.Unlock()

l.pruneExpired(now)
```
The pruneExpired method is called inside a global mutex lock on every request. This method iterates over the entire players map, resulting in O(N) complexity under lock. As the number of unique players grows, this will become a significant performance bottleneck and increase latency for all users. Consider pruning periodically in a background goroutine or using a data structure that supports efficient TTL-based eviction.
```go
func estimateAgentDecisionTokens(req agent.DecisionRequest) int {
	chars := len(req.Context.Player.PlayerID) +
		len(req.Context.Player.DisplayName) +
		len(req.Context.Body.BodyID) +
		len(req.Context.Body.ArchetypeID) +
		len(req.Context.Body.VisualPrefabKey) +
		len(req.Context.Body.Equipment.PrimaryWeapon) +
		len(req.Context.Body.Cultivation.Tier) +
		len(req.Context.Body.AgentPolicy.Mode) +
		len(req.Context.Body.Soul.Name) +
		len(req.Context.Body.Soul.CoreDrive) +
		len(req.Context.Body.Soul.Temperament) +
		len(req.Context.Body.Soul.CombatStyle) +
		len(req.Context.Body.Soul.SocialStyle) +
		len(req.Context.Body.Soul.PlayerNotes) +
		len(req.WorldSnapshot.ZoneID)
	for _, goal := range req.Context.Body.Soul.LongTermGoals {
		chars += len(goal)
	}
	for _, boundary := range req.Context.Body.Soul.MoralBoundaries {
		chars += len(boundary)
	}
	for _, memory := range req.Context.Body.Memory {
		chars += len(memory.ID) + len(memory.Kind) + len(memory.Summary) + 16
	}
	chars += len(req.Allowed) * 12
	chars += len(req.WorldSnapshot.NearbyTargets) * 48
	chars += len(req.WorldSnapshot.NearbyObjects) * 40

	// Rough English/JSON estimate: four characters per token plus the
	// completion reserve used by the model-backed decision path.
	return max(chars/4+agentDecisionOutputTokenReserve, agentDecisionOutputTokenReserve)
}
```
The estimateAgentDecisionTokens function manually accesses a large number of fields from the DecisionRequest and WorldSnapshot structs. This approach is fragile and likely to break or become inaccurate as the domain model evolves. Consider using a more maintainable approach, such as implementing a TokenEstimator interface on the request objects or using a more generalized heuristic that doesn't require manual field-by-field summation.
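The interface suggestion could be sketched like this. `TokenEstimator`, the constant, and the test double are illustrative names, and Go 1.21's built-in `max` is used as an earlier review round recommends:

```go
package main

// TokenEstimator lets each request type report its own prompt size, so the
// limiter never reaches into domain fields that may change shape later.
type TokenEstimator interface {
	EstimatedPromptChars() int
}

// outputReserve stands in for the PR's agentDecisionOutputTokenReserve.
const outputReserve = 256

// estimateTokens applies the same four-chars-per-token heuristic, floored
// at the completion reserve, without knowing the request's fields.
func estimateTokens(r TokenEstimator) int {
	return max(r.EstimatedPromptChars()/4+outputReserve, outputReserve)
}

// fixedSizeRequest is a test double demonstrating the contract.
type fixedSizeRequest struct{ chars int }

func (f fixedSizeRequest) EstimatedPromptChars() int { return f.chars }
```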
Delta after Gemini review on limiter maintenance:
Verification on latest head
Gemini (@gemini-code-assist) please delta re-review latest commit bf15f18.
Code Review
This pull request introduces a local HS256 JWT verifier for Supabase-compatible authentication and an agentDecisionLimiter to enforce per-player rate limits and daily token budgets. The limiter is integrated into the agent decision endpoint to prevent abuse and manage LLM costs. Feedback focuses on performance and scalability issues within the limiter, specifically the O(N) pruning operation performed under a global mutex and the limitations of in-memory state for multi-instance deployments. Additionally, a security concern was raised regarding the potential for Denial of Service attacks via unauthenticated, oversized JWT headers.
```go
func (l *agentDecisionLimiter) pruneExpiredIfDue(now time.Time) {
	if !l.lastPruned.IsZero() && now.Sub(l.lastPruned) < time.Minute {
		return
	}
	l.lastPruned = now

	cutoff := now.Add(-agentDecisionLimitStateTTL)
	for playerID, state := range l.players {
		if !state.lastSeen.IsZero() && state.lastSeen.Before(cutoff) {
			delete(l.players, playerID)
		}
	}
}
```
The pruneExpiredIfDue function iterates over the entire players map while holding a global mutex. In a production environment with a large number of unique players over a 25-hour window, this O(N) operation will cause significant latency spikes for the request that triggers the pruning (once per minute). Consider moving the pruning logic to a background goroutine or using a more efficient data structure for TTL management (e.g., a linked list or a specialized cache library) to avoid blocking the request path.
```go
type agentDecisionLimiter struct {
	mu         sync.Mutex
	cfg        *config.Config
	now        func() time.Time
	lastPruned time.Time
	players    map[string]*agentDecisionLimitState
}
```
The current limiter implementation is entirely in-memory. While this works for a single-instance deployment, it will not correctly enforce budgets or rate limits across multiple gateway instances. Given that the configuration already includes a RedisURL, consider transitioning this state to Redis to support horizontal scaling and persistence across restarts.
```go
if err := decodeJWTPart(parts[0], &header); err != nil {
	return Identity{}, ErrInvalidJWT
}
```
The JWT header is decoded and its algorithm is checked before the signature is verified. While this is necessary to determine the verification method, the decodeJWTPart function uses json.Unmarshal on unauthenticated input. A maliciously crafted JWT with an extremely large header could lead to excessive memory consumption or a Denial of Service (DoS) before signature verification occurs. Consider adding a maximum length check for the JWT string or its parts before processing.
Delta after Gemini review on oversized JWT/header DoS and multi-instance limiter tracking:
Verification on latest head
Clean delta summary after Gemini review on oversized JWT/header DoS and multi-instance limiter tracking:
Verification on latest head
Gemini (@gemini-code-assist) please delta re-review latest commit 0ac2fa1.
Code Review
This pull request implements JWT-based authentication for Supabase and introduces a per-player rate limiter and token budget tracker for agent decisions. The feedback highlights that the manual token estimation logic is fragile and should be replaced with a more robust JSON-based approach. Additionally, the state pruning mechanism in the limiter is noted as a potential performance bottleneck due to its O(N) complexity while holding a global mutex, suggesting a move to background processing or a TTL cache.
```go
func estimateAgentDecisionTokens(req agent.DecisionRequest) int {
	chars := len(req.Context.Player.PlayerID) +
		len(req.Context.Player.DisplayName) +
		len(req.Context.Body.BodyID) +
		len(req.Context.Body.ArchetypeID) +
		len(req.Context.Body.VisualPrefabKey) +
		len(req.Context.Body.Equipment.PrimaryWeapon) +
		len(req.Context.Body.Cultivation.Tier) +
		len(req.Context.Body.AgentPolicy.Mode) +
		len(req.Context.Body.Soul.Name) +
		len(req.Context.Body.Soul.CoreDrive) +
		len(req.Context.Body.Soul.Temperament) +
		len(req.Context.Body.Soul.CombatStyle) +
		len(req.Context.Body.Soul.SocialStyle) +
		len(req.Context.Body.Soul.PlayerNotes) +
		len(req.WorldSnapshot.ZoneID)
	for _, goal := range req.Context.Body.Soul.LongTermGoals {
		chars += len(goal)
	}
	for _, boundary := range req.Context.Body.Soul.MoralBoundaries {
		chars += len(boundary)
	}
	for _, memory := range req.Context.Body.Memory {
		chars += len(memory.ID) + len(memory.Kind) + len(memory.Summary) + 16
	}
	chars += len(req.Allowed) * 12
	chars += len(req.WorldSnapshot.NearbyTargets) * 48
	chars += len(req.WorldSnapshot.NearbyObjects) * 40

	// Rough English/JSON estimate: four characters per token plus the
	// completion reserve used by the model-backed decision path.
	return max(chars/4+agentDecisionOutputTokenReserve, agentDecisionOutputTokenReserve)
}
```
The current manual token estimation is fragile and omits several fields from the DecisionRequest (e.g., ReincarnationLore, Characteristics, AgentPolicy details, and most WorldSnapshot scalar fields). This leads to underestimating the actual token usage. A more robust and maintainable approach is to marshal the request to JSON and use its length as the character count, which directly reflects the payload size sent to the LLM provider.
```go
func estimateAgentDecisionTokens(req agent.DecisionRequest) int {
	payload, _ := json.Marshal(req)
	return max(len(payload)/4+agentDecisionOutputTokenReserve, agentDecisionOutputTokenReserve)
}
```

```go
func (l *agentDecisionLimiter) pruneExpiredIfDue(now time.Time) {
	if !l.lastPruned.IsZero() && now.Sub(l.lastPruned) < time.Minute {
		return
	}
	l.lastPruned = now

	cutoff := now.Add(-agentDecisionLimitStateTTL)
	for playerID, state := range l.players {
		if !state.lastSeen.IsZero() && state.lastSeen.Before(cutoff) {
			delete(l.players, playerID)
		}
	}
}
```
The pruneExpiredIfDue function performs an O(N) scan over the entire players map while holding the global mu mutex. As the number of unique players grows, this will cause significant latency spikes for the request that triggers the pruning (once per minute). Consider performing the pruning in a background goroutine or using a data structure that supports more efficient expiration (like a TTL cache) to avoid blocking the request path.
Summary
Fixes #6
Verification