Skip to content

Conversation

@hammadtq
Copy link
Collaborator

Fix Rate Limiting & Align Development/Production Parity

Summary

This PR fixes critical issues with the rate limiting feature and ensures functional parity between the development entry point (main.py) and production entry point (attach/gateway.py).

Issues Fixed

Rate Limiting Not Working

  • Root Cause: TokenQuotaMiddleware was looking for user info in request.headers.get("x-attach-user") but this header is only set in the response by session middleware
  • Fix: Changed to use request.state.sub (set by auth middleware) for user identification
  • Result: MAX_TOKENS_PER_MIN now properly enforces token quotas with 429 responses

Development vs Production Inconsistencies

  • Issue: main.py and attach/gateway.py had different middleware configurations, route setups, and behavior
  • Fixes Applied:
    • ✅ Aligned middleware order (CORS outermost)
    • ✅ Synchronized conditional TokenQuotaMiddleware loading
    • ✅ Added missing routes (logs_router, /a2a prefix, /auth/config)
    • ✅ Aligned MemoryEvent query fields (["timestamp", "event", "user", "state"])
    • ✅ Added lifespan management for proper resource cleanup
    • ✅ Unified Weaviate URL configuration
    • ✅ Removed unused imports

Code Quality & Production Readiness

  • Removed dev comments: Cleaned up main.py to look professional
  • Removed unused imports: mem_write function was imported but never used
  • Standardized middleware pattern: Both entry points now use app.add_middleware() consistently

Testing

# Development mode
uvicorn main:app --host 0.0.0.0 --port 8080

# Production mode  
pip install -e .
attach-gateway --port 8080

# Test rate limiting
export MAX_TOKENS_PER_MIN=10
curl -H "Authorization: Bearer $JWT" \
     -d '{"model":"tinyllama","messages":[{"role":"user","content":"test"}]}' \
     http://localhost:8080/api/chat
# Should return 429 after exceeding limit

📋 Environment Variables

Updated documentation for OpenMeter integration:

export MAX_TOKENS_PER_MIN=60000              # Required for quota middleware
export USAGE_METERING=openmeter              # Activates OpenMeter backend  
export OPENMETER_API_KEY=your-api-key-here   # Required for OpenMeter

Impact

  • ✅ Rate limiting now works correctly in both dev and production
  • uvicorn main:app and attach-gateway behave identically
  • ✅ Clean, production-ready codebase
  • ✅ Proper resource cleanup on shutdown
  • ✅ CORS issues resolved
  • ✅ OpenMeter integration properly documented

Breaking Changes

None - this is purely a bug fix and alignment update.

@hammadtq hammadtq merged commit 5552b5d into dev Jul 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants