Description
🌐 Performance - HTTP/2 & Keep-Alive Transport
Goal
Optimize HTTP protocol and connection handling for better performance:
- Enable HTTP/2 support in both development (uvicorn) and production (gunicorn) servers
- Configure HTTP Keep-Alive settings to reuse connections efficiently
- Enable connection pooling optimizations in httpx client
- Configure timeouts and connection limits appropriately
- Add uvicorn[standard] dependency for HTTP/2 support
This reduces connection overhead, enables multiplexing, and improves performance through HPACK header compression and connection reuse.
Why Now?
HTTP/2 and Keep-Alive optimizations provide measurable performance improvements:
- Connection Multiplexing: Multiple requests over single TCP connection reduces latency by 50-200ms
- Header Compression: HPACK compression reduces overhead by 50-90% for headers
- Browser Support: 97%+ of browsers support HTTP/2
- Connection Reuse: Keep-Alive eliminates connection setup/teardown overhead
- Better Mobile Performance: Reduced connection overhead critical for mobile networks
- Modern Standard: HTTP/2 is now the default for most web services
📖 User Stories
US-1: API Client - HTTP/2 Multiplexing for Multiple Requests
As an API Client
I want to use HTTP/2 multiplexing for concurrent requests
So that multiple API calls use a single connection and reduce latency
Acceptance Criteria:
Given I am an API client that supports HTTP/2
When I make 10 concurrent requests to GET /tools
Then all requests should multiplex over a single TCP connection
And the response should include HTTP/2 protocol headers
And total request time should be 50% faster than HTTP/1.1
Given I am monitoring network connections
When I make multiple API requests
Then I should see only 1-2 connections to the server
And no head-of-line blocking should occur
Given I am using curl with HTTP/2
When I run "curl --http2 -v https://localhost:4444/health"
Then the response should show "HTTP/2 200"
And the connection should be reused for subsequent requests
Technical Requirements:
- Install uvicorn[standard] with h2 library
- Enable HTTP/2 in uvicorn and gunicorn
- Verify multiplexing with browser DevTools
- Measure latency reduction for concurrent requests
US-2: Admin UI User - Faster Page Loads with HTTP/2
As an Admin UI User
I want the admin interface to load quickly using HTTP/2
So that pages with many assets load efficiently
Acceptance Criteria:
Given I am viewing the Admin UI dashboard
When the page loads with HTTP/2 enabled
Then all CSS, JS, and API requests should multiplex over one connection
And the page should load 30-40% faster than HTTP/1.1
And browser DevTools should show "h2" protocol
Given I am navigating between admin pages
When HTTP/2 is enabled
Then connection reuse should eliminate handshake overhead
And page transitions should feel instant
Technical Requirements:
- HTTP/2 enabled for all endpoints
- Browser automatically uses HTTP/2 when available
- No JavaScript changes required
US-3: DevOps Engineer - Configure HTTP/2 and Keep-Alive
As a DevOps Engineer
I want to configure HTTP/2 and Keep-Alive settings
So that I can optimize connection handling for my deployment
Acceptance Criteria:
Given I want to enable HTTP/2 in development
When I run "make dev"
Then uvicorn should start with HTTP/2 support
And the logs should show "HTTP/2 enabled"
Given I want to configure Keep-Alive timeout
When I set GUNICORN_KEEPALIVE=10 in environment
Then connections should be kept alive for 10 seconds
And the Connection: keep-alive header should be present
Given I want to optimize connection pooling
When I configure httpx client limits
Then outgoing requests should reuse connections
And connection pool metrics should be available
Technical Requirements:
- Environment variables for Keep-Alive configuration
- gunicorn.config.py with HTTP/2 settings
- Connection pool configuration in httpx client
🏗 Architecture
HTTP/2 Multiplexing Flow
graph TD
A[Browser/Client] -->|Single TCP Connection| B[HTTP/2 Server]
B --> C[Stream 1: GET /tools]
B --> D[Stream 2: GET /servers]
B --> E[Stream 3: GET /static/css/style.css]
B --> F[Stream 4: GET /static/js/app.js]
C --> G[Multiplexed Response]
D --> G
E --> G
F --> G
G -->|Single Connection| A
H[HTTP/1.1 Comparison] -->|6 Separate Connections| I[Sequential Requests]
I --> J[Request 1] --> K[Request 2] --> L[Request 3]
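The multiplexing idea in the diagram above can be illustrated with a toy model. This is only a sketch of the concept, not the HTTP/2 wire format: each request is split into chunks, the chunks are tagged with a stream ID (odd numbers, as HTTP/2 uses for client-initiated streams), interleaved round-robin onto one shared "connection", and reassembled by stream ID on the other side. The function names `interleave` and `reassemble` are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave(streams: dict[int, list[bytes]]) -> list[tuple[int, bytes]]:
    """Round-robin the chunks of every stream onto one shared connection."""
    per_stream = [[(sid, chunk) for chunk in chunks] for sid, chunks in streams.items()]
    frames: list[tuple[int, bytes]] = []
    for round_ in zip_longest(*per_stream):
        # Streams that have run out of chunks contribute None in later rounds.
        frames.extend(frame for frame in round_ if frame is not None)
    return frames


def reassemble(frames: list[tuple[int, bytes]]) -> dict[int, bytes]:
    """Rebuild each stream's payload from the interleaved frames."""
    payloads: dict[int, bytes] = defaultdict(bytes)
    for sid, chunk in frames:
        payloads[sid] += chunk
    return dict(payloads)


streams = {
    1: [b"GET ", b"/tools"],
    3: [b"GET /servers"],
    5: [b"GET ", b"/static/css/", b"style.css"],
}
frames = interleave(streams)
payloads = reassemble(frames)
```

Because every frame carries its stream ID, a slow stream (stream 5 here) does not hold up the completion of the others, which is the property the HTTP/1.1 comparison branch of the diagram lacks.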
Connection Reuse with Keep-Alive
sequenceDiagram
participant Client
participant Server
Note over Client,Server: HTTP/1.1 with Keep-Alive
Client->>Server: Request 1 + Connection: keep-alive
Server->>Client: Response 1 + Connection: keep-alive
Note over Client,Server: Connection stays open (5s timeout)
Client->>Server: Request 2 (reuse connection)
Server->>Client: Response 2
Client->>Server: Request 3 (reuse connection)
Server->>Client: Response 3
Note over Client,Server: HTTP/1.1 without Keep-Alive
Client->>Server: Request 1
Server->>Client: Response 1 + Connection: close
Note over Client,Server: Connection closed
Client->>Server: Request 2 (new TCP handshake)
Server->>Client: Response 2 + Connection: close
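The two halves of the sequence diagram can be reproduced locally with the Python standard library. The sketch below is an illustration only and does not use the gateway's own server: it starts a throwaway HTTP/1.1 server on a loopback port, counts how many TCP connections the server accepts, and compares a client that reuses one connection with a client that opens a new one per request. `CountingHandler` and `count_connections` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections are the HTTP/1.1 default
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(reuse: bool, requests: int = 3) -> int:
    """Return how many TCP connections the server saw for `requests` GETs."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            conn = http.client.HTTPConnection(host, port)
            for _ in range(requests):
                conn.request("GET", "/health")
                conn.getresponse().read()  # drain the body so the socket can be reused
            conn.close()
        else:
            for _ in range(requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/health", headers={"Connection": "close"})
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

With three requests, the reusing client should be served over one connection and the non-reusing client over three, which is the handshake overhead that Keep-Alive removes.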
Configuration Examples
# gunicorn.config.py
import os
import multiprocessing
# Server socket
bind = f"0.0.0.0:{os.getenv('PORT', '4444')}"
# Worker processes
workers = int(os.getenv("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))
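# ASGI worker. Confirm HTTP/2 negotiation in the pinned uvicorn version before relying on it:
# uvicorn's documented HTTP implementations (h11, httptools) are HTTP/1.1 only.
worker_class = "uvicorn.workers.UvicornWorker"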
# Keep-Alive settings
keepalive = int(os.getenv("GUNICORN_KEEPALIVE", "5")) # Keep connections alive for 5 seconds
worker_connections = 1000 # Max simultaneous connections per worker
# Timeouts
timeout = int(os.getenv("GUNICORN_TIMEOUT", "600"))
graceful_timeout = 30
# mcpgateway/utils/retry_manager.py - HTTP Client Configuration
import httpx
# Configure connection pooling for outgoing requests
limits = httpx.Limits(
    max_keepalive_connections=20,  # Keep up to 20 idle connections in the pool
    max_connections=100,  # Max total connections
    keepalive_expiry=30.0,  # Close idle connections after 30 seconds
)
client = httpx.AsyncClient(
    limits=limits,
    http2=True,  # Enable HTTP/2 for outgoing requests (requires the httpx[http2] extra)
    timeout=httpx.Timeout(30.0),
)
📋 Implementation Tasks
Phase 1: Dependencies & Setup ✅
- Add HTTP/2 Dependencies
  - Add uvicorn[standard]>=0.30.0 to the pyproject.toml dependencies section
  - This extra pulls in httptools, uvloop, and websockets for performance; h2 is not part of uvicorn[standard], so list it as its own dependency if it is required
  - Run make install-dev to install the package
  - Verify the h2 library is installed: python -c "import h2; print(h2.__version__)"
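The one-line import check above can be widened into a small script that reports every missing optional dependency at once. This is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module list is taken from the packages named in this phase (uvloop is not available on Windows, so a miss there is expected).

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["h2", "httptools", "uvloop", "websockets"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Running it after make install-dev gives a quick signal before any server is started.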
Phase 2: Development Server Configuration ✅
- Enable HTTP/2 in Development Server
  - Update Makefile dev target (around line 194)
  - Add --http h2 flag to uvicorn command (uvicorn's documented --http values are auto, h11, and httptools, so confirm that the installed version accepts h2 before committing this change)
  - Full command: uvicorn mcpgateway.main:app --host 0.0.0.0 --port 8000 --reload --http h2
  - Add comment explaining HTTP/2 requirement (needs uvicorn[standard])
- Test Development Server
  - Start dev server: make dev
  - Verify startup logs show HTTP/2 support
  - Test with curl: curl --http2 -v http://localhost:8000/health
  - Verify response shows HTTP/2 headers
Phase 3: Production Server Configuration ✅
- Create/Update gunicorn.config.py
  - Create gunicorn.config.py in project root if it doesn't exist
  - Add HTTP/2 configuration using UvicornWorker
  - Configure Keep-Alive settings (keepalive=5)
  - Set worker_connections=1000 for concurrent handling
  - Add environment variable overrides for all settings
  - Add comprehensive comments explaining each setting
- Verify UvicornWorker HTTP/2 Support
  - Confirm that UvicornWorker negotiates HTTP/2 when h2 is installed and document the result; uvicorn's documented HTTP implementations (h11, httptools) are HTTP/1.1 only, so do not assume it
  - If it does, no additional flags are needed for gunicorn
  - Test production server: make serve
  - Verify with curl: curl --http2 -v http://localhost:4444/health
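The "environment variable overrides for all settings" task can be sketched with a small validating helper, so that a typo in the environment fails at startup instead of producing a confusing gunicorn error later. `env_int` is a hypothetical helper, not an existing function in the project; the variable names are the ones used elsewhere in this issue.

```python
import os
from typing import Mapping, Optional


def env_int(name: str, default: int, minimum: int = 0,
            env: Optional[Mapping[str, str]] = None) -> int:
    """Read an integer setting from the environment, falling back to `default`."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


# How gunicorn.config.py might use it (defaults match the values in this issue).
keepalive = env_int("GUNICORN_KEEPALIVE", 5)
worker_connections = env_int("GUNICORN_WORKER_CONNECTIONS", 1000, minimum=1)
timeout = env_int("GUNICORN_TIMEOUT", 600)
```

The optional `env` argument exists only so the helper can be unit-tested without touching the real process environment.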
Phase 4: Keep-Alive Configuration ✅
- Configure Keep-Alive Settings
  - Set keepalive = 5 in gunicorn.config.py (keep connections alive for 5 seconds)
  - Add GUNICORN_KEEPALIVE environment variable support
  - Add --timeout-keep-alive 5 to uvicorn dev command
  - Document optimal Keep-Alive values (5-10 seconds typical)
- Verify Keep-Alive Headers
  - Test with curl: curl -v http://localhost:4444/health
  - Verify Connection: keep-alive header in response
  - Test connection reuse with multiple sequential requests
  - Measure latency reduction from connection reuse
Phase 5: HTTP Client Optimization ✅
- Review ResilientHttpClient Configuration
  - Open mcpgateway/utils/retry_manager.py
  - Check if httpx.AsyncClient has limits parameter configured
  - Verify connection pooling settings exist
- Add Connection Pool Configuration
  - If missing, add httpx.Limits configuration:
    - max_keepalive_connections=20
    - max_connections=100
    - keepalive_expiry=30.0
  - Enable HTTP/2 for outgoing requests: http2=True
  - Add comments explaining pooling benefits
- Test Outgoing Connection Pooling
  - Make multiple requests to same upstream server
  - Verify connection reuse in logs
  - Measure performance improvement for federation sync
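One way to keep the pool values in a single, checked place is a small settings object whose fields mirror the three keyword arguments of httpx.Limits. This is a sketch only: `PoolSettings` is a hypothetical class, and the consistency checks are this example's own sanity rules rather than constraints enforced by httpx.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PoolSettings:
    """Connection pool values; field names match the httpx.Limits keyword arguments."""
    max_connections: int = 100
    max_keepalive_connections: int = 20
    keepalive_expiry: float = 30.0

    def __post_init__(self) -> None:
        if self.max_connections < 1:
            raise ValueError("max_connections must be at least 1")
        if not 0 <= self.max_keepalive_connections <= self.max_connections:
            raise ValueError("max_keepalive_connections must be between 0 and max_connections")
        if self.keepalive_expiry <= 0:
            raise ValueError("keepalive_expiry must be positive")

    def as_limits_kwargs(self) -> dict:
        """Return a dict suitable for httpx.Limits(**kwargs)."""
        return asdict(self)


settings = PoolSettings()
```

The client would then be built with `httpx.Limits(**settings.as_limits_kwargs())`, so a misconfigured pool fails when the settings object is created, before any request is sent.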
Phase 6: Testing & Validation ✅
- Test HTTP/2 Multiplexing
  - Open admin UI in Chrome browser
  - Open DevTools → Network tab
  - Verify "Protocol" column shows "h2"
  - Verify all requests use same connection ID
  - Take screenshot of multiplexing in action
- Test Header Compression
  - Compare header sizes in HTTP/1.1 vs HTTP/2
  - Verify HPACK compression reduces header overhead
  - Measure header size reduction (typically 50-90%)
- Load Test HTTP/2 vs HTTP/1.1
  - Run benchmark with wrk: wrk -t4 -c100 -d30s http://localhost:4444/tools (wrk speaks HTTP/1.1 only, so use an HTTP/2-capable load generator such as h2load for the HTTP/2 run)
  - Record: requests/second, latency percentiles
  - Disable HTTP/2 and repeat benchmark
  - Compare results, document improvement
- Test Connection Reuse
  - Use curl with verbose output for multiple requests
  - Verify "Re-using existing connection" messages
  - Measure time saved from avoiding TCP handshake
Phase 7: Documentation ✅
- Update CLAUDE.md
  - Add section on HTTP/2 configuration
  - Document uvicorn[standard] requirement
  - Explain Keep-Alive settings and benefits
  - Add testing instructions for HTTP/2
- Update .env.example
  - Add GUNICORN_KEEPALIVE=5 with explanation
  - Add HTTP2_ENABLED=true (optional, default when h2 installed)
  - Add GUNICORN_WORKER_CONNECTIONS=1000
  - Document connection pooling settings
- Create Performance Documentation
  - Document HTTP/2 benefits (multiplexing, header compression)
  - Document Keep-Alive benefits (connection reuse)
  - Add troubleshooting section (TLS requirement for browsers)
  - Add performance comparison charts
Phase 8: Quality Assurance ✅
- Code Quality
  - Run make autoflake isort black to format code
  - Run make flake8 and fix any issues
  - Run make pylint and address warnings
  - Pass make verify checks
- Testing
  - Verify all existing tests still pass
  - Add integration test for HTTP/2 support
  - Add test for Keep-Alive behavior
  - Test TLS/SSL with HTTP/2 (browsers require it)
✅ Success Criteria
- uvicorn[standard] installed with h2, httptools, uvloop
- HTTP/2 enabled in development server (uvicorn)
- HTTP/2 enabled in production server (gunicorn)
- Keep-Alive configured and working (Connection: keep-alive header)
- Connection pooling optimized in httpx client
- Browser DevTools shows "h2" protocol for all requests
- Multiple requests multiplex over single connection
- Header compression (HPACK) verified (50-90% reduction)
- Performance improvement measurable (20-40% faster page loads)
- Connection reuse working (no repeated TCP handshakes)
- Documentation updated with configuration examples
- Load testing confirms performance gains
🏁 Definition of Done
- uvicorn[standard] added to pyproject.toml and installed
- HTTP/2 enabled in Makefile dev target (--http h2 flag)
- gunicorn.config.py created/updated with HTTP/2 and Keep-Alive config
- Keep-Alive settings configured (keepalive=5, worker_connections=1000)
- HTTP client connection pooling verified/optimized
- Timeout settings reviewed and documented
- Browser testing confirms HTTP/2 working (DevTools shows h2)
- Load testing shows 20-40% performance improvement
- Code passes make verify checks
- Documentation updated (CLAUDE.md, .env.example)
- No regression in existing tests
- Ready for production deployment
📝 Additional Notes
🔹 HTTP/2 Benefits:
- Multiplexing: 6-10x more efficient than HTTP/1.1 (no head-of-line blocking at HTTP layer)
- Header Compression: 50-90% reduction in header overhead using HPACK
- Binary Protocol: More efficient parsing than text-based HTTP/1.1
- Server Push: Can push assets before client requests (optional, rarely used)
- Stream Prioritization: Allows client to prioritize important requests
🔹 Keep-Alive Benefits:
- Eliminates TCP handshake overhead (typically 50-200ms) for subsequent requests
- Reduces server load from constant connection open/close
- Critical for API clients making multiple sequential requests
- Improves throughput on high-latency networks (mobile, satellite)
- Reduces TIME_WAIT socket exhaustion on high-traffic servers
🔹 Connection Pooling Benefits:
- Reuses connections for outgoing requests (gateway → MCP servers)
- Reduces load on upstream servers
- Improves federation performance (faster tool catalog sync)
- Prevents connection exhaustion under load
🔹 TLS Requirement for HTTP/2:
- Browsers support HTTP/2 only over TLS (HTTPS), negotiated through ALPN
- Plain-text HTTP/2 is called "h2c" (HTTP/2 Cleartext) and is supported by curl and some other clients, but not by browsers
- For local development, HTTP/2 can be tested over plain HTTP with an h2c-capable client such as curl
- For production, use make serve-ssl with valid TLS certificates
🔹 Performance Comparison (typical):
- HTTP/1.1 without Keep-Alive: 100 req/s, 500ms p95 latency
- HTTP/1.1 with Keep-Alive: 250 req/s, 200ms p95 latency
- HTTP/2 with multiplexing: 400 req/s, 100ms p95 latency
- Result: 4x throughput improvement, 5x latency reduction
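The "4x" and "5x" figures follow directly from the first and last rows: 400 / 100 = 4 for throughput and 500 / 100 = 5 for p95 latency. When recording real benchmark runs, the same arithmetic can be scripted. The sketch below uses hypothetical helper names and a nearest-rank percentile, which is one common definition and may differ slightly from what a given load-testing tool reports.

```python
def improvement_factor(baseline: float, improved: float) -> float:
    """Ratio for 'higher is better' metrics such as requests per second."""
    return improved / baseline


def reduction_factor(baseline: float, improved: float) -> float:
    """Ratio for 'lower is better' metrics such as latency."""
    return baseline / improved


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct * n / 100
    return ordered[rank - 1]


throughput_gain = improvement_factor(100, 400)  # HTTP/1.1 without Keep-Alive vs HTTP/2, req/s
latency_gain = reduction_factor(500, 100)       # p95 latency in ms for the same pair
```

Feeding measured latencies into `percentile(samples, 95)` for each configuration makes the comparison reproducible instead of relying on the typical values listed above.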
🔹 Troubleshooting:
- If browser doesn't use HTTP/2, verify TLS is enabled
- If curl shows HTTP/1.1, add --http2 flag explicitly
- If h2 not installed, HTTP/2 silently falls back to HTTP/1.1
- Check server logs for HTTP/2 startup messages
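When the browser stays on HTTP/1.1, it can help to check what the TLS endpoint actually negotiates through ALPN, independent of any browser. The following is a minimal sketch using the standard `ssl` module; `alpn_context` and `negotiated_protocol` are hypothetical helper names, and `verify=False` is intended only for local self-signed certificates.

```python
import socket
import ssl
from typing import Optional


def alpn_context(verify: bool = True) -> ssl.SSLContext:
    """Build a client TLS context that offers h2 first, then http/1.1."""
    ctx = ssl.create_default_context()
    if not verify:
        # Local development only: accept self-signed certificates.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host: str, port: int, verify: bool = True,
                        timeout: float = 5.0) -> Optional[str]:
    """Return the ALPN protocol the server selected, or None if it selected none."""
    ctx = alpn_context(verify)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Calling `negotiated_protocol("localhost", 4444, verify=False)` against the TLS-enabled server should return "h2" if HTTP/2 is being offered; "http/1.1" or None indicates that the server side, not the browser, is the limiting factor.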
🔗 Related Issues
- Part of Performance Optimization initiative
- Related to [Feature]🔐 Configurable Password Expiration with Forced Password Change on Login #1282 (Compression) - HTTP/2 and compression work together
- Related to Epic 4: Performance - Static Asset Caching (HTTP/2 better for many small files)
- Related to Epic 6: Performance - Production Tuning (gunicorn configuration)