[Epic] 🌐 Performance - HTTP/2 & Keep-Alive Transport #1293

@crivetimihai

Description

🌐 Performance - HTTP/2 & Keep-Alive Transport

Goal

Optimize HTTP protocol and connection handling for better performance:

  1. Enable HTTP/2 support in both development (uvicorn) and production (gunicorn) servers
  2. Configure HTTP Keep-Alive settings to reuse connections efficiently
  3. Enable connection pooling optimizations in httpx client
  4. Configure timeouts and connection limits appropriately
  5. Add uvicorn[standard] dependency for HTTP/2 support

This reduces connection overhead, enables multiplexing, and improves performance through HPACK header compression and connection reuse.

Why Now?

HTTP/2 and Keep-Alive optimizations provide measurable performance improvements:

  1. Connection Multiplexing: Multiple requests over a single TCP connection reduce latency by 50-200ms
  2. Header Compression: HPACK reduces header overhead by 50-90%
  3. Browser Support: 97%+ of browsers support HTTP/2
  4. Connection Reuse: Keep-Alive eliminates connection setup/teardown overhead
  5. Better Mobile Performance: Reduced connection overhead critical for mobile networks
  6. Modern Standard: HTTP/2 is now the default for most web services

📖 User Stories

US-1: API Client - HTTP/2 Multiplexing for Multiple Requests

As an API Client
I want to use HTTP/2 multiplexing for concurrent requests
So that multiple API calls use a single connection and reduce latency

Acceptance Criteria:

Given I am an API client that supports HTTP/2
When I make 10 concurrent requests to GET /tools
Then all requests should multiplex over a single TCP connection
And the response should include HTTP/2 protocol headers
And total request time should be at least 50% lower than with HTTP/1.1

Given I am monitoring network connections
When I make multiple API requests
Then I should see only 1-2 connections to the server
And no head-of-line blocking should occur

Given I am using curl with HTTP/2
When I run "curl --http2 -v https://localhost:4444/health"
Then the response should show "HTTP/2 200"
And the connection should be reused for subsequent requests

Technical Requirements:

  • Install uvicorn[standard] with h2 library
  • Enable HTTP/2 in uvicorn and gunicorn
  • Verify multiplexing with browser DevTools
  • Measure latency reduction for concurrent requests
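
As an illustration of the multiplexing scenario above, here is a minimal sketch that fires 10 concurrent GET /tools requests through one httpx client with HTTP/2 enabled. It assumes the gateway is reachable at https://localhost:4444 with a self-signed dev certificate, that the httpx[http2] extra (the h2 package) is installed, and it omits any auth headers the real endpoint may require.

```python
# Sketch: 10 concurrent GET /tools requests multiplexed over one HTTP/2 connection.
import asyncio
import time

import httpx

BASE_URL = "https://localhost:4444"  # assumed local gateway endpoint; auth omitted


async def main() -> None:
    # http2=True requires the httpx[http2] extra; verify=False allows a self-signed cert
    async with httpx.AsyncClient(http2=True, verify=False) as client:
        start = time.perf_counter()
        responses = await asyncio.gather(*(client.get(f"{BASE_URL}/tools") for _ in range(10)))
        elapsed = time.perf_counter() - start
        # {"HTTP/2"} here confirms the server negotiated h2 via ALPN for all streams
        print({r.http_version for r in responses}, f"{elapsed * 1000:.0f}ms total")


if __name__ == "__main__":
    asyncio.run(main())
```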

US-2: Admin UI User - Faster Page Loads with HTTP/2

As an Admin UI User
I want the admin interface to load quickly using HTTP/2
So that pages with many assets load efficiently

Acceptance Criteria:

Given I am viewing the Admin UI dashboard
When the page loads with HTTP/2 enabled
Then all CSS, JS, and API requests should multiplex over one connection
And the page should load 30-40% faster than HTTP/1.1
And browser DevTools should show "h2" protocol

Given I am navigating between admin pages
When HTTP/2 is enabled
Then connection reuse should eliminate handshake overhead
And page transitions should feel instant

Technical Requirements:

  • HTTP/2 enabled for all endpoints
  • Browser automatically uses HTTP/2 when available
  • No JavaScript changes required

US-3: DevOps Engineer - Configure HTTP/2 and Keep-Alive

As a DevOps Engineer
I want to configure HTTP/2 and Keep-Alive settings
So that I can optimize connection handling for my deployment

Acceptance Criteria:

Given I want to enable HTTP/2 in development
When I run "make dev"
Then uvicorn should start with HTTP/2 support
And the logs should show "HTTP/2 enabled"

Given I want to configure Keep-Alive timeout
When I set GUNICORN_KEEPALIVE=10 in environment
Then connections should be kept alive for 10 seconds
And the Connection: keep-alive header should be present

Given I want to optimize connection pooling
When I configure httpx client limits
Then outgoing requests should reuse connections
And connection pool metrics should be available

Technical Requirements:

  • Environment variables for Keep-Alive configuration
  • gunicorn.config.py with HTTP/2 settings
  • Connection pool configuration in httpx client

🏗 Architecture

HTTP/2 Multiplexing Flow

```mermaid
graph TD
    A[Browser/Client] -->|Single TCP Connection| B[HTTP/2 Server]
    B --> C[Stream 1: GET /tools]
    B --> D[Stream 2: GET /servers]
    B --> E[Stream 3: GET /static/css/style.css]
    B --> F[Stream 4: GET /static/js/app.js]
    C --> G[Multiplexed Response]
    D --> G
    E --> G
    F --> G
    G -->|Single Connection| A

    H[HTTP/1.1 Comparison] -->|6 Separate Connections| I[Sequential Requests]
    I --> J[Request 1] --> K[Request 2] --> L[Request 3]
```

Connection Reuse with Keep-Alive

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Note over Client,Server: HTTP/1.1 with Keep-Alive
    Client->>Server: Request 1 + Connection: keep-alive
    Server->>Client: Response 1 + Connection: keep-alive
    Note over Client,Server: Connection stays open (5s timeout)
    Client->>Server: Request 2 (reuse connection)
    Server->>Client: Response 2
    Client->>Server: Request 3 (reuse connection)
    Server->>Client: Response 3

    Note over Client,Server: HTTP/1.1 without Keep-Alive
    Client->>Server: Request 1
    Server->>Client: Response 1 + Connection: close
    Note over Client,Server: Connection closed
    Client->>Server: Request 2 (new TCP handshake)
    Server->>Client: Response 2 + Connection: close
```

Configuration Examples

```python
# gunicorn.config.py

import os
import multiprocessing

# Server socket
bind = f"0.0.0.0:{os.getenv('PORT', '4444')}"

# Worker processes
workers = int(os.getenv("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))
worker_class = "uvicorn.workers.UvicornWorker"  # Automatically enables HTTP/2 if h2 installed

# Keep-Alive settings
keepalive = int(os.getenv("GUNICORN_KEEPALIVE", "5"))  # Keep connections alive for 5 seconds
worker_connections = 1000  # Max simultaneous connections per worker

# Timeouts
timeout = int(os.getenv("GUNICORN_TIMEOUT", "600"))
graceful_timeout = 30
```

```python
# mcpgateway/utils/retry_manager.py - HTTP Client Configuration

import httpx

# Configure connection pooling for outgoing requests
limits = httpx.Limits(
    max_keepalive_connections=20,  # Keep 20 connections in pool
    max_connections=100,           # Max total connections
    keepalive_expiry=30.0          # Keep connections alive for 30 seconds
)

client = httpx.AsyncClient(
    limits=limits,
    http2=True,  # Enable HTTP/2 for outgoing requests
    timeout=httpx.Timeout(30.0)
)
```
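
A common pattern is to create one such pooled client at application startup and share it for the lifetime of the process, so the limits above are actually reused across requests. The sketch below is hypothetical module-level wiring (not the gateway's existing code) that illustrates the idea.

```python
# Sketch: share a single pooled AsyncClient for the process lifetime (hypothetical wiring).
from typing import Optional

import httpx

_client: Optional[httpx.AsyncClient] = None


def get_client() -> httpx.AsyncClient:
    """Return the shared AsyncClient, creating it on first use."""
    global _client
    if _client is None:
        _client = httpx.AsyncClient(
            limits=httpx.Limits(
                max_keepalive_connections=20,
                max_connections=100,
                keepalive_expiry=30.0,
            ),
            http2=True,
            timeout=httpx.Timeout(30.0),
        )
    return _client


async def shutdown() -> None:
    """Close pooled connections cleanly at application shutdown."""
    global _client
    if _client is not None:
        await _client.aclose()
        _client = None
```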

📋 Implementation Tasks

Phase 1: Dependencies & Setup ✅

  • Add HTTP/2 Dependencies
    • Add uvicorn[standard]>=0.30.0 to pyproject.toml dependencies section
    • This includes h2, httptools, uvloop, and websockets for optimal performance
    • Run make install-dev to install the package
    • Verify h2 library installed: python -c "import h2; print(h2.__version__)"

Phase 2: Development Server Configuration ✅

  • Enable HTTP/2 in Development Server

    • Update Makefile dev target (around line 194)
    • Add --http h2 flag to uvicorn command
    • Full command: uvicorn mcpgateway.main:app --host 0.0.0.0 --port 8000 --reload --http h2
    • Add comment explaining HTTP/2 requirement (needs uvicorn[standard])
  • Test Development Server

    • Start dev server: make dev
    • Verify startup logs show HTTP/2 support
    • Test with curl: curl --http2 -v http://localhost:8000/health
    • Verify response shows HTTP/2 headers
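
Beyond the curl check above, a small Python script can report which protocol httpx negotiates. This is a sketch assuming the dev server URL; note that httpx only negotiates HTTP/2 via ALPN over TLS, so against a plain-HTTP dev server it will report HTTP/1.1.

```python
# Sketch: print the negotiated HTTP version and status for an assumed dev endpoint.
import asyncio
import sys

import httpx


async def check(url: str) -> None:
    # http2=True requires the httpx[http2] extra; verify=False tolerates self-signed certs
    async with httpx.AsyncClient(http2=True, verify=False) as client:
        resp = await client.get(url)
        # httpx negotiates h2 via ALPN, so an HTTPS URL is needed to see "HTTP/2" here
        print(resp.http_version, resp.status_code)


if __name__ == "__main__":
    asyncio.run(check(sys.argv[1] if len(sys.argv) > 1 else "http://localhost:8000/health"))
```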

Phase 3: Production Server Configuration ✅

  • Create/Update gunicorn.config.py

    • Create gunicorn.config.py in project root if it doesn't exist
    • Add HTTP/2 configuration using UvicornWorker
    • Configure Keep-Alive settings (keepalive=5)
    • Set worker_connections=1000 for concurrent handling
    • Add environment variable overrides for all settings
    • Add comprehensive comments explaining each setting
  • Verify UvicornWorker HTTP/2 Support

    • Document that UvicornWorker automatically enables HTTP/2 when h2 is installed
    • No additional flags needed for gunicorn
    • Test production server: make serve
    • Verify with curl: curl --http2 -v http://localhost:4444/health

Phase 4: Keep-Alive Configuration ✅

  • Configure Keep-Alive Settings

    • Set keepalive = 5 in gunicorn.config.py (keep connections alive for 5 seconds)
    • Add GUNICORN_KEEPALIVE environment variable support
    • Add --timeout-keep-alive 5 to uvicorn dev command
    • Document optimal Keep-Alive values (5-10 seconds typical)
  • Verify Keep-Alive Headers

    • Test with curl: curl -v http://localhost:4444/health
    • Verify Connection: keep-alive header in response
    • Test connection reuse with multiple sequential requests
    • Measure latency reduction from connection reuse
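
One rough way to measure the benefit of connection reuse, assuming the /health endpoint is reachable without auth on the assumed port, is to time a batch of sequential requests with and without a shared client:

```python
# Rough sketch: compare per-request cost with and without connection reuse.
import time

import httpx

URL = "http://localhost:4444/health"  # assumed unauthenticated health endpoint
N = 20

# A fresh client (and therefore a fresh TCP connection) for every request
start = time.perf_counter()
for _ in range(N):
    with httpx.Client() as client:
        client.get(URL)
cold = time.perf_counter() - start

# One shared client: Keep-Alive lets all requests reuse the first connection
start = time.perf_counter()
with httpx.Client() as client:
    for _ in range(N):
        client.get(URL)
warm = time.perf_counter() - start

print(f"new connection each time: {cold:.3f}s, keep-alive reuse: {warm:.3f}s")
```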

Phase 5: HTTP Client Optimization ✅

  • Review ResilientHttpClient Configuration

    • Open mcpgateway/utils/retry_manager.py
    • Check if httpx.AsyncClient has limits parameter configured
    • Verify connection pooling settings exist
  • Add Connection Pool Configuration

    • If missing, add httpx.Limits configuration:
      • max_keepalive_connections=20
      • max_connections=100
      • keepalive_expiry=30.0
    • Enable HTTP/2 for outgoing requests: http2=True
    • Add comments explaining pooling benefits
  • Test Outgoing Connection Pooling

    • Make multiple requests to same upstream server
    • Verify connection reuse in logs
    • Measure performance improvement for federation sync
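
To observe reuse for outgoing requests, one option (a sketch, assuming verbose client logging is acceptable in a test run and using a placeholder upstream URL) is to enable httpcore debug logging and watch the connection lifecycle while issuing several requests to the same upstream:

```python
# Sketch: observe connection pooling for outgoing requests via debug logging.
import asyncio
import logging

import httpx

# httpx delegates transport to httpcore; its DEBUG logs show connection setup and reuse
logging.basicConfig(level=logging.INFO)
logging.getLogger("httpcore").setLevel(logging.DEBUG)

UPSTREAM = "https://example.com/"  # placeholder for a real MCP server URL


async def main() -> None:
    limits = httpx.Limits(max_keepalive_connections=20, max_connections=100, keepalive_expiry=30.0)
    async with httpx.AsyncClient(limits=limits, http2=True) as client:
        for _ in range(5):
            resp = await client.get(UPSTREAM)
            print(resp.http_version, resp.status_code)


if __name__ == "__main__":
    asyncio.run(main())
```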

Phase 6: Testing & Validation ✅

  • Test HTTP/2 Multiplexing

    • Open admin UI in Chrome browser
    • Open DevTools → Network tab
    • Verify "Protocol" column shows "h2"
    • Verify all requests use same connection ID
    • Take screenshot of multiplexing in action
  • Test Header Compression

    • Compare header sizes in HTTP/1.1 vs HTTP/2
    • Verify HPACK compression reduces header overhead
    • Measure header size reduction (typically 50-90%)
  • Load Test HTTP/2 vs HTTP/1.1

    • Run the HTTP/1.1 benchmark with wrk: wrk -t4 -c100 -d30s http://localhost:4444/tools
    • Record: requests/second, latency percentiles
    • Repeat for HTTP/2 with an HTTP/2-capable load tool (e.g. h2load), since wrk only speaks HTTP/1.1
    • Compare results, document improvement
  • Test Connection Reuse

    • Use curl with verbose output for multiple requests
    • Verify "Re-using existing connection" messages
    • Measure time saved from avoiding TCP handshake
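
As a quick sanity check alongside the external load test, the sketch below (hypothetical endpoint and request count, auth omitted) times the same batch of concurrent requests with HTTP/2 on and off so the two runs can be compared directly:

```python
# Sketch: compare wall time for a batch of concurrent requests with HTTP/2 on vs off.
import asyncio
import time

import httpx

URL = "https://localhost:4444/tools"  # assumed endpoint; auth headers omitted
REQUESTS = 100


async def run_batch(http2: bool) -> float:
    """Time REQUESTS concurrent GETs with the given protocol setting."""
    async with httpx.AsyncClient(http2=http2, verify=False) as client:
        start = time.perf_counter()
        await asyncio.gather(*(client.get(URL) for _ in range(REQUESTS)))
        return time.perf_counter() - start


async def main() -> None:
    h1 = await run_batch(http2=False)
    h2 = await run_batch(http2=True)
    print(f"HTTP/1.1: {h1:.2f}s  HTTP/2: {h2:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```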

Phase 7: Documentation ✅

  • Update CLAUDE.md

    • Add section on HTTP/2 configuration
    • Document uvicorn[standard] requirement
    • Explain Keep-Alive settings and benefits
    • Add testing instructions for HTTP/2
  • Update .env.example

    • Add GUNICORN_KEEPALIVE=5 with explanation
    • Add HTTP2_ENABLED=true (optional, default when h2 installed)
    • Add GUNICORN_WORKER_CONNECTIONS=1000
    • Document connection pooling settings
  • Create Performance Documentation

    • Document HTTP/2 benefits (multiplexing, header compression)
    • Document Keep-Alive benefits (connection reuse)
    • Add troubleshooting section (TLS requirement for browsers)
    • Add performance comparison charts

Phase 8: Quality Assurance ✅

  • Code Quality

    • Run make autoflake isort black to format code
    • Run make flake8 and fix any issues
    • Run make pylint and address warnings
    • Pass make verify checks
  • Testing

    • Verify all existing tests still pass
    • Add integration test for HTTP/2 support
    • Add test for Keep-Alive behavior
    • Test TLS/SSL with HTTP/2 (browsers require it)
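
A possible shape for the HTTP/2 integration test is sketched below. It assumes pytest-asyncio, the httpx[http2] extra, and a gateway already running with TLS at localhost:4444; file name, markers, and assertions are illustrative, not existing test code.

```python
# tests/integration/test_http2.py - hypothetical integration test sketch
import httpx
import pytest

BASE_URL = "https://localhost:4444"  # assumes `make serve-ssl` is running


@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_http2_negotiated() -> None:
    """The server should negotiate h2 via ALPN when the client offers it."""
    async with httpx.AsyncClient(http2=True, verify=False) as client:
        resp = await client.get(f"{BASE_URL}/health")
        assert resp.status_code == 200
        assert resp.http_version == "HTTP/2"


@pytest.mark.asyncio
async def test_http11_fallback() -> None:
    """Clients that only speak HTTP/1.1 should still be served."""
    async with httpx.AsyncClient(http2=False, verify=False) as client:
        resp = await client.get(f"{BASE_URL}/health")
        assert resp.status_code == 200
        assert resp.http_version == "HTTP/1.1"
```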

✅ Success Criteria

  • uvicorn[standard] installed with h2, httptools, uvloop
  • HTTP/2 enabled in development server (uvicorn)
  • HTTP/2 enabled in production server (gunicorn)
  • Keep-Alive configured and working (Connection: keep-alive header)
  • Connection pooling optimized in httpx client
  • Browser DevTools shows "h2" protocol for all requests
  • Multiple requests multiplex over single connection
  • Header compression (HPACK) verified (50-90% reduction)
  • Performance improvement measurable (20-40% faster page loads)
  • Connection reuse working (no repeated TCP handshakes)
  • Documentation updated with configuration examples
  • Load testing confirms performance gains

🏁 Definition of Done

  • uvicorn[standard] added to pyproject.toml and installed
  • HTTP/2 enabled in Makefile dev target (--http h2 flag)
  • gunicorn.config.py created/updated with HTTP/2 and Keep-Alive config
  • Keep-Alive settings configured (keepalive=5, worker_connections=1000)
  • HTTP client connection pooling verified/optimized
  • Timeout settings reviewed and documented
  • Browser testing confirms HTTP/2 working (DevTools shows h2)
  • Load testing shows 20-40% performance improvement
  • Code passes make verify checks
  • Documentation updated (CLAUDE.md, .env.example)
  • No regression in existing tests
  • Ready for production deployment

📝 Additional Notes

🔹 HTTP/2 Benefits:

  • Multiplexing: 6-10x more efficient than HTTP/1.1 (no head-of-line blocking at HTTP layer)
  • Header Compression: 50-90% reduction in header overhead using HPACK
  • Binary Protocol: More efficient parsing than text-based HTTP/1.1
  • Server Push: Can push assets before client requests (optional, rarely used)
  • Stream Prioritization: Allows client to prioritize important requests

🔹 Keep-Alive Benefits:

  • Eliminates TCP handshake overhead (typically 50-200ms) for subsequent requests
  • Reduces server load from constant connection open/close
  • Critical for API clients making multiple sequential requests
  • Improves throughput on high-latency networks (mobile, satellite)
  • Reduces TIME_WAIT socket exhaustion on high-traffic servers

🔹 Connection Pooling Benefits:

  • Reuses connections for outgoing requests (gateway → MCP servers)
  • Reduces load on upstream servers
  • Improves federation performance (faster tool catalog sync)
  • Prevents connection exhaustion under load

🔹 TLS Requirement for HTTP/2:

  • Browsers only negotiate HTTP/2 over TLS (HTTPS)
  • HTTP/2 without TLS, known as "h2c" (HTTP/2 Cleartext), is supported by curl and some non-browser clients
  • For local development, h2c works with clients such as curl --http2-prior-knowledge, but browsers stay on HTTP/1.1 without TLS
  • For production, use make serve-ssl with valid TLS certificates

🔹 Performance Comparison (typical):

  • HTTP/1.1 without Keep-Alive: 100 req/s, 500ms p95 latency
  • HTTP/1.1 with Keep-Alive: 250 req/s, 200ms p95 latency
  • HTTP/2 with multiplexing: 400 req/s, 100ms p95 latency
  • Result: 4x throughput improvement, 5x latency reduction

🔹 Troubleshooting:

  • If browser doesn't use HTTP/2, verify TLS is enabled
  • If curl shows HTTP/1.1, add --http2 flag explicitly
  • If the h2 package is not installed, the server silently falls back to HTTP/1.1
  • Check server logs for HTTP/2 startup messages

🔗 Related Issues


📚 References
