[Epic] 🌐 Performance - HTTP/2 & Keep-Alive Transport #1293

@crivetimihai

Description

🌐 Performance - HTTP/2 & Keep-Alive Transport

Goal

Optimize HTTP protocol and connection handling for better performance:

  1. Enable HTTP/2 support in both development (uvicorn) and production (gunicorn) servers
  2. Configure HTTP Keep-Alive settings to reuse connections efficiently
  3. Enable connection pooling optimizations in httpx client
  4. Configure timeouts and connection limits appropriately
  5. Add uvicorn[standard] dependency for HTTP/2 support

This reduces connection overhead, enables multiplexing, and improves performance through HPACK header compression and connection reuse.

Why Now?

HTTP/2 and Keep-Alive optimizations provide measurable performance improvements:

  1. Connection Multiplexing: Multiple requests over a single TCP connection reduce latency by 50-200ms
  2. Header Compression: HPACK reduces header overhead by 50-90%
  3. Browser Support: 97%+ of browsers support HTTP/2
  4. Connection Reuse: Keep-Alive eliminates connection setup/teardown overhead
  5. Better Mobile Performance: Reduced connection overhead critical for mobile networks
  6. Modern Standard: HTTP/2 is now the default for most web services

📖 User Stories

US-1: API Client - HTTP/2 Multiplexing for Multiple Requests

As an API Client
I want to use HTTP/2 multiplexing for concurrent requests
So that multiple API calls use a single connection and reduce latency

Acceptance Criteria:

Given I am an API client that supports HTTP/2
When I make 10 concurrent requests to GET /tools
Then all requests should multiplex over a single TCP connection
And the response should include HTTP/2 protocol headers
And total request time should be at least 50% lower than with HTTP/1.1

Given I am monitoring network connections
When I make multiple API requests
Then I should see only 1-2 connections to the server
And no head-of-line blocking should occur

Given I am using curl with HTTP/2
When I run "curl --http2 -v https://localhost:4444/health"
Then the response should show "HTTP/2 200"
And the connection should be reused for subsequent requests

Technical Requirements:

  • Install uvicorn[standard] with h2 library
  • Enable HTTP/2 in uvicorn and gunicorn
  • Verify multiplexing with browser DevTools
  • Measure latency reduction for concurrent requests
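
As an illustration of the multiplexing scenario above, here is a minimal sketch that fires 10 concurrent GET /tools requests through one httpx client with HTTP/2 enabled. It assumes the gateway is reachable at https://localhost:4444 with a self-signed dev certificate, that the httpx[http2] extra (the h2 package) is installed, and it omits any auth headers the real endpoint may require.

```python
# Sketch: 10 concurrent GET /tools requests multiplexed over one HTTP/2 connection.
import asyncio
import time

import httpx

BASE_URL = "https://localhost:4444"  # assumed local gateway endpoint; auth omitted


async def main() -> None:
    # http2=True requires the httpx[http2] extra; verify=False allows a self-signed cert
    async with httpx.AsyncClient(http2=True, verify=False) as client:
        start = time.perf_counter()
        responses = await asyncio.gather(*(client.get(f"{BASE_URL}/tools") for _ in range(10)))
        elapsed = time.perf_counter() - start
        # {"HTTP/2"} here confirms the server negotiated h2 via ALPN for all streams
        print({r.http_version for r in responses}, f"{elapsed * 1000:.0f}ms total")


if __name__ == "__main__":
    asyncio.run(main())
```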

US-2: Admin UI User - Faster Page Loads with HTTP/2

As an Admin UI User
I want the admin interface to load quickly using HTTP/2
So that pages with many assets load efficiently

Acceptance Criteria:

Given I am viewing the Admin UI dashboard
When the page loads with HTTP/2 enabled
Then all CSS, JS, and API requests should multiplex over one connection
And the page should load 30-40% faster than HTTP/1.1
And browser DevTools should show "h2" protocol

Given I am navigating between admin pages
When HTTP/2 is enabled
Then connection reuse should eliminate handshake overhead
And page transitions should feel instant

Technical Requirements:

  • HTTP/2 enabled for all endpoints
  • Browser automatically uses HTTP/2 when available
  • No JavaScript changes required

US-3: DevOps Engineer - Configure HTTP/2 and Keep-Alive

As a DevOps Engineer
I want to configure HTTP/2 and Keep-Alive settings
So that I can optimize connection handling for my deployment

Acceptance Criteria:

Given I want to enable HTTP/2 in development
When I run "make dev"
Then uvicorn should start with HTTP/2 support
And the logs should show "HTTP/2 enabled"

Given I want to configure Keep-Alive timeout
When I set GUNICORN_KEEPALIVE=10 in environment
Then connections should be kept alive for 10 seconds
And the Connection: keep-alive header should be present

Given I want to optimize connection pooling
When I configure httpx client limits
Then outgoing requests should reuse connections
And connection pool metrics should be available

Technical Requirements:

  • Environment variables for Keep-Alive configuration
  • gunicorn.config.py with HTTP/2 settings
  • Connection pool configuration in httpx client

🏗 Architecture

HTTP/2 Multiplexing Flow

```mermaid
graph TD
    A[Browser/Client] -->|Single TCP Connection| B[HTTP/2 Server]
    B --> C[Stream 1: GET /tools]
    B --> D[Stream 2: GET /servers]
    B --> E[Stream 3: GET /static/css/style.css]
    B --> F[Stream 4: GET /static/js/app.js]
    C --> G[Multiplexed Response]
    D --> G
    E --> G
    F --> G
    G -->|Single Connection| A

    H[HTTP/1.1 Comparison] -->|6 Separate Connections| I[Sequential Requests]
    I --> J[Request 1] --> K[Request 2] --> L[Request 3]
```

Connection Reuse with Keep-Alive

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Note over Client,Server: HTTP/1.1 with Keep-Alive
    Client->>Server: Request 1 + Connection: keep-alive
    Server->>Client: Response 1 + Connection: keep-alive
    Note over Client,Server: Connection stays open (5s timeout)
    Client->>Server: Request 2 (reuse connection)
    Server->>Client: Response 2
    Client->>Server: Request 3 (reuse connection)
    Server->>Client: Response 3

    Note over Client,Server: HTTP/1.1 without Keep-Alive
    Client->>Server: Request 1
    Server->>Client: Response 1 + Connection: close
    Note over Client,Server: Connection closed
    Client->>Server: Request 2 (new TCP handshake)
    Server->>Client: Response 2 + Connection: close
```

Configuration Examples

```python
# gunicorn.config.py

import os
import multiprocessing

# Server socket
bind = f"0.0.0.0:{os.getenv('PORT', '4444')}"

# Worker processes
workers = int(os.getenv("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))
worker_class = "uvicorn.workers.UvicornWorker"  # Automatically enables HTTP/2 if h2 installed

# Keep-Alive settings
keepalive = int(os.getenv("GUNICORN_KEEPALIVE", "5"))  # Keep connections alive for 5 seconds
worker_connections = 1000  # Max simultaneous connections per worker

# Timeouts
timeout = int(os.getenv("GUNICORN_TIMEOUT", "600"))
graceful_timeout = 30
```

```python
# mcpgateway/utils/retry_manager.py - HTTP Client Configuration

import httpx

# Configure connection pooling for outgoing requests
limits = httpx.Limits(
    max_keepalive_connections=20,  # Keep 20 connections in pool
    max_connections=100,           # Max total connections
    keepalive_expiry=30.0          # Keep connections alive for 30 seconds
)

client = httpx.AsyncClient(
    limits=limits,
    http2=True,  # Enable HTTP/2 for outgoing requests
    timeout=httpx.Timeout(30.0)
)
```
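
A common pattern is to create one such pooled client at application startup and share it for the lifetime of the process, so the limits above are actually reused across requests. The sketch below is hypothetical module-level wiring (not the gateway's existing code) that illustrates the idea.

```python
# Sketch: share a single pooled AsyncClient for the process lifetime (hypothetical wiring).
from typing import Optional

import httpx

_client: Optional[httpx.AsyncClient] = None


def get_client() -> httpx.AsyncClient:
    """Return the shared AsyncClient, creating it on first use."""
    global _client
    if _client is None:
        _client = httpx.AsyncClient(
            limits=httpx.Limits(
                max_keepalive_connections=20,
                max_connections=100,
                keepalive_expiry=30.0,
            ),
            http2=True,
            timeout=httpx.Timeout(30.0),
        )
    return _client


async def shutdown() -> None:
    """Close pooled connections cleanly at application shutdown."""
    global _client
    if _client is not None:
        await _client.aclose()
        _client = None
```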

📋 Implementation Tasks

Phase 1: Dependencies & Setup ✅

  • Add HTTP/2 Dependencies
    • Add uvicorn[standard]>=0.30.0 to pyproject.toml dependencies section
    • This includes h2, httptools, uvloop, and websockets for optimal performance
    • Run make install-dev to install the package
    • Verify h2 library installed: python -c "import h2; print(h2.__version__)"

Phase 2: Development Server Configuration ✅

  • Enable HTTP/2 in Development Server

    • Update Makefile dev target (around line 194)
    • Add --http h2 flag to uvicorn command
    • Full command: uvicorn mcpgateway.main:app --host 0.0.0.0 --port 8000 --reload --http h2
    • Add comment explaining HTTP/2 requirement (needs uvicorn[standard])
  • Test Development Server

    • Start dev server: make dev
    • Verify startup logs show HTTP/2 support
    • Test with curl: curl --http2 -v http://localhost:8000/health
    • Verify response shows HTTP/2 headers
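
Beyond the curl check above, a small Python script can report which protocol httpx negotiates. This is a sketch assuming the dev server URL; note that httpx only negotiates HTTP/2 via ALPN over TLS, so against a plain-HTTP dev server it will report HTTP/1.1.

```python
# Sketch: print the negotiated HTTP version and status for an assumed dev endpoint.
import asyncio
import sys

import httpx


async def check(url: str) -> None:
    # http2=True requires the httpx[http2] extra; verify=False tolerates self-signed certs
    async with httpx.AsyncClient(http2=True, verify=False) as client:
        resp = await client.get(url)
        # httpx negotiates h2 via ALPN, so an HTTPS URL is needed to see "HTTP/2" here
        print(resp.http_version, resp.status_code)


if __name__ == "__main__":
    asyncio.run(check(sys.argv[1] if len(sys.argv) > 1 else "http://localhost:8000/health"))
```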

Phase 3: Production Server Configuration ✅

  • Create/Update gunicorn.config.py

    • Create gunicorn.config.py in project root if it doesn't exist
    • Add HTTP/2 configuration using UvicornWorker
    • Configure Keep-Alive settings (keepalive=5)
    • Set worker_connections=1000 for concurrent handling
    • Add environment variable overrides for all settings
    • Add comprehensive comments explaining each setting
  • Verify UvicornWorker HTTP/2 Support

    • Document that UvicornWorker automatically enables HTTP/2 when h2 is installed
    • No additional flags needed for gunicorn
    • Test production server: make serve
    • Verify with curl: curl --http2 -v http://localhost:4444/health

Phase 4: Keep-Alive Configuration ✅

  • Configure Keep-Alive Settings

    • Set keepalive = 5 in gunicorn.config.py (keep connections alive for 5 seconds)
    • Add GUNICORN_KEEPALIVE environment variable support
    • Add --timeout-keep-alive 5 to uvicorn dev command
    • Document optimal Keep-Alive values (5-10 seconds typical)
  • Verify Keep-Alive Headers

    • Test with curl: curl -v http://localhost:4444/health
    • Verify Connection: keep-alive header in response
    • Test connection reuse with multiple sequential requests
    • Measure latency reduction from connection reuse
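
One rough way to measure the benefit of connection reuse, assuming the /health endpoint is reachable without auth on the assumed port, is to time a batch of sequential requests with and without a shared client:

```python
# Rough sketch: compare per-request cost with and without connection reuse.
import time

import httpx

URL = "http://localhost:4444/health"  # assumed unauthenticated health endpoint
N = 20

# A fresh client (and therefore a fresh TCP connection) for every request
start = time.perf_counter()
for _ in range(N):
    with httpx.Client() as client:
        client.get(URL)
cold = time.perf_counter() - start

# One shared client: Keep-Alive lets all requests reuse the first connection
start = time.perf_counter()
with httpx.Client() as client:
    for _ in range(N):
        client.get(URL)
warm = time.perf_counter() - start

print(f"new connection each time: {cold:.3f}s, keep-alive reuse: {warm:.3f}s")
```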

Phase 5: HTTP Client Optimization ✅

  • Review ResilientHttpClient Configuration

    • Open mcpgateway/utils/retry_manager.py
    • Check if httpx.AsyncClient has limits parameter configured
    • Verify connection pooling settings exist
  • Add Connection Pool Configuration

    • If missing, add httpx.Limits configuration:
      • max_keepalive_connections=20
      • max_connections=100
      • keepalive_expiry=30.0
    • Enable HTTP/2 for outgoing requests: http2=True
    • Add comments explaining pooling benefits
  • Test Outgoing Connection Pooling

    • Make multiple requests to same upstream server
    • Verify connection reuse in logs
    • Measure performance improvement for federation sync
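
To observe reuse for outgoing requests, one option (a sketch, assuming verbose client logging is acceptable in a test run and using a placeholder upstream URL) is to enable httpcore debug logging and watch the connection lifecycle while issuing several requests to the same upstream:

```python
# Sketch: observe connection pooling for outgoing requests via debug logging.
import asyncio
import logging

import httpx

# httpx delegates transport to httpcore; its DEBUG logs show connection setup and reuse
logging.basicConfig(level=logging.INFO)
logging.getLogger("httpcore").setLevel(logging.DEBUG)

UPSTREAM = "https://example.com/"  # placeholder for a real MCP server URL


async def main() -> None:
    limits = httpx.Limits(max_keepalive_connections=20, max_connections=100, keepalive_expiry=30.0)
    async with httpx.AsyncClient(limits=limits, http2=True) as client:
        for _ in range(5):
            resp = await client.get(UPSTREAM)
            print(resp.http_version, resp.status_code)


if __name__ == "__main__":
    asyncio.run(main())
```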

Phase 6: Testing & Validation ✅

  • Test HTTP/2 Multiplexing

    • Open admin UI in Chrome browser
    • Open DevTools → Network tab
    • Verify "Protocol" column shows "h2"
    • Verify all requests use same connection ID
    • Take screenshot of multiplexing in action
  • Test Header Compression

    • Compare header sizes in HTTP/1.1 vs HTTP/2
    • Verify HPACK compression reduces header overhead
    • Measure header size reduction (typically 50-90%)
  • Load Test HTTP/2 vs HTTP/1.1

    • Run the HTTP/1.1 benchmark with wrk: wrk -t4 -c100 -d30s http://localhost:4444/tools
    • Record: requests/second, latency percentiles
    • Repeat for HTTP/2 with an HTTP/2-capable load tool (e.g. h2load), since wrk only speaks HTTP/1.1
    • Compare results, document improvement
  • Test Connection Reuse

    • Use curl with verbose output for multiple requests
    • Verify "Re-using existing connection" messages
    • Measure time saved from avoiding TCP handshake
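
As a quick sanity check alongside the external load test, the sketch below (hypothetical endpoint and request count, auth omitted) times the same batch of concurrent requests with HTTP/2 on and off so the two runs can be compared directly:

```python
# Sketch: compare wall time for a batch of concurrent requests with HTTP/2 on vs off.
import asyncio
import time

import httpx

URL = "https://localhost:4444/tools"  # assumed endpoint; auth headers omitted
REQUESTS = 100


async def run_batch(http2: bool) -> float:
    """Time REQUESTS concurrent GETs with the given protocol setting."""
    async with httpx.AsyncClient(http2=http2, verify=False) as client:
        start = time.perf_counter()
        await asyncio.gather(*(client.get(URL) for _ in range(REQUESTS)))
        return time.perf_counter() - start


async def main() -> None:
    h1 = await run_batch(http2=False)
    h2 = await run_batch(http2=True)
    print(f"HTTP/1.1: {h1:.2f}s  HTTP/2: {h2:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```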

Phase 7: Documentation ✅

  • Update CLAUDE.md

    • Add section on HTTP/2 configuration
    • Document uvicorn[standard] requirement
    • Explain Keep-Alive settings and benefits
    • Add testing instructions for HTTP/2
  • Update .env.example

    • Add GUNICORN_KEEPALIVE=5 with explanation
    • Add HTTP2_ENABLED=true (optional, default when h2 installed)
    • Add GUNICORN_WORKER_CONNECTIONS=1000
    • Document connection pooling settings
  • Create Performance Documentation

    • Document HTTP/2 benefits (multiplexing, header compression)
    • Document Keep-Alive benefits (connection reuse)
    • Add troubleshooting section (TLS requirement for browsers)
    • Add performance comparison charts

Phase 8: Quality Assurance ✅

  • Code Quality

    • Run make autoflake isort black to format code
    • Run make flake8 and fix any issues
    • Run make pylint and address warnings
    • Pass make verify checks
  • Testing

    • Verify all existing tests still pass
    • Add integration test for HTTP/2 support
    • Add test for Keep-Alive behavior
    • Test TLS/SSL with HTTP/2 (browsers require it)
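
A possible shape for the HTTP/2 integration test is sketched below. It assumes pytest-asyncio, the httpx[http2] extra, and a gateway already running with TLS at localhost:4444; file name, markers, and assertions are illustrative, not existing test code.

```python
# tests/integration/test_http2.py - hypothetical integration test sketch
import httpx
import pytest

BASE_URL = "https://localhost:4444"  # assumes `make serve-ssl` is running


@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_http2_negotiated() -> None:
    """The server should negotiate h2 via ALPN when the client offers it."""
    async with httpx.AsyncClient(http2=True, verify=False) as client:
        resp = await client.get(f"{BASE_URL}/health")
        assert resp.status_code == 200
        assert resp.http_version == "HTTP/2"


@pytest.mark.asyncio
async def test_http11_fallback() -> None:
    """Clients that only speak HTTP/1.1 should still be served."""
    async with httpx.AsyncClient(http2=False, verify=False) as client:
        resp = await client.get(f"{BASE_URL}/health")
        assert resp.status_code == 200
        assert resp.http_version == "HTTP/1.1"
```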

✅ Success Criteria

  • uvicorn[standard] installed with h2, httptools, uvloop
  • HTTP/2 enabled in development server (uvicorn)
  • HTTP/2 enabled in production server (gunicorn)
  • Keep-Alive configured and working (Connection: keep-alive header)
  • Connection pooling optimized in httpx client
  • Browser DevTools shows "h2" protocol for all requests
  • Multiple requests multiplex over single connection
  • Header compression (HPACK) verified (50-90% reduction)
  • Performance improvement measurable (20-40% faster page loads)
  • Connection reuse working (no repeated TCP handshakes)
  • Documentation updated with configuration examples
  • Load testing confirms performance gains

🏁 Definition of Done

  • uvicorn[standard] added to pyproject.toml and installed
  • HTTP/2 enabled in Makefile dev target (--http h2 flag)
  • gunicorn.config.py created/updated with HTTP/2 and Keep-Alive config
  • Keep-Alive settings configured (keepalive=5, worker_connections=1000)
  • HTTP client connection pooling verified/optimized
  • Timeout settings reviewed and documented
  • Browser testing confirms HTTP/2 working (DevTools shows h2)
  • Load testing shows 20-40% performance improvement
  • Code passes make verify checks
  • Documentation updated (CLAUDE.md, .env.example)
  • No regression in existing tests
  • Ready for production deployment

📝 Additional Notes

🔹 HTTP/2 Benefits:

  • Multiplexing: 6-10x more efficient than HTTP/1.1 (no head-of-line blocking at HTTP layer)
  • Header Compression: 50-90% reduction in header overhead using HPACK
  • Binary Protocol: More efficient parsing than text-based HTTP/1.1
  • Server Push: Can push assets before client requests (optional, rarely used)
  • Stream Prioritization: Allows client to prioritize important requests

🔹 Keep-Alive Benefits:

  • Eliminates TCP handshake overhead (typically 50-200ms) for subsequent requests
  • Reduces server load from constant connection open/close
  • Critical for API clients making multiple sequential requests
  • Improves throughput on high-latency networks (mobile, satellite)
  • Reduces TIME_WAIT socket exhaustion on high-traffic servers

🔹 Connection Pooling Benefits:

  • Reuses connections for outgoing requests (gateway → MCP servers)
  • Reduces load on upstream servers
  • Improves federation performance (faster tool catalog sync)
  • Prevents connection exhaustion under load

🔹 TLS Requirement for HTTP/2:

  • Browsers only negotiate HTTP/2 over TLS (HTTPS)
  • HTTP/2 without TLS, known as "h2c" (HTTP/2 Cleartext), is supported by curl and some non-browser clients
  • For local development, h2c works with clients such as curl --http2-prior-knowledge, but browsers stay on HTTP/1.1 without TLS
  • For production, use make serve-ssl with valid TLS certificates

🔹 Performance Comparison (typical):

  • HTTP/1.1 without Keep-Alive: 100 req/s, 500ms p95 latency
  • HTTP/1.1 with Keep-Alive: 250 req/s, 200ms p95 latency
  • HTTP/2 with multiplexing: 400 req/s, 100ms p95 latency
  • Result: 4x throughput improvement, 5x latency reduction

🔹 Troubleshooting:

  • If browser doesn't use HTTP/2, verify TLS is enabled
  • If curl shows HTTP/1.1, add --http2 flag explicitly
  • If the h2 package is not installed, the server silently falls back to HTTP/1.1
  • Check server logs for HTTP/2 startup messages

🔗 Related Issues


📚 References
