Add dedicated /metrics endpoints for Gateway and LLM services #46

Merged
SorraTheOrc merged 18 commits into main from copilot/add-dedicated-metrics-endpoints on Dec 3, 2025

Copilot AI commented Dec 2, 2025

  • Understand the requirements of issue #39, "8.3.2 — Dedicated Metrics Endpoints for Gateway and LLM Services"
  • Explore repository structure and existing implementations
  • Add /metrics endpoint to gateway service
  • Add /metrics endpoint to LLM service
  • Update K8s deployments - prometheus.io/path: "/metrics"
  • Update ServiceMonitor - gateway and LLM paths to /metrics
  • Update documentation with health vs metrics distinction
  • Add tests for metrics functionality
  • Refactored to use prometheus_client for Prometheus-compatible format (review feedback)
    • Gateway metrics: Counter, Gauge, Histogram for requests, connections, latencies
    • LLM metrics: Counter, Histogram for requests, latencies, tokens
    • Both /metrics endpoints now return OpenMetrics text format
    • Made endpoints async for thread safety
  • All tests pass (90.26% coverage)
  • Linting passes
  • CodeQL security scan passed (0 alerts)
Original prompt

"isseu 39 - 8.3.2 — Dedicated Metrics Endpoints for Gateway and LLM Services" to gamedev agent

Created from Copilot CLI via the copilot delegate command.



Copilot AI and others added 2 commits December 2, 2025 19:12
…issue #39)

- Add /metrics endpoint to gateway service with request counts, latencies, error breakdowns, WebSocket connection stats, and LLM integration stats
- Add /metrics endpoint to LLM service with request counts, latencies, error breakdowns, provider-level stats, and token usage tracking
- Update K8s deployments to change prometheus.io/path from /healthz to /metrics for gateway and LLM
- Update ServiceMonitors to scrape /metrics path for gateway and LLM services
- Update Deploy_GEngine_To_Kubernetes.md documentation with health vs. metrics distinction and example metrics for all three services
- Add comprehensive tests for new metrics functionality (9 gateway tests, 11 LLM tests)

Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>

- Extract token usage logic into _extract_token_usage helper function for consistency
- Add documentation clarifying metrics counter relationships in GatewayMetrics
- All 685 tests pass with 90.21% coverage

Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
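
The commit above mentions an _extract_token_usage helper without showing it. A minimal sketch of what such a helper could look like, assuming the provider response is a dict with an optional "usage" mapping. The function name extract_token_usage_sketch and the key names (prompt_tokens/completion_tokens, with input_tokens/output_tokens as fallbacks) are assumptions for illustration and are not taken from the PR:

```python
from typing import Any, Dict, Mapping


def extract_token_usage_sketch(response: Mapping[str, Any]) -> Dict[str, int]:
    """Normalize token counts from a provider response (hypothetical shape)."""
    # Treat a missing or null "usage" field as "no usage reported".
    usage = response.get("usage") or {}
    # Assumed key names; fall back to the alternative spelling, then to zero.
    prompt = int(usage.get("prompt_tokens") or usage.get("input_tokens") or 0)
    completion = int(usage.get("completion_tokens") or usage.get("output_tokens") or 0)
    return {"prompt": prompt, "completion": completion, "total": prompt + completion}


with_usage = extract_token_usage_sketch(
    {"usage": {"prompt_tokens": 12, "completion_tokens": 30}}
)
without_usage = extract_token_usage_sketch({"text": "hello"})
```

Keeping the extraction in one place means every code path that records token metrics sees the same normalized counts, which appears to be the consistency goal the commit describes.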
Copilot AI changed the title from "[WIP] Add dedicated metrics endpoints for gateway and LLM services" to "Add dedicated /metrics endpoints for Gateway and LLM services" on Dec 2, 2025
Copilot AI requested a review from SorraTheOrc December 2, 2025 19:19
@SorraTheOrc
Member

@copilot there have been many linting changes in main. Can you merge them into this branch?

Author

Copilot AI commented Dec 2, 2025

@copilot there have been many linting changes in main. Can you merge them into this branch?

I'm unable to merge main into this branch due to sandbox limitations - I can't pull branches from GitHub. You can merge main using either:

  1. GitHub UI: Use the "Update branch" button on the PR page
  2. Locally:
git fetch origin main
git merge origin/main --no-edit
git push origin copilot/add-dedicated-metrics-endpoints

Let me know once you've merged main and I can address any conflicts or linting issues that arise.

@SorraTheOrc
Member

SorraTheOrc commented Dec 2, 2025

@copilot Review feedback:

  1. The new /metrics endpoints return JSON, but Prometheus expects OpenMetrics/text format. The current implementation will not work with Prometheus ServiceMonitor or annotations as described. Please refactor to use the prometheus_client library to expose metrics in the correct format.
  2. The Gateway service's get_metrics endpoint is synchronous (def, not async def) but is accessed from both the event loop and thread pool, which can cause concurrency issues. Please make it async and ensure safe access to shared state.
  3. For performance, consider using collections.deque(maxlen=N) for latency samples instead of list+pop(0).

@copilot Please address these issues to ensure Prometheus compatibility and robust metrics handling.
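
The third point of the review can be sketched in a few lines. This is an illustration only, assuming a hypothetical LatencyWindow class and window size; none of these names come from the PR. A deque with maxlen discards the oldest sample in constant time on append, whereas list.pop(0) shifts every remaining element:

```python
from collections import deque

MAX_SAMPLES = 1000  # hypothetical window size, not from the PR


class LatencyWindow:
    """Keep only the most recent latency samples (illustrative sketch)."""

    def __init__(self, maxlen: int = MAX_SAMPLES) -> None:
        # With maxlen set, appending to a full deque drops the oldest item.
        self._samples = deque(maxlen=maxlen)

    def record(self, seconds: float) -> None:
        self._samples.append(seconds)

    def average(self) -> float:
        # Return 0.0 for an empty window instead of dividing by zero.
        return sum(self._samples) / len(self._samples) if self._samples else 0.0


window = LatencyWindow(maxlen=3)
for sample in (0.1, 0.2, 0.3, 0.4):
    window.record(sample)
# The window now holds only the three most recent samples.
```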

@SorraTheOrc
Member

@copilot see the review feedback in the previous comment and fix it please.

SorraTheOrc and others added 4 commits December 2, 2025 11:46
- Replace custom metrics classes with Prometheus Counter, Gauge, Histogram
- Gateway /metrics now returns OpenMetrics text format
- LLM /metrics now returns OpenMetrics text format
- Make metrics endpoints async for thread safety
- Add prometheus_client>=0.20.0 dependency
- Update documentation with Prometheus format examples
- Update tests to parse Prometheus text format

Addresses review feedback:
1. Use prometheus_client for Prometheus-compatible format
2. Make get_metrics async for thread safety
3. Linting fixes (B904: raise from e)

Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
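
The commit above updates the tests to parse Prometheus text format. A minimal sketch of that idea using only the standard library follows. parse_samples is a hypothetical helper, not code from the PR, and it handles just comment lines and simple samples without timestamps. prometheus_client also ships its own parser (prometheus_client.parser.text_string_to_metric_families), which a real test suite might prefer.

```python
import re

# Matches "name{labels} value" or "name value"; timestamps are not supported.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# Matches one key="value" pair inside the braces.
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Map (metric name, frozenset of label pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable metrics line: {line!r}")
        labels = frozenset(_LABEL_RE.findall(match.group("labels") or ""))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


SAMPLE_BODY = """\
# HELP gateway_requests_total Total number of requests processed
# TYPE gateway_requests_total counter
gateway_requests_total 150.0
gateway_request_latency_seconds_bucket{request_type="command",le="0.1"} 80.0
"""

parsed = parse_samples(SAMPLE_BODY)
```

A test can then assert on individual samples regardless of the order in which labels are rendered, since the labels are compared as a set.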
Author

Copilot AI commented Dec 2, 2025

@copilot Review feedback: 1. The new /metrics endpoints return JSON, but Prometheus expects OpenMetrics/text format. The current implementation will not work with Prometheus ServiceMonitor or a...

Addressed all review feedback in commit 3a9a364:

  1. Prometheus format: Refactored Gateway and LLM metrics to use prometheus_client library. Both /metrics endpoints now return OpenMetrics text format instead of JSON.

  2. Async endpoint: Made get_metrics async in both services for thread-safe access.

  3. Performance: Replaced list+pop(0) with Prometheus's built-in Histogram metrics, which handle latency bucketing efficiently.
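
The second point can be sketched without any web framework. The example assumes a hypothetical SharedStats object that request handlers update from worker threads while an async get_metrics coroutine reads it on the event loop. A threading.Lock guards the shared state. All names and the single-line output are illustrative and do not reflect the PR's actual code:

```python
import asyncio
import threading


class SharedStats:
    """Hypothetical state touched by both the event loop and worker threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._requests = 0

    def record_request(self) -> None:
        with self._lock:  # serialize updates coming from any thread
            self._requests += 1

    def snapshot(self) -> dict:
        with self._lock:  # read a consistent copy of the state
            return {"requests": self._requests}


stats = SharedStats()


async def get_metrics() -> str:
    # Runs on the event loop; the lock is held only for a short copy.
    snap = stats.snapshot()
    return f"gateway_requests_total {float(snap['requests'])}\n"


async def main() -> str:
    # Simulate handlers running in the thread pool, then read the metrics.
    await asyncio.gather(
        *(asyncio.to_thread(stats.record_request) for _ in range(50))
    )
    return await get_metrics()


body = asyncio.run(main())
```

With prometheus_client the explicit lock is usually unnecessary for the metric objects themselves, since they do their own locking; the pattern matters for any extra state kept alongside them.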

Example metrics output now looks like:

# HELP gateway_requests_total Total number of requests processed
# TYPE gateway_requests_total counter
gateway_requests_total 150.0
# HELP gateway_request_latency_seconds Request latency in seconds
# TYPE gateway_request_latency_seconds histogram
gateway_request_latency_seconds_bucket{request_type="command",le="0.1"} 80.0
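
Output of this shape can be produced with prometheus_client roughly as follows. This is a sketch, not the PR's code: the metric names gateway_requests_total and gateway_request_latency_seconds come from the output above, while gateway_active_connections, the bucket boundaries, the record_request helper, and the private registry are assumptions made for the example.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

# A private registry keeps the sketch isolated from the process-wide default.
registry = CollectorRegistry()

REQUESTS = Counter(
    "gateway_requests_total",
    "Total number of requests processed",
    registry=registry,
)
CONNECTIONS = Gauge(
    "gateway_active_connections",  # hypothetical metric name
    "Currently open WebSocket connections",
    registry=registry,
)
LATENCY = Histogram(
    "gateway_request_latency_seconds",
    "Request latency in seconds",
    ["request_type"],
    buckets=(0.1, 0.5, 1.0),  # assumed boundaries; +Inf is appended automatically
    registry=registry,
)


def record_request(request_type: str, seconds: float) -> None:
    """Hypothetical helper: count one request and observe its latency."""
    REQUESTS.inc()
    LATENCY.labels(request_type=request_type).observe(seconds)


record_request("command", 0.05)
record_request("command", 0.3)
CONNECTIONS.set(2)

# generate_latest returns bytes in the Prometheus text exposition format.
metrics_text = generate_latest(registry).decode("utf-8")
print(metrics_text)
```

In a service, the /metrics handler would return this text with the content type that prometheus_client exposes as CONTENT_TYPE_LATEST.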

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com//advisories
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (http block)

If you need me to access, download, or install something from one of these locations, you can either:

SorraTheOrc merged commit 659f19c into main on Dec 3, 2025
3 checks passed
SorraTheOrc deleted the copilot/add-dedicated-metrics-endpoints branch on December 3, 2025 03:22