feat: Complete system observability stack integration#22121
Conversation
- Added `otel-collector` and `jaeger` services to the main `docker-compose.yml` to support distributed tracing. - Created `otel-collector-config.yaml` to aggregate logs, metrics, and traces. - Corrected volume mount paths for `grafana` and `prometheus` in `docker-compose.yml`. - Created required Grafana provisioning files for dashboards and the Prometheus datasource. - Updated `ops/prometheus/prometheus.yml` to scrape the proper target hostnames (`server` and `gateway`). Co-authored-by: BrianCLong <6404035+BrianCLong@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
WalkthroughThe observability stack is enhanced by adding OpenTelemetry Collector and Jaeger services to the docker-compose configuration, simplifying the OTEL collector configuration by removing Elasticsearch and Loki exporters, and migrating Prometheus monitoring targets from maestro-based services to new summit-server and intelgraph-gateway services. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request establishes a complete local observability environment by incorporating key tools like OpenTelemetry, Jaeger, Prometheus, and Grafana. It streamlines the collection and visualization of application telemetry data, enabling better monitoring and debugging capabilities for local development. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
❌ Lint Gate ResultsESLint
Code Quality
❌ Lint gate failedPlease fix the linter errors and warnings before merging. About Lint GateThis gate enforces zero linter warnings/errors and production code quality:
|
There was a problem hiding this comment.
Code Review
This pull request refactors the observability stack by introducing OpenTelemetry Collector and Jaeger services in docker-compose.yml. It also updates the Prometheus configuration in ops/prometheus/prometheus.yml to reflect new service names (summit-server, intelgraph-gateway) and removes outdated jobs (maestro-worker, pyroscope), along with simplifying the OpenTelemetry Collector configuration by removing Elasticsearch and Loki exporters. A review comment highlights that the intelgraph-gateway job in prometheus.yml is missing the relabel_configs section for consistent instance label extraction.
| labels: | ||
| service: maestro-gateway | ||
| service: gateway | ||
| tier: edge |
There was a problem hiding this comment.
The intelgraph-gateway job is missing the relabel_configs section to extract the instance label from the __address__. This configuration is present in the summit-server job and is important for consistent metric labeling and proper functioning of dashboards or alerts that rely on the instance label. It is recommended to include this for consistency and maintainability.
tier: edge
relabel_configs:
- source_labels: [__address__]
regex: '(.+):.*'
target_label: instance
❌ TypeScript Gate ResultsType Safety
About TypeScript GateThis gate enforces type safety:
To fix TypeScript errors:
|
🔒 Hardened Security Gate✅ Security Posture: Summary:
Component Status:
Recommendation: Safe to proceed Artifacts: Generated by Hardened Security Gate on 2026-03-26T10:09:34.688Z |
Co-authored-by: BrianCLong <6404035+BrianCLong@users.noreply.github.com>
❌ TypeScript Gate ResultsType Safety
About TypeScript GateThis gate enforces type safety:
To fix TypeScript errors:
|
❌ Lint Gate ResultsESLint
Code Quality
❌ Lint gate failedPlease fix the linter errors and warnings before merging. About Lint GateThis gate enforces zero linter warnings/errors and production code quality:
|
🔒 Hardened Security Gate✅ Security Posture: Summary:
Component Status:
Recommendation: Safe to proceed Artifacts: Generated by Hardened Security Gate on 2026-03-26T13:15:37.161Z |
Co-authored-by: BrianCLong <6404035+BrianCLong@users.noreply.github.com>
❌ TypeScript Gate ResultsType Safety
About TypeScript GateThis gate enforces type safety:
To fix TypeScript errors:
|
🔒 Hardened Security Gate✅ Security Posture: Summary:
Component Status:
Recommendation: Safe to proceed Artifacts: Generated by Hardened Security Gate on 2026-03-26T13:30:03.667Z |
❌ Lint Gate ResultsESLint
Code Quality
❌ Lint gate failedPlease fix the linter errors and warnings before merging. About Lint GateThis gate enforces zero linter warnings/errors and production code quality:
|
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docker-compose.yml (1)
270-281:⚠️ Potential issue | 🟠 MajorThe mounted Prometheus config still expects services this compose file never starts.
ops/prometheus/prometheus.ymlscrapes or alerts againstalertmanager,blackbox-exporter,node-exporter,loki, andelasticsearch-apm, but none of those services are defined here. A fresh local stack will come up with a large set of permanent DOWN targets, which makes the new observability setup noisy by default.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 270 - 281, The Prometheus service mounts ops/prometheus/prometheus.yml which references targets (alertmanager, blackbox-exporter, node-exporter, loki, elasticsearch-apm) that the compose file does not start, causing many DOWN targets; either (A) add minimal service definitions for the referenced exporters/alertmanager (e.g., alertmanager, blackbox-exporter, node-exporter, loki, elasticsearch-apm) into this compose so the scrape_targets exist, or (B) create a local-friendly prometheus config (e.g., ops/prometheus/prometheus.local.yml) that removes or conditionally disables those scrape_configs/alerting_rules and update the prometheus service volume/command to mount/use that file instead; update references in the prometheus service (volumes/command) and ensure any rule files under ops/alerts are consistent with the chosen config.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose.yml`:
- Around line 249-253: The docker-compose service otel-collector is mounting
ops/otel/otel-collector-config.yaml which contains kubernetes_sd_configs for the
Prometheus receiver and thus fails/noises in a local Compose environment;
replace or override that mount with a Compose-specific collector config (e.g.,
ops/otel/otel-collector-config.compose.yaml) that removes or disables the
Prometheus receiver/kubernetes_sd_configs, or modify the existing config to
guard the Prometheus receiver behind a build/runtime flag and set that flag in
docker-compose so the receiver is disabled locally; update the otel-collector
service mount reference and ensure the Prometheus receiver and any
kubernetes_sd_configs are not present in the compose-mounted config.
In `@ops/otel/otel-collector-config.yaml`:
- Around line 68-71: The logs/intake pipeline currently exports only to the
debug-only logging/metrics exporter, so replace or augment the exporters list
for the logs/intake pipeline (symbol: logs/intake, field: exporters) to point to
a persistent backend (for example add the otlp exporter or a file/remote logging
exporter configured elsewhere) or remove the logs/intake pipeline entirely;
ensure the processors (attributes/add_env, attributes/enrich_correlation) remain
compatible with the chosen exporter and update any exporter configuration blocks
accordingly so logs become queryable/persistent rather than just written to
stdout/stderr.
In `@ops/prometheus/prometheus.yml`:
- Around line 41-47: Prometheus is scraping the gateway at gateway:8080 but the
metrics exporter (PrometheusExporter in apps/gateway/src/instrumentation.ts)
listens on port 9464; update the intelgraph-gateway job (job_name:
intelgraph-gateway) target from gateway:8080 to gateway:9464 so /metrics is
reachable, and likewise locate any other scrape job that targets the gateway or
uses label service: gateway (the other job referenced in the review) and change
its target port to 9464 as well to ensure Prometheus scrapes the correct
endpoint.
---
Outside diff comments:
In `@docker-compose.yml`:
- Around line 270-281: The Prometheus service mounts
ops/prometheus/prometheus.yml which references targets (alertmanager,
blackbox-exporter, node-exporter, loki, elasticsearch-apm) that the compose file
does not start, causing many DOWN targets; either (A) add minimal service
definitions for the referenced exporters/alertmanager (e.g., alertmanager,
blackbox-exporter, node-exporter, loki, elasticsearch-apm) into this compose so
the scrape_targets exist, or (B) create a local-friendly prometheus config
(e.g., ops/prometheus/prometheus.local.yml) that removes or conditionally
disables those scrape_configs/alerting_rules and update the prometheus service
volume/command to mount/use that file instead; update references in the
prometheus service (volumes/command) and ensure any rule files under ops/alerts
are consistent with the chosen config.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ade5d09c-e7df-4b36-bd89-d923987b75b8
📒 Files selected for processing (4)
docker-compose.ymlops/observability/grafana/dashboards/placeholder.jsonops/otel/otel-collector-config.yamlops/prometheus/prometheus.yml
| otel-collector: | ||
| image: otel/opentelemetry-collector:0.103.0 | ||
| command: ['--config=/etc/otel-collector-config.yaml'] | ||
| volumes: | ||
| - ./ops/otel/otel-collector-config.yaml:/etc/otel-collector-config.yaml:ro |
There was a problem hiding this comment.
This mounts a Kubernetes-only scrape config into the local Compose collector.
ops/otel/otel-collector-config.yaml still uses kubernetes_sd_configs for its Prometheus receiver. In this local Docker Compose stack there is no kube API or service account to discover against, so that receiver will never find the intended targets and will only add noisy discovery failures. Ship a Compose-specific collector config here or disable the Prometheus receiver locally.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.yml` around lines 249 - 253, The docker-compose service
otel-collector is mounting ops/otel/otel-collector-config.yaml which contains
kubernetes_sd_configs for the Prometheus receiver and thus fails/noises in a
local Compose environment; replace or override that mount with a
Compose-specific collector config (e.g.,
ops/otel/otel-collector-config.compose.yaml) that removes or disables the
Prometheus receiver/kubernetes_sd_configs, or modify the existing config to
guard the Prometheus receiver behind a build/runtime flag and set that flag in
docker-compose so the receiver is disabled locally; update the otel-collector
service mount reference and ensure the Prometheus receiver and any
kubernetes_sd_configs are not present in the compose-mounted config.
| logs/intake: | ||
| receivers: [otlp] | ||
| processors: [batch, attributes/add_env, attributes/enrich_correlation] | ||
| exporters: [otlp/elasticsearch, loki, logging/metrics] | ||
| exporters: [logging/metrics] |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In OpenTelemetry Collector, does the logging exporter persist OTLP logs to a backend, or only write them to the collector process output?
💡 Result:
In OpenTelemetry Collector, the logging exporter (deprecated and removed in v0.111.0, replaced by debug exporter) only writes OTLP logs to the collector process output (stdout/stderr via console logging), not to a persistent backend. It is intended for debugging and does not persist data to any storage like files or external systems by default. For persistence, use exporters like file or OTLP to a backend.
Citations:
- 1: [announcement] logging exporter has been replaced with debug exporter open-telemetry/opentelemetry-collector#11337
- 2: https://oneuptime.com/blog/post/2026-02-06-debug-exporter-opentelemetry-collector/view
- 3: https://opentelemetry.io/docs/collector/troubleshooting
- 4: https://pkg.go.dev/github.com/oodle-ai/opentelemetry-collector/exporter/debugexporter
- 5: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/debugexporter/README.md
🏁 Script executed:
cat ops/otel/otel-collector-config.yaml | head -100Repository: BrianCLong/summit
Length of output: 1987
🏁 Script executed:
rg -A 10 "exporters:" ops/otel/otel-collector-config.yaml | head -150Repository: BrianCLong/summit
Length of output: 641
logs/intake pipeline exports only to the logging exporter, which writes to stdout/stderr and provides no persistence.
OTLP logs are accepted and enriched but then discarded from an observability perspective—they won't be queryable or available for dashboarding. The logging/metrics exporter (OpenTelemetry's logging exporter) is for debugging only and does not persist to any backend. Attach this pipeline to a real log backend (e.g., otlp to a logging service) or remove it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ops/otel/otel-collector-config.yaml` around lines 68 - 71, The logs/intake
pipeline currently exports only to the debug-only logging/metrics exporter, so
replace or augment the exporters list for the logs/intake pipeline (symbol:
logs/intake, field: exporters) to point to a persistent backend (for example add
the otlp exporter or a file/remote logging exporter configured elsewhere) or
remove the logs/intake pipeline entirely; ensure the processors
(attributes/add_env, attributes/enrich_correlation) remain compatible with the
chosen exporter and update any exporter configuration blocks accordingly so logs
become queryable/persistent rather than just written to stdout/stderr.
| - job_name: intelgraph-gateway | ||
| metrics_path: /metrics | ||
| static_configs: | ||
| - targets: | ||
| - maestro-worker:9400 | ||
| - gateway:8080 | ||
| labels: | ||
| service: maestro-worker | ||
| tier: data | ||
|
|
||
| - job_name: maestro-gateway | ||
| metrics_path: /metrics | ||
| static_configs: | ||
| - targets: | ||
| - maestro-gateway:8081 | ||
| labels: | ||
| service: maestro-gateway | ||
| service: gateway |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -e
for base in apps/gateway gateway; do
if [ -d "$base" ]; then
rg -n -C3 --glob '!**/dist/**' --glob '!**/node_modules/**' \
"setupObservability|['\"]/metrics['\"]|['\"]/health['\"]" "$base"
fi
doneRepository: BrianCLong/summit
Length of output: 2032
🏁 Script executed:
#!/bin/bash
# Search for /metrics handler registration
rg -n '/metrics' --glob '!**/dist/**' --glob '!**/node_modules/**' -A 2 apps/gateway gatewayRepository: BrianCLong/summit
Length of output: 1449
🏁 Script executed:
#!/bin/bash
# Search for prometheus client or metrics setup
rg -n 'prometheus|prom-client|metrics.*express|metricsMiddleware' --glob '!**/dist/**' --glob '!**/node_modules/**' -i apps/gateway gatewayRepository: BrianCLong/summit
Length of output: 2897
🏁 Script executed:
#!/bin/bash
# Check for any default prometheus middleware in the main gateway server file
head -150 apps/gateway/src/server.ts | tail -60Repository: BrianCLong/summit
Length of output: 2154
Update the Prometheus scrape target to port 9464 where the metrics endpoint is exposed.
The gateway does bind both /metrics and /health endpoints, but not where the current config expects them. The /metrics endpoint is exposed on port 9464 (via OpenTelemetry's PrometheusExporter in apps/gateway/src/instrumentation.ts:41), while the Prometheus job configuration at lines 41-47 attempts to scrape from gateway:8080. This will result in a 404. Change the target to gateway:9464 to match where the metrics exporter actually listens.
Similarly, verify line 85-92 scrapes from the correct port and service endpoint where /metrics is actually exposed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ops/prometheus/prometheus.yml` around lines 41 - 47, Prometheus is scraping
the gateway at gateway:8080 but the metrics exporter (PrometheusExporter in
apps/gateway/src/instrumentation.ts) listens on port 9464; update the
intelgraph-gateway job (job_name: intelgraph-gateway) target from gateway:8080
to gateway:9464 so /metrics is reachable, and likewise locate any other scrape
job that targets the gateway or uses label service: gateway (the other job
referenced in the review) and change its target port to 9464 as well to ensure
Prometheus scrapes the correct endpoint.
Co-authored-by: BrianCLong <6404035+BrianCLong@users.noreply.github.com>
❌ TypeScript Gate ResultsType Safety
About TypeScript GateThis gate enforces type safety:
To fix TypeScript errors:
|
🔒 Hardened Security Gate✅ Security Posture: Summary:
Component Status:
Recommendation: Safe to proceed Artifacts: Generated by Hardened Security Gate on 2026-03-26T14:02:59.825Z |
❌ Lint Gate ResultsESLint
Code Quality
❌ Lint gate failedPlease fix the linter errors and warnings before merging. About Lint GateThis gate enforces zero linter warnings/errors and production code quality:
|
Sets up the comprehensive observability stack for the application locally by integrating OpenTelemetry, Jaeger, Prometheus, and Grafana in the core Docker Compose setup. It configures the OpenTelemetry Collector and automatically provisions Prometheus as a data source and loads dashboards in Grafana.
PR created automatically by Jules for task 17676075864050424083 started by @BrianCLong
Summary by CodeRabbit