317 Refactor Observability Stack with OpenTelemetry by ginaxu1 · Pull Request #371 · OpenDIF/opendif-core

ginaxu1 · 2025-12-05T07:16:16Z

Summary

This PR connects Go services to the observability stack by instrumenting them with OpenTelemetry-based metrics, so that Prometheus can scrape their metrics and Grafana can display them in dashboards. The services use vendor-agnostic OpenTelemetry instrumentation, so the backend can be switched between Prometheus (the default for local development), Datadog, New Relic, or any other OTLP-compatible backend by setting environment variables, without code changes.

The instrumented services now expose the following Prometheus metrics:

http_requests_total{http_method, http_route, http_status_code} - Total HTTP request count by method, route, and status code
http_request_duration_seconds{http_method, http_route} - HTTP request latency histogram by method and route
external_calls_total{external_target, external_operation} - External service call metrics (exchange services)
business_events_total{business_action, business_outcome} - Business event metrics (exchange services)

Why these changes are needed:

The observability stack (Prometheus + Grafana) is configured but cannot collect data without service instrumentation
OpenTelemetry provides vendor-agnostic instrumentation, allowing teams to choose their observability backend (Prometheus, Datadog, New Relic, etc.) without code changes

Type of Change

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Other (please describe):

Architectural Changes

OpenTelemetry Integration: All services now use OpenTelemetry SDK for vendor-agnostic metrics collection
Shared Monitoring Package: Created reusable monitoring package in exchange/shared/monitoring/ for all exchange services
Backward-Compatible API: The old API (monitoring.Handler(), monitoring.HTTPMetricsMiddleware()) still works and now delegates to OpenTelemetry
Auto-Initialization: Metrics initialize automatically when first used (no explicit initialization needed)
Non-Breaking Integration: Metrics are added without modifying existing handler logic
Vendor-Agnostic: Switch between Prometheus, Datadog, New Relic via environment variables (no code changes)
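A minimal sketch of how auto-initialization and the backward-compatible `Handler()` can fit together, assuming a `sync.Once`-guarded setup function. `ensureInit`, `initCount`, and the placeholder handler body are hypothetical; the actual package would presumably build the OpenTelemetry meter provider and Prometheus exporter inside the `Do` callback.

```go
package main

import (
	"net/http"
	"sync"
)

var (
	initOnce    sync.Once
	initCount   int          // counts real initializations, for illustration only
	metricsHTTP http.Handler // the /metrics handler built during init
)

// ensureInit lazily builds the metrics pipeline the first time any public
// function is called. The placeholder handler stands in for the exporter's
// real /metrics output (assumption).
func ensureInit() {
	initOnce.Do(func() {
		initCount++
		metricsHTTP = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Header().Set("Content-Type", "text/plain; version=0.0.4")
			w.Write([]byte("# metrics would be rendered by the Prometheus exporter\n"))
		})
	})
}

// Handler keeps the old monitoring.Handler() shape but delegates to the
// lazily initialized pipeline, so existing callers need no changes.
func Handler() http.Handler {
	ensureInit()
	return metricsHTTP
}
```

Because `sync.Once` blocks concurrent callers until the first initialization finishes, a scrape of `/metrics` and the first request through the middleware cannot race to build two pipelines.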

Testing

I have tested this change locally
I have added unit tests for new functionality
I have tested edge cases
All existing tests pass

Test Results

Runtime Testing

To verify the observability stack is working:

Start the observability stack:

cd observability
./start-grafana.sh  # or: docker compose up -d

Check Prometheus targets:
- Open http://localhost:9091/targets
- Services should show as "UP" (green)

Generate sample traffic:

cd observability
./generate_sample_traffic.sh

View metrics in Grafana:
- Open http://localhost:3002/d/go-services/go-services-metrics
- Metrics should appear after services receive traffic
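The "UP" check can also be scripted against Prometheus's HTTP API (`GET /api/v1/targets`) instead of being read off the web UI. This is a sketch assuming the port 9091 mapping used above; `DownTargets` is a hypothetical helper, and only the response fields it reads are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// targetsResponse models the subset of Prometheus's GET /api/v1/targets
// response used here.
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets []struct {
			ScrapePool string `json:"scrapePool"`
			ScrapeURL  string `json:"scrapeUrl"`
			Health     string `json:"health"` // "up", "down" or "unknown"
			LastError  string `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

// DownTargets returns a description of every active target whose health is
// not "up". body is the raw JSON from http://localhost:9091/api/v1/targets.
func DownTargets(body []byte) ([]string, error) {
	var resp targetsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("prometheus API status %q", resp.Status)
	}
	var down []string
	for _, t := range resp.Data.ActiveTargets {
		if t.Health != "up" {
			down = append(down, fmt.Sprintf("%s %s (%s): %s", t.ScrapePool, t.ScrapeURL, t.Health, t.LastError))
		}
	}
	return down, nil
}
```

In a smoke test, the body can be fetched with `http.Get("http://localhost:9091/api/v1/targets")`; the test fails if `DownTargets` returns a non-empty slice.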

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have checked that there are no merge conflicts
I have verified all services are on opendif-network
I have verified Prometheus can scrape all service endpoints

Related Issues

Deployment Notes

Pre-Deployment Checklist

Service Restart Required: Services must be restarted to load the new monitoring code
```
# Stop existing services
# Rebuild: go build .
# Restart services
```

Network Setup: Ensure opendif-network exists before starting services

docker network create opendif-network  # if it doesn't exist
# Or use: cd observability && ./start-grafana.sh

No Configuration Changes Required: Services use the Prometheus exporter by default (no environment variables needed for local development)
Prometheus Already Configured: Prometheus is already configured to scrape these services (see observability/prometheus/prometheus.yml)
Grafana Dashboard Ready: Grafana dashboard is already configured to display these metrics

Environment Variables (Optional)

For local development, no environment variables are needed (Prometheus is the default).

To switch to other backends (Datadog, New Relic, etc.), set:

export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=<your-endpoint>
export OTEL_EXPORTER_OTLP_HEADERS="<your-headers>"
export SERVICE_NAME=<service-name>
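The variables above could be resolved at startup roughly as follows. This is a sketch under stated assumptions: `ExporterConfig` and `LoadExporterConfig` are hypothetical, and rejecting `otlp` without an endpoint is a choice made by this sketch (the OpenTelemetry OTLP exporters themselves fall back to a localhost endpoint when none is set).

```go
package main

import (
	"fmt"
	"os"
)

// ExporterConfig is a hypothetical summary of the settings listed above.
type ExporterConfig struct {
	Exporter    string // "prometheus" (default) or "otlp"
	Endpoint    string // OTEL_EXPORTER_OTLP_ENDPOINT
	Headers     string // OTEL_EXPORTER_OTLP_HEADERS, passed through as-is
	ServiceName string // SERVICE_NAME
}

// LoadExporterConfig reads settings through getenv so it can be tested
// without touching the real process environment.
func LoadExporterConfig(getenv func(string) string) (ExporterConfig, error) {
	cfg := ExporterConfig{
		Exporter:    getenv("OTEL_METRICS_EXPORTER"),
		Endpoint:    getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
		Headers:     getenv("OTEL_EXPORTER_OTLP_HEADERS"),
		ServiceName: getenv("SERVICE_NAME"),
	}
	if cfg.Exporter == "" {
		cfg.Exporter = "prometheus" // local-dev default described in this PR
	}
	switch cfg.Exporter {
	case "prometheus":
		return cfg, nil
	case "otlp":
		if cfg.Endpoint == "" {
			return cfg, fmt.Errorf("OTEL_EXPORTER_OTLP_ENDPOINT must be set when OTEL_METRICS_EXPORTER=otlp")
		}
		return cfg, nil
	default:
		return cfg, fmt.Errorf("unsupported OTEL_METRICS_EXPORTER %q", cfg.Exporter)
	}
}

// FromEnv is the production entry point, reading the real environment.
func FromEnv() (ExporterConfig, error) { return LoadExporterConfig(os.Getenv) }
```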

Post-Deployment Verification

Check Metrics Endpoints:

curl http://localhost:4000/metrics | grep http_requests_total
curl http://localhost:8082/metrics | grep http_requests_total
curl http://localhost:3000/metrics | grep http_requests_total
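The `curl | grep` check above can be turned into a numeric assertion, for example in a smoke test. `SumCounter` is a hypothetical helper that sums every series of one metric from the Prometheus text exposition format; it skips `# HELP`/`# TYPE` lines and metrics whose names merely share the prefix.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// SumCounter adds up every sample of the named metric in Prometheus text
// exposition output (what `curl .../metrics` prints). Optional timestamps
// after the value are ignored.
func SumCounter(exposition, name string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Accept "name{...} value" or "name value", but not "name_suffix ...".
		rest, ok := strings.CutPrefix(line, name)
		if !ok || rest == "" || (rest[0] != '{' && rest[0] != ' ' && rest[0] != '\t') {
			continue
		}
		if rest[0] == '{' {
			// The last '}' closes the label set; the value follows it.
			end := strings.LastIndex(rest, "}")
			if end < 0 {
				continue // malformed line
			}
			rest = rest[end+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, sc.Err()
}
```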

Verify Prometheus Scraping:
- Open http://localhost:9091/targets
- All services should show as "UP" (green)
View in Grafana:
- Open http://localhost:3002/d/go-services/go-services-metrics
- Metrics should appear after services receive some traffic

Generate Sample Traffic:

cd observability
./generate_sample_traffic.sh

Migration Notes

Backward Compatible: The API remains the same - existing code continues to work
Auto-initialization: Metrics initialize automatically when first used
No code changes required: Services using monitoring.Handler() or monitoring.HTTPMetricsMiddleware() work without changes
OpenTelemetry Under the Hood: The Prometheus client library is now an indirect dependency, pulled in through the OpenTelemetry Prometheus exporter

Future Work

Add metrics instrumentation to Consent Engine and Audit Service
Add custom business metrics for specific use cases
Configure alerting rules in Prometheus
Add distributed tracing (OpenTelemetry traces)

Copilot

Pull request overview

This PR integrates OpenTelemetry-based metrics collection into Portal Backend, Orchestration Engine, and Policy Decision Point services, enabling vendor-agnostic observability with support for Prometheus (default), Datadog, New Relic, and other OTLP-compatible backends.

Key Changes:

Created shared monitoring package (exchange/shared/monitoring/) with OpenTelemetry instrumentation
Added portal-backend middleware (v1/middleware/) for metrics collection
Configured Prometheus scraping and added sample traffic generation script

Reviewed changes

Copilot reviewed 20 out of 24 changed files in this pull request and generated 8 comments.


File	Description
`exchange/shared/monitoring/otel_metrics.go`	Core OpenTelemetry metrics implementation for exchange services with support for Prometheus and OTLP exporters
`exchange/shared/monitoring/metrics.go`	Backward-compatible API wrapper with route normalization to prevent cardinality explosion
`portal-backend/v1/middleware/otel_metrics.go`	Portal-specific OpenTelemetry middleware implementation
`portal-backend/main.go`	Integrates metrics middleware and handler into portal-backend
`exchange/policy-decision-point/main.go`	Adds metrics instrumentation to PDP service
`exchange/orchestration-engine/server/server.go`	Adds metrics instrumentation to orchestration engine
`observability/prometheus/prometheus.yml`	Updates scrape configuration for instrumented services
`observability/generate_sample_traffic.sh`	Script to generate sample HTTP traffic for testing metrics
`exchange/shared/monitoring/go.mod`	New module definition with invalid Go version 1.24.6
`exchange/orchestration-engine/go.mod`	Updated with monitoring dependency and invalid Go version 1.25.0
`portal-backend/go.mod`	Updated with OpenTelemetry dependencies
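Route normalization is what keeps the `http_route` label bounded. The following is a minimal sketch, not the code in `metrics.go`: it assumes numeric and UUID path segments are the only identifiers that appear in URLs and replaces them with `{id}` and `{uuid}` placeholders. `NormalizeRoute` is a hypothetical name.

```go
package main

import (
	"regexp"
	"strings"
)

var (
	uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)
	numRe  = regexp.MustCompile(`^[0-9]+$`)
)

// NormalizeRoute replaces path segments that look like identifiers with
// placeholders, so /consents/42 and /consents/43 share one http_route value
// instead of creating a new time series per ID.
func NormalizeRoute(path string) string {
	// Drop the query string, if any: it must never become a label value.
	if i := strings.IndexByte(path, '?'); i >= 0 {
		path = path[:i]
	}
	trimmed := strings.Trim(path, "/")
	if trimmed == "" {
		return "/"
	}
	segs := strings.Split(trimmed, "/")
	for i, s := range segs {
		switch {
		case uuidRe.MatchString(s):
			segs[i] = "{uuid}"
		case numRe.MatchString(s):
			segs[i] = "{id}"
		}
	}
	return "/" + strings.Join(segs, "/")
}
```

Pattern matching like this can still leak cardinality when IDs take other shapes (slugs, emails), which is the risk raised in the Gemini review. A stricter alternative is to label requests with the router's matched route template (Go 1.23 added `http.Request.Pattern` for `ServeMux` routes) or an allowlist, mapping anything else to a fixed value such as `unmatched`.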


mushrafmim

Other than that, this looks good. Since the consent-engine has been added, I will validate that it works properly and then approve the PR.


gemini-code-assist

Code Review

This pull request is a significant and well-executed refactoring to introduce a vendor-agnostic observability stack using OpenTelemetry. The new shared monitoring package is well-structured, and the documentation updates are excellent and very thorough. My review focuses on a few key areas to further improve the robustness and security of the implementation. I've identified a critical security regression in the Nginx configuration, a high-risk issue with route normalization that could lead to metric cardinality explosion, and a medium-severity inconsistency in histogram bucket configuration. Addressing these points will make this already strong contribution even better.
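On the histogram bucket point, one way to keep boundaries consistent is to declare them once and have every service reference the same slice. This is a sketch assuming second-based boundaries; `DurationBuckets` and `BucketIndex` are hypothetical names, and the values are the Prometheus Go client's default buckets. One plausible source of mismatch worth checking (not confirmed by the review) is that the OpenTelemetry SDK's default explicit boundaries (0, 5, 10, 25, ..., 10000) are sized for milliseconds, so a seconds-valued histogram left on SDK defaults lands almost entirely in the `le=5` bucket.

```go
package main

import "sort"

// DurationBuckets is a hypothetical single source of truth for
// http_request_duration_seconds upper bounds, in seconds. Every service
// would reference this slice so dashboards see identical "le" labels.
var DurationBuckets = []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}

// BucketIndex returns the index of the first bucket whose upper bound is
// >= v (bounds are inclusive, as with Prometheus "le"), or len(bounds) for
// the implicit +Inf bucket. bounds must be sorted ascending.
func BucketIndex(bounds []float64, v float64) int {
	return sort.SearchFloat64s(bounds, v)
}
```

With the OpenTelemetry Go SDK, the shared slice would be applied through a view with an explicit-bucket histogram aggregation, or through the `metric.WithExplicitBucketBoundaries` instrument option.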


ginaxu1 requested review from mushrafmim and sthanikan2000 and removed request for sthanikan2000 December 5, 2025 07:17

ginaxu1 changed the title from "317 part2 connect" to "317 Observability connect with OE and PDP" on Dec 5, 2025

ginaxu1 added the Priority/Highest and Priority/Normal labels and removed Priority/Highest on Dec 5, 2025

ginaxu1 force-pushed the 317-part2-connect branch from b74937a to e39b5fb on December 8, 2025 07:25

ginaxu1 changed the title from "317 Observability connect with OE and PDP" to "317 Connect OE, PDP, Portal Backend to Observability Stack with OpenTelemetry" on Dec 8, 2025

sthanikan2000 requested a review from Copilot December 8, 2025 07:55

Copilot started reviewing on behalf of sthanikan2000 December 8, 2025 07:55

Copilot AI reviewed Dec 8, 2025

ginaxu1 changed the title from "317 Connect OE, PDP, Portal Backend to Observability Stack with OpenTelemetry" to "317 Connect OE, PDP to Observability Stack with OpenTelemetry" on Dec 8, 2025

sthanikan2000 requested a review from Copilot December 8, 2025 12:25

Copilot started reviewing on behalf of sthanikan2000 December 8, 2025 12:25

This comment was marked as outdated.

ginaxu1 force-pushed the 317-part2-connect branch from bcf663b to c6939c1 on December 9, 2025 01:58

ginaxu1 changed the title from "317 Connect OE, PDP to Observability Stack with OpenTelemetry" to "317 Refactor Observability Stack with OpenTelemetry" on Dec 9, 2025

ginaxu1 mentioned this pull request on Dec 9, 2025 in #380: 317 Connect Orchestration Engine and Policy Decision Point to Observability Stack (Closed, 19 tasks)

sthanikan2000 reviewed Dec 14, 2025

observability/prometheus/prometheus.yml (resolved)

ginaxu1 force-pushed the 317-part2-connect branch from 276c2ee to 4f318d9 on December 15, 2025 11:21

mushrafmim requested changes Dec 15, 2025

ginaxu1 force-pushed the 317-part2-connect branch from 4f318d9 to 5a0c3f0 on December 15, 2025 16:52

mushrafmim requested changes Dec 15, 2025

exchange/shared/monitoring/metrics.go (resolved)

ginaxu1 requested review from mushrafmim and sthanikan2000 December 15, 2025 16:56

mushrafmim requested changes Dec 16, 2025

exchange/shared/monitoring/metrics.go (outdated, resolved)

ginaxu1 force-pushed the 317-part2-connect branch from 9b67ac4 to 448532b on December 17, 2025 06:00

ginaxu1 requested a review from mushrafmim December 17, 2025 06:04

ginaxu1 force-pushed the 317-part2-connect branch from 448532b to af8ba3f on December 18, 2025 09:24

gemini-code-assist bot reviewed Dec 19, 2025

portals/consent-portal/nginx.conf (resolved)

exchange/shared/monitoring/metrics.go (outdated, resolved)

exchange/shared/monitoring/otel_metrics.go (resolved)

ginaxu1 force-pushed the 317-part2-connect branch 2 times, most recently from eba3284 to 3825981 on December 19, 2025 06:45

mushrafmim requested changes Dec 23, 2025

exchange/consent-engine/main.go (resolved)

Commit e5d0cda: Refactor observability stack with OpenTelemetry

ginaxu1 force-pushed the 317-part2-connect branch from 4ee78bb to e5d0cda on December 23, 2025 08:43

ginaxu1 requested a review from mushrafmim December 23, 2025 08:52

mushrafmim approved these changes Dec 23, 2025

OpenDIF deleted a comment from sthanikan2000 Dec 23, 2025

ginaxu1 merged commit eef05e8 into main Dec 23, 2025
11 checks passed

ginaxu1 deleted the 317-part2-connect branch December 23, 2025 15:32

ginaxu1 added a commit that referenced this pull request Jan 3, 2026: bd29d97 Refactor observability stack with OpenTelemetry (#371)

ginaxu1 added a commit that referenced this pull request Jan 3, 2026: 9abfa88 Refactor observability stack with OpenTelemetry (#371)

ginaxu1 mentioned this pull request on Jan 5, 2026 in #317: Add Observability: Metrics for the Project (Open, 3 tasks)

sthanikan2000 pushed a commit that referenced this pull request Jan 13, 2026: 75b1eef Refactor observability stack with OpenTelemetry (#371)


4 participants
