Conversation
📝 WalkthroughWalkthroughAdds OpenTelemetry-based log exporting to the Python app and workers, introduces Loki service and configuration, wires the OTel Collector to export logs to Loki, updates Grafana to use Loki (with trace linking), and adds docs and Docker Compose integration for the observability stack. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Python App
participant SDK as OpenTelemetry SDK
participant Collector as OTel Collector
participant Loki as Loki
participant Grafana as Grafana
App->>SDK: setup_log_exporter(settings, logger)
Note over SDK: Configure OTLP HTTP exporter (endpoint http://loki:3100/otlp)
App->>App: Emit log records
App->>SDK: Export log records (batched)
SDK->>Collector: Send logs via OTLP HTTP
Collector->>Collector: Process logs (memory_limiter, batch, resource, attributes)
Collector->>Loki: Export logs via OTLP HTTP
Loki->>Loki: Index & store logs (TSDB, filesystem)
Grafana->>Loki: Query logs (LogQL / derivedFields trace link)
Loki->>Grafana: Return logs & trace IDs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/workers/run_event_replay.py (1)
17-56:⚠️ Potential issue | 🟡 MinorAdd Google‑style docstrings for all functions in this module.
main(), run(), init_replay(), and shutdown() are missing Google‑style docstrings with Args/Returns/Raises.📌 Proposed docstring updates
def main() -> None: - """Main entry point for event replay service""" + """Main entry point for event replay service. + + Returns: + None + """ @@ - async def run() -> None: + async def run() -> None: + """Start the event replay runtime. + + Returns: + None + """ @@ - async def init_replay() -> None: + async def init_replay() -> None: + """Initialize the replay cleanup scheduler. + + Returns: + None + """ @@ - async def shutdown() -> None: + async def shutdown() -> None: + """Shut down the scheduler and DI container. + + Returns: + None + """As per coding guidelines, “Use Google-style docstrings with Args/Returns/Raises sections for all functions and classes.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/workers/run_event_replay.py` around lines 17 - 56, Add Google-style docstrings for each function in this module: main, the nested async run, init_replay, and shutdown. For each function include a brief summary line and Args/Returns/Raises sections as appropriate (e.g., main() returns None, run() is async and returns None, init_replay() and shutdown() are async and return None; document any injected variables or side effects like initializing Beanie, starting the scheduler, or retrieving EventReplayService). Place the docstrings immediately below each def line using the Google docstring format so linters and docs extractors can pick them up.
🧹 Nitpick comments (1)
backend/app/core/logging.py (1)
136-137:addHandleris not idempotent — consider guarding against repeated calls.
logging.getLogger().addHandler(otel_handler)appends a newLoggingHandlerevery timesetup_log_exporteris called in the same process. Ifcreate_app(or any workermain) is accidentally invoked more than once per process, log records will be double- (or triple-) exported to the OTLP collector.🛡️ Proposed fix
+ # Guard against accidental double-initialization in the same process + root_logger = logging.getLogger() + if any(isinstance(h, LoggingHandler) for h in root_logger.handlers): + return + otel_handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider) - logging.getLogger().addHandler(otel_handler) + root_logger.addHandler(otel_handler)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/core/logging.py` around lines 136 - 137, setup_log_exporter currently creates an otel_handler = LoggingHandler(...) and unconditionally calls logging.getLogger().addHandler(otel_handler), which will append duplicates on repeated setup; change setup_log_exporter to be idempotent by checking for an existing LoggingHandler instance (or a handler with a unique attribute) on the root logger before adding otel_handler, or set and check a module-level flag like _otel_handler_installed; specifically reference and guard the code that creates otel_handler and calls logging.getLogger().addHandler so the handler is only added once per process.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/core/logging.py`:
- Around line 115-123: The ternary for "deployment.environment" inside
Resource.create is dead code because settings.TESTING is already guarded by the
early return; update the Resource.create call in logging.py (the dict that
includes SERVICE_NAME, SERVICE_VERSION, "service.namespace",
"deployment.environment", "service.instance.id") to set a deterministic
environment value instead of the unreachable ternary — either hard-code
"production" or read a dedicated setting (e.g., settings.DEPLOYMENT_ENV or
settings.ENVIRONMENT) and use that value so the field is meaningful at runtime.
---
Outside diff comments:
In `@backend/workers/run_event_replay.py`:
- Around line 17-56: Add Google-style docstrings for each function in this
module: main, the nested async run, init_replay, and shutdown. For each function
include a brief summary line and Args/Returns/Raises sections as appropriate
(e.g., main() returns None, run() is async and returns None, init_replay() and
shutdown() are async and return None; document any injected variables or side
effects like initializing Beanie, starting the scheduler, or retrieving
EventReplayService). Place the docstrings immediately below each def line using
the Google docstring format so linters and docs extractors can pick them up.
---
Nitpick comments:
In `@backend/app/core/logging.py`:
- Around line 136-137: setup_log_exporter currently creates an otel_handler =
LoggingHandler(...) and unconditionally calls
logging.getLogger().addHandler(otel_handler), which will append duplicates on
repeated setup; change setup_log_exporter to be idempotent by checking for an
existing LoggingHandler instance (or a handler with a unique attribute) on the
root logger before adding otel_handler, or set and check a module-level flag
like _otel_handler_installed; specifically reference and guard the code that
creates otel_handler and calls logging.getLogger().addHandler so the handler is
only added once per process.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
backend/app/core/logging.pybackend/app/main.pybackend/grafana/grafana.inibackend/grafana/provisioning/datasources/datasources.ymlbackend/loki/loki-config.yamlbackend/otel-collector-config.yamlbackend/workers/run_dlq_processor.pybackend/workers/run_event_replay.pybackend/workers/run_k8s_worker.pybackend/workers/run_pod_monitor.pybackend/workers/run_result_processor.pybackend/workers/run_saga_orchestrator.pydocker-compose.yamldocs/operations/grafana-integration.mddocs/operations/loki.mdmkdocs.yml
There was a problem hiding this comment.
Pull request overview
This PR extends the observability stack to include Loki-based log aggregation in Grafana, and adjusts Grafana auth/startup behavior to address access issues.
Changes:
- Add Loki service + volume in
docker-compose.yaml, and wire Grafana + OTel Collector to depend on Loki health. - Export Python application logs via OpenTelemetry OTLP to the collector, then to Loki; provision Loki as a Grafana datasource with trace linking to Jaeger.
- Add/extend docs and MkDocs navigation for Loki log aggregation and updated Grafana integration.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| mkdocs.yml | Adds Loki documentation page to Operations nav. |
| docs/operations/loki.md | New end-to-end Loki/OTel/Grafana wiring + usage documentation. |
| docs/operations/grafana-integration.md | Updates Grafana integration doc to include Loki. |
| docker-compose.yaml | Adds Loki container, Grafana password reset entrypoint, and updates dependencies/volumes. |
| backend/otel-collector-config.yaml | Adds otlphttp/loki exporter and routes logs pipeline to Loki. |
| backend/loki/loki-config.yaml | New Loki single-instance config (filesystem + structured metadata). |
| backend/grafana/provisioning/datasources/datasources.yml | Provisions Loki datasource with derived trace link to Jaeger. |
| backend/grafana/grafana.ini | Disables anonymous Grafana auth. |
| backend/app/main.py | Initializes setup_log_exporter during app creation. |
| backend/app/core/logging.py | Introduces setup_log_exporter (OTLP log export via OTel LoggingHandler). |
| backend/workers/run_*.py (6 files) | Initializes setup_log_exporter in each worker process. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
backend/app/core/logging.py
Outdated
| SERVICE_NAME: settings.SERVICE_NAME, | ||
| SERVICE_VERSION: settings.SERVICE_VERSION, | ||
| "service.namespace": "integr8scode", | ||
| "deployment.environment": "test" if settings.TESTING else "production", |
There was a problem hiding this comment.
deployment.environment is computed with "test" if settings.TESTING else "production", but this function returns early when settings.TESTING is true, so the "test" branch is unreachable. Also hardcoding "production" ignores Settings.ENVIRONMENT (used elsewhere for metrics). Consider setting this attribute from settings.ENVIRONMENT (or remove it) to avoid incorrect resource metadata.
| "deployment.environment": "test" if settings.TESTING else "production", | |
| "deployment.environment": settings.ENVIRONMENT, |
| resource = Resource.create({ | ||
| SERVICE_NAME: settings.SERVICE_NAME, |
There was a problem hiding this comment.
service.name is set from settings.SERVICE_NAME, but the per-worker override TOMLs only override TRACING_SERVICE_NAME, not SERVICE_NAME. As a result, all workers will likely export logs with the same service.name (integr8scode-backend), which makes the Loki queries in the new docs (per-worker service_name) misleading and reduces log/trace correlation. Consider deriving the log SERVICE_NAME from settings.TRACING_SERVICE_NAME or updating worker override configs to set SERVICE_NAME too.
| resource = Resource.create({ | |
| SERVICE_NAME: settings.SERVICE_NAME, | |
| # Prefer per-worker tracing service name for logs when available, to keep | |
| # log/trace correlation by service consistent across workers. | |
| service_name = getattr(settings, "TRACING_SERVICE_NAME", None) or settings.SERVICE_NAME | |
| resource = Resource.create({ | |
| SERVICE_NAME: service_name, |
| logger_provider = LoggerProvider(resource=resource) | ||
| logger_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter)) | ||
| set_logger_provider(logger_provider) | ||
|
|
||
| otel_handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider) | ||
| logging.getLogger().addHandler(otel_handler) | ||
|
|
There was a problem hiding this comment.
setup_log_exporter is not idempotent: each call adds a new LoggingHandler to the root logger and sets a new global LoggerProvider. If create_app() is invoked more than once in a process (e.g., reloads, certain test/CLI paths), this will duplicate log exports. Consider guarding with a module-level flag or checking existing handlers/providers before adding another handler.
| otel_handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider) | ||
| logging.getLogger().addHandler(otel_handler) | ||
|
|
There was a problem hiding this comment.
New setup_log_exporter behavior isn’t covered by unit tests, while this module already has tests for setup_logger and related processors. Adding tests that verify the early-return guard (TESTING/missing endpoint) and that an OTel LoggingHandler is attached exactly once would help prevent regressions.
| otel_handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider) | |
| logging.getLogger().addHandler(otel_handler) | |
| # Ensure the OTel LoggingHandler is attached only once to avoid duplicate exports. | |
| root_logger = logging.getLogger() | |
| otel_handler = None | |
| for handler in root_logger.handlers: | |
| if isinstance(handler, LoggingHandler): | |
| otel_handler = handler | |
| break | |
| if otel_handler is None: | |
| otel_handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider) | |
| root_logger.addHandler(otel_handler) |
| The log exporter is initialized once per process, right after `setup_logger`. The backend calls it in `create_app`, and each worker calls it in its `main()`: | ||
|
|
||
| ```python | ||
| --8<-- "backend/app/core/logging.py:setup_logger" |
There was a problem hiding this comment.
This section says the backend/workers initialize setup_log_exporter after setup_logger, but the included snippet embeds only setup_logger. Consider including setup_log_exporter (or a small excerpt showing both calls) so the documentation matches the wiring described in the text.
| --8<-- "backend/app/core/logging.py:setup_logger" | |
| --8<-- "backend/app/core/logging.py:setup_logger" | |
| --8<-- "backend/app/core/logging.py:setup_log_exporter" |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/core/logging.py`:
- Around line 108-116: The docstring for setup_log_exporter is stale and missing
Google-style sections: remove the misleading phrase "Skipped in tests or when no
OTLP endpoint is configured" (it only checks
settings.OTEL_EXPORTER_OTLP_ENDPOINT) and replace the entire docstring with a
Google-style docstring that clearly describes the function, its behavior, and
includes Args: (settings: Settings, logger: structlog.stdlib.BoundLogger) and
Returns: (None) entries; if you want to note test behavior, only do so if you
add an explicit ENVIRONMENT or TEST flag check in setup_log_exporter, otherwise
omit any mention of tests. Ensure the updated docstring references the function
name setup_log_exporter and the parameters so readers can locate the behavior.
- Around line 132-137: setup_log_exporter currently unconditionally creates and
attaches a new LoggingHandler and LoggerProvider each call, causing duplicate
handlers on repeated invocations; make it idempotent by tracking whether you
already set up the provider/handler (e.g., a module-level flag or stored
references) and only create/set/add when not already present, or check the root
logger's handlers for an existing instance of LoggingHandler (the otel_handler
type) before calling logging.getLogger().addHandler; ensure you also retain the
LoggerProvider reference returned/created by LoggerProvider/ set_logger_provider
so future shutdown/flush can operate on the single tracked instance.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
backend/app/core/dishka_lifespan.pybackend/app/core/logging.pybackend/app/core/middlewares/metrics.pybackend/app/core/tracing/tracer.pybackend/app/dlq/manager.pybackend/app/services/notification_service.pybackend/app/settings.pybackend/config.test.toml
💤 Files with no reviewable changes (1)
- backend/app/settings.py
|



Summary by cubic
Fixes Grafana login by disabling anonymous access and resetting the admin password on startup. Adds Loki-based log aggregation with OTLP logs from the backend and workers, enabling log-to-trace linking in Grafana.
Bug Fixes
New Features
Written for commit 08edbbc. Summary will update on new commits.
Summary by CodeRabbit
New Features
Configuration
Documentation