Alloy is the recommended replacement for Promtail and makes collecting Docker logs convenient, bypassing file exporter limitations (e.g. the missing `container_name` label). Logs no longer go through the OTEL exporter but are sent to Loki directly.
Pull Request Test Coverage Report for Build 23599234188 (Coveralls)
This PR adds basic monitoring to our Docker Compose setup. It can optionally be run for both the community and enterprise editions as an additional Docker Compose stack.
Dependencies are always installed in the Dockerfile, but telemetry is only enabled with the `OTEL_ENABLED` flag. We use the OpenTelemetry (OTEL) collector as a centralized receiver for metrics, logs and traces. They are then passed to backends and can be visualized in Grafana via Prometheus, Loki or Tempo respectively.
Logging is set up so that each log line contains the trace ID, so logs can be linked to traces via derived fields.
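To illustrate the trace-ID-in-logs idea, here is a minimal standard-library sketch, not the code from this PR. The names `current_trace_id`, `TraceIdFilter` and `build_logger` are hypothetical, and the `ContextVar` is only a stand-in for the active OTEL span context, which is where the real setup would presumably read the trace ID from.

```python
import io
import logging
from contextvars import ContextVar
from typing import Optional

# Hypothetical stand-in for the active span's trace id. In a real setup the
# value would come from the OTEL span context, not from a ContextVar.
current_trace_id: ContextVar[Optional[int]] = ContextVar(
    "current_trace_id", default=None
)


class TraceIdFilter(logging.Filter):
    """Attach a trace_id attribute to every record so the formatter can print it."""

    def filter(self, record: logging.LogRecord) -> bool:
        trace_id = current_trace_id.get()
        # 32 lowercase hex characters is the usual text form of a trace id;
        # "-" marks a log line emitted outside of any trace.
        record.trace_id = format(trace_id, "032x") if trace_id is not None else "-"
        return True  # never drop the record, only enrich it


def build_logger(stream) -> logging.Logger:
    """Return a logger whose lines carry a trace_id=<hex> field."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger = logging.getLogger("trace_demo")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With a log line shaped like this, a Loki derived field in Grafana could use a regex such as `trace_id=(\w+)` to turn the ID into a link to the matching trace in Tempo. The exact field name and regex are assumptions and depend on the log format actually used.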
Traces are instrumented for the Flask app, the Celery worker app, Celery beat and SQLAlchemy. For production there is an option to set up sampling via `OTEL_TRACES_SAMPLER_ARG` that respects the parent relationship (e.g. Flask to Celery worker, Celery beat to Celery worker). For Celery metrics there is a special flag, `OTEL_MANUAL_CELERY_TRACING`, to enable in the case of threads/gevent workers.
Metrics are collected from Gunicorn, Flask and Celery. We replaced the statsd sidecar container with the statsd receiver in OTEL to get basic metrics like worker count. Standard HTTP request metrics are then scraped from the Flask workers directly. We also defined custom metrics for Celery tasks and manually provisioned both metrics and traces, as auto-provisioning was not working properly. There is also an option to scrape Redis metrics, but these are only system-related; there is no way to get custom metrics like Celery queue length without a custom exporter.
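The sampling with a parent relationship described for traces can be sketched roughly as follows. This is an illustrative standard-library approximation of a parent-based, trace-ID-ratio decision, not the OTEL SDK's sampler and not code from this PR. The helper names `ratio_from_env` and `should_sample` are hypothetical, and treating `OTEL_TRACES_SAMPLER_ARG` as a ratio between 0 and 1 is an assumption about how the sampler is configured.

```python
import os
from typing import Optional

# Only the low 64 bits of the trace id take part in the ratio comparison.
TRACE_ID_LIMIT = (1 << 64) - 1


def ratio_from_env(default: float = 1.0) -> float:
    """Read the sampling ratio from OTEL_TRACES_SAMPLER_ARG, clamped to [0, 1]."""
    raw = os.environ.get("OTEL_TRACES_SAMPLER_ARG")
    try:
        ratio = float(raw) if raw is not None else default
    except ValueError:
        ratio = default  # fall back when the value is not a number
    return min(1.0, max(0.0, ratio))


def should_sample(trace_id: int, parent_sampled: Optional[bool], ratio: float) -> bool:
    """Decide whether a span is recorded.

    parent_sampled is None for a root span; otherwise the child simply follows
    the parent's decision, so a Flask request and the Celery task it triggers
    are kept or dropped together.
    """
    if parent_sampled is not None:
        return parent_sampled
    # Root span: deterministic decision derived from the trace id itself.
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound
```

Because the root decision depends only on the trace ID, every service that sees the same trace reaches the same answer, and child spans never need to look at the ratio at all.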
New env variables used (there are probably a bunch of others coming from OTEL by default)