Observability Stack
Norm Brandinger edited this page Nov 20, 2025 · 1 revision
- Overview
- Prometheus Setup
- Grafana Dashboards
- Loki Log Aggregation
- Vector Pipeline
- cAdvisor Container Metrics
- Alerting Setup
- Custom Dashboards
- Related Pages
Overview
The observability stack provides comprehensive monitoring, logging, and visualization for all services.
Components:
- Prometheus: Metrics collection and alerting
- Grafana: Visualization and dashboards
- Loki: Log aggregation
- Vector: Unified observability pipeline
- cAdvisor: Container resource monitoring
Prometheus Setup
Access: http://localhost:9090
Configuration: configs/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-1:6379', 'redis-2:6379', 'redis-3:6379']
  - job_name: 'rabbitmq'
    static_configs:
      - targets: ['rabbitmq:15692']
  - job_name: 'fastapi'
    static_configs:
      - targets: ['reference-api:8000']
    metrics_path: /metrics
Common Queries:
# CPU usage
rate(container_cpu_usage_seconds_total[5m]) * 100
# Memory usage
container_memory_usage_bytes / 1024 / 1024
# Network I/O
rate(container_network_receive_bytes_total[5m])
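The queries above can also be run outside the Prometheus UI through its HTTP API. The sketch below shows one way to do that with the Python standard library, assuming Prometheus is reachable at the Access URL given above. The helper names (`build_query_url`, `parse_vector`, `run_query`) are illustrative and not part of this stack.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus listens on the Access URL from this page.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the instant-query URL (/api/v1/query) for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(payload: dict) -> dict:
    """Map each series' sorted label pairs to its float value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    return {
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in data["result"]
    }


def run_query(expr: str) -> dict:
    """Run an instant query against a live Prometheus (needs network access)."""
    with urlopen(build_query_url(expr), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Against a running stack, `run_query("container_memory_usage_bytes / 1024 / 1024")` should return one entry per container series, with memory expressed in MiB.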
Grafana Dashboards
Access: http://localhost:3001
Default credentials: admin/admin
Pre-configured dashboards:
- Container Overview
- PostgreSQL Performance
- Redis Cluster
- RabbitMQ Metrics
- Application Metrics
Add data source:
- Configuration → Data Sources → Add data source
- Select Prometheus
- URL: http://prometheus:9090
- Save & Test
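The same data source can be registered without the UI by posting to Grafana's `/api/datasources` endpoint. The sketch below only builds the request, assuming the default credentials and Access URL listed above are still in effect. The function name `datasource_request` and the `proxy` access mode are choices made for this example, not settings taken from this stack.

```python
import base64
import json
from urllib.request import Request

# Assumption: Grafana listens on the Access URL from this page.
GRAFANA_URL = "http://localhost:3001"


def datasource_request(
    name: str,
    ds_type: str,
    url: str,
    user: str = "admin",
    password: str = "admin",
    grafana_url: str = GRAFANA_URL,
) -> Request:
    """Build (but do not send) a POST request that creates a data source."""
    body = json.dumps(
        {"name": name, "type": ds_type, "url": url, "access": "proxy"}
    ).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url}/api/datasources",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Mirrors the UI steps: a Prometheus source pointing at http://prometheus:9090.
prometheus_source = datasource_request(
    "Prometheus", "prometheus", "http://prometheus:9090"
)
```

Passing `prometheus_source` to `urllib.request.urlopen` would send it. Changing the default admin password first is advisable on anything other than a local development machine.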
Loki Log Aggregation
Access: http://localhost:3100
Configuration: configs/loki/loki.yml
auth_enabled: false
server:
  http_listen_port: 3100
ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  chunk_idle_period: 5m
  chunk_retain_period: 30s
schema_config:
  configs:
    - from: 2024-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
  filesystem:
    directory: /loki/chunks
limits_config:
  retention_period: 720h  # 30 days
Query logs in Grafana Explore:
{container_name="dev-postgres"}
{container_name=~"dev-.*"} |= "ERROR"
rate({container_name="dev-postgres"}[5m])
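The log-selector queries above can also be sent straight to Loki's `query_range` HTTP endpoint, which is handy for scripts. The following is a minimal sketch assuming Loki is reachable at the Access URL above. The helpers `build_query_range_url` and `extract_lines` are hypothetical names introduced for this example.

```python
import time
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Assumption: Loki listens on the Access URL from this page.
LOKI_URL = "http://localhost:3100"


def build_query_range_url(
    logql: str,
    minutes: int = 15,
    limit: int = 100,
    end_ns: Optional[int] = None,
    base_url: str = LOKI_URL,
) -> str:
    """Return a query_range URL covering the last `minutes` before `end_ns`."""
    if end_ns is None:
        end_ns = time.time_ns()  # Loki accepts nanosecond Unix timestamps
    start_ns = end_ns - minutes * 60 * 1_000_000_000
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


def extract_lines(payload: dict) -> List[Tuple[str, str]]:
    """Flatten a 'streams' response into (container_name, log line) pairs."""
    data = payload["data"]
    if data["resultType"] != "streams":
        raise ValueError(f"expected streams, got {data['resultType']}")
    lines = []
    for stream in data["result"]:
        container = stream["stream"].get("container_name", "")
        for _timestamp, line in stream["values"]:
            lines.append((container, line))
    return lines
```

Metric-style queries such as the `rate(...)` example return a `matrix` rather than `streams`, so `extract_lines` deliberately rejects them.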
Vector Pipeline
Configuration: configs/vector/vector.toml
[sources.docker_logs]
type = "docker_logs"
include_containers = ["dev-*"]
[sinks.loki]
type = "loki"
inputs = ["docker_logs"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels.container_name = "{{ container_name }}"
[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["docker_logs"]
address = "0.0.0.0:9598"
cAdvisor Container Metrics
Access: http://localhost:8080
Metrics exposed:
- CPU usage per container
- Memory usage and limits
- Network I/O
- Disk I/O
- Filesystem usage
Prometheus scrapes these metrics automatically through the cadvisor job (target cadvisor:8080) in configs/prometheus/prometheus.yml.
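One way to confirm that the cAdvisor scrape is actually working is to inspect Prometheus's `/api/v1/targets` endpoint. The sketch below filters a targets response for jobs whose last scrape failed. It assumes the response has already been fetched and decoded from JSON, and `unhealthy_targets` is an illustrative helper name.

```python
from typing import Dict, List


def unhealthy_targets(payload: dict) -> List[Dict[str, str]]:
    """List active scrape targets whose health is anything other than 'up'."""
    return [
        {
            "job": target["labels"].get("job", ""),
            "url": target.get("scrapeUrl", ""),
            "error": target.get("lastError", ""),
        }
        for target in payload["data"]["activeTargets"]
        if target.get("health") != "up"
    ]
```

An empty list means every configured job, including cadvisor, was scraped successfully on the last attempt.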
Create alert in Grafana:
- Dashboard → Panel → Alert tab
- Define query and conditions
- Configure notification channel
- Set alert name and message
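An alert condition usually pairs a threshold with a hold duration, such as the `for: 5m` field in the Prometheus rules in this section. The sketch below is a simplified model of that behaviour, not Prometheus's or Grafana's actual evaluation code. It also treats a memory limit of 0 as "no limit set", an assumption about how cAdvisor reports unlimited containers. Dividing by such a limit would make a percentage rule misleading.

```python
from typing import Optional, Sequence, Tuple


def memory_percent(usage_bytes: float, limit_bytes: float) -> Optional[float]:
    """Percent of the memory limit in use, or None when no limit is set."""
    if limit_bytes <= 0:
        return None  # assumed convention: 0 means the container is unlimited
    return usage_bytes / limit_bytes * 100


def is_firing(
    samples: Sequence[Tuple[float, Optional[float]]],
    threshold: float = 80.0,
    hold_seconds: float = 300.0,
) -> bool:
    """Decide whether an alert fires for (unix_time, percent) samples.

    Samples must be in ascending time order. The alert fires only when every
    sample since some point stayed above the threshold and that unbroken run
    spans at least `hold_seconds` up to the latest sample.
    """
    breach_start = None
    for timestamp, percent in samples:
        if percent is not None and percent > threshold:
            if breach_start is None:
                breach_start = timestamp
        else:
            breach_start = None  # condition cleared, so the hold restarts
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds
```

With samples taken every 150 seconds, three consecutive readings above 80% satisfy a 300-second hold, while a single dip in the middle resets it.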
Example alert rules:
# configs/prometheus/alerts.yml
groups:
  - name: service_alerts
    rules:
      - alert: HighMemoryUsage
        expr: (container_memory_usage_bytes / container_spec_memory_limit_bytes) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.name }}"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
Custom Dashboards
Import dashboard:
- Dashboards → Import
- Upload JSON or enter dashboard ID
- Select data source
- Import
Popular dashboards:
- Docker Container & Host Metrics (ID: 10619)
- PostgreSQL Database (ID: 9628)
- Redis Dashboard (ID: 11835)
Related Pages
- Health-Monitoring - Health checks
- Prometheus-Queries - PromQL examples
- Grafana-Dashboards - Dashboard guide