diff --git a/docs/observability/customization/custom-prometheus-configs.md b/docs/observability/customization/custom-prometheus-configs.md index 6f2c34f..f662085 100644 --- a/docs/observability/customization/custom-prometheus-configs.md +++ b/docs/observability/customization/custom-prometheus-configs.md @@ -12,7 +12,7 @@ Mount these files under `site/prometheus/scrape-configs/exporters/*.yml` in dock ### Add Exporters to Prometheus -- `prometheus/scrape-configs/exporters/` - +- `prometheus/scrape-configs/exporters/` Add yaml files into this folder. These file should contain all exporter prometheus metrics, for example from node_exporter or CAdvisor. Add any hosts and ip addresses you want to collect /metrics from will be retrieved ```yaml @@ -26,6 +26,9 @@ Mount these files under `site/prometheus/scrape-configs/exporters/*.yml` in dock # __metrics_path__: /path/metrics # Optionally override the metrics path, the default is just /metrics # ... add all targets ``` + +Note that this project is set up to run Grafana Alloy to push metrics from individual VMs. The config in `prometheus/scrape-configs/exporters/` is an alternative that lets you pull metrics from any other service. + ## Custom Prometheus Scrape Configs You can add compeltely custom prometheus scrape configs and recording rules by mounting in docker. diff --git a/docs/observability/get-started/quickstart.md b/docs/observability/get-started/quickstart.md index 35c0e8c..4a3bac8 100644 --- a/docs/observability/get-started/quickstart.md +++ b/docs/observability/get-started/quickstart.md @@ -28,7 +28,7 @@ If you can't use the script, see the [Manual Quickstart](../reference/quickstart ### Optional Step: Probe your own web page Now you can look at getting monitoring of your own page -1. In your current folder, in the file `prometheus/scrape-configs/probers/probe-simple.yml` add the following yml to the bottom of the file: +1. 
In your current folder, in the file `alloy/probers/probe-observability.yml` add the following yml to the bottom of the file: ```yaml - targets: @@ -52,11 +52,7 @@ This is the end of this quickstart tutorial, that enables probing availability o For the next steps we can: - Look deeper into the observability dashboards, on [Dashboards Userguide](./userguide-tutorial.md) -- Productionise our deployment to enable further features -- Configure *Telemetry* like VM memory usage, and Elasticsearch index size, by running Exporters -- Enable *Alerting* based on our availability and a defined Service Level Objective (SLO) -- Setup further *Probing* of our running services to get availability metrics -- Fully customize the stack with our own dashboards, recording rules and metrics +- Productionise our deployment to enable further features by following [Production Setup](../setup/production-setup.md) diff --git a/docs/observability/setup/production-setup.md b/docs/observability/setup/production-setup.md index 2c01ac2..054f11b 100644 --- a/docs/observability/setup/production-setup.md +++ b/docs/observability/setup/production-setup.md @@ -6,6 +6,12 @@ If you're new, we recommend completing the [Quickstart Tutorial](../get-started/ By the end of the tutorial, you will have a complete stack offering all the observability features, customized to your usage. 
+We will run the stack and then: +- Configure *Telemetry* like VM memory usage, and Elasticsearch index size, by running Exporters +- Enable *Alerting* based on our availability and a defined Service Level Objective (SLO) +- Set up further *Probing* of our running services to get availability metrics + + --- ## Step 1: Understand the Folder Structure @@ -44,7 +50,7 @@ Downloads the example docker compose files: Downloads the configurations: - [alloy/probers/probe-external.yml](../../../observability/examples/full/alloy/probers/probe-external.yml) -- [alloy/probers/probe-internal.yml ](../../../observability/examples/full/alloy/probers/probe-internal.yml) +- [alloy/probers/probe-observability.yml](../../../observability/examples/full/alloy/probers/probe-observability.yml) - [prometheus/scrape-configs/exporters/exporters.yml](../../../observability/examples/full/prometheus/scrape-configs/exporters/exporters.yml) - [prometheus/scrape-configs/recording-rules/slo.yml](../../../observability/examples/full/prometheus/scrape-configs/recording-rules/slo.yml) @@ -58,10 +64,10 @@ The files come with basic defaults, so we can now run the stack ``` docker compose up -d - docker compose -f exporters.docker-compose.yml up -d ``` -This will launch Prometheus, Grafana, and all required services with +This will launch Prometheus, Grafana, and Alloy + ## Step 4: Create Site-Specific Config Files @@ -70,21 +76,24 @@ You must provide your own scrape and recording rules to tell Prometheus what to This is probably the hardest step: You will actually need to know what is running, and where it is! Building out these config files will give you that inventory, and give a real definition of what is running where. 
- Probers: HTTP endpoints you want to monitor for availability - - Add files in `scrape-configs/probers/*.yml` + - Add files in `alloy/probers/*.yml` - [Configure Probers](./probing.md) - -- Exporters: Targets like Elasticsearch or Docker - - Add files in `scrape-configs/exporters/*.yml` - - [Add Exporters](./telemetry.md) + +- Telemetry: Run Grafana Alloy on every VM you want telemetry from + - [Configure Telemetry](./telemetry.md) - Recording Rules: Define uptime goals or custom aggregations - Add files in `recording-rules/*.yml` - [Enable Alerting](./alerting.md) -## Step 5: Run Exporters Everywhere -The exporters need to be run on each VM that you want information from. It's a pull model, not push. +## Step 5: Run Grafana Alloy on every VM +The Grafana Alloy image needs to be run on each VM that you want to get information from. +Use the example docker compose file [exporters.docker-compose.yml](../../../observability/examples/full/exporters.docker-compose.yml), which starts Alloy and begins collecting metrics: + ``` + docker compose -f exporters.docker-compose.yml up -d + ``` --- ## What’s Next? @@ -93,7 +102,7 @@ Your observability stack is now monitoring your services, and you have a product You can now setup prometheus with any telemetry or probers required following the remaining steps in [Setup](./_index.md) -For the last steps, you can +For the last steps, you can: - Run the exporters on all the VMs that you want access to - Deploy the stack in produciton diff --git a/docs/observability/setup/telemetry.md b/docs/observability/setup/telemetry.md index acbdabf..3544e60 100644 --- a/docs/observability/setup/telemetry.md +++ b/docs/observability/setup/telemetry.md @@ -10,7 +10,6 @@ Grafana Alloy is used to get telemetry. 
These features are configured by default - Elastic Search Exporter: Get ES metrics like index size - CAdvisor: This gives docker metrics, eg what containers are running - ## How to get Telemetry - Copy this docker compose file: (exporters.docker-compose.yml)[observability/examples/full/exporters.docker-compose.yml] diff --git a/observability/examples/alloy/docker-compose.yml b/observability/examples/alloy/docker-compose.yml deleted file mode 100755 index 4795443..0000000 --- a/observability/examples/alloy/docker-compose.yml +++ /dev/null @@ -1,78 +0,0 @@ -# Observability main stack. Prometheus and Grafana. -# Depends on docker-compose.exporters.yml for the network -name: "cogstack-observability" -services: - prometheus: - image: cogstacksystems/cogstack-observability-prometheus:latest - restart: unless-stopped - ports: - - "9090:9090" - volumes: - - ${BASE_DIR-.}/prometheus:/etc/prometheus/cogstack/site/ - - prometheus-data:/prometheus - networks: - - observability - command: - - "--config.file=/etc/prometheus/cogstack/defaults/prometheus.yml" - - "--storage.tsdb.path=/prometheus" - - "--storage.tsdb.retention.time=30d" - - "--web.external-url=/prometheus" - - "--web.route-prefix=/prometheus" - - "--web.enable-remote-write-receiver" - grafana: - image: cogstacksystems/cogstack-observability-grafana:latest - restart: unless-stopped - volumes: - - grafana-data:/var/lib/grafana - networks: - - observability - environment: - - GF_AUTH_ANONYMOUS_ENABLED=true # Allows use of grafana without sign in - - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer - traefik: - image: cogstacksystems/cogstack-observability-traefik:latest - networks: - - observability - restart: unless-stopped - ports: - - "80:80" - volumes: - - /var/run/docker.sock:/var/run/docker.sock:ro # So that Traefik can listen to the Docker events - blackbox-exporter: - image: cogstacksystems/cogstack-observability-blackbox-exporter:latest - restart: unless-stopped - networks: - - observability - alloy: - image: 
grafana/alloy:latest - command: - - run - - --server.http.listen-addr=0.0.0.0:12345 - - --storage.path=/var/lib/alloy/data - - --server.http.ui-path-prefix=/alloy - - /etc/alloy - ports: - - "12345:12345" - volumes: - - ${BASE_DIR-.}/grafana-alloy/config.alloy:/etc/alloy/config.alloy - # CAdvisor - - /:/rootfs:ro - - /var/run:/var/run:rw - - /sys:/sys:ro - - /var/lib/docker/:/var/lib/docker:ro - labels: - - "traefik.enable=true" - - "traefik.http.routers.alloy.rule=PathPrefix(`/alloy`)" - environment: - - PROMETHEUS_URL=${PROMETHEUS_URL-http://cogstack-observability-prometheus-1:9090/prometheus/api/v1/write} - - ALLOY_HOSTNAME=${ALLOY_HOSTNAME} # Used to add a label to metrics - - ALLOY_IP_ADDRESS=${ALLOY_IP_ADDRESS} # Used to add a label to metrics - networks: - - observability -networks: - observability: - driver: bridge - -volumes: - prometheus-data: - grafana-data: \ No newline at end of file diff --git a/observability/examples/alloy/prometheus/scrape-configs/probers/probe-internal.yml b/observability/examples/alloy/prometheus/scrape-configs/probers/probe-internal.yml deleted file mode 100644 index b3d7353..0000000 --- a/observability/examples/alloy/prometheus/scrape-configs/probers/probe-internal.yml +++ /dev/null @@ -1,6 +0,0 @@ -# Example of probe targets -- targets: - - https://cogstack.org - labels: - name: cogstack-homepage - job: probe-services \ No newline at end of file diff --git a/observability/examples/alloy/prometheus/scrape-configs/recording-rules/slo.yml b/observability/examples/alloy/prometheus/scrape-configs/recording-rules/slo.yml deleted file mode 100644 index 440913c..0000000 --- a/observability/examples/alloy/prometheus/scrape-configs/recording-rules/slo.yml +++ /dev/null @@ -1,8 +0,0 @@ -groups: - - name: slo-target-rules - rules: - # What SLO am I targeting - - record: slo_target_over_30_days - expr: 0.95 # We target 95% uptime over 30 days - labels: - job: "probe-external-demo-apps" #Job here must match the job in the probe targets \ No 
newline at end of file diff --git a/observability/examples/full/alloy/probers/probe-internal.yml b/observability/examples/full/alloy/probers/probe-internal.yml deleted file mode 100644 index fa16233..0000000 --- a/observability/examples/full/alloy/probers/probe-internal.yml +++ /dev/null @@ -1,6 +0,0 @@ -# Example of probe targets in a different file. -- targets: - - https://cogstack.org - labels: - name: cogstack-homepage - job: probe-internal-services \ No newline at end of file diff --git a/observability/examples/full/alloy/probers/probe-observability.yml b/observability/examples/full/alloy/probers/probe-observability.yml new file mode 100644 index 0000000..974e527 --- /dev/null +++ b/observability/examples/full/alloy/probers/probe-observability.yml @@ -0,0 +1,12 @@ +- targets: + - cogstack-observability-traefik-1/grafana/api/health + labels: + name: grafana + job: probe-observability-stack + host: localhost +- targets: + - cogstack-observability-traefik-1/prometheus/-/healthy + labels: + name: prometheus + job: probe-observability-stack + host: localhost \ No newline at end of file diff --git a/observability/examples/full/docker-compose.yml b/observability/examples/full/docker-compose.yml index 87b6a8a..52d03e8 100755 --- a/observability/examples/full/docker-compose.yml +++ b/observability/examples/full/docker-compose.yml @@ -6,6 +6,8 @@ services: image: cogstacksystems/cogstack-observability-alloy:latest ports: - "12345:12345" + networks: + - observability volumes: - ${BASE_DIR-.}/alloy/probers:/etc/alloy/probers # CAdvisor diff --git a/observability/examples/full/exporters.elastic.docker-compose.yml b/observability/examples/full/exporters.elastic.docker-compose.yml index 0b5c69f..29d41f8 100644 --- a/observability/examples/full/exporters.elastic.docker-compose.yml +++ b/observability/examples/full/exporters.elastic.docker-compose.yml @@ -15,7 +15,7 @@ services: - PROMETHEUS_URL=http://cogstack-observability-prometheus-1:9090/prometheus - 
ALLOY_HOSTNAME=${ALLOY_HOSTNAME-localhost} # Used to add a label to metrics - ALLOY_IP_ADDRESS=${ALLOY_IP_ADDRESS-localhost} # Used to add a label to metrics - - ELASTICSEARCH_URL=${ELASTICSEARCH_URL-https://elassticsearch-1:9200} + - ELASTICSEARCH_URL=${ELASTICSEARCH_URL-https://elasticsearch-1:9200} - ELASTICSEARCH_USERNAME=${ELASTICSEARCH_USERNAME-user} # Used to get metrics from Elasticsearch - ELASTICSEARCH_PASSWORD=${ELASTICSEARCH_PASSWORD-pass} # Used to get metrics from Elasticsearch networks: diff --git a/observability/examples/full/full-quickstart.sh b/observability/examples/full/full-quickstart.sh index c811f3d..319c67e 100644 --- a/observability/examples/full/full-quickstart.sh +++ b/observability/examples/full/full-quickstart.sh @@ -10,6 +10,7 @@ download_to() { curl -fsSL -o "$path" "$url" } +mkdir -p cogstack-observability/alloy/probers mkdir -p cogstack-observability/prometheus/scrape-configs/probers mkdir -p cogstack-observability/prometheus/scrape-configs/exporters mkdir -p cogstack-observability/prometheus/scrape-configs/recording-rules diff --git a/observability/examples/simple/alloy/probers/probe-observability.yml b/observability/examples/simple/alloy/probers/probe-observability.yml new file mode 100644 index 0000000..974e527 --- /dev/null +++ b/observability/examples/simple/alloy/probers/probe-observability.yml @@ -0,0 +1,12 @@ +- targets: + - cogstack-observability-traefik-1/grafana/api/health + labels: + name: grafana + job: probe-observability-stack + host: localhost +- targets: + - cogstack-observability-traefik-1/prometheus/-/healthy + labels: + name: prometheus + job: probe-observability-stack + host: localhost \ No newline at end of file diff --git a/observability/examples/simple/alloy/probers/probe-simple.yml b/observability/examples/simple/alloy/probers/probe-simple.yml deleted file mode 100644 index b3d7353..0000000 --- a/observability/examples/simple/alloy/probers/probe-simple.yml +++ /dev/null @@ -1,6 +0,0 @@ -# Example of probe 
targets -- targets: - - https://cogstack.org - labels: - name: cogstack-homepage - job: probe-services \ No newline at end of file diff --git a/observability/examples/simple/quickstart.sh b/observability/examples/simple/quickstart.sh index ecc7569..577430b 100644 --- a/observability/examples/simple/quickstart.sh +++ b/observability/examples/simple/quickstart.sh @@ -9,8 +9,8 @@ curl -fsSL -o docker-compose.yml \ https://raw.githubusercontent.com/CogStack/cogstack-platform-toolkit/main/observability/examples/simple/docker-compose.yml echo "Downloading probe-simple.yml into alloy/probers/..." -curl -fsSL -o probers/probe-simple.yml \ - https://raw.githubusercontent.com/CogStack/cogstack-platform-toolkit/main/observability/examples/simple/probers/probe-simple.yml +curl -fsSL -o probers/probe-observability.yml \ + https://raw.githubusercontent.com/CogStack/cogstack-platform-toolkit/main/observability/examples/simple/alloy/probers/probe-observability.yml echo "Setup complete in observability-simple/" diff --git a/observability/grafana-alloy/Dockerfile b/observability/grafana-alloy/Dockerfile index affb334..525e85f 100644 --- a/observability/grafana-alloy/Dockerfile +++ b/observability/grafana-alloy/Dockerfile @@ -3,6 +3,8 @@ FROM grafana/alloy:latest LABEL traefik.enable="true" \ traefik.http.routers.alloy.rule="PathPrefix(`/alloy`)" +RUN mkdir -p /etc/alloy/probers + COPY ./defaults /etc/alloy CMD [ \ diff --git a/observability/grafana-alloy/defaults/config.alloy b/observability/grafana-alloy/defaults/config.alloy index 34f8c7a..7f57192 100644 --- a/observability/grafana-alloy/defaults/config.alloy +++ b/observability/grafana-alloy/defaults/config.alloy @@ -8,31 +8,7 @@ prometheus.remote_write "default" { url = sys.env("PROMETHEUS_URL") + "/api/v1/write" } external_labels = { - host = sys.env("ALLOY_HOSTNAME"), - ip_address = sys.env("ALLOY_IP_ADDRESS"), + alloy_hostname = sys.env("ALLOY_HOSTNAME"), + alloy_ip_address = sys.env("ALLOY_IP_ADDRESS"), } -} - -prometheus.scrape 
"exporter" { - scrape_interval = "15s" - targets = array.concat( - prometheus.exporter.self.alloy.targets, - prometheus.exporter.cadvisor.local_cadvisor.targets, - prometheus.exporter.unix.local_node_exporter.targets, - ) - forward_to = [prometheus.remote_write.default.receiver] -} - -// Alloys internal metrics -prometheus.exporter.self "alloy" { -} - -// CAdvisor -prometheus.exporter.cadvisor "local_cadvisor" { - docker_host = "unix:///var/run/docker.sock" - storage_duration = "5m" -} - -// Node exporter -prometheus.exporter.unix "local_node_exporter" { -} +} \ No newline at end of file diff --git a/observability/examples/alloy/grafana-alloy/config.alloy b/observability/grafana-alloy/defaults/default-exporters.alloy similarity index 51% rename from observability/examples/alloy/grafana-alloy/config.alloy rename to observability/grafana-alloy/defaults/default-exporters.alloy index eaa2c5c..28aa51b 100644 --- a/observability/examples/alloy/grafana-alloy/config.alloy +++ b/observability/grafana-alloy/defaults/default-exporters.alloy @@ -1,25 +1,7 @@ -logging { - level = "debug" - format = "logfmt" -} - -prometheus.remote_write "default" { - endpoint { - url = sys.env("PROMETHEUS_URL") - } - external_labels = { - host = sys.env("ALLOY_HOSTNAME"), - ip_address = sys.env("ALLOY_IP_ADDRESS"), - } -} - +// Default exporters to be run on every VM to get metrics from. 
prometheus.scrape "exporter" { scrape_interval = "15s" - targets = array.concat( - prometheus.exporter.self.alloy.targets, - prometheus.exporter.cadvisor.local_cadvisor.targets, - prometheus.exporter.unix.local_node_exporter.targets, - ) + targets = discovery.relabel.exporters_with_default_labels.output forward_to = [prometheus.remote_write.default.receiver] } @@ -29,10 +11,28 @@ prometheus.exporter.self "alloy" { // CAdvisor prometheus.exporter.cadvisor "local_cadvisor" { - docker_host = "unix:///var/run/docker.sock" - storage_duration = "5m" + docker_host = "unix:///var/run/docker.sock" + storage_duration = "5m" } // Node exporter prometheus.exporter.unix "local_node_exporter" { } + + +discovery.relabel "exporters_with_default_labels" { + targets = array.concat( + prometheus.exporter.self.alloy.targets, + prometheus.exporter.cadvisor.local_cadvisor.targets, + prometheus.exporter.unix.local_node_exporter.targets, + ) + + rule { + target_label = "host" + replacement = sys.env("ALLOY_HOSTNAME") + } + rule { + target_label = "ip_address" + replacement = sys.env("ALLOY_IP_ADDRESS") + } +} diff --git a/observability/grafana-alloy/defaults/probe-default-config.alloy b/observability/grafana-alloy/defaults/default-probe.alloy similarity index 56% rename from observability/grafana-alloy/defaults/probe-default-config.alloy rename to observability/grafana-alloy/defaults/default-probe.alloy index 8c55d47..2b5d28e 100644 --- a/observability/grafana-alloy/defaults/probe-default-config.alloy +++ b/observability/grafana-alloy/defaults/default-probe.alloy @@ -1,48 +1,6 @@ - -// Blackbox Exporter - Probe Observability Stack -prometheus.exporter.blackbox "probe_observability_stack" { - config_file = "/etc/alloy/blackbox-exporter.yml" - target { - name = "grafana" - address = "cogstack-observability-traefik-1/grafana/api/health" - module = "http_get_200" - labels = { - "name" = "grafana", - "host" = "localhost", - } - } - - target { - name = "prometheus" - address = 
"cogstack-observability-traefik-1/prometheus/-/healthy" - module = "http_get_200" - labels = { - "name" = "prometheus", - "host" = "localhost", - } - } -} -discovery.relabel "probe_observability_stack_results" { - targets = prometheus.exporter.blackbox.probe_observability_stack.targets - - rule { - source_labels = ["__param_target"] - target_label = "instance" - regex = "([^/]+)/(.*)" - replacement = "prometheus-host/$2" - } - - rule { - target_label = "job" - replacement = "probe-observability-stack" - } - rule { - target_label = "host" - replacement = "prometheus-host" - } -} - // Blackbox Exporter - Probe External Services +// Mount any yaml into the discovery folder, and this will probe it + discovery.file "probe_target_files" { files = ["/etc/alloy/probers/*.yml"] refresh_interval = "5m" @@ -104,11 +62,8 @@ discovery.relabel "probe_discovered_targets_results" { } // Scrape Probe targets -prometheus.scrape "blackbox_exporter" { +prometheus.scrape "probe_targets_scrape" { scrape_interval = "15s" - targets = array.concat( - discovery.relabel.probe_discovered_targets_results.output, - discovery.relabel.probe_observability_stack_results.output, - ) + targets = discovery.relabel.probe_discovered_targets_results.output forward_to = [prometheus.remote_write.default.receiver] } \ No newline at end of file diff --git a/observability/grafana/Dockerfile.alloy b/observability/grafana/Dockerfile.alloy deleted file mode 100644 index f87f5c1..0000000 --- a/observability/grafana/Dockerfile.alloy +++ /dev/null @@ -1 +0,0 @@ -# TODO \ No newline at end of file diff --git a/observability/grafana/provisioning/dashboards/default/infrastructure/docker-metrics-cadvisor.json b/observability/grafana/provisioning/dashboards/default/cogstack/docker-metrics-cadvisor.json similarity index 100% rename from observability/grafana/provisioning/dashboards/default/infrastructure/docker-metrics-cadvisor.json rename to 
observability/grafana/provisioning/dashboards/default/cogstack/docker-metrics-cadvisor.json diff --git a/observability/grafana/provisioning/dashboards/default/infrastructure/vm-metrics-node-exporter.json b/observability/grafana/provisioning/dashboards/default/cogstack/vm-metrics-node-exporter.json similarity index 100% rename from observability/grafana/provisioning/dashboards/default/infrastructure/vm-metrics-node-exporter.json rename to observability/grafana/provisioning/dashboards/default/cogstack/vm-metrics-node-exporter.json diff --git a/observability/grafana/provisioning/dashboards/default/infrastructure/grafana-metrics.json b/observability/grafana/provisioning/dashboards/default/internals/grafana-metrics.json similarity index 100% rename from observability/grafana/provisioning/dashboards/default/infrastructure/grafana-metrics.json rename to observability/grafana/provisioning/dashboards/default/internals/grafana-metrics.json diff --git a/observability/grafana/provisioning/dashboards/default/infrastructure/prometheus-metrics.json b/observability/grafana/provisioning/dashboards/default/internals/prometheus-metrics.json similarity index 100% rename from observability/grafana/provisioning/dashboards/default/infrastructure/prometheus-metrics.json rename to observability/grafana/provisioning/dashboards/default/internals/prometheus-metrics.json diff --git a/observability/grafana/provisioning/dashboards/default/infrastructure/traefik-dashboard.json b/observability/grafana/provisioning/dashboards/default/internals/traefik-dashboard.json similarity index 100% rename from observability/grafana/provisioning/dashboards/default/infrastructure/traefik-dashboard.json rename to observability/grafana/provisioning/dashboards/default/internals/traefik-dashboard.json
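
With this change, probe targets are discovered from any YAML file mounted under `/etc/alloy/probers/*.yml` (the `discovery.file "probe_target_files"` block in `default-probe.alloy`, refreshed every 5 minutes). As a minimal sketch of a site-specific file following the same shape as `probe-observability.yml`, something like the following could be dropped into `alloy/probers/`. The file name `probe-site.yml` is hypothetical, and the `host` value is an assumption that simply mirrors the example added in this diff; the target URL and the `name`/`job` labels are taken from the removed `probe-simple.yml`.

```yaml
# alloy/probers/probe-site.yml  (hypothetical file name, not part of this diff)
- targets:
    - https://cogstack.org      # URL to probe, as in the removed probe-simple.yml
  labels:
    name: cogstack-homepage     # human-readable name for the probed service
    job: probe-services         # per the comment in slo.yml, the SLO rule's job label must match this
    host: localhost             # assumption: mirrors the host label used in probe-observability.yml
```

If an SLO recording rule is defined for these targets, its `job` label would need to match the `job` value used here, as the comment in the removed `slo.yml` notes.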