Skip to content

Commit

Permalink
Add alerts for graph-node subgraphs (#821)
Browse files Browse the repository at this point in the history
Part of [Deploy v2 and updated v3 sushiswap subgraphs](https://www.notion.so/Deploy-v2-and-updated-v3-sushiswap-subgraphs-e331945fdeea487c890706fc22c6cc94)

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/821
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
  • Loading branch information
prathamesh0 authored and ashwin committed May 17, 2024
1 parent 254f95e commit 8f2da38
Show file tree
Hide file tree
Showing 4 changed files with 93 additions and 3 deletions.
6 changes: 6 additions & 0 deletions stack_orchestrator/data/compose/docker-compose-grafana.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,16 @@ services:
restart: always
environment:
GF_SERVER_ROOT_URL: ${GF_SERVER_ROOT_URL}
CERC_GRAFANA_ALERTS_SUBGRAPH_IDS: ${CERC_GRAFANA_ALERTS_SUBGRAPH_IDS}
volumes:
- ../config/monitoring/grafana/provisioning:/etc/grafana/provisioning
- ../config/monitoring/grafana/dashboards:/etc/grafana/dashboards
- ../config/monitoring/update-grafana-alerts-config.sh:/update-grafana-alerts-config.sh
- grafana_storage:/var/lib/grafana
user: root
entrypoint: ["bash", "-c"]
command: |
"/update-grafana-alerts-config.sh && /run.sh"
ports:
- "3000"
healthcheck:
Expand Down
64 changes: 64 additions & 0 deletions stack_orchestrator/data/config/monitoring/subgraph-alert-rules.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
apiVersion: 1
groups:
- orgId: 1
name: subgraph
folder: SubgraphAlerts
interval: 30s
rules:
- uid: b2a9144b-6104-46fc-92b5-352f4e643c4c
title: subgraph_head_tracking
condition: condition
data:
- refId: diff
relativeTimeRange:
from: 600
to: 0
datasourceUid: PBFA97CFB590B2093
model:
datasource:
type: prometheus
uid: PBFA97CFB590B2093
editorMode: code
expr: ethereum_chain_head_number - on(network) group_right deployment_head{deployment=~"REPLACE_WITH_SUBGRAPH_IDS"}
instant: true
intervalMs: 1000
legendFormat: __auto
maxDataPoints: 43200
range: false
refId: diff
- refId: condition
relativeTimeRange:
from: 600
to: 0
datasourceUid: __expr__
model:
conditions:
- evaluator:
params:
- 15
- 0
type: gt
operator:
type: and
query:
params: []
reducer:
params: []
type: avg
type: query
datasource:
name: Expression
type: __expr__
uid: __expr__
expression: diff
intervalMs: 1000
maxDataPoints: 43200
refId: condition
type: threshold
noDataState: OK
execErrState: Alerting
for: 5m
annotations:
summary: Subgraph deployment {{ index $labels "deployment" }} is falling behind head by {{ index $values "diff" }}
labels: {}
isPaused: false
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
#!/bin/bash

echo Using CERC_GRAFANA_ALERTS_SUBGRAPH_IDS ${CERC_GRAFANA_ALERTS_SUBGRAPH_IDS}

# Replace subgraph ids in subgraph alerting config
# Note: Requires the grafana container to be run with user root
sed -i "s/REPLACE_WITH_SUBGRAPH_IDS/$CERC_GRAFANA_ALERTS_SUBGRAPH_IDS/g" /etc/grafana/provisioning/alerting/subgraph-alert-rules.yml
19 changes: 16 additions & 3 deletions stack_orchestrator/data/stacks/monitoring/monitoring-watchers.md
Original file line number Diff line number Diff line change
Expand Up @@ -113,29 +113,39 @@ Add the following scrape configs to prometheus config file (`monitoring-watchers
labels:
instance: 'ajna'
chain: 'filecoin'

- job_name: graph-node
metrics_path: /metrics
scrape_interval: 30s
static_configs:
- targets: ['GRAPH_NODE_HOST:GRAPH_NODE_HOST_METRICS_PORT']
```

Add scrape config as done above for any additional watcher to add it to the Watchers dashboard.

### Grafana alerts config

Place the pre-configured watcher alerts rules in Grafana provisioning directory:
Place the pre-configured alerts rules in Grafana provisioning directory:

```bash
# watcher alert rules
cp monitoring-watchers-deployment/config/monitoring/watcher-alert-rules.yml monitoring-watchers-deployment/config/monitoring/grafana/provisioning/alerting/

# subgraph alert rules
cp monitoring-watchers-deployment/config/monitoring/subgraph-alert-rules.yml monitoring-watchers-deployment/config/monitoring/grafana/provisioning/alerting/
```

Update the alerting contact points config (`monitoring-watchers-deployment/config/monitoring/grafana/provisioning/alerting/contactpoints.yml`) with desired contact points

Add corresponding routes to the notification policies config (`monitoring-watchers-deployment/monitoring/grafana/provisioning/alerting/policies.yaml`) with appropriate object-matchers:
Add corresponding routes to the notification policies config (`monitoring-watchers-deployment/config/monitoring/grafana/provisioning/alerting/policies.yml`) with appropriate object-matchers:

```yml
...
routes:
- receiver: SlackNotifier
object_matchers:
# Add matchers below
- ['grafana_folder', '=', 'WatcherAlerts']
- ['grafana_folder', '=~', 'WatcherAlerts|SubgraphAlerts']
```

### Env
Expand All @@ -149,6 +159,9 @@ Set the following env variables in the deployment env config file (`monitoring-w
# Grafana server host URL to be used
# (Optional, default: http://localhost:3000)
GF_SERVER_ROOT_URL=

# List of subgraph ids to configure alerts for (separated by |)
CERC_GRAFANA_ALERTS_SUBGRAPH_IDS=
```

## Start the stack
Expand Down

0 comments on commit 8f2da38

Please sign in to comment.