feat: add backup metrics to grafana dashboard #10521
Conversation
lenaschoenburg commented on Sep 27, 2022 (edited)
monitor/grafana/zeebe.json (Outdated)
"targets": [
  {
    "exemplar": false,
    "expr": "min(max_over_time(zeebe_backup_operations_in_progress{namespace=~\"$namespace\", partition=~\"$partition\", pod=~\"$pod\", operation=\"take\"}[5m])) by (partition)",
❓ This query is a bit weird. I tried to filter out time series where a gauge would be stuck at a specific value. This happened during testing when a broker transitioned to follower while taking a backup: because that follower never saw the backup complete or fail, it would still report an in-progress backup.
I've tried to explain the query I came up with here, but ultimately I'm not sure why exactly this works, or even whether it is correct at all 🙈
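One possible reading of the query (my own interpretation, not verified against the discussion): the inner `max_over_time` keeps short-lived backups visible, and the outer `min ... by (partition)` masks a single replica whose gauge is stuck, as long as another replica of the same partition reports correctly.

```promql
# Interpretation sketch of the query above (assumptions, not confirmed):
# 1. max_over_time(...[5m])  -- per time series (pod), the highest gauge
#    value seen in the last 5m, so even a brief backup registers as 1.
# 2. min(...) by (partition) -- across the replicas of one partition,
#    take the lowest value; a lone follower stuck at 1 is masked once
#    any other replica of that partition correctly reports 0.
min(
  max_over_time(
    zeebe_backup_operations_in_progress{
      namespace=~"$namespace", partition=~"$partition",
      pod=~"$pod", operation="take"
    }[5m]
  )
) by (partition)
```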
Maybe we should also update the in-progress metrics while closing BackupService. If the node restarts, the metrics are reset anyway, I think, as observed with other metrics. Can you verify it? If that works, then we probably don't need this complex query.
In this query, it is not clear what the value would be if an in-progress backup takes more than 5m. Will the max_over_time query ignore it and show the result as 0?
> Maybe we should also update the in-progress metrics while closing BackupService

Good idea. Alternatively, we could update them when marking the in-progress backup as failed during startup of a new BackupService.
I fixed it by resetting the in-progress counters to 0 when the BackupService actor closes. Query is adjusted too.
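A minimal sketch of that fix, in Python rather than Zeebe's actual Java code, with all names (`BackupServiceMetrics`, `reset`, the operation labels) hypothetical: the in-progress gauge is zeroed when the service closes, so a broker that steps down mid-backup stops reporting a stuck value.

```python
# Hypothetical sketch (not the real Zeebe implementation): a per-operation
# "in progress" gauge that is reset to zero when the BackupService closes.

class BackupServiceMetrics:
    """Tracks in-progress backup operations per operation type."""

    def __init__(self) -> None:
        self.in_progress = {"take": 0, "restore": 0, "delete": 0}

    def operation_started(self, operation: str) -> None:
        self.in_progress[operation] += 1

    def operation_finished(self, operation: str) -> None:
        self.in_progress[operation] -= 1

    def reset(self) -> None:
        # Called on close: clear all gauges so no stale "in progress"
        # value survives a leader-to-follower transition.
        for operation in self.in_progress:
            self.in_progress[operation] = 0


class BackupService:
    def __init__(self, metrics: BackupServiceMetrics) -> None:
        self.metrics = metrics

    def take_backup(self) -> None:
        self.metrics.operation_started("take")
        # ... backup runs asynchronously; operation_finished("take")
        # would be called on completion or failure.

    def close(self) -> None:
        # The fix described above: reset counters when the actor closes.
        self.metrics.reset()
```

With this in place, a follower that closed its service mid-backup exports 0 again, and the dashboard no longer needs the `min(max_over_time(...))` workaround.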
Test Results
763 files (-169), 763 suites (-169), 1h 50m 26s ⏱️ (-14m 33s)
For more details on these failures, see this check.
Results for commit 388cf10. ± Comparison against base commit 751ac05.
♻️ This comment has been updated with latest results.
@oleschoenburg Could you also share the final dashboard view here? Just for reference.
Force-pushed from e0db00f to 0edd599.
Force-pushed from 0edd599 to 4a9af3c.
Force-pushed from 4a9af3c to 388cf10.
🎉 Looks nice. One small thing: the unit of take-backup latency is not clear from the graph.
Force-pushed from 388cf10 to ebe5a1b.
I added the correct unit for backup latency, thanks for spotting this 👍
bors r+
Build succeeded:
Successfully created backport PR #10544 for