-
-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: misc dashboards update #5738
Conversation
Performance Report✔️ no performance regression detected Full benchmark results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! I am really glad you started this process. I was thinking one of the next things I attempt is to clean up the dashboards a bit and perhaps reorganize them a bit. I will put up an issue to discuss and see how the team feels when I finish with the tasks I am working on.
@@ -599,7 +519,7 @@ | |||
"pluginVersion": "7.4.5", | |||
"targets": [ | |||
{ | |||
"expr": "delta(lodestar_bls_thread_pool_queue_job_wait_time_seconds_sum[$rate_interval])/delta(lodestar_bls_thread_pool_queue_job_wait_time_seconds_count[$rate_interval])", | |||
"expr": "rate(lodestar_bls_thread_pool_queue_job_wait_time_seconds_sum[$rate_interval])/rate(lodestar_bls_thread_pool_queue_job_wait_time_seconds_count[$rate_interval])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question. I noticed that you made this change in many places. I read the docs for both function and the major difference is the tolerance to restarts not skewing metrics. Is that the reason you made this change? Or was it directly related to how the calculation happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delta() / delta() should behave like rate() / rate(). But just delta() is very different read to just rate(). Many times I been reading a chart, think this numbers are impossible! And then found out it was a delta 😒 . For this reason I impose a strict rate() only usage across the board to keep things consistent.
"hide": false, | ||
"legendFormat": "notifyNewPayload roundtrip", | ||
"range": true, | ||
"interval": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, what does the empty interval
here do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea grafana exports add so much random crap
@@ -3425,8 +3140,8 @@ | |||
"mode": "scheme", | |||
"reverse": false, | |||
"scale": "exponential", | |||
"scheme": "Oranges", | |||
"steps": 64 | |||
"scheme": "Magma", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
@@ -1059,7 +1196,7 @@ | |||
"uid": "${DS_PROMETHEUS}" | |||
}, | |||
"exemplar": false, | |||
"expr": "32*12*rate(validator_monitor_prev_epoch_on_chain_inclusion_distance_bucket [$rate_interval])", | |||
"expr": "32*12*rate(validator_monitor_prev_epoch_on_chain_inclusion_distance_bucket [$rate_interval]) - 10000", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Why the scalar reduction? Would love to learn from this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this was a metric that was moved from below. Still curious about the scalar though. Will look when I load the dashboards to see how things change. Feel free to learn me somethin'!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contant substraction causes heatmaps to ignore the first row. This is very useful in histograms where the first bucket (inclusion distance 1) takes 99% of the values, but you want good resolution of the other values. Without this trick, non 1 values are dark, and with this they pop and are readable.
if (target.expr.includes("delta(")) { | ||
throw Error(`promql function 'delta' is not allowed, use 'rate' instead: ${target.expr}`); | ||
} | ||
if (target.expr.includes("increase(")) { | ||
throw Error(`promql function 'increase' is not allowed, use 'rate' instead: ${target.expr}`); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also disallow "decrease"?
if (target.expr.includes("delta(")) { | |
throw Error(`promql function 'delta' is not allowed, use 'rate' instead: ${target.expr}`); | |
} | |
if (target.expr.includes("increase(")) { | |
throw Error(`promql function 'increase' is not allowed, use 'rate' instead: ${target.expr}`); | |
} | |
const disallowed = ["delta", "increase", "decrease"]; | |
for (const disallow of disallowed) { | |
if (target.expr.includes(disallow + "(")) { | |
throw Error(`promql function '${disallow}' is not allowed, use 'rate' instead: ${target.expr}`); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't imagine anyone using decrease ever
dashboards/lodestar_vm_host.json
Outdated
@@ -1030,7 +941,7 @@ | |||
"uid": "${DS_PROMETHEUS}" | |||
}, | |||
"editorMode": "code", | |||
"expr": "nodejs_eventloop_lag_seconds", | |||
"expr": "nodejs_eventloop_lag_seconds{job=\"beacon\"}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Job name variable needs to be supported, else job name is not configurable
"expr": "nodejs_eventloop_lag_seconds{job=\"beacon\"}", | |
"expr": "nodejs_eventloop_lag_seconds{job=~\"$beacon_job|beacon\"}", |
Edit: Updated lint script to ensure this ☝️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! We should add a lint rule for it
@@ -10,6 +10,7 @@ import fs from "node:fs"; | |||
@typescript-eslint/no-unsafe-assignment, | |||
@typescript-eslint/explicit-function-return-type, | |||
@typescript-eslint/naming-convention, | |||
quotes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a conflict between eslint and prettier if you try to defined a string such as 'job=~"$beacon_job|beacon"'
, it won't let you escape double quotes \"
and eslint will complain if single quote or backtick is used.
🎉 This PR is included in v1.10.0 🎉 |
Motivation
After a debugging session I've done multiple changes to dashboards:
Plus I noticed that downloading the charts with
is not deterministic due to
fieldConfig.defaults.thresholds
sometimes addingvalue: null
sometimes not. Since we do not need those defaults, I've added a lint rule to delete all thresholds. If we ever need them we can review.