Closed
Description
While investigating CheckerNetwork/node#569, I noticed that spark-evaluate logs are full of the following error messages:
2024-08-09T07:19:31Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:21:42Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:23:53Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:26:04Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:28:15Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:30:26Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:32:37Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:34:48Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:36:59Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:39:10Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:41:21Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
2024-08-09T07:43:32Z app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network and cannot start up; retry in 1s (perhaps the URL is wrong or the node is not started)
I think that this brought down the spark-evaluate service.
How can we detect this problem and send an alert to Slack?
What higher-level metric is affected? Several charts in the Internal Spark Dashboard show no data points after 2024-08-08 14:08.
Can we create a new metric similar to "unpublished measurements max age", but for round evaluations, and trigger an alert when no round evaluation has been posted for more than 30 minutes?
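To make the proposed metric concrete, here is a minimal sketch of the "round evaluation max age" check. It is not taken from spark-evaluate: the function names are hypothetical, and it assumes the timestamp of the most recent posted round evaluation is already available from somewhere. Only the 30-minute threshold comes from the question above.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the issue text: alert when no round evaluation
# has been posted for more than 30 minutes.
MAX_EVALUATION_AGE = timedelta(minutes=30)


def round_evaluation_age(last_evaluated_at: datetime, now: datetime) -> timedelta:
    """Return how long ago the most recent round evaluation was posted."""
    return now - last_evaluated_at


def should_alert(
    last_evaluated_at: datetime,
    now: datetime,
    max_age: timedelta = MAX_EVALUATION_AGE,
) -> bool:
    """Return True when the last round evaluation is older than max_age."""
    return round_evaluation_age(last_evaluated_at, now) > max_age


# Illustrative values only. The UTC timezone of the dashboard timestamp
# is an assumption; the issue does not state it.
last_seen = datetime(2024, 8, 8, 14, 8, tzinfo=timezone.utc)
checked_at = datetime(2024, 8, 9, 7, 19, 31, tzinfo=timezone.utc)
print(should_alert(last_seen, checked_at))
```

Reporting the age itself as the metric, rather than a boolean, would keep the threshold configurable in whatever alerting tool consumes it.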
If that's not possible, then a last-resort option is to create a Papertrail filter to detect these error messages and trigger an alert. This can be too noisy, though.
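One way to reduce the noise of a log-based alert is to fire only when the error repeats several times within a short window. The sketch below illustrates that idea in plain Python. It is not Papertrail configuration, and the function names, the minimum of 3 matches, and the 10-minute window are all assumptions chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Message text copied from the log lines quoted in this issue.
PATTERN = re.compile(r"JsonRpcProvider failed to detect network")


def parse_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a log line, e.g. 2024-08-09T07:19:31Z."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%SZ")


def should_alert(lines, min_matches=3, window=timedelta(minutes=10)):
    """Return True if at least min_matches matching lines fall within window."""
    stamps = sorted(parse_timestamp(l) for l in lines if PATTERN.search(l))
    for i in range(len(stamps) - min_matches + 1):
        if stamps[i + min_matches - 1] - stamps[i] <= window:
            return True
    return False


# The first three log lines from the issue description.
MESSAGE = (
    "app[e2867541be3e68] cdg [info]JsonRpcProvider failed to detect network "
    "and cannot start up; retry in 1s "
    "(perhaps the URL is wrong or the node is not started)"
)
SAMPLE = [
    f"2024-08-09T07:19:31Z {MESSAGE}",
    f"2024-08-09T07:21:42Z {MESSAGE}",
    f"2024-08-09T07:23:53Z {MESSAGE}",
]
print(should_alert(SAMPLE))
```

A single transient failure would not trigger the alert under these assumptions, while the repeating pattern shown in the logs above would.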
juliangruber
Status: ✅ done