Make `graph-node` tolerate chains not being available during startup
#3937
Comments
Sounds like a great improvement. Please also expose this knowledge via Prometheus monitoring wherever possible/reasonable.
I just filed #4115 which seems possibly related to this. Does graph-node mark a chain as not working if a firehose provider goes down after successful startup?
In addition to tolerating chains that are not available during startup, it would be amazing to tolerate chains that go down while graph-node is running. Even better, I think, would be tolerating individual RPC/Firehose providers not being available, instead of marking a chain as dead if a single provider is down.
Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.
#4754 should help with firehose providers, by allowing them to retry for 30 secs before giving up.
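For readers unfamiliar with the "retry for a while before giving up" idea mentioned above, here is a minimal sketch of a deadline-bounded retry loop. It is not the code from #4754; `retry_until`, its parameters, and the fixed backoff are hypothetical names and choices made only for illustration.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `op` until it succeeds or `deadline` has elapsed,
/// sleeping `backoff` between attempts. Returns the last error on timeout.
fn retry_until<T, E>(
    deadline: Duration,
    backoff: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let start = Instant::now();
    loop {
        match op() {
            // The provider answered: stop retrying.
            Ok(value) => return Ok(value),
            // Out of time: give up and surface the most recent error.
            Err(err) if start.elapsed() >= deadline => return Err(err),
            // Possibly transient failure: wait a bit and try again.
            Err(_) => sleep(backoff),
        }
    }
}
```

With `deadline` set to `Duration::from_secs(30)` this would roughly match the 30 second window described in the comment; a real implementation would likely be async and use a growing backoff.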
Right now, when `graph-node` starts, it does some checks against each RPC endpoint; if those checks fail or time out, `graph-node` will mark the chain as not working and not use it anymore. That will also make all subgraphs that use that chain fail at startup. If the problem with the endpoint is transient, the only way to get `graph-node` to use it again is to restart it (at the risk that some other chain now has a transient issue).

The code needs to be changed such that `graph-node` is much more tolerant of such transient issues and automatically retries using a chain that caused trouble during startup. As part of solving the issue, we should also produce documentation that describes what is expected of an endpoint before we will use it, and a `graphman` command that allows checking any given endpoint by going through its startup sequence. Additionally, there needs to be some way to figure out which of the configured endpoints `graph-node` considers usable/not usable at any given point in time.
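
To make the request concrete, one possible shape for the "usable/not usable" bookkeeping is sketched below. This is only an illustration under stated assumptions, not `graph-node`'s actual design: `EndpointStatus`, `EndpointRegistry`, and `probe` are hypothetical names, and the startup checks are stood in for by a closure. The point is that a failed check is recorded and can be overwritten by a later successful check, rather than permanently disabling the chain.

```rust
use std::collections::HashMap;

/// Hypothetical per-endpoint state; a failure is remembered, not final.
#[derive(Debug, Clone, PartialEq)]
enum EndpointStatus {
    Usable,
    Unusable { reason: String, failed_checks: u32 },
}

/// Hypothetical registry of what the node currently thinks of each endpoint.
#[derive(Default)]
struct EndpointRegistry {
    status: HashMap<String, EndpointStatus>,
}

impl EndpointRegistry {
    /// Run the startup checks (`check`) for one endpoint and record the result.
    /// Calling this again later lets an endpoint recover without a restart.
    fn probe(&mut self, endpoint: &str, check: impl FnOnce() -> Result<(), String>) {
        // Carry over the count of consecutive failures, if any.
        let failed_before = match self.status.get(endpoint) {
            Some(EndpointStatus::Unusable { failed_checks, .. }) => *failed_checks,
            _ => 0,
        };
        let new_status = match check() {
            Ok(()) => EndpointStatus::Usable,
            Err(reason) => EndpointStatus::Unusable {
                reason,
                failed_checks: failed_before + 1,
            },
        };
        self.status.insert(endpoint.to_string(), new_status);
    }

    /// True only if the most recent check for this endpoint succeeded.
    fn is_usable(&self, endpoint: &str) -> bool {
        matches!(self.status.get(endpoint), Some(EndpointStatus::Usable))
    }

    /// Sorted names of endpoints currently considered not usable.
    fn unusable(&self) -> Vec<&str> {
        let mut names: Vec<&str> = self
            .status
            .iter()
            .filter(|(_, s)| **s != EndpointStatus::Usable)
            .map(|(name, _)| name.as_str())
            .collect();
        names.sort();
        names
    }
}
```

A background task could call `probe` periodically for every entry returned by `unusable()`, and the same map could back both a `graphman` check command and a monitoring gauge.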