Make graph-node tolerate chains not being available during startup #3937

Open
lutter opened this issue Sep 14, 2022 · 7 comments · May be fixed by #5337

@lutter
Collaborator

lutter commented Sep 14, 2022

Right now, when graph-node starts, it runs some checks against each RPC endpoint; if those checks fail or time out, graph-node marks the chain as not working and stops using it. That also makes all subgraphs that use that chain fail at startup. If the problem with the endpoint is transient, the only way to get graph-node to use it again is to restart it (at the risk that some other chain now has a transient issue).

The code needs to be changed so that graph-node is much more tolerant of such transient issues and automatically retries a chain that caused trouble during startup. As part of solving the issue, we should also produce documentation that describes what is expected of an endpoint before we will use it, and a graphman command that allows checking any given endpoint by going through its startup sequence. Additionally, there needs to be some way to figure out which of the configured endpoints graph-node considers usable or not usable at any given point in time.
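To make the request concrete, here is a minimal sketch of what "retry instead of giving up" plus "report which endpoints are usable" could look like. It is not graph-node code: `EndpointRegistry`, `EndpointStatus`, and the backoff parameters are hypothetical names chosen for illustration, and the startup check is passed in as a plain closure.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical status of one configured endpoint (not a graph-node type).
#[derive(Debug, Clone, PartialEq)]
pub enum EndpointStatus {
    Unchecked,
    Usable,
    Unusable { failures: u32, last_error: String },
}

/// Hypothetical registry that remembers the outcome of each startup check.
pub struct EndpointRegistry {
    statuses: HashMap<String, EndpointStatus>,
    base_delay: Duration,
    max_delay: Duration,
}

impl EndpointRegistry {
    pub fn new(names: &[&str], base_delay: Duration, max_delay: Duration) -> Self {
        let statuses = names
            .iter()
            .map(|n| (n.to_string(), EndpointStatus::Unchecked))
            .collect();
        EndpointRegistry { statuses, base_delay, max_delay }
    }

    /// Run the startup check for one endpoint and record the result.
    /// Returns `None` on success, or the delay to wait before the next retry.
    pub fn check<F>(&mut self, name: &str, check: F) -> Option<Duration>
    where
        F: FnOnce() -> Result<(), String>,
    {
        let previous_failures = match self.statuses.get(name) {
            Some(EndpointStatus::Unusable { failures, .. }) => *failures,
            _ => 0,
        };
        match check() {
            Ok(()) => {
                self.statuses.insert(name.to_string(), EndpointStatus::Usable);
                None
            }
            Err(last_error) => {
                let failures = previous_failures + 1;
                self.statuses.insert(
                    name.to_string(),
                    EndpointStatus::Unusable { failures, last_error },
                );
                Some(self.backoff(failures))
            }
        }
    }

    /// Exponential backoff: base_delay * 2^(failures - 1), capped at max_delay.
    fn backoff(&self, failures: u32) -> Duration {
        let exponent = failures.saturating_sub(1).min(16);
        self.base_delay.saturating_mul(1u32 << exponent).min(self.max_delay)
    }

    /// Names of endpoints currently considered usable, sorted for stable output.
    pub fn usable(&self) -> Vec<&str> {
        let mut names: Vec<&str> = self
            .statuses
            .iter()
            .filter(|(_, status)| **status == EndpointStatus::Usable)
            .map(|(name, _)| name.as_str())
            .collect();
        names.sort();
        names
    }
}
```

A caller would keep invoking `check` for every endpoint that is not yet `Usable`, sleeping for the returned delay between attempts, so a transient failure never marks a chain as permanently dead.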

@matthewdarwin

Sounds like a great improvement.

Please also expose this knowledge via Prometheus monitoring wherever possible/reasonable.
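As a rough illustration of the monitoring request, the sketch below renders per-endpoint usability in the Prometheus text exposition format using only the standard library. The metric name `endpoint_usable` and the `chain` and `provider` labels are assumptions made for this example, not metrics that graph-node is known to export.

```rust
/// Escape a label value as required by the Prometheus text exposition format.
fn escape_label(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Render one gauge sample per (chain, provider, usable) tuple.
/// The metric and label names are hypothetical.
pub fn render_endpoint_gauge(samples: &[(&str, &str, bool)]) -> String {
    let mut out = String::new();
    out.push_str("# HELP endpoint_usable Whether the endpoint is considered usable (1) or not (0).\n");
    out.push_str("# TYPE endpoint_usable gauge\n");
    for (chain, provider, usable) in samples {
        out.push_str(&format!(
            "endpoint_usable{{chain=\"{}\",provider=\"{}\"}} {}\n",
            escape_label(chain),
            escape_label(provider),
            if *usable { 1 } else { 0 }
        ));
    }
    out
}
```

A gauge that flips between 0 and 1 lets operators alert on an endpoint that stays unusable, rather than discovering it only when subgraphs fail.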

@paymog

paymog commented Oct 27, 2022

I just filed #4115 which seems possibly related to this. Does graph-node mark a chain as not working if a firehose provider goes down after successful startup?

@paymog

paymog commented Oct 31, 2022

In addition to tolerating chains that are unavailable during startup, it would be amazing to tolerate chains that go down while graph-node is running. Even better, I think, would be tolerating individual RPC/Firehose providers being unavailable, instead of marking a chain as dead when a single provider is down.
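The per-provider idea can be sketched as follows: health is tracked for each provider, and a chain counts as available as long as at least one of its providers is healthy. `Chain`, `Provider`, and `ProviderKind` here are hypothetical types for illustration and do not mirror graph-node's internal structures.

```rust
/// Hypothetical provider kinds, matching the RPC/Firehose distinction above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Rpc,
    Firehose,
}

/// Hypothetical provider record with its own health flag.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Provider {
    pub name: String,
    pub kind: ProviderKind,
    pub healthy: bool,
}

/// Hypothetical chain that owns several providers.
#[derive(Debug, Default)]
pub struct Chain {
    providers: Vec<Provider>,
}

impl Chain {
    /// Register a provider; it starts out unhealthy until a check passes.
    pub fn add_provider(&mut self, name: &str, kind: ProviderKind) {
        self.providers.push(Provider {
            name: name.to_string(),
            kind,
            healthy: false,
        });
    }

    /// Record the latest health result; returns false if the name is unknown.
    pub fn set_health(&mut self, name: &str, healthy: bool) -> bool {
        match self.providers.iter_mut().find(|p| p.name == name) {
            Some(provider) => {
                provider.healthy = healthy;
                true
            }
            None => false,
        }
    }

    /// The chain is available while at least one provider is healthy.
    pub fn is_available(&self) -> bool {
        self.providers.iter().any(|p| p.healthy)
    }

    /// Pick the first healthy provider, in registration order.
    pub fn pick_provider(&self) -> Option<&Provider> {
        self.providers.iter().find(|p| p.healthy)
    }
}
```

With this shape, losing one provider only changes which provider `pick_provider` returns; the chain is reported as down only when every provider is unhealthy.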

@github-actions

Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.

@leoyvens
Collaborator

leoyvens commented Jul 13, 2023

#4754 should help with firehose providers, by allowing them to retry for 30 secs before giving up.

@paymog

paymog commented Jul 13, 2023

@leoyvens does #4754 also help with #4323?

EDIT: and if so, can/should we make the 30 seconds configurable?
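For illustration only, a configurable retry window could look like the sketch below. It does not reflect how #4754 is implemented: `RetryConfig`, `from_secs_str`, and `retry_until` are hypothetical names, and the 30-second default simply echoes the figure mentioned above.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical retry settings: total time budget and pause between attempts.
pub struct RetryConfig {
    pub max_elapsed: Duration,
    pub pause: Duration,
}

impl Default for RetryConfig {
    fn default() -> Self {
        RetryConfig {
            max_elapsed: Duration::from_secs(30),
            pause: Duration::from_millis(500),
        }
    }
}

impl RetryConfig {
    /// Build a config from an optional setting given in whole seconds,
    /// falling back to the default when it is missing or not a number.
    pub fn from_secs_str(raw: Option<&str>) -> Self {
        let mut cfg = RetryConfig::default();
        if let Some(secs) = raw.and_then(|s| s.trim().parse::<u64>().ok()) {
            cfg.max_elapsed = Duration::from_secs(secs);
        }
        cfg
    }
}

/// Call `op` until it succeeds or the next pause would exceed the time budget.
/// `op` receives the 1-based attempt number; the last error is returned on failure.
pub fn retry_until<T, E, F>(cfg: &RetryConfig, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let start = Instant::now();
    let mut attempt = 0;
    loop {
        attempt += 1;
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) => {
                if start.elapsed() + cfg.pause > cfg.max_elapsed {
                    return Err(err);
                }
                thread::sleep(cfg.pause);
            }
        }
    }
}
```

The raw value passed to `from_secs_str` could come from an environment variable or a config file entry; which one, and under what name, is left open here.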

@azf20
Contributor

azf20 commented Aug 3, 2023

@paymog I think it will help, but #4778 is more targeted.

mangas added a commit that referenced this issue Apr 11, 2024
@mangas mangas linked a pull request Apr 11, 2024 that will close this issue
mangas added a commit that referenced this issue Apr 19, 2024
mangas added a commit that referenced this issue Apr 22, 2024
mangas added a commit that referenced this issue May 8, 2024
mangas added a commit that referenced this issue May 10, 2024
mangas added a commit that referenced this issue May 13, 2024
mangas added a commit that referenced this issue May 17, 2024
mangas added a commit that referenced this issue May 20, 2024
mangas added a commit that referenced this issue May 22, 2024
mangas added a commit that referenced this issue May 24, 2024