Log Reports: Slow/Stuck GraphQL API Responses (1.2.0beta2) #9211

Open
nholland94 opened this issue Jul 15, 2021 · 1 comment

@nholland94
Member

We have observed slow and/or stuck GraphQL API responses with the latest 1.2.0beta1 release. The new 1.2.0beta2 release contains new logs that we believe will help us confirm and identify the root cause of the issue.

We ask that users running the new 1.2.0beta2 release who encounter this issue assist us by attaching logs to this issue. For the logs to be most helpful to us, we ask that users who encounter the issue mark the timestamp of when it occurred, and let the node run for a few minutes after the issue first occurs before recording the logs (so that we can see the behavior of the node in the period of time following the issue).
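
For reference, a minimal sketch of that capture procedure in Python, assuming the mina CLI is available on the host. The 5-minute wait is an assumed reading of "a few minutes", and the script itself is illustrative rather than part of the release:

```python
#!/usr/bin/env python3
# Minimal sketch of the requested log-capture procedure. Assumes the `mina`
# CLI is on PATH; the 5-minute wait is an assumed "few minutes" grace period.
import subprocess
import time
from datetime import datetime, timezone

WAIT_AFTER_ISSUE_SECS = 5 * 60  # assumed grace period after the issue is seen

def capture_logs_after_issue():
    # Record when the slow/stuck GraphQL response was first observed, so the
    # timestamp can be included alongside the attached logs.
    issue_time = datetime.now(timezone.utc).isoformat()
    print(f"GraphQL API issue observed at {issue_time}")

    # Let the node keep running so post-issue behavior shows up in the logs.
    time.sleep(WAIT_AFTER_ISSUE_SECS)

    # Export the daemon logs (the export-logs command is mentioned later in
    # this thread).
    subprocess.run(["mina", "client", "export-logs"], check=True)

if __name__ == "__main__":
    capture_logs_after_issue()
```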

@jrwashburn

I don't use the GraphQL API frequently, but I have a monitoring script that checks mina client status -json every 5 minutes and evaluates the node status to decide whether to restart. This script fails when there is no valid response from the mina client status command.

On 1.2.0beta2, on mainnet, the node ran ~2 days with no issues (status was always returned and the node was always synced). Today it started failing periodically - I assume there is a 30-second timeout, because each logged failure is reported ~30s later than expected. Logs from mina-status-monitor.sh should be helpful to pinpoint specific incidents for correlation with the daemon logs.
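
Roughly, the check looks like this (sketched in Python for illustration; the actual monitor is a shell script, and the 30-second timeout and the sync_status field name are assumptions on my part):

```python
#!/usr/bin/env python3
# Sketch of a status check along the lines of mina-status-monitor.sh.
# The timeout value and "sync_status" field name are assumptions; the real
# script and status output may differ.
import json
import subprocess

STATUS_TIMEOUT_SECS = 30  # assumed, based on the ~30s delay per logged failure

def node_is_healthy() -> bool:
    # Run `mina client status -json` and treat a timeout, a non-zero exit, or
    # unparseable output as the failure mode described in this comment.
    try:
        result = subprocess.run(
            ["mina", "client", "status", "-json"],
            capture_output=True,
            text=True,
            timeout=STATUS_TIMEOUT_SECS,
            check=True,
        )
        status = json.loads(result.stdout)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, json.JSONDecodeError):
        return False
    # "sync_status" is an assumed field name for the node's sync state.
    return status.get("sync_status") == "Synced"

if __name__ == "__main__":
    if node_is_healthy():
        print("node synced; no action")
    else:
        print("no valid status or not synced; a real monitor might restart the node here")
```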

I'm also running mina client export-logs every 4 hours and have provided a sample of 3 intervals. That is more logs and some redundancy vs. what is strictly necessary, but I decided to overshare (and did not have the time to sort / isolate the failure windows - sorry!). The relevant logs are shared here:
https://drive.google.com/drive/folders/1AZSv-3f4GaEX9iM30GB9tPsuP5MVS_74?usp=sharing

In mina-status-monitor.2021-07-19.log, you will see the following events, including the first failure:
Jul 16 22:20:11 - node in bootstrap
Jul 16 22:25:12 - node in sync
Jul 19 02:16:42 - first mina client status failure
Jul 19 05:53:43 - sidecar restarted because it didn't report as many times as expected
Jul 19 07:19:53 - periodic mina client status failures at all of the times below:
Jul 19 09:01:49
Jul 19 09:42:58
Jul 19 11:19:41
Jul 19 13:10:48
Jul 19 14:02:00
Jul 19 15:33:08
Jul 19 16:49:26
Jul 19 17:50:30
Jul 19 18:36:23

@deepthiskumar deepthiskumar added this to Discuss in United Sprint Board via automation Jul 23, 2021
@deepthiskumar deepthiskumar moved this from Discuss to In progress in United Sprint Board Jul 23, 2021