Log Reports: Slow/Stuck GraphQL API Responses (1.2.0beta2) #9211

Open
nholland94 opened this issue Jul 15, 2021 · 1 comment

@nholland94
Member

We have observed slow and/or stuck GraphQL API responses with the latest 1.2.0beta1 release. The new 1.2.0beta2 release contains new logs that we believe will help us confirm and identify the root cause of the issue.

We ask that users running the new 1.2.0beta2 release who encounter this issue assist us by attaching logs to this issue. For the logs to be most helpful to us, we ask that users who encounter the issue mark the timestamp of when it occurred, and let the node run for a few minutes after the issue first occurs before recording the logs (so that we can see the behavior of the node in the period of time following the issue).
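
For reference, a minimal sketch of that capture procedure in Python, assuming the mina CLI is available on the host. The 5-minute wait is an assumed reading of "a few minutes", and the script itself is illustrative rather than part of the release:

```python
#!/usr/bin/env python3
# Minimal sketch of the requested log-capture procedure. Assumes the `mina`
# CLI is on PATH; the 5-minute wait is an assumed "few minutes" grace period.
import subprocess
import time
from datetime import datetime, timezone

WAIT_AFTER_ISSUE_SECS = 5 * 60  # assumed grace period after the issue is seen

def capture_logs_after_issue():
    # Record when the slow/stuck GraphQL response was first observed, so the
    # timestamp can be included alongside the attached logs.
    issue_time = datetime.now(timezone.utc).isoformat()
    print(f"GraphQL API issue observed at {issue_time}")

    # Let the node keep running so post-issue behavior shows up in the logs.
    time.sleep(WAIT_AFTER_ISSUE_SECS)

    # Export the daemon logs (the export-logs command is mentioned later in
    # this thread).
    subprocess.run(["mina", "client", "export-logs"], check=True)

if __name__ == "__main__":
    capture_logs_after_issue()
```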

@jrwashburn

I don't use the GraphQL API frequently, but I have a monitoring script that checks mina client status -json every 5 minutes and evaluates the node status to decide whether to restart. This script fails when there is no valid response from the mina client status command.

On 1.2.0beta2, on mainnet, the node ran ~2 days with no issues (status was always returned and the node was always synced). Today it started failing periodically - I assume there is a 30-second timeout, because each logged failure is reported ~30s later than expected. Logs from mina-status-monitor.sh should be helpful to pinpoint specific incidents for correlation with the daemon logs.
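
Roughly, the check looks like this (sketched in Python for illustration; the actual monitor is a shell script, and the 30-second timeout and the sync_status field name are assumptions on my part):

```python
#!/usr/bin/env python3
# Sketch of a status check along the lines of mina-status-monitor.sh.
# The timeout value and "sync_status" field name are assumptions; the real
# script and status output may differ.
import json
import subprocess

STATUS_TIMEOUT_SECS = 30  # assumed, based on the ~30s delay per logged failure

def node_is_healthy() -> bool:
    # Run `mina client status -json` and treat a timeout, a non-zero exit, or
    # unparseable output as the failure mode described in this comment.
    try:
        result = subprocess.run(
            ["mina", "client", "status", "-json"],
            capture_output=True,
            text=True,
            timeout=STATUS_TIMEOUT_SECS,
            check=True,
        )
        status = json.loads(result.stdout)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, json.JSONDecodeError):
        return False
    # "sync_status" is an assumed field name for the node's sync state.
    return status.get("sync_status") == "Synced"

if __name__ == "__main__":
    if node_is_healthy():
        print("node synced; no action")
    else:
        print("no valid status or not synced; a real monitor might restart the node here")
```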

I'm also running mina client export-logs every 4 hours and have provided a sample of 3 intervals. That is more logs and some redundancy vs. what is strictly necessary, but I decided to overshare (and did not have the time to sort / isolate the failure windows - sorry!). The relevant logs are shared here:
https://drive.google.com/drive/folders/1AZSv-3f4GaEX9iM30GB9tPsuP5MVS_74?usp=sharing

In mina-status-monitor.2021-07-19.log, you will see the following events, including the first failure:
Jul 16 22:20:11 - node in bootstrap
Jul 16 22:25:12 - node in sync
Jul 19 02:16:42 - first mina client status failure
Jul 19 05:53:43 - sidecar restarted because it didn't report as many times as expected
Jul 19 07:19:53 - periodic mina client status failures at all of the times below:
Jul 19 09:01:49
Jul 19 09:42:58
Jul 19 11:19:41
Jul 19 13:10:48
Jul 19 14:02:00
Jul 19 15:33:08
Jul 19 16:49:26
Jul 19 17:50:30
Jul 19 18:36:23

@deepthiskumar deepthiskumar added this to Discuss in United Sprint Board via automation Jul 23, 2021
@deepthiskumar deepthiskumar moved this from Discuss to In progress in United Sprint Board Jul 23, 2021