Initial sync of blocks gets stuck #23

Closed
gmale opened this issue Apr 8, 2019 · 8 comments

Comments

@gmale
Contributor

gmale commented Apr 8, 2019

During initialization, the CompactBlockProcessor sometimes gets stuck while processing compact blocks. This seems to be related to missing data on the server, but the root cause has not been fully verified.

We should either fix the root cause or improve the behavior so that errors are better handled.

Summary of findings:

| Issue | Notes |
|---|---|
| Slow scans | Related to a sanity check that can be removed |
| lightwalletd gets stuck | Created zcash/lightwalletd#32 to track. Needs additional debugging. |
| Reorgs | Impacts all layers. Needs changes in compact block spec. |
| Error Handling | SDK can handle errors better by splitting processor into writer and scanner |
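
To illustrate the "Error Handling" row, here is a rough sketch of what splitting a processor into a writer stage and a scanner stage could look like. It is written in Rust for illustration only and is not the SDK's code; `SyncError`, `BlockCache`, `write_blocks`, and `scan_blocks` are all hypothetical names, and the "empty block is unscannable" rule is a placeholder for real scan failures.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type: each stage reports the height where it failed.
#[derive(Debug, PartialEq)]
enum SyncError {
    Download { height: u32 },
    Scan { height: u32 },
}

/// Hypothetical local cache of raw compact blocks, keyed by height.
#[derive(Default)]
struct BlockCache {
    blocks: BTreeMap<u32, Vec<u8>>,
}

/// Writer stage: stores fetched blocks and stops at the first download error.
/// Blocks stored before the failure stay in the cache, so a retry can resume.
fn write_blocks<I>(cache: &mut BlockCache, fetched: I) -> Result<(), SyncError>
where
    I: IntoIterator<Item = Result<(u32, Vec<u8>), SyncError>>,
{
    for item in fetched {
        let (height, bytes) = item?;
        cache.blocks.insert(height, bytes);
    }
    Ok(())
}

/// Scanner stage: scans cached blocks above `last_scanned` in height order
/// and returns the new last scanned height.
fn scan_blocks(cache: &BlockCache, last_scanned: u32) -> Result<u32, SyncError> {
    let mut tip = last_scanned;
    for (&height, bytes) in cache.blocks.range(last_scanned + 1..) {
        // Placeholder rule (assumption): an empty block cannot be scanned.
        if bytes.is_empty() {
            return Err(SyncError::Scan { height });
        }
        tip = height;
    }
    Ok(tip)
}
```

Because each stage returns its own error variant, a caller can retry a failed download without rescanning, or rewind the scanner without downloading again.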
@gmale
Contributor Author

gmale commented Apr 10, 2019

Update: Error seems to be from slow scanning.

  • I've ruled out missing blocks from the server as the culprit
    • by rebuilding the lightwalletd server to remove that error (the dev-infra team did the repair)
  • I've ruled out thread deadlock
    • First, the code doesn't support this theory. Closer inspection showed that downloading pauses while scanning because both run sequentially in the same coroutine.
    • Second, after adding logs to the scan logic to see if/where it paused, it turned out not to pause at all. Instead, a scan was taking over an hour to complete, which is a major departure from previous behavior.
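
The timing logs mentioned above can be approximated with a small wrapper like the following. This is a sketch in Rust rather than the SDK's own code; `timed`, `scan_batch`, and `scan_with_logging` are hypothetical names, and `scan_batch` only stands in for the real scan step.

```rust
use std::time::{Duration, Instant};

/// Runs `step` and returns its result together with the wall-clock time taken.
fn timed<T>(step: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = step();
    (result, start.elapsed())
}

/// Hypothetical stand-in for scanning a batch of blocks; returns how many were scanned.
fn scan_batch(heights: &[u32]) -> usize {
    heights.len()
}

/// Scans a batch and logs how long it took, so a slow scan can be told apart from a stalled one.
fn scan_with_logging(heights: &[u32]) -> usize {
    let (scanned, elapsed) = timed(|| scan_batch(heights));
    eprintln!("scanned {} blocks in {:?}", scanned, elapsed);
    scanned
}
```

A log line after every batch shows whether the scan is still making progress, which is how a slow scan can be distinguished from a deadlock.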

@gmale
Contributor Author

gmale commented Apr 16, 2019

Note: There seems to be a separate issue with the lightwalletd server getting stuck on a block. That needs to be investigated separately; I will poke around and then raise an issue on the lightwalletd repo.

@str4d
Contributor

str4d commented Apr 16, 2019

@gmale collected logs of the scanning process, and it's obvious from them that the slow part of scanning is here:

https://github.com/str4d/librustzcash/blob/247bb00ab45e0e776989ba1587713f9f16a6bdbf/zcash_client_backend/src/sqlite.rs#L565-L588

This was a sanity check we added in #12 alongside fixing a witness-updating bug in the librustzcash code. We can amend this check to only be performed when the code is compiled in debug mode (which itself will be inherently slow), but we should also figure out why it is quite so slow. My suspicion is that it is the tree root calculations, so I will do some benchmarking of zcash/librustzcash#68 and figure out if we can make it more efficient.
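
A minimal sketch of restricting an expensive sanity check to debug builds, assuming a check that recomputes a root from scratch and compares it to an incrementally maintained one. `TreeState`, `mix`, and `recompute_root` are hypothetical placeholders and not librustzcash APIs; `mix` is a toy operation, not a real hash. `debug_assert_eq!` is the standard Rust macro whose check runs only when debug assertions are enabled, which by default means non-optimized builds.

```rust
/// Hypothetical incremental state: all leaves seen so far plus a cached root.
struct TreeState {
    leaves: Vec<u64>,
    cached_root: u64,
}

/// Placeholder mixing step (assumption); a real tree would hash here.
fn mix(acc: u64, leaf: u64) -> u64 {
    acc.rotate_left(5) ^ leaf
}

/// Recomputes the root from every leaf. The cost grows with the number of
/// leaves, which is what makes running it for every block expensive.
fn recompute_root(leaves: &[u64]) -> u64 {
    leaves.iter().fold(0, |acc, &leaf| mix(acc, leaf))
}

impl TreeState {
    fn new() -> Self {
        TreeState { leaves: Vec::new(), cached_root: 0 }
    }

    /// Appends one block's leaves, updating the cached root incrementally.
    fn append_block(&mut self, block_leaves: &[u64]) -> u64 {
        for &leaf in block_leaves {
            self.leaves.push(leaf);
            self.cached_root = mix(self.cached_root, leaf);
        }
        // Sanity check: the full recomputation is evaluated only when debug
        // assertions are enabled, so optimized builds skip its cost.
        debug_assert_eq!(
            self.cached_root,
            recompute_root(&self.leaves),
            "cached root diverged from full recomputation"
        );
        self.cached_root
    }
}
```

With this shape the check still guards development builds against witness-updating bugs, while optimized builds skip the recomputation entirely.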

@gmale
Contributor Author

gmale commented Apr 16, 2019

Further investigation with @ianamunoz has uncovered a related issue around handling reorgs:
zcash/lightwalletd#32

@gmale
Contributor Author

gmale commented Apr 16, 2019

Next Step: verify that the workaround discussed with @str4d works. Also, tweak lightwalletd to handle reorgs better. Then retest.

@gmale
Contributor Author

gmale commented Apr 23, 2019

Update: Confirmed that the workaround to speed up scanning is highly effective (roughly 35x faster). This makes the client more functional, but the gain is moot if the server continues to get stuck after reorgs.

Next Step: implement reorg handling at all layers. Consider closing this issue and opening more focused ones for the remaining work.
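
For context, one common way a client can detect a reorg is to check that each incoming block's previous-block hash matches the hash of the block it already holds at the preceding height. The sketch below assumes that approach; it is not taken from the SDK, lightwalletd, or librustzcash, and `Header`, `Continuity`, and `check_continuity` (including the field names and the 4-byte hashes) are hypothetical.

```rust
/// Hypothetical minimal block header; field names and hash width are assumptions.
#[derive(Debug, PartialEq)]
struct Header {
    height: u32,
    hash: [u8; 4],
    prev_hash: [u8; 4],
}

/// Result of comparing an incoming block against the locally stored chain.
#[derive(Debug, PartialEq)]
enum Continuity {
    /// The block builds directly on the current tip (or the chain is empty).
    Extends,
    /// The block builds on an older stored block; rewind to that height first.
    RewindTo(u32),
    /// No stored block matches the incoming block's parent.
    UnknownParent,
}

fn check_continuity(chain: &[Header], incoming: &Header) -> Continuity {
    match chain.last() {
        None => Continuity::Extends,
        Some(tip)
            if incoming.prev_hash == tip.hash && incoming.height == tip.height + 1 =>
        {
            Continuity::Extends
        }
        // Otherwise search backwards for the stored block the incoming one builds on.
        Some(_) => chain
            .iter()
            .rev()
            .find(|h| h.hash == incoming.prev_hash && h.height + 1 == incoming.height)
            .map_or(Continuity::UnknownParent, |h| Continuity::RewindTo(h.height)),
    }
}
```

A check like this only works if each block carries its parent's hash, which would explain why reorg handling touches the block format as well as the client.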

@mms710

mms710 commented Apr 24, 2019

Issues opened for the other components of this ticket:
Error Handling: #25
Reorgs: #26, zcash/librustzcash#75
Slow scans:

Once these three issues and zcash/lightwalletd#32 are closed, and reorg handling is fixed in librustzcash and lightwalletd, we can close this ticket.

@gmale
Contributor Author

gmale commented Sep 12, 2019

Addressed in #39

@gmale gmale closed this as completed Sep 12, 2019