Improve rollbacks #570

Closed
kderme opened this issue Apr 2, 2021 · 5 comments

kderme commented Apr 2, 2021

There are some issues in how db-sync reacts to rollbacks.

After https://github.com/input-output-hk/cardano-db-sync/pull/499/files#diff-2700a1bfb3d0c533301dcae492070ff90357e45ac6ee16945ed800929d9a985fL253 we no longer delete ledger state files when a rollback happens. The function deleteNewerLedgerStateFiles https://github.com/input-output-hk/cardano-db-sync/pull/499/files#diff-2700a1bfb3d0c533301dcae492070ff90357e45ac6ee16945ed800929d9a985fR158, which was introduced in the same PR, is only called on startup and not after a rollback. It also takes a SlotNo instead of a Point, which means that we may fail to delete a file when there is a slot collision.
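To illustrate the slot collision concern, here is a minimal sketch. It assumes a snapshot is identified by a slot plus a block hash; the `Point` and `Snapshot` types and both functions are hypothetical stand-ins, not the actual db-sync or ouroboros-network definitions.

```haskell
import Data.Word (Word64)

-- Hypothetical stand-ins for illustration only.
data Point = Point
  { pointSlot :: Word64
  , pointHash :: String
  } deriving (Eq, Show)

newtype Snapshot = Snapshot { snapshotPoint :: Point }
  deriving (Eq, Show)

-- Slot-only rule: a snapshot is stale only if its slot is strictly
-- newer than the rollback slot.
staleBySlot :: Word64 -> [Snapshot] -> [Snapshot]
staleBySlot slot = filter (\s -> pointSlot (snapshotPoint s) > slot)

-- Point-based rule: additionally treat a snapshot at the same slot
-- but with a different hash (a block from another fork) as stale.
staleByPoint :: Point -> [Snapshot] -> [Snapshot]
staleByPoint target = filter isStale
  where
    isStale s =
      let p = snapshotPoint s
      in pointSlot p > pointSlot target
           || (pointSlot p == pointSlot target
                 && pointHash p /= pointHash target)

-- Made-up example data: the snapshot at slot 100 is from a different
-- fork than the rollback target used in main.
exampleSnapshots :: [Snapshot]
exampleSnapshots =
  [ Snapshot (Point 99 "aaa")
  , Snapshot (Point 100 "bbb")
  , Snapshot (Point 101 "ccc")
  ]

main :: IO ()
main = do
  let target = Point 100 "xyz"
  print (staleBySlot (pointSlot target) exampleSnapshots)
  print (staleByPoint target exampleSnapshots)
```

With this data the slot-only rule selects only the snapshot at slot 101, while the point-based rule also selects the one at slot 100, which belongs to a different block than the rollback target.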

I think the issue existed even before this and has to do with the way we use the ChainSync protocol. When we have to roll back to a point, it's possible that we don't have a ledger state snapshot for it. In this case we have to try to find an intersection, using SendMsgFindIntersect, with the points we do have. Currently we only send this on startup, which means the only way to recover is to restart db-sync.
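A rough sketch of the decision described above, as a pure model that does not use the real ChainSync client types. `Point`, `RollbackAction` and `onRollback` are hypothetical names, and offering candidates newest first with genesis as the last resort is an assumption, not something db-sync is confirmed to do.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)
import Data.Word (Word64)

-- Hypothetical model of a chain point: genesis, or a slot plus hash.
data Point = Genesis | At Word64 String
  deriving (Eq, Show)

-- What to do when the node asks us to roll back.
data RollbackAction
  = RestoreSnapshot Point   -- we have a snapshot for exactly this point
  | FindIntersect [Point]   -- points to offer when looking for an intersection
  deriving (Eq, Show)

slotOf :: Point -> Maybe Word64
slotOf Genesis  = Nothing
slotOf (At s _) = Just s

-- If a snapshot exists for the rollback point, restore it. Otherwise
-- offer the older snapshot points, newest first, ending with genesis.
onRollback :: Point -> [Point] -> RollbackAction
onRollback target snapshots
  | target `elem` snapshots = RestoreSnapshot target
  | otherwise = FindIntersect (candidates ++ [Genesis])
  where
    candidates =
      sortBy (comparing (Down . slotOf))
        [ p | p <- snapshots, p /= Genesis, slotOf p < slotOf target ]

main :: IO ()
main = do
  let snapshots = [At 80 "aaa", At 90 "bbb", At 120 "ccc"]
  print (onRollback (At 90 "bbb") snapshots)
  print (onRollback (At 100 "ddd") snapshots)
```

The second call has no snapshot for the rollback point, so the model falls back to an intersection search over the points at slots 90 and 80 and then genesis, rather than requiring a restart.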

@kderme kderme self-assigned this Apr 2, 2021

erikd commented Apr 3, 2021

This was definitely not a problem until just recently. I have had db-sync die three times in the last 12 hours, whereas I have had a db-sync instance running 24/7 for at least 6 months and had not seen this previously.

kderme commented Apr 3, 2021

Did you also change db-sync version recently?

erikd commented Apr 3, 2021

Yes, in fact this is running from the kderme/time-ledger branch after being rebased against master.

I am syncing with a version built from master right now.

erikd commented Apr 5, 2021

This is weird. I have now had a db-sync instance running from master for over 12 hours and have not once seen it fall over with this problem, but the one built from the kderme/time-ledger branch fell over 3 times in 12 hours. The weird thing is that the LedgerState file is not modified on that branch. I therefore think the reason I have not hit any issue with this is simply down to luck.

erikd commented Apr 6, 2021

This is weird. I have now had a db-sync instance running from master for over 24 hours and have not once seen it fall over with this problem, but the one built from the kderme/time-ledger branch fell over 3 times in 12 hours. The weird thing is that the LedgerState.hs file is not modified on that branch. I therefore think the reason I have not hit any issue with this is simply down to luck.
