-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ChainDB: allow EBB with same block number as the immutable block #1625
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to write a proper test for this. I have verified that this fixes the bug with a real node.
Potential testing approach: add WipeVolatileDB
command to the ChainDB q-s-m tests. Partially wiping the VolatileDB or even truncating the ImmutableDB would be better but is much harder to accomplish because of implementation details.
-- If we switch to PBFT for these tests, this case is not required anymore | ||
-- TODO: We should not make assumptions about the underlying | ||
-- ledger. We will fix this in | ||
-- <https://github.com/input-output-hk/ouroboros-network/issues/1571> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This special case is gone without #1571. Unverified: maybe we only needed it to not ignore the EBB, but we still don't adopt EBBs immediately when adding them, only when their successor is added? If that's true, then PBFT would change that, but then it is not that important to make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hang on, didn't we verify this? And it still failed? What am I getting confused with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This will have a minor conflict with my PR I guess.)
instance ModelSupportsBlock TestBlock where | ||
isEBB = const IsNotEBB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit annoying: this TestBlock lives in test-util
, so it cant implement ModelSupportsBlock
(test-consensus
) itself, hence the orphans 😞.
-- Exception: the block we're adding is the EBB with the same block | ||
-- number as the immutable tip. This can only happen when the | ||
-- VolatileDB was corrupted. | ||
not (At (blockNo hdr) == immBlockNo && cdbIsEBB hdr == IsEBB) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we introduce an olderThanK
predicate, document it with this comment, and then use that here and above rather than duplicate this logic?
-- If we switch to PBFT for these tests, this case is not required anymore | ||
-- TODO: We should not make assumptions about the underlying | ||
-- ledger. We will fix this in | ||
-- <https://github.com/input-output-hk/ouroboros-network/issues/1571> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hang on, didn't we verify this? And it still failed? What am I getting confused with?
=> NodeConfig (BlockProtocol blk) | ||
-> blk | ||
-> Model blk -> Model blk | ||
addBlock cfg blk m | ||
-- If the block is as old as the tip of the ImmutableDB, i.e. older than | ||
-- @k@, we ignore it, as we can never switch to it. TODO what about EBBs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about EBBs indeed 😠
-- If we switch to PBFT for these tests, this case is not required anymore | ||
-- TODO: We should not make assumptions about the underlying | ||
-- ledger. We will fix this in | ||
-- <https://github.com/input-output-hk/ouroboros-network/issues/1571> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This will have a minor conflict with my PR I guess.)
, not addingGenesisEBBToEmptyDB | ||
-- @k@, we ignore it, as we can never switch to it. | ||
| At (Block.blockNo blk) <= immBlockNo | ||
, not (At (Block.blockNo blk) == immBlockNo && isEBB (getHeader blk) == IsEBB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is basically the same change from the real DB to the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my last version, I reuse the new function from .ChainSel
in the model.
aa1dc7a
to
2be600b
Compare
2be600b
to
bade99b
Compare
I have created #1628 for testing this properly. Note that I have already verified it manually using |
bors r+ |
To verify whether #1625 correctly fixed the bug found in #1624, I reverted the fix from #1625 in the real implementation (but not in the model), i.e.: ```diff modified ouroboros-consensus/src/Ouroboros/Consensus/Storage/ChainDB/Impl/ChainSel.hs @@ -224,7 +224,7 @@ addBlockSync cdb@CDB {..} BlockToAdd { blockToAdd = b, .. } = do -- ### Ignore if - | olderThanK hdr (cdbIsEBB hdr) immBlockNo -> do + | At (blockNo hdr) <= immBlockNo && blockNo hdr /= 0 -> trace $ IgnoreBlockOlderThanK (blockRealPoint b) deliverPromises curTip ``` and ran many ChainDB.StateMachine tests. Disappointingly, the tests never failed because of this bug (they did trigger a bug in the previous commit, though). After a long investigation, I discovered it was because the storage `TestBlock` uses BFT, and its `ConsensusProtocol` and `ValidateEnvelope` instances, which do not account for EBBs, *in conjunction with EBBs*. Many blocks (especially the first block or EBB) we generated that we thought were valid, were actually invalid because they didn't have the expected block number! As a consequence, we were not able to generate a scenario in which the bug manifested itself. The fix: * Use the `ModChainSel` combinator to override `compareCandidates` to be EBB-aware, like in PBFT. This approach is simpler than switching to PBFT entirely, as validation and other things are more complex in PBFT, and we want to keep the test block as simple as possible. * Override the `ValidateEnvelope` instance to be EBB-aware. Similar to Byron's instance, but where Byron *requires* the chain to start with an EBB with block number 0, we allow either such an EBB or a regular block with block number 1. With these changes, we can finally trigger the bug in <1000 tests, e.g., k = 2, chunk/epoch size irrelevant, EBBs allowed * Add and adopt regular block A fitting on genesis * Add and adopt regular block B fitting on A * Add and adopt regular block C fitting on B * Run background tasks: block A is copied to the ImmutableDB * Wipe the VolatileDB: A is now the immutable tip, B and C are gone * Add and adopt the EBB of the next epoch, fitting on A. This EBB will have the same block number as A. Before #1625, the EBB was ignored instead of adopted, with the fix it's not. Moreover, move the test config creation to the test block module.
To verify whether #1625 correctly fixed the bug found in #1624, I reverted the fix from #1625 in the real implementation (but not in the model), i.e.: ```diff modified ouroboros-consensus/src/Ouroboros/Consensus/Storage/ChainDB/Impl/ChainSel.hs @@ -224,7 +224,7 @@ addBlockSync cdb@CDB {..} BlockToAdd { blockToAdd = b, .. } = do -- ### Ignore if - | olderThanK hdr (cdbIsEBB hdr) immBlockNo -> do + | At (blockNo hdr) <= immBlockNo && blockNo hdr /= 0 -> trace $ IgnoreBlockOlderThanK (blockRealPoint b) deliverPromises curTip ``` and ran many ChainDB.StateMachine tests. Disappointingly, the tests never failed because of this bug (they did trigger a bug in the previous commit, though). After a long investigation, I discovered it was because the storage `TestBlock` uses BFT, and its `ConsensusProtocol` and `ValidateEnvelope` instances, which do not account for EBBs, *in conjunction with EBBs*. Many blocks (especially the first block or EBB) we generated that we thought were valid, were actually invalid because they didn't have the expected block number! As a consequence, we were not able to generate a scenario in which the bug manifested itself. The fix: * Use the `ModChainSel` combinator to override `compareCandidates` to be EBB-aware, like in PBFT. This approach is simpler than switching to PBFT entirely, as validation and other things are more complex in PBFT, and we want to keep the test block as simple as possible. * Override the `ValidateEnvelope` instance to be EBB-aware. Similar to Byron's instance, but where Byron *requires* the chain to start with an EBB with block number 0, we allow either such an EBB or a regular block with block number 1. With these changes, we can finally trigger the bug in <1000 tests, e.g., k = 2, chunk/epoch size irrelevant, EBBs allowed * Add and adopt regular block A fitting on genesis * Add and adopt regular block B fitting on A * Add and adopt regular block C fitting on B * Run background tasks: block A is copied to the ImmutableDB * Wipe the VolatileDB: A is now the immutable tip, B and C are gone * Add and adopt the EBB of the next epoch, fitting on A. This EBB will have the same block number as A. Before #1625, the EBB was ignored instead of adopted, with the fix it's not. Moreover, move the test config creation to the test block module.
To verify whether #1625 correctly fixed the bug found in #1624, I reverted the fix from #1625 in the real implementation (but not in the model), i.e.: ```diff modified ouroboros-consensus/src/Ouroboros/Consensus/Storage/ChainDB/Impl/ChainSel.hs @@ -224,7 +224,7 @@ addBlockSync cdb@CDB {..} BlockToAdd { blockToAdd = b, .. } = do -- ### Ignore if - | olderThanK hdr (cdbIsEBB hdr) immBlockNo -> do + | At (blockNo hdr) <= immBlockNo && blockNo hdr /= 0 -> trace $ IgnoreBlockOlderThanK (blockRealPoint b) deliverPromises curTip ``` and ran many ChainDB.StateMachine tests. Disappointingly, the tests never failed because of this bug (they did trigger a bug in the previous commit, though). After a long investigation, I discovered it was because the storage `TestBlock` uses BFT, and its `ConsensusProtocol` and `ValidateEnvelope` instances, which do not account for EBBs, *in conjunction with EBBs*. Many blocks (especially the first block or EBB) we generated that we thought were valid, were actually invalid because they didn't have the expected block number! As a consequence, we were not able to generate a scenario in which the bug manifested itself. The fix: * Use the `ModChainSel` combinator to override `compareCandidates` to be EBB-aware, like in PBFT. This approach is simpler than switching to PBFT entirely, as validation and other things are more complex in PBFT, and we want to keep the test block as simple as possible. * Override the `ValidateEnvelope` instance to be EBB-aware. Similar to Byron's instance, but where Byron *requires* the chain to start with an EBB with block number 0, we allow either such an EBB or a regular block with block number 1. With these changes, we can finally trigger the bug in <1000 tests, e.g., k = 2, chunk/epoch size irrelevant, EBBs allowed * Add and adopt regular block A fitting on genesis * Add and adopt regular block B fitting on A * Add and adopt regular block C fitting on B * Run background tasks: block A is copied to the ImmutableDB * Wipe the VolatileDB: A is now the immutable tip, B and C are gone * Add and adopt the EBB of the next epoch, fitting on A. This EBB will have the same block number as A. Before #1625, the EBB was ignored instead of adopted, with the fix it's not. Moreover, move the test config creation to the test block module.
Fixes #1624.