Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChainDB: allow EBB with same block number as the immutable block #1625

Merged
merged 1 commit into from
Feb 12, 2020

Conversation

mrBliss
Copy link
Contributor

@mrBliss mrBliss commented Feb 11, 2020

Fixes #1624.

@mrBliss mrBliss added the consensus issues related to ouroboros-consensus label Feb 11, 2020
@mrBliss mrBliss requested a review from edsko February 11, 2020 18:31
Copy link
Contributor Author

@mrBliss mrBliss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to write a proper test for this. I have verified that this fixes the bug with a real node.

Potential testing approach: add WipeVolatileDB command to the ChainDB q-s-m tests. Partially wiping the VolatileDB or even truncating the ImmutableDB would be better but is much harder to accomplish because of implementation details.

-- If we switch to PBFT for these tests, this case is not required anymore
-- TODO: We should not make assumptions about the underlying
-- ledger. We will fix this in
-- <https://github.com/input-output-hk/ouroboros-network/issues/1571>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This special case is gone without #1571. Unverified: maybe we only needed it to not ignore the EBB, but we still don't adopt EBBs immediately when adding them, only when their successor is added? If that's true, then PBFT would change that, but then it is not that important to make that change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hang on, didn't we verify this? And it still failed? What am I getting confused with?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This will have a minor conflict with my PR I guess.)

Comment on lines +102 to +106
instance ModelSupportsBlock TestBlock where
isEBB = const IsNotEBB
Copy link
Contributor Author

@mrBliss mrBliss Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit annoying: this TestBlock lives in test-util, so it cant implement ModelSupportsBlock (test-consensus) itself, hence the orphans 😞.

-- Exception: the block we're adding is the EBB with the same block
-- number as the immutable tip. This can only happen when the
-- VolatileDB was corrupted.
not (At (blockNo hdr) == immBlockNo && cdbIsEBB hdr == IsEBB) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we introduce an olderThanK predicate, document it with this comment, and then use that here and above rather than duplicate this logic?

-- If we switch to PBFT for these tests, this case is not required anymore
-- TODO: We should not make assumptions about the underlying
-- ledger. We will fix this in
-- <https://github.com/input-output-hk/ouroboros-network/issues/1571>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hang on, didn't we verify this? And it still failed? What am I getting confused with?

=> NodeConfig (BlockProtocol blk)
-> blk
-> Model blk -> Model blk
addBlock cfg blk m
-- If the block is as old as the tip of the ImmutableDB, i.e. older than
-- @k@, we ignore it, as we can never switch to it. TODO what about EBBs?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about EBBs indeed 😠

-- If we switch to PBFT for these tests, this case is not required anymore
-- TODO: We should not make assumptions about the underlying
-- ledger. We will fix this in
-- <https://github.com/input-output-hk/ouroboros-network/issues/1571>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This will have a minor conflict with my PR I guess.)

, not addingGenesisEBBToEmptyDB
-- @k@, we ignore it, as we can never switch to it.
| At (Block.blockNo blk) <= immBlockNo
, not (At (Block.blockNo blk) == immBlockNo && isEBB (getHeader blk) == IsEBB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is basically the same change from the real DB to the model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my last version, I reuse the new function from .ChainSel in the model.

@mrBliss
Copy link
Contributor Author

mrBliss commented Feb 12, 2020

I have created #1628 for testing this properly. Note that I have already verified it manually using cardano-node (I discovered this bug when I wiped the VolatileDB of a cardano-node).

@mrBliss
Copy link
Contributor Author

mrBliss commented Feb 12, 2020

bors r+

iohk-bors bot added a commit that referenced this pull request Feb 12, 2020
1625: ChainDB: allow EBB with same block number as the immutable block r=mrBliss a=mrBliss

Fixes #1624.

Co-authored-by: Thomas Winant <thomas@well-typed.com>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Feb 12, 2020

@iohk-bors iohk-bors bot merged commit bade99b into master Feb 12, 2020
@iohk-bors iohk-bors bot deleted the mrBliss/fix-1624 branch February 12, 2020 16:54
mrBliss added a commit that referenced this pull request Mar 24, 2020
Fixes #1628.

Add the `WipeVolDB` which deletes the entire VolatileDB. This tests the
bug (#1624) that was fixed in #1625.

We now remember in the model which blocks got copied to the ImmutableDB,
because they should survive a `WipeVolDB` command.
mrBliss added a commit that referenced this pull request Mar 24, 2020
To verify whether #1625 correctly fixed the bug found in #1624, I reverted the
fix from #1625 in the real implementation (but not in the model), i.e.:

```diff
modified   ouroboros-consensus/src/Ouroboros/Consensus/Storage/ChainDB/Impl/ChainSel.hs
@@ -224,7 +224,7 @@ addBlockSync cdb@CDB {..} BlockToAdd { blockToAdd = b, .. } = do

     -- ### Ignore
     if
-      | olderThanK hdr (cdbIsEBB hdr) immBlockNo -> do
+      | At (blockNo hdr) <= immBlockNo && blockNo hdr /= 0 ->
         trace $ IgnoreBlockOlderThanK (blockRealPoint b)
         deliverPromises curTip
```

and ran many ChainDB.StateMachine tests. Disappointingly, the tests never
failed because of this bug (they did trigger a bug in the previous commit,
though).

After a long investigation, I discovered it was because the storage
`TestBlock` uses BFT, and its `ConsensusProtocol` and `ValidateEnvelope`
instances, which do not account for EBBs, *in conjunction with EBBs*. Many
blocks (especially the first block or EBB) we generated that we thought were
valid, were actually invalid because they didn't have the expected block
number! As a consequence, we were not able to generate a scenario in which the
bug manifested itself.

The fix:

* Use the `ModChainSel` combinator to override `compareCandidates` to be
  EBB-aware, like in PBFT. This approach is simpler than switching to PBFT
  entirely, as validation and other things are more complex in PBFT, and we
  want to keep the test block as simple as possible.
* Override the `ValidateEnvelope` instance to be EBB-aware. Similar to Byron's
  instance, but where Byron *requires* the chain to start with an EBB with block
  number 0, we allow either such an EBB or a regular block with block number
  1.

With these changes, we can finally trigger the bug in <1000 tests, e.g.,

k = 2, chunk/epoch size irrelevant, EBBs allowed
* Add and adopt regular block A fitting on genesis
* Add and adopt regular block B fitting on A
* Add and adopt regular block C fitting on B
* Run background tasks: block A is copied to the ImmutableDB
* Wipe the VolatileDB: A is now the immutable tip, B and C are gone
* Add and adopt the EBB of the next epoch, fitting on A. This EBB will have
  the same block number as A. Before #1625, the EBB was ignored instead of
  adopted, with the fix it's not.

Moreover, move the test config creation to the test block module.
mrBliss added a commit that referenced this pull request Mar 31, 2020
Fixes #1628.

Add the `WipeVolDB` which deletes the entire VolatileDB. This tests the
bug (#1624) that was fixed in #1625.

We now remember in the model which blocks got copied to the ImmutableDB,
because they should survive a `WipeVolDB` command.
mrBliss added a commit that referenced this pull request Mar 31, 2020
To verify whether #1625 correctly fixed the bug found in #1624, I reverted the
fix from #1625 in the real implementation (but not in the model), i.e.:

```diff
modified   ouroboros-consensus/src/Ouroboros/Consensus/Storage/ChainDB/Impl/ChainSel.hs
@@ -224,7 +224,7 @@ addBlockSync cdb@CDB {..} BlockToAdd { blockToAdd = b, .. } = do

     -- ### Ignore
     if
-      | olderThanK hdr (cdbIsEBB hdr) immBlockNo -> do
+      | At (blockNo hdr) <= immBlockNo && blockNo hdr /= 0 ->
         trace $ IgnoreBlockOlderThanK (blockRealPoint b)
         deliverPromises curTip
```

and ran many ChainDB.StateMachine tests. Disappointingly, the tests never
failed because of this bug (they did trigger a bug in the previous commit,
though).

After a long investigation, I discovered it was because the storage
`TestBlock` uses BFT, and its `ConsensusProtocol` and `ValidateEnvelope`
instances, which do not account for EBBs, *in conjunction with EBBs*. Many
blocks (especially the first block or EBB) we generated that we thought were
valid, were actually invalid because they didn't have the expected block
number! As a consequence, we were not able to generate a scenario in which the
bug manifested itself.

The fix:

* Use the `ModChainSel` combinator to override `compareCandidates` to be
  EBB-aware, like in PBFT. This approach is simpler than switching to PBFT
  entirely, as validation and other things are more complex in PBFT, and we
  want to keep the test block as simple as possible.
* Override the `ValidateEnvelope` instance to be EBB-aware. Similar to Byron's
  instance, but where Byron *requires* the chain to start with an EBB with block
  number 0, we allow either such an EBB or a regular block with block number
  1.

With these changes, we can finally trigger the bug in <1000 tests, e.g.,

k = 2, chunk/epoch size irrelevant, EBBs allowed
* Add and adopt regular block A fitting on genesis
* Add and adopt regular block B fitting on A
* Add and adopt regular block C fitting on B
* Run background tasks: block A is copied to the ImmutableDB
* Wipe the VolatileDB: A is now the immutable tip, B and C are gone
* Add and adopt the EBB of the next epoch, fitting on A. This EBB will have
  the same block number as A. Before #1625, the EBB was ignored instead of
  adopted, with the fix it's not.

Moreover, move the test config creation to the test block module.
mrBliss added a commit that referenced this pull request Apr 1, 2020
Fixes #1628.

Add the `WipeVolDB` which deletes the entire VolatileDB. This tests the
bug (#1624) that was fixed in #1625.

We now remember in the model which blocks got copied to the ImmutableDB,
because they should survive a `WipeVolDB` command.
mrBliss added a commit that referenced this pull request Apr 1, 2020
To verify whether #1625 correctly fixed the bug found in #1624, I reverted the
fix from #1625 in the real implementation (but not in the model), i.e.:

```diff
modified   ouroboros-consensus/src/Ouroboros/Consensus/Storage/ChainDB/Impl/ChainSel.hs
@@ -224,7 +224,7 @@ addBlockSync cdb@CDB {..} BlockToAdd { blockToAdd = b, .. } = do

     -- ### Ignore
     if
-      | olderThanK hdr (cdbIsEBB hdr) immBlockNo -> do
+      | At (blockNo hdr) <= immBlockNo && blockNo hdr /= 0 ->
         trace $ IgnoreBlockOlderThanK (blockRealPoint b)
         deliverPromises curTip
```

and ran many ChainDB.StateMachine tests. Disappointingly, the tests never
failed because of this bug (they did trigger a bug in the previous commit,
though).

After a long investigation, I discovered it was because the storage
`TestBlock` uses BFT, and its `ConsensusProtocol` and `ValidateEnvelope`
instances, which do not account for EBBs, *in conjunction with EBBs*. Many
blocks (especially the first block or EBB) we generated that we thought were
valid, were actually invalid because they didn't have the expected block
number! As a consequence, we were not able to generate a scenario in which the
bug manifested itself.

The fix:

* Use the `ModChainSel` combinator to override `compareCandidates` to be
  EBB-aware, like in PBFT. This approach is simpler than switching to PBFT
  entirely, as validation and other things are more complex in PBFT, and we
  want to keep the test block as simple as possible.
* Override the `ValidateEnvelope` instance to be EBB-aware. Similar to Byron's
  instance, but where Byron *requires* the chain to start with an EBB with block
  number 0, we allow either such an EBB or a regular block with block number
  1.

With these changes, we can finally trigger the bug in <1000 tests, e.g.,

k = 2, chunk/epoch size irrelevant, EBBs allowed
* Add and adopt regular block A fitting on genesis
* Add and adopt regular block B fitting on A
* Add and adopt regular block C fitting on B
* Run background tasks: block A is copied to the ImmutableDB
* Wipe the VolatileDB: A is now the immutable tip, B and C are gone
* Add and adopt the EBB of the next epoch, fitting on A. This EBB will have
  the same block number as A. Before #1625, the EBB was ignored instead of
  adopted, with the fix it's not.

Moreover, move the test config creation to the test block module.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus issues related to ouroboros-consensus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Chain DB can't add the next EBB when the VolatileDB is empty
3 participants