Fix cloneBlockchainTime #1506
@@ -0,0 +1,620 @@
{-# LANGUAGE LambdaCase #-}
See the commits for a delta without the renaming (can't pass the git diff -M
option to GitHub's PR renderer?).
I've pushed new commits and updated the PR description, but the commit history is still just a trail of breadcrumbs from my bug hunt. When I'm back online in about 6 hours, my immediate goal is to clean up those commits. The tip of this branch has passed 50000 RealPBFT k=2 tests, 50000 RealPBFT k=3 tests, and tens of thousands of other, more varied tests.
Force-pushed 403aa12 to 745a250.
Force-pushed 745a250 to 3b86ee3.
That rebase cleaned up the git history, removing a lot of the code that arose during debugging.
-- not reach it soon enough)
, let nextLeader = Ref.mkLeaderOf params $ succ jslot
, jslot /= coreNodeIdJoinSlot nodeJoinPlan nextLeader ||
  cid `elem` coreNodeIdNeighbors nodeTopology nextLeader
-> pure $ NodeRestarts $
TODO We also need to prevent the rekeying node from restarting too soon, since it might need to propagate its dlg cert tx more than once.
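The rule stated in the PR description for genNodeRekeys (if node X rekeys in slot S and Y leads slot S+1, then X and Y must be directly connected or Y must join before slot S) can be sketched as a standalone predicate. This is a minimal sketch under simplifying assumptions: nodes and slots are plain Ints, and the leader schedule, join plan and topology are passed as a function and two Maps. `rekeyAllowed` and all the types below are hypothetical stand-ins, not the test infrastructure's CoreNodeId, NodeJoinPlan or NodeTopology.

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical, simplified stand-ins for the test infrastructure's types.
type NodeId = Int
type Slot   = Int

-- | If node @x@ rekeys in slot @s@ and node @y@ leads slot @s + 1@, then
-- either @y@ has @x@ as a direct neighbour, or @y@ joined before slot @s@.
rekeyAllowed
  :: (Slot -> NodeId)         -- ^ leader of each slot
  -> Map NodeId Slot          -- ^ join slot of each node
  -> Map NodeId (Set NodeId)  -- ^ direct neighbours of each node
  -> NodeId                   -- ^ the rekeying node @x@
  -> Slot                     -- ^ the rekey slot @s@
  -> Bool
rekeyAllowed leaderOf joinSlots neighbours x s = connected || joinedBefore
  where
    y            = leaderOf (s + 1)
    -- the topology connects x and y directly
    connected    = x `Set.member` Map.findWithDefault Set.empty y neighbours
    -- y joined strictly before the rekey slot; unknown nodes never qualify
    joinedBefore = maybe False (< s) (Map.lookup y joinSlots)
```

A generator could use a predicate like this to reject rekey plans that would leave the next leader unable to receive the dlg cert tx in time.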
Force-pushed 3b86ee3 to 980a641.
That force push is passing tests. Remaining work:
Force-pushed e7c83b5 to 98d9f85.
I rebased onto cee384a, because that's the last commit on Unfortunately, the fixes for 1544 ( I'm going to open a PR in
I opened IntersectMBO/cardano-ledger#716, which I think is the necessary fix (it is the same as the patch I have locally on this branch). Once that's merged, my PR can update the deps in the repo accordingly and therefore rebase all the way to our
After rebasing onto master (now that a I'm guessing the recent changes to the ChainSync client (in light of the recent refinement of tips to possibly include "origin" instead of an off-by-one "blockNumber = 0") have somehow enabled this family of mini-protocol message/event interleavings, which PR 1131 has instead been exploring by adding network latencies.
Force-pushed 98d9f85 to 5f71595.
1691: consensus: handle EBBs in rewindHeaderState r=nfrisby a=nfrisby

Fixes #1690. I believe this bug was masked by Issue #1489; I discovered it and confirmed that this patch fixes it on my related WIP branch. This PR does not include a repro for this bug, but that branch does, and PR #1506 will as soon as I'm able to update it with that WIP branch -- that's my current focus.

Co-authored-by: Nicolas Frisby <nick.frisby@iohk.io>
Force-pushed 48babe5 to 23a0a4b.
I eventually realized that the Just-In-Time EBBs (enabled by a recent PR on master) avoid the error case that otherwise required a patch for Issue 1631 (Chain Density check in
Force-pushed 6178e19 to 3acec92.
Some minor comments.
--
-- INVARIANT the 'NonEmpty's are all ascending
--
newtype Pam v k = UnsafePam {getPam :: Map v (NonEmpty k)}
Let's rename this to InvertedMap.
Done in commit 3de1fc8
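A newtype whose constructor is named Unsafe... and carries an invariant is usually paired with a smart constructor that establishes the invariant, so callers never use the unsafe constructor directly. A minimal sketch, assuming the inverted map is built from an ordinary `Map k v`; the field name, `fromMap` and `keysWithValue` are hypothetical and not taken from the PR.

```haskell
import           Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- | Hypothetical stand-alone inverted map: each value maps to every key
-- that had that value.
--
-- INVARIANT the 'NonEmpty's are all ascending
newtype InvertedMap v k = UnsafeInvertedMap {getInvertedMap :: Map v (NonEmpty k)}
  deriving (Show)

-- | Smart constructor that establishes the invariant.
fromMap :: Ord v => Map k v -> InvertedMap v k
fromMap m = UnsafeInvertedMap $
    -- Visit keys in descending order; 'Map.fromListWith' combines as
    -- @new <> old@, so each smaller key is prepended in O(1) and every
    -- 'NonEmpty' ends up ascending.
    Map.fromListWith (<>) [ (v, k :| []) | (k, v) <- Map.toDescList m ]

-- | The keys that map to the given value, in ascending order.
keysWithValue :: Ord v => v -> InvertedMap v k -> [k]
keysWithValue v (UnsafeInvertedMap m) = maybe [] NE.toList (Map.lookup v m)
```

For example, inverting `{1 -> 'b', 2 -> 'a', 3 -> 'a'}` yields `{'a' -> 2 :| [3], 'b' -> 1 :| []}`.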
@@ -74,8 +73,10 @@ data TestClock =
deriving (Eq, Generic, NoUnexpectedThunks)

data TestBlockchainTime m = TestBlockchainTime
We really ought to move this to test-infra.
-- EBB predecessor's hash
-> blk
, ebbSlotBefore :: SlotNo -> SlotNo
-- ^ the slot of the most recent expected that precedes the given slot
I don't understand this name or this comment :)
Elaborated in commit 9b2aa86
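One possible reading of the ebbSlotBefore field, sketched under loudly stated assumptions: an EBB is expected at the first slot of every epoch, every epoch has the same fixed number of slots, and "precedes" is taken non-strictly. The name `ebbSlotAtOrBefore` and the local `SlotNo` newtype are hypothetical; commit 9b2aa86 holds the author's actual elaboration.

```haskell
import Data.Word (Word64)

-- | Hypothetical local stand-in for the real 'SlotNo'.
newtype SlotNo = SlotNo Word64
  deriving (Eq, Ord, Show)

-- | Assuming an EBB is expected at the first slot of every epoch and every
-- epoch has @epochSize@ slots (which must be positive), return the slot of
-- the most recent expected EBB at or before the given slot.
ebbSlotAtOrBefore :: Word64 -> SlotNo -> SlotNo
ebbSlotAtOrBefore epochSize (SlotNo s) = SlotNo ((s `div` epochSize) * epochSize)
```

With an epoch size of 10, slot 25 maps to 20 and slot 10 maps to itself; a strictly-preceding variant would differ only at the epoch-boundary slots.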
-- If we add the transaction and then the mempool discards it for some
-- reason, this thread will add it again.
--
forkTxs0
This could do with a one or two line explanation of what the purpose of it is.
Elaborated in commit 183da45
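The comment on forkTxs0 describes a thread that re-adds a transaction the mempool may have dropped, and the PR description says the dlg cert tx is added to the mempool each time the ledger changes. A minimal STM sketch of that shape, assuming the ledger is observable through a hypothetical version counter in a `TVar` and that adding a duplicate transaction is harmless; `awaitLedgerChange` and `addTxOnEveryChange` are illustrative names, not the PR's code.

```haskell
import Control.Concurrent.STM

-- | Hypothetical stand-in for "the ledger changed": a counter that is
-- bumped whenever the node's ledger state changes.
type LedgerVersion = Int

-- | Block until the ledger version differs from @lastSeen@, then return it.
awaitLedgerChange :: TVar LedgerVersion -> LedgerVersion -> STM LedgerVersion
awaitLedgerChange ledgerVar lastSeen = do
    v <- readTVar ledgerVar
    check (v /= lastSeen)   -- retries until another thread changes the TVar
    pure v

-- | Add the transaction once, then again after every observed ledger
-- change, so a transaction the mempool discarded gets added back.
addTxOnEveryChange :: TVar LedgerVersion -> IO () -> IO a
addTxOnEveryChange ledgerVar addTx = do
    v0 <- readTVarIO ledgerVar
    addTx
    loop v0
  where
    loop lastSeen = do
      v <- atomically (awaitLedgerChange ledgerVar lastSeen)
      addTx
      loop v
```

Because `check` retries the STM transaction, the thread sleeps until some other thread writes a different version. Several quick changes may be coalesced into a single re-add, which is acceptable when the only goal is to keep the transaction present.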
-- The test infrastructure allows nodes to forge in slot 0; however, the
-- cardano-ledger-specs code causes @PBFTFailure (SlotNotAfterLastBlock
-- (Slot 0) (Slot 0))@ in that case. So we discard such tests.
Could do with a comment that the spec here is therefore more strict than the real implementation.
Elaborated in commit d0134ca
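The quoted `PBFTFailure (SlotNotAfterLastBlock (Slot 0) (Slot 0))` is consistent with a "strictly after the last block's slot" check in which an empty chain's last slot is encoded as 0. The PR description hypothesises that cvsLastSlot would need to be able to represent origin. A minimal sketch of why that would help, using a hypothetical `WithOrigin` type and Int slots (these are not the cardano-ledger-specs definitions):

```haskell
-- | Hypothetical stand-in for a slot number.
type SlotNo = Int

-- | A slot, or "origin": the position before any block exists. The derived
-- 'Ord' puts 'Origin' below every 'At'.
data WithOrigin t = Origin | At t
  deriving (Eq, Ord, Show)

-- | A strict "new block must be after the last block" check, where the last
-- slot of an empty chain has to be encoded as 0.
slotAfterLast :: SlotNo -> SlotNo -> Bool
slotAfterLast lastSlot newSlot = newSlot > lastSlot

-- | The same check when the last slot can represent origin.
slotAfterLastWithOrigin :: WithOrigin SlotNo -> SlotNo -> Bool
slotAfterLastWithOrigin lastSlot newSlot = At newSlot > lastSlot
```

With the plain encoding a first block in slot 0 is rejected, while `Origin` admits it and `At 0` still rejects a second block in slot 0.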
Force-pushed 8be8d2f to d0134ca.
OK. I addressed each comment with a commit, and now I'm going to squash those down and then
Force-pushed d0134ca to 5e8a8d2.
bors r+
Fixes #1489. Fixes #1524.

Fixing Issue #1489 (let nodes lead when they join) and letting k vary in the range [2 .. 10] (since the dual-ledger tests now do that) revealed several issues.
Issues in the library, not just the test infrastructure:

- Issue #1505 (Debug reapplyTxSameState: unexpected error: MockInvalidInputs) -- removeTxs cannot use the fast path when validating after removing, or else we might have dangling tx input references. This was fixed by #1565 (Mempool.removeTxs should not use the fast path).
- Issue ouroboros-consensus#726 (Test with k = 1 in test-consensus), bullet 1 (closed) -- The Empty cases in prevPointAndBlockNo were wrong. Recent PRs have addressed this: #1544 (consensus: do not assume the anchor point is genesis in prevPointAndBlockNo) and #1589 (Add BlockNo to AnchoredFragment).
- Issue #1543 (ImmutableDB is leaking file handles; closed) -- A bracket in registeredStream was spoiled by an interruptible operation. Thomas' PR re-designed the vulnerability away. I think this is unrelated to the other changes; it was lurking and happened to pop up here just because I've been running hundreds of thousands of tests.
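The removeTxs problem from Issue #1505 can be reproduced in a toy model: if transaction B spends an output of transaction A and A is removed, merely filtering A out (the fast path) leaves B with a dangling input, whereas revalidating the remaining transactions from the base ledger state drops B. A minimal sketch with a hypothetical set-of-output-ids ledger; `Tx`, `applyTx`, `removeTxsFast` and `removeTxsSafe` are illustrative and not the Mempool's API.

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- | Hypothetical toy transaction: consumes some output ids, creates others.
data Tx = Tx { txId :: Int, txIns :: Set Int, txOuts :: Set Int }
  deriving (Eq, Show)

-- | Hypothetical toy ledger state: the set of unspent output ids.
type Utxo = Set Int

-- | Apply a transaction if all of its inputs are currently unspent.
applyTx :: Utxo -> Tx -> Maybe Utxo
applyTx utxo tx
  | txIns tx `Set.isSubsetOf` utxo =
      Just ((utxo `Set.difference` txIns tx) `Set.union` txOuts tx)
  | otherwise = Nothing

-- | Revalidate transactions in order, starting from the base ledger state,
-- and drop any whose inputs no longer exist.
revalidate :: Utxo -> [Tx] -> [Tx]
revalidate _    []         = []
revalidate utxo (tx : txs) = case applyTx utxo tx of
  Just utxo' -> tx : revalidate utxo' txs
  Nothing    -> revalidate utxo txs

-- | "Fast path": filter out the removed transactions without revalidating.
removeTxsFast :: Set Int -> [Tx] -> [Tx]
removeTxsFast removed = filter ((`Set.notMember` removed) . txId)

-- | Filter out the removed transactions, then revalidate the rest.
removeTxsSafe :: Utxo -> Set Int -> [Tx] -> [Tx]
removeTxsSafe base removed = revalidate base . removeTxsFast removed
```

The extra cost of the safe version is one full revalidation pass per removal, which is the price of never keeping a transaction whose inputs have disappeared.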
Issues only in the test infrastructure:

- Issue #1489 (Fix cloning of TestBlockChaintime): let nodes lead when they join. This bug slipped in recently, when I added cloning of BlockchainTimes as part of the restarting/rekeying loop in the test infrastructure.
- PBFT reference simulator. (Was masked by 1489.) Model competing 1-block chains in Ref.PBFT and use its results where applicable instead of the .Expectations module. Check that the PBFT threadnet and Ref.PBFT.simulate results agree on Ref.Nominal slots.
  The Ref.PBFT module had been making assumptions that were accurate given the guards in the RealPBFT generators, given the k >= n regime. But outside of that regime, when the security parameter exceeds the node count, that wasn't enough. Also, it couldn't be compared against the PBFT threadnet because of those assumptions.
- PBFT reference simulator. (Cascade of above.) Add definitelyEnoughBlocks to confirm the "at least k blocks in 2k slots" invariant in genRealPBFTNodeJoinPlan. The existing guards in the RealPBFT generators are intentionally insufficient by themselves; this way we can optimize them to avoid O(n^2) complexity without risking divergence from the suchThats.
- Origin corner-case. (Was masked by 1489.) Discard DualPBFT tests that forge in slot 0. The current cardano-ledger-specs doesn't allow for that. My hypothesis is that cvsLastSlot would need to be able to represent origin.
- Dlg cert tx. Adjust genNodeRekeys to respect "If node X rekeys in slot S and Y leads slot S+1, then either the topology must connect X and Y directly, or Y must join before slot S."
- Dlg cert tx. (Was (statistically?) masked by relatively large k.) Add rekeyOracle and use it to determine which epoch number to record in the dlg cert. The correct value depends on which block the dlg cert tx will end up in, not on when we first add it to our mempool.
- Dlg cert tx. (Was (statistically?) masked by relatively large k.) Add the dlg cert tx to the mempool each time the ledger changes.
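The "at least k blocks in 2k slots" invariant that definitelyEnoughBlocks confirms can be checked in linear time with prefix sums over the slots. A minimal sketch, assuming the schedule is given as one Bool per slot (True if some block was forged there) and that only windows lying entirely inside the schedule are checked; `atLeastKBlocksIn2KSlots` is a hypothetical name, not the PR's definitelyEnoughBlocks.

```haskell
-- | Hypothetical check: every window of @2 * k@ consecutive slots that fits
-- inside the schedule contains at least @k@ blocks.
atLeastKBlocksIn2KSlots :: Int -> [Bool] -> Bool
atLeastKBlocksIn2KSlots k filled = all (>= k) windowSums
  where
    -- prefix !! i is the number of blocks in the first i slots
    prefix     = scanl (+) 0 (map fromEnum filled)
    -- blocks in slots [i, i + 2k) for every window that fits entirely;
    -- empty (so the check passes) when the schedule is shorter than 2k
    windowSums = zipWith (-) (drop (2 * k) prefix) prefix
```

A predicate of this shape can be passed to QuickCheck's `suchThat`, so a generator whose cheaper guards are intentionally insufficient still never yields a schedule that violates the invariant.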