Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange block decision behavior after forging block #2573

Closed
brouwerQ opened this issue Apr 4, 2021 · 4 comments
Closed

Strange block decision behavior after forging block #2573

brouwerQ opened this issue Apr 4, 2021 · 4 comments

Comments

@brouwerQ
Copy link

brouwerQ commented Apr 4, 2021

I'm not sure what is going on here, but today my block producing node forged a block, my relay node sees it immediately, but it says FetchDeclineChainNotPlausible. Nine seconds later, it receives a block with the same block number as mine forged by someone else and only then my block is added to my relay's version of the chain. Then a fork is recognized and later on the chain with the other block got choosen. I can see the orphaned block on PoolTool and on some block explorers it was briefly visible before getting replaced by the other, thus other nodes did pick it up, but I guess only after that nine seconds have passed.

Why did my relay node waited before receiving the other block before adding it to its version of the chain?

Also, I do see that my relay node uses '10:59:27.99' as log time where the block is forged on the block producing node at '10:59:28.01' (slot time was 10:59:28), so the relay node's clock lags a little bit behind, but that cannot be a problem I guess?

Here the (somewhat annotated) logs from my relay:

RELAY FETCHING MY BLOCK FROM MY CORE NODE (FetchDeclineChainNotPlausible): Apr 04 12:59:27 nimue-relay1 bash[1332]: [nimue-re:cardano.node.BlockFetchDecision:Info:58] [2021-04-04 10:59:27.99 UTC] [TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35611, remoteAddress = MY_CORE_LOCAL_IP:2770}) (Right [At (Block {blockPointSlot = SlotNo 25967677, blockPointHash = e6dfb148ab2d8bd607985d4c32465ad0a0fae4b833ffb76f013c5d6447ef65cd})]),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:33119, remoteAddress = 34.66.64.195:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:33843, remoteAddress = 75.32.204.221:6001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:34899, remoteAddress = 34.80.179.174:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35439, remoteAddress = 3.126.142.56:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35903, remoteAddress = 15.237.129.233:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:36867, remoteAddress = 78.23.133.211:3022}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:37573, remoteAddress = 178.128.196.58:5001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:39737, remoteAddress = 73.129.228.216:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:40369, remoteAddress = 66.42.127.211:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:40743, remoteAddress = 178.128.187.12:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:41505, remoteAddress = 88.99.39.191:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:42321, remoteAddress = 18.159.65.241:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43093, remoteAddress = 161.97.175.144:3004}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43209, remoteAddress = 209.151.148.76:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43695, remoteAddress = 89.40.9.152:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:44037, remoteAddress = 134.209.26.2:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:46189, remoteAddress = 157.90.151.159:6003}) (Left FetchDeclineChainNotPlausible)]

Apr 04 12:59:30 nimue-relay1 bash[1332]: [nimue-re:cardano.node:Info:18780] [2021-04-04 10:59:30.50 UTC] fromList [("time(ps)",Number 6.6131797628408e16),("kind",String "MeasureTxsTimeStart"),("mempoolNumBytes",Number 1202.0),("mempoolNumTxs",Number 3.0)]

Apr 04 12:59:30 nimue-relay1 bash[1332]: [nimue-re:cardano.node.Mempool:Info:18780] [2021-04-04 10:59:30.50 UTC] fromList [("tx",Object (fromList [("txid",String "txid: TxId {_unTxId = "5c33265d0e587315ad40c862b62fbe4e69602f924dc9c7e1df6f9b7b8f30e58a"}")])),("kind",String "TraceMempoolAddedTx"),("mempoolSize",Object (fromList [("numTxs",Number 3.0),("bytes",Number 1202.0)]))]

Apr 04 12:59:30 nimue-relay1 bash[1332]: [nimue-re:cardano.node.Mempool:Info:268] [2021-04-04 10:59:30.52 UTC] fromList [("tx",Object (fromList [("txid",String "txid: TxId {_unTxId = "5c33265d0e587315ad40c862b62fbe4e69602f924dc9c7e1df6f9b7b8f30e58a"}")])),("kind",String "TraceMempoolRejectedTx"),("mempoolSize",Object (fromList [("numTxs",Number 3.0),("bytes",Number 1202.0)])),("err",Object (fromList [("kind",String "BadInputsUTxO"),("error",String "The transaction contains inputs that do not exist in the UTxO set."),("consumed",Object (fromList [("lovelace",Number 0.0),("policies",Object (fromList []))])),("badInputs",Array [Array [String "3fa78413aba48f4a3a3749c64349366aa714300ba076689a56d16cb2a109d854",Number 0.0]]),("produced",Object (fromList [("lovelace",Number 1.054774257e9),("policies",Object (fromList []))]))]))]

Apr 04 12:59:30 nimue-relay1 bash[1332]: [nimue-re:cardano.node.Mempool:Info:14375] [2021-04-04 10:59:30.65 UTC] fromList [("tx",Object (fromList [("txid",String "txid: TxId {_unTxId = "5c33265d0e587315ad40c862b62fbe4e69602f924dc9c7e1df6f9b7b8f30e58a"}")])),("kind",String "TraceMempoolRejectedTx"),("mempoolSize",Object (fromList [("numTxs",Number 3.0),("bytes",Number 1202.0)])),("err",Object (fromList [("kind",String "BadInputsUTxO"),("error",String "The transaction contains inputs that do not exist in the UTxO set."),("consumed",Object (fromList [("lovelace",Number 0.0),("policies",Object (fromList []))])),("badInputs",Array [Array [String "3fa78413aba48f4a3a3749c64349366aa714300ba076689a56d16cb2a109d854",Number 0.0]]),("produced",Object (fromList [("lovelace",Number 1.054774257e9),("policies",Object (fromList []))]))]))]

Apr 04 12:59:30 nimue-relay1 bash[1332]: [nimue-re:cardano.node.Mempool:Info:10579] [2021-04-04 10:59:30.65 UTC] fromList [("tx",Object (fromList [("txid",String "txid: TxId {_unTxId = "5c33265d0e587315ad40c862b62fbe4e69602f924dc9c7e1df6f9b7b8f30e58a"}")])),("kind",String "TraceMempoolRejectedTx"),("mempoolSize",Object (fromList [("numTxs",Number 3.0),("bytes",Number 1202.0)])),("err",Object (fromList [("kind",String "BadInputsUTxO"),("error",String "The transaction contains inputs that do not exist in the UTxO set."),("consumed",Object (fromList [("lovelace",Number 0.0),("policies",Object (fromList []))])),("badInputs",Array [Array [String "3fa78413aba48f4a3a3749c64349366aa714300ba076689a56d16cb2a109d854",Number 0.0]]),("produced",Object (fromList [("lovelace",Number 1.054774257e9),("policies",Object (fromList []))]))]))]

RELAY SEES OTHER BLOCK WITH SAME BLOCK NO AS MINE: Apr 04 12:59:37 nimue-relay1 bash[1332]: [nimue-re:cardano.node.BlockFetchDecision:Info:58] [2021-04-04 10:59:37.24 UTC] [TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43209, remoteAddress = 209.151.148.76:3001}) (Right [At (Block {blockPointSlot = SlotNo 25967686, blockPointHash = 558241f6312c1ed94dcefad7ba2b2038cfae16fd1d3468a4a57f1303bd985a76})]),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:33119, remoteAddress = 34.66.64.195:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:33843, remoteAddress = 75.32.204.221:6001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:34899, remoteAddress = 34.80.179.174:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35439, remoteAddress = 3.126.142.56:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35611, remoteAddress = MY_CORE_LOCAL_IP:2770}) (Left FetchDeclineAlreadyFetched),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35903, remoteAddress = 15.237.129.233:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:36867, remoteAddress = 78.23.133.211:3022}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:37573, remoteAddress = 178.128.196.58:5001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:39737, remoteAddress = 73.129.228.216:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:40369, remoteAddress = 66.42.127.211:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:40743, remoteAddress = 178.128.187.12:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:41505, remoteAddress = 88.99.39.191:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:42321, remoteAddress = 18.159.65.241:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43093, remoteAddress = 161.97.175.144:3004}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43695, remoteAddress = 89.40.9.152:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:44037, remoteAddress = 134.209.26.2:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:46189, remoteAddress = 157.90.151.159:6003}) (Left FetchDeclineChainNotPlausible)]

Apr 04 12:59:37 nimue-relay1 bash[1332]: [nimue-re:cardano.node.BlockFetchDecision:Info:58] [2021-04-04 10:59:37.29 UTC] [TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:39737, remoteAddress = 73.129.228.216:3001}) (Left (FetchDeclineConcurrencyLimit FetchModeDeadline 1)),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:33119, remoteAddress = 34.66.64.195:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:33843, remoteAddress = 75.32.204.221:6001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:34899, remoteAddress = 34.80.179.174:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35439, remoteAddress = 3.126.142.56:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35611, remoteAddress = MY_CORE_LOCAL_IP:2770}) (Left FetchDeclineAlreadyFetched),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:35903, remoteAddress = 15.237.129.233:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:36867, remoteAddress = 78.23.133.211:3022}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:37573, remoteAddress = 178.128.196.58:5001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:40369, remoteAddress = 66.42.127.211:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:40743, remoteAddress = 178.128.187.12:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:41505, remoteAddress = 88.99.39.191:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:42321, remoteAddress = 18.159.65.241:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43093, remoteAddress = 161.97.175.144:3004}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43209, remoteAddress = 209.151.148.76:3001}) (Left FetchDeclineInFlightThisPeer),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:43695, remoteAddress = 89.40.9.152:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:44037, remoteAddress = 134.209.26.2:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = MY_RELAY_LOCAL_IP:46189, remoteAddress = 157.90.151.159:6003}) (Left FetchDeclineChainNotPlausible)]

RELAY EXTENDING CHAIN WITH MY BLOCK: Apr 04 12:59:37 nimue-relay1 bash[1332]: [nimue-re:cardano.node.ChainDB:Notice:48] [2021-04-04 10:59:37.35 UTC] Chain extended, new tip: e6dfb148ab2d8bd607985d4c32465ad0a0fae4b833ffb76f013c5d6447ef65cd at slot 25967677

Apr 04 12:59:37 nimue-relay1 bash[1332]: [nimue-re:cardano.node.ChainDB:Notice:48] [2021-04-04 10:59:37.35 UTC] before next, messages elided = 66138644376224

RELAY EXTENDING CHAIN WITH MY BLOCK (REPEAT): Apr 04 12:59:37 nimue-relay1 bash[1332]: [nimue-re:cardano.node.ChainDB:Notice:48] [2021-04-04 10:59:37.35 UTC] Chain extended, new tip: e6dfb148ab2d8bd607985d4c32465ad0a0fae4b833ffb76f013c5d6447ef65cd at slot 25967677

FORK RECOGNIZED: Apr 04 12:59:37 nimue-relay1 bash[1332]: [nimue-re:cardano.node.ChainDB:Info:48] [2021-04-04 10:59:37.35 UTC] Block fits onto some fork: 558241f6312c1ed94dcefad7ba2b2038cfae16fd1d3468a4a57f1303bd985a76 at slot 25967686

@brouwerQ
Copy link
Author

brouwerQ commented Apr 6, 2021

Update: this seems to be a one-time event, less than two hours ago we minted another block that was propagated immediately around the network.

Things changed in our nodes since the problem Sunday:

  • Installed Chrony on core and relay node for better time synchronization.
  • Disabling TraceMempool on relay node; CPU load dropped from 40-60% to about 1%.

@dcoutts
Copy link
Contributor

dcoutts commented Apr 18, 2021

Here's the context for where we log FetchDeclineChainNotPlausible:

-- | Keep only those candidate chains that are preferred over the current
-- chain. Typically, this means that their length is longer than the length of
-- the current chain.
--
filterPlausibleCandidates
  :: (AnchoredFragment block -> AnchoredFragment header -> Bool)
  -> AnchoredFragment block  -- ^ The current chain
  -> [(AnchoredFragment header, peerinfo)]
  -> [(FetchDecision (AnchoredFragment header), peerinfo)]
filterPlausibleCandidates plausibleCandidateChain currentChain chains =
    [ (chain', peer)
    | (chain,  peer) <- chains
    , let chain' = do
            guard (plausibleCandidateChain currentChain chain)
              ?! FetchDeclineChainNotPlausible
            return chain
    ]

This is part of the network code so it's not supposed to know exactly what a "plausible" candidate is exactly. That knowledge is given to it via the plausibleCandidateChain function from the consensus code. But what that does for both Ouroboros classic and Praos is to check if the chain is longer than the node's current chain.

So in other words, FetchDeclineChainNotPlausible means we did not bother to fetch any blocks from this peer because its chain is not longer than our existing chain.

So if your relay made this decision about the chain from your block producer then it indicates the chain from your block producer really was not longer. You clearly think it ought to have been longer, so the thing to look at is why it was not.

@brouwerQ
Copy link
Author

But the curious thing is, the moment my relay fetched another block from another relay with the same block number as my forged block, it added my block to its chain and other relays also fetched it, because for a moment it was visible in some block explorers to disappear again. On PoolTool, it's still visible as an orphan. Also, the next block number also has two blocks, so I guess another pool extended on my block, it was received by me and added to the chain, but later reversed.

I also looked at what happened before my block and both my relay and BP had added the same previous block to its chain and no other blocks were received before mine was forged. So I don't quite get how my BP's chain could be shorter...

Can the timing issue had something to do with it? Because my BP's clock was running 20-30 ms ahead of my relay and at the moment of the FetchDeclineChainNotPlausible log on my relay, it was still 10 ms before my slot time on my relay...

Since that issue, I've succesfully minted all the (2) slots where I was elected as slot leader (I have better time sync now and also the new version).

@brouwerQ
Copy link
Author

For completion, the reason my block was hold 'hostage' for nine seconds by my relay before getting (temporally) adopted by it seems to be IntersectMBO/ouroboros-network#4251, because it arrived too early from my relay's perspective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants