Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect leader_vrf_0 values in chain table? #19

Closed
TerminadaPool opened this issue Sep 29, 2022 · 16 comments
Closed

Incorrect leader_vrf_0 values in chain table? #19

TerminadaPool opened this issue Sep 29, 2022 · 16 comments

Comments

@TerminadaPool
Copy link

Describe the bug
Chain table format for leader_vrf_0 values has changed in recent versions and now these values appear to be incorrect as they no longer correlate with the slot battle winner.

To Reproduce
Query the chain table for slot battles where the winner had a higher leader_vrf_0 value
Eg:

sqlite3 /var/lib/cardano/cncli.db 'select winner.block_number, winner.slot_number, winner.pool_id, winner.hash, winner.leader_vrf_0, loser.leader_vrf_0 from chain winner inner join chain loser on winner.slot_number = loser.slot_number and winner.block_number = loser.block_number and winner.orphaned=0 and loser.orphaned=1 and winner.leader_vrf_0 > loser.leader_vrf_0 order by winner.block_number;'

Note that the winner should have a lower leader_vrf_0 value. The above query should therefore retrieve cases where the slot battle subsequently became a height battle. This happens rarely when another node builds the next block on top of the block that had the higher leader_vrf_0 value because it hadn't received the other block with the lower leader_vrf_0 value yet. Thus what should have been a slot battle becomes a height battle and is subsequently resolved by the longest chain rule. This should happen rarely and can be confirmed by review of log files on a node.

I noticed that the above query returns a lot more instances in more recent times. However, upon reviewing log files all these instances are simple slot battles which should have been resolved in favour of the block with the lower leader_vrf_0 score.

Since around slot 72236516 the leader_vrf_0 value hex string is now much shorter. In recent versions of cncli this shorter hex leader_vrf_0 value no longer appears to predict the slot battle winner.

Expected behavior
According to Duncan Coutts' statement at IntersectMBO/ouroboros-network#2913 (comment) slot battles should be won by the block that has the lower leader vrf score.

Additional context
It looks like the format of the chain table leader_vrf_0 values changed in recent versions of cncli. Are these values still correct?

@TerminadaPool
Copy link
Author

@AndrewWestberg would you please describe how the leader vrf value is calculated for a block or how one can independently verify it?

@AndrewWestberg
Copy link
Collaborator

@TerminadaPool I'll check into this next week, but I've done a video on the babbage changes here: https://www.youtube.com/watch?v=SGNAsfsVr6Q

@TerminadaPool
Copy link
Author

@AndrewWestberg Thanks. I watch all of your excellent videos and I just watched that one again. I wish I could understand Rust and Haskell better to be more helpful.

But maybe I can help with some investigative work. Here is a query to retrieve the last 5 slot battles where the winner unexpectedly had a higher leader_vrf_0 value:

sqlite3 /var/lib/cardano/cncli.db 'select winner.block_number, winner.slot_number, winner.pool_id, winner.hash, winner.leader_vrf_0, loser.leader_vrf_0 from chain winner inner join chain loser on winner.slot_number = loser.slot_number and winner.block_number = loser.block_number and winner.orphaned=0 and loser.orphaned=1 and winner.leader_vrf_0 > loser.leader_vrf_0 order by winner.block_number desc limit 5;'

Here is what my logs show for the latest slot battle:

Sep 30 07:33:03 relay1 cardano-node[995341]: [relay1:cardano.node.ChainDB:Notice:169] [2022-09-29 21:33:03.70 UTC] Chain extended, new tip: 7e509ed514b0a5ca12cbc2efa7b9b2a3c995f3fecd9cd77a3577b3b6edcc89d9 at slot 72920892
Sep 30 07:33:03 relay1 cardano-node[995341]: [relay1:cardano.node.ChainDB:Info:169] [2022-09-29 21:33:03.79 UTC] Block fits onto some fork: 9ae2d5b41718c4f132e59d2450a12455c89ab7b3ece15cd28bd6ac364cd7b742 at slot 72920892

Sep 30 07:33:03 relay1 cardano-node[995341]: [relay1:cardano.node.ChainDB:Notice:169] [2022-09-29 21:33:03.82 UTC] Switched to a fork, new tip: 9ae2d5b41718c4f132e59d2450a12455c89ab7b3ece15cd28bd6ac364cd7b742 at slot 72920892

And the second latest slot battle:

Sep 30 07:26:52 relay1 cardano-node[995341]: [relay1:cardano.node.ChainDB:Notice:169] [2022-09-29 21:26:52.22 UTC] Chain extended, new tip: 42a021dddc771a4b210840e0bf8e4270f1c4fda769a0813e0ea3f72a44fe805c at slot 72920520
Sep 30 07:26:52 relay1 cardano-node[995341]: [relay1:cardano.node.ChainDB:Info:169] [2022-09-29 21:26:52.44 UTC] Block fits onto some fork: 5e636930f202839fcddf0447e4df28a65d590439fb8ea76c2a27652a8e6b3861 at slot 72920520

Sep 30 07:26:52 relay1 cardano-node[995341]: [relay1:cardano.node.ChainDB:Notice:169] [2022-09-29 21:26:52.53 UTC] Switched to a fork, new tip: 5e636930f202839fcddf0447e4df28a65d590439fb8ea76c2a27652a8e6b3861 at slot 72920520

In both cases the second block was received only a few milliseconds (9,22) later and it was immediately preferred despite this second block having a higher leader_vrf_0 value in the cncli database.

Possible explanations:

  1. The leader_vrf_0 values stored in cncli database are not the same as the leader vrf values the node is using for comparison.
  2. The node is not doing a leader vrf comparison and instead preferring the second block received for some other reason. Which would be weird and possibly bad!

Sorry to give you an extra concern to look at when you are busy.

@TerminadaPool
Copy link
Author

I looked at logs on my relay3, which is on the other side of the world. It happened to receive the latest slot battle blocks in the opposite order to my relay1. And, it didn't extend its tip with the second block. Notice the difference between the logs below and the first output in my previous comment which logs the same blocks.

Sep 30 07:33:03 relay3 cardano-node[1012155]: [relay3:cardano.node.ChainDB:Notice:306] [2022-09-29 21:33:03.33 UTC] Chain extended, new tip: 9ae2d5b41718c4f132e59d2450a12455c89ab7b3ece15cd28bd6ac364cd7b742 at slot 72920892

Sep 30 07:33:03 relay3 cardano-node[1012155]: [relay3:cardano.node.ChainDB:Info:306] [2022-09-29 21:33:03.92 UTC] Block fits onto some fork: 7e509ed514b0a5ca12cbc2efa7b9b2a3c995f3fecd9cd77a3577b3b6edcc89d9 at slot 72920892

But the tip wasn't extended with that 7e509 block.

Then the chain moves on with next block as usual

Sep 30 07:33:05 relay3 cardano-node[1012155]: [relay3:cardano.node.BlockFetchDecision:Info:316] [2022-09-29 21:33:05.15 UTC] [TraceLabelPeer (ConnectionId {localAddress = 144.126.157.46:42419, remoteAddress = 54.93.82.235:3001}) (Right [At (Block {blockPointSlot = SlotNo 72920894, blockPointHash = a9bf4bca57148425216eed70484083c8c2e8996a5ab>
Sep 30 07:33:05 relay3 cardano-node[1012155]: [relay3:cardano.node.ChainDB:Notice:306] [2022-09-29 21:33:05.30 UTC] Chain extended, new tip: a9bf4bca57148425216eed70484083c8c2e8996a5ab6a9f9d33e4fce7ebf0bad at slot 72920894

Block 7e509... was received 59ms after 9ae2d... but relay3 did not adopt block 7e509... So the order of block receipt doesn't matter. In both relays block 9ae2d was preferred. I must presume that both these nodes preferred block 9ae2d because its leader vrf score was actually lower. Whereas the cncli database shows block 7e509 as having the lower leader_vrf_0 score:

sqlite3 /var/lib/cardano/cncli.db 'select block_number, slot_number, hash, leader_vrf_0, orphaned from chain where slot_number=72920892;'

7821110|72920892|7e509ed514b0a5ca12cbc2efa7b9b2a3c995f3fecd9cd77a3577b3b6edcc89d9|000037a19fc8c185e59e66572121bd8697c254726ccb89c44e151388dc36112e|1
7821110|72920892|9ae2d5b41718c4f132e59d2450a12455c89ab7b3ece15cd28bd6ac364cd7b742|0005141d6304f92bd4a189b79d93dced2b5dfe5daff3af2b8e38cd9f11c59c94|0

Therefore the likely explanation is:

  1. leader_vrf_0 values stored in cncli database are not the same as the correct leader vrf values nodes are using for comparison.

@AndrewWestberg
Copy link
Collaborator

@TerminadaPool There is nothing wrong in cncli. I believe you've actually found a bug in cardano-node. It appears that the node is incorrectly using the block_vrf_0 value for determining who wins the slot battle instead of the leader_vrf_0 value.

This is incorrect because it removes the slight advantage that small pools had in slot leadership and instead makes it entirely random.

I queried my cncli db in a similar manner as yours and determined a fairly random pattern of winners when I looked at leader_vrf_0. Out of the 30 most recent slot battles queried, 13 were showing the higher leader_vrf_0 value winning.

When you compare the block_vrf_0 values for all 30, they are perfectly in line with the lower block_vrf_0 winning each time.

@AndrewWestberg
Copy link
Collaborator

@TerminadaPool I'll reach out to IOG to see what they want to do about this.

@AndrewWestberg
Copy link
Collaborator

@TerminadaPool Please create a github issue here and link this ticket in it. https://github.com/input-output-hk/ouroboros-network/issues

@AndrewWestberg
Copy link
Collaborator

Bug confirmed by Jared from IOG. In Alonzo and earlier. the leader_vrf value was being used:
https://github.com/input-output-hk/ouroboros-network/blob/9249a70ed9e2365f3963e47cb31b4b1589bca8f6/ouroboros-consensus-shelley/src/Ouroboros/Consensus/Shelley/Protocol/TPraos.hs#L111

In Babbage, the raw vrf value is being passed instead of calculating the leader_vrf from it.
https://github.com/input-output-hk/ouroboros-network/blob/9249a70ed9e2365f3963e47cb31b4b1589bca8f6/ouroboros-consensus-shelley/src/Ouroboros/Consensus/Shelley/Protocol/Praos.hs#L141

@papacarp
Copy link

watch it in action: https://www.youtube.com/shorts/pXGQQmCneM4

@JaredCorduan
Copy link

Now that I've looked into this, I know enough to be able to make an issue myself.

@TerminadaPool
Copy link
Author

Apologies to @AndrewWestberg for thinking the bug was in cncli.

@AndrewWestberg
Copy link
Collaborator

@TerminadaPool no need to apologize whatsoever. You found a bug. This will forever be a contribution to improving cardano that you have made.

@TerminadaPool
Copy link
Author

In case this "bug" was instead a design decision: I pointed out that leader VRF value is not possible to game whereas block VRF can be manipulated by custom software doing transaction selecting.

IntersectMBO/ouroboros-network#4051 (comment)

@TerminadaPool
Copy link
Author

I have been corrected by @JaredCorduan

the value of the VRF in the block header depends on:

  • the epoch nonce
  • the slot
  • the private VRF key

@JaredCorduan
Copy link

I hope that "corrected" reads like "helped" to most folks! None of this is obvious, there is a lot of technical detail here which is not easy to discover, and my hope is that I can empower y'all.

@AndrewWestberg
Copy link
Collaborator

Closing this issue on the cncli side for now. Thanks for reporting @TerminadaPool and thanks @JaredCorduan for taking the ball.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants