Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subpar handling of blocks from the future #4251

Open
karknu opened this issue Jan 3, 2023 · 15 comments
Open

Subpar handling of blocks from the future #4251

karknu opened this issue Jan 3, 2023 · 15 comments
Assignees
Labels
consensus issues related to ouroboros-consensus consensus-migration

Comments

@karknu
Copy link
Contributor

karknu commented Jan 3, 2023

When the node manage to download a block from the future it is not adopted directly, rather it is tucked away with the hope that it can be adopted when the time comes. Any time a new block is downloaded or after a forged block is adopted the node checks to see if there are any old blocks-from-the-future that it could adopt. If there are such blocks they are adopted before the new block.

Notably it appears that a new slot/tick isn't enough to trigger the adoption of blocks-from-the-future, it is always dependent on another block being added.

This means that if all nodes in the network where to download a block early then the only way for the network to progress would be by a BP forging a new block on the current tip (that is without the block from the future). This would cause the BP to notice and adopt the old block-from the future, then it would attempt to adopt its forged block. If it lost due to vrf the TraceDidntAdoptBlock would be shown. If the BP lost it means that another BP would have to attempt to beat the block-from-the-future, since no new block was adopted nothing will cause the rest of the nodes in the network to adopt the block-from-the-future.

Suggestions:

  1. Before forging a new block chainSelectionForFutureBlocks should be run so that the node can forge its block on top of blocks-from-the-future.
  2. Trigger adoption of blocks-from-the-future without depending on any new blocks. This could be done by having a 1s timeout on the getBlockToAdd in addBlockRunner . If there was a timeout call chainSelectionForFutureBlocks and try again.

This would stop blocks from the future causing forks.
I suspect that this is at least partially the cause of IntersectMBO/cardano-node#4720

@nfrisby

@coot coot added the consensus issues related to ouroboros-consensus label Jan 3, 2023
@karknu
Copy link
Contributor Author

karknu commented Jan 3, 2023

My node had a block from the future around the time of the IntersectMBO/cardano-node#4720 issue .
Screenshot 2022-12-19 at 09 54 55

An SPO shared logs with @gufmar regarding another TraceDidntAdoptBlock at 2023-01-02T05:13:57.03 .
This also matches with a block from the future.

Screenshot 2023-01-03 at 10 27 12

@gufmar
Copy link

gufmar commented Jan 3, 2023

we have noticed several such examples in the last 4-5 epochs, as one particular pool regularly started announcing its blocks a little too early (enough to reach many/most other nodes before the actual slot time)

This is an example how it looks like

image

one of the four nodes heard about the new blockheader for slot 79817694 late enough. so he fetched and adopted, all fine.
Another node heard about the block, and had downloaded the body ~200ms before the slot time, but managed to adopt it localy.
Two other nodes also got the header and block body before the actual slot time. But then the final local adoption (green bar) was paused until the next block appearance.

In this example above the second block producer (for slot 79817700) hadn't received the previous block. It looks like all his relays hadn't adopted the block because of the early appearance.
So the second pool created a fork block.

@Straightpool
Copy link

Straightpool commented Jan 3, 2023

@karknu
Glad my logs were of help to get to the bottom of this. Nice Grafana tile to visualize this problem, I shall try to reproduce this one for my own monitoring dashboard.

For anyone who wants to double check local BP log files for the same issue do this:

If there is a "TraceDidntAdoptBlock" with severity "Error" in the BP log, find the prior line in the BP log with the TraceForgedBlock message, in close vicinity above, you should find a block fetch of a future block with negative delay being too early which blocked your own block from local adoption and thus prevented propagation.

In my case here the relevant log lines in line with the graphical representation of negative block delay, so my BP saw the future block triggering this issue:

{"app":[],"at":"2023-01-02T05:13:44.71Z","data":{"block":"ec095f8782d832cfbe936b43640cfbcfa07f8b57df333be9d23c490d3f766f13","delay":-0.285766939,"kind":"CompletedBlockFetch","peer":{"local":{"addr":"","port":""},"remote":{"addr":"","port":""}},"size":871},"env":"1.35.4:ebc7b","host":"ub20cn","loc":null,"msg":"","ns":["cardano.node.BlockFetchClient"],"pid":"833","sev":"Info","thread":"631239"}

{"app":[],"at":"2023-01-02T05:13:57.02Z","data":{"credentials":"Cardano","val":{"block":"521058f6067abb88828611255df6c2695393ae3abb4e113cf550c74ad9e88754","blockNo":8217108,"blockPrev":"9306145db376fd2aa74ea1bb24ed915f2f69a66d40dfdef2f68e82b53141a337","kind":"TraceForgedBlock","slot":81070146}},"env":"1.35.4:ebc7b","host":"ub20cn","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"833","sev":"Info","thread":"353"}

{"app":[],"at":"2023-01-02T05:13:57.02Z","data":{"block":{"hash":"521058f6067abb88828611255df6c2695393ae3abb4e113cf550c74ad9e88754","kind":"Point","slot":81070146},"kind":"TraceAddBlockEvent.TrySwitchToAFork"},"env":"1.35.4:ebc7b","host":"ub20cn","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"833","sev":"Info","thread":"343"}
{"app":[],"at":"2023-01-02T05:13:57.03Z","data":{"credentials":"Cardano","val":{"kind":"TraceDidntAdoptBlock","slot":81070146}},"env":"1.35.4:ebc7b","host":"ub20cn","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"833","sev":"Error","thread":"353"}

So for my missed block this was most definitely the issue, good catch! Looking greatly forward to the node patch, any missed block is one missed block too many.

May I ask though for the rationale in adding the complexity to try to save blocks from the future? Why not keep it simple and just discard them to encourage proper node ntp configuration? Maybe because they are hard to avoid with improper time?
A simple grep 'delay":-0' reveals 6 hits in my current node0.log. How much would really be lost though?
In any case, I hope with Ouroboros Chronos implemented in the future, complexity such as this, which obviously has caused issues here, can be removed again.

@karknu
Copy link
Contributor Author

karknu commented Jan 4, 2023

May I ask though for the rationale in adding the complexity to try to save blocks from the future? Why not keep it simple and just discard them to encourage proper node ntp configuration? Maybe because they are hard to avoid with improper time? A simple grep 'delay":-0' reveals 6 hits in my current node0.log. How much would really be lost though? In any case, I hope with Ouroboros Chronos implemented in the future, complexity such as this, which obviously has caused issues here, can be removed again.

We can't expect the clocks of all nodes to be completely in sync so there needs to be some allowance for blocks-from-the-future. Even Ouroboros Chronos needs to keep track of blocks-from-the-future.

There is a limit (currently 5s) for how far into the future blocks can be before they are ignored. This puts a limit on how out of sync nodes in the network can be.

@IntersectMBO IntersectMBO deleted a comment Jan 5, 2023
@dmitrystas
Copy link

dmitrystas commented Jan 7, 2023

I'm not 100% sure my comment will be useful, since Karl created this issue and the IOG team probably already observed similar behavior, but just in case I'll leave my logs here

So, the block d75e2bb9f1a1a7b083e081aff94a8e9363453f6a9eae40db1a884a6d468d49e8 first mentioned in my logs at 2023-01-07 05:14:09.69 UTC

[r1:cardano.node.BlockFetchDecision:Info:4018] [2023-01-07 05:14:09.69 UTC] [TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:28629, remoteAddress = 18.158.188.21:3001}) (Right [At (Block {blockPointSlot = SlotNo 81502159, blockPointHash = d75e2bb9f1a1a7b083e081aff94a8e9363453f6a9eae40db1a884a6d468d49e8})]),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:18599, remoteAddress = 213.155.70.133:5002}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:11805, remoteAddress = 78.47.99.41:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:23557, remoteAddress = 89.47.164.140:3333}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:30965, remoteAddress = 88.80.147.114:6002}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:14569, remoteAddress = 18.133.104.22:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:14771, remoteAddress = 5.185.126.249:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:28843, remoteAddress = 52.57.122.128:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:13251, remoteAddress = 173.50.95.212:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:17579, remoteAddress = 209.126.86.244:5003}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:7281, remoteAddress = 85.239.230.48:3333}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:12649, remoteAddress = 38.242.212.113:3001}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:20081, remoteAddress = 143.110.236.218:6000}) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (ConnectionId {localAddress = 84.46.244.205:3251, remoteAddress = 154.12.240.223:6000}) (Left FetchDeclineChainNotPlausible)]

and reached my block producer at 2023-01-07 05:14:09.73 UTC. According to the block schedule (based on the analysis of the blockchain), this block should have been created at 2023-01-07 05:14:10 UTC, so the block from the future. After that my pool haven't put this block into the chain (as Karl mentioned early), and just continued checking each slot for leadership till time for block creation came at 2023-01-07 05:16:01.00 UTC. After that the fork occurred

[p1:cardano.node.LeadershipCheck:Info:326] [2023-01-07 05:14:10.00 UTC] {"chainDensity":4.976489e-2,"credentials":"Cardano","delegMapSize":1253566,"kind":"TraceStartLeadershipCheck","slot":81502159,"utxoSize":9598554}
[p1:cardano.node.Forge:Info:326] [2023-01-07 05:14:10.03 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 8.1502159e7)]))]
[p1:cardano.node.LeadershipCheck:Info:326] [2023-01-07 05:14:11.00 UTC] {"chainDensity":4.976489e-2,"credentials":"Cardano","delegMapSize":1253566,"kind":"TraceStartLeadershipCheck","slot":81502160,"utxoSize":9598554}
[p1:cardano.node.Forge:Info:326] [2023-01-07 05:14:11.03 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 8.150216e7)]))]
[p1:cardano.node.LeadershipCheck:Info:326] [2023-01-07 05:14:12.00 UTC] {"chainDensity":4.976489e-2,"credentials":"Cardano","delegMapSize":1253566,"kind":"TraceStartLeadershipCheck","slot":81502161,"utxoSize":9598554}
[p1:cardano.node.Forge:Info:326] [2023-01-07 05:14:12.02 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 8.1502161e7)]))]
[p1:cardano.node.LeadershipCheck:Info:326] [2023-01-07 05:14:13.00 UTC] {"chainDensity":4.976489e-2,"credentials":"Cardano","delegMapSize":1253566,"kind":"TraceStartLeadershipCheck","slot":81502162,"utxoSize":9598554}
[p1:cardano.node.Forge:Info:326] [2023-01-07 05:14:13.03 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 8.1502162e7)]))]
[p1:cardano.node.LeadershipCheck:Info:326] [2023-01-07 05:14:14.00 UTC] {"chainDensity":4.976489e-2,"credentials":"Cardano","delegMapSize":1253566,"kind":"TraceStartLeadershipCheck","slot":81502163,"utxoSize":9598554}
[p1:cardano.node.Forge:Info:326] [2023-01-07 05:14:14.02 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 8.1502163e7)]))]
......
[p1:cardano.node.LeadershipCheck:Info:326] [2023-01-07 05:16:01.00 UTC] {"chainDensity":4.976489e-2,"credentials":"Cardano","delegMapSize":1253566,"kind":"TraceStartLeadershipCheck","slot":81502270,"utxoSize":9598554}
[p1:cardano.node.Forge:Info:326] [2023-01-07 05:16:01.03 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeIsLeader"),("slot",Number 8.150227e7)]))]
[p1:cardano.node.Forge:Info:326] [2023-01-07 05:16:01.05 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("block",String "ac6a5ba8319f37a8997b9c6e8517710a4a61cec877992f980ebde143e9e93f90"),("blockNo",Number 8238422.0),("blockPrev",String "29a5c8bbcaf5da4de379205a5baa717f45e155432c1efad7479d6c03d2a55234"),("kind",String "TraceForgedBlock"),("slot",Number 8.150227e7)]))]
[p1:cardano.node.ChainDB:Notice:316] [2023-01-07 05:16:01.05 UTC] Chain extended, new tip: d75e2bb9f1a1a7b083e081aff94a8e9363453f6a9eae40db1a884a6d468d49e8 at slot 81502159
[p1:cardano.node.ChainDB:Notice:316] [2023-01-07 05:16:01.05 UTC] before next, messages elided = 10078454653411508
[p1:cardano.node.ChainDB:Notice:316] [2023-01-07 05:16:01.05 UTC] Chain extended, new tip: d75e2bb9f1a1a7b083e081aff94a8e9363453f6a9eae40db1a884a6d468d49e8 at slot 81502159
[p1:cardano.node.ChainDB:Info:316] [2023-01-07 05:16:01.05 UTC] Block fits onto some fork: ac6a5ba8319f37a8997b9c6e8517710a4a61cec877992f980ebde143e9e93f90 at slot 81502270
[p1:cardano.node.Forge:Error:326] [2023-01-07 05:16:01.06 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceDidntAdoptBlock"),("slot",Number 8.150227e7)]))]

The fact that my pool has also created the block before d75e2bb9f1a1a7b083e081aff94a8e9363453f6a9eae40db1a884a6d468d49e8 might or might not be of interest

@nfrisby
Copy link
Contributor

nfrisby commented Jan 10, 2023

The first suggestion seems reasonable and like only a mild design challenge.

Before forging a new block chainSelectionForFutureBlocks should be run so that the node can forge its block on top of blocks-from-the-future.

This would ensure that you consider pending blocks before forging. When you do find such blocks this way, it would delay your new block by however long it takes to process the pending block. But your forged block would (usually?) have a higher BlockNo as a result, so that seems like a fair trade.

The other suggestion, checking for pending blocks on some fixed internal (eg the midpoint of each block slot), might avoid that possible delay. But it also doesn't ensure that the leadership check notices the pending blocks (if they arrived very recently). Seems like a toss-up to me, and also maybe less natural to synchronize.

Not super confident either way, but those are my initial thoughts.


I'm unsure how to prioritize this wrt to other work. How prevalent is it? Are maleficent SPOs setting their clocks early, causing this to happen often?

@gufmar
Copy link

gufmar commented Jan 10, 2023

I'm unsure how to prioritize this wrt to other work. How prevalent is it? Are maleficent SPOs setting their clocks early, causing this to happen often?

Here below is a list of (almost*) all recorded blocks for last 3 weeks, with an header announced in the future.

headerTime tells the milliseconds the header reached the reporting node in advance.

adoption tells the time in milliseconds till local adoption. Values >0 <1000 are expected
Values > 1000ms indicate the situation described by Karl in this issue
Empty adoption fields also indicate a delayed local adoption (*). The difference is in this case the early announcing pool lost the height battle, and his block finally wasn't adopted.

(*) note we noted this special behaviour since mid of December, and we refined the monitoring data capture process to also identify the delayed and then not adopted block only after Dec-27

Considering this extraordinarily negative impact on the rewards for the early announced block (50% of the lost slot battles go to its account, while the other 50% are shared with all the other pools), I wouldn't say that it is deliberately malicious. But if you look at the number of early announced blocks since mid-December 2022, you will notice that one particular pool is causing most of these altitude battles. (I won't name names here, the pool has already been informed).

This delaying effect may increases probability for #4252
I don't know it it opens other theoretical or practical attack vectors to malicious players.

| blockNo | slotNo   | headerTime | adoption | blockSize | reported         |
|---------|----------|------------|----------|-----------|------------------|
| 8160563 | 79905623 | -240       | 980      | 2125      | 19.12.2022 17:45 |
| 8162426 | 79943581 | -180       | 820      | 61231     | 20.12.2022 04:17 |
| 8163751 | 79972015 | -120       | 100      | 83417     | 20.12.2022 12:11 |
| 8165372 | 80003426 | -240       | 190      | 89514     | 20.12.2022 20:55 |
| 8165945 | 80014767 | -220       | 22660    | 46512     | 21.12.2022 00:04 |
| 8166572 | 80027236 | -270       | 660      | 48029     | 21.12.2022 03:32 |
| 8167190 | 80039236 | -270       | 14580    | 28482     | 21.12.2022 06:52 |
| 8167338 | 80042995 | -300       | 22780    | 2415      | 21.12.2022 07:55 |
| 8167583 | 80047611 | -290       | 16560    | 3161      | 21.12.2022 09:11 |
| 8169137 | 80080145 | -270       | 11040    | 33676     | 21.12.2022 18:14 |
| 8172488 | 80148890 | -300       | 390      | 2776      | 22.12.2022 13:19 |
| 8173782 | 80177008 | -220       | 28100    | 88576     | 22.12.2022 21:08 |
| 8175768 | 80218463 | -280       | 860      | 5492      | 23.12.2022 08:39 |
| 8176234 | 80227676 | -290       | 960      | 72537     | 23.12.2022 11:12 |
| 8178176 | 80267443 | -250       | 31790    | 52534     | 23.12.2022 22:16 |
| 8180134 | 80306773 | -300       | 5540     | 4452      | 24.12.2022 09:11 |
| 8181846 | 80343782 | -190       | 9060     | 42643     | 24.12.2022 19:28 |
| 8181980 | 80346301 | -150       | 100      | 89739     | 24.12.2022 20:09 |
| 8185162 | 80413291 | -260       | 2720     | 18212     | 25.12.2022 14:46 |
| 8188334 | 80477074 | -280       | 10490    | 25626     | 26.12.2022 08:29 |
| 8188555 | 80481990 | -260       | 20940    | 25441     | 26.12.2022 09:51 |
| 8189643 | 80503909 | -270       | 2700     | 25397     | 26.12.2022 15:56 |
| 8191455 | 80541475 | -280       | 11700    | 37297     | 27.12.2022 02:22 |
| 8167223 | 80040247 | -260       |          | 14436     | 27.12.2022 13:05 |
| 8193425 | 80582214 | -250       | 3560     | 32882     | 27.12.2022 13:41 |
| 8193725 | 80588451 | -290       | 10010    | 3653      | 27.12.2022 15:25 |
| 8194733 | 80610083 | -240       |          | 63649     | 27.12.2022 21:27 |
| 8195563 | 80626738 | -270       |          | 29210     | 28.12.2022 02:03 |
| 8196068 | 80636229 | -290       |          | 13836     | 28.12.2022 04:42 |
| 8196244 | 80640012 | -300       |          | 13745     | 28.12.2022 05:45 |
| 8197387 | 80663984 | -280       | 13910    | 21653     | 28.12.2022 12:24 |
| 8197691 | 80670450 | -260       | 22690    | 54181     | 28.12.2022 14:12 |
| 8197926 | 80675013 | -270       |          | 38979     | 28.12.2022 15:29 |
| 8198191 | 80680111 | -290       |          | 12367     | 28.12.2022 16:53 |
| 8198456 | 80686458 | -210       |          | 88978     | 28.12.2022 18:39 |
| 8198914 | 80696100 | -220       |          | 24495     | 28.12.2022 21:19 |
| 8200038 | 80719005 | -280       | 1510     | 26050     | 29.12.2022 03:41 |
| 8201064 | 80739796 | -310       | 11470    | 2656      | 29.12.2022 09:28 |
| 8201068 | 80739881 | -280       | 26270    | 2453      | 29.12.2022 09:29 |
| 8201503 | 80749162 | -290       | 14620    | 11336     | 29.12.2022 12:04 |
| 8201762 | 80754443 | -260       |          | 89656     | 29.12.2022 13:32 |
| 8201845 | 80755918 | -310       | 21620    | 9283      | 29.12.2022 13:57 |
| 8201845 | 80755918 | -310       | 21620    | 9283      | 29.12.2022 13:57 |
| 8202466 | 80768061 | -260       |          | 11323     | 29.12.2022 17:20 |
| 8202466 | 80768061 | -260       |          | 11323     | 29.12.2022 17:20 |
| 8202620 | 80771362 | -250       |          | 49763     | 29.12.2022 18:15 |
| 8202620 | 80771362 | -250       |          | 49763     | 29.12.2022 18:15 |
| 8202896 | 80777284 | -160       |          | 1767      | 29.12.2022 19:53 |
| 8202896 | 80777284 | -160       |          | 1767      | 29.12.2022 19:53 |
| 8205180 | 80823538 | -260       |          | 51533     | 30.12.2022 08:43 |
| 8205180 | 80823538 | -260       |          | 51533     | 30.12.2022 08:44 |
| 8205684 | 80834209 | -290       | 480      | 871       | 30.12.2022 11:41 |
| 8205684 | 80834209 | -290       | 480      | 871       | 30.12.2022 11:41 |
| 8205746 | 80835170 | -290       | 8640     | 18446     | 30.12.2022 11:57 |
| 8205746 | 80835170 | -290       | 8640     | 18446     | 30.12.2022 11:57 |
| 8205886 | 80838081 | -280       |          | 13140     | 30.12.2022 12:47 |
| 8205886 | 80838081 | -280       |          | 13140     | 30.12.2022 12:47 |
| 8206487 | 80850994 | -250       | 220      | 84062     | 30.12.2022 16:21 |
| 8206487 | 80850994 | -250       | 220      | 84062     | 30.12.2022 16:21 |
| 8206673 | 80854459 | -310       |          | 1543      | 30.12.2022 17:19 |
| 8206673 | 80854459 | -310       |          | 1543      | 30.12.2022 17:19 |
| 8206699 | 80854857 | -290       | 4550     | 27192     | 30.12.2022 17:25 |
| 8206699 | 80854857 | -290       | 4550     | 27192     | 30.12.2022 17:25 |
| 8206800 | 80856978 | -280       | 31800    | 8236      | 30.12.2022 18:01 |
| 8206800 | 80856978 | -280       | 31800    | 8236      | 30.12.2022 18:01 |
| 8207241 | 80866719 | -240       | 7150     | 79176     | 30.12.2022 20:43 |
| 8207241 | 80866719 | -240       | 7150     | 79176     | 30.12.2022 20:43 |
| 8207243 | 80866755 | -270       |          | 62391     | 30.12.2022 20:44 |
| 8207243 | 80866755 | -270       |          | 62391     | 30.12.2022 20:44 |
| 8209179 | 80906869 | -250       |          | 54716     | 31.12.2022 07:52 |
| 8209179 | 80906869 | -250       |          | 54716     | 31.12.2022 07:52 |
| 8209671 | 80917061 | -290       | 7860     | 17133     | 31.12.2022 10:42 |
| 8209671 | 80917061 | -290       | 7860     | 17133     | 31.12.2022 10:42 |
| 8209698 | 80917712 | -250       | 1630     | 42073     | 31.12.2022 10:53 |
| 8209698 | 80917712 | -250       | 1630     | 42073     | 31.12.2022 10:53 |
| 8210014 | 80923833 | -240       |          | 57496     | 31.12.2022 12:36 |
| 8210014 | 80923833 | -240       |          | 57496     | 31.12.2022 12:36 |
| 8212303 | 80970688 | -280       |          | 22794     | 01.01.2023 01:37 |
| 8212303 | 80970688 | -280       |          | 22794     | 01.01.2023 01:37 |
| 8213703 | 80999503 | -260       |          | 16341     | 01.01.2023 09:36 |
| 8213703 | 80999503 | -260       |          | 16341     | 01.01.2023 09:36 |
| 8214251 | 81010738 | -270       |          | 2506      | 01.01.2023 12:43 |
| 8214251 | 81010738 | -270       |          | 2506      | 01.01.2023 12:43 |
| 8214329 | 81012429 | -220       |          | 69311     | 01.01.2023 13:12 |
| 8214329 | 81012429 | -220       |          | 69311     | 01.01.2023 13:12 |
| 8215150 | 81028694 | -230       | 8110     | 53521     | 01.01.2023 17:43 |
| 8215150 | 81028694 | -230       | 8110     | 53521     | 01.01.2023 17:43 |
| 8215152 | 81028726 | -280       | 14750    | 32800     | 01.01.2023 17:43 |
| 8215152 | 81028726 | -280       | 14750    | 32800     | 01.01.2023 17:43 |
| 8215706 | 81040703 | -280       | 13140    | 9590      | 01.01.2023 21:03 |
| 8215706 | 81040703 | -280       | 13140    | 9590      | 01.01.2023 21:03 |
| 8216096 | 81049448 | -220       | 140      | 88957     | 01.01.2023 23:29 |
| 8216096 | 81049448 | -220       | 140      | 88957     | 01.01.2023 23:29 |
| 8217625 | 81080788 | -280       |          | 42751     | 02.01.2023 08:11 |
| 8217625 | 81080788 | -280       |          | 42751     | 02.01.2023 08:11 |
| 8217918 | 81086866 | -240       | 14590    | 39956     | 02.01.2023 09:52 |
| 8218356 | 81095134 | -270       |          | 49949     | 02.01.2023 12:10 |
| 8218958 | 81107130 | -200       | 130      | 84156     | 02.01.2023 15:30 |
| 8219009 | 81108007 | -210       | 140      | 77779     | 02.01.2023 15:45 |
| 8220558 | 81139420 | -260       |          | 45799     | 03.01.2023 00:28 |
| 8221597 | 81160363 | -290       |          | 16612     | 03.01.2023 06:17 |
| 8222314 | 81174155 | -270       | 21740    | 13932     | 03.01.2023 10:07 |
| 8223142 | 81190599 | -290       | 3700     | 2566      | 03.01.2023 14:41 |
| 8224299 | 81215584 | -240       |          | 36324     | 03.01.2023 21:38 |
| 8224309 | 81215803 | -270       |          | 64004     | 03.01.2023 21:42 |
| 8224446 | 81218713 | -250       |          | 3511      | 03.01.2023 22:30 |
| 8224835 | 81226921 | -230       |          | 77188     | 04.01.2023 00:46 |
| 8225568 | 81241902 | -260       |          | 66463     | 04.01.2023 04:57 |
| 8226068 | 81252133 | -280       |          | 40246     | 04.01.2023 07:48 |
| 8226716 | 81265022 | -270       |          | 47830     | 04.01.2023 11:21 |
| 8227721 | 81285145 | -70        | 80       | 71270     | 04.01.2023 16:57 |
| 8228410 | 81298597 | -250       | 230      | 88207     | 04.01.2023 20:41 |
| 8228498 | 81300819 | -250       |          | 28146     | 04.01.2023 21:19 |
| 8229218 | 81315557 | -240       |          | 80008     | 05.01.2023 01:24 |
| 8229841 | 81328274 | -300       | 13680    | 10933     | 05.01.2023 04:56 |
| 8231318 | 81357144 | -310       | 410      | 2926      | 05.01.2023 12:57 |
| 8231632 | 81363896 | -270       |          | 47121     | 05.01.2023 14:50 |
| 8233907 | 81410965 | -290       | 960      | 34255     | 06.01.2023 03:54 |
| 8234903 | 81430446 | -300       | 31880    | 35591     | 06.01.2023 09:19 |
| 8235110 | 81434403 | -290       |          | 8443      | 06.01.2023 10:25 |
| 8235385 | 81440120 | -240       |          | 89441     | 06.01.2023 12:00 |
| 8235676 | 81445735 | -260       |          | 54659     | 06.01.2023 13:34 |
| 8236105 | 81455289 | -280       |          | 3534      | 06.01.2023 16:13 |
| 8236210 | 81457738 | -160       |          | 86718     | 06.01.2023 16:54 |
| 8237465 | 81482975 | -220       | 9630     | 77700     | 06.01.2023 23:54 |
| 8238094 | 81495463 | -270       |          | 38572     | 07.01.2023 03:23 |
| 8238133 | 81496460 | -250       |          | 65769     | 07.01.2023 03:39 |
| 8238422 | 81502159 | -330       |          | 3441      | 07.01.2023 05:16 |
| 8238443 | 81502652 | -10        | 10       | 10274     | 07.01.2023 05:22 |
| 8238549 | 81505340 | -280       |          | 16270     | 07.01.2023 06:07 |
| 8238572 | 81505759 | -280       |          | 3779      | 07.01.2023 06:14 |
| 8239233 | 81519978 | -290       | 25610    | 871       | 07.01.2023 10:11 |
| 8239678 | 81529416 | -310       |          | 2527      | 07.01.2023 12:48 |
| 8240054 | 81537543 | -260       |          | 28512     | 07.01.2023 15:04 |
| 8240233 | 81541239 | -280       | 480      | 29595     | 07.01.2023 16:05 |
| 8240614 | 81549094 | -300       |          | 26625     | 07.01.2023 18:16 |
| 8241584 | 81569298 | -260       |          | 36122     | 07.01.2023 23:53 |
| 8241624 | 81570277 | -300       |          | 871       | 08.01.2023 00:11 |
| 8243651 | 81611651 | -240       |          | 24897     | 08.01.2023 11:40 |
| 8244575 | 81630859 | -170       |          | 87321     | 08.01.2023 16:59 |
| 8245741 | 81655815 | -250       | 4670     | 47936     | 08.01.2023 23:55 |
| 8245797 | 81657056 | -170       | 110      | 82198     | 09.01.2023 00:15 |
| 8245906 | 81659161 | -260       | 29460    | 72585     | 09.01.2023 00:51 |
| 8247172 | 81685299 | -260       |          | 20501     | 09.01.2023 08:06 |
| 8248085 | 81704376 | -280       |          | 12258     | 09.01.2023 13:24 |
| 8248336 | 81709549 | -230       |          | 30069     | 09.01.2023 14:50 |
| 8249369 | 81730764 | -250       |          | 61209     | 09.01.2023 20:44 |
| 8249410 | 81731520 | -260       |          | 69273     | 09.01.2023 20:56 |
| 8249639 | 81735901 | -240       |          | 3161      | 09.01.2023 22:10 |
| 8249951 | 81742707 | -210       |          | 87546     | 10.01.2023 00:03 |
| 8250119 | 81746776 | -280       |          | 16820     | 10.01.2023 01:11 |
| 8251805 | 81781697 | -300       | 27610    | 2485      | 10.01.2023 10:53 |
| 8252218 | 81790405 | -270       | 1510     | 17764     | 10.01.2023 13:18 |
| 8252236 | 81790896 | -260       | 16690    | 53654     | 10.01.2023 13:26 |```

@gufmar
Copy link

gufmar commented May 9, 2023

At Slot 92045428 (9-May 06:55:19am GMT) we had a block that was announced, broadcasted and adopted too early. (around half a second before the actual slot time)

After this block, there was an 8 minutes lasting break of no new blocks and it seems as it is related to this issue here.

2023-05-09_08h57_38

@Straightpool
Copy link

Straightpool commented May 9, 2023

I again lost a block due to this issue today at the time mentioned above by Gufmar.

My BP completed BlockFetch of the correct parent block of an unnamed pool with a negative delay, a block from the future: https://cardanoscan.io/block/8749998

{"app":[],"at":"2023-05-09T05:55:18.61Z","data":{"block":"bae1e9d042b11a86503108739a6ec054566355d520e871f44639a82732a5207f","delay":-0.392551713,"kind":"CompletedBlockFetch","peer"

My 1.35.7 BP did then not apply this block to the currentTip until 4 minutes later colliding with itself, first my BP minted my block on the old, wrong parent

{"app":[],"at":"2023-05-09T05:59:39.02Z","data":{"credentials":"Cardano","val":{"kind":"TraceNodeIsLeader","slot":92045688}},"env":"1.35.7:f0b4a","host":"ub22cn","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"860","sev":"Info","thread":"377"} {"app":[],"at":"2023-05-09T05:59:39.06Z","data":{"credentials":"Cardano","val":{"block":"10c3f7e59bbcd99dbde73c6ed53d03a30bbbdf81ba056d1ec6be4c18185be56d","blockNo":8749998,"blockPrev":"d98f58ed260269164ed7cdde61cd7302597803a308dfa5b6afefaa1d802e7b54","kind":"TraceForgedBlock","slot":92045688}},"env":"1.35.7:f0b4a","host":"ub22cn","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"860","sev":"Info","thread":"377"}

At the exact same UTC time my BP decided to finally add the future block of the unnamed pool to the tip, not once but twice for good measure

{"app":[],"at":"2023-05-09T05:59:39.06Z","data":{"chainLengthDelta":1,"kind":"TraceAddBlockEvent.AddedToCurrentChain","newtip":"bae1e9d042b11a86503108739a6ec054566355d520e871f44639a82732a5207f@92045428"},"env":"1.35.7:f0b4a","host":"ub22cn","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"860","sev":"Notice","thread":"367"} {"app":[],"at":"2023-05-09T05:59:39.06Z","data":{"kind":"LogValue","name":"before next, messages elided","value":{"contents":825390879782320,"tag":"PureI"}},"env":"1.35.7:f0b4a","host":"ub22cn","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"860","sev":"Notice","thread":"367"} {"app":[],"at":"2023-05-09T05:59:39.06Z","data":{"chainLengthDelta":1,"kind":"TraceAddBlockEvent.AddedToCurrentChain","newtip":"bae1e9d042b11a86503108739a6ec054566355d520e871f44639a82732a5207f@92045428"},"env":"1.35.7:f0b4a","host":"ub22cn","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"860","sev":"Notice","thread":"367"}

Now the BP had created a fork on it's own and tried to decide which fork to take: The one just created itself or the block from the future it added 4 minutes too late to its own ledger db?

{"app":[],"at":"2023-05-09T05:59:39.06Z","data":{"block":{"hash":"10c3f7e59bbcd99dbde73c6ed53d03a30bbbdf81ba056d1ec6be4c18185be56d","kind":"Point","slot":92045688},"kind":"TraceAddBlockEvent.TrySwitchToAFork"},"env":"1.35.7:f0b4a","host":"ub22cn","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"860","sev":"Info","thread":"367"}

My BP then dropped its own block:

{"app":[],"at":"2023-05-09T05:59:39.10Z","data":{"credentials":"Cardano","val":{"kind":"TraceDidntAdoptBlock","slot":92045688}},"env":"1.35.7:f0b4a","host":"ub22cn","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"860","sev":"Error","thread":"377"}

The losing out new block of the BP was never picked up by my relays either, so no block explorer saw the issue. This is also visualised in the monitoring, the BP (bright green) is one block ahead and it takes minutes for all nodes to align again

image

@rickymac68
Copy link

Any progress on a potential fix since January? I'm curious if a node release version to address this has been planned. Thx.

@dnadales
Copy link
Member

dnadales commented Jun 6, 2023

Any progress on a potential fix since January? I'm curious if a node release version to address this has been planned. Thx.

Hi 👋 The Consensus team analyzed possible fixes for this issue, but they have security and incentives implications for the network, so we need to look at this with a multi-disciplinary team consisting of members of the Network and Research team. At the moment we don't have people available to work on this, but we'll consider this issue for inclusion in our quarterly planning sessions.

@Straightpool
Copy link

Straightpool commented Jan 28, 2024

Happend again today, did not lose a block this time luckily myself, but others for sure did, the chain halted for 6 minutes, see https://pooltool.io/realtime/9861109

Log from one of my relays:
"app":[],"at":"2024-01-28T11:58:32.77Z","data":{"block":"3c7be56c81d2949ce7e13e2f0eb6abbe7dd9c89ebef53316a27ac57f236b097b","delay":-0.231176125,"kind":"CompletedBlockFetch","peer":{"local":{"addr":","port":""},"remote":{"addr":"78.141.218.135","port":"3001"}},"size":1788},"env":"8.7.3:30b6e","host":"ub22r2","loc":null,"msg":"","ns":["cardano.node.BlockFetchClient"],"pid":"3115980","sev":"Info","thread":"968062"}

@lead-pool
Copy link

Unfortunately i was one of the pools that lost a block because of this... Below is a copy of my logs incase they are useful

{"app":[],"at":"2024-01-28T12:03:23.03Z","data":{"credentials":"Cardano","val":{"kind":"TraceNodeNotLeader","slot":114877112}},"env":"8.1.2:d2d90","host":"producer","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"1547","sev":"Info",>
{"app":[],"at":"2024-01-28T12:03:24.00Z","data":{"chainDensity":4.7089357e-2,"credentials":"Cardano","delegMapSize":1333936,"kind":"TraceStartLeadershipCheck","slot":114877113,"utxoSize":10976816},"env":"8.1.2:d2d90","host":"producer","loc>
{"app":[],"at":"2024-01-28T12:03:24.11Z","data":{"credentials":"Cardano","val":{"kind":"TraceNodeIsLeader","slot":114877113}},"env":"8.1.2:d2d90","host":"producer","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"1547","sev":"Info",">
{"app":[],"at":"2024-01-28T12:03:24.19Z","data":{"credentials":"Cardano","val":{"block":"0bb540b0b4da38badc11f98c78722c742cff125eb3b84363572977372839a272","blockNo":9861109,"blockPrev":"2a8b15efe9bd5b00d8d7391c6965ecff193ab96e8a143fecf8a80>
{"app":[],"at":"2024-01-28T12:03:24.19Z","data":{"chainLengthDelta":1,"kind":"TraceAddBlockEvent.AddedToCurrentChain","newtip":"3c7be56c81d2949ce7e13e2f0eb6abbe7dd9c89ebef53316a27ac57f236b097b@114876822"},"env":"8.1.2:d2d90","host":"produc>
{"app":[],"at":"2024-01-28T12:03:24.19Z","data":{"kind":"LogValue","name":"before next, messages elided","value":{"contents":343168217855463,"tag":"PureI"}},"env":"8.1.2:d2d90","host":"producer","loc":null,"msg":"","ns":["cardano.node.Chai>
{"app":[],"at":"2024-01-28T12:03:24.19Z","data":{"chainLengthDelta":1,"kind":"TraceAddBlockEvent.AddedToCurrentChain","newtip":"3c7be56c81d2949ce7e13e2f0eb6abbe7dd9c89ebef53316a27ac57f236b097b@114876822"},"env":"8.1.2:d2d90","host":"produc>
{"app":[],"at":"2024-01-28T12:03:24.19Z","data":{"block":{"hash":"0bb540b0b4da38badc11f98c78722c742cff125eb3b84363572977372839a272","kind":"Point","slot":114877113},"kind":"TraceAddBlockEvent.TrySwitchToAFork"},"env":"8.1.2:d2d90","host":">
{"app":[],"at":"2024-01-28T12:03:24.19Z","data":{"credentials":"Cardano","val":{"kind":"TraceDidntAdoptBlock","slot":114877113}},"env":"8.1.2:d2d90","host":"producer","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"1547","sev":"Erro>
{"app":[],"at":"2024-01-28T12:03:25.00Z","data":{"chainDensity":4.7114022e-2,"credentials":"Cardano","delegMapSize":1333936,"kind":"TraceStartLeadershipCheck","slot":114877114,"utxoSize":10976816},"env":"8.1.2:d2d90","host":"producer","loc>
{"app":[],"at":"2024-01-28T12:03:25.03Z","data":{"credentials":"Cardano","val":{"kind":"TraceNodeNotLeader","slot":114877114}},"env":"8.1.2:d2d90","host":"producer","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"1547","sev":"Info",>
{"app":[],"at":"2024-01-28T12:03:26.00Z","data":{"chainDensity":4.7114022e-2,"credentials":"Cardano","delegMapSize":1333936,"kind":"TraceStartLeadershipCheck","slot":114877115,"utxoSize":10976816},"env":"8.1.2:d2d90","host":"producer","loc>
{"app":[],"at":"2024-01-28T12:03:26.03Z","data":{"credentials":"Cardano","val":{"kind":"TraceNodeNotLeader","slot":114877115}},"env":"8.1.2:d2d90","host":"producer","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"1547","sev":"Info",>

@gufmar
Copy link

gufmar commented Jan 28, 2024

one of the pools that lost a block because of this

Here's what highly probably happened

  • Last block before the break period was slot 114876813 filled by RSTK
  • Then 9 slots but only 8.5 secs later we had another block at slot 114876822 produced by SCAR and broadcasted ~ half a second too early. (a "block from the future" case)
  • Then it took 6 minutes and 6 seconds until slot 114877188 when STAKE produced a new block, build on top of RSTK block

during these 6 minutes we probably had other slot leaders, all stuck to adopt SCAR's block and hence building their block on RSTK. So they ran into height battles with SCAR

  • To solve the height battle the hashes are compared, and apparently SCAR won multiple times. Against LEAD and may some other elected slot leaders, we only know about if reported by the SPO here, like LEAD did for slot 114877113.

So no new blocks until STAKE was lucky to win the height battle and catch out SCARs block, stuck in adoption on many nodes worldwide.

@TerminadaPool
Copy link

Another problem with using the block VRF to deterministically settle "fork battles" is that a malicious pool that is the next valid leader can deliberately choose to cause a fork by building upon the second last block if it knows it will win the VRF battle with the last block. A group of such pools could work together to selectively knock out blocks produced by other pools that are not members of their malicious group.

I wonder if this attack method as well as the problem identified in this comment thread could be solved by three changes:

  1. Change the slot leadership calculation so that only every (say) 4th slot is a possible valid slot. IE: Add in 'mod 4' to the calculation and re-calibrate the parameters so that there is still, on average, 1 valid slot leader to produce a block every 20 seconds.
  2. Change the chain selection algorithm so that the block VRF is only used to settle equal length forks when the terminal blocks have the exact same slot number (IE: true slot battles), otherwise the node should prefer it's current chain.
  3. Reject blocks that arrive more than 4 seconds early. (IE. Equal to the minimum amount of time between possible valid slots, as a consequence of change 1.)

Such changes would also fix another problem that disincentivises geographic decentralisation: Stake pools run from more remote areas (relative to the majority in USA/EU) suffer increased network delays in sending and receiving blocks. Such pools that experience just 1 second delays will suffer 3 times the number of "fork battles", and pools that experience 2 second delays will suffer 5 times the number of fork battles. Since half of these battles will be lost this is a huge cost for the increased geographic decentralisation these pools provide.

IE: The majority pools in USA/EU with less than 1 second delays will only lose around 2.5% of their blocks. But, pools with 1 second delays will lose around 7.5% of their blocks, and pools with 2 second delays will lose around 12.5% of their blocks. Such numbers create a massive incentive for operators to centrally re-house their pool in a USA/EU data centre owned by Amazon.

The above changes would give each block 4 seconds to propagate across the network. Therefore pools with 1, 2, or even 3 seconds of delays wouldn't be penalised for their decentralisation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus issues related to ouroboros-consensus consensus-migration
Projects
Status: 🔖 Ready
Status: No status
Development

No branches or pull requests

11 participants