Robustness: DELA network breaks down after 256 rounds #269

Open
PascalinDe opened this issue Sep 6, 2023 · 5 comments

@PascalinDe

PascalinDe commented Sep 6, 2023

After round 255, the following happens:

d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z INF pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/core/ordering/cosipbft/mod.go:387 > block event addr=dela-worker-0:2000 index=256 root=b9bf9506
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z DBG pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/core/ordering/cosipbft/mod.go:551 > round has started addr=dela-worker-0:2000 index=257
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z DBG pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/core/ordering/cosipbft/blocksync/default.go:230 > received synchronization message addr=dela-worker-0:2000 index=255
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/mino/minogrpc/session/mod.go:398 > relay failed to send error="client: rpc error: code = Canceled desc = context canceled" addr=dela-worker-0:2000
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/mino/minogrpc/session/mod.go:398 > relay failed to send error="client: rpc error: code = Canceled desc = context canceled" addr=dela-worker-0:2000
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/mino/minogrpc/session/mod.go:374 > failed to setup relay error="client: rpc error: code = Canceled desc = context canceled" addr=dela-worker-0:2000 to=dela-worker-2:2000
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z ERR pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/mino/minogrpc/rpc.go:227 > stream to root failed error="rpc error: code = Unknown desc = handler failed to process: failed to verify chain: mismatch from: 'd6daf929' != '1894d9c4'"
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/mino/minogrpc/session/mod.go:389 > parent is closing error="client: rpc error: code = Canceled desc = context canceled" addr=Orchestrator:dela-worker-0:2000
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/core/ordering/cosipbft/blocksync/default.go:124 > announcement failed error="session Orchestrator:dela-worker-0:2000 is closing: Canceled" addr=dela-worker-0:2000
d-voting-dela-worker-0-1  | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/mino/minogrpc/session/mod.go:374 > failed to setup relay error="client: rpc error: code = Canceled desc = context canceled" addr=dela-worker-0:2000 to=dela-worker-3:2000

After that, no new transactions can be added to the blockchain anymore (i.e. no new forms, votes, ...).

I do not understand DELA well enough to guess at an answer; the only thing that seems suspicious to me is the following pair of lines in the logs:

d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z DBG pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/core/ordering/cosipbft/mod.go:551 > round has started addr=dela-worker-0:2000 index=257
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z DBG pkg/mod/go.dedis.ch/dela@v0.0.0-20221010131641-9c479e68be18/core/ordering/cosipbft/blocksync/default.go:230 > received synchronization message addr=dela-worker-0:2000 index=255

whereas before that the index had always been increasing.

To reproduce: on a clean install, create a form and add votes until the block index reaches 256.

@nkcr
Contributor

nkcr commented Sep 6, 2023

Congratulations for daring to go beyond 255 blocks, you found a pretty nasty bug 😁.

The symptom we see is that node-0 rejects the chain it receives from itself during the periodic sync that happens among nodes. When nodes receive a sync message, they first validate the chain of links contained in the sync message by checking that each pair of forward and backward links corresponds.
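As an illustration of that check, here is a minimal, hypothetical sketch (the link type and verifyChain function are made up for the example, not dela's actual API): each link must start from the digest the previous link pointed to, which is the kind of comparison behind the "mismatch from: 'd6daf929' != '1894d9c4'" error above.

package main

import "fmt"

// link is a simplified stand-in for a chain link, for illustration only.
type link struct {
	from string // digest of the previous block
	to   string // digest of the block this link points to
}

// verifyChain checks that each link starts from the block the previous
// link pointed to.
func verifyChain(links []link) error {
	for i := 1; i < len(links); i++ {
		if links[i].from != links[i-1].to {
			return fmt.Errorf("mismatch from: '%s' != '%s'",
				links[i].from, links[i-1].to)
		}
	}
	return nil
}

func main() {
	// Links delivered out of order break the check even though each
	// individual link is valid.
	fmt.Println(verifyChain([]link{
		{from: "genesis", to: "aaaa"},
		{from: "bbbb", to: "cccc"}, // should have been {from: "aaaa", ...}
	}))
}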

The chain is stored in a key-value store, where each key is the index of the block. When we create the chain of links (a lighter version of the blockchain) for validation, we iterate over all keys of the key-value store in sorted order by key, which should naturally yield the blocks from the genesis to the latest one (the genesis block has index 0, the next block index 1, and so on).
We are using bbolt for the key-value store, whose documentation states that "Bolt stores its keys in byte-sorted order within a bucket".
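As a rough illustration of that iteration (a minimal sketch, not dela code; the file name "blocks.db" and the bucket name "blocks" are made up for the example), a bbolt cursor walks the keys of a bucket in byte-sorted order:

package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Hypothetical database file and bucket, for illustration only.
	db, err := bolt.Open("blocks.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("blocks"))
		if b == nil {
			return nil // nothing stored yet
		}
		// The cursor returns the keys in byte-sorted order within the
		// bucket; building the chain of links relies on this order
		// matching the numeric order of the block indices.
		c := b.Cursor()
		for k, v := c.First(); k != nil; k, v = c.Next() {
			fmt.Printf("key=%x value=%s\n", k, v)
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}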

When we store a block, we compute the key that we use for the key-value store with this function:

func (s *InDisk) makeKey(index uint64) []byte {
	// The block index is encoded as an 8-byte little-endian key.
	key := make([]byte, 8)
	binary.LittleEndian.PutUint64(key, index)

	return key
}

Can you see why there is a problem after 255?

@PascalinDe
Author

So what you are saying is that when the index hits 256, it spills over into the "next byte" and the lowest byte becomes 0, so Bolt no longer places the key correctly in the ordering since we are using LittleEndian.

so changing it to BigEndian should solve the problem?
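A minimal, self-contained sketch (plain Go, not dela code) comparing the byte encodings of 255 and 256 illustrates this: with little-endian keys the byte-sorted order puts 256 before 255, while big-endian keys preserve the numeric order.

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

func main() {
	le255, le256 := make([]byte, 8), make([]byte, 8)
	binary.LittleEndian.PutUint64(le255, 255)
	binary.LittleEndian.PutUint64(le256, 256)

	be255, be256 := make([]byte, 8), make([]byte, 8)
	binary.BigEndian.PutUint64(be255, 255)
	binary.BigEndian.PutUint64(be256, 256)

	// Little-endian: 255 -> ff00000000000000, 256 -> 0001000000000000,
	// so in byte-sorted order block 256 comes *before* block 255.
	fmt.Println(bytes.Compare(le255, le256)) // prints 1 (255 sorts after 256)

	// Big-endian: 255 -> 00000000000000ff, 256 -> 0000000000000100,
	// so the byte order matches the numeric order.
	fmt.Println(bytes.Compare(be255, be256)) // prints -1 (255 sorts before 256)
}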

@PascalinDe PascalinDe transferred this issue from dedis/d-voting Sep 7, 2023
@PascalinDe
Author

(moved this to Dela as the problem is clearly on this side)

@nkcr
Contributor

nkcr commented Sep 7, 2023

so changing it to BigEndian should solve the problem?

yes :)
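For reference, the fix essentially amounts to encoding the key big-endian in makeKey, along these lines (a sketch of the idea, not necessarily the exact patch that was merged); note that stores already written with little-endian keys would presumably need to be migrated or recreated.

// makeKey encodes the block index as an 8-byte big-endian key, so that
// bbolt's byte-sorted iteration matches the numeric order of the indices.
func (s *InDisk) makeKey(index uint64) []byte {
	key := make([]byte, 8)
	binary.BigEndian.PutUint64(key, index)

	return key
}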

@ineiti
Member

ineiti commented Sep 28, 2023

The fix is included in c4dt/dela.
