-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quorum during self repair should fetch from all nodes #859
quorum during self repair should fetch from all nodes #859
Conversation
what happens if we have both network_issue and transaction_not_exists which one should we raise ? |
510d9c7
to
120dc16
Compare
lib/archethic/p2p.ex
Outdated
message, | ||
conflict_resolver, | ||
acceptance_resolver, | ||
consistency_level - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should not reduce the consistency level as we want at least 3 results (by defaut), so here only 2 nodes will be asked instead of 3. Then 1, then 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to remove one - 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ty
lib/archethic/p2p.ex
Outdated
message, | ||
conflict_resolver, | ||
acceptance_resolver, | ||
consistency_level - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to remove one - 1
end | ||
) | ||
|
||
assert {:ok, %NotFound{}} = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I runned this test multiple times, I have some case where the last message is not %NotFound
. We should be sure the test is deterministic. Maybe the mock %NotFound is returned faster than the last %Transaction, so the last element of the list for the conflict resolver is %Transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't reproduce it on my machine it's weird, normally this shouldn't happen but I will add a timer.sleep() before the NotFound message to make sure that the other messages have arrived.
@@ -130,6 +130,37 @@ defmodule Archethic.SelfRepair.Sync.TransactionHandlerTest do | |||
) | |||
end | |||
|
|||
test "download_transaction/2 should download the transaction even after a first failure" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test does not test what it should.
As this function uses quorum, it will request to the 3 connected nodes (constency level to 3) so the first quorum attempt will get both result (network_issue and tx) and so the conflict resolver will return the tx. So your test actually verify if the conflict resolver works well.
To test the acceptance resolver behavior, you should add more nodes and the first 3 should return an error, then the next should return the transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right
Description
During the self repair, we use the quorum read to fetch the missed transactions and the previous summary aggregate.
But if there is some synchronization error, the 3 requested node from the quorum may return that the transaction does not exist and so the self repair crashes.
If the transaction is in the beacon summary it means that it exists and so it should be fetched during self repair.
If the quorum read return an unexpected response (transaction does not exists, transaction invalid ...), we remove from the node list the previously requested nodes, and retry the quorum until we get the expected response or the node list is empty
Fixes #856
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Checklist: