ISSUE #1390 Ensemble change on delayed write error #1395
Errors on delayed writes are dropped if the addEntry is already in the complete state (ack quorum satisfied). This change records the delayed write failure and forces an ensemble change on the next write. This avoids leaving the ledger under-replicated for an extended period, and it also avoids unnecessary buildup in the PCBC channel. Signed-off-by: Venkateswararao Jujjuri (JV) <vjujjuri@salesforce.com>
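A minimal sketch of the idea in the description, assuming a simplified single-threaded writer. A write failure that arrives after the entry has already met its ack quorum is recorded by ensemble index instead of being dropped, and the recorded bookies are swapped out just before the next write. `EnsembleSketch`, `onWriteFailure`, `beforeNextWrite`, and the `spare-N` replacement naming are all hypothetical and are not BookKeeper's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-threaded model of a ledger writer; not BookKeeper's LedgerHandle.
class EnsembleSketch {
    private final List<String> ensemble;
    // ensemble index -> address of the bookie whose delayed write failed
    private final Map<Integer, String> delayedWriteFailedBookies = new HashMap<>();
    private int nextSpare = 0; // hypothetical source of replacement bookies

    EnsembleSketch(List<String> initialEnsemble) {
        this.ensemble = new ArrayList<>(initialEnsemble);
    }

    // Called when a bookie reports a write failure for some entry.
    void onWriteFailure(int bookieIndex, boolean ackQuorumAlreadyMet) {
        if (ackQuorumAlreadyMet) {
            // The entry is already acknowledged, so remember the error instead of dropping it.
            delayedWriteFailedBookies.put(bookieIndex, ensemble.get(bookieIndex));
        }
        // Otherwise the regular add-failure path (not modeled here) handles it.
    }

    // Called before issuing the next write: performs the pending ensemble change.
    void beforeNextWrite() {
        for (Map.Entry<Integer, String> failed : delayedWriteFailedBookies.entrySet()) {
            int index = failed.getKey();
            // Skip slots that were already replaced since the failure was recorded.
            if (ensemble.get(index).equals(failed.getValue())) {
                ensemble.set(index, "spare-" + nextSpare++);
            }
        }
        delayedWriteFailedBookies.clear();
    }

    List<String> ensemble() {
        return ensemble;
    }
}
```

The ensemble is left untouched when the failure is recorded; the replacement only happens on the next write, which mirrors "forces an ensemble change on the next write" without blocking the already-acknowledged entry.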
The idea is very smart.
I left some minor comments.
I wonder if we should add a config option to explicitly enable this change; maybe we could keep it 'disabled' in 4.8 and then change the default to 'enabled' in 4.9.
I also wonder if it is possible to use the Mockito-based client-side framework and save us from starting a real cluster in tests. (Not a blocker for me.)
@@ -1749,6 +1755,42 @@ EnsembleInfo replaceBookieInMetadata(final Map<Integer, BookieSocketAddress> fai
return new EnsembleInfo(newEnsemble, failedBookies, replacedBookies);
}

void handleDelayeWriteBookieFailure() {
Typo?
Taken care of.
@@ -151,6 +153,10 @@
}
}

public Map<Integer, BookieSocketAddress> getDelayedWriteFailedBookies() {
Why do we have to expose this as 'public'?
Took care of it.
@@ -117,6 +118,7 @@
ScheduledFuture<?> timeoutFuture = null;

final long waitForWriteSetMs;
Map<Integer, BookieSocketAddress> delayedWriteFailedBookies = new HashMap<Integer, BookieSocketAddress>();
Can you please explain why this is not a ConcurrentMap?
Because of our ordered safe executor, I thought a regular map would be good enough.
I was thinking the same. Just wanted to double check.
@eolivelli I thought about it, and felt that it can just follow the same configuration path we use for ensemble changes on the failed add-entry path. This patch still respects the delayed ensemble change and disableEnsembleChange configurations.
Are you making a general comment about tests in this class? If so, maybe we can open an issue and deal with that separately?
For me it is okay.
+1
Regarding the test: I had not realized that this is not a new class, so I am okay with keeping your test as it is. Regarding the map: I am okay with making it a concurrent map. In this case the map will be very small and rarely used, so a ConcurrentHashMap will not have a significant impact, and it makes things simpler. That said, it can also be a simple HashMap.
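Why a plain HashMap can be enough: if every read and write of the map runs on the same thread, no concurrent access ever happens. The sketch below assumes a single-threaded `ExecutorService` as a stand-in for the ordered executor mentioned in the thread (which, as described, runs a ledger's callbacks on one thread). `LedgerConfinedState` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical: per-ledger state confined to one thread. A single-threaded executor stands in
// for an ordered executor that runs every callback for a given ledger on the same thread.
class LedgerConfinedState implements AutoCloseable {
    private final ExecutorService ledgerThread = Executors.newSingleThreadExecutor();
    // A plain HashMap suffices because only tasks running on ledgerThread touch it.
    private final Map<Integer, String> failedBookies = new HashMap<>();

    // Safe to call from any thread: the mutation itself runs on ledgerThread.
    Future<?> recordFailure(int bookieIndex, String bookieAddress) {
        return ledgerThread.submit(() -> { failedBookies.put(bookieIndex, bookieAddress); });
    }

    // Reads also run on ledgerThread, after every task submitted before them.
    int size() {
        try {
            return ledgerThread.submit((Callable<Integer>) failedBookies::size).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void close() {
        ledgerThread.shutdown();
    }
}
```

If any access bypassed the executor (for example, a getter called directly from an application thread), this reasoning would no longer hold, and a ConcurrentHashMap would be the simpler safe choice.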
@jvrao before giving a detailed review, just one general question:
is there any reason why we can't reuse the bookieFailureHistory loading cache?
@@ -117,6 +118,7 @@
ScheduledFuture<?> timeoutFuture = null;

final long waitForWriteSetMs;
private Map<Integer, BookieSocketAddress> delayedWriteFailedBookies = new HashMap<Integer, BookieSocketAddress>();
nit: final?
Sure, I can do that, but I'll wait to see if you have any other comments.
I thought the purpose and the usage are different. In this case, it is the bookie index, not the entryId, that is stored along with the bookie address. This map is also cleared as soon as an ensemble change is attempted, and it is populated according to the ensemble-change configuration parameters. Moreover, it is simpler to have a separate field that is not bundled with something else.
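The distinction drawn in this reply can be sketched with two plain maps. This assumes, for illustration only, that a long-lived history maps a bookie address to the last entry ID that failed on it, while the delayed-write record maps an ensemble index to a bookie address and is drained once an ensemble change is attempted. `FailureRecords` and its members are hypothetical names; the sketch does not reproduce the real `bookieFailureHistory` type.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two separate records; names and types are illustrative only.
class FailureRecords {
    // Long-lived history: bookie address -> last entry id that failed on that bookie.
    final Map<String, Long> failureHistory = new HashMap<>();
    // Short-lived record: ensemble index -> bookie address, drained when an ensemble change is attempted.
    final Map<Integer, String> delayedWriteFailed = new HashMap<>();

    void recordDelayedFailure(List<String> ensemble, int bookieIndex, long entryId) {
        String address = ensemble.get(bookieIndex);
        failureHistory.put(address, entryId);
        delayedWriteFailed.put(bookieIndex, address);
    }

    // Returns the slots to replace and forgets them; the history is left untouched.
    Map<Integer, String> drainForEnsembleChange() {
        Map<Integer, String> toReplace = new HashMap<>(delayedWriteFailed);
        delayedWriteFailed.clear();
        return toReplace;
    }
}
```

Keying by index answers "which ensemble slots need replacing right now", while the address-keyed history answers "when did this bookie last fail"; keeping them apart means draining one never disturbs the other.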
@jvrao looks good to me.
retest this please
Descriptions of the changes in this PR:
The original intent of this change is to do a best-effort ensemble change. But this is not possible until the local metadata is completely immutable. Until the feature "Make LedgerMetadata Immutable #610" is complete, we will use handleBookieFailure() to handle delayed writes as regular bookie failures.
Signed-off-by: Venkateswararao Jujjuri (JV) <vjujjuri@salesforce.com>
Master Issue: #1591
Related Issue: #1395
Author: JV Jujjuri <vjujjuri@salesforce.com>
Author: Ivan Kelly <ivank@apache.org>
Reviewers: Ivan Kelly <ivank@apache.org>, Sijie Guo <sijie@apache.org>
This closes #1592 from jvrao/datalossbug
(cherry picked from commit 3ab6e92)
Signed-off-by: Ivan Kelly <ivank@apache.org>
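A minimal sketch of the follow-up behavior, assuming a simplified write callback: once the ack quorum question no longer changes the outcome, a failed delayed write is passed to the same failure handler as any other bookie failure. `handleBookieFailure` is the method named in the description, but the two-argument signature, the `OK` return code, and the `DelayedWriteAsBookieFailure` class are hypothetical.

```java
// Hypothetical write-completion callback; not BookKeeper's PendingAddOp.
class DelayedWriteAsBookieFailure {
    static final int OK = 0; // hypothetical success code

    interface BookieFailureHandler {
        void handleBookieFailure(int bookieIndex, String bookieAddress);
    }

    private final BookieFailureHandler handler;

    DelayedWriteAsBookieFailure(BookieFailureHandler handler) {
        this.handler = handler;
    }

    void writeComplete(int rc, int bookieIndex, String bookieAddress, boolean ackQuorumAlreadyMet) {
        if (rc == OK) {
            return;
        }
        // ackQuorumAlreadyMet is deliberately ignored: a delayed write failure takes the
        // same path as a regular bookie failure instead of a separate best-effort path.
        handler.handleBookieFailure(bookieIndex, bookieAddress);
    }
}
```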
* Avoid releasing sent buffer too early in BookieClient mock
If the buffer was sent to more than one bookie with the mock, it would be released after being sent to the first one. Each write should retain a refCount itself, and then release it when done.
Author: Ivan Kelly <ivank@apache.org>
Reviewers: Sijie Guo <sijie@apache.org>
This closes apache#1598 from ivankelly/double-rel-mock
* (@bug W-5344681@) Delayed write ensemble change may cause dataloss
Descriptions of the changes in this PR:
The original intent of this change is to do a best-effort ensemble change. But this is not possible until the local metadata is completely immutable. Until the feature "Make LedgerMetadata Immutable apache#610" is complete, we will use handleBookieFailure() to handle delayed writes as regular bookie failures.
Signed-off-by: Venkateswararao Jujjuri (JV) <vjujjuri@salesforce.com>
Master Issue: apache#1591
Related Issue: apache#1395
Author: JV Jujjuri <vjujjuri@salesforce.com>
Author: Ivan Kelly <ivank@apache.org>
Reviewers: Ivan Kelly <ivank@apache.org>, Sijie Guo <sijie@apache.org>
@Rev Sam Just@
This closes apache#1592 from jvrao/datalossbug
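The mock fix above follows the usual reference-counting rule: each in-flight write takes its own reference and releases only that one, so the buffer stays alive until the last write finishes. The sketch assumes a hand-rolled `RefCountedBuffer` (loosely modeled on Netty's retain/release idea, but not Netty's class) and a hypothetical `MockBookieClientSketch`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted buffer, modeled loosely on the retain()/release() idea.
class RefCountedBuffer {
    private final AtomicInteger refCnt = new AtomicInteger(1); // the creator holds one reference

    RefCountedBuffer retain() {
        if (refCnt.getAndIncrement() <= 0) {
            refCnt.decrementAndGet(); // undo: the buffer was already freed
            throw new IllegalStateException("retain() after release");
        }
        return this;
    }

    boolean release() {
        int remaining = refCnt.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
        return remaining == 0; // true when the buffer may be freed
    }

    int refCnt() {
        return refCnt.get();
    }
}

// Hypothetical mock client: every in-flight write owns one reference and releases only that one.
class MockBookieClientSketch {
    static List<Runnable> sendToBookies(RefCountedBuffer buf, int bookieCount) {
        List<Runnable> completions = new ArrayList<>();
        for (int i = 0; i < bookieCount; i++) {
            buf.retain();                  // taken when the write to bookie i is issued
            completions.add(buf::release); // dropped when that write completes
        }
        return completions;
    }
}
```

With three bookies, the count goes from 1 to 4 when the writes are issued; the caller then drops its own reference, and the buffer reaches zero only after all three completions run, rather than after the first one as in the buggy mock.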