-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DistributionSchedule uses custom wrapped int[] rather than HashSet #657
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great change. And I understand the benefit. Please take into account the possibility to not introduce a new client side dependency
bookkeeper-server/pom.xml
Outdated
@@ -178,6 +178,11 @@ | |||
<artifactId>http-server</artifactId> | |||
<version>${project.version}</version> | |||
</dependency> | |||
<dependency> | |||
<groupId>com.carrotsearch</groupId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have checked license for this?
Is it really needed? We already have several custom collections, maybe we could have our own implementation of IntArrayList without importing a whole third party lib. This will be used on client side.
It is used in EnsemblePlacement policy which is very often customized, at least in my projects.
I would not like to impose such dep on public plugins API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second to this comment here:
- for license, we need to update src/main/resources/NOTICE.bin.txt
- it is a new dependency introduced on client. It would prefer if we can avoid using this dependency.
- I am not sure what would be the benefits of using IntArrayList. If boxing, unboxing Integer and using an ArrayList is a problem, why can't we use
int[]
because we don't need resizing ability? we can wrapint[]
into a recycle class, it should be simple and efficient, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have checked license for this?
License is Apache v2.
I am not sure what would be the benefits of using IntArrayList. If boxing, unboxing Integer and using an ArrayList is a problem, why can't we use int[] because we don't need resizing ability? we can wrap int[] into a recycle class, it should be simple and efficient, no?
Yes, basically the class is a wrapper on top of an int[]
. I guess the option here would be to either reimplement the "set" logic on the int[]
or maybe import just the needed class.
We already have several custom collections, maybe we could have our own implementation of IntArrayList without importing a whole third party lib
Yes, the other collections were more about having no-boxing and concurrent collections. For non-concurrent primitive collections there are many available implementations (like the HPPC here).
@@ -88,14 +89,16 @@ public BookieSocketAddress replaceBookie(int ensembleSize, int writeQuorumSize, | |||
} | |||
|
|||
@Override | |||
public List<Integer> reorderReadSequence(ArrayList<BookieSocketAddress> ensemble, List<Integer> writeSet, | |||
Map<BookieSocketAddress, Long> bookieFailureHistory) { | |||
public IntArrayList reorderReadSequence( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are breaking compatibility. We already did break compatibility for policy in 4.5....
Anyway if you have I would like to have a BK class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this, if it helps with performance - because the stability of this interface is Evolving
. so it is okay to break the compatibility in minor release: https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/annotation/InterfaceStability.java#L41
Refer to this link for build results (access rights to CI server needed): |
private final static Logger LOG = LoggerFactory.getLogger(PendingAddOp.class); | ||
|
||
ByteBuf toSend; | ||
AddCallback cb; | ||
Object ctx; | ||
long entryId; | ||
int entryLength; | ||
Set<Integer> writeSet; | ||
IntHashSet writeSet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically I think we don't need Set here. I bet an int[]
would work here as well.
Looking at this again, bringing in an outside library is overkill. I'm going to try it with just a raw int[]. The placement policy stuff needs to be rewritten as well. It came in after the yahoo changes were made, and undoes a lot of the good work on the read path. @merlimat @sijie @eolivelli |
@ivankelly what changes were undoed by the placement policy in the master? the |
Ah, i didn't see it was optional. If enabled, it causes a bunch of allocations. Perhaps this can be ignored for now. This is more important for the write path, and that's allocated per entry, whereas on read it is per block. |
Actually, reading again, it is per entry on read too. Will think about what the best way to proceed is. |
This avoids the autoboxing on Integers and allocations of cells for the hashset. This also implies using int[] for the write set in the DistributionSchedules and PlacementPolicies. This patch was originally submitted as dc7933b on the yahoo-4.3 branch, though this has been modified extensively to remove the dependency on carrotsearch hppc and to allow it to work with placement policies which didn't exist at the time of the original patch.
7769023
to
a4481d0
Compare
@sijie @merlimat @eolivelli This is ready for review again. Using exclusively int[] now. Removed all allocations from the read reordering also. |
Refer to this link for build results (access rights to CI server needed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 looks good.
I have left some minor comment.
@@ -92,7 +93,8 @@ PendingAddOp enableRecoveryAdd() { | |||
|
|||
void setEntryId(long entryId) { | |||
this.entryId = entryId; | |||
writeSet = new HashSet<Integer>(lh.distributionSchedule.getWriteSet(entryId)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you move the allocation to the constructor w will have a write set full of zeroes which could appear valid.
In previous code an NPE would occur in case of missed call to setentryid. Recently I caught such kind of error while playing with ledger handleadv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updating
ArrayList<BookieSocketAddress> ensemble, | ||
Map<BookieSocketAddress, Long> bookieFailureHistory, | ||
int[] writeSet) { | ||
return super.reorderReadLACSequence(ensemble, bookieFailureHistory, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to call super? We can just not override the method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it as it always was.
*/ | ||
package org.apache.bookkeeper.util.collections; | ||
|
||
import java.util.Arrays; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: reorder imports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, though I'm not sure what order you want (I don't use an IDE).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivankelly for me is OK, I think that we have no clear rules about import order, maybe we should discuss on the list. Maybe checkstyle will impose some ordering
//cc @sijie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, a checkstyle would be good. However turning it on would result in a bunch of errors straight away. Personally, I have little preference for imports, as long as they look neat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is already a checkstyle rule. it applies on the modules that have checkstyle enabled. the packages in bookkeeper-server are not enforced yet.
Import Static and then Import. And sort the imports alphabetically. try to avoid start import.
<module name="ImportOrder">
<property name="severity" value="error"/>
<!-- This ensures that static imports go first. -->
<property name="option" value="top"/>
<property name="sortStaticImportsAlphabetically" value="true"/>
<property name="tokens" value="STATIC_IMPORT, IMPORT"/>
<message key="import.ordering"
value="Import {0} appears after other imports that it should precede"/>
</module>
<module name="AvoidStarImport">
<property name="severity" value="error"/>
</module>
I think the patch is ready. I will wait for the checks. we need to wait for @sijie final review |
retest |
import java.util.Random; | ||
import org.apache.commons.lang3.ArrayUtils; | ||
|
||
public class ArrayUtils2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why it is ArrayUtils2
not ArrayUtils
?
also this class doesn't sound like a collection
. it should probably in util package, rather than here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also move this code to bookkeeper common module ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah, noone of these are generic enough. Moving into the WriteSet wrapper class anyhow.
import java.util.Random; | ||
import org.apache.commons.lang3.ArrayUtils; | ||
|
||
public class ArrayUtils2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add at least a class comment?
public class ArrayUtils2 { | ||
static Random r = new Random(); | ||
|
||
public static int[] addMissingIndices(int[] array, int maxIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- it would be good to add a comment on this method to explain what it does here.
- all the utils here are not generic enough as an
array
util. it is more like distribution schedule util, because it is more specific to the way we do round-robin striping in ensemble. so I would suggest moving it intoutil
package and renaming it asDistributionScheduleUtils
and have some better names for those methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we call this DistributionScheduleUtils maybe it is better to keep them in bookkeeper-server module
BookieSocketAddress addr = curEnsemble.get(bi); | ||
bookieClient.readEntry(addr, lh.getId(), | ||
entryToRead, eecb, null); | ||
final ArrayList<BookieSocketAddress> curEnsembleFinal = curEnsemble; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering why do you do this assignment here? It doesn't seem required to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed. Got pulled in as part of cherry-pick, will remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled in as part of cherry-pick. I guess there was a callback here at some point which required the final that isn't there now. removed.
|
||
writeSet = new int[lh.metadata.getWriteQuorumSize()]; | ||
// require writeSet be populated for it to be usable | ||
Arrays.fill(writeSet, -12345); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define a CONSTANT for -12345
.
|
||
private final int[][] writeSets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it is a good idea to do so. it is kind of changing the garbage collection behavior. you are going to maintain ensmebleSize * writeQuorumSize
ints per ledger. if your client has N ledgers, that is going to be N * ensembleSize * writeQuorumSize, those memory are going to sit in old gen for a long time.
if you allocate int[]
on getWriteSet, it is one-time use and the memory can be reclaimed on young gen gc.
If you want to avoid allocation, you can wrap int[] into a WriteSet class and only retrieve a WriteSet and set the values rather than preallocating all the write sets.
I can see other benefits when wrapping int[] into a WriteSet:
- ensemble placement policy doesn't have to deal with int[] directly. it can deal with WriteSet.
- all the util functions related int[] can be potentially be part of WriteSet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping is a good idea. Doing that. Will have a knock on effect to the rest of the comments.
@@ -136,6 +221,7 @@ public QuorumCoverageSet getCoverageSet() { | |||
|
|||
@Override | |||
public boolean hasEntry(long entryId, int bookieIndex) { | |||
return getWriteSet(entryId).contains(bookieIndex); | |||
int[] set = writeSets[(int) (entryId % ensembleSize)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be further improved. because it is just checking whether an entry is supposed to be stored in one bookie. I think we can simply calculate it.
*/ | ||
package org.apache.bookkeeper.util.collections; | ||
|
||
import java.util.Arrays; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is already a checkstyle rule. it applies on the modules that have checkstyle enabled. the packages in bookkeeper-server are not enforced yet.
Import Static and then Import. And sort the imports alphabetically. try to avoid start import.
<module name="ImportOrder">
<property name="severity" value="error"/>
<!-- This ensures that static imports go first. -->
<property name="option" value="top"/>
<property name="sortStaticImportsAlphabetically" value="true"/>
<property name="tokens" value="STATIC_IMPORT, IMPORT"/>
<message key="import.ordering"
value="Import {0} appears after other imports that it should precede"/>
</module>
<module name="AvoidStarImport">
<property name="severity" value="error"/>
</module>
List<Integer> unAvailableList = new ArrayList<Integer>(writeSet.size()); | ||
for (Integer idx : writeSet) { | ||
|
||
int localMask = 0x01 << 24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these can be constants?
@@ -500,19 +504,26 @@ protected BookieNode replaceFromRack(BookieNode bookieNodeToReplace, | |||
} | |||
|
|||
@Override | |||
public final List<Integer> reorderReadSequence(ArrayList<BookieSocketAddress> ensemble, List<Integer> writeSet, Map<BookieSocketAddress, Long> bookieFailureHistory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is worth adding a BC test comparing the logic here with the logic in 4.5.0. they should be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests that are already there effectively test B.C. as they put a write set through a reorder and check if it matches a set value.
This object is pooled using a netty recycler.
Refer to this link for build results (access rights to CI server needed): |
Not ready for review yet. Still some comments to address. |
@sijie @eolivelli ready for review again. There's a lot of changes for the changing to a wrapped write set. Multiple patches, though only the first one is big. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work! @ivankelly
@@ -323,7 +327,12 @@ private synchronized int getNextReplicaIndexToReadFrom() { | |||
} | |||
|
|||
private int getReplicaIndex(int bookieIndex) { | |||
return writeSet.indexOf(bookieIndex); | |||
for (int i = 0; i < writeSet.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this as a method of WriteSet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writeSet already has indexOf. Removing the whole method.
// Iterate over set and trigger the sendWriteRequests | ||
DistributionSchedule.WriteSet writeSet | ||
= lh.distributionSchedule.getWriteSet(entryId); | ||
for (int i = 0; i < writeSet.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try {
...
} finally {
writeSet.recycle();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
public List<Integer> reorderReadLACSequence(ArrayList<BookieSocketAddress> ensemble, | ||
List<Integer> writeSet, | ||
Map<BookieSocketAddress, Long> bookieFailureHistory); | ||
public DistributionSchedule.WriteSet reorderReadLACSequence( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could use 'void' as return type:
we are changing the signature and we do not want client code to allocate WriteSet but we would like the given WriteSet to be modified.
Anyway I am fine with this version.
But I would like to know @ivankelly and @sijie options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eolivelli I think void would make it a little harder to use.
For example, in a lot of places we do
WriteSet reordered = policy.reorderReadSequnce(ensemble, failed, schedule.getWriteSet(eid))
If reorder returned void, then we would need to first assign the write set returned from getWriteSet to a variable and then pass it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivankelly I see, thank you for your clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 great work Ivan
This avoids the autoboxing on Integers and allocations of cells for the hashset. This also implies using wrapped int[] object for the write set in the DistributionSchedules and PlacementPolicies. This patch was originally submitted as dc7933b on the yahoo-4.3 branch, though this has been modified extensively to remove the dependency on carrotsearch hppc and to allow it to work with placement policies which didn't exist at the time of the original patch. Author: Ivan Kelly <ivank@apache.org> Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org> This closes apache#657 from ivankelly/yahoo-backports
…onfirmedAndEntry Descriptions of the changes in this PR: *Motivation* There are two bugs in `ReadLastAddConfirmedAndEntry`: 1) a regression was introduced by #657. the long poll read op is attempting to long-poll reading lac. since lac is stored across ensemble, so the retry logic assumes it will attempt over all the bookies in the ensemble. however #657 use a `write-quorum-size` write set for tracking those attempts. this leads to ArrayIndexOutOfBoundsException reported at #1403. The integrate tests added in this PR can easily reproduce this issue. 2) there was a bug on retry logic when a ledger whose ensemble size is larger than write quorum size. when this happens, it will claim lac is not advanced prior to attempt the bookie in the ensemble. so the client will never know lac is advanced if using long poll reads on such ledgers. The integrate tests added in this PR can also catch this issue. disclaim: twitter uses long poll reads but never uses `ensembleSize > writeQuorumSize`. so this is not a problem for dlog users. *Solution* - introduce a `getWriteSetForLongPoll` call that uses `ensembleSize` for building the write set. this would address problem 1) - fix the assignment of `numEmptyResponsesAllowed`, so the long poll reads can work with `ensembleSize > writeQuorumSize` - add integration tests for long polling reads at the same time, also add an integration test for normal tailing reads with Author: Sijie Guo <sijie@apache.org> Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Jia Zhai <None> This closes #1404 from sijie/longpoll_tests, closes #1403
…onfirmedAndEntry Descriptions of the changes in this PR: *Motivation* There are two bugs in `ReadLastAddConfirmedAndEntry`: 1) a regression was introduced by #657. the long poll read op is attempting to long-poll reading lac. since lac is stored across ensemble, so the retry logic assumes it will attempt over all the bookies in the ensemble. however #657 use a `write-quorum-size` write set for tracking those attempts. this leads to ArrayIndexOutOfBoundsException reported at #1403. The integrate tests added in this PR can easily reproduce this issue. 2) there was a bug on retry logic when a ledger whose ensemble size is larger than write quorum size. when this happens, it will claim lac is not advanced prior to attempt the bookie in the ensemble. so the client will never know lac is advanced if using long poll reads on such ledgers. The integrate tests added in this PR can also catch this issue. disclaim: twitter uses long poll reads but never uses `ensembleSize > writeQuorumSize`. so this is not a problem for dlog users. *Solution* - introduce a `getWriteSetForLongPoll` call that uses `ensembleSize` for building the write set. this would address problem 1) - fix the assignment of `numEmptyResponsesAllowed`, so the long poll reads can work with `ensembleSize > writeQuorumSize` - add integration tests for long polling reads at the same time, also add an integration test for normal tailing reads with Author: Sijie Guo <sijie@apache.org> Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Jia Zhai <None> This closes #1404 from sijie/longpoll_tests, closes #1403 (cherry picked from commit 9259173) Signed-off-by: Sijie Guo <sijie@apache.org>
…stAddConfirmedAndEntry Descriptions of the changes in this PR: *Motivation* There are two bugs in `ReadLastAddConfirmedAndEntry`: 1) a regression was introduced by apache#657. the long poll read op is attempting to long-poll reading lac. since lac is stored across ensemble, so the retry logic assumes it will attempt over all the bookies in the ensemble. however apache#657 use a `write-quorum-size` write set for tracking those attempts. this leads to ArrayIndexOutOfBoundsException reported at apache#1403. The integrate tests added in this PR can easily reproduce this issue. 2) there was a bug on retry logic when a ledger whose ensemble size is larger than write quorum size. when this happens, it will claim lac is not advanced prior to attempt the bookie in the ensemble. so the client will never know lac is advanced if using long poll reads on such ledgers. The integrate tests added in this PR can also catch this issue. disclaim: twitter uses long poll reads but never uses `ensembleSize > writeQuorumSize`. so this is not a problem for dlog users. *Solution* - introduce a `getWriteSetForLongPoll` call that uses `ensembleSize` for building the write set. this would address problem 1) - fix the assignment of `numEmptyResponsesAllowed`, so the long poll reads can work with `ensembleSize > writeQuorumSize` - add integration tests for long polling reads at the same time, also add an integration test for normal tailing reads with Author: Sijie Guo <sijie@apache.org> Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Jia Zhai <None> This closes apache#1404 from sijie/longpoll_tests, closes apache#1403 (cherry picked from commit 9259173) Signed-off-by: Sijie Guo <sijie@apache.org> Signed-off-by: JV Jujjuri <vjujjuri@salesforce.com>
…o issues like connection loss Quick fix to make the test more reliable. Author: Andor Molnar <andor@apache.org> Reviewers: fangmin@apache.org, eolivelli@gmail.com, andor@apache.org Closes apache#657 from anmolnar/ZOOKEEPER-3157
This avoids the autoboxing on Integers and allocations of cells for
the hashset.
This also implies using wrapped int[] object for the write set in the
DistributionSchedules and PlacementPolicies.
This patch was originally submitted as dc7933b on the yahoo-4.3
branch, though this has been modified extensively to remove the
dependency on carrotsearch hppc and to allow it to work with placement
policies which didn't exist at the time of the original patch.