Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes bug with bulk RW file partition point creation #236

Merged
merged 2 commits into from
Oct 6, 2022

Conversation

keith-turner
Copy link
Contributor

There was code that did the following

TreeSet<Integer> startRows = new TreeSet<>();
startRows.add(0);
while (startRows.size() < parts)
  startRows.add(rand.nextInt(LOTS));

The above code was replaced in 7453c37 with a stream. The stream did not fully capture the original behavior of the loop. This change makes the stream fully capture that behavior. Need to ensure that parts unique random numbers are generated including zero (like if the random number generator returns zero it should be properly deduplicated). The stream was not properly handling the RNG returning duplicates or zero.

There was code that did the following

    TreeSet<Integer> startRows = new TreeSet<>();
    startRows.add(0);
    while (startRows.size() < parts)
      startRows.add(rand.nextInt(LOTS));

The above code was replaced in 7453c37 with a stream. The stream
did not fully capture the original behavior of the loop.  This
change makes the stream fully capture that behavior.  Need to
ensure that `parts` unique random numbers are generated including
zero (like if the random number generator returns zero it should
be properly deduplicated).  The stream was not properly handling
the RNG returning duplicates or zero.
@keith-turner
Copy link
Contributor Author

I'm beginning to wonder if we should replace the stream with the original loop. The created set must always have at least 0 in it. Will the stream changes I created with Stream.concat result in this?

@keith-turner
Copy link
Contributor Author

I'm beginning to wonder if we should replace the stream with the original loop. The created set must always have at least 0 in it. Will the stream changes I created with Stream.concat result in this?

I looked at the stream javadoc and it will maintain order, so since 0 comes first it should always be there.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/stream/Stream.html#concat(java.util.stream.Stream,java.util.stream.Stream)

I think the stream code may be slightly more readable than the loop with the explicit call to distinct(). So I think its slightly better than the loop. I am going to add a comment about zero.

@keith-turner
Copy link
Contributor Author

keith-turner commented Oct 6, 2022

I lost the exception, but I was seeing an error when running the Bulk RW test. When duplicate rand nums were generated it would create a set that was too small and this would result in an index out bounds exception in a later loop. This PR is in support of running bulk RW test for apache/accumulo#2667.

@keith-turner keith-turner merged commit 0f7201f into apache:main Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants