Skip to content

Made parallel APIs for SetOperations in Theta, and Tuple (generics).#348

Merged
leerho merged 3 commits into
masterfrom
Prep_for_2.0.0-RC3
Feb 9, 2021
Merged

Made parallel APIs for SetOperations in Theta, and Tuple (generics).#348
leerho merged 3 commits into
masterfrom
Prep_for_2.0.0-RC3

Conversation

@leerho
Copy link
Copy Markdown
Member

@leerho leerho commented Feb 8, 2021

Made parallel APIs for SetOperations in Theta, and Tuple (generics): We now have both stateful and stateless operations for union, intersection and AnotB. Added comprehensive examples (and live tests) for these added stateless and stateful operations.

Made consistent naming of union and intersection operations in ArrayOfDoubles package. We don't have stateless operations for union and intersection in ArrayOfDoubles. This can wait for when we do major refactoring of that package.

Also added more complete testing for the inequality search methods for long arrays, which were missing.

Copy link
Copy Markdown
Member

@davecromberge davecromberge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good 👍

I have a question about the use of reset to reset the state within the stateless intersection and union operations. Specifically, was this related to the bug that you found in the release candidate?
How likely is it that same union/intersection instance could be used in some concurrent context, like an RDD in Spark? This question seems to have come up in the mailing list - is it worth putting one or two lines of java doc in the SetOperation base class? Otherwise, resets may interleave.

@leerho
Copy link
Copy Markdown
Member Author

leerho commented Feb 8, 2021

"Stateless" does not imply concurrent. It just means everything is completed in one action. The term "stateless" is also used to contrast it with "stateful" where the sketch object holds state between subsequent calls to the same object. The user must be aware of this stateful property and respect it. "Stateless" also does not imply that "stateless" and "stateful" operations can be interleaved.

All of our sketches are non-thread-safe, with the exception of the "Concurrent*ThetaSketch" family, which was specifically designed to be thread-safe.

Perhaps this comment should be in the documentation somewhere?

@leerho
Copy link
Copy Markdown
Member Author

leerho commented Feb 8, 2021

We already have a clear statement of "not-thread-safe" in our documentation ... do we need to say more?

@davecromberge
Copy link
Copy Markdown
Member

davecromberge commented Feb 8, 2021

We already have a clear statement of "not-thread-safe" in our documentation ... do we need to say more?

I agree with you - you raise a good point that "stateless" does not imply concurrent. This is just something that I thought of in passing, and admittedly it's a contrived scenario, based on one or two that may have come up on the mailing list. Personally, I don't see any reason to reuse a set operation builder, and in our use case we construct a virgin instance for every union or intersection operation - despite serving many concurrent requests.

The documentation you have linked to is clear and explicit about the concerns raised in my comment. Thanks for. your response and additional detail.

@leerho leerho merged commit 0dea9cd into master Feb 9, 2021
@leerho leerho deleted the Prep_for_2.0.0-RC3 branch February 9, 2021 04:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants