Consider adding a distinct stage? #22

Open
zhxiaogg opened this issue Jun 14, 2016 · 5 comments

Comments

@zhxiaogg
Contributor

Recently I needed a distinct stage. My first thought is that it should keep a buffer of the distinct elements seen so far and prevent duplicate elements from being pushed downstream.

Since a stream may never complete, a buffer that keeps growing could be dangerous. In my scenario, however, the stream will definitely complete and the buffer size is predictable.

Any suggestion?

BTW, I've seen akka/akka#19395, which proposes adding a dedupe stage.

@viktorklang
Member

Use statefulMapConcat?
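
A rough sketch of what that could look like: statefulMapConcat takes a factory that is invoked once per materialization and returns a stateful element-to-iterable function, so per-stream state (here, a set of seen elements) lives inside that function. The code below mimics that shape with plain `java.util.function` types so it runs without Akka on the classpath; `distinct()` is a hypothetical helper name, not an existing operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

class DistinctSketch {
    // Hypothetical helper (not an existing Akka operator): returns a factory that
    // creates a fresh stateful function each time it is called, so every stream
    // materialization would get its own "seen" set.
    static <T> Supplier<Function<T, List<T>>> distinct() {
        return () -> {
            // Grows without bound: only safe when the stream is known to complete.
            Set<T> seen = new HashSet<>();
            return elem -> seen.add(elem)
                    ? Collections.singletonList(elem) // first occurrence: pass it on
                    : Collections.<T>emptyList();     // duplicate: emit nothing
        };
    }

    public static void main(String[] args) {
        Function<String, List<String>> f = DistinctSketch.<String>distinct().get();
        List<String> out = new ArrayList<>();
        for (String s : Arrays.asList("a", "b", "a", "c", "b")) {
            out.addAll(f.apply(s)); // what a stream stage would push downstream
        }
        System.out.println(out); // prints [a, b, c]
    }
}
```

Each returned list is what the stage would emit for that element (empty for a duplicate). Plugging this into Akka would mean adapting it to the DSL's own function types, which this sketch does not show.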

@ktoso
Member

ktoso commented Jun 14, 2016

For what it's worth, for streaming "uniqueness detection in the face of a crap-ton of elements", Bloom filters can be used. I'm not sure how we would handle the dependencies for that here, though.
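
To illustrate the idea, here is a minimal, hypothetical Bloom filter in plain Java (not Guava's, and not tuned): `numBits` and `numHashes` are illustrative parameters, and the bit indexes are derived by double hashing from `hashCode()`. A distinct stage built on it would emit an element only when `addIfNew` returns true. Memory stays fixed, but false positives mean a genuinely new element can occasionally be dropped; an element that was added is never reported as absent.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter, shown only to illustrate the technique.
class SimpleBloomFilter<T> {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // i-th bit index via double hashing: h1 + i * h2, with h2 forced to be odd.
    private int index(T elem, int i) {
        int h1 = elem.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Sets the element's bits. Returns true if at least one bit was previously unset,
    // i.e. the element was definitely not added before. A false return means
    // "probably seen" and may be a false positive.
    boolean addIfNew(T elem) {
        boolean isNew = false;
        for (int i = 0; i < numHashes; i++) {
            int idx = index(elem, i);
            if (!bits.get(idx)) {
                bits.set(idx);
                isNew = true;
            }
        }
        return isNew;
    }

    // Never returns false for an element that was added (no false negatives).
    boolean mightContain(T elem) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(elem, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In practice the bit count and number of hash functions would be sized from the expected number of distinct elements and an acceptable false-positive rate.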

@drewhk
Member

drewhk commented Jun 14, 2016

There are many ways to do that; Bloom filters are only one. An alternative is to keep a bounded-size buffer of seen elements and evict the oldest entry once the buffer is full. I think we can have multiple styles of dedupe operators.
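
A sketch of the bounded-buffer variant, assuming the same factory-of-functions shape that statefulMapConcat uses: an insertion-ordered `LinkedHashMap` whose `removeEldestEntry` override evicts the oldest first occurrence once the buffer exceeds its capacity. `dedupeWithin(capacity)` is a hypothetical name chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

class BoundedDedupe {
    // Hypothetical helper: remembers at most `capacity` previously seen elements.
    static <T> Supplier<Function<T, List<T>>> dedupeWithin(int capacity) {
        return () -> {
            // Insertion-ordered set; the eldest entry is the oldest first occurrence.
            Set<T> recent = Collections.newSetFromMap(new LinkedHashMap<T, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<T, Boolean> eldest) {
                    return size() > capacity; // evict the oldest entry once over capacity
                }
            });
            return elem -> recent.add(elem)
                    ? Collections.singletonList(elem) // not in the buffer: emit and remember
                    : Collections.<T>emptyList();     // still in the buffer: drop
        };
    }

    public static void main(String[] args) {
        Function<String, List<String>> f = BoundedDedupe.<String>dedupeWithin(2).get();
        List<String> out = new ArrayList<>();
        for (String s : Arrays.asList("a", "b", "a", "c", "a")) {
            out.addAll(f.apply(s));
        }
        System.out.println(out); // prints [a, b, c, a]
    }
}
```

With capacity 2, the second `a` is dropped, but the third `a` is emitted again because it was evicted when `c` arrived. That is the trade-off of a bounded buffer: memory is capped, while duplicates that are far apart can slip through.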

@zhxiaogg
Contributor Author

I consider a Bloom filter a higher-order choice.

I use Akka Streams + Cassandra to provide a reactive query API. In my scenario all the streams are short-lived, so an in-memory buffer does not really hurt.

As for dedupe, I think distinct is a bit different from it and would be more intuitive.

@zhxiaogg
Contributor Author

Guava has a BloomFilter, though it is still marked as beta.

4 participants