Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High memory consumption using Grouped operator in Akka Streams #25623

Closed
dembol opened this issue Sep 14, 2018 · 10 comments
Closed

High memory consumption using Grouped operator in Akka Streams #25623

dembol opened this issue Sep 14, 2018 · 10 comments
Labels
1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted help wanted Issues that the core team will likely not have time to work on t:stream

Comments

@dembol
Copy link

dembol commented Sep 14, 2018

I’ve noticed a serious problem with high memory consumption while using streams terminated by SinkRef and grouped operator somewhere in a stream topology. The grouped operator is backed by a VectorBuilder which is cleared when onPush or onUpstreamFinish callbacks are executed. Unfortunately clearing the vector in Scala 2.12 means allocating new root node but still keeping hard references to the obsolete nodes on the higher levels of a tree which cannot be Garbage Collected. I see that some additional cleaning has been added in Scala 2.13 which will eliminate the problem in the future - scala/scala@845b0f0#diff-59f3462485b74027de4fd5e9febcc81bR627

The situation can be very dangerous when somebody wants to use grouped operator with high number of grouped elements, connect it to the sink produced by StreamRefs.sourceRef() and send materialized SourceRef over the network. Why it’s so dangerous? SourceRefImpl and SinkRefImpl start watching their partners what causes sending a WATCH SystemMessage with watchee and watcher remote paths. Those paths are deserialized by SystemMessageSerializer.deserializeSystemMessage which in turn calls RemoteActorRefProvider.resolveActorRef which uses ActorRefResolveThreadLocalCache backed by LruBoundedCache. If we have grouped stage with lots of heavy elements we may observe OutOfMemoryErrors in short time if we materialize many such streams.

Here are screenshots from profiler:

telemetry

heap_walker

The quick fix is to use modified grouped operator using a List instead of a Vector (https://gist.github.com/dembol/b69d205ca35af7ec19453e66affbb10c) or use grouped operator with less than 32 elements.

@dembol dembol changed the title High memory consumption using grouped operator in Akka Streams High memory consumption using Grouped operator in Akka Streams Sep 14, 2018
@patriknw
Copy link
Member

That's an interesting finding. Wouldn't an appropriate solution be to create a new VectorBuilder instead of clearing it?

We use clear in one other place, in ByteStringBuilder#clear. Perhaps we should change there also to be on the safe side?

@patriknw patriknw added 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted help wanted Issues that the core team will likely not have time to work on t:stream labels Sep 17, 2018
@patriknw
Copy link
Member

@dembol would you be able to create a pull request?

@dembol
Copy link
Author

dembol commented Sep 18, 2018

Sure, it would be nice to start contributing :) I'll investigate a ByteStringBuilder#clear too and prepare a PR this week.

@jrudolph
Copy link
Member

That seems to be also a candidate for an upstream fix in Scala itself. Might make sense to check if that is a known issue for VectorBuilder?

@He-Pin
Copy link
Member

He-Pin commented Apr 26, 2022

@dembol Are you still plan to contribute to this?

@He-Pin
Copy link
Member

He-Pin commented Apr 27, 2022

I updated it in scala/scala#10019

@He-Pin
Copy link
Member

He-Pin commented Apr 27, 2022

@patriknw I think once Akka updated to scala 2.12.16, then this issue can be closed then.

@johanandren
Copy link
Member

If it is fixed upstream I think we can close it already now since Akka does not affect the Scala patch version of consuming project?

@jrudolph
Copy link
Member

jrudolph commented May 2, 2022

I agree. Fixed by scala/scala#10019 due for Scala 2.12.16, thanks for following up on that, @hepin1989!

@jrudolph jrudolph closed this as completed May 2, 2022
@jrudolph
Copy link
Member

jrudolph commented May 2, 2022

(Not adding a milestone since nothing seems appropriate, technically it's a "will not fix (here)".)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted help wanted Issues that the core team will likely not have time to work on t:stream
Projects
None yet
Development

No branches or pull requests

5 participants