
[FLINK-3322] MemoryManager creates too much GC pressure with iterative jobs. #1769

Closed
wants to merge 1 commit into from

Conversation

Xazax-hun
Contributor

This fix uses the approach suggested by Gábor Gévay on the mailing list: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html

Instead of making the GC free the segments, they are returned to a pool, and the allocation of new segments is modified to draw from that pool first. Note that the size of the pool never decreases, but right now that is unlikely to cause any trouble. If it later becomes desirable to shrink the pool dynamically, soft references can be used.
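To sketch the idea in code (a simplified illustration, not the actual Flink MemoryManager; the class and method names here are made up):

```java
import java.util.ArrayDeque;

// Sketch of a segment pool: segments released by operators go back into the pool
// and are handed out again on the next allocation, instead of being dropped and
// re-created, which would leave the old ones to the GC.
public class PooledSegmentSource {

    private final ArrayDeque<byte[]> pool = new ArrayDeque<>();
    private final int segmentSize;

    public PooledSegmentSource(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    /** Reuses a pooled segment if one is available, otherwise allocates a fresh one. */
    public synchronized byte[] allocate() {
        byte[] segment = pool.pollFirst();
        return segment != null ? segment : new byte[segmentSize];
    }

    /** Returns a segment to the pool; the pool only grows, it is never trimmed. */
    public synchronized void release(byte[] segment) {
        pool.addFirst(segment);
    }
}
```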

Some benchmarks using the new test case:
preAllocateMemory == false
Before the patch: 7s with 500m heap, 55s with 5000m heap
After the patch: 5s with 500m heap, 8s with 5000m heap

preAllocateMemory == true
Before the patch: 4s with 500m heap, 8s with 5000m heap
After the patch: 5s with 500m heap, 8s with 5000m heap

@mbalassi
Contributor

mbalassi commented Mar 6, 2016

I would be curious about the soft reference implementation, as in DeltaIteration cases I think it is a valid situation that the job needs less and less memory. Please add the license header to the manual test file.

@ggevay
Contributor

ggevay commented Mar 6, 2016

I would be curious about the soft reference implementation, as in DeltaIteration cases I think it is a valid situation that the job needs less and less memory.

Are there any operators that dynamically adjust how much memory they take based on the amount of input data?

@Xazax-hun
Contributor Author

I would be curious about the soft reference implementation, as in DeltaIteration cases I think it is a valid situation that the job needs less and less memory. Please add the license header to the manual test file.

This is my first attempt to use a "soft reference pool":
https://github.com/Xazax-hun/flink/commit/2694910e53b2f86412f2a9c3e4d83cf1705e3c65

I could not measure any performance gain compared to the baseline of the code before this pull request. Hopefully I will have some extra time tomorrow, so I can further investigate whether something is wrong with my first implementation or with the approach itself.
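For reference, the rough shape of the idea (a simplified sketch, not the exact code in the commit above; the names here are made up):

```java
import java.lang.ref.SoftReference;
import java.util.ArrayDeque;

// Sketch of a soft-reference-based pool: the JVM may reclaim pooled segments
// under memory pressure, so a job that needs less and less memory does not
// keep the whole pool alive.
public class SoftReferencePool {

    private final ArrayDeque<SoftReference<byte[]>> pool = new ArrayDeque<>();
    private final int segmentSize;

    public SoftReferencePool(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    public synchronized byte[] allocate() {
        SoftReference<byte[]> ref;
        while ((ref = pool.pollFirst()) != null) {
            byte[] segment = ref.get();
            if (segment != null) {
                return segment;            // reuse a segment the GC has not cleared yet
            }
            // the referent was collected; drop the stale reference and keep looking
        }
        return new byte[segmentSize];      // pool exhausted or cleared, allocate fresh
    }

    public synchronized void release(byte[] segment) {
        pool.addFirst(new SoftReference<>(segment));
    }
}
```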

@ggevay
Contributor

ggevay commented Mar 8, 2016

The Memory tab in Java Mission Control can probably help with investigating this.

@StephanEwen
Contributor

I think the right way to solve that would actually not be to cache in the MemoryManager, but in the operators, which know that they will need the memory again and again and can hold onto it.

For another feature in the code, I am currently changing the access to memory so that it goes through "allocators". These are used, for example, by the sorter to get memory segments from the memory manager. It would be simple to create them in the batch operators such that they do not release to the memory manager immediately, but only after the operator is disposed. That means the memory segments would be held onto through all iterations.
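To make that concrete, a rough sketch of what such an operator-side allocator could look like (purely hypothetical names, not the actual allocator interface from that feature; it reuses the PooledSegmentSource sketch from the description above, standing in for the memory manager):

```java
import java.util.ArrayDeque;

// Hypothetical operator-side allocator: it requests segments from an underlying
// source, keeps released segments for itself across iterations, and only gives
// everything back when the operator is disposed.
public class ReusingAllocator {

    private final PooledSegmentSource source;             // stands in for the memory manager
    private final ArrayDeque<byte[]> retained = new ArrayDeque<>();

    public ReusingAllocator(PooledSegmentSource source) {
        this.source = source;
    }

    public byte[] allocate() {
        byte[] segment = retained.pollFirst();
        return segment != null ? segment : source.allocate();
    }

    /** Called at the end of each iteration: segments stay with the allocator. */
    public void release(byte[] segment) {
        retained.addFirst(segment);
    }

    /** Called when the operator is disposed: everything goes back to the source. */
    public void dispose() {
        byte[] segment;
        while ((segment = retained.pollFirst()) != null) {
            source.release(segment);
        }
    }
}
```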

@Xazax-hun
Contributor Author

I think the soft reference solution is not worth investigating further, and I agree that the best way to solve the problem is to make the operators smarter for iterative jobs. Do you want to merge this pull request to solve the problem temporarily until the other solution materializes? In case the answer is no, I think I might give this Jira back to someone else so I can focus on serialization (in case the community accepts me to work on that).

@aljoscha
Contributor

I'm closing this as "Abandoned", since there is no more activity and the code base has moved on quite a bit. Please re-open this if you feel otherwise and work should continue.
