Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add workaround for memory leaks in bag generators (fixes #5189) #5208

Merged
merged 3 commits into from Aug 8, 2019

Conversation

@crepererum
Copy link
Contributor

commented Aug 2, 2019

  • Tests added / passed
  • Passes black dask / flake8 dask

Fixes #5189
This is an alternative to #5201

@martindurant

This comment has been minimized.

Copy link
Member

commented Aug 2, 2019

The string of numba errors that appeared here are not likely to be related.

@jcrist , do you have time to review this, or perhaps suggest someone?

@crepererum crepererum force-pushed the crepererum:fix/5189b branch from d778060 to 1b7198b Aug 5, 2019

@jcrist

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

Apologies for the delayed response here. The intent/approach here seems good to me, but I think we can make the implementation simpler. Mind if I push a commit to this PR?

@crepererum

This comment has been minimized.

Copy link
Contributor Author

commented Aug 7, 2019

Mind if I push a commit to this PR?

Go ahead.

@jcrist

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

I've pushed a commit simplifying the implementation. This also builds a simpler task representation, which should (slightly) improve performance of any graph operations.

cc @crepererum for a quick review.

@crepererum

This comment has been minimized.

Copy link
Contributor Author

commented Aug 8, 2019

OK, that looks cleaner. Thanks a lot!

This also builds a simpler task representation, which should (slightly) improve performance of any graph operations.

For personal interest and if you have time, can you elaborate on that part a little bit?

@jcrist

This comment has been minimized.

Copy link
Member

commented Aug 8, 2019

For personal interest and if you have time, can you elaborate on that part a little bit?

Every nested potential sub-task (lists and tuples starting with a callable) in a task (a value in the graph) is traversed during graph optimization and execution a couple times. While these traversals usually are negligible compared to the cost of the actual computation, they do still take time and memory. Flattening the task structure a bit can help speed up optimization passes and execution time as the scheduler has less to do.

@jcrist jcrist merged commit e8a2adc into dask:master Aug 8, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jcrist jcrist referenced this pull request Aug 8, 2019
2 of 2 tasks complete

@crepererum crepererum deleted the crepererum:fix/5189b branch Aug 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.