Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add workaround for memory leaks in bag generators (fixes #5189) #5208

Merged
merged 3 commits into from Aug 8, 2019

Conversation

crepererum
Copy link
Contributor

@crepererum crepererum commented Aug 2, 2019

  • Tests added / passed
  • Passes black dask / flake8 dask

Fixes #5189
This is an alternative to #5201

@martindurant
Copy link
Member

@martindurant martindurant commented Aug 2, 2019

The string of numba errors that appeared here are not likely to be related.

@jcrist , do you have time to review this, or perhaps suggest someone?

@jcrist
Copy link
Member

@jcrist jcrist commented Aug 6, 2019

Apologies for the delayed response here. The intent/approach here seems good to me, but I think we can make the implementation simpler. Mind if I push a commit to this PR?

@crepererum
Copy link
Contributor Author

@crepererum crepererum commented Aug 7, 2019

Mind if I push a commit to this PR?

Go ahead.

@jcrist
Copy link
Member

@jcrist jcrist commented Aug 7, 2019

I've pushed a commit simplifying the implementation. This also builds a simpler task representation, which should (slightly) improve performance of any graph operations.

cc @crepererum for a quick review.

@crepererum
Copy link
Contributor Author

@crepererum crepererum commented Aug 8, 2019

OK, that looks cleaner. Thanks a lot!

This also builds a simpler task representation, which should (slightly) improve performance of any graph operations.

For personal interest and if you have time, can you elaborate on that part a little bit?

@jcrist
Copy link
Member

@jcrist jcrist commented Aug 8, 2019

For personal interest and if you have time, can you elaborate on that part a little bit?

Every nested potential sub-task (lists and tuples starting with a callable) in a task (a value in the graph) is traversed during graph optimization and execution a couple times. While these traversals usually are negligible compared to the cost of the actual computation, they do still take time and memory. Flattening the task structure a bit can help speed up optimization passes and execution time as the scheduler has less to do.

@jcrist jcrist merged commit e8a2adc into dask:master Aug 8, 2019
2 checks passed
@crepererum crepererum deleted the fix/5189b branch Aug 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants