Add DataFrame and Series memory_usage_per_partition methods#5971

Merged
TomAugspurger merged 3 commits into dask:master from jrbourbeau:mem-usage-partition on Mar 4, 2020

@jrbourbeau
Member

This PR adds a memory_usage_per_partition method to Dask's Series and DataFrame collections. Closes #5952

cc @xhochy @quasiben

  • Tests added / passed
  • Passes black dask / flake8 dask

Member

@TomAugspurger TomAugspurger left a comment

Thanks @jrbourbeau. I see that total_mem_usage is also used in test_repartition_partition_size in test_dataframe.py, and I think it's now being called with deep=False. We do want deep=True there, since that test has a string column.

LGTM otherwise.
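To show why `deep` matters for string columns, here is a small pandas-only sketch (the data is hypothetical and the column is forced to `object` dtype for the comparison). With `deep=False`, an object column is counted as one pointer per row; with `deep=True`, the Python string objects themselves are measured.

```python
import pandas as pd

# Hypothetical data: one integer column and one object-dtype string column.
pdf = pd.DataFrame(
    {
        "x": [1, 2, 3],
        "s": pd.Series(["a" * 100, "b" * 100, "c" * 100], dtype=object),
    }
)

# Shallow: counts only the array buffers (pointers for the object column).
shallow = int(pdf.memory_usage(deep=False).sum())

# Deep: also introspects the string objects the pointers refer to.
deep = int(pdf.memory_usage(deep=True).sum())

print(shallow, deep)
```

A size-based test such as a repartition-by-size check would underestimate string-heavy partitions if it relied on the shallow figure.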

@jrbourbeau
Member Author

Ah good catch, thanks @TomAugspurger!

@jrbourbeau
Member Author

The Python 3.7 failure is unrelated and appears to sporadically pop up in master (xref https://travis-ci.org/dask/dask/jobs/657521668)

Contributor

@xhochy xhochy left a comment

LGTM, thanks!

@TomAugspurger
Member

Seems like a flaky test with distributed. Opened #5972 for that and restarted the build.

@TomAugspurger TomAugspurger merged commit cfa28ae into dask:master Mar 4, 2020
@TomAugspurger
Member

Thanks @jrbourbeau!

@jrbourbeau jrbourbeau deleted the mem-usage-partition branch March 4, 2020 21:58

Development

Successfully merging this pull request may close these issues.

Per-partition information in dask.dataframe.DataFrame.memory_usage

3 participants