Entity.plot compute dask entitysets from delayed #1086
Thanks for putting up this PR, @systemshift
It would be good to avoid the compute. Perhaps es.plot should not display the row counts for a dask entityset. That way there would be no need for a compute.
docs/source/changelog.rst
@@ -23,6 +24,7 @@ Changelog
* Implement automated process for checking critical dependencies (:pr:`1045`, :pr:`1054`, :pr:`1081`)
* Don't run changelog check for release PRs or automated dependency PRs (:pr:`1057`)
* Fix non-deterministic behavior in Dask test causing codecov issues (:pr:`1070`)
* Remove xfail cases from ``test_plotting.py`` (:pr:`1086`)
The main change is fixing EntitySet.plot, so an additional changelog entry is unnecessary.
> an additional changelog entry is unnecessary
Not sure what you mean by this. Should I remove the changes I made to the Testing section?
> Not sure what you mean by this. Should I remove the changes I made to the Testing section?
Sorry, I meant remove this line from the changelog, since we already have a line for this PR in the Fixes section.
Wouldn't this mean a dask user will lose some function features?
It would, but this allows a dask user to generate the plot without needing a compute.
I think the shorter time creating the plot due to not needing to compute is worth the loss of information.
if isinstance(entity.df, dd.DataFrame):  # entity is a dask entity
    label = '{%s |%s\l}' % (entity.id, variables_string)  # noqa: W605
else:
    nrows = entity.shape[0]
    label = '{%s (%d row%s)|%s\l}' % (entity.id, nrows, 's' * (nrows > 1), variables_string)  # noqa: W605
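The branch above can be tried in isolation. The following is a minimal sketch, not the featuretools implementation: `build_entity_label` is a hypothetical helper, and passing `nrows=None` stands in for the dask case instead of checking `isinstance(entity.df, dd.DataFrame)`, so the example runs without dask installed.

```python
def build_entity_label(entity_id, variables_string, nrows=None):
    """Return a graphviz record label for one entity.

    Hypothetical helper for illustration only; ``nrows=None`` stands in for
    a dask-backed entity whose row count is unknown without a compute.
    """
    if nrows is None:
        # Dask case: omit the row count so nothing has to be computed.
        return '{%s |%s\\l}' % (entity_id, variables_string)
    # Pandas case: the row count is already known, so include it.
    return '{%s (%d row%s)|%s\\l}' % (
        entity_id, nrows, 's' * (nrows > 1), variables_string)


# Pandas-style entity with a known row count.
print(build_entity_label('customers', 'id : index', nrows=3))
# Dask-style entity: same label without the row count.
print(build_entity_label('customers', 'id : index'))
```

Writing the graphviz left-justify escape as `\\l` produces the same string as `\l` in the snippet above, and would make the `# noqa: W605` marker unnecessary.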
@rwedge if this looks good I will start working on the broken test units to finish it up.
Looks good!
Codecov Report
@@            Coverage Diff             @@
##             main    #1086      +/-   ##
==========================================
- Coverage   98.36%   98.36%   -0.01%
==========================================
  Files         126      126
  Lines       13272    13257      -15
==========================================
- Hits        13055    13040      -15
  Misses        217      217
Looks good
Pull Request Description
This is a first draft that should fix #1051
I will work on the failing test unit if the changes are approved.
My concern is that while this change 'solves' plotting dask entitysets, it only works for low-memory use cases, and it would break when a user has a large entityset and keeps calling .compute() inside a loop. My assumption is that a dask user would otherwise use pandas if they did not need to work on extremely large data. Thoughts, @rwedge?
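To make the concern concrete, here is a small sketch that uses a stand-in class rather than dask itself. `LazyTable` and its `compute` method are hypothetical; they only mimic the idea that every request for a row count re-runs the whole computation.

```python
class LazyTable:
    """Hypothetical stand-in for a lazily evaluated table (not a dask API)."""

    def __init__(self, rows):
        self._rows = rows
        self.compute_calls = 0

    def compute(self):
        # Each call materializes the full data again; nothing is cached.
        self.compute_calls += 1
        return list(self._rows)


table = LazyTable(range(1000))

# Asking for the row count on every plot call repeats the full computation.
for _ in range(5):
    nrows = len(table.compute())

print(table.compute_calls)
```

With real data that does not fit in memory, each of those repeated computations could be slow or fail outright, which is the case the description above is worried about.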
update: might be another task to add to #901
After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of docs/source/changelog.rst to include this pull request.