Clean up pipeline graphing code #423
Codecov Report
@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
+ Coverage   98.19%   98.25%   +0.06%
==========================================
  Files         104      104
  Lines        3260     3265       +5
==========================================
+ Hits         3201     3208       +7
+ Misses         59       57       -2
Hmmm, this is something we've previously discussed here. I think we decided to move everything into a separate PipelinePlot class because it would keep the implementation separate from the plots, making it clearer what was what. I believe we implemented this similarly to how pandas did it too (https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.plot.html). What benefits do you see in adding them to the base class directly? @kmax12, do you have any thoughts on this? Also, it could be important to keep this issue in mind.
Seems like there are two concerns here
I don't have a strong opinion on this, just that we are consistent. I can see that the
Thanks for the context @angela97lin. I like that summary @max. My rationale here was to do the simplest thing, both in terms of the user API and the code organization.
For the user API: I think it's clearer to have the graph methods be direct attributes of the pipeline, without indirection through an additional
For the code organization: this PR keeps the graphing code in a separate file, with flat methods, and then calls those methods in
Also, I changed the naming from "plot" to "graph", because both the pipeline graph and the feature importance bar graph are graphs :)
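For readers comparing the two API shapes discussed in this thread, here is a minimal sketch. Every name in it (`make_pipeline_graph`, `PipelinePlots`, `AccessorPipeline`, `DirectPipeline`, `component_names`) is hypothetical and chosen for illustration; it is not the evalml implementation, and a string stands in for the rendered graph.

```python
# Illustrative sketch only: all names below are hypothetical, not evalml's API.


def make_pipeline_graph(component_names):
    """Flat helper, as it might live in a separate graphs module.

    Returns a plain-text chain standing in for a rendered graph.
    """
    return " -> ".join(component_names)


class PipelinePlots:
    """Accessor-style design: graphing sits behind an extra object."""

    def __init__(self, pipeline):
        self.pipeline = pipeline

    def graph(self):
        return make_pipeline_graph(self.pipeline.component_names)


class AccessorPipeline:
    """Callers write `pipeline.plot.graph()`, similar to pandas' `DataFrame.plot`."""

    def __init__(self, component_names):
        self.component_names = component_names
        self.plot = PipelinePlots(self)


class DirectPipeline:
    """Direct design (stand-in for the base class): callers write `pipeline.graph()`."""

    def __init__(self, component_names):
        self.component_names = component_names

    def graph(self):
        # The method is a thin wrapper; the logic stays in the flat helper.
        return make_pipeline_graph(self.component_names)
```

In both designs the graphing logic stays out of the pipeline class; the only difference is whether the caller goes through an intermediate attribute.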
I wouldn't be surprised if we do add more, but I don't think we need to make an API decision in anticipation of that. I think the only other ones to be aware of are the confusion_matrix and roc plot ones. These are currently defined on the Auto search classes, but I know we've been discussing moving them to the pipeline object, which I 100% agree with.
Yep, I actually like this organization of the code more than the plotting classes we currently have. My thought would be to push the methods you've defined to be more general purpose, e.g. the
@@ -8,6 +8,7 @@ Changelog
* Add CatBoost (gradient-boosted trees) classification and regression components and pipelines :pr:`247`
* Added Tuner abstract base class :pr:`351`
* Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch :pr:`403`
* Added PipelineBase graph and feature_importance_graph methods, moved from previous location :pr:`423`
should also add this as a breaking change
evalml/pipelines/graphs.py
Outdated
    Returns:
        plotly.Figure, a bar graph showing features and their importances
    """
    feat_imp = pipeline.feature_importances
I think the input to this method should be a dataframe, i.e. whatever pipeline.feature_importances is. That would make it more general purpose.
Awesome, will do
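The suggestion in this exchange, passing the importances in rather than the pipeline, could look roughly like the sketch below. It is an assumption-laden illustration: the function name `feature_importance_bars` is hypothetical, plain (feature, importance) pairs stand in for the dataframe, and a plain dict stands in for the plotly figure.

```python
def feature_importance_bars(rows):
    """Build bar-graph data from feature importances alone.

    `rows` is any iterable of (feature, importance) pairs. This shape is an
    assumption standing in for the dataframe discussed in the review.
    The function never touches a pipeline object, so it can be reused with
    importances from any source.
    """
    # Show the most important features first.
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return {
        "type": "bar",
        "x": [name for name, _ in ranked],
        "y": [value for _, value in ranked],
    }
```

If the importances really are a two-column pandas DataFrame (an assumption here), a pipeline method could pass `pipeline.feature_importances.itertuples(index=False)` to a helper like this one, keeping the method itself a one-line wrapper.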
1deb2d8
to
97a3466
Compare
        f = open(filepath, 'w')
        f.close()
    except IOError:
        raise ValueError(('Specified parent directory does not exist: {}'.format(filepath)))
I did end up adding one thing: make sure the file location is writeable. Added unit test
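A self-contained sketch of this kind of writability check follows. The helper name `check_writable` and the error message are hypothetical; the approach (try to open the target for writing and convert the OS-level failure into a `ValueError`) mirrors the diff excerpt above, but this is not the code from the PR.

```python
import os
import tempfile


def check_writable(filepath):
    """Raise ValueError if `filepath` cannot be opened for writing."""
    try:
        # Opening in 'w' mode creates the file, or truncates an existing one.
        with open(filepath, 'w'):
            pass
    except OSError as err:
        # Covers a missing parent directory as well as permission errors.
        raise ValueError('Cannot write to file: {}'.format(filepath)) from err


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "graph.png")
    bad_path = os.path.join(tmp_dir, "missing_dir", "graph.png")

    check_writable(good_path)  # parent directory exists: no error
    good_exists = os.path.exists(good_path)

    try:
        check_writable(bad_path)  # parent directory is missing
        bad_raised = False
    except ValueError:
        bad_raised = True
```

Two side effects are worth keeping in mind with this approach: the check leaves an empty file behind when it succeeds, and it truncates a file that already exists at that path.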
Ugh, the
3a59ddf
to
4941a4e
Compare
Cleaning up the API for 1) generating images of pipeline graphs and 2) generating bar graphs of feature importances. It felt like these should be direct members of PipelineBase rather than having indirection through another class. I haven't modified the logic for generating the graphs, just how that functionality is invoked.