Skip to content

Added ability to create a bar plot of feature importances #133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 50 commits into from
Dec 6, 2019

Conversation

christopherbunn
Copy link
Contributor

@christopherbunn christopherbunn commented Oct 15, 2019

image

Resolves #24

@codecov
Copy link

codecov bot commented Oct 16, 2019

Codecov Report

Merging #133 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
+ Coverage    97.3%   97.32%   +0.01%     
==========================================
  Files          95       95              
  Lines        2712     2729      +17     
==========================================
+ Hits         2639     2656      +17     
  Misses         73       73
Impacted Files Coverage Δ
evalml/models/auto_base.py 96.84% <ø> (ø) ⬆️
evalml/pipelines/pipeline_plots.py 91.66% <100%> (+2.47%) ⬆️
...valml/tests/pipeline_tests/test_pipelines_plots.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5306eef...25987ae. Read the comment docs.

@kmax12
Copy link
Contributor

kmax12 commented Oct 21, 2019

how does this handle features that weren't selected in a feature selection? is it still showing them in the x-axis?

@christopherbunn
Copy link
Contributor Author

how does this handle features that weren't selected in a feature selection? is it still showing them in the x-axis?

It only shows features that were selected for that particular pipeline. Features that were dropped in this pipeline are not shown.

@christopherbunn
Copy link
Contributor Author

christopherbunn commented Oct 21, 2019

Actually, it looks like this functionality has already existed as feature_importances_plot() in evalml/models/render.py. Does this plot need any changes @kmax12 @jeremyliweishih?

image

image

def plot_feature_importances(self):
feat_imp = self.feature_importances
feat_imp['importance'] = abs(feat_imp['importance'])
feat_imp = feat_imp.iloc[::-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just add a comment about why we need to reverse the list? i would think feature importance already returned it in the right order

@@ -252,3 +253,28 @@ def feature_importances(self):
importances.sort(key=lambda x: -abs(x[1]))
df = pd.DataFrame(importances, columns=["feature", "importance"])
return df

def plot_feature_importances(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should wait for #169 to get merge in and then change the api to pipeline.plot.feature_importances

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see this might be a bit tricky since the feature importances are store on the pipeline object rather than results dictionary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently have it so that the plot class takes in a data dictionary that can be updated to store whatever (aka more than just results) for this purpose. I wonder if there is a better implementation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. one though i have is that we could initialize the class every time .plot is accessed. When we initialize it, we just pass the pipeline object to the constructor. that would also also use to override __call__ so that pipeline.plot() rendered the pipeline graphic in #211

@christopherbunn
Copy link
Contributor Author

This function is called by using pipeline.plot.feature_importances(). I thought about calling function generate_feature_importances_plot() so it would be in line to generate_roc_plots() but I decided on just feature_importances() since it seemed less verbose. Let me know what you think!

@jeremyliweishih
Copy link
Collaborator

@christopherbunn I think the naming is good!

Copy link
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than some clarifying questions. One small thing: would it look better for centering on the title and subtitle?

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :D

@angela97lin
Copy link
Contributor

Naming sounds good to me! (Honestly, might be worth reducing generate_roc() to just roc() but if anything we can address that in a future PR :))

Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only one question. otherwise looks good

@@ -75,3 +75,32 @@ def __call__(self, to_file=None):
graph.render(output_path, cleanup=True)

return graph

def feature_importances(self):
import plotly.graph_objects as go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we import this inside of the function?

@christopherbunn christopherbunn dismissed jeremyliweishih’s stale review December 6, 2019 20:35

Addressed conf.py issues

@christopherbunn christopherbunn merged commit 2cea658 into master Dec 6, 2019
@angela97lin angela97lin mentioned this pull request Dec 16, 2019
@dsherry dsherry deleted the feat-imp-plot branch May 26, 2020 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add plots for feature importance
4 participants