
1239 visualize decision trees #1511

Merged: 26 commits into main on Dec 10, 2020
Conversation

@ParthivNaresh (Contributor) commented on Dec 7, 2020

Fixes #1239


The output of visualize_decision_tree() will be a graphviz.files.Source object.

For example: visualize_decision_tree(clf=regression_estimator, filled=True, max_depth=2).view()

[Screenshot: rendered decision tree graph]

The output of clean_format_tree will be an OrderedDict.

[Screenshot: OrderedDict output of clean_format_tree]
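As a rough sketch of what a function like visualize_decision_tree() can wrap (assuming it builds on scikit-learn's export_graphviz; the dataset, estimator, and variable names below are illustrative, not the PR's actual implementation):

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Fit a small regression tree on synthetic data
X, y = make_regression(n_samples=50, n_features=3, random_state=0)
regression_estimator = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# export_graphviz returns DOT source text when out_file is None;
# wrapping that text in graphviz.Source yields the graphviz.files.Source
# object described above, whose .view() renders the tree.
dot_source = export_graphviz(regression_estimator, out_file=None,
                             filled=True, max_depth=2)
```

From there, `graphviz.Source(dot_source).view()` (using the `graphviz` Python package) would open the rendered tree, matching the `.view()` call in the example above.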

@ParthivNaresh ParthivNaresh self-assigned this Dec 7, 2020
@CLAassistant commented on Dec 7, 2020

CLA assistant check: all committers have signed the CLA.

@codecov bot commented on Dec 7, 2020

Codecov Report

Merging #1511 (47adf09) into main (3de3b12) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1511     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         232      232             
  Lines       16430    16639    +209     
=========================================
+ Hits        16422    16631    +209     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/model_understanding/graphs.py 99.7% <100.0%> (+0.1%) ⬆️
...s/estimators/regressors/decision_tree_regressor.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 100.0% <100.0%> (ø)
...lml/tests/model_understanding_tests/test_graphs.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3de3b12...47adf09.

@ParthivNaresh ParthivNaresh marked this pull request as ready for review December 8, 2020 15:02
@dsherry (Contributor) commented on Dec 8, 2020

@ParthivNaresh I'm excited to review this! Could you please include an example of the output in the PR description? That will make it quick for reviewers to understand what's going on.

@ParthivNaresh ParthivNaresh marked this pull request as draft December 8, 2020 16:16
@ParthivNaresh ParthivNaresh marked this pull request as ready for review December 8, 2020 18:44
@freddyaboulton (Contributor) left a comment

@ParthivNaresh Looks great! And the tests look solid. I left some comments, but I think the only thing holding up the merge is the discussion about how to display the column names in clean_format_tree.

Resolved review thread on evalml/model_understanding/graphs.py:
num_nodes = est.tree_.node_count
children_left = est.tree_.children_left
children_right = est.tree_.children_right
features = est.tree_.feature

I think we should save the actual feature names in the output. The feature identifiers currently range from 0 to n_cols - 1 because we convert from pandas to sklearn right before fitting, and the original column names are not saved in the tree object.

I think we have a couple of options:

  1. Add an option to this function for passing in the feature names
  2. Change the input type from a tree estimator to a pipeline with a tree estimator. This would allow us to use input_feature_names[estimator.name]
  3. File an issue for saving the feature names to the tree estimator and leaving this function as-is for now.

I think I prefer 2, but what do you think? I believe our model understanding methods take in a pipeline instead of an estimator, so that would be more consistent with what we already have. I suppose we could also do both 1 and 2: add the feature names as a parameter to this function, then add another function that accepts a pipeline and calls this one.
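Option 1 can be sketched roughly as follows: walk the fitted sklearn `tree_` object (the same `children_left` / `children_right` / `feature` arrays quoted above) and map each stored feature index back to a caller-supplied name. The function and column names here are hypothetical illustrations, not the PR's actual clean_format_tree implementation:

```python
from collections import OrderedDict

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
est = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
feature_names = ["age", "income", "score"]  # illustrative column names

tree = est.tree_

def tree_to_dict(node_id=0):
    # sklearn marks leaf nodes by setting both children to -1 (TREE_LEAF)
    if tree.children_left[node_id] == -1:
        return OrderedDict([("Value", tree.value[node_id].tolist())])
    return OrderedDict([
        # tree_.feature stores an integer column index; translate it here
        ("Feature", feature_names[tree.feature[node_id]]),
        ("Threshold", float(tree.threshold[node_id])),
        ("Left_Child", tree_to_dict(tree.children_left[node_id])),
        ("Right_Child", tree_to_dict(tree.children_right[node_id])),
    ])

result = tree_to_dict()
```

Option 2 would look the same internally, except the names would come from the pipeline's input_feature_names mapping rather than from an explicit parameter.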

@bchen1116 (Contributor) left a comment

Looks great! I added a few suggestions on additional tests, but nothing blocking.

@dsherry (Contributor) left a comment

@ParthivNaresh this is awesome!

I left a few suggestions and questions. Ones I'd like us to address before merge:

  • Method naming
  • Use data method in graph method
  • Split graph unit tests into checking the returned graph content vs checking the filepath image saving

I also left a note about #1535, which could be cool to look at next!

Resolved review threads:
  • evalml/model_understanding/graphs.py (several threads)
  • evalml/tests/model_understanding_tests/test_graphs.py
@angela97lin (Contributor) left a comment

Looks really good! I just added some tiny nit-picky comments about breaking the tests down further and about docstrings.

Resolved review threads on evalml/tests/model_understanding_tests/test_graphs.py
@ParthivNaresh ParthivNaresh merged commit 9b0a1b4 into main Dec 10, 2020
@dsherry dsherry mentioned this pull request Dec 29, 2020
@freddyaboulton freddyaboulton deleted the 1239-visualize-decision-trees branch May 13, 2022 14:58
Development

Successfully merging this pull request may close these issues.

Visualize Decision Trees
6 participants