
1239 visualize decision trees #1511

Merged

ParthivNaresh merged 26 commits into main from 1239-visualize-decision-trees on Dec 10, 2020
Conversation

@ParthivNaresh
Contributor

@ParthivNaresh ParthivNaresh commented Dec 7, 2020

Fixes #1239


The output of visualize_decision_tree() will be a graphviz.files.Source object.

For example: visualize_decision_tree(clf=regression_estimator, filled=True, max_depth=2).view()

[Screenshot: rendered decision tree graph]
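A minimal sketch of the underlying mechanism, assuming visualize_decision_tree() delegates to sklearn's export_graphviz to produce DOT source that the graphviz package wraps in a Source object (the evalml wrapper itself is not shown here):

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Fit a small tree regressor to stand in for regression_estimator
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
regression_estimator = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string;
# wrapping that string in graphviz.Source is what yields a viewable object.
dot_source = export_graphviz(regression_estimator, out_file=None,
                             filled=True, max_depth=2)
print(dot_source.startswith("digraph"))
```

Calling .view() on the resulting graphviz.files.Source renders the DOT source with the system graphviz binary and opens it.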

The output of clean_format_tree will be an OrderedDict.

[Screenshot: OrderedDict output of clean_format_tree]
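For context, a nested OrderedDict can be built by walking sklearn's flat tree_ arrays; the sketch below is illustrative only, and the key names ("Feature", "Threshold", etc.) are assumptions, not evalml's actual schema:

```python
from collections import OrderedDict
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
est = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

def tree_to_dict(tree, node=0):
    """Recursively convert sklearn's flat tree_ arrays into a nested OrderedDict."""
    if tree.children_left[node] == -1:  # sklearn marks leaf nodes with -1
        return OrderedDict([("Value", tree.value[node].tolist())])
    return OrderedDict([
        ("Feature", int(tree.feature[node])),
        ("Threshold", float(tree.threshold[node])),
        ("Left_Child", tree_to_dict(tree, tree.children_left[node])),
        ("Right_Child", tree_to_dict(tree, tree.children_right[node])),
    ])

root = tree_to_dict(est.tree_)
print(list(root.keys()))
```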

@ParthivNaresh ParthivNaresh self-assigned this Dec 7, 2020
@CLAassistant

CLAassistant commented Dec 7, 2020

CLA assistant check
All committers have signed the CLA.

@codecov

codecov bot commented Dec 7, 2020

Codecov Report

Merging #1511 (47adf09) into main (3de3b12) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1511     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         232      232             
  Lines       16430    16639    +209     
=========================================
+ Hits        16422    16631    +209     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/model_understanding/graphs.py 99.7% <100.0%> (+0.1%) ⬆️
...s/estimators/regressors/decision_tree_regressor.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 100.0% <100.0%> (ø)
...lml/tests/model_understanding_tests/test_graphs.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3de3b12...47adf09. Read the comment docs.

@ParthivNaresh ParthivNaresh marked this pull request as ready for review December 8, 2020 15:02
@dsherry
Contributor

dsherry commented Dec 8, 2020

@ParthivNaresh I'm excited to review this! Could you please include an example of the output in the PR description? Will make it quick for reviewers to understand what's up.

@ParthivNaresh ParthivNaresh marked this pull request as draft December 8, 2020 16:16
@ParthivNaresh ParthivNaresh marked this pull request as ready for review December 8, 2020 18:44
Contributor

@freddyaboulton freddyaboulton left a comment


@ParthivNaresh Looks great, and the tests look solid! I left some comments, but I think the only thing blocking merge is the discussion about how to display the column names in clean_format_tree.

Comment thread evalml/model_understanding/prediction_explanations/explainers.py Outdated
Comment thread evalml/model_understanding/prediction_explanations/explainers.py Outdated
Comment thread evalml/model_understanding/graphs.py Outdated
num_nodes = est.tree_.node_count
children_left = est.tree_.children_left
children_right = est.tree_.children_right
features = est.tree_.feature
Contributor


I think we should save the actual feature names in the output. Right now the feature names shown range from 0 to n_cols - 1 because we convert from pandas to sklearn right before fitting; the problem is that the feature names are not saved in the tree object.

I think we have a couple of options:

  1. Add an option to this function for passing in the feature names
  2. Change the input type from a tree estimator to a pipeline with a tree estimator. This would allow us to use input_feature_names[estimator.name]
  3. File an issue for saving the feature names to the tree estimator and leaving this function as-is for now.

I think I prefer 2, but what do you think? I believe our model understanding methods take in a pipeline instead of an estimator, so it'd be more consistent with what we already have. I guess we can also do both 1 and 2, where we add the feature names as a parameter to this function and then add another function that accepts a pipeline and calls this function.
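Option 1 could look like the following sketch. This uses sklearn's export_graphviz feature_names parameter directly; the data and column names are made up for illustration, and how evalml actually threads the names through is not shown:

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Hypothetical data: without feature_names, the DOT output labels splits
# with positional names (X[0], X[1], ...); passing the DataFrame's columns
# restores the real names.
df = pd.DataFrame({"age": [10, 20, 30, 40, 50, 60],
                   "height": [1.2, 1.4, 1.5, 1.7, 1.8, 1.8]})
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
est = DecisionTreeRegressor(max_depth=1, random_state=0).fit(df, y)

dot = export_graphviz(est, out_file=None, feature_names=list(df.columns))
print("age" in dot or "height" in dot)
```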

Contributor

@bchen1116 bchen1116 left a comment


Looks great! I added a few suggestions on additional tests, but nothing blocking.

Comment thread evalml/model_understanding/graphs.py
Comment thread evalml/tests/model_understanding_tests/test_graphs.py
Comment thread evalml/model_understanding/graphs.py Outdated
Comment thread evalml/tests/model_understanding_tests/test_graphs.py Outdated
Contributor

@dsherry dsherry left a comment


@ParthivNaresh this is awesome!

I left a few suggestions and questions. Ones I'd like us to address before merge:

  • Method naming
  • Use data method in graph method
  • Split graph unit tests into checking the returned graph content vs checking the filepath image saving

I also left a note about #1535, which could be cool to look at next!

Comment thread evalml/model_understanding/graphs.py Outdated
Comment thread evalml/model_understanding/graphs.py Outdated
Comment thread evalml/model_understanding/graphs.py
Comment thread evalml/model_understanding/graphs.py Outdated
Comment thread evalml/model_understanding/graphs.py
Comment thread evalml/model_understanding/graphs.py Outdated
Comment thread evalml/tests/model_understanding_tests/test_graphs.py Outdated
Comment thread docs/source/release_notes.rst Outdated
Contributor

@angela97lin angela97lin left a comment


Looks really good! Just added some tiny nit-picky comments about breaking down tests more and docstrings.

Comment thread evalml/tests/model_understanding_tests/test_graphs.py Outdated
Comment thread evalml/tests/model_understanding_tests/test_graphs.py Outdated
…, and release notes. Also fixed typo in Decision Tree Regressor name
# Conflicts:
#	docs/source/release_notes.rst
… test to cover list casting of passed feature names
@ParthivNaresh ParthivNaresh merged commit 9b0a1b4 into main Dec 10, 2020
@dsherry dsherry mentioned this pull request Dec 29, 2020
@freddyaboulton freddyaboulton deleted the 1239-visualize-decision-trees branch May 13, 2022 14:58


Development

Successfully merging this pull request may close these issues.

Visualize Decision Trees

6 participants