Update visualize_decision_tree to include feature names #1813

Merged
merged 27 commits into from
Feb 10, 2021

Conversation

angela97lin
Contributor

@angela97lin angela97lin commented Feb 9, 2021

Closes #1718. Also removes feature_names as a parameter from decision_tree_data_from_estimator, since we can extract the feature names used to fit from the estimator now.

I'm not sure what the best way is to add a test to our codebase that ensures the output PNG shows the right feature names, but here's a code snippet and the resulting image 😁

import pandas as pd
from sklearn import datasets
from evalml.pipelines.components import DecisionTreeClassifier
from evalml.model_understanding.graphs import visualize_decision_tree


def X_y():
    # Small synthetic binary classification problem
    X, y = datasets.make_classification(n_samples=100, n_features=20,
                                        n_informative=2, n_redundant=2, random_state=0)
    return X, y


X_b, y_b = X_y()
# Give every column a recognizable name so it can be spotted in the rendered tree
X_b = pd.DataFrame(X_b, columns=[f'Testing_{col}' for col in range(X_b.shape[1])])
dt = DecisionTreeClassifier()
dt.fit(X_b, y_b)
# Writes the rendered tree, labeled with the feature names, to test.png
visualize_decision_tree(dt, filepath='test.png')

[Image: test.png, the rendered decision tree]
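One possible way to test this without inspecting the PNG itself is to check the DOT source that the image is rendered from. The sketch below uses plain scikit-learn rather than the evalml component, and it assumes the visualization is built on `sklearn.tree.export_graphviz`; the variable names (`dot_source`, `used`, `missing`) are illustrative only.

```python
import pandas as pd
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X, y = datasets.make_classification(n_samples=100, n_features=20,
                                    n_informative=2, n_redundant=2, random_state=0)
feature_names = [f"Testing_{i}" for i in range(X.shape[1])]
X = pd.DataFrame(X, columns=feature_names)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string
dot_source = export_graphviz(clf, out_file=None, feature_names=feature_names)

# tree_.feature holds the split feature index per node (negative for leaves)
used = {feature_names[i] for i in clf.tree_.feature if i >= 0}

# Each split node label starts with "<feature name> <= <threshold>"
missing = [name for name in used if f"{name} <=" not in dot_source]
```

If `missing` is empty, every feature the tree actually splits on appears by name in the graph source, which is a cheaper check than comparing images.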

@angela97lin angela97lin self-assigned this Feb 9, 2021
@angela97lin angela97lin added this to the Sprint 2021 Feb A milestone Feb 9, 2021
@codecov

codecov bot commented Feb 9, 2021

Codecov Report

Merging #1813 (de85e0f) into main (a1f6e3a) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1813     +/-   ##
=========================================
- Coverage   100.0%   100.0%   -0.0%     
=========================================
  Files         252      252             
  Lines       20061    20052      -9     
=========================================
- Hits        20053    20044      -9     
  Misses          8        8             
Impacted Files                                          Coverage Δ
evalml/model_understanding/graphs.py                    100.0% <100.0%> (ø)
evalml/tests/conftest.py                                100.0% <100.0%> (ø)
...lml/tests/model_understanding_tests/test_graphs.py   100.0% <100.0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

 def test_decision_tree_data_from_estimator(fitted_tree_estimators):
     est_class, est_reg = fitted_tree_estimators
 
-    formatted_ = decision_tree_data_from_estimator(est_reg, feature_names=[f'Testing_{col_}' for col_ in range(est_reg._component_obj.n_features_)])
+    formatted_ = decision_tree_data_from_estimator(est_reg)
Contributor Author

Removing the feature_names parameter. This test already checks that the output contains the feature names; they now come from the estimator instead of being set explicitly, so no other code changes are needed.

@angela97lin angela97lin marked this pull request as ready for review February 10, 2021 20:43
Collaborator

@jeremyliweishih jeremyliweishih left a comment

LGTM. The only comment I have is that maybe we can make the feature names stand out a bit (bold or italicize them?).

@angela97lin
Contributor Author

@jeremyliweishih That's a great suggestion! With our current implementation, though, I'm not sure there's an API available to do this easily, so we might need to punt on that: https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz
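For what it's worth, a workaround could be to post-process the DOT source rather than rely on an `export_graphviz` option. The sketch below is an untested-in-evalml idea, not something this PR does: it assumes `special_characters=True` makes scikit-learn emit HTML-like labels of the form `label=<name &le; threshold<br/>...>`, and that the installed Graphviz renders `<B>` tags in such labels.

```python
import pandas as pd
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X, y = datasets.make_classification(n_samples=100, n_features=20,
                                    n_informative=2, n_redundant=2, random_state=0)
feature_names = [f"Testing_{i}" for i in range(X.shape[1])]
X = pd.DataFrame(X, columns=feature_names)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# special_characters=True switches the node labels to Graphviz HTML-like syntax
dot_source = export_graphviz(clf, out_file=None, feature_names=feature_names,
                             special_characters=True)

# Wrap the feature name at the start of each split-node label in <B>...</B>.
# Matching on "label=<NAME &le;" keeps Testing_1 from matching inside Testing_10.
bold_dot = dot_source
for name in feature_names:
    bold_dot = bold_dot.replace(f"label=<{name} &le;", f"label=<<B>{name}</B> &le;")
```

The resulting `bold_dot` string could then be handed to Graphviz in place of the original source. String replacement on generated DOT is brittle, so this would need a check against the scikit-learn version in use.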

Contributor

@bchen1116 bchen1116 left a comment

Looks good!

@angela97lin angela97lin merged commit 94f554e into main Feb 10, 2021
@angela97lin angela97lin deleted the 1718_name_features branch February 10, 2021 22:51
@chukarsten chukarsten mentioned this pull request Feb 23, 2021
@dsherry dsherry mentioned this pull request Mar 10, 2021

Successfully merging this pull request may close these issues.

Update visualize_decision_tree to track feature names
4 participants