Add DecisionTreeClassifier and DecisionTreeRegressor classes #1223

Merged
angela97lin merged 10 commits into main from 1196_decision_tree on Sep 28, 2020

Conversation

Contributor

@angela97lin angela97lin commented Sep 24, 2020

Closes #1196 by adding DecisionTreeClassifier and DecisionTreeRegressor

  • Add DecisionTree* classes (+API reference)

TODO in later PR: Add DecisionTree* to AutoML and do perf testing

@angela97lin angela97lin self-assigned this Sep 24, 2020
@angela97lin angela97lin added this to the September 2020 milestone Sep 24, 2020

codecov bot commented Sep 24, 2020

Codecov Report

Merging #1223 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1223   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         196      200    +4     
  Lines       12206    12293   +87     
=======================================
+ Hits        12197    12284   +87     
  Misses          9        9           
| Impacted Files | Coverage Δ |
|---|---|
| evalml/pipelines/__init__.py | 100.00% <ø> (ø) |
| evalml/pipelines/components/__init__.py | 100.00% <ø> (ø) |
| evalml/pipelines/components/estimators/__init__.py | 100.00% <ø> (ø) |
| ...components/estimators/classifiers/rf_classifier.py | 100.00% <ø> (ø) |
| ...s/components/estimators/regressors/rf_regressor.py | 100.00% <ø> (ø) |
| evalml/model_family/model_family.py | 100.00% <100.00%> (ø) |
| ...ines/components/estimators/classifiers/__init__.py | 100.00% <100.00%> (ø) |
| ...estimators/classifiers/decision_tree_classifier.py | 100.00% <100.00%> (ø) |
| ...lines/components/estimators/regressors/__init__.py | 100.00% <100.00%> (ø) |
| ...s/estimators/regressors/decision_tree_regressor.py | 100.00% <100.00%> (ø) |
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 43ef057...5ef0b33.

@angela97lin angela97lin marked this pull request as ready for review Sep 28, 2020
Contributor

@jeremyliweishih jeremyliweishih left a comment

LGTM. Just a couple questions and an extra test case but nothing blocking. Very cool!

@@ -9,6 +9,7 @@ class ModelFamily(Enum):
     LINEAR_MODEL = 'linear_model'
     CATBOOST = 'catboost'
     EXTRA_TREES = 'extra_trees'
+    DECISION_TREE = 'decision_tree'
Contributor

@jeremyliweishih jeremyliweishih Sep 28, 2020

Maybe we can move towards generalizing model families. I see that we have is_tree_estimator below, and I think it could be a good idea to group all tree-based models together. Likewise with all gradient-boosted machines! We should file an issue if we like that idea.

Contributor Author

@angela97lin angela97lin Sep 28, 2020

Yup, this thought also crossed my mind while I was adding this, and I agree! It's a bit tricky though, since we currently rely on ModelFamily to determine when we're dealing with XGBoost and CatBoost, which both have to be handled differently in quite a few places 🤔
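
A minimal sketch of what grouping model families could look like, assuming a plain `Enum` with set-membership helpers. Only `is_tree_estimator` and the four members shown in the diff come from this PR. The `XGBOOST` member, the `is_gradient_boosted` helper, and the two module-level sets are hypothetical names introduced for illustration, and the choice of which families belong to each group is an assumption, not evalml's actual definition.

```python
from enum import Enum


class ModelFamily(Enum):
    """Illustrative subset of model families (not the full evalml enum)."""
    LINEAR_MODEL = 'linear_model'
    CATBOOST = 'catboost'
    EXTRA_TREES = 'extra_trees'
    DECISION_TREE = 'decision_tree'
    XGBOOST = 'xgboost'  # assumed member; only named in prose in this thread

    def is_tree_estimator(self):
        """Return True if this family is built on decision trees."""
        return self in _TREE_FAMILIES

    def is_gradient_boosted(self):
        """Hypothetical helper: True for gradient-boosted families."""
        return self in _GRADIENT_BOOSTED_FAMILIES


# Assumed groupings, kept outside the class so they do not become enum members.
_TREE_FAMILIES = frozenset({
    ModelFamily.CATBOOST,
    ModelFamily.EXTRA_TREES,
    ModelFamily.DECISION_TREE,
    ModelFamily.XGBOOST,
})
_GRADIENT_BOOSTED_FAMILIES = frozenset({
    ModelFamily.CATBOOST,
    ModelFamily.XGBOOST,
})
```

Because each member stays distinct, code that special-cases XGBoost or CatBoost can still compare against the individual member, while more general code asks the group-level question.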

    hyperparameter_ranges = {
        "criterion": ["gini", "entropy"],
        "max_features": ["auto", "sqrt", "log2"],
        "max_depth": Integer(4, 10)
Contributor

@jeremyliweishih jeremyliweishih Sep 28, 2020

Are the max_depth values just placeholders for now until we do perf testing? They seem a little low off the top of my head.

Contributor Author

@angela97lin angela97lin Sep 28, 2020

Yeah, I'm using the same values as our ExtraTrees components for now. I don't know if there's anything better until we do some perf testing 😎
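
To make the range under discussion concrete, here is a rough sketch of drawing candidate parameter sets from a search space shaped like the snippet above. It uses only the standard library: `IntRange` is a hypothetical stand-in for `Integer(4, 10)` and assumes both bounds are inclusive, and `sample_parameters` is an illustrative helper, not part of evalml or its tuner.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Hypothetical stand-in for Integer(low, high); bounds assumed inclusive."""
    low: int
    high: int

    def sample(self, rng):
        # random.Random.randint includes both endpoints.
        return rng.randint(self.low, self.high)


hyperparameter_ranges = {
    "criterion": ["gini", "entropy"],
    "max_features": ["auto", "sqrt", "log2"],
    "max_depth": IntRange(4, 10),
}


def sample_parameters(ranges, rng):
    """Draw one value per hyperparameter: a list is a categorical choice."""
    params = {}
    for name, space in ranges.items():
        if isinstance(space, IntRange):
            params[name] = space.sample(rng)
        else:
            params[name] = rng.choice(space)
    return params


rng = random.Random(0)  # seeded so repeated runs draw the same candidates
samples = [sample_parameters(hyperparameter_ranges, rng) for _ in range(5)]
```

Under these assumptions every sampled tree is capped at a depth between 4 and 10, which is the limit the perf testing mentioned above would need to confirm or widen.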

Contributor

@bchen1116 bchen1116 left a comment

Left some notes about potential tests to add. Decision Tree doesn't currently work with categorical data, so adding coverage of that would be important!

Contributor

@freddyaboulton freddyaboulton left a comment

@angela97lin Looks good to me! Thanks for updating is_tree_estimator in model family and for excluding these from automl. I look forward to looking at the perf test results!

@@ -27,7 +27,3 @@ def __init__(self, n_estimators=100, max_depth=6, n_jobs=-1, random_state=0, **k
         super().__init__(parameters=parameters,
                          component_obj=rf_classifier,
                          random_state=random_state)

Contributor Author

@angela97lin angela97lin Sep 28, 2020

Unrelated, but I believe these can be cleaned up.

Contributor

@bchen1116 bchen1116 left a comment

LGTM!

@angela97lin angela97lin merged commit 7e8f614 into main Sep 28, 2020
@angela97lin angela97lin deleted the 1196_decision_tree branch Sep 28, 2020
@angela97lin angela97lin mentioned this pull request Sep 29, 2020
Linked issue: Add sklearn DecisionTreeRegressor and DecisionTreeClassifier estimators
4 participants