-
Notifications
You must be signed in to change notification settings - Fork 89
Add DecisionTreeClassifier and DecisionTreeRegressor classes #1223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1223 +/- ##
=======================================
Coverage 99.92% 99.92%
=======================================
Files 196 200 +4
Lines 12206 12293 +87
=======================================
+ Hits 12197 12284 +87
Misses 9 9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a couple questions and an extra test case but nothing blocking. Very cool!
@@ -9,6 +9,7 @@ class ModelFamily(Enum): | |||
LINEAR_MODEL = 'linear_model' | |||
CATBOOST = 'catboost' | |||
EXTRA_TREES = 'extra_trees' | |||
DECISION_TREE = 'decision_tree' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can move towards generalizing model families - I see that we have is_tree_estimator
below and I think it could be a good idea to put all tree based models together. Likewise with all gradient boosted machines! We should file an issue if we like that idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this thought also crossed my mind while I was adding this and I agree! It's a bit tricky though, since we currently rely on ModelFamily
to determine when we're dealing with XGBoost
and CatBoost
which both have to be handled differently in quite a few places 🤔
hyperparameter_ranges = { | ||
"criterion": ["gini", "entropy"], | ||
"max_features": ["auto", "sqrt", "log2"], | ||
"max_depth": Integer(4, 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is max_depth
values just a placeholder for now until we do perf testing? It seems a little low off the top of my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya I'm using the same value from our ExtraTrees components for now, don't know if there's anything better until we do some perf testing 😎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some notes about potential tests to add. Decision Tree doesn't currently work with categorical data, so adding coverage of that would be important!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin Looks good to me! Thanks for updating is_tree_estimator
in model family and for excluding these from automl. I look forward to looking at the perf test results!
evalml/pipelines/components/estimators/regressors/decision_tree_regressor.py
Show resolved
Hide resolved
@@ -27,7 +27,3 @@ def __init__(self, n_estimators=100, max_depth=6, n_jobs=-1, random_state=0, **k | |||
super().__init__(parameters=parameters, | |||
component_obj=rf_classifier, | |||
random_state=random_state) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated but I believe these can be cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Closes #1196 by adding DecisionTreeClassifier and DecisionTreeRegressor
TODO in later PR: Add DecisionTree* to AutoML and do perf testing