Update estimators to keep track of input feature names #1794

Merged
merged 20 commits into from Feb 10, 2021

angela97lin

Closes #1757

@angela97lin angela97lin self-assigned this Feb 8, 2021
codecov bot commented Feb 8, 2021

Codecov Report

Merging #1794 (42ba2dd) into main (c3bd8d3) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1794     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         252      252             
  Lines       20047    20061     +14     
=========================================
+ Hits        20039    20053     +14     
  Misses          8        8             
Impacted Files Coverage Δ
...ents/estimators/classifiers/catboost_classifier.py 100.0% <100.0%> (ø)
...ents/estimators/classifiers/lightgbm_classifier.py 100.0% <100.0%> (ø)
...nents/estimators/classifiers/xgboost_classifier.py 100.0% <100.0%> (ø)
...valml/pipelines/components/estimators/estimator.py 100.0% <100.0%> (ø)
...onents/estimators/regressors/catboost_regressor.py 100.0% <100.0%> (ø)
...onents/estimators/regressors/lightgbm_regressor.py 100.0% <100.0%> (ø)
...ponents/estimators/regressors/xgboost_regressor.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_estimators.py 100.0% <100.0%> (ø)
...alml/tests/component_tests/test_lgbm_classifier.py 100.0% <100.0%> (ø)
...valml/tests/component_tests/test_lgbm_regressor.py 100.0% <100.0%> (ø)


  X_encoded = self._encode_categories(X, fit=True)
  y_encoded = self._encode_labels(y)
- return super().fit(X_encoded, y_encoded)
+ self._component_obj.fit(X_encoded, y_encoded)
Contributor Author

Updating this to call the underlying component's fit directly, because Estimator.fit sets self.input_feature_names = list(X.columns). Since we've already changed the feature names in _encode_categories (LGBM isn't compatible with unusual names), going through the parent's fit would overwrite the original names with the encoded ones.
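
For readers following along, here is a minimal sketch of the behavior described in this comment. It is not the code from this PR: `Frame`, `PickyModel`, and `RenamingEstimator` are hypothetical stand-ins (a plain dataclass replaces the DataFrame so the snippet runs with only the standard library), and recording the names inside the subclass's `fit` is an assumption about how the original names get preserved.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Tiny stand-in for a DataFrame: column names plus rows of values."""
    columns: list
    rows: list


class PickyModel:
    """Hypothetical third-party model that rejects non-alphanumeric column names."""

    def fit(self, X, y):
        bad = [c for c in X.columns if not str(c).isalnum()]
        if bad:
            raise ValueError(f"unsupported feature names: {bad}")
        self.fitted_columns = list(X.columns)
        return self


class Estimator:
    def __init__(self):
        self._component_obj = PickyModel()
        self.input_feature_names = None

    def fit(self, X, y=None):
        # Parent behavior: record the names of whatever X it is handed.
        self.input_feature_names = list(X.columns)
        self._component_obj.fit(X, y)
        return self


class RenamingEstimator(Estimator):
    def _encode_categories(self, X):
        # Replace the column names with positional integers the model accepts.
        return Frame(list(range(len(X.columns))), X.rows)

    def fit(self, X, y=None):
        # Assumption: the original names are recorded before any renaming.
        self.input_feature_names = list(X.columns)
        X_encoded = self._encode_categories(X)
        # Call the wrapped model directly instead of super().fit(X_encoded, y),
        # which would record the renamed columns.
        self._component_obj.fit(X_encoded, y)
        return self


X = Frame(["col[1]", "col<2>"], [[1, 4], [2, 5], [3, 6]])
y = [0, 1, 0]

est = RenamingEstimator().fit(X, y)
names_kept = est.input_feature_names

# Routing the renamed frame through the parent fit records the placeholder names.
overwritten = Estimator().fit(est._encode_categories(X), y).input_feature_names
```

In this sketch `names_kept` holds the original column names, while `overwritten` holds the integer placeholders, which is the loss the comment is guarding against.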

@@ -42,8 +46,15 @@ def __init__(self, eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, r
random_state=random_state)

def fit(self, X, y=None):
X = _convert_to_woodwork_structure(X)
Contributor Author

Similar to LGBM, updating this to call the underlying implementation directly, because Estimator.fit sets self.input_feature_names = list(X.columns). Since we've already changed the feature names in _encode_categories (XGBoost isn't compatible with unusual names), going through the parent's fit would overwrite the original names.

clf.fit(X, y)
assert len(clf.feature_importance) == len(X.columns)
assert not np.isnan(clf.feature_importance).all().all()
predictions = clf.predict(X).to_series()
assert len(predictions) == len(y)
assert not np.isnan(predictions).all()
assert (clf.input_feature_names == col_names)
Contributor Author

Rather than creating a whole new test, I'm tacking this assertion on here, where we already test passing weird names to estimators :D

@chukarsten chukarsten left a comment

I like this PR, and I like moving the functionality up into the parent class. I think the conversion of X/y to woodwork could be moved out and reused a bit more. I submitted a PR against your PR; check it out and let me know if that works. If not, no big deal.
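
One way to picture the reuse suggested here is a single helper on the parent class that both converts the inputs and records the feature names, so each subclass calls it once. This is only a sketch under stated assumptions: `convert_to_frame` is a hypothetical stand-in for the woodwork conversion, `_prepare_inputs` is an invented name, and the follow-up PR mentioned above may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Tiny stand-in for a typed table: column names plus rows of values."""
    columns: list
    rows: list


def convert_to_frame(X):
    """Hypothetical stand-in for the woodwork conversion.

    Accepts a Frame (returned unchanged) or a dict mapping column name to values.
    """
    if isinstance(X, Frame):
        return X
    return Frame(list(X), [list(row) for row in zip(*X.values())])


class Estimator:
    def __init__(self):
        self.input_feature_names = None

    def _prepare_inputs(self, X, y=None):
        # Hypothetical shared helper: convert once and record the names once.
        X = convert_to_frame(X)
        self.input_feature_names = list(X.columns)
        y = None if y is None else list(y)
        return X, y


class RenamingEstimator(Estimator):
    def fit(self, X, y=None):
        X, y = self._prepare_inputs(X, y)
        # Only the copy handed to the model gets placeholder names.
        X_encoded = Frame(list(range(len(X.columns))), X.rows)
        self._fitted_columns = X_encoded.columns
        return self


est = RenamingEstimator().fit({"a b": [1, 2], "c[d]": [3, 4]}, [0, 1])
```

With a helper like this, a subclass that renames columns no longer needs to repeat the conversion or the name bookkeeping in its own `fit`.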

@bchen1116 bchen1116 left a comment

LGTM!

@angela97lin angela97lin merged commit a1f6e3a into main Feb 10, 2021
@angela97lin angela97lin deleted the 1757_estimator_input_feature_names branch February 10, 2021 18:54
@chukarsten chukarsten mentioned this pull request Feb 23, 2021
@dsherry dsherry mentioned this pull request Mar 10, 2021