-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove usage of scikit-learn's LabelEncoder
#3161
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3161 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 324 324
Lines 31233 31232 -1
=======================================
Hits 31128 31128
+ Misses 105 104 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impeccable work!
@@ -167,7 +168,7 @@ def _encode_labels(self, y): | |||
if not is_integer_dtype(y_encoded): | |||
self._label_encoder = LabelEncoder() | |||
y_encoded = pd.Series( | |||
self._label_encoder.fit_transform(y_encoded), dtype="int64" | |||
self._label_encoder.fit_transform(None, y_encoded)[1], dtype="int64" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the pd.Series
call here when we don't have it in catboost?
@@ -103,7 +103,7 @@ def _convert_bool_to_int(X): | |||
def _label_encode(self, y): | |||
if not is_integer_dtype(y): | |||
self._label_encoder = LabelEncoder() | |||
y = pd.Series(self._label_encoder.fit_transform(y), dtype="int64") | |||
y = pd.Series(self._label_encoder.fit_transform(None, y)[1], dtype="int64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Agreed with @jeremyliweishih, the label encoder component already returns the y
value as an int-encoded series, so I don't think we need to do the extra initialization and type-casting in lightgbm and xgboost
Closes #3132