Add Target Encoder to Components #1401

Merged
merged 7 commits into from
Nov 16, 2020
bchen1116
Contributor

@bchen1116 bchen1116 commented Nov 3, 2020

fix #1390

Adding target encoder to components. Perf test results are here

Docs are here

[UPDATE]
On Friday, 11/6, we decided to add additional perf tests for TE (discussion in the quip doc above).
Things to test:

  1. Running OHE vs TE on categorical string data
  2. Running OHE vs TE on categorical data that has already been converted to int/float
  3. Running OHE vs TE on categorical data that has already been converted to int/float, where we convert a column's type to categorical if it has fewer than 10 unique values.

The perf test doc contains these 3 tests, in the order 3, 1, 2.
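For readers unfamiliar with the two encoders, here is a rough sketch of what scenario 3 compares. It is not the component added in this PR: `to_categorical_if_low_cardinality` and `mean_target_encode` are hypothetical helpers written with plain pandas, the target encoding is an unsmoothed per-category mean, and the toy data is made up.

```python
import pandas as pd


def to_categorical_if_low_cardinality(X, max_unique=10):
    """Cast columns with fewer than `max_unique` distinct values to 'category'.

    Hypothetical helper mirroring the threshold described in test 3.
    """
    X = X.copy()
    for col in X.columns:
        if X[col].nunique() < max_unique:
            X[col] = X[col].astype("category")
    return X


def mean_target_encode(X, y, cols):
    """Replace each category with the mean of y over rows in that category.

    Simplified on purpose: no smoothing and no handling of unseen categories.
    """
    X = X.copy()
    for col in cols:
        # transform("mean") returns one value per row, aligned to y's index
        X[col] = y.groupby(X[col], observed=True).transform("mean")
    return X


# Toy data: categorical features already stored as int/float
X = pd.DataFrame({
    "color": [0, 1, 0, 2, 1, 0],
    "size": [1.5, 2.5, 1.5, 2.5, 1.5, 2.5],
})
y = pd.Series([1, 0, 1, 0, 1, 0])

X_cat = to_categorical_if_low_cardinality(X)
cat_cols = list(X_cat.select_dtypes("category").columns)

ohe = pd.get_dummies(X_cat, columns=cat_cols)  # one column per category value
te = mean_target_encode(X_cat, y, cat_cols)    # one column per original feature
```

On this toy data OHE produces five columns while TE keeps the original two. That difference in width is the main trade-off the perf tests would be exercising.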

@bchen1116 bchen1116 self-assigned this Nov 3, 2020
@CLAassistant

CLAassistant commented Nov 3, 2020

CLA assistant check
All committers have signed the CLA.

@codecov

codecov bot commented Nov 3, 2020

Codecov Report

Merging #1401 (2e3b007) into main (c470196) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1401     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         218      220      +2     
  Lines       14433    14582    +149     
=========================================
+ Hits        14426    14575    +149     
  Misses          7        7             
Impacted Files Coverage Δ
evalml/pipelines/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
evalml/pipelines/pipeline_base.py 100.0% <ø> (ø)
...alml/pipelines/components/transformers/__init__.py 100.0% <100.0%> (ø)
...lines/components/transformers/encoders/__init__.py 100.0% <100.0%> (ø)
...components/transformers/encoders/target_encoder.py 100.0% <100.0%> (ø)
...valml/tests/component_tests/test_target_encoder.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_utils.py 100.0% <100.0%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c470196...2e3b007. Read the comment docs.

@bchen1116 bchen1116 force-pushed the bc_1390_target branch 2 times, most recently from 738dd3e to 008f495 Compare November 5, 2020 21:31
@bchen1116 bchen1116 marked this pull request as ready for review November 5, 2020 23:08
@bchen1116 bchen1116 marked this pull request as draft November 9, 2020 20:36
@bchen1116 bchen1116 marked this pull request as ready for review November 13, 2020 16:57
Contributor

@freddyaboulton freddyaboulton left a comment

@bchen1116 Looks good! Thanks for hunting down that weird bug hehe. I have a suggestion for how to guard against that bug without taking the extreme measure of resetting the index at the pipeline level. But maybe that is something we should consider in the future! Custom indices have shot us in the foot so many times lol.

Development

Successfully merging this pull request may close these issues.

Target encoding
4 participants