convert url/email features to object before categorical #3388

Merged
freddyaboulton merged 3 commits into main from 3366-nullables-in-email-url-features on Mar 22, 2022

Conversation

freddyaboulton
Contributor

Pull Request Description

Fixes #3366


# until sklearn imputer can handle pd.NA in release 1.1
# FT returns these as string types, currently there isn't much difference
# in terms of performance between object and string
# see https://pandas.pydata.org/docs/user_guide/text.html#text-data-types
Contributor Author

Need to perf test this
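
For readers following the thread, here is a minimal sketch of the kind of conversion the PR title describes: casting a nullable `string` column to `object` before marking it categorical. This is not the PR's actual diff. The helper name `to_object_then_category` and the sample `domains` column are hypothetical, and plain pandas stands in for the Featuretools (FT) output mentioned in the code comments above.

```python
import pandas as pd


def to_object_then_category(series: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to object first, then to category."""
    # Step 1: drop the nullable "string" extension dtype in favor of object.
    as_object = series.astype(object)
    # Step 2: build the categorical from the object-typed values.
    # Missing entries get no category; they are stored as code -1.
    return as_object.astype("category")


# Assumed stand-in for a featurizer output column with a missing value.
domains = pd.Series(
    ["gmail.com", pd.NA, "yahoo.com", "gmail.com"], dtype="string"
)
converted = to_object_then_category(domains)

print(converted.cat.categories.tolist())
print(converted.cat.codes.tolist())
```

In this sketch the missing entry is carried by the category code -1 rather than by a `pd.NA` value inside a string array. Whether that alone is enough for a given imputer depends on the library versions involved, which is what the first code comment above alludes to.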

@freddyaboulton freddyaboulton force-pushed the 3366-nullables-in-email-url-features branch from ca70bb2 to 1dc5fe3 on March 18, 2022 21:47

codecov bot commented Mar 18, 2022

Codecov Report

Merging #3388 (ba12347) into main (8d010d7) will increase coverage by 52.9%.
The diff coverage is 100.0%.

❗ Current head ba12347 differs from pull request most recent head b3087a8. Consider uploading reports for the commit b3087a8 to get more accurate results

@@           Coverage Diff            @@
##            main   #3388      +/-   ##
========================================
+ Coverage   46.8%   99.7%   +52.9%     
========================================
  Files        329     329              
  Lines      32394   32405      +11     
========================================
+ Hits       15149   32276   +17127     
+ Misses     17245     129   -17116     
Impacted Files Coverage Δ
...rs/preprocessing/transform_primitive_components.py 100.0% <100.0%> (+18.7%) ⬆️
...lml/tests/integration_tests/test_nullable_types.py 100.0% <100.0%> (+100.0%) ⬆️
...components/transformers/encoders/onehot_encoder.py 100.0% <0.0%> (+0.8%) ⬆️
...s/components/transformers/samplers/undersampler.py 100.0% <0.0%> (+1.6%) ⬆️
...nents/estimators/classifiers/xgboost_classifier.py 100.0% <0.0%> (+1.7%) ⬆️
...components/transformers/imputers/target_imputer.py 100.0% <0.0%> (+1.9%) ⬆️
...ansformers/preprocessing/replace_nullable_types.py 100.0% <0.0%> (+2.1%) ⬆️
evalml/pipelines/binary_classification_pipeline.py 100.0% <0.0%> (+4.2%) ⬆️
...l/pipelines/components/transformers/transformer.py 100.0% <0.0%> (+4.8%) ⬆️
...components/transformers/encoders/target_encoder.py 100.0% <0.0%> (+5.6%) ⬆️
... and 105 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8d010d7...b3087a8.

Contributor

@chukarsten chukarsten left a comment

LGTM! The only thing I might consider is adding the columns to a module-level dataframe for testing instead of making a new one, but that's not a big deal.

Contributor

@jeremyliweishih jeremyliweishih left a comment

very nice!

Contributor

@bchen1116 bchen1116 left a comment

Perf tests look great!

@freddyaboulton freddyaboulton force-pushed the 3366-nullables-in-email-url-features branch from ba12347 to b3087a8 on March 22, 2022 16:40
@freddyaboulton freddyaboulton enabled auto-merge (squash) March 22, 2022 16:41
@freddyaboulton freddyaboulton merged commit 390d1b9 into main Mar 22, 2022
@chukarsten chukarsten mentioned this pull request Mar 25, 2022
@freddyaboulton freddyaboulton deleted the 3366-nullables-in-email-url-features branch May 13, 2022 14:49
Development

Successfully merging this pull request may close these issues.

Imputer cannot impute generated features from EmailFeaturizer if missing values are present
4 participants