Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle boolean and categorical features for time series #3083

Merged
merged 6 commits into from
Nov 19, 2021

Conversation

freddyaboulton
Copy link
Contributor

Pull Request Description

Fixes #3077, Fixes #3082


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented Nov 19, 2021

Codecov Report

Merging #3083 (04d003f) into main (72b8048) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3083     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        312     313      +1     
  Lines      30437   30468     +31     
=======================================
+ Hits       30347   30378     +31     
  Misses        90      90             
Impacted Files Coverage Δ
...mponent_tests/test_delayed_features_transformer.py 99.1% <ø> (ø)
...components/transformers/encoders/onehot_encoder.py 100.0% <100.0%> (ø)
...rmers/preprocessing/delayed_feature_transformer.py 100.0% <100.0%> (ø)
...alml/tests/component_tests/test_one_hot_encoder.py 100.0% <100.0%> (ø)
.../integration_tests/test_time_series_integration.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 72b8048...04d003f. Read the comment docs.

@@ -144,9 +144,9 @@ def test_null_values_in_dataframe():
"col_4": [2, 0, 1, 3, 0, 1, 2],
}
)

X.ww.init(logical_types={"col_1": "categorical"})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logical types were treated as Unknown before.

ProblemTypes.TIME_SERIES_REGRESSION,
],
)
def test_can_run_automl_for_time_series_with_categorical_and_boolean_features(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

18 second run time seems reasonable to me

Copy link
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me! Just have a question about potentially adding a test case for the numeric OHE case you added.

@@ -144,9 +144,9 @@ def test_null_values_in_dataframe():
"col_4": [2, 0, 1, 3, 0, 1, 2],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test case where encoder_error = OneHotEncoder(handle_missing="error") and it fits on a data frame with a numeric column with a null value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested through the time series integration tests but agree it makes sense to add a test here!

@freddyaboulton freddyaboulton merged commit 20ec36e into main Nov 19, 2021
@freddyaboulton freddyaboulton deleted the 3077-handle-boolean-cat-for-ts branch November 19, 2021 18:49
@freddyaboulton freddyaboulton restored the 3077-handle-boolean-cat-for-ts branch November 19, 2021 18:49
@chukarsten chukarsten mentioned this pull request Nov 29, 2021
@freddyaboulton freddyaboulton deleted the 3077-handle-boolean-cat-for-ts branch May 13, 2022 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants