Add Support for Bool Features for ARIMA #3187

Merged
freddyaboulton merged 5 commits into main from 3181-arima-bool-features
Jan 12, 2022

Conversation

@freddyaboulton freddyaboulton commented Jan 5, 2022

Pull Request Description

Fixes #3181

Perf tests:
report.html.zip



codecov bot commented Jan 5, 2022

Codecov Report

Merging #3187 (280f231) into main (cb846dd) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3187     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        326     326             
  Lines      31372   31392     +20     
=======================================
+ Hits       31283   31303     +20     
  Misses        89      89             
| Impacted Files | Coverage Δ |
|---|---|
| ...omponents/estimators/regressors/arima_regressor.py | 100.0% <100.0%> (ø) |
| ...alml/tests/component_tests/test_arima_regressor.py | 100.0% <100.0%> (ø) |
| evalml/tests/conftest.py | 97.0% <100.0%> (+0.1%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@freddyaboulton freddyaboulton force-pushed the 3181-arima-bool-features branch 6 times, most recently from 1a97244 to 4eff434 Compare January 11, 2022 15:27
@freddyaboulton freddyaboulton marked this pull request as ready for review January 11, 2022 16:21
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Looks great to me! Just one question.

X, y = self._manage_woodwork(X, y)
fh_ = self._set_forecast(X)
X = X.select_dtypes(exclude=["datetime64"])
X = X.ww.select(exclude=["Datetime"])
Collaborator

Should we call self._remove_datetime here as well? I'm unsure whether we need the other functionality in self._remove_datetime.

Contributor Author

Great question! I think we could, but that would cause an extra copy, so I'd rather keep it as is.

Collaborator

sounds good to me!
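For context on the thread above, here is a minimal pandas-only sketch of the two steps under discussion: excluding datetime columns and passing boolean features on as floats. It is an illustration under assumptions, not evalml's implementation. The column names and data are made up, and the bool-to-float cast is inferred from the `astype(float)` comparison in the quoted test rather than from the component's source.

```python
import pandas as pd

# Hypothetical input: a date column, a numeric feature, and a boolean feature.
X = pd.DataFrame(
    {
        "dates": pd.date_range("2022-01-01", periods=4),
        "feature": [1.0, 2.0, 3.0, 4.0],
        "bool_1": [True, False, True, False],
    }
)

# Exclude datetime columns (the plain-pandas form quoted in the review).
X_no_dates = X.select_dtypes(exclude=["datetime64"])

# Assumption: boolean columns are cast to float before reaching the estimator.
bool_cols = X_no_dates.select_dtypes(include=["bool"]).columns
X_numeric = X_no_dates.astype({col: float for col in bool_cols})

print(X_numeric.dtypes)
```

Each of `select_dtypes` and `astype` returns a new frame here, which is the kind of extra copy the reply above prefers to avoid adding.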

ar._component_obj = MagicMock()
ar.fit(X, y)

pd.testing.assert_series_equal(
Collaborator

very cool!

Contributor

@bchen1116 bchen1116 left a comment

LGTM!

@freddyaboulton freddyaboulton force-pushed the 3181-arima-bool-features branch from 4eff434 to 2c6475e Compare January 11, 2022 17:15
Contributor

@angela97lin angela97lin left a comment

LGTM, thank you @freddyaboulton ! Just left some comments on testing I was curious about :)

X.ww.init()
X.ww["bool_1"] = (
pd.Series([True, False])
.sample(n=10, replace=True, random_state=0)
Contributor

Extremely nit-picky but I wonder if we need any randomness here as opposed to just creating a numpy array of alternating true/false.

Contributor Author

We technically do not 😅
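A rough sketch of the deterministic alternative suggested above, next to the sampled version from the quoted test fixture. The variable names and the row count of 10 are assumptions taken from the snippet, not from the merged test.

```python
import numpy as np
import pandas as pd

n_rows = 10

# Deterministic alternating pattern: True, False, True, False, ...
alternating = pd.Series(np.arange(n_rows) % 2 == 0, name="bool_1")

# The sampled variant quoted in the review, shown for comparison.
sampled = (
    pd.Series([True, False])
    .sample(n=n_rows, replace=True, random_state=0)
    .reset_index(drop=True)
    .rename("bool_1")
)

print(alternating.tolist())
```

Both series have boolean dtype and the same length; the alternating one just makes the expected values obvious when reading the test.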

ar._component_obj.fit.call_args[1]["X"]["bool_2"], X["bool_2"].astype(float)
)

ar = ARIMARegressor(time_index="dates")
Contributor

@angela97lin angela97lin Jan 11, 2022

just wondering-- ar._component_obj = MagicMock() no longer applies to this line, right? Is that intentional? Are we just checking that we can actually fit/predict? I think this was not immediately clear to me. Could be clearer by just checking for these statements above when we mock, but I'm not sure 😅

If so, should we also check that predict doesn't return nan values?

Contributor Author

Yeah, the intention is to check that we can call non-mocked fit/predict on data with bools without erroring out. I will add a comment and check for non-NaN values!
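The two kinds of check discussed in this thread can be sketched as follows. `ToyRegressor` is a hypothetical stand-in written for this example and is not evalml's `ARIMARegressor`; it only mimics the shape of a wrapper that forwards data to `_component_obj`. The first check inspects what a mocked backend received, and the second runs the unmocked path and asserts the predictions contain no NaN values.

```python
from unittest.mock import MagicMock

import numpy as np
import pandas as pd


class ToyRegressor:
    """Hypothetical wrapper used only for illustration."""

    def __init__(self):
        self._component_obj = None
        self._mean = None

    def fit(self, X, y):
        # Cast boolean columns to float before handing data to the backend.
        bool_cols = X.select_dtypes(include=["bool"]).columns
        X = X.astype({col: float for col in bool_cols})
        if self._component_obj is not None:
            self._component_obj.fit(y=y, X=X)
        self._mean = float(y.mean())
        return self

    def predict(self, X):
        # Predict the training mean for every row.
        return pd.Series(np.full(len(X), self._mean), index=X.index)


X = pd.DataFrame({"bool_2": [True, False, True, False]})
y = pd.Series([1.0, 2.0, 3.0, 4.0])

# 1) Mocked backend: inspect the keyword arguments passed to fit.
mocked = ToyRegressor()
mocked._component_obj = MagicMock()
mocked.fit(X, y)
passed_X = mocked._component_obj.fit.call_args[1]["X"]
pd.testing.assert_series_equal(passed_X["bool_2"], X["bool_2"].astype(float))

# 2) Unmocked path: fit/predict should run and produce no NaN values.
preds = ToyRegressor().fit(X, y).predict(X)
assert not preds.isna().any()
```

Keeping the two checks in separate, commented sections makes it clear which assertions depend on the mock and which exercise the real fit/predict path.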

@freddyaboulton freddyaboulton force-pushed the 3181-arima-bool-features branch from 2c6475e to 4989517 Compare January 12, 2022 16:05
@freddyaboulton freddyaboulton enabled auto-merge (squash) January 12, 2022 16:05
@freddyaboulton freddyaboulton force-pushed the 3181-arima-bool-features branch from 4989517 to 280f231 Compare January 12, 2022 16:36
@freddyaboulton freddyaboulton merged commit aa0ec23 into main Jan 12, 2022
@freddyaboulton freddyaboulton deleted the 3181-arima-bool-features branch January 12, 2022 17:16
@chukarsten chukarsten mentioned this pull request Jan 18, 2022


Development

Successfully merging this pull request may close these issues.

ARIMA cannot handle boolean features

4 participants