Adding IsYearEnd and IsYearStart primitives #2124

Merged 21 commits on Jun 28, 2022

Conversation

sbadithe
Contributor

Adding the following primitives: IsYearEnd, IsYearStart

Fixes #2061
Fixes #2060
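
For readers unfamiliar with the two flags, here is a minimal sketch of what they compute. It assumes the primitives behave like the pandas `.dt.is_year_end` and `.dt.is_year_start` accessors; the helper functions `is_year_end` and `is_year_start` below are hypothetical stand-ins, not the Featuretools classes from this PR.

```python
from datetime import datetime

import pandas as pd


def is_year_end(dates: pd.Series) -> pd.Series:
    """Hypothetical stand-in: True where the date is December 31."""
    return dates.dt.is_year_end


def is_year_start(dates: pd.Series) -> pd.Series:
    """Hypothetical stand-in: True where the date is January 1."""
    return dates.dt.is_year_start


dates = pd.Series(
    [datetime(2019, 12, 31), datetime(2019, 1, 1), datetime(2019, 11, 30)]
)

year_end_flags = is_year_end(dates).tolist()
year_start_flags = is_year_start(dates).tolist()

print(year_end_flags)    # [True, False, False]
print(year_start_flags)  # [False, True, False]
```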

codecov bot commented Jun 17, 2022

Codecov Report

Merging #2124 (66d0ca4) into main (4f35a15) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2124   +/-   ##
=======================================
  Coverage   99.21%   99.21%           
=======================================
  Files         143      143           
  Lines       16837    16869   +32     
=======================================
+ Hits        16705    16737   +32     
  Misses        132      132           
Impacted Files                                          Coverage Δ
...imitives/standard/datetime_transform_primitives.py   100.00% <100.00%> (ø)
.../tests/primitive_tests/test_transform_primitive.py   100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

gsheni previously approved these changes Jun 18, 2022
@sbadithe sbadithe marked this pull request as ready for review June 21, 2022 16:42
gsheni previously approved these changes Jun 22, 2022
@@ -174,6 +176,22 @@ def test_is_quarter_start():
    np.testing.assert_array_equal(iqs_bools, correct_bools)


def test_is_year_end():
    is_year_end = IsYearEnd()
    dates = pd.Series([datetime(2020, 12, 31), datetime(2020, 1, 1)])

Contributor

What happens if there is a NaN?
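
A quick way to explore this question outside Featuretools is to check the underlying pandas accessor. This is only a sketch under the assumption that the primitive delegates to `.dt.is_year_end`; the series name `dates_with_nan` is made up for illustration. With the pandas versions I am aware of, a missing value comes back as `False` rather than as NaN.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# pd.to_datetime guarantees a datetime dtype; the np.nan entry becomes NaT.
dates_with_nan = pd.to_datetime(
    pd.Series([datetime(2020, 12, 31), np.nan, datetime(2020, 1, 1)])
)

# Assumption: the primitive behaves like this accessor.
flags = dates_with_nan.dt.is_year_end.tolist()

print(dates_with_nan.isna().tolist())  # [False, True, False]
print(flags)                           # [True, False, False]
```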

>>> from datetime import datetime
>>> dates = [datetime(2019, 12, 31),
...          datetime(2019, 1, 1),
...          datetime(2019, 11, 30)]

Contributor

Can we include an example with NaN?

Contributor

We should make sure we did this with the previous Datetime primitives we merged in

Contributor Author

@sbadithe sbadithe Jun 23, 2022

General question regarding the behavior of primitives with NaN:

If I use the Month primitive on the following data:

pd.Series([datetime(2020, 3, 3), np.nan])

I get [3.0, nan]

If I pass pd.Series([np.nan]) into the Month primitive, I get an error.

I get analogous behavior for Year and the like. I just wanted to confirm that this is indeed the correct expected behavior before writing the tests.

Contributor

I think you may have found a bug :)

Contributor

@thehomebrewnerd thehomebrewnerd Jun 23, 2022

You should always expect the series dtype for Datetime columns to be datetime64[ns] based on the behavior of Woodwork, which enforces that dtype for the series.

If you don't otherwise specify the dtype, pd.Series([np.nan]) will have a dtype of float64, and you can't perform datetime operations on a float64 column.

I haven't tested this out, but I think if you convert pd.Series([np.nan]) to a datetime column first, the Month primitive will no longer error. If you still get an error at that point, then I agree with @dvreed77 that there is a bug we need to fix.

Contributor

Good call. I confirmed this doesn't error:

s = pd.Series([np.nan]).astype('datetime64[ns]')

ft.primitives.Month()(s)
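
The dtype explanation above can also be seen with plain pandas, without Featuretools. The sketch below assumes the Month primitive relies on the `.dt.month` accessor, and it uses `pd.to_datetime` for the conversion instead of the `astype` call from the thread; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# With no dtype given, a lone NaN is stored as float64.
bare_nan = pd.Series([np.nan])
print(bare_nan.dtype)  # float64

# Datetime accessors are unavailable on a float64 series.
try:
    bare_nan.dt.month
    dt_accessor_failed = False
except AttributeError:
    dt_accessor_failed = True
print(dt_accessor_failed)  # True

# After conversion the value is NaT and the accessor works, returning NaN.
as_datetime = pd.to_datetime(bare_nan)
months = as_datetime.dt.month
print(as_datetime.dtype)
print(months.isna().all())  # True
```

This is presumably the same failure reported earlier in the thread: the error comes from the float64 dtype rather than from the missing value itself.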

@sbadithe sbadithe requested a review from dvreed77 June 27, 2022 21:48
Examples:
>>> from datetime import datetime
>>> dates = [datetime(2019, 12, 31),
...          datetime(2019, 1, 1),

Contributor

@sbadithe can you put a NaN example in the doc string?

Contributor

@dvreed77 dvreed77 left a comment

lgtm

@sbadithe sbadithe merged commit 87cd39a into main Jun 28, 2022
@sbadithe sbadithe deleted the more-year-and-day-primitives branch June 28, 2022 21:37
@ozzieD ozzieD mentioned this pull request Jun 30, 2022
Successfully merging this pull request may close these issues.

Add IsYearEnd primitive
Add IsYearStart primitive
4 participants