-
Notifications
You must be signed in to change notification settings - Fork 908
Add DistanceToHoliday transform primitive #1853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dvreed77
commented
Jan 20, 2022
- fixes Add DistanceToHoliday primitive #1847
| return date_to_holiday | ||
|
|
||
|
|
||
| class DistanceToHoliday(TransformPrimitive): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to move some of this into a utility? Since it uses similar functionality to our other primitive (DateToHoliday)?
We did this with CountString & WhiteSpaceCount
- https://github.com/alteryx/nlp_primitives/blob/main/nlp_primitives/whitespace_count.py#L6
- https://github.com/alteryx/nlp_primitives/blob/main/nlp_primitives/count_string.py#L9
We have a primitive util file too:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gsheni upon second look I think you had a good point. Let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, looks good.
Codecov Report
@@ Coverage Diff @@
## main #1853 +/- ##
=======================================
Coverage 98.79% 98.80%
=======================================
Files 143 144 +1
Lines 15916 15984 +68
=======================================
+ Hits 15725 15793 +68
Misses 191 191
Continue to review full report at Codecov.
|
| np.testing.assert_array_equal(given_answer, answer) | ||
|
|
||
|
|
||
| def test_valid_country(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original version of this test was perhaps testing something slightly different since it was named test_holiday_out_of_range, but I'm not exactly clear now what the intent of that test was. Do you have any ideas? I see the original test introduced np.nan values in the result, but this one does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thehomebrewnerd I dug into this a bit. The test above isn't replacing that test, I actually didn't see the original tests when moving things over.
The out of range test you mentioned brought to my attention some unexpected behavior when using the holidays library. It appears the observed holidays are only present in the dataset. For example, the test date of 2010-1-1 results in a NaN because there is no holiday named "Boxing Day" in the data set for 2009 and 2010 since they fall on Saturday and Sunday respectively. Instead, the dataset contains "Boxing Day (Observed)".
I think this is confusing behavior in my opinion.
cc: @gsheni
|
|
||
| distance_list = distance_to_holiday(dates).tolist() | ||
|
|
||
| assert distance_list[0] == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion, but can we create a list of expected values and assert the same way we do in test_nat and test_valid_country instead of checking each value individually?