Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DateToHoliday transform primitive #1848

Merged
merged 15 commits into from
Jan 19, 2022
Merged

Add DateToHoliday transform primitive #1848

merged 15 commits into from
Jan 19, 2022

Conversation

dvreed77
Copy link
Contributor

@dvreed77 dvreed77 commented Jan 18, 2022

@dvreed77 dvreed77 self-assigned this Jan 18, 2022
@CLAassistant
Copy link

CLAassistant commented Jan 18, 2022

CLA assistant check
All committers have signed the CLA.

@gsheni gsheni requested a review from a team January 18, 2022 21:53
@gsheni gsheni changed the title 1846 date to holiday Add DateToHoliday transform primitive Jan 18, 2022
@@ -14,3 +14,4 @@ pydata_sphinx_theme==0.7.1
nlp_primitives>=2.0.0
autonormalize >= 1.0.2
-r koalas-requirements.txt
holidays>=0.12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are going to want this to be in requirements.txt as most users wouldn't install the requirements in dev-requirements.txt - those are primarily for developers contributing to the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, of course. It should be in both place tho, correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dvreed77 This requirement only needs to be in one place. The typical path for installing requirements for development would be to run make installdeps. After upgrading pip, that make command then installs featuretools in editable mode, and that process will grab everything from requirements.txt. Finally, the requirements from dev-requirements.txt are installed by the make command, which also includes references to the test-requirements.txt and koalas-requirements.txt files, so essentially make installdeps installs everything found in all of our requirements files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thehomebrewnerd should I update contributing.md to run make installdeps? Guide currently has this:

  virtualenv venv
  source venv/bin/activate
  python -m pip install -e .
  python -m pip install -r dev-requirements.txt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we should replace the python* commands with make installdeps. The only real difference is that make installdeps will also make sure pip is upgraded first, which should be a good thing to have first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thehomebrewnerd I added holiday package back to dev-requirements because it looks like github actions uses this for install before running unit tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this should be removed from dev requirements and should be covered by the other requirements installs.

@@ -9,6 +9,7 @@ Future Release
* Fixes
* Changes
* Add autonormalize as an add-on library (:pr:`1840`)
* Add Date to Holiday Transform Primitive (:pr: `1846`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor detail here, but I would stylize this as DateToHoliday - use the class name without spaces. See line 24 below for an example. Also be sure to add your username to the list of contributors to the release on line 19.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@codecov
Copy link

codecov bot commented Jan 19, 2022

Codecov Report

Merging #1848 (a3ab9cf) into main (d2bb3b4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1848   +/-   ##
=======================================
  Coverage   98.79%   98.79%           
=======================================
  Files         142      143    +1     
  Lines       15850    15916   +66     
=======================================
+ Hits        15659    15725   +66     
  Misses        191      191           
Impacted Files Coverage Δ
...retools/primitives/standard/transform_primitive.py 100.00% <100.00%> (ø)
...s/primitive_tests/test_datetoholiday_primitives.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d2bb3b4...a3ab9cf. Read the comment docs.

@@ -11,3 +11,4 @@ boto3==1.17.46
composeml==0.3.0
urllib3==1.26.5
pyarrow==3.0.0
holidays>=0.12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...I don't think you should need this here. These should just be packages that are needed for running the tests, but not already included in the other requirements files .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CleanShot 2022-01-19 at 10 45 15

This test was failing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last line does a pip instal from minimum_test_requirements.txt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the holidays install for these tests should get handled a little lower in this file. It will try to install requirements from minimum_core_requirements.txt or minimum_koalas_requirements.txt depending on the version of the test being run, so as long as you have holidays included in those two files I think you should be good.

image

Copy link
Contributor

@thehomebrewnerd thehomebrewnerd Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other point of clarification here, in the minimum_test*.txt files, we will want to pin the version to a specific number to make sure we always get that lowest version installed, instead of using >=version syntax which would allow newer versions to be installed.

Copy link
Contributor

@thehomebrewnerd thehomebrewnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple small things yet to clean up, but overall looking good.

@@ -9,6 +9,7 @@ Future Release
* Fixes
* Changes
* Add autonormalize as an add-on library (:pr:`1840`)
* Add DateToHoliday Transform Primitive (:pr:`1848`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add your username to the list of contributors on line 19.

@@ -14,3 +14,4 @@ pydata_sphinx_theme==0.7.1
nlp_primitives>=2.0.0
autonormalize >= 1.0.2
-r koalas-requirements.txt
holidays>=0.12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this should be removed from dev requirements and should be covered by the other requirements installs.

assert holiday_series[0] == "New Year's Day"
assert np.isnan(holiday_series[1])
assert holiday_series[2] == "Memorial Day"
assert holiday_series[3] == "Independence Day"
Copy link
Contributor

@gsheni gsheni Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add some additional tests?

    def test_nat(self):
        primitive_instance = self.primitive()
        primitive_func = primitive_instance.get_function()
        case = pd.Series([
            '2019-10-14',
            'NaT',
            '2016-02-15',
            'NaT',
        ]).astype('datetime64')
        answer = ['Columbus Day', np.nan, "Washington's Birthday", np.nan]
        given_answer = primitive_func(case).astype('str')
        np.testing.assert_array_equal(given_answer, answer)

    def test_valid_country(self):
        primitive_instance = self.primitive(country='Canada')
        primitive_func = primitive_instance.get_function()
        case = pd.Series([
            '2016-07-01',
            '2016-11-11',
            '2017-12-26',
            '2018-09-03',
        ]).astype('datetime64')
        answer = ['Canada Day', np.nan, 'Boxing Day', 'Labour Day']
        given_answer = primitive_func(case).astype('str')
        np.testing.assert_array_equal(given_answer, answer)

    def test_multiple_countries(self):
        primitive_mexico = self.primitive(country='Mexico')
        primitive_func = primitive_mexico.get_function()
        case = pd.Series([datetime(2000, 9, 16),
                          datetime(2005, 1, 1)])
        assert len(primitive_func(case)) > 1
        primitive_india = self.primitive(country='IND')
        primitive_func = primitive_mexico.get_function()
        case = pd.Series([datetime(2048, 1, 1),
                          datetime(2048, 10, 2)])
        primitive_func = primitive_india.get_function()
        assert len(primitive_func(case)) > 1
        primitive_uk = self.primitive(country='UK')
        primitive_func = primitive_uk.get_function()
        case = pd.Series([datetime(2048, 3, 17),
                          datetime(2048, 4, 6)])
        assert len(primitive_func(case)) > 1
        countries = ['Argentina', 'AU', 'Austria', 'BY', 'Belgium',
                     'Brazil', 'Canada', 'Colombia', 'Croatia',
                     'England', 'Finland', 'FRA', 'Germany',
                     'Germany', 'Italy', 'NewZealand', 'PortugalExt',
                     'PTE', 'Spain', 'ES', 'Switzerland', 'UnitedStates',
                     'US', 'UK', 'UA', 'CH', 'SE', 'ZA']
        for x in countries:
            self.primitive(country=x)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thank you

Copy link
Contributor

@thehomebrewnerd thehomebrewnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dvreed77 dvreed77 merged commit 22bc381 into main Jan 19, 2022
@dvreed77 dvreed77 deleted the 1846-date-to-holiday branch January 19, 2022 21:10
@dvreed77 dvreed77 mentioned this pull request Feb 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add DateToHoliday primitive
5 participants