Update Datetime format inference #1666

Merged
simha104 merged 25 commits into main from add_timezone_to_format on Apr 11, 2023


simha104
Contributor

@simha104 simha104 commented Apr 4, 2023


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request.

@codecov

codecov bot commented Apr 4, 2023

Codecov Report

Merging #1666 (168745a) into main (34db526) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1666   +/-   ##
=======================================
  Coverage   98.79%   98.79%           
=======================================
  Files          98       98           
  Lines       11786    11800   +14     
=======================================
+ Hits        11644    11658   +14     
  Misses        142      142           
Impacted Files                                        Coverage Δ
woodwork/tests/conftest.py                            100.00% <ø> (ø)
woodwork/tests/fixtures/datetime_freq.py              100.00% <100.00%> (ø)
woodwork/tests/logical_types/test_logical_types.py    100.00% <100.00%> (ø)
woodwork/tests/utils/test_utils.py                    100.00% <100.00%> (ø)
woodwork/utils.py                                     100.00% <100.00%> (ø)


"%d/%m/%y",
"%y/%d/%m",
"%d/%m/%y %H:%M:%S",
"%y/%d/%m %H:%M:%S",
"%y/%d/%m %H:%M:%S%z",
Collaborator

This doesn't support timezones for all formats with two-digit years; we'd want the timezone appended to the end of every two-digit-year format. I'd recommend adding it as a secondary check if two-digit-year inference fails.
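
A minimal sketch of the fallback order suggested above, assuming plain two-digit-year timestamp formats are tried first and their timezone-suffixed variants only afterwards. The format list and the helper names (`matches`, `infer_two_digit_year_format`) are hypothetical illustrations, not woodwork's actual implementation:

```python
from datetime import datetime

# Hypothetical two-digit-year date formats, for illustration only.
TWO_DIGIT_YEAR_FORMATS = ["%m/%d/%y", "%d/%m/%y", "%y/%m/%d", "%y/%d/%m"]


def matches(value, fmt):
    """Return True if the string parses completely under fmt."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def infer_two_digit_year_format(values):
    """Try plain timestamp formats first, then timezone variants as a fallback."""
    plain = [f + " %H:%M:%S" for f in TWO_DIGIT_YEAR_FORMATS]
    with_tz = [f + "%z" for f in plain]  # secondary check: same formats plus %z
    for fmt in plain + with_tz:
        if all(matches(v, fmt) for v in values):
            return fmt
    return None  # no candidate format fits every value
```

Because the timezone variants are derived from the same base list, every two-digit-year format gets a `%z` counterpart rather than only a hand-picked few.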

.astype("datetime64[ns]")
)
df = pd.concat([head, pd.Series(dates), tail]).reset_index(drop=True)
df = pd.to_datetime(df, utc=True)
Collaborator

Is utc=True valid for the non-timezone use cases? I think this converts all the datetimes to include timezones, right? The tests are all passing, so I think it's alright, but I just wanted to check.

Contributor

We ultimately convert everything to UTC anyway, right? If we got something in a different timezone, would this just convert it to UTC?

Contributor Author

Yeah, I think it would do that.
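
A small sketch of the behavior discussed in this exchange, assuming pandas is installed. With `utc=True`, `pd.to_datetime` localizes timezone-naive inputs as UTC and converts timezone-aware inputs to UTC. The sample values are made up for illustration:

```python
import pandas as pd

# Timezone-naive strings: localized as UTC, wall-clock time unchanged.
naive = pd.Series(["2023-04-05 12:00:00", "2023-04-06 12:00:00"])
# Timezone-aware strings at UTC-05:00: shifted forward five hours to UTC.
aware = pd.Series(["2023-04-05 12:00:00-05:00", "2023-04-06 12:00:00-05:00"])

naive_utc = pd.to_datetime(naive, utc=True)
aware_utc = pd.to_datetime(aware, utc=True)

print(naive_utc.dt.tz, naive_utc[0])
print(aware_utc.dt.tz, aware_utc[0])
```

In both cases the result is timezone-aware, which is why the non-timezone test cases also end up carrying a UTC timezone.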

"%y/%m/%d",
"%m/%d/%y %H:%M:%S",
Contributor

How come these were removed?

Contributor Author

@simha104 simha104 Apr 5, 2023

It's because I now add the timestamp formats below in a different way, which makes sure every format is paired with a timezone rather than only a few.

This is the relevant code:

time_stamp_formats = []
for format_ in datetime_only_formats:
    time_stamp_formats.append(format_ + " %H:%M:%S")
time_stamp_formats_with_timezone = []
for format_ in datetime_only_formats:
    time_stamp_formats_with_timezone.append(format_ + " %H:%M:%S%z")
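
As a rough check of the approach in the snippet above, the sketch below builds the same two lists from a hypothetical subset of `datetime_only_formats` (the actual list is not shown in this thread) and round-trips one timezone-aware value through every generated format:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical subset of date-only formats, for illustration only.
datetime_only_formats = ["%m/%d/%y", "%d/%m/%y", "%y/%m/%d", "%y/%d/%m"]

time_stamp_formats = [f + " %H:%M:%S" for f in datetime_only_formats]
time_stamp_formats_with_timezone = [
    f + " %H:%M:%S%z" for f in datetime_only_formats
]

# Format a timezone-aware value with each format, then parse it back.
sample = datetime(2023, 4, 5, 13, 45, 30, tzinfo=timezone(timedelta(hours=-5)))
round_trips = {
    fmt: datetime.strptime(sample.strftime(fmt), fmt) == sample
    for fmt in time_stamp_formats_with_timezone
}
print(round_trips)
```

Deriving both lists from one base list keeps them in step: adding a date-only format automatically adds its timestamp and timezone variants.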

@Cmancuso
Contributor

Cmancuso commented Apr 5, 2023

A couple of questions, and this needs a rebase, but otherwise it's looking good.

@ParthivNaresh ParthivNaresh self-requested a review April 10, 2023 20:46
Collaborator

@ParthivNaresh ParthivNaresh left a comment

Looks good!

@simha104 simha104 merged commit a3e5459 into main Apr 11, 2023
31 checks passed
@simha104 simha104 deleted the add_timezone_to_format branch April 11, 2023 02:22
@jeff-hernandez jeff-hernandez mentioned this pull request Apr 12, 2023
Development

Successfully merging this pull request may close these issues.

Datetime Logical Type isn't able to infer dates with a format including 2 digit years and timezone
4 participants