Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-12994: [R] stringr tests fails on non-UTC machines due to strptime defaulting to local timezone and Arrow defaulting to UTC [WIP] #10495

Closed
wants to merge 2 commits into from

Conversation

thisisnic
Copy link
Member

No description provided.

@thisisnic thisisnic changed the title Add default timezone to strptime ARROW-12994: [R] stringr tests fails on non-UTC machines due to strptime defaulting to local timezone and Arrow defaulting to UTC Jun 9, 2021
@thisisnic thisisnic changed the title ARROW-12994: [R] stringr tests fails on non-UTC machines due to strptime defaulting to local timezone and Arrow defaulting to UTC ARROW-12994: [R] stringr tests fails on non-UTC machines due to strptime defaulting to local timezone and Arrow defaulting to UTC Jun 9, 2021
@github-actions
Copy link

github-actions bot commented Jun 9, 2021

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@github-actions
Copy link

github-actions bot commented Jun 9, 2021

@nealrichardson nealrichardson self-requested a review June 11, 2021 15:59
@nealrichardson
Copy link
Contributor

I'm wary of hardcoding UTC in our tests because it opens up all kinds of ways where we might do the wrong thing if you're not on UTC. We've had bug reports about this in the past, and the problem literally was that we were hardcoding UTC in the data translation layer itself.

Here are some tests I added at that time to test all of the combinations: timezone-naive, UTC, timezone specified but definitely not your timezone (hence Pyonyang): https://github.com/apache/arrow/blob/master/r/tests/testthat/test-Array.R#L253-L292

I'm not sure that we can do the exact same thing here but maybe we can. One of the bugs I recall seeing (aside from the UTC hardcoding) was that R and Arrow appeared to be inconsistent with timezones, but the inconsistency was entirely in the print method: R localizes timezone-naive POSIXt data when it prints it and Arrow just prints as-is, which is only the same as R if you're in UTC.

If there are limitations or incompatibilities in what Arrow's strptime function does, let's be sure to document them and/or link to JIRAs for addressing the issues.

@thisisnic
Copy link
Member Author

This ticket has been opened about the C++ function: https://issues.apache.org/jira/browse/ARROW-12820

Do I need to do anything else here @nealrichardson or should I just close this PR?

@jorisvandenbossche
Copy link
Member

[@nealrichardson] I'm wary of hardcoding UTC in our tests because it opens up all kinds of ways where we might do the wrong thing if you're not on UTC.

Isn't the issue here that R's strptime assumes your local machine timezone, while Arrow's version doesn't do this. So some way or another, the test will need to take into account this difference in behaviour (unless we want to adapt the R bindings for strptime to use the machine timezone?).

Here are some tests I added at that time to test all of the combinations

That are general tests about timezone handling (eg when roundtripping between R <-> Arrow), I think? While here it is a specific test about the behaviour of the strptime kernel.

This ticket has been opened about the C++ function: https://issues.apache.org/jira/browse/ARROW-12820

That ticket is about not ignoring the timezone if it is present in the string. The current test you are modifying here doesn't have that, so I think this is unrelated to ARROW-12820

@thisisnic
Copy link
Member Author

I guess if the aim here is to write R bindings which mimic the expected behaviour of the R function itself, the best solution would be to adapt the R binding to use the machine timezone.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jul 6, 2021

Just chatted with Nic about this, and if we want to follow the R strptime behaviour of using the locale's timezone, then that depends on ARROW-13033 (has a WIP PR, and which adds a kernel to convert from timestamp without timezone to with timezone, so we can change to a different timezone preserving the clock time)

@thisisnic thisisnic changed the title ARROW-12994: [R] stringr tests fails on non-UTC machines due to strptime defaulting to local timezone and Arrow defaulting to UTC ARROW-12994: [R] stringr tests fails on non-UTC machines due to strptime defaulting to local timezone and Arrow defaulting to UTC [WIP] Jul 12, 2021
@thisisnic thisisnic marked this pull request as draft July 12, 2021 08:34
@nealrichardson
Copy link
Contributor

Closing in favor of #10706

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants