Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Detection of %z in strptime format fails in certain cases, not returning timestamp with tz=UTC #35448

Closed
jorisvandenbossche opened this issue May 5, 2023 · 0 comments · Fixed by #35449
Assignees
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented May 5, 2023

Describe the bug, including details regarding any error messages, version, and platform.

#11358 refactored the timezone offset handling in string parsing (both in read_csv, casting str->timestamp and strptime).

The general rule was that if there is a %z in the format string you provide to strptime, we apply the offset and return a timestamp type with tz=UTC.
But this seems to work only in certain cases:

# space before %z -> works
>>> pc.strptime(["5/1/2020 +0100", None, "12/11/1900 -0130"], format="%m/%d/%Y %z", unit='us').type
TimestampType(timestamp[us, tz=UTC])
# no space before %z -> doesn't work
>>> pc.strptime(["5/1/2020+0100", None, "12/11/1900-0130"], format="%m/%d/%Y%z", unit='us').type
TimestampType(timestamp[us])

Component(s)

C++

jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue May 5, 2023
pitrou pushed a commit that referenced this issue May 16, 2023
### Rationale for this change

See gh-35448 for the failing example. The current code in `GetZone` was assuming there was always some character between the `%z` and the preceding `%` code (like a whitespace, or `-` or `/`). That is often not the case with `%z` (in time formats like `00:00+01`, the `+` is part of `%z`, and so the format is `%H:%M%z` without character between `%M` and `%z`)
 
### Are these changes tested?

Test is added

### Are there any user-facing changes?

The result type will no now correctly have a `tz=UTC` parametrization
* Closes: #35448

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 13.0.0 milestone May 16, 2023
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…35449)

### Rationale for this change

See apachegh-35448 for the failing example. The current code in `GetZone` was assuming there was always some character between the `%z` and the preceding `%` code (like a whitespace, or `-` or `/`). That is often not the case with `%z` (in time formats like `00:00+01`, the `+` is part of `%z`, and so the format is `%H:%M%z` without character between `%M` and `%z`)
 
### Are these changes tested?

Test is added

### Are there any user-facing changes?

The result type will no now correctly have a `tz=UTC` parametrization
* Closes: apache#35448

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants