@ahmedabu98
Contributor

@ahmedabu98 ahmedabu98 commented Feb 14, 2023

Allow timestamp fields with a UTC suffix, e.g. "2023-02-14 12:00:00.123 UTC".

Also, switch to a single date-time formatter.

Contributor

We should not catch an exception to decide whether to fall back to an alternate date-time formatter.

Contributor Author

That was already the existing implementation. I will change it to parse with a single date-time formatter.
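
A minimal sketch of the single-formatter idea discussed above, assuming the `yyyy-MM-dd HH:mm:ss[.fraction]` shape of the example timestamp. The class, constant, and method names are hypothetical, and this is not the code from this PR. It only illustrates how an optional section in `DateTimeFormatterBuilder` lets one formatter accept the value with or without the ` UTC` suffix, so no exception has to be caught to select a second formatter.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

final class SingleFormatterSketch {
  // Hypothetical constant: date, a space, time, then an optional " UTC" suffix.
  static final DateTimeFormatter TIMESTAMP_FORMATTER =
      new DateTimeFormatterBuilder()
          .append(DateTimeFormatter.ISO_LOCAL_DATE)
          .appendLiteral(' ')
          .append(DateTimeFormatter.ISO_LOCAL_TIME)
          .optionalStart() // everything until optionalEnd() may be absent
          .appendLiteral(" UTC")
          .optionalEnd()
          .toFormatter();

  // Parses both "2023-02-14 12:00:00.123" and "2023-02-14 12:00:00.123 UTC".
  static LocalDateTime parse(String value) {
    return LocalDateTime.parse(value, TIMESTAMP_FORMATTER);
  }

  public static void main(String[] args) {
    // Both calls yield the same LocalDateTime; the suffix is simply skipped.
    System.out.println(parse("2023-02-14 12:00:00.123 UTC"));
    System.out.println(parse("2023-02-14 12:00:00.123"));
  }
}
```

Malformed input still raises `DateTimeParseException`, but that exception now signals a bad value rather than steering control flow between formatters.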

@ahmedabu98 ahmedabu98 changed the title from "Handle Timestamp fields with UTC in Storage writes" to "One formatter for Timestamp fields in Storage writes" Feb 14, 2023
@ahmedabu98
Contributor Author

R: @johnjcasey PTAL

.appendLiteral(' ')
.append(DateTimeFormatter.ISO_LOCAL_TIME)
.optionalStart()
.appendLiteral(" UTC")
Contributor

@an2x an2x Feb 15, 2023

I think we should support any time zone here, not just UTC.
Even better, support all formats listed in BigQuery documentation, i.e.:

  • 2014-09-27T12:30:00.45Z
  • 2014-09-27 12:30:00.45-8:00
  • 2014-09-27 12:30:00.123456 UTC
  • 2014-09-27 12:30:00.123456 America/Los_Angeles

Note that in all four formats BQ accepts either a space or "T" as the date/time separator. As for the separator between the datetime and the time zone: for the "Z" and "-8:00" formats there must be no space before the time zone, and for the "UTC" and "America/Los_Angeles" formats there must be a space before it.
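
One way the four formats listed above could be covered by a single formatter is sketched here. This is an illustration under stated assumptions, not the code merged in this PR: the class and method names are hypothetical, the `"+H:mm"` offset pattern requires Java 9 or later, and defaulting to UTC when no zone is present is an assumption of the sketch.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;

final class BigQueryTimestampSketch {
  static final DateTimeFormatter FORMATTER =
      new DateTimeFormatterBuilder()
          .append(DateTimeFormatter.ISO_LOCAL_DATE)
          // Date/time separator: 'T' or ' ' (kept deliberately loose here).
          .optionalStart().appendLiteral('T').optionalEnd()
          .optionalStart().appendLiteral(' ').optionalEnd()
          .append(DateTimeFormatter.ISO_LOCAL_TIME)
          // "Z" or a numeric offset such as "-8:00", with no space before it.
          .optionalStart().appendOffset("+H:mm", "Z").optionalEnd()
          // " UTC" or " America/Los_Angeles", with a space before it.
          .optionalStart().appendLiteral(' ').appendZoneRegionId().optionalEnd()
          .toFormatter();

  static Instant toInstant(String value) {
    TemporalAccessor parsed = FORMATTER.parse(value);
    // zone() returns the parsed region if present, otherwise the parsed offset.
    ZoneId zone = parsed.query(TemporalQueries.zone());
    if (zone == null) {
      zone = ZoneOffset.UTC; // assumption: values without a zone are UTC
    }
    return LocalDateTime.from(parsed).atZone(zone).toInstant();
  }

  public static void main(String[] args) {
    System.out.println(toInstant("2014-09-27T12:30:00.45Z"));
    System.out.println(toInstant("2014-09-27 12:30:00.45-8:00"));
    System.out.println(toInstant("2014-09-27 12:30:00.123456 UTC"));
    System.out.println(toInstant("2014-09-27 12:30:00.123456 America/Los_Angeles"));
  }
}
```

The sketch does not enforce every rule described above. For instance, the separator handling accepts more than exactly one of `T` or space, and the zone parser after the space would also accept an offset-style ID.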

Comment on lines +84 to +89
.optionalStart()
.appendLiteral(' ')
.optionalEnd()
.optionalStart()
.appendLiteral('T')
.optionalEnd()
Contributor Author

FYI, this allows timestamp fields in TableRows to look like any of these:

  1. 2000-02-13T12:30:00
  2. 2000-02-13 12:30:00
  3. 2000-02-13 T12:30:00
  4. 2000-02-1312:30:00

All of these would be handled the same way, so I don't think there is much concern. If it is an issue, however, I can split up the DATETIME_SPACE_FORMATTER so that only 1) and 2) are valid.
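
The behaviour described above can be reproduced with a small sketch. The `LAX` formatter mirrors the optional-space and optional-`T` structure from the commented lines. `STRICT` is one possible way to accept only forms 1) and 2), by tying each separator to its own optional time section. Both constant names and the `accepts` helper are hypothetical, and neither formatter is claimed to match the code in this PR.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;

final class SeparatorSketch {
  // Space and 'T' are independently optional, so both, either, or neither may appear.
  static final DateTimeFormatter LAX =
      new DateTimeFormatterBuilder()
          .append(DateTimeFormatter.ISO_LOCAL_DATE)
          .optionalStart().appendLiteral(' ').optionalEnd()
          .optionalStart().appendLiteral('T').optionalEnd()
          .append(DateTimeFormatter.ISO_LOCAL_TIME)
          .toFormatter();

  // Each separator is bound to its own time section: "T<time>" or " <time>".
  static final DateTimeFormatter STRICT =
      new DateTimeFormatterBuilder()
          .append(DateTimeFormatter.ISO_LOCAL_DATE)
          .optionalStart().appendLiteral('T').append(DateTimeFormatter.ISO_LOCAL_TIME).optionalEnd()
          .optionalStart().appendLiteral(' ').append(DateTimeFormatter.ISO_LOCAL_TIME).optionalEnd()
          .toFormatter();

  // Test helper only: reports whether the formatter yields a full LocalDateTime.
  static boolean accepts(DateTimeFormatter formatter, String value) {
    try {
      LocalDateTime.parse(value, formatter);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] samples = {
      "2000-02-13T12:30:00", "2000-02-13 12:30:00", "2000-02-13 T12:30:00", "2000-02-1312:30:00"
    };
    for (String s : samples) {
      System.out.println(s + " lax=" + accepts(LAX, s) + " strict=" + accepts(STRICT, s));
    }
  }
}
```

The strict variant is still not airtight: because both time sections are optional, a value that repeats the same time with both separators would also parse. A date with no time is rejected only because it cannot be converted to a `LocalDateTime`.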

@ahmedabu98
Contributor Author

ahmedabu98 commented Feb 15, 2023

@johnjcasey, @an2x ready for review

@johnjcasey
Contributor

LGTM

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`.

@github-actions
Contributor

Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @apilloud for label java.
R: @johnjcasey for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@apilloud
Member

R: @johnjcasey @an2x

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@johnjcasey johnjcasey merged commit 8ec5673 into apache:master Feb 27, 2023
ruslan-ikhsan pushed a commit to akvelon/beam that referenced this pull request Mar 10, 2023
* handle timestamps with one formatter; handle UTC suffix

* timestamp formatter that handles zone region suffix