Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Snowflake: Write extracted_at in UTC #35308

Merged
merged 33 commits into from Mar 6, 2024

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Feb 14, 2024

Hopefully most changes are reasonably self-explanatory, but a few callouts:

  • SnowflakeDestinationHandler now upcases the database name in the constructor, before assigning it to the field. This is because in the rest of the connector, we never enquote the database name - so when we query information_schema, we need to always upcase it.
  • There's a new test case that runs a sync with destination-snowflake:3.5.11, i.e. a version that still wrote extracted_at in the user local timezone instead of UTC. I added some ltz_*_expectedrecords_*.jsonl files to verify that that sync behaved as expected - not strictly necessary, but it's nice to have a baseline assertion that the known-bad sync behaved in the way that we expect.
  • All the other jsonl changes are exactly what you'd expect, i.e. switching extracted_at from -08:00 to Z.

We are not migrating existing timestamps in raw tables or final tables, instead we are relying on timestamp arithmetic to always force normalize it to UTC as below. Also as part of this PR, we are removing the optimization to use _airbyte_extracted_at > [a very old value] because of using the following conversion, that query became slow.

TIMESTAMPADD(
  HOUR,
  EXTRACT(TIMEZONE_HOUR from "_airbyte_extracted_at"),
  TIMESTAMPADD(
    MINUTE,
    EXTRACT(TIMEZONE_MINUTE from "_airbyte_extracted_at"),
    CONVERT_TIMEZONE('UTC', "_airbyte_extracted_at")
  ))

Copy link

vercel bot commented Feb 14, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Mar 6, 2024 3:40pm

Copy link
Contributor

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@edgao edgao changed the base branch from edgao/destination_state_table to edgao/staging_csv_timestamp_timezone February 14, 2024 23:40
@edgao edgao force-pushed the edgao/snowflake_timezone_migration branch from 62a8eb4 to a05927a Compare February 14, 2024 23:40
@edgao edgao force-pushed the edgao/staging_csv_timestamp_timezone branch from 350f326 to 9b0bca9 Compare February 15, 2024 00:14
@edgao edgao force-pushed the edgao/snowflake_timezone_migration branch from a05927a to 47d114e Compare February 15, 2024 00:14
@edgao edgao force-pushed the edgao/snowflake_timezone_migration branch from 346f804 to cc4cd5f Compare February 15, 2024 15:29
@edgao edgao force-pushed the edgao/staging_csv_timestamp_timezone branch from 3829bea to bbe9cde Compare February 21, 2024 00:28
@edgao edgao force-pushed the edgao/snowflake_timezone_migration branch 2 times, most recently from 801bc47 to cffd162 Compare February 21, 2024 16:11
@edgao edgao force-pushed the edgao/staging_csv_timestamp_timezone branch from bbe9cde to 9f7d650 Compare February 21, 2024 16:11
@edgao edgao force-pushed the edgao/snowflake_timezone_migration branch from cffd162 to 391988a Compare February 21, 2024 20:34
@octavia-squidington-iii octavia-squidington-iii added area/documentation Improvements or additions to documentation area/octavia-cli CDK Connector Development Kit labels Feb 21, 2024
@edgao edgao force-pushed the edgao/staging_csv_timestamp_timezone branch from 9f7d650 to 863ef4a Compare February 21, 2024 20:34
@gisripa gisripa force-pushed the edgao/snowflake_timezone_migration branch from d5c2bce to 61b2d99 Compare March 6, 2024 02:43
@gisripa gisripa merged commit da79f6e into master Mar 6, 2024
29 checks passed
@gisripa gisripa deleted the edgao/snowflake_timezone_migration branch March 6, 2024 18:00
xiaohansong pushed a commit that referenced this pull request Mar 7, 2024
Signed-off-by: Gireesh Sreepathi <gisripa@gmail.com>
Co-authored-by: Gireesh Sreepathi <gisripa@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation area/octavia-cli connectors/destination/snowflake
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants