fix postgres data handling from WAL logs in CDC mode #15481

Merged
merged 7 commits into master from fix-postgres-data-handling-in-transaction-logs, Aug 10, 2022

subodh1810
Contributor

Issue : #14628

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@subodh1810 subodh1810 self-assigned this Aug 9, 2022
@subodh1810 subodh1810 requested a review from a team as a code owner August 9, 2022 20:13
@github-actions github-actions bot added the area/connectors Connector related issues label Aug 9, 2022
@subodh1810
Contributor Author

subodh1810 commented Aug 9, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2828036497
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2828036497
No Python unittests run

Build Passed

Test summary info:

All Passed

@subodh1810 subodh1810 temporarily deployed to more-secrets August 9, 2022 20:14 Inactive
Contributor

@tuliren tuliren left a comment

@subodh1810, in the PR description, it says that this PR relates to #14628. However, it's only related to Postgres CDC WAL, not that timezone issue.

@subodh1810 subodh1810 temporarily deployed to more-secrets August 9, 2022 20:30 Inactive
@subodh1810
Contributor Author

subodh1810 commented Aug 9, 2022

@tuliren it does relate to the issue, because a few temporal types (timestamp with time zone, time with time zone, etc.) were not generating the right values, and a few other data types were also causing trouble.

@edgao edgao mentioned this pull request Aug 9, 2022
} else if (date instanceof LocalDate d) {
// Incremental mode
if (isBce(d)) {
d = d.minusYears(1);
Contributor

#15485 to make this better also

.fullSourceDataType(fullSourceType)
.airbyteType(JsonSchemaType.STRING_TIME_WITH_TIMEZONE)
.addInsertValues("null", "'13:00:01'", "'13:00:00+8'", "'13:00:03-8'", "'13:00:04Z'", "'13:00:05.012345Z+8'", "'13:00:06.00000Z-8'")
// A time value without time zone will use the time zone set on the database, which is Z-7,
Contributor

I think this comment is inaccurate? (since everything is returned in UTC)

Contributor Author

I think it's relevant because it helps us understand why 13:00:01 is converted to 20:00:01.000000Z. Since 13:00:01 doesn't have a time zone, we use the DB's time zone, which is -7, so the value becomes 13:00:01-07, which in UTC is 20:00:01.000000Z.
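
The conversion described above can be checked with a small standalone sketch. This is not the connector's code: the class name `TimetzToUtc`, the helper `toUtcString`, and the formatter pattern are all made up for illustration, and the only assumptions taken from the thread are the database offset of -7 and the expected output shape `20:00:01.000000Z`.

```java
import java.time.LocalTime;
import java.time.OffsetTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimetzToUtc {

  // Hypothetical output format: six fractional digits followed by a literal 'Z'.
  private static final DateTimeFormatter UTC_TIME_FORMAT =
      DateTimeFormatter.ofPattern("HH:mm:ss.SSSSSS'Z'");

  // Attach the database offset to a zone-less time, then shift it to UTC.
  static String toUtcString(final LocalTime time, final ZoneOffset dbOffset) {
    final OffsetTime atDbOffset = OffsetTime.of(time, dbOffset);
    final OffsetTime atUtc = atDbOffset.withOffsetSameInstant(ZoneOffset.UTC);
    return atUtc.format(UTC_TIME_FORMAT);
  }

  public static void main(final String[] args) {
    // 13:00:01 with no zone, database offset assumed to be -07:00.
    System.out.println(toUtcString(LocalTime.parse("13:00:01"), ZoneOffset.ofHours(-7)));
    // prints 20:00:01.000000Z
  }
}
```

The same helper applied to a value that already carries an offset, such as 13:00:00+8, would take that offset instead of the database one, which is why only the zone-less test value depends on the database setting.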

@subodh1810 subodh1810 temporarily deployed to more-secrets August 10, 2022 10:14 Inactive
@subodh1810
Contributor Author

subodh1810 commented Aug 10, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2831862042
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2831862042
No Python unittests run

Build Passed

Test summary info:

All Passed

@subodh1810 subodh1810 temporarily deployed to more-secrets August 10, 2022 10:22 Inactive
@subodh1810 subodh1810 temporarily deployed to more-secrets August 10, 2022 10:34 Inactive
@subodh1810
Contributor Author

/test connector=connectors/source-postgres

@subodh1810 subodh1810 temporarily deployed to more-secrets August 10, 2022 14:32 Inactive
@subodh1810
Contributor Author

subodh1810 commented Aug 10, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2833698151
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2833698151
No Python unittests run

Build Passed

Test summary info:

All Passed

@subodh1810 subodh1810 temporarily deployed to more-secrets August 10, 2022 15:43 Inactive
@subodh1810 subodh1810 merged commit 0092712 into master Aug 10, 2022
@subodh1810 subodh1810 deleted the fix-postgres-data-handling-in-transaction-logs branch August 10, 2022 16:18
@tuliren
Contributor

tuliren commented Aug 10, 2022

This PR is published in #15496.

pmossman added a commit that referenced this pull request Aug 10, 2022
commit 10fb1dc137175d09826cdfcf419698e3000cd418
Author: pmossman <parker@airbyte.io>
Date:   Wed Aug 10 13:37:00 2022 -0700

    format and pmd

commit 7c223ec2e0abfec864c11395d7310d679706a49c
Author: pmossman <parker@airbyte.io>
Date:   Wed Aug 10 13:08:09 2022 -0700

    update peristStateActivity test

commit 763e9e2c5ca5f998ab49ffd0dcafb7ae81201b2b
Author: pmossman <parker@airbyte.io>
Date:   Fri Aug 5 15:24:10 2022 -0700

    format

commit c176e63c3841a1f08c7a43359d293b12297e03e4
Author: pmossman <parker@airbyte.io>
Date:   Fri Aug 5 15:18:03 2022 -0700

    move converters to module that worker can access, convert statePersistence calls to API calls, convert statePersistence helper to local private method

commit 1b583487b4ea7dd058944cdbce4de6197f967523
Author: pmossman <parker@airbyte.io>
Date:   Fri Aug 5 10:37:00 2022 -0700

    add createOrUpdateState API endpoint

commit d87eed6215ce451a3e126d433991967317839876
Author: pmossman <parker@airbyte.io>
Date:   Fri Aug 5 13:42:16 2022 -0700

    add AirbyteApiClient to WorkerApp for data plane workers to use

commit a65524a
Author: Teal Larson <LARSON.TEAL@GMAIL.COM>
Date:   Wed Aug 10 16:03:59 2022 -0400

    🪟 🔧 Add testing and storybook component for CatalogDiffModal (#15426)

    * wip diff modal test setup

    * starting storybook add

    * storybook working now

    * cleanup

    * aria labels

    * test syncmode string

commit 2f17e99
Author: Liren Tu <tuliren.git@outlook.com>
Date:   Wed Aug 10 13:02:01 2022 -0700

    🐞 Postgres source: fix bug in intermediate state emission (#15496)

    * Rename record counter

    * Rename method

    * Emit intermediate state after all cursor records

    * Emit intermediate state only when it is ready

    * Merge two checks

    * Add a testing message

    * Fix unit tests

    * Add one more testing record and add comments

    * Add test case for multiple records with the same cursor value

    * Revert irrelevant change

    * Add explanation in javadoc

    * Format code

    * Rename testing methods

    * Fix comment

    * Bump version

    * auto-bump connector version [ci skip]

    Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

commit f540499
Author: Alexandre Girard <alexandre@airbyte.io>
Date:   Wed Aug 10 11:37:07 2022 -0700

    [low-code connectors]: Assert there are no custom top-level fields (#15489)

    * move components to definitions field

    * Also update the references

    * validate the top level fields and add version

    * raise exception on unknown fields

    * newline

    * unit tests

    * set version to 0.1.0

    * newline

commit f52bfb6
Author: Xiaohan Song <xiaohan@airbyte.io>
Date:   Wed Aug 10 11:16:17 2022 -0700

    change query frequency to 1hour (#15499)

commit f143c8f
Author: midavadim <midavadim@yahoo.com>
Date:   Wed Aug 10 21:13:51 2022 +0300

    :tada: Source File - add support for custom encoding (#15293)

    * added support for custom encoding

    * fixed unit test for utf16

    * updated docs

    * bumped connector version

    * auto-bump connector version [ci skip]

    Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

commit bbf3584
Author: Alexandre Girard <alexandre@airbyte.io>
Date:   Wed Aug 10 10:58:22 2022 -0700

    Remove unused field from JsonSchema (#15425)

    * few fixes from working with sendgrid

    * reset to master

    * only update the docstring

    * reset

commit a280113
Author: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com>
Date:   Wed Aug 10 20:44:51 2022 +0300

    Destination S3: add LZO compression support (#15394)

    * Fixed bucket naming for S3

    * Destination S3: add LZO compression support for parquet files

    * Destination S3: add LZO compression support for parquet files

    * implemented logic for aarch64

    * removed redundant logging

    * updated changelog

    * moved intstall of native-lzo lib to Dockerfile

    * removed redundant logging

    * add unit test for aarch64

    * bump version

    * auto-bump connector version [ci skip]

    Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

commit 29c3426
Author: sivankumar86 <sivankumar86@users.noreply.github.com>
Date:   Thu Aug 11 02:34:47 2022 +1000

    Source MSSQL: special character support in dbname for CDC method (#15430)

    * information schema included

    * special character handle

    * Revert "information schema included"

    This reverts commit f0aee6a.

    * version change

    * doc update

    * auto-bump connector version [ci skip]

    Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

commit 959d862
Author: Baz <oleksandr.bazarnov@globallogic.com>
Date:   Wed Aug 10 19:20:22 2022 +0300

    🐛 Source SalesForce: changed `DEFAULT_WAIT_TIMEOUT_SECONDS` to 24-hour limit (#15444)

commit 0092712
Author: Subodh Kant Chaturvedi <subodh1810@gmail.com>
Date:   Wed Aug 10 21:48:57 2022 +0530

    fix postgres data handling from WAL logs in CDC mode (#15481)

    * fix postgres data handling from WAL logs in CDC mode

    * format

    * use formatter for dates also (#15485)

    * format

    * change test structure

    * change log to debug

    Co-authored-by: Edward Gao <edward.gao@airbyte.io>

commit fdb5eb9
Author: Evan Tahler <evan@airbyte.io>
Date:   Wed Aug 10 09:03:02 2022 -0700

    Simplify the `MigrationAcceptanceTest` (#15497)

    * disable `testAutomaticMigration`

    * empty commit to retry tests

    * Simplify the MigrationAcceptanceTest

    * lint

    * Fix PMD. Reorder some calls to make clear what is happening.

    Co-authored-by: Davin Chia <davinchia@gmail.com>

commit fd70913
Author: Augustin <augustin.lafanechere@gmail.com>
Date:   Wed Aug 10 17:42:07 2022 +0200

    SAT: compatibility tests for catalogs (#15486)

commit 1ad5152
Author: Evan Tahler <evan@airbyte.io>
Date:   Wed Aug 10 08:21:52 2022 -0700

    Disable automaticMigrationAcceptanceTest (#15492)

    * disable `testAutomaticMigration`

    * empty commit to retry tests

commit 1228451
Author: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>
Date:   Wed Aug 10 11:11:45 2022 -0400

    Fix styles impacting global ul, li in FieldSection component (#15484)

commit 6aa08e0
Author: Jonathan Pearlin <jonathan@airbyte.io>
Date:   Wed Aug 10 10:55:46 2022 -0400

    Add micronaut dependencies and bundles (#15459)

    * Add micronaut dependencies and bundles

    * Update Micronaut core

commit 7662956
Author: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>
Date:   Wed Aug 10 10:51:43 2022 -0400

    🪟 🧹 Cleanup documentation panel components (#15455)

    * Add docs/ to frontend workspace

    * Migrate Markdown components to scss and cleanup when not found is rendered

    * Add white-space: break-spaces rule to markdown code blocks

commit 1258ab4
Author: Topher Lubaway <asimplechris@gmail.com>
Date:   Wed Aug 10 09:05:14 2022 -0500

    Revert "Adds PAT check to shared pr check (#15453)" (#15511)

    This reverts commit 06a18d4.

commit 853b88a
Author: Kyryl Skobylko <xpuska513@gmail.com>
Date:   Wed Aug 10 16:48:20 2022 +0300

    fix: fix gcs-log creds secret name, add externaldb configuration for temporal, fix webapp ingress (#15510)

commit c782303
Author: Yatsuk Bogdan <yatsukbogdan@gmail.com>
Date:   Wed Aug 10 15:57:26 2022 +0300

    :window: :art: Increases GroupTitle followed divs width from 180px to 250px (#13956)

    * Increases GroupControls followed divs width from 180px to 250px

    * Increases min-width for GroupTitle

    * Change layout to flexbox

    Co-authored-by: Tim Roes <tim@airbyte.io>

commit e28bc3a
Author: Serhii Chvaliuk <grubberr@gmail.com>
Date:   Wed Aug 10 13:55:29 2022 +0300

    🎉Source Harvest: Added `parent_id` for all streams which have parent stream (#15221)

    Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

commit aaa3aae
Author: Tuhai Maksym <kimerinn@gmail.com>
Date:   Wed Aug 10 12:43:55 2022 +0300

    15310: Destination Scylla: Handle per-stream state (#15399)

    * 15310: Destination Scylla: Handle per-stream state

    * 15399: test fix

    * 15318: test fix

    * 15318: updating version

    * auto-bump connector version [ci skip]

    Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

commit c724630
Author: Yurii Bidiuk <35812734+yurii-bidiuk@users.noreply.github.com>
Date:   Wed Aug 10 10:17:23 2022 +0300

    Add test case for new fields appearing in data (#15372)

    * add test case for new field(s) appearing in data

    * rework test to verify that sync at least not failed if new fields are present

commit 6e1a76f
Author: Serhii Chvaliuk <grubberr@gmail.com>
Date:   Wed Aug 10 09:24:40 2022 +0300

    🐛 Source Amazon Ads: define primary_key for all report streams (#15469)

    Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

commit c1a0cbc
Author: Octavia Squidington III <90398440+octavia-squidington-iii@users.noreply.github.com>
Date:   Wed Aug 10 04:20:12 2022 +0200

    Bump Airbyte version from 0.39.42-alpha to 0.40.0-alpha (#15493)

    Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com>

commit f6766ee
Author: Benoit Moriceau <benoit@airbyte.io>
Date:   Wed Aug 10 07:50:41 2022 +0800

    Revert "Revert "Release per stream to the OSS project (#15008)" (#15177)" (#15401)

    This reverts commit 362fc4e.

commit eab0013
Author: Edward Gao <edward.gao@airbyte.io>
Date:   Tue Aug 9 16:13:09 2022 -0700

    🐛 Source snowflake: int columns should be discovered as ints (#15314)

    * snowflake discovers ints as ints

    * version bump+changelog

    * bump version+changelog

    * auto-bump connector version [ci skip]

    Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

commit f506c60
Author: Anne <102554163+alovew@users.noreply.github.com>
Date:   Tue Aug 9 16:07:35 2022 -0700

    Track number of streams in syncs (#15478)

    * Add number_of_streams to job sync tracking

commit 6c5d1ff
Author: Augustin <augustin.lafanechere@gmail.com>
Date:   Wed Aug 10 00:33:58 2022 +0200

    SAT: measure unit test coverage (#15443)

commit e9afa9b
Author: Anne <102554163+alovew@users.noreply.github.com>
Date:   Tue Aug 9 15:30:48 2022 -0700

    Error Prone PMD rules (#15010)

    * Implement ErrorProne PMD rules:
    AssignmentInOperand
    AvoidAccessibilityAlteration
    AvoidBranchingStatementAsLastInLoop
    AvoidCatchingNPE
    AvoidCatchingThrowable
    AvoidDuplicateLiterals rule

commit c536e51
Author: Tim Roes <tim@airbyte.io>
Date:   Wed Aug 10 00:11:12 2022 +0200

    Fix copy link to logs functionality (#15368)

    * Fix copy link to logs functionality

    * Update airbyte-webapp/src/components/JobItem/JobItem.tsx

    Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

    * Fix scrolling

    * Remove smooth scrolling

    * Improve effect for better return statements

    * Better scroll

    Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

commit 62303a8
Author: Augustin <augustin.lafanechere@gmail.com>
Date:   Tue Aug 9 23:07:13 2022 +0200

    SAT: check that previous config schema validates against current connector spec (#15367)

commit 123705c
Author: Stephen Wentling <stephen@swentling.com>
Date:   Tue Aug 9 21:30:14 2022 +0100

    Source Jira: Added updates to include issue components and fixes to README files (#15135)

    * solve readme conflict

    * updated jira sources with open PR details

    * correct additionalProperties test discover

    Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>

commit 9e691d8
Author: Alex <109167606+alex-gron@users.noreply.github.com>
Date:   Tue Aug 9 14:28:38 2022 -0500

    fix broken link (#15379)

commit 36ed6ce
Author: Denys Davydov <davydov.den18@gmail.com>
Date:   Tue Aug 9 21:58:52 2022 +0300

    #15445 source typeform: integration tests (#15446)

commit 06a18d4
Author: Topher Lubaway <asimplechris@gmail.com>
Date:   Tue Aug 9 13:33:20 2022 -0500

    Adds PAT check to shared pr check (#15453)

    * Adds PAT check to shared pr check

    * Name change

    * Removes "safe_to_push" string

    * Adds OCTAVIA_PAT and uses the found PAT

    found PAT was not used in all locales, so this could have still failed
    on an expired OCTAVIA_PAT before this change
4 participants