Postgres Source Fixed JSONB array mapping #23608

Closed
wants to merge 4 commits into from

Conversation

VitaliiMaltsev
Contributor

What

Issue https://github.com/airbytehq/oncall/issues/1570

How

Describe the solution

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

@VitaliiMaltsev
Contributor Author

VitaliiMaltsev commented Mar 1, 2023

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4302214178
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4302214178
No Python unittests run

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:509: The previous and actual discovered catalogs are identical.
=================== 62 passed, 5 skipped in 85.10s (0:01:25) ===================

@github-actions
Contributor

github-actions bot commented Mar 1, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (3)

| Connector | Version | Changelog | Publish |
|---|---|---|---|
| source-alloydb | 1.0.49 | | |
| source-alloydb-strict-encrypt | 1.0.49 | 🔵 (ignored) | 🔵 (ignored) |
| source-postgres-strict-encrypt | 1.0.51 | 🔵 (ignored) | 🔵 (ignored) |
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

| Category | Status | Actionable Item |
|---|---|---|
| Version | mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
| Version | doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
| Changelog | doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
| Changelog | changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
| Publish | not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
| Publish | diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |

Contributor

@subodh1810 subodh1810 left a comment

Can you add tests for this?

@@ -216,7 +216,7 @@ private void putJsonbArray(ObjectNode node, String columnName, ResultSet resultS
     final ResultSet arrayResultSet = resultSet.getArray(colIndex).getResultSet();

     while (arrayResultSet.next()) {
-      final PGobject object = getObject(arrayResultSet, colIndex, PGobject.class);
+      final PGobject object = getObject(arrayResultSet, 2, PGobject.class);
Contributor

Am not sure I understand this. Why is the value hardcoded at 2?

Contributor

I think you need to traverse the result set and increase the index as you go. This seems wrong

Contributor Author

> Am not sure I understand this. Why is the value hardcoded at 2?

Please take a look at how all the other array mappings are implemented (varchar array, int array, etc.).
While iterating over an array's ResultSet, there are only two column indexes:
index 1 - the ordinal number of the element in the array
index 2 - the element's value
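
To make the two-column layout concrete, here is a minimal sketch that needs no database. It is not the connector's code: `ArrayResultSetSketch`, `readValues`, and `fakeArrayResultSet` are hypothetical names, and the fake `ResultSet` is an in-memory proxy that only imitates the shape documented for `java.sql.Array#getResultSet()` (column 1 is the element index, column 2 is the element value).

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ArrayResultSetSketch {

  // java.sql.Array#getResultSet() yields one row per element with two columns.
  static final int INDEX_COLUMN = 1; // position of the element within the array
  static final int VALUE_COLUMN = 2; // the element itself

  /** Reads every element value, in iteration order, from an array ResultSet. */
  static List<String> readValues(final ResultSet arrayResultSet) throws SQLException {
    final List<String> values = new ArrayList<>();
    while (arrayResultSet.next()) {
      // Always column 2, regardless of where the array column sits in the outer row.
      values.add(arrayResultSet.getString(VALUE_COLUMN));
    }
    return values;
  }

  /** Hypothetical in-memory stand-in for an array ResultSet (illustration only). */
  static ResultSet fakeArrayResultSet(final List<String> elements) {
    final int[] cursor = {-1}; // row pointer, starts before the first row
    return (ResultSet) Proxy.newProxyInstance(
        ArrayResultSetSketch.class.getClassLoader(),
        new Class<?>[] {ResultSet.class},
        (proxy, method, args) -> {
          switch (method.getName()) {
            case "next":
              cursor[0]++;
              return cursor[0] < elements.size();
            case "getString":
              final int column = (Integer) args[0];
              return column == INDEX_COLUMN
                  ? String.valueOf(cursor[0] + 1) // 1-based element index
                  : elements.get(cursor[0]);      // element value
            default:
              throw new UnsupportedOperationException(method.getName());
          }
        });
  }

  public static void main(final String[] args) throws SQLException {
    final List<String> elements = Arrays.asList("{\"a\": 1}", "{\"b\": 2}", "{\"c\": 3}");
    System.out.println(readValues(fakeArrayResultSet(elements)));
  }
}
```

Reading with the outer `colIndex` instead of `VALUE_COLUMN` would only work by coincidence when the array column happens to be the second column of the outer row, which is presumably why the other array mappings use the fixed index.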

Contributor

I see. Can you add a test for this.

Contributor Author

> I see. Can you add a test for this.

We already have a test for the jsonb[] data type in AbstractPostgresSourceDatatypeTest. I updated it so that the array contains more than 2 elements.

@VitaliiMaltsev
Contributor Author

VitaliiMaltsev commented Mar 1, 2023

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4302683470
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4302683470
No Python unittests run

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:509: The previous and actual discovered catalogs are identical.
=================== 62 passed, 5 skipped in 83.72s (0:01:23) ===================

@VitaliiMaltsev
Contributor Author

/publish connector=connectors/source-postgres

@VitaliiMaltsev
Contributor Author

/publish connector=connectors/source-postgres-strict-encrypt

@VitaliiMaltsev
Contributor Author

/publish connector=connectors/source-postgres

@VitaliiMaltsev
Contributor Author

VitaliiMaltsev commented Mar 1, 2023

/publish connector=connectors/source-postgres

🕑 Publishing the following connectors:
connectors/source-postgres
https://github.com/airbytehq/airbyte/actions/runs/4304838458


Connector Did it publish? Were definitions generated?
connectors/source-postgres

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@VitaliiMaltsev
Contributor Author

VitaliiMaltsev commented Mar 1, 2023

/publish connector=connectors/source-postgres-strict-encrypt

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4304877489


Connector Did it publish? Were definitions generated?
connectors/source-postgres-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@subodh1810
Contributor

subodh1810 commented Mar 1, 2023

@VitaliiMaltsev tests to carry out with this PR.

  1. Run a sync of a Postgres table with a jsonb column to a BigQuery destination using source-postgres version 1.0.49.
  2. Update the Postgres connector to the version that contains your fix.
  3. Run the sync again and check whether the data written to the destination is correct.

Please share how the data looks in the destination on version 1.0.49 and with your fix.

@subodh1810
Contributor

subodh1810 commented Mar 1, 2023

@VitaliiMaltsev another set of tests to carry out with this PR.

  1. Run a sync of a Postgres table with a jsonb column to a BigQuery destination using source-postgres version 1.0.49.
  2. Update the Postgres connector to the version that contains your fix.
  3. Refresh the schema.
  4. Run the sync again and check whether the data written to the destination is correct.

Please share how the data looks in the destination on version 1.0.49 and with your fix.

@VitaliiMaltsev
Contributor Author

> @VitaliiMaltsev tests to carry out with this PR.
>
>   1. Run a sync of a Postgres table with a jsonb column to a BigQuery destination using source-postgres version 1.0.49.
>   2. Update the Postgres connector to the version that contains your fix.
>   3. Run the sync again and check whether the data written to the destination is correct.
>
> Please share how the data looks in the destination on version 1.0.49 and with your fix.

@subodh1810 @edgao
## Test results

1.0.49 sync

_airbyte_raw_table: [screenshot: 1049]

Normalized table: [screenshots: 1049schema, 1049final]

--------------------------------------------------------------------------

1.0.51 sync without schema refresh

_airbyte_raw_table: [screenshot: 1050]

Normalized table: [screenshots: 1049schema, 1050final]

--------------------------------------------------------------------------

1.0.51 sync with schema refresh

_airbyte_raw_table: [screenshot: 1050]

Normalized table: [screenshots: 1049schema, 1050schema-refresh]

@subodh1810
Contributor

subodh1810 commented Mar 1, 2023

@VitaliiMaltsev the normalized table for 1.0.51 without schema refresh doesn't show the data; it's all null. Why is that?

@subodh1810
Contributor

@edgao shouldn't the data type of the normalized column after the refresh be JSON instead of string? (I am guessing we support the JSON data type in BigQuery.)

@subodh1810
Contributor

@VitaliiMaltsev the screenshots are difficult to follow. Can you use ORDER BY in your queries so that all the screenshots have the same ordering?

@VitaliiMaltsev
Contributor Author

> @VitaliiMaltsev the normalized table for 1.0.51 without schema refresh doesn't show the data; it's all null. Why is that?

@subodh1810 Because the actual JSON schema changed to "test_column":{"type":"object","oneOf":[{"type":"array"},{"type":"object"},{"type":"number"},{"type":"string"},{"type":"boolean"}],"airbyte_type":"json"}, and all the values are now written to the proper JSON nodes (see the _airbyte_raw table). Without a schema refresh, however, normalization still used the old schema "test_column":{"type":"string"}.
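
One way to picture the nulls, as a toy model only: assume the normalization step keeps reading `test_column` as a scalar because the stale schema still says `{"type":"string"}`. The sketch below is not Airbyte's normalization code; `StaleSchemaSketch` and `extractScalar` are hypothetical names, and the old and new raw shapes are assumptions inferred from the two schemas quoted above. It only mimics the general behavior of a scalar-only JSON extractor, which yields null for objects and arrays.

```java
final class StaleSchemaSketch {

  /**
   * Hypothetical scalar-only extractor: returns the text of a JSON string, number,
   * or boolean, and null for a JSON object or array. Escape handling is deliberately
   * crude (only \" is unescaped), since this is an illustration and not a JSON parser.
   */
  static String extractScalar(final String rawJson) {
    final String v = rawJson.trim();
    if (v.startsWith("{") || v.startsWith("[")) {
      return null; // not a scalar, so a string-typed column gets nothing
    }
    if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
      return v.substring(1, v.length() - 1).replace("\\\"", "\"");
    }
    return v;
  }

  public static void main(final String[] args) {
    // Assumed old shape: jsonb emitted as a JSON string, matching {"type":"string"}.
    final String oldRaw = "\"{\\\"a\\\":1}\"";
    // Assumed new shape: jsonb emitted as a real JSON node.
    final String newRaw = "{\"a\":1}";
    System.out.println(extractScalar(oldRaw));
    System.out.println(extractScalar(newRaw));
  }
}
```

Under these assumptions the old raw value still maps to a string, while the new raw value comes back as null until the schema is refreshed and the column is no longer treated as a scalar.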

@VitaliiMaltsev
Contributor Author

> @VitaliiMaltsev the screenshots are difficult to follow. Can you use ORDER BY in your queries so that all the screenshots have the same ordering?

The screenshots all follow the same order:

  1. Postgres version
  2. _airbyte_raw table
  3. Normalized table schema
  4. Normalized table values

@subodh1810
Contributor

subodh1810 commented Mar 1, 2023

#23608 (comment)
I meant the screenshots showing the data in the normalized tables. If you order by the id column, it will be easier to compare the screenshots side by side.

@subodh1810
Contributor

subodh1810 commented Mar 1, 2023

@VitaliiMaltsev regarding #23608 (comment):
the problem is that this now becomes a breaking change. Unless the user refreshes the schema, we are going to write null values, which is bad.

@subodh1810
Contributor

Also, I don't see the values from Postgres. Can you share the table structure in Postgres (\d <table_name>) and how the actual data looks in Postgres?

@VitaliiMaltsev
Contributor Author

VitaliiMaltsev commented Mar 1, 2023

> Also, I don't see the values from Postgres. Can you share the table structure in Postgres (\d <table_name>) and how the actual data looks in Postgres?

@subodh1810 Original Postgres table: [screenshot: postgres-original-json-table]

@subodh1810
Contributor

subodh1810 commented Mar 1, 2023

@VitaliiMaltsev
Update from a quick discussion that @edgao and I had. The version with your changes writes null values for jsonb columns in the destination if customers don't trigger a schema refresh (as mentioned here), which is bad. We decided that the right way forward is to revert the original PR and publish a new version, so that OSS users have a fix and the Postgres release is unblocked.

Once that is done, we need to revisit how we want to introduce the jsonb change. After thorough testing, including a proper backward-compatibility test, you can raise a new PR.

We will have to delete the 1.0.51 image published here and here.

Update: Deleted the 1.0.51 tags from Docker Hub.

@subodh1810
Contributor

Update: Deleted the 1.0.51 tags from Docker Hub.
