Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bigquery destination SCD table does not work with array type from source Postgres table #4357

Closed
Tracked by #6996
shreyaspurohit opened this issue Jun 25, 2021 · 2 comments · Fixed by #8749
Closed
Tracked by #6996

Comments

@shreyaspurohit
Copy link

shreyaspurohit commented Jun 25, 2021

Expected Behavior

The SCD table in BigQuery should have array values from postgres table populated instead of null.

Current Behavior

The SCD table in BigQuery has null populated for array values. The _airbyte_raw_table has the array values transferred from postgres correctly. But the transformation from _airbyte_raw_table to table_scd populates null in the array field. Thanks to the help from @marcosmarxm on slack, we were able to find the root cause of the issue to be the usage of the legacy JSON function - json_extract_scalar in the DBT model for the SCD in BigQuery. The json_query or json_query_array should be used based on if the destination column needs to be string or array type as per this Google docs.

Logs

  1. Postgres tables: postgres_sql_log.txt
  2. _airbyte_raw_table on BigQuery screenshot: airbyte_raw_array_test_table
  3. SCD table on BigQuery screenshot: array_test_scd_table
  4. Sync logs: logs-55-0.txt
  5. DBT generated models: dbt_array_test_scd_sql.txt

Steps to Reproduce

  1. Create a test table in postgres with create table array_test (id bigserial primary key, values text[] not null, created_at timestamptz not null DEFAULT (NOW() AT TIME ZONE 'utc'));
  2. Insert some data - insert into array_test(values) values('{"a","b","c"}');
  3. Configure a CDC from postgres to BigQuery
  4. Run the Sync
  5. Look at data in SCD tables. The data {"a", "c", "c"} is missing. (the data is present in _raw_airbyte_array_test)

Severity of the bug for you

High

Airbyte Version

0.25.0-alpha

Connector Version (if applicable)

Destination (bigquery): 0.3.6
Source (postgres): 0.3.4

Additional context

n/a

@shreyaspurohit shreyaspurohit added the type/bug Something isn't working label Jun 25, 2021
@ChristopheDuong
Copy link
Contributor

ChristopheDuong commented Jun 28, 2021

In the catalog, the field values is marked as "values":{"type":"string"} but in the record message, the field is an array of string.

If it was an array of string, then normalization would try to unnest it, but Postgres CDC is not being consistent here

Another issue related to #1421

@tuliren
Copy link
Contributor

tuliren commented Dec 20, 2021

TODO:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment