🐛 Fix normalization SCD partition by float columns errors with BigQuery #9281

Merged: 20 commits, Jan 6, 2022

@ChristopheDuong ChristopheDuong commented Jan 4, 2022

What

Relates to https://github.com/airbytehq/oncall/issues/76
Closes #9215

How

  • Cast the cursor field column when it is used in a partition by clause and the column is a float type on BigQuery
  • Switch the fallback priority of cursor columns when no cursor field is provided: CDC columns, if they exist, take precedence over the _airbyte_emitted_at column (and they can be of float type)
  • Change the integration test for stream dedup_cdc_excluded to use _ab_cdc_lsn as the cursor (a float column)
  • Re-indent columns in the generated SQL so they are all aligned
  • Remove the indentation of '{%' Jinja directives in the generated SQL to better separate these macros from the actual SQL indentation (and make the distinction visually clear)
  • Fix an Oracle SCD dedup bug where deleted rows were not ordered properly (the new test fails otherwise)
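
For readers who want a quick picture of the first two points, here is a minimal sketch of the idea. It is not the code from stream_processor.py: the function names, the cdc_cursors parameter, and the choice of casting to string are assumptions made for illustration only.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical constants for illustration; only _ab_cdc_lsn and
# _airbyte_emitted_at are column names mentioned in this PR.
DEFAULT_CDC_CURSORS = ("_ab_cdc_lsn",)
EMITTED_AT = "_airbyte_emitted_at"


def choose_cursor(
    cursor_field: Optional[str],
    column_names: Iterable[str],
    cdc_cursors: Sequence[str] = DEFAULT_CDC_CURSORS,
) -> str:
    """Pick the cursor column: the explicit one, else a CDC column, else emitted_at."""
    if cursor_field:
        return cursor_field
    available = set(column_names)
    for candidate in cdc_cursors:
        # CDC columns take precedence over _airbyte_emitted_at when present.
        if candidate in available:
            return candidate
    return EMITTED_AT


def partition_by_expression(column: str, is_float: bool, destination: str) -> str:
    """Return the SQL expression to place in a partition by clause.

    Assumption: a float column is cast to string on BigQuery; other
    destinations use the column unchanged.
    """
    if destination == "bigquery" and is_float:
        return f"cast({column} as string)"
    return column


if __name__ == "__main__":
    cursor = choose_cursor(None, ["id", "_ab_cdc_lsn", "_airbyte_emitted_at"])
    print(partition_by_expression(cursor, is_float=True, destination="bigquery"))
```

The cast only changes the expression used for partitioning; the stored column keeps its original type.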

Recommended reading order

  1. airbyte-integrations/bases/base-normalization/normalization/transform_catalog/stream_processor.py
  2. airbyte-integrations/bases/base-normalization/integration_tests/resources/test_simple_streams/data_input/catalog.json
  3. the rest

@ChristopheDuong ChristopheDuong changed the base branch from chris/normalization-scd-refactor to master January 4, 2022 14:41

ChristopheDuong commented Jan 5, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1658883203
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1658883203
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            89     64    28%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/cdk/utils/event_timing.py         47      3    94%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     15    55%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        713    397    44%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    13      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     124      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 13      0   100%
	 normalization/transform_catalog/stream_processor.py                 516    330    36%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         146     32    78%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1221    518    58%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              242     96    60%
	 source_acceptance_test/tests/test_full_refresh.py       38      0   100%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     17    69%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  979    404    59%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    13      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     12    92%
	 normalization/transform_catalog/destination_name_transformer.py     124      4    97%
	 normalization/transform_catalog/reserved_keywords.py                 13      0   100%
	 normalization/transform_catalog/stream_processor.py                 516     41    92%
	 normalization/transform_catalog/table_name_registry.py              174     51    71%
	 normalization/transform_catalog/transform.py                         45     30    33%
	 normalization/transform_catalog/utils.py                             33      0   100%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         146     45    69%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1221    189    85%

@ChristopheDuong ChristopheDuong changed the title fix bq normalization scd float 🐛 Fix partition by float columns with BigQuery Jan 6, 2022
@ChristopheDuong ChristopheDuong changed the title 🐛 Fix partition by float columns with BigQuery 🐛 Fix partition by float columns errors with BigQuery Jan 6, 2022
@ChristopheDuong ChristopheDuong changed the title 🐛 Fix partition by float columns errors with BigQuery 🐛 Fix normalization partition by float columns errors with BigQuery Jan 6, 2022
@ChristopheDuong ChristopheDuong changed the title 🐛 Fix normalization partition by float columns errors with BigQuery 🐛 Fix normalization SCD partition by float columns errors with BigQuery Jan 6, 2022
@edgao edgao left a comment

:shipit:

@ChristopheDuong ChristopheDuong merged commit e0bac4a into master Jan 6, 2022
@ChristopheDuong ChristopheDuong deleted the chris/fix-bq-normalization-scd-float branch January 6, 2022 17:49
@ChristopheDuong ChristopheDuong commented

Will publish normalization in another PR

Successfully merging this pull request may close these issues.

Source MySQL CDC: _airbyte_start_at and _airbyte_end_at = _airbyte_emitted_at