Normalization: solve conflict when stream and field have same name #4557
/test connector=bases/base-normalization
@@ -74,7 +74,8 @@ def setup_test_path(request):
 ]
 ),
 )
-@pytest.mark.parametrize("destination_type", list(DestinationType))
+# @pytest.mark.parametrize("destination_type", list(DestinationType))
+@pytest.mark.parametrize("destination_type", [DestinationType.POSTGRES])
remember to revert this
/test connector=bases/base-normalization
@ChristopheDuong I added logic using
-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema
select
_airbyte_conflict_stream_name_hashid,
{{ json_extract('conflict_stream_name', ['conflict_stream_name']) }} as conflict_stream_name,
@ChristopheDuong here
{"conflict_name": {"conflict_name": "hi"}}
If the macro could somehow translate into something like json_extract('conflict_stream_name', ['conflict_stream_name', 'conflict_stream_name']),
maybe that would make it less ambiguous for the SQL engine (the destination)?
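To make the ambiguity concrete, here is a minimal Python sketch of path-based extraction over the example record above. It is not the dbt `json_extract` macro; `extract_path` is a hypothetical helper that only illustrates how a one-element path and a two-element path select different values when the stream and the field share a name.

```python
import json
from typing import Any, List, Optional


def extract_path(blob: str, path: List[str]) -> Optional[Any]:
    """Walk a JSON document along `path`; return None if a key is missing."""
    node: Any = json.loads(blob)
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


record = '{"conflict_name": {"conflict_name": "hi"}}'

# One-element path: selects the nested object.
outer = extract_path(record, ["conflict_name"])
# Two-element path: selects the scalar inside the nested object.
inner = extract_path(record, ["conflict_name", "conflict_name"])
```

With the full path spelled out, the target of the extraction no longer depends on how the engine resolves the repeated name.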
/test connector=bases/base-normalization
/test connector=bases/base-normalization
FYI I've been looking at fixing normalization integration tests here: #4910 (and you might need to merge master as I also included some other fixes recently too, on top of all the conflicts this generated...)
/test connector=bases/base-normalization
@ChristopheDuong MySQL is hitting a cycle, and I'm having trouble running the MySQL integration test locally.
Redshift failed, and it is not working locally either. The problem is that I'm 100% sure it worked before :(
Yes, Redshift is broken for me too. The test is failing on writing raw JSON records to Redshift using only the
/test connector=destination-redshift
@@ -322,7 +323,7 @@ def generate_json_parsing_model(self, from_table: str, column_names: Dict[str, T
 {{ field }},
 {%- endfor %}
 _airbyte_emitted_at
-from {{ from_table }}
+from {{ from_table }} as {{ table_alias }}
You need to introduce table_alias
only in the extract_json CTE, not in all of them
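A rough illustration of that suggestion, using plain Python string formatting rather than the project's Jinja templates. `render_select` and the table names below are hypothetical; the point is only that the alias is an optional argument passed by the JSON-extraction step and omitted everywhere else.

```python
from typing import List, Optional


def render_select(from_table: str, fields: List[str], table_alias: Optional[str] = None) -> str:
    """Render a simple SELECT; append `as <alias>` only when an alias is given."""
    source = f"{from_table} as {table_alias}" if table_alias else from_table
    columns = ",\n    ".join(fields + ["_airbyte_emitted_at"])
    return f"select\n    {columns}\nfrom {source}"


# Only the JSON-extraction model asks for an alias (hypothetical names).
json_parsing_sql = render_select(
    "raw_conflict_stream_name", ["conflict_stream_name"], table_alias="table_alias"
)
# Later models keep the plain `from <table>` form.
casting_sql = render_select("conflict_stream_name_ab1", ["conflict_stream_name"])
```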
/test connector=bases/base-normalization
/test connector=bases/base-normalization
.../test_primary_key_streams/final/airbyte_ctes/test_normalization/conflict_stream_name_ab2.sql
...es/test_normalization/conflict_stream_name_conflict_stream_name_conflict_stream_name_ab1.sql
airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py
...ons/bases/base-normalization/normalization/transform_catalog/destination_name_transformer.py
...te-integrations/bases/base-normalization/normalization/transform_catalog/stream_processor.py
/test connector=bases/base-normalization
# tests:
# - dbt_utils.expression_is_true:
# expression: "double_array_data is not null"
# - dbt_utils.expression_is_true:
# expression: "DATA is not null"
# - dbt_utils.expression_is_true:
# expression: "\"column`_'with\"\"_quotes\" is not null"
can you open a new issue to look closer into this later? thanks
/test connector=bases/base-normalization
/publish connector=bases/base-normalization
/publish connector=bases/base-normalization
@@ -115,6 +118,7 @@ def setup_mysql_db(self):
 ]
 print("Executing: ", " ".join(commands))
 subprocess.call(commands)
+time.sleep(120)
This should not be here, it makes the test wait for two minutes before doing anything
Sorry Chris! I added this because on my Ubuntu setup MySQL and Postgres took longer to start. I created this: #6091
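One possible alternative to a fixed two-minute sleep is to poll until the database port accepts connections, which returns as soon as the container is reachable and still tolerates slow setups. This is a sketch only: `wait_for_port` is a hypothetical helper and not code from this PR or from #6091. An open port also does not guarantee the server is ready to run queries, so a real check might need an actual login attempt.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused) until the port is open.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

In a test setup this could be called right after starting the container, for example `wait_for_port("localhost", 3306)`, where the host and port are whatever the test configuration uses.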
What
Closes #4099
How
Add a check for when the parent stream_name is equal to stream_name.
@ChristopheDuong this only solves a one-level nested conflict. Do you think there should be something random to handle deeper conflicts?
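For discussion, a sketch of how the check could extend past one level. This is an assumption, not what the PR implements: `child_table_name` is a hypothetical function, and it uses a deterministic suffix derived from the full JSON path instead of a random one, so repeated runs would produce the same table names.

```python
import hashlib
from typing import List


def child_table_name(json_path: List[str]) -> str:
    """Return a table name for the nested stream at `json_path`.

    `json_path` lists names from the top-level stream down to the nested field,
    e.g. ["conflict_stream_name", "conflict_stream_name"].
    """
    name = json_path[-1]
    # Conflict check: the nested name matches one of its ancestors.
    if name in json_path[:-1]:
        # Suffix derived from the whole path, so each nesting level gets its own name.
        digest = hashlib.sha256("/".join(json_path).encode("utf-8")).hexdigest()[:8]
        return f"{name}_{digest}"
    return name


no_conflict = child_table_name(["stream", "field"])
one_level = child_table_name(["conflict_stream_name", "conflict_stream_name"])
two_levels = child_table_name(
    ["conflict_stream_name", "conflict_stream_name", "conflict_stream_name"]
)
```

A hash keeps names stable across syncs, which a random suffix would not; whether that matters here is a design question for the thread.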