Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalization: solve conflict when stream and field have same name #4557

Merged
merged 21 commits into from
Aug 11, 2021

Conversation

marcosmarxm
Copy link
Member

What

Closes #4099

How

Add a check when parent stream_name is equal to stream_name.
@ChristopheDuong this only solve for a 1 level nested conflict. Do you think should be something random to handle deeper conflicts?

Recommended reading order

  1. x.java
  2. y.python

Pre-merge Checklist

Expand the checklist which is relevant for this PR.

Connector checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • Secrets are annotated with airbyte_secret in the connector's spec
  • Credentials added to Github CI if needed and not already present. instructions for injecting secrets into CI.
  • Unit & integration tests added as appropriate (and are passing)
    • Community members: please provide proof of this succeeding locally e.g: screenshot or copy-paste acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • /test connector=connectors/<name> command as documented here is passing.
    • Community members can skip this, Airbyters will run this for you.
  • Code reviews completed
  • Documentation updated
    • README.md
    • docs/SUMMARY.md if it's a new connector
    • Created or updated reference docs in docs/integrations/<source or destination>/<name>.
    • Changelog in the appropriate page in docs/integrations/.... See changelog example
    • docs/integrations/README.md contains a reference to the new connector
    • Build status added to build page
  • Build is successful
  • Connector version bumped like described here
  • New Connector version released on Dockerhub by running the /publish command described here
  • No major blockers
  • PR merged into master branch
  • Follow up tickets have been created
  • Associated tickets have been closed & stakeholders notified

Connector Generator checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Jul 6, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1002686506
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1002686506

@@ -74,7 +74,8 @@ def setup_test_path(request):
]
),
)
@pytest.mark.parametrize("destination_type", list(DestinationType))
# @pytest.mark.parametrize("destination_type", list(DestinationType))
@pytest.mark.parametrize("destination_type", [DestinationType.POSTGRES])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember to revert this

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Jul 7, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1006670650
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1006670650

@marcosmarxm
Copy link
Member Author

@ChristopheDuong I added logic using json_path but look mysql, bigquery and snowflake failed because the json extraction with same name...

-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema
select
_airbyte_conflict_stream_name_hashid,
{{ json_extract('conflict_stream_name', ['conflict_stream_name']) }} as conflict_stream_name,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{"conflict_name": {"conflict_name": "hi"}}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if somehow the macro would translate into something like json_extract('conflict_stream_name', ['conflict_stream_name', 'conflict_stream_name'])

maybe that would make it less ambiguous for the SQL engine (Destination)?

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Jul 21, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1054076661
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1054076661

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Jul 21, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1054085229
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1054085229

@ChristopheDuong
Copy link
Contributor

FYI I've been looking at fixing normalization integration tests here: #4910

(and you might need to merge master as I also included some other fixes recently too, on top of all the conflicts this generated...)

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Jul 28, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1073474397
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1073474397

@marcosmarxm
Copy link
Member Author

@ChristopheDuong mysql is getting a cycle and I'm having problems to run locally the integration test for mysql.

Encountered an error:
Found a cycle: model.airbyte_utils.conflict_stream_name_conflict_stream_name --> model.airbyte_utils.conflict_stream_name___flict_stream_name_ab1 --> model.airbyte_utils.conflict_stream_name___flict_stream_name_ab2 --> model.airbyte_utils.conflict_stream_name___flict_stream_name_ab3

Redshift failed and locally is not working too, the problem I'm 100% it already worked before :(

@ChristopheDuong
Copy link
Contributor

Redshift failed and locally is not working too, the problem I'm 100% it already worked before :(

Yes, redshift is broken for me too

The test is failing on writing raw JSON records to redshift using only the destination-redshift connector... so it is not even starting to run anything about normalization yet.

@ChristopheDuong
Copy link
Contributor

ChristopheDuong commented Jul 28, 2021

/test connector=destination-redshift

🕑 destination-redshift https://github.com/airbytehq/airbyte/actions/runs/1074350881
❌ destination-redshift https://github.com/airbytehq/airbyte/actions/runs/1074350881

@@ -322,7 +323,7 @@ def generate_json_parsing_model(self, from_table: str, column_names: Dict[str, T
{{ field }},
{%- endfor %}
_airbyte_emitted_at
from {{ from_table }}
from {{ from_table }} as {{ table_alias }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YOu need to introduce table_alias in only the extract_json cte not all of them

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Jul 28, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1075570298
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1075570298

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Aug 4, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1098266840
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1098266840

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Aug 9, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1114533773
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1114533773

@jrhizor jrhizor temporarily deployed to more-secrets August 9, 2021 22:21 Inactive
Comment on lines +47 to +53
# tests:
# - dbt_utils.expression_is_true:
# expression: "double_array_data is not null"
# - dbt_utils.expression_is_true:
# expression: "DATA is not null"
# - dbt_utils.expression_is_true:
# expression: "\"column`_'with\"\"_quotes\" is not null"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you open a new issue to look closer into this later? thanks

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Aug 11, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1120781700
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1120781700

@jrhizor jrhizor temporarily deployed to more-secrets August 11, 2021 16:14 Inactive
@github-actions github-actions bot added area/documentation Improvements or additions to documentation area/worker Related to worker labels Aug 11, 2021
@marcosmarxm
Copy link
Member Author

marcosmarxm commented Aug 11, 2021

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1120946518
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1120946518

@jrhizor jrhizor temporarily deployed to more-secrets August 11, 2021 17:08 Inactive
@marcosmarxm
Copy link
Member Author

marcosmarxm commented Aug 11, 2021

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1121048854
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1121048854

@jrhizor jrhizor temporarily deployed to more-secrets August 11, 2021 17:42 Inactive
@marcosmarxm marcosmarxm merged commit e4fe62f into master Aug 11, 2021
@marcosmarxm marcosmarxm deleted the marcosmarxm/conflict-hashid-normalization branch August 11, 2021 23:18
@@ -115,6 +118,7 @@ def setup_mysql_db(self):
]
print("Executing: ", " ".join(commands))
subprocess.call(commands)
time.sleep(120)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be here, it makes the test wait for two minutes before doing anything

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry Chris! I add this because in my Ubuntu Setup the mysql and postgres took longer to start. I created this: #6091

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Improvements or additions to documentation area/worker Related to worker normalization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Jira->Postgres connection: stream group column name conflicts with child table for column group object
3 participants