Add normalization test cases #2992

ChristopheDuong · 2021-04-20T14:28:24Z

What

Add a test case to normalization (#2750) where name collisions happen on Postgres

How

Describe the solution
No solution yet.

Pre-merge Checklist

Run integration tests
Publish Docker images

Recommended reading order

test.java
component.ts
the rest

ChristopheDuong · 2021-04-20T17:19:47Z

airbyte-integrations/bases/base-normalization/unit_tests/test_destination_name_transformer.py

@@ -171,6 +171,7 @@ def test_normalize_column_name(input_str: str, destination_type: str, expected:
        ("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii", "Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii"),
        # over the limit
        ("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"),
+        ("Aaaa_Bbbb_Cccc_Dddd_a_very_long_name_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"),


When truncated, this test case ends up with exactly the same truncated name as the previous test case, so there is a potential table name conflict!

nice catch! that's bad. how should we handle it?

it looks like in the implementation above you're throwing an error. is there any way to handle this without throwing an error?

When it finds a duplicate name, the code takes a second chance: it appends a 3-character hash of the full stream name (without truncation) to the end of the truncated stream name.

If that second attempt still collides (because the 3-character hash was already used too, or maybe because the catalog somehow contains the exact same stream name twice), then it fails.
(I assume the second attempt with the hash has a low probability of failing, though.)

We could avoid throwing an error by appending a random suffix character or number (_1, _2, _3, etc.) instead of erroring, but then the result is not deterministic anymore...
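To make the second-chance logic described above concrete, here is a minimal sketch. It is not the code from this PR: the function name `resolve_table_names`, the `max_length` parameter, the plain-prefix truncation, and the use of SHA-256 for the hash are all assumptions made for illustration.

```python
import hashlib
from typing import List, Tuple


def resolve_table_names(stream_names: List[str], max_length: int, hash_length: int = 3) -> List[Tuple[str, str]]:
    """Return (stream_name, table_name) pairs with unique table names, or raise.

    Hypothetical helper: the real normalization code truncates differently.
    """
    used = set()
    resolved = []
    for name in stream_names:
        # First attempt: simple prefix truncation (an assumption for this sketch).
        candidate = name[:max_length]
        if candidate in used:
            # Second chance: short hash of the full, untruncated name as a suffix.
            digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:hash_length]
            candidate = name[: max_length - hash_length - 1] + "_" + digest
        if candidate in used:
            # Still colliding: fail loudly instead of picking a non-deterministic name.
            raise ValueError(f"Unable to resolve a unique table name for stream {name!r}")
        used.add(candidate)
        resolved.append((name, candidate))
    return resolved
```

Because the suffix depends only on the full stream name, the same catalog resolves to the same table names on every run, which is the property a counter suffix like _1, _2 would lose.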

cgardens

nice catch. is there a way we can solve this collision without throwing an exception?

are all the files being added in this PR actually used? was having trouble tracking that.

cgardens · 2021-04-21T17:04:31Z

airbyte-integrations/bases/base-normalization/unit_tests/test_destination_name_transformer.py

@@ -171,6 +171,7 @@ def test_normalize_column_name(input_str: str, destination_type: str, expected:
        ("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii", "Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii"),
        # over the limit
        ("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"),
+        ("Aaaa_Bbbb_Cccc_Dddd_a_very_long_name_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"),


nice catch! that's bad. how should we handle it?

cgardens · 2021-04-21T17:08:21Z

airbyte-integrations/bases/base-normalization/unit_tests/test_destination_name_transformer.py

@@ -171,6 +171,7 @@ def test_normalize_column_name(input_str: str, destination_type: str, expected:
        ("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii", "Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii"),
        # over the limit
        ("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"),
+        ("Aaaa_Bbbb_Cccc_Dddd_a_very_long_name_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"),


it looks like in the implementation above you're throwing an error. is there any way to handle this without throwing an error?

cgardens · 2021-04-21T17:10:13Z

...ations/bases/base-normalization/unit_tests/resources/edge_cases_catalog_expected_nested.json

@@ -0,0 +1,3 @@
+{


is this file being used?

The test was originally written for nested streams in the catalog. This scenario doesn't use any nesting, so it needs an empty file for the expected nested streams.

So yes, it is needed for the test to pass.

cgardens · 2021-04-21T17:10:17Z

...ons/bases/base-normalization/unit_tests/resources/edge_cases_catalog_expected_top_level.json

@@ -0,0 +1,13 @@
+{


is this file being used?

Yes, it is used: it is referenced by the parameter "edge_cases_catalog" in airbyte-integrations/bases/base-normalization/unit_tests/test_stream_processor.py

cgardens · 2021-04-21T17:10:21Z

.../base-normalization/unit_tests/resources/edge_cases_catalog_expected_top_level_postgres.json

@@ -0,0 +1,12 @@
+{


is this file being used?

ChristopheDuong added 2 commits April 20, 2021 16:24

Add normalization test cases

bc210bb

Fix new normalization test on name collisions

d04f898

ChristopheDuong commented Apr 20, 2021


ChristopheDuong marked this pull request as ready for review April 20, 2021 17:20

auto-assign bot requested review from michel-tricot and sherifnada April 20, 2021 17:20

Raise exception if collisions persist

75f988a

cgardens requested changes Apr 21, 2021


cgardens approved these changes Apr 22, 2021


ChristopheDuong merged commit 07a45df into master Apr 22, 2021

ChristopheDuong deleted the chris/normalization-test-stream-name-collisions branch April 22, 2021 17:39
