-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add normalization test cases #2992
Add normalization test cases #2992
Conversation
@@ -171,6 +171,7 @@ def test_normalize_column_name(input_str: str, destination_type: str, expected: | |||
("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii", "Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii"), | |||
# over the limit | |||
("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"), | |||
("Aaaa_Bbbb_Cccc_Dddd_a_very_long_name_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When truncated, this test case ends up with the exact same truncated name as the previous test...
So there is potential table name conflict!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch! that's bad. how should we handle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like in the implementation above you're throwing an error. is there anyway to handle this without throwing an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when it's finding a duplicate name, the code tries a second chance by adding a 3 character hash of the full stream name (without truncation) at the end of the truncated stream name.
If that second chance still has collisions (because the 3 character hash was already used too? or maybe because the catalog somehow contains the exact same stream name twice) then it'll fail
(I assume the second chance with the hash should have low probability to fail though)
We could make it not throw an error by appending a random suffix character or number (_1, _2, _3 etc) instead of erroring but it's not deterministic anymore ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch. is there a way we can solve this collision without throwing an exception?
are all the files being added in this PR actually used? was having trouble tracking that.
@@ -171,6 +171,7 @@ def test_normalize_column_name(input_str: str, destination_type: str, expected: | |||
("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii", "Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii"), | |||
# over the limit | |||
("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"), | |||
("Aaaa_Bbbb_Cccc_Dddd_a_very_long_name_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch! that's bad. how should we handle it?
@@ -171,6 +171,7 @@ def test_normalize_column_name(input_str: str, destination_type: str, expected: | |||
("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii", "Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iii"), | |||
# over the limit | |||
("Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"), | |||
("Aaaa_Bbbb_Cccc_Dddd_a_very_long_name_Ffff_Gggg_Hhhh_Iiii", "Aaaa_Bbbb_Cccc_Dddd___e_Ffff_Gggg_Hhhh_Iiii"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like in the implementation above you're throwing an error. is there anyway to handle this without throwing an error?
@@ -0,0 +1,3 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this file being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test was originally made for testing nested streams in the catalog but in this scenario, it's not using any nesting so it needs an empty nested file for expected nested streams
So for the test to pass, it is needed yes
@@ -0,0 +1,13 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this file being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is used and refer as the parameter "edge_cases_catalog" in airbyte-integrations/bases/base-normalization/unit_tests/test_stream_processor.py
@@ -0,0 +1,12 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this file being used?
What
Add a test case to normalization (#2750) where name collisions happens on postgres
How
Describe the solution
No solution yet.
Pre-merge Checklist
Recommended reading order
test.java
component.ts