🎉 Remove hash when it is not necessary from normalization outputs #3704
@@ -29,6 +29,7 @@
import pkgutil
import shutil
from enum import Enum
from typing import Any, Dict
Changes in this file are made to comply with MyPy.
/test connector=bases/base-normalization
Nice job Chris! Makes lots of sense. Only one comment from me.
I appreciated the great PR description and comments and the use of MyPy. They made this much easier to review.
Feel free to merge whenever.
class TableNameRegistry:
    """
    A registry object that records table names being used during the run
Amazing! Definitely easier for others to follow with this!
stream_processor.collect_table_names()
for conflict in tables_registry.resolve_names():
    print(f"WARN: Resolving conflict: {conflict[0]}.{conflict[1]} from '{'.'.join(conflict[2])}' into {conflict[3]}")
good idea to print out the changed names!
In the future, these "exceptions/conflicts" could be collected and displayed as warnings in the UI when setting up normalization for the connection.
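A minimal sketch of what collecting these conflicts could look like, instead of printing them directly. It assumes each conflict is a 4-tuple of schema, table name, JSON path and resolved name, which is only inferred from the f-string in the diff above. `ResolvedConflict`, `format_conflict` and `collect_warnings` are hypothetical names, not part of this PR.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ResolvedConflict(NamedTuple):
    """Hypothetical named view over the 4-tuples yielded by resolve_names()."""

    schema: str
    table_name: str
    json_path: List[str]
    resolved_name: str


def format_conflict(conflict: ResolvedConflict) -> str:
    # Same wording as the print() call in the diff, using field names
    # instead of positional indexes.
    path = ".".join(conflict.json_path)
    return (
        f"WARN: Resolving conflict: {conflict.schema}.{conflict.table_name} "
        f"from '{path}' into {conflict.resolved_name}"
    )


def collect_warnings(conflicts: Iterable[Tuple[str, str, List[str], str]]) -> List[str]:
    # Return the messages instead of printing them, so a caller
    # (for example a UI) can decide how to display them.
    return [format_conflict(ResolvedConflict(*c)) for c in conflicts]
```

Under that assumption, `collect_warnings(tables_registry.resolve_names())` would produce the same messages as the loop above, as a list.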
# before reaching the database name length limit
# 2 characters for signaling truncate with '__' and 6 others for generating unique strings
TRUNCATE_RESERVED_SIZE: int = 8
# we keep 4 characters for 1 underscore and 3 characters for suffix (_ab1, _ab2, etc)
👍🏻
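For readers following the reserved-size comments, here is a rough sketch of how such a character budget could be applied. Only `TRUNCATE_RESERVED_SIZE` and the meaning of the reserved characters come from the diff. `SUFFIX_RESERVED_SIZE`, `truncate_middle`, and the choice to keep the head and tail of the name are assumptions made for illustration and may differ from the actual implementation.

```python
TRUNCATE_RESERVED_SIZE: int = 8  # from the diff: 2 for '__' + 6 for a unique string
SUFFIX_RESERVED_SIZE: int = 4  # hypothetical name: 1 underscore + 3-character suffix


def truncate_middle(name: str, max_length: int) -> str:
    """Shorten `name` so the reserved characters still fit under `max_length`.

    Assumption: the middle of the name is dropped and marked with '__'.
    """
    # Characters of the original name that may be kept.
    budget = max_length - TRUNCATE_RESERVED_SIZE - SUFFIX_RESERVED_SIZE
    if budget < 2:
        raise ValueError("max_length is too small for the reserved characters")
    if len(name) <= budget:
        return name
    head = budget // 2
    tail = budget - head
    # Keep the start and the end of the name; '__' signals the truncation.
    return f"{name[:head]}__{name[-tail:]}"
```

With this budget, a truncated name plus a unique string and a suffix such as `_ab1` would still stay within `max_length`.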
/publish connector=bases/base-normalization
What
Closes #3523
Closes #2389
How
Introduce a two-stage loop through the catalog
In the future, we could separate the first stage when setting up the connection page, so the UI could have a chance to customize how to resolve naming conflicts. (The table name registry can then be imported from a file)
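The two-stage idea can be sketched roughly as follows: a first pass only registers the names each stream wants, a resolution step adds a hash solely where names collide, and a second pass looks up the final names. `SimpleRegistry` and its methods are a hypothetical stand-in written for this illustration. They are not the `TableNameRegistry` from this PR, and the 3-character MD5 suffix is an assumption.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


class SimpleRegistry:
    """Hypothetical stand-in for a table name registry; not the PR's implementation."""

    def __init__(self) -> None:
        # (schema, simple name) -> JSON paths that asked for that name
        self._requests: Dict[Tuple[str, str], List[Tuple[str, ...]]] = defaultdict(list)
        # (schema, JSON path) -> final table name, filled by resolve_names()
        self._resolved: Dict[Tuple[str, Tuple[str, ...]], str] = {}

    def register(self, schema: str, json_path: List[str]) -> None:
        # Stage 1: record the simple name (last path element) without deciding anything.
        self._requests[(schema, json_path[-1])].append(tuple(json_path))

    def resolve_names(self) -> List[Tuple[str, str, List[str], str]]:
        # Between the stages: only names requested by several paths get a hash suffix.
        conflicts = []
        for (schema, name), paths in self._requests.items():
            for path in paths:
                if len(paths) == 1:
                    final = name
                else:
                    digest = hashlib.md5(".".join(path).encode("utf-8")).hexdigest()[:3]
                    final = f"{name}_{digest}"
                    conflicts.append((schema, name, list(path), final))
                self._resolved[(schema, path)] = final
        return conflicts

    def get_table_name(self, schema: str, json_path: List[str]) -> str:
        # Stage 2: look up the name decided during resolution.
        return self._resolved[(schema, tuple(json_path))]
```

In this sketch a stream whose name is unique keeps its plain name, while two nested streams that both end in `address` each receive a suffixed name and are reported as conflicts.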
Pre-merge Checklist
Recommended reading order
Ignore changes to .sql and .json files
airbyte-integrations/bases/base-normalization/normalization/transform_catalog/table_name_registry.py
airbyte-integrations/bases/base-normalization/normalization/transform_catalog/catalog_processor.py
airbyte-integrations/bases/base-normalization/normalization/transform_catalog/stream_processor.py