-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest/unity-catalog): Support external S3 lineage #9025
feat(ingest/unity-catalog): Support external S3 lineage #9025
Conversation
s3_name = strip_s3_prefix(s3_uri) | ||
|
||
if s3_name.endswith("/"): | ||
s3_name = s3_name[:-1] | ||
|
||
name, extension = os.path.splitext(s3_name) | ||
|
||
if extension != "": | ||
if remove_extension and extension != "": | ||
extension = extension[1:] # remove the dot | ||
return f"urn:li:dataset:(urn:li:dataPlatform:s3,{name}_{extension},{env})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of underscore it would be better to just URL encode the dot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general maybe we should url encode the urn parts
s3_name = strip_s3_prefix(s3_uri) | ||
|
||
if s3_name.endswith("/"): | ||
s3_name = s3_name[:-1] | ||
|
||
name, extension = os.path.splitext(s3_name) | ||
|
||
if extension != "": | ||
if remove_extension and extension != "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if remove_extension and extension != "": | |
if remove_extension and extension: |
@treff7es can you explain why
make_s3_urn
removes the period in the extension. I had to undo that for external lineage to match urns, as the S3 source creates urns with periods in them. Do you know if we can make this consistent?Checklist