New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion/bigquery): BigQuery Owner Label to Datahub Ownership #10047
feat(ingestion/bigquery): BigQuery Owner Label to Datahub Ownership #10047
Conversation
owner_lable_character_mapping: Dict[str, str] = Field( | ||
default={}, | ||
description="A mapping of bigquery owner label character to datahub owner character." | ||
"Provided mapping will get added to default mapping.", | ||
) | ||
|
||
owner_key_pattern: str = Field( | ||
default="_owner_email", | ||
description="A pattern which defines what identifies an owner label.", | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking at the BigQuery datahub source docs and seeing capture_table_labels_as_tags
and think we should model this to be similar. So something like:
capture_table_owner_label_as_owner:
enabled: true
# we want them to be able to define a mapping, so our BQ label -> datahub owner ingestion is generic
label_character_mapping:
- "_": "."
- "-": "@"
- "__": "_"
- "--": "-"
- "_-": "#"
- "-_": " "
# we want them to be able to define what identifies an owner label, so it is also generic
owner_key_pattern: "_owner_email"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ownership_from_tags
@@ -111,5 +156,7 @@ def transform_aspect( | |||
), | |||
) | |||
) | |||
|
|||
return None | |||
if not self.config.replace_existing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't make sense - we should always return aspect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even I thought so. But previous behavior was removing tag metadata from ingestion after extracting ownership from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functionality looks good, just some code cleanup things remaining
metadata-ingestion/src/datahub/ingestion/transformer/extract_ownership_from_tags.py
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/transformer/extract_ownership_from_tags.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/transformer/extract_ownership_from_tags.py
Outdated
Show resolved
Hide resolved
…into BigQuery-Owner-Label-to-Datahub-Ownership
metadata-ingestion/src/datahub/ingestion/transformer/extract_ownership_from_tags.py
Outdated
Show resolved
Hide resolved
``` | ||
|
||
So if we have input dataset tag like | ||
- `urn:li:tag:dataset_owner_email:abc@email.com` | ||
- `urn:li:tag:dataset_owner_email:xyz@email.com` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these have dataset_owner_email
but the example is tag_pattern: "owner_email:"
- why the mismatch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because now the config is tag_pattern and not tag_prefix. We can provide pattern in this way as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to approve and merge this for now, but there are multiple issues that need to be addressed in a follow up
Checklist