-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest): Create Browse Paths V2 under flag #8120
feat(ingest): Create Browse Paths V2 under flag #8120
Conversation
@@ -478,10 +482,14 @@ def strip_types(field_path: str) -> str: | |||
BrowsePathsClass([f"/prod/datahub/entities/{entity_display_name}"]), | |||
], | |||
) | |||
dataset.browse_path_v2 = BrowsePathsV2Class( # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think dataset.browse_path_v2
exists, and it's not part of the snapshot class. You'll need to emit a separate MCPW
events.append(mcp) | ||
return events | ||
yield MetadataChangeProposalWrapper( | ||
entityUrn=d.urn, aspect=d.browse_path_v2 # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a huge fan of the type ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sooo how much do we care about this modeldocgen class? Originally I tried changing DatasetSnapshot.pdl
to also allow BrowsePathV2, but ran into issues with that... may just be the fact I've never changed a pdl and don't know the steps. So this was my second option, so I didn't have to alter too much code -- just store it on the object and produce the MCP later. Alternatively I can change this to only produce MCPs rather than the MCE but honestly I have no clue what this class does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapped to MCPs
] | ||
browse_path_processor = partial( | ||
auto_browse_path_v2, | ||
[s for s in browse_path_drop_dirs if s is not None], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it work to simply drop n
elements from the original browse path - why do we have this whole dynamic system here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes we include env + platform, sometimes just platform
@@ -208,3 +220,13 @@ def auto_browse_path_v2( | |||
path=[BrowsePathEntryClass(id=urn, urn=urn) for urn in paths[node]] | |||
), | |||
).as_workunit() | |||
processed_urns.add(node) | |||
|
|||
# Yield browse paths v2 based on browse paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Yield browse paths v2 based on browse paths | |
# Yield browse paths v2 based on browse paths v1 (legacy) ones. This only happens if the entity isn't part of a container hierarchy. |
@@ -45,6 +45,7 @@ class PipelineConfig(ConfigModel): | |||
source: SourceConfig | |||
sink: DynamicTypedConfig | |||
transformers: Optional[List[DynamicTypedConfig]] | |||
flags: Optional[Dict[str, bool]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flags should be a strongly typed config object, not a dict
it should also have a real default, not None
we should also add a comment saying that this is an experimental feature and we don't guarantee backwards compat on it
also, I'm thinking that maybe this belongs under SourceConfig and not at the PipelineConfig level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can just drop backwards compatibility guarantees that's super nice.
Do we not think we could have sink-related or transformer-related or pipeline-related flags? I like using the most generic version possible, until we have good reason to do differently
@@ -123,7 +123,7 @@ def check_golden_file( | |||
golden_path: Union[str, os.PathLike], | |||
ignore_paths: Optional[List[str]] = None, | |||
) -> None: | |||
update_golden = pytestconfig.getoption("--update-golden-files") | |||
update_golden = pytestconfig.getoption("--update-golden-files") or True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits around comments but otherwise lgtm
def get_config(self) -> Optional[ConfigModel]: | ||
return None | ||
return getattr(self, "config", None) or getattr(self, "source_config", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also check __config
since we use that in powerbi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, maybe let's add a comment explaining the context around this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll manually override for powerbi to keep things clean... do not want to encourage use of __config
for other / new sources. Will add a quick comment
class FlagsConfig(ConfigModel): | ||
"""Experimental flags for the ingestion pipeline. | ||
|
||
As this is an experimental feature, we do not guarantee backwards compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is an experimental feature, we do not guarantee backwards compatibility. | |
As ingestion flags are an experimental feature, we do not guarantee backwards compatibility. |
Checklist