feat(ingest/snowflake): optionally emit all upstreams irrespective of recipe pattern #7842

Merged


mayurinehate
Collaborator

@mayurinehate mayurinehate commented Apr 18, 2023

Because upstreams are not minted automatically, this change does not create ghost entities. The UI also currently hides non-minted upstreams.
If separate recipes are used to ingest different Snowflake databases in the same Snowflake account, set the config below to emit all upstreams:

validate_upstreams_against_patterns: false

However, using a single recipe per Snowflake account remains the recommended solution.
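
A minimal sketch of the behavior this flag toggles, not the connector's actual code. The helper name `filter_upstreams` and the regex-list pattern format are assumptions made for illustration; only `validate_upstreams_against_patterns` comes from this PR, and its default of `True` is inferred from the note that emitting all upstreams is off by default.

```python
import re
from typing import Iterable, List


def filter_upstreams(
    upstreams: Iterable[str],
    allow_patterns: List[str],
    validate_upstreams_against_patterns: bool = True,
) -> List[str]:
    """Return the upstream names that would be emitted as lineage.

    Hypothetical helper: with validation on (assumed default), only
    upstreams matching an allow pattern are kept; with it off, every
    upstream is kept regardless of the recipe pattern.
    """
    upstreams = list(upstreams)
    if not validate_upstreams_against_patterns:
        return upstreams
    compiled = [re.compile(p, re.IGNORECASE) for p in allow_patterns]
    return [u for u in upstreams if any(c.match(u) for c in compiled)]


# Illustrative names only; a recipe scoped to SALES_DB sees an upstream
# that lives in another database of the same account.
upstreams = ["SALES_DB.PUBLIC.ORDERS", "FINANCE_DB.PUBLIC.INVOICES"]
allow = [r"SALES_DB\..*"]

kept_default = filter_upstreams(upstreams, allow)
kept_all = filter_upstreams(
    upstreams, allow, validate_upstreams_against_patterns=False
)
```

With validation on, the cross-database upstream is dropped; with it off, both upstreams are emitted, which is the case the per-database recipe setup needs.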

Also added tests for Snowflake legacy lineage (the default lineage method as of now).

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) label Apr 18, 2023
@@ -228,36 +228,32 @@ def _populate_table_lineage(self):
def get_table_upstream_workunits(self, discovered_tables):
    if self.config.include_table_lineage:
        for dataset_name in discovered_tables:
            if self._is_dataset_pattern_allowed(
Collaborator Author

This pattern check is not required, as it's already applied to discovered_tables here.


def get_view_upstream_workunits(self, discovered_views):
    if self.config.include_view_lineage:
        for view_name in discovered_views:
            if self._is_dataset_pattern_allowed(view_name, "view"):
Collaborator Author

This pattern check is not required, as it's already applied to discovered_views here.
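
The reasoning in the two review comments above can be sketched as follows: if discovery already applies the recipe pattern, repeating the same check inside the lineage loop filters nothing further. The names `ALLOW`, `is_allowed`, and `discover` are hypothetical stand-ins, not the connector's methods.

```python
import re
from typing import List

# Hypothetical recipe pattern, for illustration only.
ALLOW = re.compile(r"TEST_DB\.TEST_SCHEMA\..*")


def is_allowed(name: str) -> bool:
    """Stand-in for a dataset pattern check."""
    return ALLOW.match(name) is not None


def discover(all_names: List[str]) -> List[str]:
    """Stand-in for discovery, which already applies the pattern."""
    return [n for n in all_names if is_allowed(n)]


all_names = ["TEST_DB.TEST_SCHEMA.TABLE_1", "OTHER_DB.PUBLIC.TABLE_9"]
discovered = discover(all_names)

# Repeating the check on already-filtered names is a no-op.
rechecked = [n for n in discovered if is_allowed(n)]
```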

@@ -285,6 +285,30 @@ def default_query_results(query): # noqa: C901
),
}
for op_idx in range(1, NUM_OPS + 1)
] + [
Collaborator Author

This change is in the mocked query result for the legacy lineage method.

@@ -307,7 +331,11 @@ def default_query_results(query): # noqa: C901
{
"upstream_object_name": "TEST_DB.TEST_SCHEMA.VIEW_1",
"upstream_object_domain": "VIEW",
}
},
Collaborator Author

This change is in the mocked query result for the new optimised lineage method.

]
+ ( # This additional upstream is only for TABLE_1
Collaborator Author

This change is in the mocked query result for the new optimised lineage method, table lineage only.
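
A rough sketch of the shape of such mocked lineage rows, assuming every table shares one upstream and TABLE_1 alone gets an extra one. The function `mock_upstreams_for` and the extra upstream's name are hypothetical; only the `upstream_object_name` and `upstream_object_domain` keys and the `TEST_DB.TEST_SCHEMA.VIEW_1` value appear in the diff.

```python
from typing import Dict, List


def mock_upstreams_for(table_name: str) -> List[Dict[str, str]]:
    """Return mocked upstream rows for one downstream table (illustrative)."""
    upstreams = [
        {
            "upstream_object_name": "TEST_DB.TEST_SCHEMA.VIEW_1",
            "upstream_object_domain": "VIEW",
        }
    ]
    # This additional upstream is only for TABLE_1. Placing it in another
    # database (outside a TEST_DB-only recipe pattern) is an assumption.
    if table_name == "TEST_DB.TEST_SCHEMA.TABLE_1":
        upstreams.append(
            {
                "upstream_object_name": "OTHER_DB.OTHER_SCHEMA.TABLE_1",
                "upstream_object_domain": "TABLE",
            }
        )
    return upstreams


table_1_rows = mock_upstreams_for("TEST_DB.TEST_SCHEMA.TABLE_1")
table_2_rows = mock_upstreams_for("TEST_DB.TEST_SCHEMA.TABLE_2")
```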

@mayurinehate mayurinehate changed the title feat(ingest/snowflake): emit all upstreams irrespective of recipe pattern feat(ingest/snowflake): optionally emit all upstreams irrespective of recipe pattern Apr 19, 2023
Collaborator

@asikowitz asikowitz left a comment


Do we want to support customers making multiple ingestion pipelines for the same source? I thought our solution here was to combine into one. However, if this is something we want to support, then this looks good to me

@mayurinehate
Collaborator Author

> Do we want to support customers making multiple ingestion pipelines for the same source? I thought our solution here was to combine into one. However, if this is something we want to support, then this looks good to me

Combining recipes into one is the recommended solution, hence this config is disabled by default. The option exists only for cases where, for some reason, per-database recipes need to be kept separate.
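
For the separate-recipes case, a per-database setup might look roughly like the sketch below, written as Python dicts. The helper `build_recipe`, the account and database names, and the `account_id` and `database_pattern` fields are assumptions to check against the Snowflake source documentation; credentials and the sink are omitted. Only `validate_upstreams_against_patterns` is taken from this PR.

```python
from typing import Any, Dict, List


def build_recipe(account_id: str, database: str) -> Dict[str, Any]:
    """Build one recipe dict scoped to a single database (hypothetical helper)."""
    return {
        "source": {
            "type": "snowflake",
            "config": {
                # Assumed field names; verify against the connector docs.
                "account_id": account_id,
                "database_pattern": {"allow": [f"^{database}$"]},
                # From this PR: keep upstreams that fall outside the pattern.
                "validate_upstreams_against_patterns": False,
            },
        },
    }


# One recipe per database in the same (illustrative) account.
recipes: List[Dict[str, Any]] = [
    build_recipe("my_account", db) for db in ("SALES_DB", "FINANCE_DB")
]
```

Each recipe still ingests only its own database; the flag just stops lineage edges to the other database from being dropped.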

@asikowitz asikowitz merged commit 3212e74 into datahub-project:master Apr 24, 2023
iprentic pushed a commit that referenced this pull request Apr 27, 2023