feat(ingestion/tableau): support column level lineage for custom sql #8466

Merged
21 commits merged on Aug 1, 2023

@siddiquebagwan (Contributor) commented Jul 20, 2023

  • Adds column-level lineage (CLL) extraction for custom SQL data sources (self.emit_custom_sql_datasources())

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Jul 20, 2023
@hsheth2 (Collaborator) left a comment

Some high-level questions:

  • It looks like we're calling parse_custom_sql twice, once for table lineage and once for column lineage. What's the reasoning behind that?
  • If self.ctx.graph is unset, we should still be able to get table lineage using sqlglot_lineage. It might be necessary to use the underlying method instead of graph.parse_sql_lineage.
  • Does it make sense to fail if database info is missing? What was the logic behind that?
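The first two questions point at one pattern: parse the custom SQL once and derive both the table-level upstreams and the column-level lineage from that single result, with the parser passed in so it can be graph-backed or run without a graph. A minimal sketch under that assumption; ParseResult, ColumnMapping and build_lineage are hypothetical, much-simplified stand-ins loosely modeled on the SqlParsingResult mentioned later in this thread, not the connector's actual code:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ColumnMapping:
    # Hypothetical stand-in: one output column and the (table, column) pairs it reads.
    downstream_column: str
    upstreams: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class ParseResult:
    # Hypothetical stand-in loosely modeled on SqlParsingResult.
    in_tables: List[str] = field(default_factory=list)
    column_lineage: Optional[List[ColumnMapping]] = None


def build_lineage(
    query: str, parse: Callable[[str], Optional[ParseResult]]
) -> Tuple[List[str], Dict[str, List[Tuple[str, str]]]]:
    """Parse the query once; return (upstream tables, column-level lineage)."""
    result = parse(query)  # a single parse shared by both kinds of lineage
    if result is None:
        return [], {}
    upstream_tables = sorted(set(result.in_tables))
    # Column-level lineage is optional; table lineage does not depend on it.
    column_map: Dict[str, List[Tuple[str, str]]] = {
        info.downstream_column: list(info.upstreams)
        for info in result.column_lineage or []
    }
    return upstream_tables, column_map


if __name__ == "__main__":
    calls: List[str] = []

    def fake_parse(sql: str) -> ParseResult:
        # Stand-in for either a graph-backed or a purely local parser.
        calls.append(sql)
        return ParseResult(
            in_tables=["invent_dw.UserDetail"],
            column_lineage=[
                ColumnMapping("user_id", [("invent_dw.UserDetail", "user_id")])
            ],
        )

    tables, columns = build_lineage("SELECT user_id FROM invent_dw.UserDetail", fake_parse)
    print(len(calls), tables, columns)
```

Because table lineage only needs in_tables, it is still produced when the parser returns no column lineage.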

@siddiquebagwan siddiquebagwan marked this pull request as ready for review July 25, 2023 06:26
@hsheth2 (Collaborator) left a comment

Had a few comments on the unit test, and it seems like one of the Tableau tests is still failing.

"query": "SELECT user_id, source, user_source FROM (SELECT *, ROW_NUMBER() OVER (partition BY user_id ORDER BY __partition_day DESC) AS rank_ FROM invent_dw.UserDetail ) source_user WHERE rank_ = 1",
"isUnsupportedCustomSql": "true",
"database": {
"name": "production database",

Collaborator:

Let's change this to my-bigquery-project.

Given that it's set, shouldn't this be used as upstream_db and eventually passed as default_db to sqlglot_lineage?
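To illustrate what passing the data source's database name as default_db would buy: a parser can use it to fully qualify table references that omit the database, such as invent_dw.UserDetail in the fixture query. The qualify_table helper below is a hypothetical, simplified illustration of that idea using plain string splitting; it is not how sqlglot_lineage resolves names internally, and it ignores quoting and dialect rules.

```python
from typing import Optional


def qualify_table(name: str, default_db: Optional[str]) -> str:
    """Hypothetical helper: prepend default_db to a two-part schema.table name."""
    parts = name.split(".")
    # Only schema.table references lack a database; leave other shapes untouched.
    if default_db and len(parts) == 2:
        return f"{default_db}.{name}"
    return name


if __name__ == "__main__":
    print(qualify_table("invent_dw.UserDetail", "my-bigquery-project"))
    print(qualify_table("invent_dw.UserDetail", None))
```

Without a default database the reference is returned unchanged, so a missing database name degrades to less-qualified lineage rather than an error.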

mcp.entityUrn
== "urn:li:dataset:(urn:li:dataPlatform:tableau,09988088-05ad-173c-a2f1-f33ba3a13d1a,PROD)"
)
sqlglot_lineage.return_value = SqlParsingResult( # type:ignore

Collaborator:

Why do we need to mock this? Does it not produce correct results otherwise?

Contributor Author:

test_tableau_cll_ingest covers the flow where graph is None. In this test case I am providing a mock graph, so without mocking sqlglot_lineage it returns an empty array.
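The author's explanation is that, with a mock graph in place, the unmocked sqlglot_lineage returns an empty result, so the test patches it to return a fixed result instead. A minimal, self-contained sketch of that testing pattern with unittest.mock, assuming a hypothetical parsing namespace, ParseResult type and extract_upstreams function standing in for the real connector code:

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import List
from unittest import mock


@dataclass
class ParseResult:
    # Hypothetical stand-in for SqlParsingResult.
    in_tables: List[str] = field(default_factory=list)


def _unpatched_lineage(sql: str) -> ParseResult:
    # Hypothetical stand-in for the real parser returning nothing in this setup.
    return ParseResult()


# Hypothetical namespace standing in for the module that exposes sqlglot_lineage.
parsing = SimpleNamespace(sqlglot_lineage=_unpatched_lineage)


def extract_upstreams(sql: str) -> List[str]:
    # Looks the function up at call time, so a patch on 'parsing' takes effect.
    return parsing.sqlglot_lineage(sql).in_tables


if __name__ == "__main__":
    sql = "SELECT user_id FROM invent_dw.UserDetail"
    print(extract_upstreams(sql))  # empty list: no patch applied
    with mock.patch.object(parsing, "sqlglot_lineage") as fake:
        fake.return_value = ParseResult(in_tables=["invent_dw.UserDetail"])
        print(extract_upstreams(sql))  # the canned upstream table
        fake.assert_called_once_with(sql)
```

Patching the name where the code under test looks it up is what makes the canned return_value visible; the original function is restored when the with block exits.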

Collaborator:

I don't fully follow this; let's talk about it next time we chat.

@hsheth2 hsheth2 added the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Jul 27, 2023
@hsheth2 (Collaborator) commented Jul 27, 2023

@mohdsiddique looks like there's a merge conflict

@anshbansal anshbansal merged commit 547e1f4 into datahub-project:master Aug 1, 2023
43 checks passed
yoonhyejin pushed a commit that referenced this pull request Aug 24, 2023
…8466)

Co-authored-by: MohdSiddiqueBagwan <mohdsiddique.bagwan@gslab.com>
spadhi7 added a commit to spadhi7/datahub that referenced this pull request Aug 29, 2023
* tag 'v0.10.5': (222 commits)
  fix(test): increase siblings.js test stability (datahub-project#8542)
  feat(search): Allow aggregating on facets that are not explicitly part of default filter set (datahub-project#8540)
  fix(ui) Make multiple small updates to new search and browse (datahub-project#8524)
  feat(presto-on-hive): allow v1 fieldpaths in the presto-on-hive source (datahub-project#8474)
  feat(cli): Adds ability to upload recipes to DataHub's UI (datahub-project#8317)
  feat(browseV2): add browseV2 logic to system update (datahub-project#8506)
  fix(ingest/json-schema): convert non-string enums to strings (datahub-project#8479)
  feat(ingestion/tableau): support column level lineage for custom sql (datahub-project#8466)
  test(ingest): test case statements with sql parser (datahub-project#8437)
  feat(ingest/vertica): performance improvement and bug fixes (datahub-project#8328)
  ci: reduce git fetch depth (datahub-project#8473)
  fix(ingest): remove duplication of tags (datahub-project#8532)
  docs: small update to homepage (datahub-project#8483)
  fix(ingest): pin boto3-stubs in CI (datahub-project#8527)
  feat(siblings): hiding non-existant siblings in FE (datahub-project#8528)
  fix(ingest/build): Fix sagemaker mypy and flake8 issues (datahub-project#8530)
  feat(metrics): add metrics for aspect write and bytes (datahub-project#8526)
  feat(elasticsearch): allow bulk delete (datahub-project#8424)
  fix(ui): use locale lowercase when filtering columns of an entity in the lineage (datahub-project#8213)
  fix(auth): ignore case when comparing http headers (datahub-project#8356)
  ...
Labels
ingestion: PR or Issue related to the ingestion of metadata
merge-pending-ci: A PR that has passed review and should be merged once CI is green.

4 participants