feat(ingest): extract powerbi endorsements to tags #6638

looppi · 2022-12-05T12:56:37Z

Extract endorsements from workspace scan result to datasets, reports or dashboards.

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions · 2022-12-05T14:11:55Z

Unit Test Results (build & test)

621 tests 617 ✔️ 16m 8s ⏱️
157 suites     4 💤
157 files     0 ❌

Results for commit e8d4780.

github-actions · 2022-12-05T14:12:02Z

Unit Test Results (metadata ingestion)

      8 files       8 suites 59m 11s ⏱️
  767 tests   765 ✔️ 2 💤 0 ❌
1 536 runs 1 531 ✔️ 5 💤 0 ❌

Results for commit e8d4780.

siddiquebagwan · 2023-01-12T13:25:29Z

metadata-ingestion/docs/sources/powerbi/powerbi_recipe.yml

@@ -16,6 +16,8 @@ source:
    client_secret: bar
    # Enable / Disable ingestion of user information for dashboards
    extract_ownership: true
+    # Enable / Disable ingestion of endorsements
+    extract_endorsements_to_tags: false


We can keep it true by default.

As per John's comment, I'll keep this disabled by default.

siddiquebagwan · 2023-01-12T13:31:22Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/proxy.py

@@ -633,57 +673,6 @@ def get_pages_by_report(
            for raw_instance in response_dict["value"]
        ]

-    def get_reports(


Any specific reason to delete this method?

It wasn't called anymore as the reports were parsed from the workspace scan. No other reason, basically the logic moved to the part where the scan result is handled.

The previous implementation gathered the reports through the REST API, but the current API implementation doesn't include endorsementDetails in the payload; only the workspace scan has that data. That's the main reason why I ended up rewriting the logic.

Highly frustrating to have some data in the workspace scan and some data in the REST API.

siddiquebagwan · 2023-01-12T13:33:13Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/powerbi.py

@@ -422,6 +443,16 @@ def chart_custom_properties(dashboard: PowerBiAPI.Dashboard) -> dict:
        if owner_mcp is not None:
            list_of_mcps.append(owner_mcp)

+        if self.__config.extract_endorsements_to_tags and dashboard.tags:


Could we go for the append approach, where we append the MCP only if the flag is enabled, to avoid checking in multiple places?

Meaning something like the tags in the dataclasses are populated only when extract_endorsements_to_tags==True?

Then the code would look something like this, right?

    if dashboard.tags:
        list_of_mcps.append(
            self.new_mcp(
                Constant.DASHBOARD,
                dashboard_urn,
                Constant.GLOBAL_TAGS,
                self.transform_tags(dashboard.tags),
            )
        )

jjoyce0510 · 2023-01-13T03:55:50Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/config.py

@@ -139,6 +140,10 @@ class PowerBiAPIConfig(EnvBasedSourceConfigBase):
    extract_lineage: bool = pydantic.Field(
        default=True, description="Whether lineage should be ingested"
    )
+    # Enable/Disable extracting endorsements to tags
+    extract_endorsements_to_tags: bool = pydantic.Field(
+        default=True, description="Whether to extract endorsements to tags"


Can we default this to "False"? Most people may be surprised by this behavior, since this can overwrite existing Tags defined in DataHub

jjoyce0510 · 2023-01-13T03:56:23Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/config.py

@@ -139,6 +140,10 @@ class PowerBiAPIConfig(EnvBasedSourceConfigBase):
    extract_lineage: bool = pydantic.Field(
        default=True, description="Whether lineage should be ingested"
    )
+    # Enable/Disable extracting endorsements to tags
+    extract_endorsements_to_tags: bool = pydantic.Field(
+        default=True, description="Whether to extract endorsements to tags"


Please warn the author that enabling this will potentially overwrite Tags defined inside the DataHub application

You can also mention that there's an AddTagTransformer that supports semantic addition of tags (e.g. PATCH instead of REPLACE)
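The difference between the two behaviours mentioned here can be shown with a small stand-alone sketch. It is not DataHub code: `overwrite_tags` and `patch_tags` are hypothetical helpers, and the tag URNs are made-up examples. It only illustrates why a wholesale replace drops tags that were added in the DataHub application, while a patch-style merge keeps them.

```python
from typing import List


def overwrite_tags(existing: List[str], ingested: List[str]) -> List[str]:
    """Replace semantics: the ingested tag list wins wholesale."""
    return list(ingested)


def patch_tags(existing: List[str], ingested: List[str]) -> List[str]:
    """Patch semantics: keep existing tags and append new ones, without duplicates."""
    merged = list(existing)
    for tag in ingested:
        if tag not in merged:
            merged.append(tag)
    return merged


# Hypothetical example values, not taken from the PR.
existing = ["urn:li:tag:pii"]        # tag added by a user in the DataHub UI
ingested = ["urn:li:tag:Certified"]  # tag derived from a Power BI endorsement

print(overwrite_tags(existing, ingested))  # the user's tag is gone
print(patch_tags(existing, ingested))      # both tags survive
```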

jjoyce0510 · 2023-01-13T03:57:56Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/proxy.py

+        dashboards: List[PowerBiAPI.Dashboard] = []
+        dashboard_data = self.get_dashboard_data(workspace)
+
+        for scanned_dashboard in scan_result["dashboards"]:


Please extract into a separate method. This is getting too long

jjoyce0510 · 2023-01-13T03:58:24Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/proxy.py

+        for scanned_dashboard in scan_result["dashboards"]:
+            # Iterate through response and create a list of PowerBiAPI.Dashboard
+            dashboard_details = next(
+                (x for x in dashboard_data if x["id"] == scanned_dashboard["id"]), None


Is ID guaranteed to be present? Should we be defensive on this check?
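One way to make this lookup defensive, as a minimal sketch only: index the REST results by id once, skip entries without an id, and return None when the scanned item has no id or no match. The helper names `index_by_id` and `find_details` are hypothetical and do not appear in the PR.

```python
from typing import Any, Dict, List, Optional


def index_by_id(items: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build an id -> item lookup, skipping entries that carry no usable id."""
    return {item["id"]: item for item in items if item.get("id")}


def find_details(
    scanned: Dict[str, Any], details_by_id: Dict[str, Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return the matching details, or None when the id is missing or unknown."""
    scanned_id = scanned.get("id")
    if scanned_id is None:
        return None
    return details_by_id.get(scanned_id)
```

The caller can then skip a dashboard, or log a warning, when `find_details` returns None instead of failing on a missing key.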

jjoyce0510 · 2023-01-13T03:58:50Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/proxy.py

+            # Iterate through response and create a list of PowerBiAPI.Dashboard
+            dashboard = PowerBiAPI.Dashboard(
+                id=scanned_dashboard.get("id"),
+                isReadOnly=dashboard_details.get("isReadOnly"),


isReadOnly guaranteed to be there? Always?

jjoyce0510 · 2023-01-13T04:00:43Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/proxy.py

                workspace_id=workspace.id,
                workspace_name=workspace.name,
                tiles=[],
                users=[],
+                tags=self.parse_endorsement(


Is this a safe "get"? Will endorsementDetails always be there?
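A sketch of a parse step that tolerates a missing endorsementDetails entry. The shape of the payload is an assumption here (a dict with an `endorsement` key), and this `parse_endorsement` is an illustrative stand-in, not the method from the PR.

```python
from typing import Any, Dict, List, Optional


def parse_endorsement(endorsement_details: Optional[Dict[str, Any]]) -> List[str]:
    """Return a list of tag names, or an empty list when nothing is endorsed."""
    if not endorsement_details:
        return []
    # Assumed payload shape: {"endorsement": "<label>", ...}
    endorsement = endorsement_details.get("endorsement")
    return [endorsement] if endorsement else []


# Hypothetical scan entry with no endorsementDetails key at all.
scanned_dashboard = {"id": "d-1", "displayName": "Sales"}
tags = parse_endorsement(scanned_dashboard.get("endorsementDetails"))
```

Because `dict.get` returns None for an absent key and the function treats None as "no endorsement", an unendorsed dashboard simply gets an empty tag list.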

jjoyce0510 · 2023-01-13T04:01:58Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/proxy.py

@@ -877,6 +868,45 @@ def init_dashboard_tiles(workspace: PowerBiAPI.Workspace) -> None:

            return None

+        def handle_report(report_data: dict) -> Optional[PowerBiAPI.Report]:


Why is this change required?

jjoyce0510

Overall, I'd recommend reducing the surface area of this PR. It changes too many things at once :) Any way to break it into a few smaller chunks?

Cheers
John

looppi · 2023-01-16T13:30:26Z

I've refactored the implementation a bit and brought back the original implementation for fetching reports. The implementation now collects endorsements by id into a dict on the Workspace dataclass, and those dicts are used when parsing dashboards and reports. As the datasets are parsed from the scan result directly, they need no such handling; the scan result data is used as-is.

With this change, there's no need to separate [App] reports and dashboards from the scan result and it "stays true" to the original implementation.

What do you think?
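The approach described in this comment could look roughly like the following. This is a sketch under assumptions, not the merged code: the `endorsements` field name, the `collect_endorsements` helper and the payload keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Workspace:
    id: str
    name: str
    # Hypothetical field: entity id -> endorsement tag names
    endorsements: Dict[str, List[str]] = field(default_factory=dict)


def collect_endorsements(workspace: Workspace, scan_result: Dict[str, Any]) -> None:
    """Record endorsements from the scan result, keyed by dashboard or report id."""
    for key in ("dashboards", "reports"):
        for entity in scan_result.get(key, []):
            entity_id = entity.get("id")
            details = entity.get("endorsementDetails") or {}
            endorsement = details.get("endorsement")
            if entity_id and endorsement:
                workspace.endorsements[entity_id] = [endorsement]
```

When a report or dashboard is later built from the REST API response, its tags can be looked up with `workspace.endorsements.get(entity_id, [])`, so the REST-based fetching logic stays unchanged.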

jjoyce0510 · 2023-01-18T03:44:22Z

metadata-ingestion/docs/sources/powerbi/powerbi_pre.md

+
+By default, extracting endorsement information to tags is disabled. The feature may be useful if organization uses [endorsements](https://learn.microsoft.com/en-us/power-bi/collaborate-share/service-endorse-content) to identify content quality.
+
+Please note that the default implementation overwrites tags for the ingested entities, if you need to preserve existing tags, consider using a [transformer](../../../../metadata-ingestion/docs/transformer/dataset_transformer.md#simple-add-dataset-globaltags) with `semantics: PATCH` tags instead of `OVERWRITE`.


jjoyce0510

Changes look great!

Going to give this a big old "LGTM". Thanks for the hard work!

feat(ingest): extract powerbi endorsements to tags

e8d4780

github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Dec 5, 2022

jjoyce0510 requested a review from siddiquebagwan-gslab December 5, 2022 21:22

anshbansal added the community-contribution label (PR or Issue raised by member(s) of DataHub Community) Dec 6, 2022

looppi added 3 commits January 9, 2023 15:38

Merge branch 'master' into powerbi-endorsement-to-tag

fbb8181

fix: formatting and type fixes

9bdbe96

Merge branch 'master' into powerbi-endorsement-to-tag

ab98c02

looppi changed the title ~~DRAFT feat(ingest): extract powerbi endorsements to tags~~ feat(ingest): extract powerbi endorsements to tags Jan 10, 2023

siddiquebagwan suggested changes Jan 12, 2023

View reviewed changes

fix: change endorsement data extraction default to true

ccc8bab

jjoyce0510 reviewed Jan 13, 2023

View reviewed changes

siddiquebagwan mentioned this pull request Jan 13, 2023

feat(ingest) Add PowerBI Admin only connector #7009

Closed

5 tasks

looppi added 5 commits January 13, 2023 16:21

fix: endorsements to tags should be disabled by default

f5e0081

fix: reduce amount of changed code

c6bb78e

fix: simplify if statements for tag mce creation

d2f3f55

Merge branch 'master' into powerbi-endorsement-to-tag

587123a

fix: update golden file to match upstream changes

0d75e77

looppi added 3 commits January 16, 2023 16:25

fix: refactor to more readable format

7865f85

Merge branch 'master' into powerbi-endorsement-to-tag

d654421

docs: add alert about losing tags when using endorsements to tags

ad9ef1d

jjoyce0510 reviewed Jan 18, 2023

View reviewed changes

jjoyce0510 approved these changes Jan 18, 2023

View reviewed changes

jjoyce0510 merged commit 87b3a5d into datahub-project:master Jan 18, 2023

shirshanka pushed a commit to shirshanka/datahub that referenced this pull request Jan 18, 2023

feat(ingest): extract powerbi endorsements to tags (datahub-project#6638)

9820927

ericyomi pushed a commit to ericyomi/datahub that referenced this pull request Jan 18, 2023

feat(ingest): extract powerbi endorsements to tags (datahub-project#6638)

1ffcd89
