Feature Proposal: Tableau dashboard extractors for databuilder #565

Closed
ccarterlandis opened this issue Jul 17, 2020 · 2 comments
Labels
status:needs_votes Issue or bug fix that needs support from the community to be considered type:feature A new feature request

Comments

@ccarterlandis
Contributor

Expected Behavior or Use Case

With the recent addition of dashboard support for Amundsen, it's now possible to build extractors for Tableau dashboards and visualizations. Since Tableau is widely used as a data visualization and analysis tool, having the ability to index these Tableau dashboards inside Amundsen gives better context for how the data is actually being used and enables users to discover and share dashboards and visualizations that have already been built.

Service or Ingestion ETL

These extractors would be implemented in the amundsendatabuilder module. Currently, the extractors would not require changes to any other Amundsen module.

Possible Implementation

This proposal is currently a work in progress. You can track the progress here: amundsen-io/amundsendatabuilder#303

Overview

The extractors are built around Tableau workbooks being the Amundsen equivalent of a dashboard. The extractors utilize Tableau's Metadata API to query information about workbooks and their associated entities like projects (dashboard_groups), custom SQL queries (dashboard_query), and sheets/dashboards within the workbooks (dashboard_chart).
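As a rough illustration of the mapping described above, the sketch below builds a Metadata API request body and flattens a decoded response into dashboard-shaped records. It is not the code from the linked PR: the GraphQL field names (`workbooks`, `projectName`, `sheets`, `dashboards`) reflect my understanding of the Metadata API schema and should be checked against your Tableau Server version, and `build_payload`, `flatten_workbooks`, and the record keys are hypothetical names chosen for this example.

```python
import json

# Assumed query shape; verify the field names against the Metadata API
# schema exposed by your Tableau Server before relying on them.
WORKBOOKS_QUERY = """
query {
  workbooks {
    name
    projectName
    sheets { name }
    dashboards { name }
  }
}
"""


def build_payload(query=WORKBOOKS_QUERY):
    """Serialize a GraphQL query into the JSON body of a POST request."""
    return json.dumps({"query": query})


def flatten_workbooks(response):
    """Turn a decoded Metadata API response into flat dashboard records.

    Each workbook becomes one record (the Amundsen dashboard); its project
    becomes the dashboard group, and its sheets and dashboards are listed
    together as charts.
    """
    records = []
    for workbook in response["data"]["workbooks"]:
        charts = [sheet["name"] for sheet in workbook.get("sheets", [])]
        charts += [dash["name"] for dash in workbook.get("dashboards", [])]
        records.append({
            "dashboard_group": workbook["projectName"],
            "dashboard_name": workbook["name"],
            "dashboard_charts": charts,
        })
    return records


# Hand-written sample response, used only to exercise the function.
sample_response = {
    "data": {
        "workbooks": [
            {
                "name": "Payroll Overview",
                "projectName": "Finance",
                "sheets": [{"name": "Monthly Totals"}],
                "dashboards": [{"name": "Summary"}],
            }
        ]
    }
}

records = flatten_workbooks(sample_response)
```

Keeping the flattening step separate from the HTTP call makes it easy to test against recorded responses without a live Tableau Server.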

Relations between the Amundsen dashboard model and Tableau

Luckily, the Tableau Metadata API uses a GraphQL schema for querying, so retrieving the data and loading it into Amundsen's Neo4j graph model is relatively straightforward. However, there are a few notable differences in the conceptual models that need to be addressed:

  • In Amundsen, dashboard charts belong to dashboard queries; that is, each chart is built on a query. However, in Tableau, the closest thing to a chart is a "sheet", which is not necessarily built on a custom SQL query. There are a few options for resolving this, such as assigning the same empty query object to every Tableau sheet that is not built on a query, or updating the model to restructure the hierarchy between dashboard charts and dashboard queries.
  • While Tableau does support multi-level projects, for simplicity's sake the extractor currently only uses the top level projects to create the dashboard groups. Is this desirable behavior?
  • Usage statistics are available only at the sheet level, and are only available through Tableau's REST API (https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm). Should these be aggregated by workbook and count towards workbook views/frequent users, or ignored?
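To make two of the options above concrete, here is a minimal sketch of the "shared empty query" approach and of aggregating sheet-level view counts by workbook. The input shape (dicts with `workbook`, `views`, and `custom_sql_name` keys), the placeholder name, and both function names are assumptions made for this example; they do not come from Tableau's APIs or from the Amundsen model.

```python
from collections import defaultdict

# Hypothetical placeholder shared by every sheet that has no custom SQL query.
EMPTY_QUERY_NAME = "(no custom SQL)"


def query_name_for(sheet):
    """Return the sheet's custom SQL query name, or the shared placeholder."""
    return sheet.get("custom_sql_name") or EMPTY_QUERY_NAME


def views_by_workbook(sheets):
    """Sum sheet-level view counts into one total per workbook."""
    totals = defaultdict(int)
    for sheet in sheets:
        totals[sheet["workbook"]] += sheet["views"]
    return dict(totals)


# Hand-written sample data, used only to exercise the functions.
sheets = [
    {"workbook": "Payroll Overview", "name": "Monthly Totals",
     "views": 40, "custom_sql_name": "payroll_monthly"},
    {"workbook": "Payroll Overview", "name": "Headcount", "views": 2},
    {"workbook": "Sales", "name": "Pipeline", "views": 7},
]

query_names = [query_name_for(sheet) for sheet in sheets]
workbook_views = views_by_workbook(sheets)
```

Summing views is only one possible aggregation; a workbook whose sheets are always viewed together would be overcounted, so taking the maximum per workbook might be a better fit in some deployments.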

Technical notes

  • The Tableau dashboard table extractor will likely be specific to each implementation of Amundsen. Currently, there are no solid plans to build an open source version of this extractor, but I would be interested in discussing what a generic version might look like if that would prove useful.
  • Most of the data needed is available through the Tableau Metadata API, but some, like project descriptions, dashboard previews, and usage statistics, is only available through the Tableau REST API. The two APIs share authorization tokens, so there is some element of re-usability that can be abstracted out. While there is currently no solid plan for how these calls to the Tableau REST API will hook into the rest of the extractors, we would like to include this data in the final integration, so any ideas are welcome.
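One way the shared authorization could be abstracted is a small helper that signs in once through the REST API and hands the resulting token to both clients. The sketch below only builds the requests and does not send them. The sign-in path, the personal access token body, the `X-Tableau-Auth` header, and the `/api/metadata/graphql` path reflect my understanding of Tableau's documentation and should be verified; `API_VERSION` is a placeholder, and all function names are hypothetical.

```python
import json
import urllib.request

# Placeholder; use the REST API version that matches your Tableau Server.
API_VERSION = "3.8"


def signin_request(server, token_name, token_secret, site=""):
    """Build (but do not send) a REST API sign-in request."""
    body = {
        "credentials": {
            "personalAccessTokenName": token_name,
            "personalAccessTokenSecret": token_secret,
            "site": {"contentUrl": site},
        }
    }
    return urllib.request.Request(
        f"{server}/api/{API_VERSION}/auth/signin",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def token_from_signin(response_body):
    """Extract the session token from a JSON sign-in response body."""
    return json.loads(response_body)["credentials"]["token"]


def auth_headers(token):
    """Headers reused for both REST API and Metadata API calls."""
    return {"X-Tableau-Auth": token, "Content-Type": "application/json"}


def metadata_url(server):
    """Assumed location of the Metadata API GraphQL endpoint."""
    return f"{server}/api/metadata/graphql"


# Usage: build the sign-in request, then reuse one token for both APIs.
request = signin_request("https://tableau.example.com", "name", "secret", "mysite")
headers = auth_headers(token_from_signin('{"credentials": {"token": "abc"}}'))
```

In a real extractor the request would be sent with `urllib.request.urlopen(request)` (or an HTTP library of your choice) and the token cached for the lifetime of the job, so that Metadata API and REST API extractors do not each sign in separately.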

Context

@alevene and I are building this integration on behalf of Gusto. For Gusto's use case, we are interested in exposing Tableau resources in Amundsen to make existing dashboards easier to discover, so that we can avoid duplicate dashboard development and provide background on the provenance/lineage of the dashboards.

@feng-tao feng-tao added keep fresh Disables stalebot from closing an issue Project: All status:needs_votes Issue or bug fix that needs support from the community to be considered type:feature A new feature request labels Jul 17, 2020
@dorianj
Contributor

dorianj commented Mar 10, 2021

This is implemented, right? Please re-open if I missed something major; if we want to make enhancements, new smaller tickets would be good.

@dorianj dorianj closed this as completed Mar 10, 2021
@ccarterlandis
Contributor Author

Yep, this is implemented - my internship at Gusto ended before the PR got merged, so I totally forgot about this issue. Sorry about that! I think you are right to close it 🚀

dorianj added a commit to dorianj/amundsen that referenced this issue Apr 25, 2021
Signed-off-by: Dorian Johnson <2020@dorianj.net>
feng-tao pushed a commit that referenced this issue May 7, 2021
Signed-off-by: Dorian Johnson <2020@dorianj.net>
hansadriaans pushed a commit to DataChefHQ/amundsen that referenced this issue Jun 30, 2022
Signed-off-by: Dorian Johnson <2020@dorianj.net>
@Golodhros Golodhros removed the keep fresh Disables stalebot from closing an issue label Dec 15, 2022