Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add RedashDashboardExtractor for extracting dashboards from redash.io #300

Merged
merged 3 commits into from Jul 7, 2020

Conversation

jonhehir
Copy link
Contributor

@jonhehir jonhehir commented Jul 2, 2020

Summary of Changes

Gets dashboard metadata from Redash. Quoting from the README:

The included RedashDashboardExtractor provides support for extracting basic metadata for Redash dashboards (dashboard name, owner, URL, created/updated timestamps, and a generated description) and their associated queries (query name, URL, and raw query). It can be extended with a configurable table parser function to also support extraction of DashboardTable metadata.

Note: DashboardUsage and DashboardExecution metadata are not supported in this extractor, as these concepts are not supported by the Redash API.

Tests

Added unit tests to cover the new code.

Documentation

Added a section about RedashDashboardExtractor to README.md.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

@jonhehir jonhehir mentioned this pull request Jul 2, 2020
@codecov-commenter
Copy link

codecov-commenter commented Jul 2, 2020

Codecov Report

Merging #300 into master will increase coverage by 1.13%.
The diff coverage is 94.51%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #300      +/-   ##
==========================================
+ Coverage   73.17%   74.30%   +1.13%     
==========================================
  Files         103      105       +2     
  Lines        4328     4492     +164     
  Branches      384      419      +35     
==========================================
+ Hits         3167     3338     +171     
+ Misses       1053     1049       -4     
+ Partials      108      105       -3     
Impacted Files Coverage Δ
...tor/dashboard/redash/redash_dashboard_extractor.py 92.45% <92.45%> (ø)
...tractor/dashboard/redash/redash_dashboard_utils.py 98.18% <98.18%> (ø)
databuilder/rest_api/base_rest_api_query.py 94.11% <100.00%> (+1.26%) ⬆️
databuilder/task/neo4j_staleness_removal_task.py 85.00% <0.00%> (+0.83%) ⬆️
databuilder/publisher/neo4j_csv_publisher.py 84.43% <0.00%> (+0.94%) ⬆️
databuilder/extractor/mssql_metadata_extractor.py 95.52% <0.00%> (+1.49%) ⬆️
...abuilder/extractor/snowflake_metadata_extractor.py 95.31% <0.00%> (+1.56%) ⬆️
...tabuilder/extractor/postgres_metadata_extractor.py 95.16% <0.00%> (+1.61%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21a763a...8d78d89. Read the comment docs.

@feng-tao
Copy link
Member

feng-tao commented Jul 2, 2020

thanks @jonhehir , I cc a few other people and will take a look next week once back from holiday. A few questions I have in mind before looking at the pr:

  • I wonder whether the redash API is well documented. If so, it would be good to link its doc on which API the extractor tries to call to get the result.
  • Does the API be available after certain versions? Does Asana have any customization on the endpoints? It may be to document on which versions(afterwards) the extractors are targeting. For context, we use superset at Lyft, which is also viz tool like redash. But our preview client hits some internal endpoints which our actual preview client can't be open sourced.
  • Does redash support RBAC? And does it require the airflow task which executes the extractor to impersonate certain users to get the result?

Also we plan to remove the py27 test and make it full py3 .

Thanks a lot.

@jonhehir
Copy link
Contributor Author

jonhehir commented Jul 6, 2020

Thanks, @feng-tao! Some brief responses are below:

I wonder whether the redash API is well documented. If so, it would be good to link its doc on which API the extractor tries to call to get the result.

The Redash API docs are incomplete. There is some basic documentation that includes the two endpoints that this relies on: GET /api/dashboards and GET /api/dashboards/<dashboard_slug>. If you'd like, I can list these endpoints in the documentation.

Does the API be available after certain versions? Does Asana have any customization on the endpoints? It may be to document on which versions(afterwards) the extractors are targeting. For context, we use superset at Lyft, which is also viz tool like redash. But our preview client hits some internal endpoints which our actual preview client can't be open sourced.

We're only relying on the public API for this extractor, nothing internal. (We do have some internal extractors for additional information like recent dashboard view count.)

As for versions, that's a great question. We have tested this on Redash 8 and expect it to work for Redash 9 as well.

Does redash support RBAC? And does it require the airflow task which executes the extractor to impersonate certain users to get the result?

Yes. The type of authentication we're using here is a user-based authentication in Redash. The extractor needs to provide the API key of a user that has access to any dashboards and queries you want to index. We are not impersonating individuals other than a single API user for extraction.

On a related note, we did not build support for dashboard previews because of this RBAC concern. Some users may not have full access to every dashboard that is listed, so we did not want to build a preview that would reveal an image of the dashboard to all users.

feng-tao
feng-tao previously approved these changes Jul 7, 2020
Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jonhehir for the info. I think it would be good to update the doc about which endpoint it relies on as well as the redash version that it tests against. But besides that, the code looks great!

Let me know if you have the chance to update the doc or we could commit the pr as it is. And we could follow up on the doc in a different pr.

@feng-tao
Copy link
Member

feng-tao commented Jul 7, 2020

also it would be super awesome if you could write an engineering blog post about the redash integration :)

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jonhehir !

@feng-tao feng-tao merged commit f1b0dfa into amundsen-io:master Jul 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants