Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use Scarf Gateway for Superset helm charts/Docker compose downloads #24432

Merged

Conversation

arjundevarajan
Copy link
Contributor

@arjundevarajan arjundevarajan commented Jun 16, 2023

SUMMARY

This PR updates the Superset configuration for helm charts and Docker compose to fetch Superset containers via a Scarf endpoint, so that Superset maintainers can collect basic de-identified download and adoption metrics. It does not affect where the containers are being hosted, as Scarf is only redirecting traffic back to Docker Hub.

This change was suggested by Superset maintainers in direct discussions.

Scarf purges PII and provides aggregated statistics. Superset users can easily opt out of analytics in various ways documented here.

TESTING INSTRUCTIONS

To test this, download Apache Superset using the new endpoint (e.g. docker pull apachesuperset.docker.scarf.sh/apache/superset) and verify that the apache/superset container downloads without issue.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • [ x ] Introduces new feature or API
  • Removes existing feature or API

@pull-request-size pull-request-size bot added size/S and removed size/XS labels Jul 7, 2023
@villebro
Copy link
Member

villebro commented Jul 8, 2023

Hey @arjundevarajan it seems you need to fix lint errors, let me know if you need help

@pull-request-size pull-request-size bot added size/M and removed size/S labels Jul 8, 2023
@rusackas
Copy link
Member

rusackas commented Jul 8, 2023

Hey @arjundevarajan it seems you need to fix lint errors, let me know if you need help

I think it was just the helm chart version. I bumped it 🤞

@rusackas
Copy link
Member

rusackas commented Jul 8, 2023

OK, CI is clean. I added some documentation on this PR, so if any reviewer(s) could give that a quick 👀 it would be appreciated. We want to make sure we're transparent enough about the telemetry being added, and how to opt out.

@rusackas rusackas added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Jul 8, 2023
:::

:::note
Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telmetry data to better understand and support the need for patch versions of Sueprset. Scarf purges PII and provides aggregated statistics. Superset users can easily opt out of analytics in various ways documented [here](https://docs.scarf.sh/gateway/#do-not-track). However, if you wish to opt-out of this in your Docker-based installation, you can simply edit your `docker-compose.yml` or `docker-compose-non-dev.yml` file and remove `apachesuperset.docker.scarf.sh/` from the `x-superset-image` setting, so that it's simply pulling `apache/superset:${TAG:-latest-dev}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arjundevarajan and @rusackas would it be safer to make it opt-out by default? Granted one might not get the same scale of telemetry data, but it feels significantly less intrusive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @john-bodley mostly chiming in to agree with @rusackas that opting for the non-default option will significantly decrease the amount of useful data that's being collected. It should be reemphasized that all of this data is de-identified no matter what, and that the ASF has approved Scarf as a verified external service provider in the past for other ASF projects (see Privacy Policy here), which have deployed Scarf live to their projects for several years now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @arjundevarajan for the context.

Copy link
Member

@john-bodley john-bodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies I first approved this and then added this comment and given I can't retract my approval AFAIK I opted for "Request changes".

@rusackas
Copy link
Member

@john-bodley Given that the approach had lazy consensus on the dev@ list, and the fact that there's documentation (and links to further details) in various places, I think it's safe to fo with opt-out. We can take this to the dev@ list again if it warrants further discussion, but I'm optimistic that when this makes it into a release, we'll have further changes to raise awareness about its existence and how to opt-out, on the wiki, in release notes, in the change log, etc. The main reason I'd advocate for this approach is that if it's opt-in, I suspect that we'll garner very little telemetry at all. I think this sort of telemetry is the norm in the industry at this point, and Scarf is used in other Apache projects as well. Let me know if you think this makes sense, or if this warrants widening the net on the discussion.

@rusackas rusackas added the v2.1 label Jul 18, 2023
Copy link
Member

@john-bodley john-bodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unblocking per #24432 (comment).

@craig-rueda craig-rueda merged commit 2b0ffb0 into apache:master Jul 19, 2023
@eschutho eschutho removed the v2.1 label Jul 19, 2023
@rusackas rusackas deleted the scarf-docker-compose-helm-downloads branch July 21, 2023 15:49
@mistercrunch mistercrunch added 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🚢 3.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants