Better cache analytics through custom User-Agent #50

Closed
zimbatm opened this issue Nov 2, 2023 · 3 comments

@zimbatm

zimbatm commented Nov 2, 2023

As part of the work on the S3 cache GC, we discovered that most of our Fastly traffic is coming from North America. Could this be primarily GitHub traffic?

It would be interesting if this action could set the "user-agent-suffix" Nix setting with the following info:

  • the name of the action
  • which github org it's being used on

This would allow us to build a better understanding of the usage pattern of the cache.
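
A minimal sketch of how such a suffix could be assembled inside a GitHub Actions step. It assumes the runner-provided environment variables `GITHUB_ACTION_REPOSITORY` and `GITHUB_REPOSITORY_OWNER`; the `action=... org=...` layout and the helper names are hypothetical, not something any of the install actions do today.

```python
import os


def build_user_agent_suffix(env):
    """Compose a suffix from the action name and the owning org.

    The key=value layout is an illustrative assumption, not an agreed format.
    """
    action = env.get("GITHUB_ACTION_REPOSITORY", "unknown-action")
    org = env.get("GITHUB_REPOSITORY_OWNER", "unknown-org")
    return f"action={action} org={org}"


def nix_conf_line(suffix):
    """Render the line that would be appended to nix.conf."""
    return f"user-agent-suffix = {suffix}"


if __name__ == "__main__":
    # On a runner this reads the real environment; elsewhere it falls back
    # to the placeholder values above.
    print(nix_conf_line(build_user_agent_suffix(os.environ)))
```

The second bullet is the privacy-sensitive part; dropping the `org` field leaves a suffix that only identifies the action.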

I'm going to post this to the other Nix install actions as well.

@Hoverbear
Contributor

I think the user agent suffix containing the action name is an interesting idea!

To determine if the traffic is from GitHub, perhaps you could consult the documentation for GitHub Actions runners, which includes instructions on how to get a list of runner IPs.
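
As a rough sketch of that check: once you have a list of runner CIDR ranges (to my understanding GitHub publishes them under the `actions` key of its meta API endpoint, but treat that as an assumption to verify), matching a client IP from the logs is a few lines. The ranges below are reserved documentation ranges standing in for real data, and `is_in_ranges` is a made-up helper name.

```python
import ipaddress


def is_in_ranges(ip, cidrs):
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return True
    return False


# Placeholder ranges (RFC 5737 / RFC 3849 documentation blocks), NOT real
# GitHub runner ranges.
SAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]

if __name__ == "__main__":
    for client_ip in ["192.0.2.10", "203.0.113.5", "2001:db8::1"]:
        print(client_ip, is_in_ranges(client_ip, SAMPLE_RANGES))
```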

> which github org it's being used on

We very carefully do not track this information for quite legitimate privacy concerns. Perhaps you could show me some community discussion where the right people in the right places agreed this was appropriate?

Further, I believe some of the terms of our privacy policy may be relevant, as we do not share user data (such as their org) with third parties. We'd need to discuss the matter with legal if we were considering doing this.

On other DetSys projects we've previously tracked such data through one-way hashing, so we can get an idea of how many users we have without being able to unmask them.
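
To illustrate the general idea (this is a generic keyed-hash sketch, not a description of what we actually run): each identifier is replaced by an HMAC-SHA256 digest under a secret key, so distinct users can be counted without storing their names. The function names and the key handling are assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymize(org, secret_key):
    """Map an org name to a keyed one-way digest.

    Using a secret key (rather than a bare hash) makes it harder to unmask
    an org by hashing a list of guessed names.
    """
    normalized = org.lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def count_distinct(orgs, secret_key):
    """Count distinct orgs while only ever handling their digests."""
    return len({pseudonymize(org, secret_key) for org in orgs})


if __name__ == "__main__":
    key = b"example-key-kept-out-of-the-logs"  # hypothetical secret
    print(count_distinct(["acme", "Acme", "other-org"], key))
```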

@zimbatm
Author

zimbatm commented Nov 2, 2023

Thanks for taking a look Hoverbear!

Regarding the GitHub Orgs, I started the conversation with the community over here: https://discourse.nixos.org/t/tracking-nixos-cache-usage-by-user-agent/34937

The Fastly logs are only available to a trusted set of users at the moment, but we can put in place something more formal if that would help.

A variant of the original idea would be to only add the GitHub org for public repos. In that scenario one could argue that all the information is already public; it's just harder for us to match it up.

@grahamc
Member

grahamc commented Dec 2, 2023

Since our installer is covered under our privacy policy, we're not able to do this. We could come up with a process to make it possible, but it would require a revision of our policy and a contract between DetSys and the Foundation to safeguard our users' privacy. That would necessarily preclude general, public access to the data.

Because of the complexities here, I'm going to close this for now. Feel free to open it again if you want to re-engage on the topic more broadly: I'm not entirely opposed to this, but there is a lot of work in terms of privacy, organization, and legal that we'd need to make happen first.

All is not for naught, though. I can give you a few datapoints. Looking at aggregate Magic Nix Cache data:

  • It sees relatively modest usage of about 4,000 runs per day.
  • On the average day, its users send approximately 2,000,000 narinfo requests upstream in aggregate.
  • Of those 2,000,000 narinfo requests, 20,000 convert into nars being downloaded.
  • The Magic Nix Cache cuts the typical user's narinfo requests down by half, and saves users approximately 18 minutes.
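
For a sense of scale, the ratios implied by those figures can be worked out directly. This is back-of-the-envelope arithmetic on the daily aggregates above and assumes requests are spread evenly across runs, which real traffic will not be.

```python
# Daily aggregates quoted in the list above.
runs_per_day = 4_000
narinfo_per_day = 2_000_000
nar_downloads_per_day = 20_000

# Average narinfo requests sent upstream per run.
narinfo_per_run = narinfo_per_day / runs_per_day

# Fraction of narinfo requests that turn into a nar download.
conversion_rate = nar_downloads_per_day / narinfo_per_day

# Average nar downloads per run.
nars_per_run = nar_downloads_per_day / runs_per_day

print(f"{narinfo_per_run:.0f} narinfo requests per run")
print(f"{conversion_rate:.1%} of narinfo requests lead to a nar download")
print(f"{nars_per_run:.0f} nar downloads per run")
```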

@grahamc grahamc closed this as completed Dec 2, 2023