Collecting stats #2369
Following #2172, this is a proposal to discuss with the community how we can get more information about deployed Traefik instances.
Why does the development team need more info?
As you may know, the Traefik core development team is quite small and, as with a lot of open source projects, we lack time and resources. As a consequence, we have to carefully choose which tasks and features need our attention, and we usually invest our time in features needed or requested by most of the community. To do this efficiently, we have to know how our community uses Traefik.
To give an illustrative example, we have no way to know which configuration backend is the most used or which one is used by the fewest people. What if we discovered that we maintain a configuration backend that is largely unused? Knowing this, we could have allocated our resources to something more useful, especially since we have a lot of useful things we can work on ;)
Another example is that we have no idea how quickly releases are adopted. Having this knowledge would help us adapt our development cycle to encourage adoption. We don't need or want to release every month if users wait 2 months before updating to the latest release.
We just need to know what is used, and what is not.
What we propose
Ideally, we would like statistics on the toml/flags Traefik configuration and Traefik versions our users are using. The toml/flags configuration would allow the development team to know what is used in Traefik and what is not.
Only export what's needed.
We already use a mechanism to export the whole configuration when using the
We could reuse it for this stats collection mechanism.
What's great about this solution is that exported configuration fields are hard-coded. Each time a new field is added to the configuration, it will not be exported by default; we will need to tag it in the code to export it. This lets us carefully review what is and isn't exported in future configuration changes, and the community can review it before implementation.
Collected configuration fields are hard-coded.
Opt-in vs. Opt-out
Another topic we need to discuss is whether to make it opt-in or opt-out.
The easiest way would be to set it opt-in: if you want to export your config, you need to enable it in your configuration.
The major downside is that we doubt users will enable data collection by themselves. This could make the feature useless for the development team, as the whole point is to get a good idea of how Traefik is used, and we need a certain amount of feedback to get relevant data. Further, we think that only advanced/active users in the community would enable this option, so the collected data would be biased.
Ideally, we would make it opt-out, but we don't want to scare our community with this :'(. It is the best solution for the development team, but it is only possible if users trust the collection mechanism and everything is done transparently.
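Technically, the opt-in/opt-out question comes down to a single default value. The sketch below uses Go's standard `flag` package with a hypothetical `--stats.enabled` flag (not an existing Traefik option) and also records whether the user set it explicitly, so an opt-out build could announce at startup that collection is on and how to turn it off.

```go
package main

import (
	"flag"
	"fmt"
)

// statsEnabledByDefault captures the whole decision:
// false = opt-in (users must enable it), true = opt-out.
const statsEnabledByDefault = false

// parseStatsFlag parses the hypothetical --stats.enabled flag and reports
// whether collection is on and whether the user set the flag explicitly.
func parseStatsFlag(args []string) (enabled, explicit bool, err error) {
	fs := flag.NewFlagSet("traefik", flag.ContinueOnError)
	on := fs.Bool("stats.enabled", statsEnabledByDefault, "send anonymous usage statistics")
	if err = fs.Parse(args); err != nil {
		return false, false, err
	}
	// Visit only iterates over flags that were actually set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "stats.enabled" {
			explicit = true
		}
	})
	return *on, explicit, nil
}

func main() {
	enabled, explicit, err := parseStatsFlag(nil)
	if err != nil {
		panic(err)
	}
	if enabled && !explicit {
		// Only reachable with an opt-out default: be transparent about it.
		fmt.Println("anonymous stats are enabled by default; pass --stats.enabled=false to disable")
	}
	fmt.Println("stats enabled:", enabled)
}
```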
Transparency & Trust
We want to be as transparent as possible about this. Here are a few principles we aim to follow:
How can you help?
The best thing you can do is voice your opinion about this :) We need your feedback, your ideas, and your constructive criticism. Help us build a mechanism that gives the development team a better idea of how Traefik is used, so we can focus on what matters, while still working for you and your businesses.
I think you could add
[traefik-anonymous-stats]
collect: true
store: /path/to/store (or any other storage)
auto-share: true
keys-to-share:
  - volume
  - network
  - settings
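One way to read the suggested `[traefik-anonymous-stats]` section is as a small struct plus two decisions: what to do with each round of stats, and which sections may leave the machine. The sketch below is a hypothetical Go mapping of that suggestion (the `AnonymousStats` type, `Decide` and `Shared` are invented for illustration, not Traefik APIs).

```go
package main

import "fmt"

// AnonymousStats is a hypothetical mapping of the suggested section.
type AnonymousStats struct {
	Collect     bool     // gather stats locally at all
	Store       string   // where collected stats are kept for inspection
	AutoShare   bool     // send automatically, or only store for manual review
	KeysToShare []string // top-level sections allowed to be shared
}

// Action says what to do with one round of collected stats.
type Action int

const (
	Skip         Action = iota // collection disabled
	StoreOnly                  // keep locally; a human decides whether to send
	StoreAndSend               // keep locally and send automatically
)

// Decide maps the three switches onto a single action.
func (s AnonymousStats) Decide() Action {
	switch {
	case !s.Collect:
		return Skip
	case s.AutoShare:
		return StoreAndSend
	default:
		return StoreOnly
	}
}

// Shared keeps only the collected sections allowlisted in KeysToShare.
func (s AnonymousStats) Shared(collected map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(s.KeysToShare))
	for _, k := range s.KeysToShare {
		if v, ok := collected[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	cfg := AnonymousStats{
		Collect:     true,
		Store:       "/path/to/store",
		AutoShare:   true,
		KeysToShare: []string{"volume", "network", "settings"},
	}
	collected := map[string]interface{}{
		"volume": 42, "network": "bridge", "settings": "default", "hostname": "db-01",
	}
	fmt.Println(cfg.Decide() == StoreAndSend, cfg.Shared(collected))
}
```

Note that the allowlist here is chosen by the user at runtime, whereas the proposal hard-codes exported fields in the code; the two are complementary rather than alternatives.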
I'm not sure I understand the difference.
I really think we should stay as simple as possible and this is a bit over-engineered IMHO :)
Same as previous item.
Again, this is already in the proposal: document exactly which data is sent.
In the proposal, the collected stats are not linked to any bug reporting mechanism. We just want to send some stats at a fixed rate (every day?).
Here are some more details:
Thanks for the writeup @emilevauge, I appreciate the effort to communicate this as clear as possible.
May I suggest that having a way for users to view the exported data (dump to disk, HTTP endpoint, or otherwise) would make this less "scary"?
Other than that, I think the proposal is good, and in my opinion making it opt-in might gain you next to nothing. As long as it's clearly documented and communicated (your mention of logging the stat-sending action is great), I think opt-out is very reasonable.
@emilevauge I sort of missed that point!
That sounds good to me. My only real concern is whether it would make logs too noisy in large setups. If that log entry is going to be kilobytes long, maybe it shouldn't include the data (some users may not even want these statistics to end up in their log aggregation systems, although since they're anonymous I don't see that being an issue). It depends on how much data you expect to collect and how often, I suppose. Personally, I'm not too fussed about the means of "inspection" as long as the option exists.
Happy to help with some real life data, however:
I totally agree; this is why I wrote in the first place that it is important to be able to just collect data, with at least two further options: auto-send (or not), and dump it to a file for manual inspection before sending. Just check my first comment on this topic; you'll see that I've covered the main things that are important in an enterprise environment.