Basic telemetry for the Audacity #835

Closed
wants to merge 29 commits into from

Member

@crsib crsib commented May 4, 2021

Please, see our response:

#889


Dear all,

Given the level of concern about this PR (which we completely understand), we want to clarify exactly what is going on:

  1. Telemetry is strictly optional and disabled by default. No data is shared unless you choose to opt-in and enable telemetry.
  2. Telemetry only works in the builds made by GitHub CI from the official repo (the telemetry URLs are only defined there).
  3. If you are compiling Audacity from source, we will provide a CMake option to enable the telemetry code. This option will be turned off by default.


Why have telemetry at all?

Essentially, it’s to help us to identify product issues early:

  1. Audacity is widely used across several platforms, but we have no information on the application stability.
  2. It is difficult for us to estimate the size of the user base accurately.
  3. We need a way to make informed decisions about which OS versions to support. For example, can we raise the minimum macOS version to 10.10 so that we can update wxWidgets to the latest version?
  4. We have a known issue with the new file format introduced in Audacity 3.0. We found it with the great help of the community members on our forum. However, there is no way for us to estimate the impact of these issues on users. Is it just a random case? Do we need to rush the work on the recovery tool or help the users one by one? Or do we need to rethink the file format to make it safer and more easily recoverable?

Regarding the concerns about the choice of providers:

  1. We do not incorporate cross-site tracking, limiting the ability to identify the user by both Google and Yandex.
  2. Yandex would only receive the “application opened” event to help us estimate the size of the user base.
  3. Google would only receive:
    a. Session start and end events;
    b. Errors for debugging;
    c. File formats used for import and export;
    d. OS and Audacity versions;
    e. Use of effects, generators, and analysis tools to prioritize future improvements;
  4. We will consider replacing Google and Yandex with another service if we find one that fulfills our requirements - thanks for the suggestions and keep them coming.

Just to reiterate, telemetry is completely optional and disabled by default. We will try to make it as clear as possible exactly what data is collected if the user chooses to opt-in and enable telemetry. We will consider adding the fine-grained controls that some of you have asked for.


Also:

To address the concerns about the use of private library versions: the largest part of this pull request is a networking layer built on top of libcurl. This library was chosen because it is an industry standard for cross-platform networking. It is exceptionally well tested and reviewed by industry experts. The layer on top of it, lib-network-manager, is meant to simplify the development of future features. libcurl is used without any patches. It is possible to use the system-provided version of the library available on Linux distributions, and we will double-check that this works as expected.


Original PR description:

This request provides the basic telemetry for Audacity.

The network layer is implemented with libcurl to avoid issues with the built-in networking of wxWidgets.

Universal Google Analytics is used to track the following events:

  • Session start and end
  • Errors, including errors from the sqlite3 engine, as we need to debug corruption issues reported on the Audacity forum
  • Usage of effects, sound generators, analysis tools, so we can prioritize future improvements.
  • Usage of file formats for import and export
  • OS and Audacity versions

To identify sessions we use a UUID, which is generated and stored on the client machine.
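
A minimal sketch of how such a client-side identifier could be generated and persisted, assuming a random (version 4) UUID stored in a plain text file. The names `GenerateUuidV4` and `LoadOrCreateUuid` and the storage format are hypothetical and are not taken from this PR's lib-uuid.

```cpp
#include <cassert>
#include <fstream>
#include <random>
#include <string>

// Hypothetical helper: builds a random (version 4) UUID string.
inline std::string GenerateUuidV4() {
    std::random_device rd;
    std::mt19937_64 gen(rd());
    std::uniform_int_distribution<int> nibble(0, 15);
    const char* hex = "0123456789abcdef";
    // 'x' = random hex digit, '4' = version, 'y' = variant digit (8, 9, a or b).
    std::string out = "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx";
    for (char& c : out) {
        if (c == 'x')
            c = hex[nibble(gen)];
        else if (c == 'y')
            c = hex[8 + (nibble(gen) & 0x3)];
    }
    assert(out.size() == 36);
    return out;
}

// Hypothetical helper: reuses the stored UUID, or creates and saves a new one.
inline std::string LoadOrCreateUuid(const std::string& path) {
    std::ifstream in(path);
    std::string id;
    if (in && std::getline(in, id) && id.size() == 36)
        return id; // an identifier already exists on this machine
    id = GenerateUuidV4();
    std::ofstream out(path, std::ios::trunc);
    out << id << '\n';
    return id;
}
```

Because the value is read back on every later call, it stays the same across runs until the file is deleted, so under this assumption it identifies an installation rather than a single run.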

We use Yandex Metrica to estimate the number of daily active users correctly. We have to use a second service because Google Analytics is known to have some really tight quotas.

Both services also record the IP the request is coming from.

Telemetry collection is optional and configurable at any time. When data sharing is disabled, all calls to the telemetry Report* functions are no-ops.
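
The no-op behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: `TelemetrySketch`, `ReportEvent`, `SetSharingEnabled`, and the in-memory queue are hypothetical names, not the PR's actual classes or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: none of these names come from the Audacity code base.
class TelemetrySketch {
public:
    // Sharing stays off until the user explicitly opts in.
    void SetSharingEnabled(bool enabled) { mEnabled = enabled; }
    bool IsSharingEnabled() const { return mEnabled; }

    // Stand-in for the Report* family: returns early when sharing is disabled.
    void ReportEvent(const std::string& name) {
        if (!mEnabled)
            return; // no-op: nothing is queued, nothing is sent
        mQueue.push_back(name);
    }

    // Events that would be handed to the network layer.
    const std::vector<std::string>& Pending() const { return mQueue; }

private:
    bool mEnabled = false; // disabled by default
    std::vector<std::string> mQueue;
};

// Usage example: returns the number of events queued after opting in.
inline std::size_t DemoOptIn() {
    TelemetrySketch t;
    t.ReportEvent("session_start"); // dropped: sharing is off by default
    assert(t.Pending().empty());
    t.SetSharingEnabled(true);
    t.ReportEvent("session_start"); // queued only after opt-in
    return t.Pending().size();
}
```

Placing the check inside the reporting call itself means call sites need no conditionals of their own, and a wrong default is confined to a single member initializer.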

Additionally, this pull request comes with a set of libraries to help the future efforts on Audacity.

It seems, that my local runner has a newer compiler
@crsib crsib changed the title from "Basic telemetry for the audacity" to "Basic telemetry for the Audacity" May 4, 2021
@crsib
Member Author

crsib commented May 4, 2021

VERSION was renamed to VERSION-* due to issues with Xcode 12.5.

The same reason applies to raising minSDKVersion on macOS to 10.9.

@JamesCrook
Contributor

New libs need COPYING files (or equivalent) to state their license.
New libs need to be listed with their licenses in the top level README.txt file.

@crsib
Member Author

crsib commented May 4, 2021

New libs need COPYING files (or equivalent) to state their license.

ThreadPool has a COPYING file; libcurl is downloaded in a similar way to wxWidgets.

New libs need to be listed with their licenses in the top level README.txt file.

Both libraries are now added to README.txt.

@JamesCrook
Contributor

What's the status of lib-string-utils, lib-uuid, lib-timer, and lib-telemetry, Dmitry?

E.g. if they are entirely ours, are they "GPL2 or later" or MIT or something else? Maybe that just needs a comment in README.txt to say that /libraries is under same license as /src ?

Separately, now assuming these are all ours, the class files will need the /**** and \class \brief style of doxygen comments, so that doxygen https://doxy.audacityteam.org/annotated.html will include these classes. Probably the doxygen .dox.in file needs to be updated to include /libraries too.

@IGBC

IGBC commented May 10, 2021

OK, this PR is closed, but its link has been plastered all over the internet. I suggest to the devs that this thread be locked, as I see it being a hotbed for drama for the foreseeable future.

@ccoenen

ccoenen commented May 10, 2021

I would like to highlight one post by @SndChaser which might also have been lost to the torrent of comments:

#835 (comment)

The suggestion to make this a plugin rather than a compile-time decision makes this much more OK for me. I am on the "absolutely no tracking, period" train, and I think that the suggestion above can suit both things at the same time. The main codebase would not be doing any networking. But people (and not just developers!) could opt in to tracking very transparently.

I also think that crash reporting should be equally decoupled from the main codebase, if technically possible.

@GalGreenfield

I haven't read the whole thread, but @crsib have you considered using other analytics solutions that don't sell user data? There are various such existing solutions.

@StarWitch

@domsson Wow, I hadn't actually seen that site. It does not surprise me at all that download numbers and market share are put above the actual features of the apps. It is very clear where their priorities lie just from this, and their main site only confirms this.

Audacity may very well be in the slaughtering house now.

And please, core contributors, don't paint me or others in this thread as some kind of "outsiders" to Audacity. I've personally casually used it as a tool for over a decade. For you to turn around and tell your userbase that their opinions don't matter is extremely damning concerning the direction of this whole conversation and the FOSS community in general.

@eevvoor

eevvoor commented May 10, 2021

I have used audacity for decades and am very disappointed.

@asdf8dfafjk

OMG, this is the most hilarious virtue signaling thread I have seen in my entire life. Look at the bold statement making basement dwelling famiu.

This. Exactly how does one "acquire" a GPL project? Because from my understanding, it belongs to the contributors and only to the contributors.

Everyone, lest this be forgotten, it's important to know that famiu has, you guessed, exactly 0 commits. https://github.com/audacity/audacity/pulls/famiu.

I hope all toxic losers who are in this thread (I guess all telemetry FUD spreaders) are losers like him.

@ccoenen

ccoenen commented May 10, 2021

Not all contributions are code, you know? Also, GPL/AGPL and pretty much any other license is not about developers. It's about giving the general population rights. Look for the "four basic freedoms", if you need clarification.

@IGBC

IGBC commented May 10, 2021

I have used audacity for decades and am very disappointed.

This PR was rejected before your comment. What are you disappointed about?

@tilda

tilda commented May 10, 2021

can this just be locked already? there's not much else anyone can really contribute to the discussion...

@illwieckz

@asdf8dfafjk

Everyone, lest this be forgotten, it's important to know that xxxx has, you guessed, exactly 0 commits.

You talked about commits on this project, but some people there only have like a handful of commits in a single repo and a handful of issues created on the whole GitHub over 10 years, others only have code related to school evaluations… And may have said others do not understand how GitHub works or how software development works…

This is what happens when people confuse GitHub with Twitter.

@Martin-Eckleben

OMG, this is the most hilarious virtue signaling thread I have seen in my entire life. Look at the bold statement making basement dwelling famiu.

This. Exactly how does one "acquire" a GPL project? Because from my understanding, it belongs to the contributors and only to the contributors.

Everyone, lest this be forgotten, it's important to know that famiu has, you guessed, exactly 0 commits. https://github.com/audacity/audacity/pulls/famiu.

I hope all toxic losers who are in this thread (I guess all telemetry FUD spreaders) are losers like him.

@IGBC is that you drunk again? :D
(#856 (comment))

I had a good laugh I have to say :))

@IGBC

IGBC commented May 10, 2021

@Martin-Eckleben Nah, I only have the one account; that one wasn't me. Besides, I am anti-tracking, while @asdf8dfafjk is firmly pro.

@jxu

jxu commented May 10, 2021

I'm disappointed I wasn't the target of a clearly troll personal attack on my commit history.

@famiu

famiu commented May 11, 2021

OMG, this is the most hilarious virtue signaling thread I have seen in my entire life. Look at the bold statement making basement dwelling famiu.

This. Exactly how does one "acquire" a GPL project? Because from my understanding, it belongs to the contributors and only to the contributors.

Everyone, lest this be forgotten, it's important to know that famiu has, you guessed, exactly 0 commits. https://github.com/audacity/audacity/pulls/famiu.

I hope all toxic losers who are in this thread (I guess all telemetry FUD spreaders) are losers like him.

Oh wow. It seems nobody hugged you while you were growing up... That's just sad, buddy. It would be nice though if you kept that shitty personality of yours to yourself; that would save you a lot of embarrassment.

@turtlegarden

OMG, this is the most hilarious virtue signaling thread I have seen in my entire life. Look at the bold statement making basement dwelling famiu.

This. Exactly how does one "acquire" a GPL project? Because from my understanding, it belongs to the contributors and only to the contributors.

Everyone, lest this be forgotten, it's important to know that famiu has, you guessed, exactly 0 commits. https://github.com/audacity/audacity/pulls/famiu

I hope all toxic losers who are in this thread (I guess all telemetry FUD spreaders) are losers like him.

Oh wow! You work for Microsoft or something? Because there's reason to worry about "anonymous" telemetry (hint: not anonymous). Also, please keep your horrible personality to yourself. This is GitHub, and not Twitter.

@julienbenjamin

@asdf8dfafjk

Everyone, lest this be forgotten, it's important to know that xxxx has, you guessed, exactly 0 commits.

You talked about commits on this project, but some people there only have like a handful of commits in a single repo and a handful of issues created on the whole GitHub over 10 years, others only have code related to school evaluations… And may have said others do not understand how GitHub works or how software development works…

This is what happens when people confuse GitHub with Twitter.

Or moved to GitLab after GitHub got bought.

@turtlegarden

Yep, time to close this. But will it be closed?

@RubenKelevra

I don't see why there is so much backlash. We have this in browsers, which process much more private information than an audio editor.

It's opt-in anyway. If you don't want to share data, either don't opt in or compile it yourself without the module :)

@falkTX

falkTX commented May 11, 2021

Read the backlog if you want to know why.
Not all of us are complicit and don't care. The amount of add-ons needed to make browsing the web a proper experience clearly shows browsers are not a model to follow.

@BenBE

BenBE commented May 11, 2021

I don't see why there is so much backlash.

Because the negative aspects very much outweigh the benefits. Read the discussion for more details.

We have this on browsers, which do process much more private information than an audio editor.

And even with browsers there are enough people quite angry with the situation. But given the implications there (about 4 days for custom builds to remove tracking when building from source on your own hardware, and not everybody can manage this), there are unfortunately not many alternatives to choose from. But that's a different topic.

It's opt-in anyway.

That's what the PR officially says. The actual patch had several issues that inverted this logic and thus made it active by default, as well as anti-patterns that coerced the user to just click through to enable it.

Oh, and that telemetry was neither anonymous (use of known advertiser services) nor limited to actual telemetry only: the PR includes a UUID per user, and thus allows for tracking of installations.

If you don't want to share data, either don't opt in or compile it yourself without the module :)

Which brings us back to the browsers: Have you actually tried to build Audacity from source? Do you think this is a sustainable solution for the average user?

Overall there are much better alternatives to baking telemetry directly into the core application. One, for example, is having telemetry added by a plugin that the user consciously installs. This also has the advantage that everything the telemetry monitors would also be available through official APIs for other plugins to use. Doing it that way also guarantees that the agreement to the collection of personally identifiable information under GDPR is conscious, informed, and voluntary.

@RubenKelevra

Read the backlog if you want to know why.
Not all of us are complicit and don't care. The amount of add-ons needed to make browsing the web a proper experience clearly shows browsers are not a model to follow.

Yeah, sorry, not possible with this many comments. :/

@RubenKelevra

@BenBE alright, I see your point. I expected that the description was accurate. I have read some of the comments, and what people complained about did not seem very substantial.

Maybe it's worth reworking this patch and running the collection on servers of the project instead of third-party ones.

I do see a large benefit in collecting crash information, for example, since Audacity isn't really that stable to begin with. There's room for improvement.

@Nex4rius

I don't see why there is so much backlash. We have this in browsers, which process much more private information than an audio editor.

It's opt-in anyway. If you don't want to share data, either don't opt in or compile it yourself without the module :)

A doing bad things doesn't justify B doing bad things.

@BenBE

BenBE commented May 11, 2021

@BenBE alright, I see your point. I expected that the description was accurate. I have read some of the comments, and what people complained about did not seem very substantial.

There's lots of virtue signalling in the comments, unfortunately.

Maybe it's worth reworking this patch and running the collection on servers of the project instead of third-party ones.

This PR introduces a full network stack into an application that successfully managed without one for over 20 years. Introducing one should therefore have a very good reason, and given the expected overall quality of the collected information, the data is not a compelling reason in this matter. There are simply too many threads attached that negatively affect the quality of the collected data, and there are much easier and more worthwhile methods to produce even better statistics to base your decisions on.

That leaves the part regarding crash reports:

I do see a large benefit of collecting crash information for example, since audacity isn't really that stable to begin with. There's room for improvement.

One of the few points that I agree with. Unfortunately, even here you don't really need to introduce networking into the core application. To give one example: htop simply gives a backtrace, some basic information required for a bug report, and instructions on how to make the backtrace more informative (basically: how to call objdump, or its equivalent, on your platform so you can attach that file to the bug report). This gives the user the option to opt in to sending the bug report and still respects the user's freedom not to be tracked by the application. If you want to make it more friendly for beginners, you could even use a separate application that pops up only when the main application has crashed (like Firefox and several other tools do), but even then the main application does not need any linkage to any networking stack.

And FWIW: the information that htop asks the user to provide usually helps to pin down the exact line of the crash within about 20 minutes, even when handling the objdump file by hand. It could be more convenient to do, but the fact is that the actual bug reporting can and should be fully offline.

@Daniel34M

Seriously, please close this PR. I think most of the possible arguments against this have already been established, and I hope the whole team works in favor of the community.

@Amolith

Amolith commented May 11, 2021

@Daniel34M

Seriously, please close this PR. I think most of the possible arguments against this have already been established, and I hope the whole team works in favor of the community.

It has already been closed. I do, however, think it would be a good idea to lock the PR to prevent anyone from posting yet more comments; I agree that pretty much every possible argument on all sides has been made and we probably crossed that point a day or two ago.

@audacity audacity locked and limited conversation to collaborators May 11, 2021
@crsib
Member Author

crsib commented May 13, 2021

Please, see our response:

#889

@crsib crsib deleted the telemetry branch May 24, 2021 18:08