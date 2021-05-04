

Basic telemetry for the Audacity #835

@crsib commented May 4, 2021
Dear all,

Due to the large amount of concern about this PR (which we completely understand), we want to clarify exactly what is going on:

  1. Telemetry is strictly optional and disabled by default. No data is shared unless you choose to opt-in and enable telemetry.
  2. Telemetry only works in the builds made by GitHub CI from the official repo (the telemetry URLs are only defined there).
  3. If you are compiling Audacity from source, we will provide a CMake option to enable the telemetry code. This option will be turned off by default.
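
As a rough illustration of points 2 and 3, here is a minimal sketch of how the C++ side could behave when the build system does not define a telemetry endpoint. The macro names `SKETCH_HAS_TELEMETRY` and `SKETCH_TELEMETRY_URL` and the function `telemetryAvailable` are invented for this example and are not taken from the PR.

```cpp
#include <string>

// Hypothetical macros: a build script (for example a CMake option that is
// OFF by default) would define these only for official CI builds.
#if defined(SKETCH_HAS_TELEMETRY) && defined(SKETCH_TELEMETRY_URL)
constexpr bool kTelemetryCompiledIn = true;
constexpr const char* kTelemetryUrl = SKETCH_TELEMETRY_URL;
#else
constexpr bool kTelemetryCompiledIn = false;
constexpr const char* kTelemetryUrl = "";
#endif

// Telemetry can only be offered to the user if the code was compiled in
// and an endpoint URL was baked into the build.
bool telemetryAvailable()
{
    return kTelemetryCompiledIn && !std::string(kTelemetryUrl).empty();
}
```

Under these assumptions, a build made from source without the option has no endpoint to talk to, so the opt-in dialog would not need to be shown at all.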


Why have telemetry at all?

Essentially, it’s to help us to identify product issues early:

  1. Audacity is widely used across several platforms, but we have no information on the application stability.
  2. It is difficult for us to estimate the size of the user base accurately.
  3. We need a way to make informed decisions about which OS versions to support. For example, can we raise the minimum macOS version to 10.10 so that we can update wxWidgets to the latest version?
  4. We have a known issue with the new file format introduced in Audacity 3.0. We found it with the great help of the community members on our forum. However, there is no way for us to estimate the impact of these issues on users. Is it just a random case? Do we need to rush the work on the recovery tool or help the users one by one? Or do we need to rethink the file format to make it safer and more easily recoverable?

Regarding the concerns about the choice of providers:

  1. We do not incorporate cross-site tracking, limiting the ability to identify the user by both Google and Yandex.
  2. Yandex would only receive the “application opened” event to help us estimate the size of the user base.
  3. Google would only receive:
    a. Session start and end events;
    b. Errors for debugging;
    c. File formats used for import and export;
    d. OS and Audacity versions;
    e. Use of effects, generators, and analysis tools to prioritize future improvements;
  4. We will consider replacing Google and Yandex with another service if we find one that fulfills our requirements - thanks for the suggestions and keep them coming.

Just to reiterate, telemetry is completely optional and disabled by default. We will try to make it as clear as possible exactly what data is collected if the user chooses to opt-in and enable telemetry. We will consider adding the fine-grained controls that some of you have asked for.

Also:

To address the concerns about the use of private library versions: the largest part of this pull request is a networking layer built on top of libcurl. This library was chosen because it is an industry standard for cross-platform networking. It is exceptionally well tested and reviewed by industry experts. The layer on top of it, lib-network-manager, is meant to simplify the development of future features. libcurl is used without any patches. It is possible to use the system-provided version of the library available on Linux distributions, and we will double-check that this works as expected.

Original PR description:

This request provides the basic telemetry for Audacity.

To implement the network layer, libcurl is used, which avoids issues with the networking built into wxWidgets.

Universal Google Analytics is used to track the following events:

  • Session start and end
  • Errors, including errors from the sqlite3 engine, as we need to debug corruption issues reported on the Audacity forum
  • Usage of effects, sound generators, analysis tools, so we can prioritize future improvements.
  • Usage of file formats for import and export
  • OS and Audacity versions
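
For readers unfamiliar with Universal Analytics, events like the ones listed above are typically sent as form-encoded "hits" through the Measurement Protocol (a POST to `https://www.google-analytics.com/collect`). The sketch below only builds such a request body; the helper names `urlEncode` and `buildEventHit` and the category/action strings are assumptions for illustration, and the payload the PR actually sends may differ.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Percent-encode a value for an application/x-www-form-urlencoded body.
std::string urlEncode(const std::string& value)
{
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Build the body of a Measurement Protocol "event" hit (illustrative only).
std::string buildEventHit(const std::string& trackingId,
                          const std::string& clientId,
                          const std::string& category,
                          const std::string& action)
{
    const std::vector<std::pair<std::string, std::string>> params = {
        {"v", "1"},           // protocol version
        {"tid", trackingId},  // property ID, e.g. the placeholder "UA-XXXXX-Y"
        {"cid", clientId},    // anonymous client ID
        {"t", "event"},       // hit type
        {"ec", category},     // event category
        {"ea", action},       // event action
    };

    std::string body;
    for (const auto& param : params) {
        if (!body.empty())
            body += '&';
        body += param.first + "=" + urlEncode(param.second);
    }
    return body;
}
```

Calling `buildEventHit("UA-XXXXX-Y", "abc", "Effect", "Noise Reduction")` yields `v=1&tid=UA-XXXXX-Y&cid=abc&t=event&ec=Effect&ea=Noise%20Reduction`. Note that the receiving server still sees the sender's IP address regardless of what the body contains.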

To identify sessions we use a UUID, which is generated and stored on the client machine.
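
A minimal sketch of generating such an identifier, assuming a random (version 4) UUID is acceptable. The function name `generateUuidV4` is hypothetical, the generator is not cryptographically strong, and persisting the value on the client machine is left out; the PR's lib-uuid may work differently.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Generate a random (version 4) UUID in the canonical 8-4-4-4-12 text form.
std::string generateUuidV4()
{
    std::random_device rd;
    std::mt19937_64 gen(rd());
    std::uniform_int_distribution<std::uint64_t> dist;

    std::uint64_t hi = dist(gen);
    std::uint64_t lo = dist(gen);

    // Set the version field (4) and the RFC 4122 variant bits (10xx).
    hi = (hi & 0xFFFFFFFFFFFF0FFFULL) | 0x0000000000004000ULL;
    lo = (lo & 0x3FFFFFFFFFFFFFFFULL) | 0x8000000000000000ULL;

    char buf[37];
    std::snprintf(buf, sizeof(buf), "%08x-%04x-%04x-%04x-%012llx",
                  static_cast<unsigned>(hi >> 32),
                  static_cast<unsigned>((hi >> 16) & 0xFFFF),
                  static_cast<unsigned>(hi & 0xFFFF),
                  static_cast<unsigned>(lo >> 48),
                  static_cast<unsigned long long>(lo & 0xFFFFFFFFFFFFULL));
    return std::string(buf);
}
```

Because the value carries no information about the machine or the user, it identifies an installation only for as long as it stays stored; deleting it would start a fresh identity.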

We use Yandex Metrica to be able to correctly estimate the number of daily active users. We have to use a second service because Google Analytics is known to have some really tight quotas.

Both services also record the IP the request is coming from.

Telemetry collection is optional and configurable at any time. If data sharing is disabled, all calls to the telemetry Report* functions are no-ops.
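
The no-op behaviour can be pictured with the following sketch. `TelemetrySketch`, its members, and the event names are stand-ins invented for this example (only the `ReportBuiltinEvent` name appears in the commit list, and its real signature is not shown there); this is not the PR's actual class.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a telemetry front end; not the PR's actual class.
class TelemetrySketch
{
public:
    // Data sharing stays off until the user explicitly opts in.
    void setDataSharingEnabled(bool enabled) { mEnabled = enabled; }
    bool isDataSharingEnabled() const { return mEnabled; }

    // Every Report* entry point funnels through enqueue(), which returns
    // early while sharing is disabled, so nothing is queued or sent.
    void ReportSessionStart() { enqueue("session_start"); }
    void ReportBuiltinEvent(const std::string& name) { enqueue(name); }

    // Events waiting to be sent; exposed only so the sketch can be checked.
    const std::vector<std::string>& pending() const { return mPending; }

private:
    void enqueue(const std::string& event)
    {
        if (!mEnabled)
            return; // no-op while data sharing is disabled
        mPending.push_back(event);
    }

    bool mEnabled = false; // disabled by default (opt-in)
    std::vector<std::string> mPending;
};
```

Putting the check in a single private helper, rather than in each Report* function, makes it harder for a newly added event type to bypass the user's choice.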

Additionally, this pull request comes with a set of libraries to help the future efforts on Audacity.
Adds lib-network-manager
Adds lib-string-utils
Adds lib-uuid
Adds lib-timer
Adds lib-telemetry with GA back-end
lib-telemetry is integrated. Only sessions reporting
Fixes github CI build

Fixes Linux build
ReportEvent renamed to ReportBuiltinEvent, as Windows has a ReportEvent macro
Support reporting for the built in events
Adds a telemetry permission dialog
Adds preference page about the app analytics
Adds implementation for the YandexMetrica
Fixes XCode 12.5 build
Updates the privacy policy URL
Allow reading telemetry configuration from the GH secrets
Attempt to fix macOS workflow
Really truly fix CURL build
There is a bug in CMakeLists.txt for the libcurl, which led to SSH2 being enabled every time the library is found. This is true for the GitHub runners
Report SQL errors to the telemetry
Report MessageBox exceptions to the telemetry
Fixes workflow for Ubuntu
Fixes Ubuntu build for PR
It seems that my local runner has a newer compiler
Fixes a crash if UserTrackingService was not set

@crsib (Contributor Author) commented May 4, 2021

VERSION was renamed to VERSION-* due to issues with the XCode 12.5.

The same reason applies to raising minSDKVersion on macOS to 10.9.

@JamesCrook (Member) commented May 4, 2021

New libs need COPYING files (or equivalent) to state their license.
New libs need to be listed with their licenses in the top level README.txt file.
@crsib
Updates the information about the libraries used
ac88a4f

@crsib (Contributor Author) commented May 4, 2021

New libs need COPYING files (or equivalent) to state their license.

ThreadPool has COPYING, libcurl is downloaded in a similar way to the wxWidgets.

New libs need to be listed with their licenses in the top level README.txt file.

Both libraries are now added to README.txt

@JamesCrook (Member) commented May 4, 2021

What's the status of lib-string-utils, lib-uuid, lib-timer, and lib-telemetry, Dmitry?

E.g. if they are entirely ours, are they "GPL2 or later" or MIT or something else? Maybe that just needs a comment in README.txt to say that /libraries is under the same license as /src?

Separately, now assuming these are all ours, the class files will need the /**** and \class \brief style of doxygen comments, so that doxygen https://doxy.audacityteam.org/annotated.html will include these classes. Probably the doxygen .dox.in file needs to be updated to include /libraries too.

@SndChaser commented May 7, 2021

Seriously, these kinds of controversies have been rehashed over and over and over and over and over and over and ... and they always have the exact same outcome: nothing. If people care, which they do not, they will fork the project and the fork will gain traction over the original, but it won't because they do not.

I would suggest you look at the community you are addressing this to: LibreOffice is the result of a fork from OpenOffice.org when Oracle took it over and started making changes that met with major disapproval.

The Maria Database came about from the same situation with Oracle taking over MySQL.

NeoVim came about because some users felt that the development speed and direction of Vim wasn't moving in the right direction (it wasn't even an issue as controversial as this one).

You should learn some history about the open source world. There is a long history of projects being forked when developers have ignored their users' feedback.

Reacting with thumbs down doesn't prevent it from happening, it just tags you as the kind of person who thinks reacting with thumbs down prevents something from happening.

No, reacting with a thumbs down is an indicator that your comment doesn't deserve support because it is a deeply flawed and ignorant argument.

@AndreiSva commented May 7, 2021

NeoVim came about because some users felt that the development speed and direction of Vim wasn't moving in the right direction (it wasn't even an issue as controversial as this one).

Vim and NeoVim are co-existing just fine.

@nikitalita commented May 7, 2021
edited

"We do not incorporate cross-site tracking, limiting the ability to identify the user by both Google and Yandex."
This is misleading. If information is sent to either of these websites along with your IP, it's identifiable, period. It is easily correlated with you via the gobs and gobs of data they already have on you (Yandex more so if you're Russian). This is what they're good at, this is their bread and butter.

And people know this. I will never opt into telemetry with them as the service provider, and neither will many, many, many other people who use this app, which will limit the usefulness of this. If you switched to a self-hosted solution, I'd definitely feel more comfortable with it. I recognize the utility and usefulness of telemetry in designing helpful applications, but there are tremendous privacy concerns that come with it, and using Google or Yandex is a non-starter for a huge swath of your userbase.

@IGBC commented May 7, 2021

Given that all of the Muse group people have not even considered removing Google and Yandex, the most controversial sticking points, I'm not hopeful.

They are considering it. They have not committed to not using Google Analytics and Yandex.

They are still not addressing the main points though: why are they collecting the data they chose, why didn't they include the community, and why do they think this is an appropriate way of collecting information about their community (not product)?

My poll, which now has over 1400 responses, shows that 50% of respondents don't support the idea of any tracking at all (https://www.strawpoll.me/45241130/r), yet there has been no response on this matter from Audacity or Muse.

@IGBC commented May 7, 2021

I think I should restate what I said directed at a developer earlier:

I think this response was carefully thought through but ultimately tone deaf. As has been repeated many times in this now unmanageably long thread, we, the community, are telling you, the developers, we don't believe this is an effective or appropriate measure.

We understand that you want feedback about the development direction of the software, and we are telling you the forum should be enough. I am curious why you do not trust the direct feedback from your thousands of active users, to the point you feel you need automated tracking systems to replace that model?

As for tracking in FOSS, IF it is indeed useful, it must also be considered how that data can be shared safely and fairly, what segment of users this data truly represents, and how compliance with the myriad laws surrounding data protection can be ensured. We, the community, have seen no planning or engagement with the community about how to approach these important issues.

Finally, the introduction of new corporate developers in recent times means we don't believe you anymore when you say this is for the good of the community. This is not helped by the repeated statements from devs that this is optional and anonymous, when frankly the "opt in" dialog is already in violation of GDPR, and it is widely accepted by the security community that Google Analytics is not anonymous. You either fundamentally misunderstand the technology you are trying to implement or you are misleading the community.

You cannot sidestep these issues.

@SndChaser commented May 7, 2021

NeoVim came about because some users felt that the development speed and direction of Vim wasn't moving in the right direction (it wasn't even an issue as controversial as this one).

Vim and NeoVim are co-existing just fine.

Exactly... This was less of an issue like this one, there was just a feeling that there needed to be a different direction for the development. LibreOffice and OpenOffice are co-existing as well...but I'd guess the user base for OOorg is a lot smaller now.

@immibis commented May 7, 2021

I would suggest you look at the community you are addressing this to: LibreOffice is the result of a fork from OpenOffice.org when Oracle took it over and started making changes that met with major disapproval.

The Maria Database came about from the same situation with Oracle taking over MySQL.

You will note that in these cases, the main developers of the project moved to the fork.

@SndChaser commented May 7, 2021

It should probably get some practical and boring name like LibreAudio until someone can come up with a good pun.

Thanks for calling my suggestion boring and practical... LOL

@IGBC commented May 7, 2021

Oh **** I didn't even see someone else made the same suggestion, sorry @SndChaser

@falkTX commented May 7, 2021

they are not addressing this and will not address it because it is all pretty normal for them.
musescore already added google analytics, so this was the obvious next step. just integrating the new acquisition onto the same toolset they have in place.

the responses here are the exception, not the norm.
I bet some of them are baffled as to what is going on; it doesn't make sense to them.

I personally have very little hope of google analytics being taken out. it is now the modus operandi of businesses and corporations. in order to grow in userbase, understand which areas need attention, generally bringing "value" to the "product" 🤢 they take analytics as a necessity.

this is a hard clash of ideals between corporate attitude and free, open-source ethos.
not everyone shares the same ideals of, for example, blender's author when he says he has no interest in money.
a recommended watch https://www.youtube.com/watch?v=qJEWOTZnFeg

@SndChaser commented May 7, 2021

Oh **** I didn't even see someone else made the same suggestion, sorry @SndChaser

It's alright, I was genuinely laughing....

@IGBC commented May 7, 2021

this is a hard clash of ideals between corporate attitude and free, open-source ethos.
not everyone shares the same ideals of, for example, blender's author when he says he has no interest in money.
a recommended watch https://www.youtube.com/watch?v=qJEWOTZnFeg

@falkTX Well stated, but it's our job to show Muse their corporate mindset will not get them very far here. Audacity's roots are in the scientific community; that is where it grew from. I doubt many users are interested in the Musings or wills of our corporate overlords.

(pun intended)

@SndChaser commented May 7, 2021

I would suggest you look at the community you are addressing this to: LibreOffice is the result of a fork from OpenOffice.org when Oracle took it over and started making changes that met with major disapproval.
The Maria Database came about from the same situation with Oracle taking over MySQL.

You will note that in these cases, the main developers of the project moved to the fork.

In those cases that is true. But there are lots of other cases where the original developer left a project, and others took over by forking the project.

My point was that the concept of forking a project due to disagreements over development trajectories is not unusual in the open source world. Whether the original developers move over, or whether the fork gains the same following as the original, was not really the point... Addressing the lack of understanding of the person I was responding to was the point.

@davidhealey commented May 7, 2021

Boo

@mc776 commented May 7, 2021

It should probably get some practical and boring name like LibreAudio until someone can come up with a good pun.

Thanks for calling my suggestion boring and practical... LOL

Boring and practical and easy to remember and on-brand and self-explanatory... is good name

@SndChaser commented May 7, 2021

It should probably get some practical and boring name like LibreAudio until someone can come up with a good pun.

Thanks for calling my suggestion boring and practical... LOL

Boring and practical and easy to remember and on-brand and self-explanatory... is good name

I was just making a joke based on the OpenOffice.org -> LibreOffice transition... But it might just stick. :)

@j0lol commented May 7, 2021

I would suggest you look at the community you are addressing this to: LibreOffice is the result of a fork from OpenOffice.org when Oracle took it over and started making changes that met with major disapproval.
The Maria Database came about from the same situation with Oracle taking over MySQL.

You will note that in these cases, the main developers of the project moved to the fork.

in these cases, the older projects are also more well known! please can we discuss forks as a last resort, because most people will not ever switch to your new audacity fork meaning they will not see the benefits of an audacity fork

@IGBC commented May 7, 2021

The poll I created is at 1500 votes, now equivalent to 1% of the alleged user base on the forum. The statistics remain unchanged: 50% of responses state a firm no to any form of tracking.
As always, the poll is available at https://www.strawpoll.me/45241130/r

@SndChaser commented May 7, 2021
edited

Have you looked at https://sentry.io/welcome/ as this seems to be what you want and would then remove Google and Yandex, which seems better, and you can even host it yourself if I recall correctly

I'd be cautious of this one... I don't see anything about licenses, and I don't see any clear statements about how data is handled if it isn't self-hosted. IOW - it would be possible to encounter many of the same issues that we have with Google and Yandex now... And the fact that this isn't an open source project gives me quite a bad feeling about it being used in an open source project.

https://github.com/getsentry/

Right, this doesn't use an Open Source license in my opinion: https://github.com/getsentry/sentry/blob/master/LICENSE

In fact to quote the license:

The Business Source License (this document, or the "License") is not an Open
Source license. However, the Licensed Work will eventually be made available
under an Open Source License, as stated in this License.

P.S. I see that someone else replied to this earlier....I'm still catching up on comments since I was away for a couple of hours. (Hopefully caught up now.)

@nicemicro commented May 7, 2021

This just occurred to me, and I'm not sure whether anyone mentioned it already: if the telemetry is using methods and services that go against the conscience of most privacy-minded folks, it would mean that most Linux distributions will build Audacity without it for their repositories, and that will result in no user data from Linux.
Wouldn't it just defeat the purpose of the telemetry from the start then?

@nicolasdanelon commented May 7, 2021

Just edit your /etc/hosts and no google analytics for you.

@logan2611 commented May 7, 2021

Just edit your /etc/hosts and no google analytics for you.

You shouldn't have to do that

@nicolasdanelon commented May 7, 2021

Just edit your /etc/hosts and no google analytics for you.

You shouldn't have to do that

you keep using ad blockers?

@logan2611 commented May 7, 2021
edited

Just edit your /etc/hosts and no google analytics for you.

You shouldn't have to do that

you keep using ad blockers?

Yes, and I shouldn't have to. I use adblockers because I do not support super annoying and laggy ads, not to mention the tracking that comes with almost all of them. I strongly prefer websites that don't have ads at all. You did nothing to address my point whatsoever.

@Amolith commented May 7, 2021
edited

@nicemicro

if the telemetry is using methods and services that go against the conscience of most privacy-minded folks, it would mean that most Linux distributions will build Audacity without it for their repositories, and that will result in no user data from Linux.

With this PR, telemetry would only be included in the official releases built through the GitHub CI system; Linux distributions building Audacity from source on their own infrastructure won't include the telemetry bit at all. This means the project wouldn't get any data from any Linux user installing it through their package manager in the first place. On one hand, I'm glad that none of my personal data would get sent to Google and Yandex. On the other hand, I'm pretty sure that means they wouldn't get potentially useful information from a significant chunk of their users so the collected data is incomplete.

Ignoring the privacy concerns for a moment, this functionality just seems badly implemented in its current state.

@SndChaser commented May 7, 2021

@Amolith

GitHub CI system; Linux distributions building Audacity from source on their own infrastructure won't include the telemetry bit at all. This means the project wouldn't get any data from any Linux user installing it through their package manager in the first place. On one hand, I'm glad that none of my personal data would get sent to Google and Yandex. On the other hand, I'm pretty sure that means they wouldn't get potentially useful information from a significant chunk of their users so the collected data is incomplete.

Incomplete is an understatement. From a statistical point of view it would be an invalid data set, which would be no basis for making any decisions (technical or otherwise).

Ignoring the privacy concerns for a moment, this functionality just seems badly implemented in its current state.

Worse than badly implemented. It seems broken by design.

@AndreiSva commented May 7, 2021

Maybe this PR is targeted more towards Windows and macOS users?

@HeyBanditoz commented May 7, 2021


@Blu3wolf commented May 8, 2021

Reacting with thumbs down doesn't prevent it from happening, it just tags you as the kind of person who thinks reacting with thumbs down prevents something from happening.

So, I gather you hadn't spotted the number of new forks then.

@TripingPC commented May 8, 2021

I've used Audacity all my life, since I was like 6 years old. I'm now 22. (take my data, mine it I don't care) and I did so on Windows. I've also run Linux exclusively for quite a few years. But these days, the only times telemetry genuinely annoys me is if it gets in my way. If I can turn off the targeted advertising that results from my data being collected, I don't care.

The way analytics are being approached in this case is at best useless and at worst worrying, as stated in this comment.

I don't agree with the foam at the mouth raging mentality of "Audacity is dead, let's make a fork now." (call it audavillage since you're reclused away from the city) I think I understand what happened here.

An attempt at a useful feature was made by a community member who was well-intentioned, but did not know how to handle these things properly: using proprietary, obscure services like Google Analytics and Yandex Metrica, as well as explicitly storing records of IPs and UUIDs.

The main developer of Ardour has popped up in this thread and mentioned how Ardour uses (used) anonymized telemetry to get a better idea of the divide in operating systems among their users. This is useful info. THAT CAN BE GUESSED FROM PEOPLE DOWNLOADING THE APP.

I will say, a manual analytics feature that lets you get a full system report including data about Audacity to put into a bug report would be useful. But automated background analytics aren't. Recurring data can only ever be used for engagement metrics useless to development.

I realize this is a mess of a comment. sorry.

I wish the best to the Audacity Team.
