Opt in for error reports and usage statistics #2766

Closed
rradar opened this issue May 5, 2019 · 49 comments

@rradar

rradar commented May 5, 2019

NOTICE:

As a workaround, just remove Etcher completely and use usbimager, which does the same as Etcher (and a little more, as it can also read the contents of a flash disk/card and save them as a compressed image file). It comes without the unnecessary tracking, ads, etc. that are included in Etcher and turned on by default.



  • Etcher version: 1.5.19
  • Operating system and architecture: amd64, debian
  • Image flashed: none
  • Do you see any meaningful error information in the DevTools? no, why?

I just installed balena etcher via the debian repository. After installing I started with

balena-etcher-electron

got a

ready-to-show: 2810.265ms

and the UI was presented. When I clicked the settings wheel at the top right, I saw a 'service' called

Anonymously report errors and usage statistics to balena.io

which is/was activated by default.

I didn't dig deeper, but I'm almost sure data had already leaked before I was able to deactivate this 'feature'. I'm not a lawyer either, but under the new European laws this is surely not tolerable anymore.

Please make this option opt-in. Thanks.

@Zeik0s

Zeik0s commented May 6, 2019

Well, if it's really anonymous, then it complies with GDPR, because GDPR does not apply to anonymous data.

@sneak

sneak commented May 7, 2019

related discussion: #2497

@rradar
Author

rradar commented May 8, 2019

I read some old issues and history regarding this, and I think the best course would be to file a complaint with the official GDPR bureau in Greece to get this checked, as the creators of the software have declined to fix this for a long time. As stated on the balena website, they have an office in Athens, so any legal action can easily be enforced there. At the very least my IP is leaked when 'anonymous data' is transmitted (without my consent!), and this is already enough to violate the law.

It's really bad to see something like this happen in FOSS software...

Do I now have to choose between going back to dd and being analysed while burning my images? As a quick workaround, I am blocking all network connections from balena-etcher in my firewall.

@thundron
Contributor

Please see #2599 (comment)

@rradar
Author

rradar commented Jul 13, 2019

Please see more #2599 (comment)

@ghost

ghost commented Jul 19, 2019

@rradar
Author

rradar commented Aug 12, 2019

complaints@dpa.gr 📬

@petrosagg
Contributor

Hi everyone. I wanted to give an update on where we are with this issue. Several issues have been raised, so I will address them individually.

We should separate the discussion of what is legal and required by GDPR from requests that go beyond what the law requires. Specifically, GDPR requires opt-in consent for personally identifiable data, not for anonymous data collection. It is not our intention, nor is it useful for us, to collect personally identifiable information (see the Purpose section below). So the first question is "Are we collecting personally identifiable information by mistake?" and the second question is "Is making the usage statistics opt-in the best decision for the project?"

Personal data collection

We conducted an extensive audit of all the data we collect from the Etcher application to make sure no personally identifiable data is collected by mistake. Collecting data by mistake might sound strange, but it can easily happen in a desktop application. For example, the mixpanel library will include information about the current system user by default when run in an Electron app. Whenever we became aware of such issues in the past, we promptly fixed them.

The results of our investigation showed that Etcher makes connections to the following systems:

Connection                         | Included intentionally
Sentry                             | YES
Mixpanel                           | YES
Google Analytics (& DoubleClick)   | NO
GoSquared                          | NO
Facebook Pixel                     | NO
gstatic.com                        | NO
jQuery                             | NO
CloudFront                         | NO
facebook.com / facebook.net        | NO

The large number of unintended connections happened as a side-effect of loading content from our balena.io website that includes these libraries automatically.
Action item: We are removing all instances of those connections from Etcher

Furthermore, we audited all the data we collect to make sure none can be characterised as personally identifiable. To do this properly, we are consulting our EU-based lawyers, who can provide an expert opinion on what the GDPR and EU law in general require. It is important to refrain from making legal claims unless one is intimately familiar with the legislation. Unfortunately, there have been a number of legal claims of questionable validity in this and other threads.

To make this extremely clear, we are taking the law seriously and are investing time, money, and effort, to consult experts in the field to guide us on this matter. We do this because it is the right thing to do. We've done it before (for balenaCloud) and we'll happily do it for all the products we offer.

Even though our conversation with our legal team is still ongoing we have identified a couple of cases where PII is sent to our data collection system. Sentry, our error collection tool, will log a stacktrace when Etcher hits a critical error that can potentially include a path in the system which includes the username of the user. The IP address of the event was also logged.
Action item: We are fixing both of these problems and will remove or anonymise any data our legal team deems PII
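
The path problem described above could be handled on the client before a report leaves the machine. The following is only a minimal sketch, not Etcher's actual fix: `scrubUserPaths` and the `<user>` placeholder are hypothetical names, and the patterns cover only the default home-directory layouts on Linux, macOS, and Windows. The IP address is a separate matter, since the receiving server sees it regardless of what the payload contains.

```typescript
// Hypothetical helper, not taken from the Etcher codebase.
// Each pattern captures the home-directory prefix and matches the user-name
// segment that follows it.
const HOME_PATH_PATTERNS: RegExp[] = [
  /(\/home\/)[^\/\s:]+/g,           // Linux:   /home/<user>
  /(\/Users\/)[^\/\s:]+/g,          // macOS:   /Users/<user>
  /([A-Za-z]:\\Users\\)[^\\\s:]+/g, // Windows: C:\Users\<user>
];

// Replaces the user-name segment of home-directory paths with a placeholder,
// leaving the rest of the text (file names, line numbers) untouched.
function scrubUserPaths(text: string): string {
  return HOME_PATH_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "$1<user>"),
    text
  );
}
```

For example, `scrubUserPaths("at /home/alice/app.js:10:5")` yields `at /home/<user>/app.js:10:5`.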

Purpose of data collection

With the legal stuff out of the way, I wanted to touch on the reason we are collecting data which will hopefully help guide the discussion about whether it should be an opt-in or opt-out feature.

For most software engineers, writing an image flashing application sounds easy. After all, at its very core it is a simple block copy operation that we've known how to do for ages. It can't possibly be that complex. However, this is far from the truth! After releasing Etcher for the first time, and as the tool gained adoption, we saw it run on more and more obscure combinations of systems. This produced a (very) long tail of issues that we couldn't have predicted or tested during development. It was through constantly sifting through error reports and measuring success rates across deployed versions that we managed to reach the level of quality that you see today.

When we say that usage data helps develop etcher we're not talking about some abstract possibility. This is very real and has shaped the etcher we know and love. The list of bugs fixed is endless.

Discussion on making collection opt-in

With the full context fleshed out we can now re-engage in the discussion of making data collection opt-in. As mentioned above, we have to make the decision that is best for the project and somehow balance what the users expect from a privacy point of view with what the users expect from a robust piece of software point of view. Given the benefits we've already seen this is not a clear-cut decision. At the same time the userbase of Etcher has grown tremendously and one could argue that most issues have already been seen. Unfortunately I don't have a concrete way forward to offer just yet, but we haven't ruled it out as a possibility.

Finally, to further steer the discussion towards the right direction I will change the title of the issue to just the opt-in discussion. @rradar if you still think there is a legal issue please open a separate ticket clearly explaining the problem. Rest assured that we are working with our legal professionals to ensure we are not breaking the law.

@petrosagg changed the title from "Opt in for error reports and usage statistics (comply with European laws)" to "Opt in for error reports and usage statistics" on Sep 14, 2019
@thefaj

thefaj commented Sep 16, 2019

Maybe aim for higher than "not breaking the law"…

@rradar
Author

rradar commented Sep 16, 2019

> balance what the users expect from a privacy point of view with what the users expect from a robust piece of software point of view.

Easy! I want to flash complete privately. #2890

  • I don't want to share the time I flash an image
  • I don't want to share if the flashing was successful or not
  • I don't want to share my filename and checksum of my file I'm flashing

> Given the benefits we've already seen this is not a clear-cut decision.

Indeed. It is a clear-cut decision. No way around it.

@petrosagg
Contributor

> Easy! I want to flash complete privately. #2890

@rradar I think we're confusing what this issue is about. You can already disable usage statistics from your settings so if you want to flash completely privately by all means use this feature. It's why we put it there in the first place.

The only current bug with the feature, which we are working on right now and we'll release a fixed version in the following days, is that some libraries make a call to a remote server even if you merely require() them, without doing any API calls.
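
One way to avoid the require()-time side effect described above is to defer loading until reporting is both enabled and actually needed. This is a sketch of that general pattern only; `AnalyticsClient`, `createLazyReporter`, and the `load` callback are hypothetical names, not Etcher's code or any library's API. In a real app, `load` would wrap the `require()` call.

```typescript
// Hypothetical interface standing in for whatever analytics client an app uses.
interface AnalyticsClient {
  track(event: string): void;
}

// Returns a report function that calls `load` at most once, and only the
// first time an event is sent while reporting is enabled.
function createLazyReporter(
  isEnabled: () => boolean,
  load: () => AnalyticsClient
): (event: string) => void {
  let client: AnalyticsClient | undefined;
  return (event: string): void => {
    if (!isEnabled()) {
      return; // the library is never loaded, so its load-time side effects never run
    }
    if (client === undefined) {
      client = load();
    }
    client.track(event);
  };
}
```

With this shape, a user who has the setting disabled never triggers `load`, so a library's import-time network calls never happen.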

But please try to keep the discussion on point. What we're discussing here is if anonymous usage statistics should be opt-in. User choice is and will continue to be a feature of Etcher.

@rradar
Author

rradar commented Sep 29, 2019

The collection of so called "anonymous usage statistics" needs to be opt-in. Otherwise everyone (even people who prefer not to leak their data) will be forced to participate in the data collection.

The setting which is implemented right now leaks data before it can be turned off -> NO GO!

@thefaj

This comment was marked as abuse.

@petrosagg
Contributor

@thefaj I have repeated this many times but for some reason you seem to ignore it. We're not using Google Analytics. We're only using Mixpanel and Sentry.

Secondly, we actually do send anonymous data and strip personal information from events. If you believe this is false you have to provide counter-evidence. The code is there for you to inspect. Until then your claim means nothing.

> This is a dangerous product made by unethical people.

Personal insults are not allowed in this community, please remove this comment. Next time there won't be a warning.

@sneak

sneak commented Sep 29, 2019

> Given the benefits we've already seen this is not a clear-cut decision.

It turns out that collecting user data without explicit consent means that you end up violating the consent of your users in a fraction of cases where that's not what the user wants.

Doing things with a user's computer that they don't want makes your software malware.

It is only "not a clear-cut decision" if you don't mind violating the consent of your users, which is a despicable stance, if indeed you hold it. Please default data collection to off. Ask users on a first launch with a modal, if you wish. But do not use the network without explicit permission.

> Rest assured that we are working with our legal professionals to ensure we are not breaking the law.

Hiding behind an "is it illegal?" to mask the fact that you violate user consent is not something you should be doing. It is rude and immoral, and you should strive to conduct your business in an ethical and respectful fashion.

@sneak

sneak commented Sep 29, 2019

> This is a dangerous product made by unethical people.

Said @petrosagg:

> Personal insults are not allowed in this community, please remove this comment. Next time there won't be a warning.

There is an argument to be made that this is not a personal insult, but in fact an accurate objective description of the current state of affairs. If circumstances are such that failing to protect one's own privacy would result in danger, then Etcher's privacy-compromising default settings are indeed dangerous.

It is also clear that releasing the current, consent-violating-by-default version, is an unethical business practice, which would have to be undertaken by unethical people, necessarily. It wasn't an accident or oversight, it was a clear and definitive choice made by Balena staff, to place bug acquisition data over that of user consent.

The only thing remaining for Balena to do is to remedy this failure.

@petrosagg
Contributor

> Hiding behind an "is it illegal?" to mask the fact that you violate user consent is not something you should be doing. It is rude and immoral, and you should strive to conduct your business in an ethical and respectful fashion.

@sneak we are in agreement on this. The reason I brought up legality was only because there were claims that we are breaking the law, which had to be addressed. If you read my comment above I try to steer the conversation away from the legality and towards what is best for Etcher.

> It turns out that collecting user data without explicit consent means that you end up violating the consent of your users in a fraction of cases where that's not what the user wants.
>
> It is only "not a clear-cut decision" if you don't mind violating the consent of your users, which is a despicable stance, if indeed you hold it. Please default data collection to off. Ask users on a first launch with a modal, if you wish. But do not use the network without explicit permission.

The problem is that this decision is not in a vacuum. I could reformulate your statement as "It is only "not a clear-cut decision" if you don't mind ignoring the fraction of users that cannot use the software because of their peculiar setup." Is ignoring accessibility not despicable?

> Doing things with a user's computer that they don't want makes your software malware.

This is not the widely accepted definition of the term malware. You are using it to have this extra "punch" in your message. That's not good faith discourse. From Wikipedia:

> Malware (a portmanteau for malicious software) is any software intentionally designed to cause damage to a computer, server, client, or computer network.

But even ignoring that, how far can you stretch this definition? What is software allowed to do on a user's computer in the first place? You say it shouldn't use the network, but one could say it shouldn't use the disk to store state, shouldn't take up too much real estate on the screen, etc.

There is a reason we are OK with some things but not others. This reason, at least for us, is not a deontological one. Since ethics has come up a lot: we are reasoning within a consequentialist framework. When this decision was made, we concluded that making data collection opt-out would cause some harm to users who don't want to be tracked at all, but that this would be less than the harm caused by not improving Etcher from error reports. That's it.

Everything else being equal I would choose no tracking every time too. But everything else is not equal.

Finally, the reason this issue has remained open is because we agree we should re-evaluate what the best state for Etcher is at the moment. For example, we understand that a lot of the big issues have already been fixed from previously collected data, and that at the current size of the user base it could be that the people that would opt-in make a representative sample. If we had a clear way forward I would have closed the issue stating our position.

We would be extremely happy if you could provide different angles from which you can make an argument between the inconvenience of opting out without a modal and better software without resorting to "This is bad, full stop" type of statements.

@sneak

sneak commented Sep 29, 2019

It's not that it's bad. It's that you should not use a user's computer to do things that user does not want.

If you don't know if the user wants it or not, ask. But don't assume and proceed, because then in some set of cases you do what the user does not want, which is a universally bad thing, regardless of the benefits to you or to other users.

@rradar
Author

rradar commented Oct 3, 2019

I wonder if the term spyware is more valid for Etcher than malware. Or at least that, from a user's point of view, this software comes bundled with spyware.

"Spyware is a software that aims to gather information about a person or organization, sometimes without their knowledge, and send such information to another entity without the consumer's consent." https://en.wikipedia.org/wiki/Spyware

Knowledge and consent are the key. Etcher neither informs users nor asks for their consent.

@lurch
Contributor

lurch commented Oct 3, 2019

I'm hesitant to get involved in this heated conversation, but @rradar says "gather information about a person" and @petrosagg has said that the data collected is anonymous, and I'd say that you can't really classify "anonymous data" as "information about a person".

@sneak

sneak commented Oct 3, 2019

Anonymous data:

  • ...is still data about a person. you don't know which one, but it's still information about them.

  • ...isn't really anonymous, as it comes in with an IP, which can be geolocated, and is associated with a netblock, which tells you even more about the "anonymous" person.

The real issue is consent, though.

@sneak

sneak commented Oct 4, 2019

This affirms that the creators of this software have no respect for user consent.

This will need to be forked.

@lurch
Contributor

lurch commented Oct 4, 2019

> the creators of this software have no respect for user consent

If that was as blatantly true as you keep claiming, then there wouldn't be any opt-out button at all. IIRC that opt-out option has been there from the very beginning.

> This will need to be forked.

Yup, nothing at all stopping you doing that. Hooray for Open Source 🙂

@rradar
Author

rradar commented Oct 4, 2019

@vengerst "What user consent is about? The data when writing? I think etcher should have access to the data" [...]

That's like buying a pen and having the vendor access everything I write with it. Hello?

@lurch [...] "there wouldn't be any opt-out button at all. IIRC that opt-out option has been there from the very beginning."

This opt-out button, which never really worked and "accidentally" leaked information about your habits to no fewer than 9 servers, including GAF, even when it was set to NOT send data?...

image

...you can really see how seriously balena takes the user's choice (not to mention consent) and privacy 😞

@rradar
Author

rradar commented Oct 4, 2019

Thoughts on what opt-in could look like:

First-start modal:

"Hello, we are the balena team, working hard..... Can we have your data to make the world a better place?"
Yes - No (There should be no preference, by design or visuals, for either choice)

-> If I say no, I don't want Etcher to make any network connection. (It could ask a second question about whether Etcher may phone home to check if a new version is available...)

If an error happens during flashing or while using the program, a modal could be presented to the user:

"We caught an error. To give us a chance to solve it, you can upload the crash report to the balena cloud now"
Yes - No

Settings: "Error Reports and usage statistics" (initially turned off)
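
The proposal above amounts to a small consent model in which nothing is allowed until the user has answered. A minimal sketch follows; every name in it (`Consent`, `PrivacySettings`, `mayConnect`, and so on) is hypothetical and only illustrates the idea. It is not how Etcher is implemented.

```typescript
type Consent = "unset" | "granted" | "denied";

interface PrivacySettings {
  usageStatistics: Consent; // answer to the first-start modal
  updateCheck: Consent;     // answer to the optional second question
}

type Purpose = keyof PrivacySettings;

// Opt-in default: nothing has been asked, so nothing is permitted.
function defaultSettings(): PrivacySettings {
  return { usageStatistics: "unset", updateCheck: "unset" };
}

// The first-start modal is shown until the user has picked Yes or No.
function needsFirstRunPrompt(settings: PrivacySettings): boolean {
  return settings.usageStatistics === "unset";
}

// Records an answer and returns a new object; the input is not mutated.
function answer(settings: PrivacySettings, purpose: Purpose, yes: boolean): PrivacySettings {
  const next: PrivacySettings = {
    usageStatistics: settings.usageStatistics,
    updateCheck: settings.updateCheck,
  };
  next[purpose] = yes ? "granted" : "denied";
  return next;
}

// A network connection for a purpose is allowed only after an explicit Yes.
function mayConnect(settings: PrivacySettings, purpose: Purpose): boolean {
  return settings[purpose] === "granted";
}
```

Treating "unset" the same as "denied" in `mayConnect` is the design choice that makes this opt-in: no answer means no connection. The per-crash upload prompt would be a separate one-off question rather than a stored setting.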

@rradar
Author

rradar commented Oct 6, 2019

By the way: I turned on the opt-out setting and Etcher still wants to access some cloud (again unintentionally? 😞) and announce my presence... 👎

image

@rradar
Author

rradar commented Nov 4, 2019

Action item: Removing all instances of those connections from Etcher 🤔

@tcurdt

tcurdt commented Nov 12, 2019

Just came here because this fancy version of dd created connections to all these hosts:

github-production-release-asset-2e65be.s3.amazonaws.com
api.balena-cloud.com
github.com
www.google-analytics.com
code.jquery.com
api.mixpanel.com
balena.io
assets.balena.io
www.balena.io
d1l6p2sc9645hc.cloudfront.net
stats.g.doubleclick.net

and I just could not believe it. Not cool.
Looking forward to the opt-in. Otherwise I am back to dd.
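
A host list like the one above can be checked mechanically against the set of hosts an application is meant to contact. The sketch below assumes a hypothetical allowlist; the entries in `INTENDED_SUFFIXES` are illustrative guesses, not a statement of which connections balena considers intentional.

```typescript
// Illustrative allowlist only; the real set of intended hosts is an assumption.
const INTENDED_SUFFIXES: string[] = ["balena.io", "mixpanel.com", "sentry.io"];

// True if `host` equals one of the suffixes or is a subdomain of one.
function isIntended(host: string, suffixes: string[]): boolean {
  const h = host.toLowerCase();
  return suffixes.some((s) => h === s || h.slice(-(s.length + 1)) === "." + s);
}

// Returns the observed hosts that are not covered by the allowlist,
// in the order they were observed.
function unexpectedHosts(observed: string[], suffixes: string[]): string[] {
  return observed.filter((host) => !isIntended(host, suffixes));
}
```

Matching on a leading dot means `www.balena.io` is covered by `balena.io`, while a look-alike such as `api.balena-cloud.com` is not and would need its own entry if it is intended.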

@lurch
Contributor

lurch commented Nov 12, 2019

@tcurdt See the earlier comment where @petrosagg acknowledges that some of these services were being included accidentally.
Are you using the latest version of Etcher?

@tcurdt

tcurdt commented Nov 13, 2019

@lurch I am on version 1.5.63. That should be the latest release.
The comment from @petrosagg is nice - but it is from 2 months ago.
And IIUC most of it could have been fixed even without a new release.

@thefaj

thefaj commented Nov 13, 2019

Yeah, the benefit of doubt went out the window long ago. This app is a cesspool of spyware

@floion

floion commented Nov 13, 2019

> Yeah, the benefit of doubt went out the window long ago. This app is a cesspool of spyware

It would be nice if you did not use that sort of language. Please edit your comment and be civilized.

@thundron
Contributor

@tcurdt Did it show those connections after disabling the analytics as well? Did you try restarting the application after disabling the anonymous analytics?

@thundron
Contributor

Asking because, as you correctly mentioned, we got rid of some of those connections without releasing a new version (apart from the content that comes from our marketing website, such as the success banner), while other connections needed a version update (e.g. mixpanel, whose library was still included despite not being used)

@tcurdt

tcurdt commented Nov 13, 2019

Just found the setting that was enabled. Will give that a try. Thanks!

@rradar
Author

rradar commented Dec 16, 2019

@thundron wrote in #3006:

> Note: disabling analytics doesn't mean disabling network connections, it just means ... disabling analytics

@bboc wrote in #3006:

> Since every network connection inevitably exposes the IP address, the remote end can track the IP and run analytics on that.

@bboc wrote in #3006:

> GDPR considers the IP address personal data

And ask for users permission before exposing a users IP! That's why we need opt in! To comply with the laws! (same applies to the ads showing while flashing)

I still don't get why balena is having such a hard time with the law even though they are aware of the situation (see the post from @petrosagg) 😞

@petrosagg
Contributor

@tcurdt I just tried this myself with Etcher v1.5.70 and the "Anonymously report errors and usage statistic to balena.io" option disabled and I saw no connections to mixpanel, google, doubleclick, or any other analytics service. Can you retry your test with the latest version?

To be clear, there were connections to our static site, balena.io, which are used to display the featured project and are not sending tracking information.

> And ask for users permission before exposing a users IP!

@rradar did Github ask you if you want to log your IP before connecting to Github? No, because this is how the internet works. Etcher loads a small content page from the internet as part of its functionality, you can't do that without doing a TCP connection just like you can't have a webpage on the internet without receiving connections from an IP address

@thefaj

thefaj commented Dec 16, 2019

Patronizing, much @petrosagg
Maybe accept that your idea of user experience, privacy, the law, and ethics is different than most of the rest of us.

@petrosagg
Contributor

@thefaj you can claim things about "most of the rest of us" as much as I or anyone else can. It's your personal opinion, I'll grant you that much.

Putting that aside, I didn't mean to sound patronizing and I apologize if I sounded that way. I'm pointing out that displaying a webpage inherently includes doing a TCP connection, just like when visiting a webpage. If you have a suggestion on how to do that I'd be very interested to hear the solution and even implement it, but as far as I know you can't load a webpage without a TCP connection from your IP.

@tcurdt

tcurdt commented Dec 16, 2019

@petrosagg we can argue about the stupidity of GDPR as much as we like - but it is the reality we live in. Of course it can establish TCP/IP connections - but in the EU that now means a lot of legal information should be provided to the user for doing so.

Legal issues aside - why a fancy version of dd needs to load a webpage from a remote location is beyond me. Until this issue I never realized it won't work offline.

@petrosagg
Contributor

@tcurdt to be clear, I don't think GDPR is stupid; in fact I quite like it and have even used my rights as an EU citizen. From our talks with lawyers, it didn't sound like there needs to be a notice just for loading content.

On loading the webpage: it is not a requirement for writing the image. If you attempt to write one with no internet connection, it will work just fine. We use the webpage to display a featured project while the write is happening. The featured project is a DIY project, usually with a Raspberry Pi, that our team has created for the users of Etcher. For example, currently it walks you through making a Bluetooth sound receiver that connects to your stereo.

We believe that this is high quality content that helps both the users by presenting an interesting project and our organisation to continue funding the development of this project.

@tcurdt

tcurdt commented Dec 17, 2019

I guess there is a difference between typing in a URL or clicking a link and just loading a resource from an application. And at least for every third party, one should provide information on what is happening with that kind of information. Even if it is "we are storing nothing" - but IANAL.

I like the idea of GDPR but I am not a fan of the implementation - so to speak. Anyway!

Thanks for clarifying about the offline support.

@rradar

This comment has been minimized.

@gafarma89

This comment has been minimized.

@whmountains

whmountains commented Aug 31, 2020

This thread has been dead for a few months, but I want to step in and add my voice to say that not all balena users feel so strongly about this.

I use etcher and balenaCloud on a regular basis, and am glad the Balena team is tracking crash reports and usage data. If they keep it anonymous, and it helps them make etcher faster and more stable, then I approve. As for the tutorial, well, that seems like a way to fund development. People who decide to build the projects will get an experience with the Balena platform and may decide to buy paid services at some point. I have read the devblogs, and it sounds like a surprisingly large amount of work went into making Balena stable and fully cross-platform. To me promoting actually useful tutorials seems like a pretty benign way to make money compared to some of the alternatives like targeted advertising.

For those of you asking why Etcher uses more than 400 MB: it is because it is built with Electron, which means most of Chromium gets bundled into each app. In exchange for this size trade-off you get the ability to write the app with HTML, CSS, JavaScript, and familiar browser and Node.js APIs. This is what allows a small company like Balena, who are not really in the business of making desktop apps, to put out something of really high quality like Etcher. I long for the day when a more lightweight framework for building apps with web technologies comes on the scene, but until then I would take a bloated app over nothing at all.

To @rradar, @thefaj, and others. You make some good points, and I almost agree with you that the tracking should be opt-in. But I disagree with your tone in this discussion. You are demanding that the Balena team respect what you consider to be your "rights" in a very disrespectful manner. It seems like your verbal abuse is getting in the way of persuading people. I think you might have been able to push me, and possibly the balena team, over the fence into the "no tracking" camp if you had presented your viewpoint more tactfully.

Cheers!

@thefaj

This comment has been minimized.

@sneak

sneak commented Aug 31, 2020

@whmountains

> I use etcher and balenaCloud on a regular basis, and am glad the Balena team is tracking crash reports and usage data. If they keep it anonymous, and it helps them make etcher faster and more stable, then I approve.

That's fine, you're more than welcome to consent to the tracking of your crash reports and your usage data. You are not in a position, however, to consent for the data of other people who are not you.

@thundron
Contributor

thundron commented Sep 1, 2020

I think we made our case pretty clear, there's no need to keep the discussion going.
Thank you to everyone involved in it

@thundron closed this as completed on Sep 1, 2020
@balena-io locked as resolved and limited conversation to collaborators on Sep 1, 2020

12 participants