
GDPR: Make stats export more functionable while keeping users safe #5398

Open
4 tasks
AenBleidd opened this issue Oct 16, 2023 · 64 comments

Comments

@AenBleidd
Member

Recently there was a discussion about the stats export and the GDPR (@davidpanderson and I were among the participants).

We identified that the initial implementation of GDPR compliance was too strict, which prevents data aggregators from showing correct statistics for the BOINC network.
In particular, new users will never appear in any statistics export by default unless they enable stats export manually.
However, the statistics exported by BOINC projects contain no personal information (with two exceptions explained below) that might identify a BOINC user in one way or another.

Thus I propose the following:

  • Enable statistics export of all users and all hosts by default
  • Rename the current statistics export option to 'Do not include personal information in the exported statistics' (better wording is needed here)
  • For users who don't want to export their personal information, omit the 'name' and 'url' fields (or leave them empty) in the users stats export
  • For users who have hidden their hosts, omit the <userid> tag (or leave it empty) in the hosts stats export (this mostly duplicates the closed ticket "shouldn't export credit stats if hosts hidden?" #3766)
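A minimal sketch of what the proposed filtering could look like, assuming user and host records are plain dicts. The field names `name`, `url` and `userid` come from the proposal; the consent flag `exports_personal_info` is hypothetical and stands in for whatever setting a project actually uses. This is not BOINC's actual export code.

```python
from typing import Any

# Fields the proposal would blank for users who have not consented.
PERSONAL_USER_FIELDS = ("name", "url")
# Hypothetical per-user consent flag; a real project would use its own setting.
CONSENT_FLAG = "exports_personal_info"


def export_user(user: dict[str, Any]) -> dict[str, Any]:
    """Copy a user record for the users stats export.

    Personal fields are emptied (not dropped) when consent is absent, so every
    exported record keeps the same shape. A missing flag counts as "no consent".
    """
    record = {k: v for k, v in user.items() if k != CONSENT_FLAG}
    if not user.get(CONSENT_FLAG, False):
        for field in PERSONAL_USER_FIELDS:
            if field in record:
                record[field] = ""
    return record


def export_host(host: dict[str, Any], owner_hides_hosts: bool) -> dict[str, Any]:
    """Copy a host record for the hosts stats export, blanking userid if hidden."""
    record = dict(host)
    if owner_hides_hosts and "userid" in record:
        record["userid"] = ""
    return record


user = {"id": 7, "name": "alice", "url": "https://example.org",
        "total_credit": 1200.5, CONSENT_FLAG: False}
print(export_user(user))
# {'id': 7, 'name': '', 'url': '', 'total_credit': 1200.5}
```

Treating a missing flag as "no consent" keeps the safe default; note that every other field still passes through, which is exactly the part of the proposal that the replies below dispute.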
@davidpanderson
Contributor

Have we decided what constitutes private information?
For example: user name.
When we ask for it, we call it 'screen name', which implies that it's public.
It's shown on the project web site, and can't be hidden.
Similar with URL, country, and user ID.
We show these on the project web site -
why not show them in stats export too?

@brevilo
Contributor

brevilo commented Oct 17, 2023

We identified that the initial implementation of GDPR compliance was too strict, which prevents data aggregators from showing correct statistics for the BOINC network.

Playing devil's advocate now: "so what?". The aggregators aren't a component that a BOINC project needs in order to provide its service. They are a totally independent entity that can't expect anything from a BOINC project or its registered users by default. Just to make sure: I'm not arguing about what might be nicer for the BOINC ecosystem; I'm discussing this from a GDPR perspective, since that's the legal basis, whether we like it or not. More on the issues related to data transfers and defaults below.

Rename the current statistics export option to 'Do not include personal information in the exported statistics' (better wording is needed here)

This at least needs to be accompanied by a privacy policy that details it the other way round: you have to say what you do export, not what you don't. As is, such an opt-out violates GDPR's "transparency" and "data protection by design and by default" principles.

We show these on the project web site - why not show them in stats export too?

There's a difference between agreeing to publish certain details on a single project one willingly signed up for and having those details transferred to various third parties without consent. Opt-out doesn't constitute informed consent. Also, from the data controller's (i.e. the project's) point of view, what is the legal basis for this data transfer, if not consent? Legitimate interest? Very unlikely (see first statement above).

Furthermore, exactly what kind of entity is a stats site (or any stats export recipient) in the GDPR framework? Since a project ("controller") transfers data for a kind of service augmentation (or does it really?), the recipient is probably a "processor", potentially in a third country outside the EU. Just look at the can of worms that this already opens.

Have we decided what constitutes private information?

This isn't really up to us to decide; there's a legal definition for it. Screen name, URL, country and user ID can potentially be used to identify a natural person, indirectly or even directly. So we need to tread carefully here, since that definition, as with many legal definitions, is not crystal-clear or tailored to every single use case. How could it be? That's why we have courts after all...

As a project I don't want to put myself into a tenuous position just by hoping my definition of personal data is correct. If you think that's overly cautious, let me tell you that there's an industry out there whose sole business is to find data controllers with loopholes in their data processing, privacy policy, etc. and sue them for profit, which works because of the GDPR's large fines. This is why we have such an elaborate, yet transparent and comprehensible, privacy policy.

Honestly, coming back to the beginning, why should I take that risk? For the data aggregators only? I mean, if my users want to export their data, they can and will do it, by informed consent. If not, then they don't want to, or they simply don't care. How does that constitute a problem? In other words: what's the actual problem you're trying to solve that's worth wandering into such treacherous territory?

@AenBleidd
Member Author

@brevilo

This at least needs to be accompanied by a privacy policy that details it the other way round: you have to say what you do export, not what you don't

That is a very good point. We have a document that describes the data we export: https://github.com/BOINC/boinc/wiki/XmlStats
In this particular ticket I highlighted the exact parts that we're going to change.

This isn't really up to us to decide; there's a legal definition for it.

This is true; however, the data we're exporting is non-personal data, and it can't be used to identify the person behind it.
You need users' consent to export personal data.

If you can point out which data, from your perspective, is personal data apart from what I already mentioned, please do so.

In other words: what's the actual problem you're trying to solve that's worth wandering into such treacherous territory?

Currently, the way GDPR compliance was implemented prevents us from seeing the picture of our users.
BOINC is designed as a distributed network of projects, and we can't see who our users are without asking them to export data that is not sensitive personal data: for example, we don't see how many users run a particular OS.
You can't identify a user by their stats, and you can't do that using the CPID either; there is no connection between the data we have and any personal data of our users.

@davidpanderson

Have we decided what constitutes private information?

The data I highlighted is the data that could contain personal information about our users. That doesn't mean it does, but it might, which is why it's important not to export it by default.

@bema-aei
Contributor

bema-aei commented Oct 17, 2023

In my understanding the GDPR doesn't allow an "opt-out"; any personal data that is publicly visible, and in particular shared externally, needs to be hidden by default. "Personal data" here means anything that could possibly be traced back to a person or help to identify them.

I don't think it is a "valid interest" of "aggregators" to gather information about each and every host or user that doesn't want to share it, not even for the time between signing up for a project and finding out how to hide their information. Aggregators will not delete any such information once they have it (unless explicitly asked to, which is another hurdle).

What exactly is the goal here?

If the goal is just to gather statistics such as the number of hosts, users, total credit and RAC of a project or of the whole of BOINC, projects could publish those aggregated statistics and stats sites could show them without violating the GDPR, as these allow no tracing back to individuals (at least for projects with a reasonable number of participants).
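For illustration, publishing only aggregates might look like the sketch below: project-level totals computed from per-user rows, with nothing about any individual user in the output. The row fields and the `min_users` cutoff are assumptions; the cutoff mirrors the caveat about projects with few participants and its value is arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserRow:
    total_credit: float
    expavg_credit: float  # recent average credit (RAC)


def project_aggregates(
    users: list[UserRow], nhosts: int, min_users: int = 10
) -> Optional[dict[str, float]]:
    """Summarize a project without exposing any per-user row.

    Returns None for very small projects, where totals could point back
    at individuals (the threshold is an illustrative choice, not a rule).
    """
    if len(users) < min_users:
        return None
    return {
        "nusers": len(users),
        "nhosts": nhosts,
        "total_credit": sum(u.total_credit for u in users),
        "total_rac": sum(u.expavg_credit for u in users),
    }


rows = [UserRow(total_credit=100.0, expavg_credit=5.0) for _ in range(12)]
print(project_aggregates(rows, nhosts=20))
# {'nusers': 12, 'nhosts': 20, 'total_credit': 1200.0, 'total_rac': 60.0}
```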

@brevilo
Contributor

brevilo commented Oct 17, 2023

If you can point out which data, from your perspective, is personal data apart from what I already mentioned, please do so.

What I'm trying to get across is that this isn't only about those data that I already deem to be personal data, but also those that might become personal data when combined with any other data out there. For example, for the user stats I would exclude not just name and url but also id, country, cpid, teamid and has_profile. I'm not saying those data clearly are personal data but I can't confidently deny they could ever be used to help identify a person. There have been enough examples of such cases, even without LLMs and security/data breaches.

Currently, the way GDPR compliance was implemented prevents us from seeing the picture of our users. BOINC is designed as a distributed network of projects, and we can't see who our users are

Understood, but in terms of GDPR this is irrelevant. I understand "us" and "we" in your statement as referring to the BOINC community as a whole or the BOINC (software) project itself. Please understand that those aren't the data controllers. It's the individual projects who are the data controllers and who thus carry all duties, responsibilities and legal consequences. Strictly speaking, I could even argue that the current proposal might loosen important legal obligations for projects downstream, potentially without them being fully aware.

I don't think it is a "valid interest" of "aggregators" to gather information about each and every host or user

The correct GDPR term would be "legitimate interest" but, as I said above, that still misses the point. They could gather their own data, on their own legal basis. But what we're discussing here is that the projects transfer those data to them. That's an entirely different scenario.

projects could publish those aggregated statistics and stats sites could show them without violating the GDPR

Exactly. And that's what we do. And we add those individual users who gave their informed consent. I still think that this (the current situation) is most in line with the principles of the GDPR.

@bema-aei
Contributor

bema-aei commented Oct 17, 2023

projects could publish those aggregated statistics and stats sites could show them without violating the GDPR

Exactly. And that's what we do.

I don't think so, at least not in the stats export. Currently it includes only hosts and users who have given their explicit consent, but no overall project statistics.

Einstein@Home publishes some statistics on the server status page, but this requires additional knowledge (e.g. how the computing power is actually derived from the project's RAC) and is not standard among projects.

@AenBleidd
Member Author

AenBleidd commented Oct 17, 2023

@brevilo,
id, cpid, team_id, etc. can't be used to get the personal data of any user. They are not an SSN or anything similar; they are just identifiers that can identify a set of data but not the person behind it

@bema-aei
Contributor

bema-aei commented Oct 17, 2023

Regardless of the GDPR, I nowadays find it questionable to require anyone to share any information they might not want to share (for whatever reason), and here in BOINC not only with the single project they are in direct contact with, but also beyond that. For them, at least, this information is linked to them personally, and if e.g. it's a pretty unusual host, it can still be traced back to them.

Instead of opening up all the information that we consider non-personal to virtually everyone, at the risk of violating the GDPR or scaring people away, I'd rather know exactly what information (OS? CPU?) at what level (project, BOINC) is lacking, and find a way to collect it that is certainly compliant with the GDPR and with individual people's preferences.

So, what information do you think is necessary and missing? (@brevilo: what's the official equivalent of "Datensparsamkeit"?)

@bema-aei
Contributor

bema-aei commented Oct 17, 2023

I think my position boils down to this: BOINC is volunteer computing, and if we want to retain volunteers, we should try to satisfy their wishes and needs before ours (here: doing whatever we might be allowed to do).

@RichardHaselgrove
Contributor

computing power is actually derived from the project's RAC

Which is reverse engineering at its worst, because RAC (and credit as a whole) is neither controlled nor normalised.

just identifiers that can identify a set of data but not the person behind it

User ID and Team ID together could be used, in most cases, to identify the user name and team name. If the team allows open joining (and many of the big ones do), a bad actor could join the same team and see that member's postings in the team message board - where, in my experience, people may feel more relaxed in disclosing personal information. My team has certainly organised "in real life" meet-ups for drinks in a pub, or weekends in the hills.

@ballen4705

Recently there was a discussion about the stats export and the GDPR (@davidpanderson and I were among the participants).

I was not party to those discussions. I am confused about both the context and the details. I would be grateful if proponents could address the following.

To get global stats, exporting aggregate data is sufficient, and removes GDPR risk. Who is proposing to change this, and why?

If we go beyond exporting aggregate data, then GDPR adds complications:

  • Which part of stats data (example: CPID) is personal data? (The only way to know for sure is via a legal court decision.)

  • Relevant for defining "personal data": how might it be combined with other data (not necessarily from our projects) to identify individuals.

Since we can not say for sure which data is "personal", the sensible approach is to stay on the safe side of GDPR, which argues for opt-in rather than opt-out. This is also consistent with GDPR core principles such as transparency and data minimization. Another motivation to stay on the safe side: GDPR imposes additional requirements about data transfer to third parties (such as stats sites), especially if they are outside the EU.

Cheers,
Bruce

@brevilo
Contributor

brevilo commented Oct 17, 2023

I don't think so, at least not in the stats export. Currently it includes only hosts and users who have given their explicit consent, but no overall project statistics.

@bema-aei Just to make sure, I'm talking about tables.xml, which we do publish and which contains aggregate figures. Doesn't that include all users and hosts? For instance, nusers_total appears to match our "participants with credit" (on the SSP), so it would almost certainly include users who haven't opted in to the stats export, I think.

@brevilo
Contributor

brevilo commented Oct 17, 2023

What's the official equivalent of "Datensparsamkeit"?

@bema-aei It's "data minimisation" in conjunction with "purpose limitation" and "storage limitation".

@bema-aei
Contributor

bema-aei commented Oct 17, 2023

For instance, nusers_total appears to match our "participants with credit" (on the SSP), so it would almost certainly include users who haven't opted in to the stats export, I think.

Oh, you're right. I thought this was just the number of entries in user.xml (which it probably was before user.xml was filtered by consent). My bad. So there are already some aggregated statistics that we publish.

So again: what's missing and desired? And what for?

@brevilo
Contributor

brevilo commented Oct 17, 2023

id, cpid, team_id, etc. can't be used to get the personal data of any user. They are not an SSN or anything similar; they are just identifiers that can identify a set of data but not the person behind it

@AenBleidd Sorry, but I beg to differ. They probably aren't personal data on their own, but they might be combined with other data, as others have said as well. With regard to the GDPR, unless I can prove that the cpid can't be used to help identify a person, I really want to treat these identifiers as (potentially) personal data and thus as confidential. Since I obviously can't prove that, that's what I'd do.

Also, like @bema-aei just said, I think we ought to ask those questions the other way round: why should I publish id and team_id anywhere? Those are internal identifiers that have no meaning on their own. Yet they will get meaning when combined with other data in some ways - including those I didn't yet think of.

Bottom line: we should only ever store, process and transfer data that serve a purpose (for the data controller's services) and that we (as projects, a.k.a. data controllers) have a legal basis for. That's the spirit and purpose of the GDPR, and adhering to these principles will make the difference if someone sues you. We, as the ones bearing the actual responsibilities, together with experts in the field, have spent a considerable amount of time on this topic over the years.

I hope BOINC does not weaken the current implementation for the sake of (smaller) projects who can't afford dealing with this matter on their own on such level of detail. Please help them to reduce their attack surface as much as possible by keeping safe defaults.

@AenBleidd
Member Author

Ok, let's think about this:
If we take an id, what kind of personal information could you get from it?
Email address? No
Real name? No
Phone number? No
SSN? No
IP address? No
These five are personal and sensitive data, but the rest of the information is not.
Do you have a very unique host? You can hide it, and it won't be exposed.
You need a unique anonymous id to avoid duplicates.
And you can't use aggregated statistics, because 100 users of Project A and 100 users of Project B don't give you the real number of unique users.
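The duplicate-counting point can be illustrated with sets: summing per-project user counts double-counts people who run several projects, while a union over a cross-project identifier does not. This sketch assumes each project's export has been reduced to a set of CPID strings and that a given user has the same CPID in every project; the values are made up.

```python
def unique_users(exports: dict[str, set[str]]) -> int:
    """Count distinct users across projects, keyed by cross-project ID (CPID)."""
    all_cpids: set[str] = set()
    for cpids in exports.values():
        all_cpids |= cpids
    return len(all_cpids)


# Hypothetical exports: two projects with 3 users each, 2 of whom are shared.
exports = {
    "project_a": {"cpid1", "cpid2", "cpid3"},
    "project_b": {"cpid2", "cpid3", "cpid4"},
}
naive_total = sum(len(c) for c in exports.values())
print(naive_total, unique_users(exports))  # 6 4
```

The same property that makes the union possible (a stable identifier shared across projects) is what the replies treat as the linkability concern.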

@bema-aei
Contributor

bema-aei commented Oct 17, 2023

@AenBleidd could you please elaborate on what statistical information you are missing and what purpose you need it for? You will have to specify that anyway for the data policy declaration. Remember that under the GDPR you are required not to collect and process (let alone publish) information that doesn't serve a legitimate and documented purpose.

After that we may discuss how to collect that without violating the GDPR or the wishes and standards of our volunteers.

Stretching and bending the regulations of the GDPR or its interpretation to suit (currently undisclosed and possibly even future) desires of some of us (which also aren't disclosed to me yet) is something I consider the wrong approach and that I don't really feel comfortable with.

@AenBleidd
Member Author

@bema-aei, @brevilo

could you please elaborate on what statistical information you are missing and what purpose you need it for?

From my point of view, the answer is quite clear. We need to know:

  • number of unique users
  • number of unique devices/hosts
  • number of users in a team/without a team
  • world distribution (number of users in every country)
  • age of the account (when the account was created)
  • total amount of credits of the user/host
  • average amount of credits of the user/host
  • OS type of the host
  • OS version of the host
  • CPU type of the host
  • Number of CPUs of the host
  • GPU types of the host (if any)
  • Number of GPUs on the host (if any)
  • BOINC version of the host
  • VirtualBox version (if installed) of the host
  • RAM available on the host
  • Hard Disk space available on the host
  • User ID (index in the database, not unique across the projects)
  • User CPID (needed to avoid data duplication, MD5, no personal data can be retrieved from it)
  • Host ID (index in the database, not unique across the projects)
  • Host CPID (needed to avoid data duplication, MD5, no personal data can be retrieved from it)

As you can see from the list above, none of this information could be used to identify a person. More importantly, all of it (excluding User CPID and Host CPID) is shown publicly on every project, and none of it really contains any personal data. The only field in question (not in the list above, but mentioned in the original message) is 'name', which is not the user's real name (unless they put it there) but a screen name that could be literally anything. You can go to your profile and put my name there, but that won't make the account mine or impersonate me in any way.

Remember that under the GDPR you are required not to collect and process (let alone publish) information that doesn't serve a legitimate and documented purpose.

We're not going to collect any information other than what is already there and has been collected for years. And still, there is no personal and/or sensitive information here.

unless I can prove that the cpid can't be used to help identify a person

BOINC doesn't collect any personal information, so you can't use the CPID to get any. The CPID is a unique identifier that only has meaning within BOINC, and since there is no personal information in BOINC, you can't identify a user by their CPID.

I think we ought to ask those questions the other way round: why should I publish id and team_id anywhere?

ID and TEAM_ID are just database indexes, and they aren't even unique across projects. You can simply enumerate them and load the project pages to get all the user profiles of that particular project.
Exporting this data will not disclose anything that isn't already publicly available.

we should only ever store, process and transfer data that serve a purpose

That's a very good point! We have data that is anonymous and has a purpose. BOINC could use this information to get a clear picture of our userbase and provide a better service, and third-party data aggregators could use it to show valuable but still anonymous statistics (and possibly do other useful things). Exposing this information doesn't open any vulnerabilities and can't be used to target any BOINC user in any way.

We as the ones bearing the actual responsibilities, together with experts in the field, have spent a considerable amount of time on this topic over the years.

At the time GDPR compliance was implemented, the regulation was read incorrectly and all information was treated as personal (including forum posts), but eventually it was clearly defined which information is personal and can't be exported, and which is non-personal. All the data I listed above is non-personal and completely anonymous.

Which part of stats data (example: CPID) is personal data?

None of it: https://europa.eu/youreurope/business/dealing-with-customers/data-protection/data-protection-gdpr/index_en.htm#shortcut-2

User ID and Team ID together could be used, in most cases, to identify the user name and team name.

You don't need exported data to get this information; you can just go to the project web page and scrape the data by iterating ID and/or TEAM_ID from '0' to 'infinity'.

a bad actor could join the same team and see that member's postings in the team message board

A bad actor can do that without the exported data, because here it won't give them any additional personal information. E.g. you'll learn that one of the users has a host running Windows 10. So what? Will you target them with ads to buy macOS?

What I saw in the initial discussions here about the first GDPR implementation is that people didn't understand what the GDPR is about. Now, years later, the topic has become clearer. Even now I see that some of you are very scared of it, but if you dig a little deeper into the topic, you'll clearly see that there is nothing scary at all, and that the GDPR is not as strict as you think.

And please keep one very important point in mind: BOINC provides an open-source software solution, so you should treat this proposal as an optional recommendation. Yes, we plan to implement this change, but if any of you think it's too dangerous for you and you're too afraid of the GDPR, you don't have to follow it and can patch it to behave exactly as before.

@bema-aei
Contributor

bema-aei commented Oct 17, 2023

From my point of view, the answer is quite clear. We need to know:

While I can guess the reason for collecting some of that data across projects (RAC, number of hosts and users), the purpose of most of these is not clear to me. Why, e.g., does the internal ID of a host or user, which has absolutely no meaning outside the project, need to be exported elsewhere? Most of these data items make sense to me in the context of the project, mostly for assigning "work" to a host, but what is e.g. the RAM or disk space available at a host's last scheduler contact needed for outside the project?

@AenBleidd
Member Author

@bema-aei,

Why, e.g., does the internal ID of a host or user, which has absolutely no meaning outside the project, need to be exported elsewhere?

I might agree on this, but on the other hand this is completely anonymous information and can't do any harm. You could find a good use for it if needed (even if I currently don't see a good example of how it could be used).

but what is e.g. the RAM or disk space available at a host's last scheduler contact needed for outside the project?

You can use the 'last scheduler contact' to get a list of users that were active between two data exports, and thus see the dynamics (e.g. if 100 people were active yesterday and 100 are active today, that doesn't mean they are the same 100 people; maybe 80 are the same, 20 have gone, and 20 are completely new).
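The 80/20/20 example is a set comparison between the users active in two snapshots. A sketch, assuming "active" has already been derived from a last-contact timestamp and that users are keyed by a stable ID; the IDs below are synthetic and chosen to reproduce the numbers in the comment.

```python
def churn(yesterday: set[int], today: set[int]) -> dict[str, int]:
    """Split two snapshots of active user IDs into retained, departed and new."""
    return {
        "retained": len(yesterday & today),
        "departed": len(yesterday - today),
        "new": len(today - yesterday),
    }


yesterday = set(range(0, 100))  # users 0..99 active yesterday
today = set(range(20, 120))     # users 20..119 active today
print(churn(yesterday, today))  # {'retained': 80, 'departed': 20, 'new': 20}
```

Note that this only works if the same ID appears in both exports, so it depends on exporting per-user identifiers rather than aggregates.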

Speaking of RAM and hard disk space, imagine you want to start a completely new project. If your application uses 10 GB of RAM, you want to know whether there will be enough users who can run it. Or, if you need to store 100 GB of data, whether there is a sufficient number of users who could ever run the project's application.
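The feasibility question amounts to counting hosts above a resource threshold. A sketch under stated assumptions: hosts are reduced to a hypothetical `Host` record holding reported RAM and free disk in bytes (these are not BOINC's actual column names), and the sample values are made up.

```python
from dataclasses import dataclass

GB = 1024**3


@dataclass
class Host:
    ram_bytes: int        # hypothetical: reported physical memory
    free_disk_bytes: int  # hypothetical: reported free disk space


def hosts_meeting(hosts: list[Host], min_ram_gb: float, min_disk_gb: float) -> int:
    """Count hosts whose reported RAM and free disk both meet the thresholds."""
    return sum(
        1
        for h in hosts
        if h.ram_bytes >= min_ram_gb * GB and h.free_disk_bytes >= min_disk_gb * GB
    )


hosts = [Host(16 * GB, 200 * GB), Host(8 * GB, 500 * GB), Host(32 * GB, 50 * GB)]
print(hosts_meeting(hosts, min_ram_gb=10, min_disk_gb=100))  # 1
```

Reported values are upper bounds: user preferences can restrict how much RAM and disk BOINC may actually use, so a count like this overestimates usable capacity.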

@bema-aei
Contributor

The GDPR requires you to collect, process and in particular export only information that is needed for documented, legitimate reasons and purposes. The possibility of a purpose you may think of in the future isn't a valid justification IMHO, and neither is the fact that you can't think of any harm it may do.

@bema-aei
Contributor

The currently available RAM, and even more so disk space, doesn't help you much when deciding on a new application or project: there are e.g. user preference settings that influence what is actually usable, and a lot of information that I consider even more important (CPU features, GPU properties like CUDA compute capability or OpenCL level) isn't even stored in the DB.

@robsmith1952

First a question:
Has anyone involved in this discussion spoken to the Information Commissioner in an EU country?
Background to the question: I spent some time talking to a couple of them about the use and storage of videos and found them very clear and helpful in establishing the local "policies" that set out the what, how and when for the recordings. These discussions also showed how many people have wrong ideas about the GDPR's scope and intent.

@AenBleidd
Member Author

@bema-aei, you are missing the point of the GDPR. The GDPR tries to minimize the amount of personal information that is collected. Statistics are not personal information.

Could you please explain to me which "service" you could not provide to the volunteers when you don't get their internal ID of a specific project?

We (BOINC) need to be able to identify every user anonymously but uniquely. We can discuss this, and you might be right that the CPID would be enough for our purposes, but still: id is just an identifier, and it doesn't disclose any personal information.

Or the amount of disk space a certain host has available at a certain time?

Better understanding of the userbase. I already explained this above.

BOINC only becomes a meaningful when projects deploy it

Do you know how many projects implemented the GDPR on their side during the last 5 years? You will be very surprised by the numbers.

@bema-aei
Contributor

bema-aei commented Oct 18, 2023

Better understanding of the userbase. I already explained this above.

And what service for this userbase is that detailed understanding required for? Facebook also has a vital interest in understanding its userbase, which doesn't make that desire legal or beneficial to the users.

@bema-aei
Contributor

bema-aei commented Oct 18, 2023

In my understanding, your "proposal" would violate the GDPR in at least two respects:

  1. An opt-out model for publishing (and in particular transferring) any data is not a legal option under the GDPR. Giving no option at all would be, but that would be worse.
  2. You are collecting more data than is required to provide the service that you offer and document.

@bema-aei
Contributor

I'm talking here about the default behavior of the BOINC software, which in the current implementation does more harm than good and doesn't actually protect our users.

What harm exactly does this "current implementation" do to which "users" here?

@bema-aei
Contributor

And if you change something for projects that already implemented the GDPR in the current way, they will need to gather consent for that change again from every volunteer. I don't think this will ultimately achieve what you want; at least for quite some time you will get less data rather than more.

@brevilo
Contributor

brevilo commented Oct 18, 2023

You asked for our opinions on this and I tried to make my case, referencing the GDPR to contextualize the issues I see. We apparently continue to have a fundamentally different understanding of the subject, and I don't need to convince anyone. Rest assured, we as a BOINC project and data controller, together with our data protection officer, have spent a considerable amount of time on this topic over the years - and continue to do so.

In this discussion I tried to focus on data transfers and the legal bases for those, in a context where the personal nature of data is at least (obviously) debatable. We, as a project and (sole) data controllers, reached our conclusion on how to implement those in accordance with (our interpretation of) the GDPR and in the interest of our users and their data. We don't have a reason to change that.

You are of course free to change whatever you want in BOINC itself. But again, you asked for our opinions. My opinion on this proposal is that it would be a mistake. Not for us but primarily for new and/or smaller projects without the resources to tackle the ins and outs of the GDPR on the level we have.

@brevilo
Contributor

brevilo commented Oct 18, 2023

  1. An opt-out model for publishing (and in particular transferring) any data is not a legal option under the GDPR

To be fair, that's incorrect. The GDPR only comes into play when personal data are involved.

@AenBleidd
Member Author

Facebook also has a vital interest in understanding its userbase, which doesn't make that desire legal or beneficial to the users.

Facebook uses the data to show its users personalized ads. BOINC doesn't do that, and the data we are collecting can't be used for such purposes.

And what service for this userbase is that detailed understanding required for?

BOINC development.

You are collecting more data than is required to provide the service that you offer and document.

No. We are not asking our users to disclose any kind of personal information.

What harm exactly does this "current implementation" do to which "users" here?

We have no clear picture of our users. E.g., do we still need to support Windows 7/Android 4/etc.? We can't answer this question because we don't know exactly how many users we have on these particular OSes. Yes, every Project has its own statistics, but they're distributed, and, as I described above, 100 users of Project A and 50 users of Project B don't give us an exact number of unique users.

And if you change something for the projects that already implemented the GDPR in the current way, they will need to gather consent again for that change from every volunteer.

We don't need users' consent to show anonymized data.

Again, you're mixing personal and non-personal data and treating all the data as personal. This is not correct.

Rest assured, we as a BOINC project and data controller, together with our data protection officer, have spent a considerable amount of time on this topic over the years - and continue to do so.

Obviously, both you and your data protection officer have an incorrect understanding of personal information. As I already said, personal information is clearly defined. All the information except the 'name' and 'url' fields, which might contain personal information, is non-personal, and it can't be used to identify the user. You can't identify your user by the CPU they use, by the amount of RAM available, etc.

We, as a project and (sole) data controllers, reached our conclusion on how to implement those in accordance with (our interpretation of) the GDPR and in the interest of our users and their data.

Obviously, the conclusion is not correct, because you treat every piece of information as personal information, and this, again, is not correct. And this strictness is what users are complaining about. Users want to participate in the competition, but in order to do that, they need to manually allow statistics export. If they forget to do that, they're fucked, and they just lose their points.

We don't have a reason to change that.

I see a clear reason: we (BOINC) want to make BOINC better for our users, and the incorrect implementation of the GDPR prevents us from doing our job. That is why we initiated this discussion, which is currently going in the wrong direction.
I ask you to reconsider your decisions and align them with the law. To me it looks like you're so afraid that you don't even want to touch this topic. This is definitely not the correct behavior.

To be fair, that's incorrect. The GDPR only comes into play when personal data are involved.

And this is a very valid point. But do you clearly understand which data is personal and which is not?

@brevilo
Contributor

brevilo commented Oct 18, 2023

And if you change something for the projects that already implemented the GDPR in the current way, they will need to gather consent again for that change from every volunteer.

It's not just that. I see an even greater danger here. Above I wrote "strictly speaking, I could even argue that the current proposal might loosen important legal obligations for projects downstream, potentially without them being fully aware." What I meant by that is that, depending on how this proposal gets implemented, downstream projects could pull those changes (as part of their regular updates) and have their opt-in policy changed without being fully aware of it. Yet only they are legally accountable for what happens on their site. That means any such change must not alter the current opt-in policy when pulled downstream. IOW, projects must always switch to opt-out explicitly, if they choose to.

@bema-aei
Contributor

Do you know how many projects implemented GDPR on their side during the last 5 years? You will be very surprised by the numbers.

I don't think the pure number of projects matters much here. When we first implemented the GDPR, the five largest projects (SETI, CPDN, WCG, LHC/Cern, E@H) were actively involved and did implement those changes. Three of these were based in the EU (at that time), IBM (WCG) had a vital interest in conforming to EU regulations, and SETI doesn't exist anymore. These are the main projects that "aggregators" get by far the most data from, and all of these are GDPR compliant in their own vital interest. In your position, I would care more about the number of users than the number of projects.

@AenBleidd
Member Author

AenBleidd commented Oct 18, 2023

In your position, I would care more about the number of users than the number of projects.

You know what's most interesting? I don't know how many users decided not to export their data. We just don't have this information. Yes, you as a Project have this information, but we (BOINC) simply have no way to get it.
And this is a problem with the current implementation of GDPR. We (BOINC) just have no access to the vital information, and since this information is not personal, it makes no sense to keep the current implementation, and thus it needs to be changed.

@brevilo
Contributor

brevilo commented Oct 18, 2023

We have no clear picture of our users. E.g., do we still need to support Windows 7/Android 4/etc.? We can't answer this question because we don't know exactly how many users we have on these particular OSes. Yes, every Project has its own statistics, but they're distributed, and, as I described above, 100 users of Project A and 50 users of Project B don't give us an exact number of unique users.

Well, as @bema-aei just said, the vast majority of the BOINC ecosystem can be represented by a few projects, and those could be asked to report their statistics to get a statistically significant overview. I don't see why you need the "exact number of unique users" when relative shares of, say, OSes or histograms of disk sizes should be sufficient.

Obviously, both you and your data protection officer have an incorrect understanding of personal information

As I said, that's a pretty bold statement. I won't engage in this part any further.

And this strictness is what users are complaining about. Users want to participate in the competition, but in order to do that, they need to manually allow statistics export.

I'm not aware that this is a thing in our project's community. Also, every user had been made aware of the opt-in prior to the rollout, and it's written in our privacy policy. There's also no loss of data, since all absolute values are exported as soon as the user opts in to export them.

@bema-aei
Contributor

We (BOINC) just have no access to the vital information,

You don't have that information because people decided not to share it (with you). I can hardly find anything wrong with that. If you think that BOINC development relies on that information, then tell that to the people that you claim to develop for, and base BOINC development on the information you got.

In my research on E@H I understood that the people that contribute by far the most to the project are competitive and like to see their hosts published. There might be others, but whatever their reasons are for hiding their hosts, these don't contribute much to the productivity of the project.

@AenBleidd
Member Author

Well, as @bema-aei just said, the vast majority of the BOINC ecosystem can be represented by a few projects, and those could be asked to report their statistics to get a statistically significant overview.

I don't agree with this statement, even if it is currently true.
SETI@home was probably the biggest project, but it's gone. WCG was probably the second; since it moved to Krembil and started having constant issues, I have seen a significant loss of its users.
If E@H or LHC@H stops operating tomorrow, you will no longer represent the majority.
And this is the issue. BOINC is a distributed system; we can't rely on the numbers of any particular Project.

I'm not aware that this is a thing in our project's community

Maybe this is a communication problem on your side, because I saw a lot of complaints.

You don't have that information because people decided not to share it (with you).

Please clarify: is this the decision of your users or this is a decision that you made on their behalf?

@brevilo
Contributor

brevilo commented Oct 18, 2023

Do you know how many projects implemented GDPR on their side during the last 5 years? You will be very surprised by the numbers.

Not sure what you're trying to tell me here, and I fail to see how this could be relevant. As @bema-aei said, you should specifically ask CPDN, WCG, LHC/Cern for their takes on this topic. We were all involved in the original implementation. Which of those (most relevant, in terms of total participants) projects has reverted its stance on opt-in vs. opt-out or plans to do so? This isn't to disrespect the many smaller projects doing very important research, but they're frankly not that relevant to your goal of gathering representative figures for the BOINC ecosystem as a whole. If those projects above don't move to opt-out, your figures are arguably not that meaningful and this whole proposal can't instill a real benefit.

@AenBleidd
Member Author

you should specifically ask CPDN, WCG, LHC/Cern for their takes on this topic

And this is what I'm currently doing.

If those projects above don't move to opt-out, your figures are arguably not that meaningful and this whole proposal can't instill a real benefit.

This is a valid point. And again, that is why I initiated this discussion.
We (BOINC) develop the software, but we can't dictate to you (Projects) how to use it.
But from my point of view, we need to re-evaluate the current GDPR implementation to make it more usable for everyone while still keeping the users on the safe side.

Yes, you're a big project, but you can't dictate how the whole network should operate either.
Your opinion is important and very much appreciated; however, I'm asking you to make an effort and take another look at the current situation.
If you made a decision in the past, that doesn't mean it's written in stone and can't be changed. That also doesn't mean that your decision was correct, because at the very early stage, when GDPR was announced, people had a completely wrong understanding of it, especially of the definition of personal data.

@bema-aei
Contributor

Obviously, both you and your data protection officer have an incorrect understanding of personal information

Unfortunately, it's the lawyers and experts that we will ultimately have to deal with in court, not you, who certainly know it better than all of them.

@AenBleidd
Member Author

@bema-aei, I see your point. This is not a healthy attitude, but thank you for sharing it with me.

@brevilo
Contributor

brevilo commented Oct 18, 2023

you can't dictate how the whole network should operate either.

Where exactly did I do that? I can only quote myself.

I tried to be constructive by getting an understanding of the actual problem and asked questions that could have led to alternative approaches, but they got ignored. Now your responses are getting more and more personal and hostile. I won't put up with this and I have nothing more to add.

@AenBleidd
Member Author

Where exactly did I do that?

I haven't said you did that. I said that you should not act in this way.

@brevilo, I clearly described what the actual problem is. You continue to insist that, from your point of view, there is no problem. I can understand this, but you're not correct in saying that since you're the big projects and you represent the majority, we (BOINC) can rely on you alone.
I got no particular alternative approaches from you that could solve the current situation.
Also, you insist that your interpretation of the GDPR is correct. I don't say you are completely wrong. I just want to say that we definitely have a misunderstanding about the definition of personal information. The definition of personal information is written in the law. I tell you that none of the information we collect is personal information, but in return I just heard "you don't need this information". I gave an answer for every (or almost every) piece of information that we collect, and how it could be used. But both you and @bema-aei tell me that this information is useless. I don't agree with you, and you should respect this as well.
Don't get me wrong, but in order to provide a better service for our users, we all need to leave the silo we are in and take a fresh look at the existing implementation of the GDPR.
I propose a change. You don't want to change because you're completely sure that the decisions made in the past were correct. I know a lot of examples where GDPR was implemented too strictly and later reimplemented correctly while still keeping the users safe. I can't disclose these cases, of course.
But we all need to understand that the current implementation treats non-personal information as personal, and this is definitely not correct.
I see that you're not open to this discussion. That is fine, I can understand this. As I mentioned a couple of times before, we (BOINC) will not dictate how you must operate. If you don't agree with the change, you are free not to implement it (e.g. as with the Credits 2.0 which also started a lot of fire in the past).

Previously you said that BOINC doesn't exist in vacuo. But this applies to you as well. Please think about it.

@brevilo
Contributor

brevilo commented Oct 18, 2023

I haven't said you did that. I said that you should not act in this way.

Doesn't make sense. I have no say over BOINC, let alone other projects.

from your point of view there is no problem

Nope, I do in fact understand the problem you see with the current implementation.

you're not correct in saying that since you're the big projects and you represent the majority, we (BOINC) can rely on you alone.

I'm not saying we're any better or more important than other projects. My statement only concerned the statistical significance for what you're trying to do.

I got no particular alternative approaches from you that could solve the current situation.

First paragraph for instance.

you insist that your interpretation of the GDPR is correct.

Nope, I said it's our GDPR interpretation and only that is relevant to us because only we are legally accountable for what we do with the data entrusted to us.

we all need to leave the silo

It's solely up to the projects to leave the silo you see them in. BOINC doesn't have to leave any silo since it doesn't control any data, so "we" doesn't apply. Also, opt-in doesn't constitute a silo. That would be no exports whatsoever. Opt-in is proactive freedom of choice through user empowerment.

I can't disclose these cases, of course.

So why mention them at all...

I see that you're not open to this discussion

I'm not open to discussions that resort to personal attacks and where your view is definite and others' expertise is incorrect beyond doubt. I'm not even questioning your expertise on this complex subject matter, since that is in fact (legally) irrelevant to this discussion. But I'm repeating myself and I should really stop that.

we (BOINC) will not dictate how you must operate

I know, and neither do I. I provided my mere opinion, on your request. If it doesn't fall in line with yours, it can't be helped. That's normal. That's life. Take from it what you want or leave it. Consider it as food for thought or not. That's entirely up to you.

Bye


PS:

Credits 2.0 which also started a lot of fire

I don't recall any smoke, let alone fire. If you're looking at us, then we can only say that it didn't and doesn't work for us. Nothing more, nothing less. But please let's not move there now.

@bema-aei
Contributor

I think there are two very different approaches:

You were used to collecting a certain amount of information that may or may not be useful and necessary. Now you find you don't get that much information anymore, and have identified the implementation of GDPR in some projects as a reason. Rest assured, this implementation was developed over months with advice from lawyers and experts. You're now looking for a way to get that same information again, and trying to find reasons why it could be collected, e.g. you now judge this as not personal and thus not relevant to the GDPR. You take it for granted that your understanding of personal data is the only correct one, and everyone else therefore must be wrong. Thus you request a change to the GDPR implementation, intentionally for all projects. I'd call that risky at least, and the problem is that ultimately it's not you taking that risk, but the projects.

Our approach (as I wrote initially) was to find out what information is actually relevant and needed and in what level of detail, and then find a way to gather that information. We proposed a couple of solutions, but you aren't open to that approach; you just rejected all of them.

If you think you have the only solution to what you think is your problem, and anyone is free to implement that or get nothing, why did you ask for opinions at all?

@AenBleidd
Member Author

#5398 (comment) for instance.

This is exactly my point: you are saying that you (big Projects) represent the vast majority, and thus we (BOINC) should rely on your data only:

the vast majority of the BOINC ecosystem can be represented by a few projects, and those could be asked to report their statistics to get a statistically significant overview.

And I don't agree with that, because in this case we are ignoring all the others, and this is not the correct behavior from my point of view.

So why mention them at all...

What I wanted to say there is that you can't say for sure that the decisions made 5 years ago are still valid and correct, because at that time GDPR interpretation was sometimes wrong, and there is room for improvement if we are willing to take another look at the situation.

I'm not open to discussions that resort to personal attacks and where your view is definite and others' expertise is incorrect.

I apologize if my words were understood this way. I have nothing personal against anyone. I am open to discussion and collaboration, but I have seen a block from your side from almost the very beginning of the discussion:

Rest assured, we as a BOINC project and data controller, together with our data protection officer, have spent a considerable amount of time on this topic over the years - and continue to do so.

By saying this you are literally saying that your opinion is the correct one. What I'm saying is that your interpretation of personal information is not correct, because every time I told you the data we are collecting is non-personal, you continued the discussion as if it were personal information. You haven't provided any arguments for why you think this information is personal, and are instead questioning why I need it. I provided my arguments for why the information is non-personal, and you haven't said anything against them, so I assume you agree that this information is non-personal; then why are we having this discussion? GDPR is about personal information that directly identifies the user, or mixed information that could be used to identify the user when combined, but that is not the case here, because none of the information can identify a user in any way (except the 'name' and 'url'). All the other information (except IDs and CPIDs) is publicly available, and you will agree with me that if this information were sensitive and/or personal, you would need users' consent to show it on the web page as well. The data export we're talking about here is just another representation of this information (collected in one place).
We do not transfer this information directly to any other company.

So my question is: why do you think it's safe to show it (for example, if you take my name, you will find my profile on different BOINC projects indexed by the search engines) but not to export it? Yes, the exported data contains some additional information like credits, etc., but this information is not personal either, and you can't identify the user just by using these numbers.
The same applies to the hosts' information, but here we have an exception: I explicitly pointed out in the original message that if the user hides their hosts, they will not be exported. I don't agree that you can identify anyone using the hardware information, but I'm completely fine with it if the user would like to hide it. All the other information is entered by users completely voluntarily (we don't require any of it except the name, and that is not the user's real name but the screen name, aka nickname; in this case we proposed not to export it if the user doesn't want that).

Here's the part of your position I don't understand: you consider it safe to export users' data to search engines but are completely against exporting it as an XML file. To me it looks like you are contradicting yourself.
And that's exactly what I meant by saying that you have an incorrect understanding of the term 'personal information'.

I'm not hostile, but your position is very strange from my point of view.
If you don't see the contradiction, OK, that's completely your choice, and I respect your opinion.
I ask you to take another look at the situation and clarify: do you really think that the original implementation was done correctly?
From my point of view, definitely not.
You haven't given me any argument for why your position is correct. You just presented it as a fact.
I don't think this is correct.

@AenBleidd
Member Author

You take it for granted that your understanding of personal data is the only correct one, and everyone else therefore must be wrong.

I refer to the law: https://gdpr.eu/eu-gdpr-personal-data/#:~:text='Personal%20data'%20means%20any%20information,location%20data%2C%20an%20online%20identifier

Our approach (as I wrote initially) was to find out what information is actually relevant and needed and in what level of detail, and then find a way to gather that information.

We're not talking about any kind of new information to collect. This is still the same information that is already collected and can be seen simply by using any search engine.
So again: you want to tell me that it's safe to expose this information to the search engines but risky to expose the same information in an XML file?

@AenBleidd
Member Author

This is just from the search engine (and I'm not even logged in, so this is completely public):
[Two screenshots of search-engine results]
Isn't this the same information we export to the XML?

@brevilo
Contributor

brevilo commented Oct 18, 2023

Rest assured, we as a BOINC project and data controller, together with our data protection officer, have spent a considerable amount of time on this topic over the years - and continue to do so.

By saying this you are literally saying that your opinion is the correct one.

For hopefully the last time: all I have said pertains to our well-informed interpretation of the GDPR that is relevant to us and to our project only. I'm not here to sell my view to you and, again, I'm certainly not the sole keeper of the universal truth - and neither are you. No one is. Please understand and respect that.

you consider it safe to export users' data to search engines but are completely against exporting it as an XML file. To me it looks like you are contradicting yourself.

That's because we have a different understanding of the GDPR. Please at least respect that there can be different views and interpretations of any given law. Once again, we, for us and for our project only treat the exports as a data transfer. GDPR differentiates between data processing and data transfers, as I presume you know. This particular treatment of exports is our considered choice based on our assessment, since it's our responsibility. We chose to rely on our users' consent via opt-in to establish the legal basis for those transfers for our project. As such, it follows that this isn't a universal truth or rule either. Again, it's up to the individual projects to deploy what they see fit for their individual situation.

We're not going to reconcile our views on that one. And that's fine.

It's time for other projects to chime in...

@AenBleidd
Member Author

Once again, we, for us and for our project only treat the exports as a data transfer.

Thank you for the clarification. It doesn't contradict what I said before (including about the search engines), but it's very important to say it loud and clear so we understand each other's opinions.

@brevilo
Contributor

brevilo commented Oct 18, 2023

Sorry, but "data transfer" was mentioned six times in four different comments before (and that doesn't include my citations of my main summary). I presumed that to be "loud and clear". Good that's settled now.

@AenBleidd
Member Author

@brevilo, no, I mean that exposing data to search engines is, to me and according to the GDPR, a data transfer as well, but it looks like you have a different opinion on that. You treat the XML export as a data transfer, and, most importantly:

we, for us and for our project only

For me this explains why you are against all these changes.

I will not argue about the definition of 'data transfer', because it is clearly defined in the law. You have your own interpretation of it, and I'm OK with that.
I don't want to start another circle of discussion about the understanding of this particular term.

That doesn't mean I agree with you, but again, it's up to you to decide how you interpret the GDPR.

@brevilo
Contributor

brevilo commented Oct 18, 2023

Sounds good. Thanks.

Projects
Status: Backlog
Development

No branches or pull requests

7 participants