EU-GDPR - Right to Erasure #2447
One major question...
What's the definition of "all of their data"? According to the page
links, it appears that "personal data" is what is covered. Which fields in
the user, host, thread, post and result tables belong to a user? Is a
userid "their data" once any link to an email_address or cpid has been
removed? I would tend to say no, yet I would also tend to think that a CPID
is "their data" as it directly identifies a user across projects.
Similarly, in host I would expect that ip_addr, external_ip_addr and
domain_name belong to the users, but nothing else is personal or user owned
information. Most of that information is created by the project for
internal use. There may be projects which require access to other
information in the host table. host_app_version is also, IMHO, not
information that belongs to a user, although it's not much use to the
project once a user has left. Posts and threads, I can see that deleting
them all is probably required.
Then there's science data. If host.m_cache belongs to the user, doesn't
that also mean any science results returned are the property of the users
and need to be deleted as well? After all they link back to result.userid.
I think this proposal goes way too far. To delete all of the personal data
for a user...
1. randomize the personal fields (name, email address, cpid, url, etc.) in
user, forum_preferences
1a. Delete any profile images.
2. delete all threads and posts for the user.
3. randomize the IP addresses for the user's hosts.
At that point, all the personal information is gone, unless a project app
is sniffing and storing personal information. No deleted tag is
necessary, dump the randomized strings.
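The three steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: it uses Python's sqlite3 instead of the project's MySQL database, and the toy schema (table and column names such as email_addr, cpid, owner and userid) is a simplified, hypothetical stand-in for the real BOINC tables. Step 1a (profile images) and forum_preferences are left out.

```python
import secrets
import sqlite3


def create_toy_schema(db: sqlite3.Connection) -> None:
    """Create a simplified, hypothetical subset of the tables discussed above."""
    db.executescript(
        """
        CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email_addr TEXT,
                           cpid TEXT, url TEXT);
        CREATE TABLE host (id INTEGER PRIMARY KEY, userid INTEGER, ip_addr TEXT,
                           external_ip_addr TEXT, domain_name TEXT);
        CREATE TABLE thread (id INTEGER PRIMARY KEY, owner INTEGER, title TEXT);
        CREATE TABLE post (id INTEGER PRIMARY KEY, userid INTEGER, content TEXT);
        """
    )


def scrub_user(db: sqlite3.Connection, userid: int) -> None:
    """Randomize personal fields and delete forum content for one user."""
    with db:  # one transaction: commit on success, roll back on error
        # Step 1: overwrite the personal fields with random strings.
        db.execute(
            "UPDATE user SET name = ?, email_addr = ?, cpid = ?, url = ? "
            "WHERE id = ?",
            (
                secrets.token_hex(8),
                secrets.token_hex(8),
                secrets.token_hex(16),
                secrets.token_hex(8),
                userid,
            ),
        )
        # Step 2: delete all posts and threads of the user.
        db.execute("DELETE FROM post WHERE userid = ?", (userid,))
        db.execute("DELETE FROM thread WHERE owner = ?", (userid,))
        # Step 3: randomize the network identifiers of the user's hosts.
        # randomblob() is evaluated per row, so each host gets its own values.
        db.execute(
            "UPDATE host SET ip_addr = hex(randomblob(8)), "
            "external_ip_addr = hex(randomblob(8)), "
            "domain_name = hex(randomblob(8)) WHERE userid = ?",
            (userid,),
        )
```

The userid itself is kept, so rows in other tables that reference it stay consistent; whether a bare userid still counts as personal data is exactly the open question raised above.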
…On Tue, Apr 3, 2018 at 3:09 PM, Kevin Reed ***@***.***> wrote:
For example - I would like the opinion of @brevilo
<https://github.com/brevilo> and @lfield <https://github.com/lfield> if
this implementation is sufficient: #2445
<#2445>
--
Eric Korpela
korpela@ssl.berkeley.edu
AST:7731^29u18e3
|
@SETIguy your questions are all relevant and overall I think I agree with your assessments. The problem is, we don't really know for sure. The GDPR isn't fully fleshed out (like most legal texts) and certain questions can only be answered after the first court cases have been settled. This is particularly true since the ePrivacy regulation was meant to become effective in parallel with the GDPR but won't until 2019.
Any data that relates to an identifiable (directly or pseudonymized) data subject (e.g. via Also, keep in mind that these data are affected by the data subject's right to "data portability" as well. You need to be prepared to hand those data over on request (within a month), in a "structured, commonly used and machine-readable format".
If the
Most of this boils down to the question of the lawfulness of the data processing you do. This can be established via different means; the two directly applicable ones in our case should be the "data subject's consent" and the "data controller's legitimate interest". The latter can override the former if justified, but it's of course much easier to describe your data deletion/retention policy in your privacy policy and include it in what the data subject consents to. Most importantly: whatever you do, do it transparently and document it in your "records of processing activities" (another mandatory GDPR requirement).
Yes, but that's harder than it sounds. What do you do with threads opened by the data subject to be deleted? What do you do with quotes of the data subject's comments? Again, there could be "legitimate interest" to retain those as the discussion would lose coherence (i.e. "for archiving purposes in the public interest, scientific or historical research purposes"), but that's all not entirely clear yet (according to our data protection officer).
This is not about property but data subject rights pertaining to the data subject's data. As soon as you anonymize the tasks/results, e.g. by NULLing the
That might be true but keep in mind external data (see above) that might still allow to derive the original data subject (e.g. via the If in doubt, delete whatever you can. That's probably what we're going to do anyway - cleans/speeds up the DB as a nice side-effect. HTH |
Couple of quick points. When a user was recently deleted (user request), several of us noticed that their private messages were deleted from our inboxes. In a recent check of dates/ID numbers at SETI, I was surprised to find that BOINC users appear to write roughly the same number of private messages to each other, as public messages on the message boards. Whatever David did in that case (probably related to #2445, rather than the GDPR) needs to be included in this discussion too. And what effect will the GDPR have on the "Wayback Machine" internet archiving project? I sometimes refer to that to check on the previous history of a BOINC project. |
@RichardHaselgrove we're going to delete private messages.
That's part of the "erasure notifications" (GDPR Art. 17.2) issue as well as the lawful processing for "archiving purposes in the public interest, scientific or historical research purposes" (GDPR Art. 17.3d) I alluded to above. The former might affect projects (not yet clear) but the latter only affects the Internet Archive itself. |
@TheAspens I'm in the process of reviewing the proposal. Whatever gets done: please separate frontend ( Thanks |
@TheAspens My comments on the proposal:
|
I'm wondering though. Private messages that I received, are as far as I see it mine, no longer owned by the sender. So when the sender wants his account erased, the PMs I got from him have to be left alone, as they're no longer his, but mine. Perhaps if the project has an outbox, that sent PMs in there have to be removed. But most projects just have an inbox and a write PM option. Compare it to text messages, Whatsapp, snail mail. Once the sender sent it, it's no longer his. When he wants his account deleted at a service provider, they won't delete all the text messages he ever sent from other people's devices. When the person stops Whatsapp, only the local account is deleted, but all sent messages will still be on other people's devices. When you mail a handwritten letter to some other person, it's no longer yours as soon as you drop it in the mailbox. So why handle private messages differently? |
Here are my thoughts regarding both BOINCstats and BAM!: For BOINCstats (the stats section) it's enough to just remove the user/hosts from the XML export. During the next import users/hosts no longer existing in the XML will be deleted from the stats. Other stats sites may work differently. BAM! is a little bit more complicated and may also require more to be done on the project side. When a user deletes his account at a project, should that also delete his BAM! data for that project (please keep in mind that BAM! data is not stats data!)? The project doesn't necessarily know that a BAM! account with data for that project exists. If this data should also be deleted, the project should call a (non-existing) BAM! API to do so. Then the other way around: When a BAM! user deletes his account, should it also delete all the linked project accounts? I think this should be a choice by the user. If he chooses yes, BAM! must call a project API (RPC) to notify the project to do so. Then the project can do one of two things: A) Trust BAM! and delete the account or B) start the deletion process as outlined here. And lastly, the big issue: Sometimes I get requests to remove stats data. Most of the time these emails contain a link to one or more pages on BOINCstats with the request to remove them. It's impossible for me to be 100% sure that the person requesting the deletion is the true owner of that data. It can also be someone trying to get some competition out of the way. I refer these people to the project sites to delete/anonymize their account there. This only works when the project is still up and the admins are responding. So far I have refused all requests to delete stats data on my side; however, this may lead to some issues with these new rules. I'm not sure how to handle this. |
I agree that data portability needs to be separate from the data exports. My proposal does not address the data portability requirement and that will need to be addressed in a separate issue. |
@Ageless93 I doubt that. I don't think the data subject has ownership on any kind of data by default, let alone on data provided by others. The controller provides a service and unless otherwise stated (by the contractual basis, e.g the terms of use) can legally remove any such data.
In that case you might have a physical (or cached) copy but even that doesn't constitute ownership. If all messages were server-based, which they are in BOINC, the service provider (controller) can simply choose to shut down the service immediately, without your consent. Regarding WhatsApp: have you read the terms of use you agreed to? Would be interesting to know what they say on data ownership. |
Projects are required to make sure any upstream/downstream services delete any of the published data as well (GDPR Art. 17.2). In case of BAM! the situation is more complicated, though, as the whole matter of opt-in consent to a given project's terms of use (or privacy policy) would shift to BAM! itself, according to our data protection officer. However, since we'd have to distinguish BAM! accounts from locally created accounts to have actual proof of consent, we might effectively be forced to shut down BAM! support until we have a GDPR-compliant end-to-end solution to this. The same might be true for the stats exports...
You have to get consent for that data processing already, even if it's dealing with pseudonyms only. That means you already have a bigger challenge at your hands, not just for data removal requests. We're all in the same situation. |
@SETIguy - I am not a lawyer, my following statements have no legal weight so take them for 0 value. As I have tried to understand how to comply with GDPR as it pertains to BOINC* I have come to the following understanding of the intent behind the law. I believe that GDPR seeks to make information about an individual a fundamental right of that individual and that they get to control where that information is retained. This right supersedes any other agreements that they might have entered into. Specifically, they can grant consent to a site to utilize data that they provide and that the site might collect about them. However, they also have the right to revoke that consent and have the information they provided or that was collected about them removed. They also have the right to review what information a system currently retains about them. This second bit is what makes this law such a new and fundamentally different thing than what existed before. It means that we have to think of user data and associated data that we collect about them as something that is only loaned to us, but is not ours to keep. Systems will have to keep track of personal information and where it flows to ensure that if consent is withdrawn it can be removed.
Since the science results can be separated from any notion of the user (i.e. when the result record is deleted from the database and after the result has been assimilated there is no longer any connection between the result and the user) and because they are part of the legitimate purpose of the system separate from the user, GDPR does not apply to these records. If information about the user is retained (for example the OS it ran on and other such factors that might be needed to determine what happened during the execution of a particular task), then it gets more complicated (I believe you still can, but you need to get into details about the lower levels of the law).
|
https://www.whatsapp.com/legal/ |
@TheAspens - I read through your RightToErasure document as well. Thanks for writing it up. For the Drupal-BOINC implementation I have already written some code that deletes a user on the Drupal side of the code. This was pre-GDPR (or before I learned of GDPR). The user is presented with a 'delete account?' Web page with a description of what will happen when the account is deleted. If confirmed, the account is flagged for deletion. There is no email confirmation. But the account is not deleted until two weeks later (adjustable by the admin). If the user logs in anytime within this two-week period, the delete action is canceled - i.e., the account is un-flagged. After two weeks, the account is acted upon by a Drupal queue which deletes the Drupal user data, but keeps much of the data in the BOINC project database (tables: user, host, etc.). There is no pressing reason why BOINC would have to implement a similar wait-period before deletion; this is just my $0.02. |
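The flag-then-wait behaviour described above could look roughly like the following. This is a minimal sketch and not the Drupal-BOINC code: the Account class, the function names and the in-memory list are hypothetical, and the two-week default simply mirrors the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

GRACE_PERIOD = timedelta(days=14)  # assumed default; adjustable by the admin


@dataclass
class Account:
    name: str
    delete_requested_at: Optional[datetime] = None
    deleted: bool = False


def request_deletion(account: Account, now: datetime) -> None:
    """User confirmed the 'delete account?' page: flag, but do not delete yet."""
    account.delete_requested_at = now


def on_login(account: Account) -> None:
    """Any login during the grace period cancels the pending deletion."""
    account.delete_requested_at = None


def process_queue(accounts: List[Account], now: datetime,
                  grace: timedelta = GRACE_PERIOD) -> List[str]:
    """Delete every flagged account whose grace period has expired."""
    removed = []
    for account in accounts:
        flagged = account.delete_requested_at
        if not account.deleted and flagged is not None and now - flagged >= grace:
            account.deleted = True
            removed.append(account.name)
    return removed
```

A periodic job would call process_queue; because a login clears the flag, a user who changes their mind (or notices a deletion they did not request) only has to sign in before the period ends.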
I don't think that's necessarily true. Einstein (certainly) and I think SETI retain records of who processed which bit of the science - that's held in their master Science databases, long after the transactional processing records are purged from their BOINC databases. Einstein have - very publicly - awarded discovery certificates and named finders in press releases, and as (IIRC) co-authors in published scientific papers. That public recognition of participation will, of course, have been subject to secondary and very specific consent, far beyond any consent granted as part of the process of joining the BOINC project on day 1. But the user ID associated with the computation must have been maintained rigorously intact for the attribution to be possible. |
Given that the data has been available for download by the general public,
upstream/downstream deletion can't be guaranteed for anyone except well
behaved upstream/downstream partners with resources. The guy who has been
extracting and archiving data for all his team members to create graphs for
his web site has been under no obligation to create a means for deleting
data from his archive and probably will not do so. Do we need to stop
providing public stats dumps? Which gets us back to the definition of
"personal data". Are stats personal data to begin with?
And then there's gridcoin. A cpid/gridcoin address link beacon can't be
deleted from the blockchain. I don't know if a username is stored with
that or not. Probably not.
--
Eric Korpela
korpela@ssl.berkeley.edu
AST:7731^29u18e3
|
The explanation I have come to understand, and that I am operating under, is that if the clear consent on the BOINC site states what information is public, then as long as a mechanism exists to communicate the user's intent to have their information removed, which consumers of the public data can monitor, the BOINC site will be in the clear. However, if the consumer of the public data does not follow the delete instructions, then the consumer of the public data could be at risk of violating GDPR. I am also operating under the assumption that stats data that is tied to a user name, user id or cross project id is personal data and needs to be cleared as well. As far as any blockchain tech goes - I have no idea how they will comply since the two are somewhat at odds with each other. I want to be clear again that GDPR is not clear and that the interpretation I am operating under could be incorrect. We are trying to craft the technical changes that will be minimally impactful to BOINC given the best understanding of what it takes to be compliant. This is why I really want the people who are also trying to comply with the law to review this and articulate their understanding as well, since I am not an authority in this matter. |
WCG doesn't do this so I hadn't considered the impact of that. |
What is the appropriate action? I assume that when an account at a project
is deleted, the account manager disconnects the user from that project.
When the user deletes an account manager account, what is the appropriate
action? Do nothing? Delete every account associated with that user at
every project? Have a user selectable option?
…On Mon, Apr 23, 2018 at 3:31 PM, David Anderson ***@***.***> wrote:
I think we'll need to add an am_delete_account RPC.
--
Eric Korpela
korpela@ssl.berkeley.edu
AST:7731^29u18e3
|
Hi,
My opinions:
1. /get_project_config.php should be extended with an extra tag
specifying whether the project must meet EU-GDPR and an extra tag indicating
that the project supports deleting accounts via RPC. BAM! reads that file
daily so it will then know if extra actions are required.
2. When a BAM! user signs up for a project it will show any extra
options required to meet EU-GDPR for the project (for example, a checkbox
to comply with the EU-GDPR). The values of the extra options will be added
to the AMS RPC to the project.
3. The legacy problem is knowing whether or not a user created a project
account through BAM! (solution under 1.3.1)
4. When a user deletes his account at BAM! an option will be shown to
also delete any associated project account or a selection of projects,
indicating the EU-GDPR status of the project.
5. When a user deletes his account at a project an option should be
shown to delete the associated account at the AMS. Since the project
probably doesn't know which AMS (if any) created the account it should send
the delete request to all known AMSs.
   1. This will *not* delete the BAM! account itself since this was not
   created by the project, it will only delete the project account under the
   BAM! account.
   2. Problem: what's the identifier?
   3. API needed at the AMS.
   4. Projects should store which AMS created the account.
      1. Problem: User can switch AMS, so project should probably store
      the last used AMS.
There's probably more but nothing comes to mind at the moment.
Willy.
|
According to our DPO the upstream account manager (AM) is required to handle the opt-in/consent problem in that scenario. However, informed consent can only be given to an actual statement/policy so that the AM has to present the project-specific text for that purpose, presumably mimicking the client's terms of use feature.
I agree but this needs an augmented RPC. Anyhow, these account-creation-related issues should be discussed separately. Other than that I agree that account deletion needs to be taken into account by AMs as well. I recommend focusing on AM -> project account deletion first (e.g. via a new |
Here is my 2 cents:
My biggest concern about the new RPC is that the only authentication used by these RPCs is the authenticator. This will allow anyone who can obtain someone's authenticator to delete that person's account. Any thoughts about how to secure this? Barring the issues around security of the new RPC, will these two points resolve the most critical questions? |
Ok thinking longer on it. I think that the following might become necessary:
Thoughts on this approach? Note that I do not have the bandwidth before the May 25th date to implement either the new RPC or this extra security step so if someone else could take this on that would be good. |
Hi,
I agree with all the points.
Willy.
…On 2 May 2018 at 16:15, Kevin Reed ***@***.***> wrote:
Ok thinking longer on it. I think that the following might become
necessary:
- Projects will have a list of trusted Web RPC users (e.g.
https://boincstats.com/, https://scienceunited.org/,
https://www.gridrepublic.org/)
- The Web RPCs (need to decide which) will be updated to include a
section that contains:
<signature>
  <signer>https://boincstats.com/</signer>
  <hash>1234afd123asdf1234asdf134asdf.....</hash>
</signature>
- The Web RPC users will provide a public key at a standard location
like /public.key (e.g. https://boincstats.com/public.key)
- The Web RPC user will use their private key to sign the message and
send the signature with the request
- The RPC will verify that the signer is a trusted signer and will
then obtain the public key (either from local cache or from the remote
server - but if the signature fails, it needs to refetch the public key to
allow the signer to update their key) and then verify that the signature
matches the content.
- Only after that processing is complete and successful will it
perform the actions of the RPC.
Thoughts on this approach? Note that I do not have the bandwidth before
the May 25th date to implement either the new RPC or this extra security
step so if someone else could take this on that would be good.
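The verification order in the list above (trusted-signer check, cached key, one refetch on failure, and only then the RPC action) can be sketched as below. This shows the control flow only and rests on assumptions: RpcSignatureChecker, fetch_key and verify are hypothetical names, and the toy_sign/toy_verify pair is an HMAC stand-in used so the example runs with the standard library. It is not a public-key signature; a real deployment would plug in an actual asymmetric verification routine and fetch the key from the signer's /public.key URL.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterable


class RpcSignatureChecker:
    """Decide whether a signed Web RPC request may be acted upon."""

    def __init__(self, trusted_signers: Iterable[str],
                 fetch_key: Callable[[str], bytes],
                 verify: Callable[[bytes, bytes, bytes], bool]) -> None:
        self.trusted = set(trusted_signers)
        self.fetch_key = fetch_key      # e.g. downloads <signer>public.key
        self.verify = verify            # verify(key, message, signature)
        self.key_cache: Dict[str, bytes] = {}

    def is_authorized(self, signer: str, message: bytes, signature: bytes) -> bool:
        if signer not in self.trusted:
            return False
        cached = self.key_cache.get(signer)
        if cached is not None and self.verify(cached, message, signature):
            return True
        # Cache miss or failed check: refetch once, since the signer may
        # have replaced their key, then verify against the fresh key.
        fresh = self.fetch_key(signer)
        self.key_cache[signer] = fresh
        return self.verify(fresh, message, signature)


def toy_sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for private-key signing (HMAC, for demonstration only)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def toy_verify(key: bytes, message: bytes, signature: bytes) -> bool:
    """Stand-in for public-key verification (HMAC, for demonstration only)."""
    return hmac.compare_digest(toy_sign(key, message), signature)
```

One consequence of the refetch rule is that every bad signature from a trusted signer name triggers a key download, so an implementation would probably want to rate-limit refetches.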
|
I don't think that the account manager needs to be able to run the project
delete without intervention. The account manager should redirect the user
to the project delete function. Then the delete would propagate back to
the project manager in the next stats export.
…On Wed, May 2, 2018 at 6:58 AM, Kevin Reed ***@***.***> wrote:
Here is my 2 cents:
- Account managers should monitor the new user_deleted.xml that will
be exported in the stats - this will list users deleted on a project in the
past 60 days. See
https://boinc.berkeley.edu/trac/wiki/RightToErasure#DataExports and
https://boinc.berkeley.edu/trac/wiki/RightToErasure#FinalRemoval
- Account managers need to be able to invoke the delete operation on a
project. As discussed above this would be the new am_delete_account RPC that
would align with the other RPCs defined here:
https://boinc.berkeley.edu/trac/wiki/WebRpc
My biggest concern about the new RPC is that the only authentication used
by these RPCs is the authenticator. This will allow anyone who can obtain
someone's authenticator to delete that person's account. Any thoughts
about how to secure this?
Barring the issues around security of the new RPC- will these two points
resolve the most critical questions?
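For the first point, a consumer of the stats export might process user_deleted.xml along these lines. The file layout used here (a root element containing user elements with an id child) is purely an assumption made for illustration; the actual format is whatever the RightToErasure wiki page specifies. The function names and the in-memory dictionary are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def deleted_user_ids(xml_text: str) -> Set[int]:
    """Collect the ids of all <user> entries in an (assumed) export layout."""
    root = ET.fromstring(xml_text)
    ids = set()
    for user in root.iter("user"):
        text = user.findtext("id")
        if text is not None and text.strip():
            ids.add(int(text))
    return ids


def purge_local_records(local_users: Dict[int, dict], xml_text: str) -> List[int]:
    """Remove locally stored data for every user the project reports as deleted."""
    removed = sorted(uid for uid in deleted_user_ids(xml_text) if uid in local_users)
    for uid in removed:
        del local_users[uid]
    return removed
```

Since the export is described as covering the past 60 days, a consumer that imports at least that often should not miss a deletion.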
--
Eric Korpela
korpela@ssl.berkeley.edu
AST:7731^29u18e3
|
I have read this whole discussion. I wonder if (or how) you handle the following situation: a hacker learns the password(s) to a user's email and to project X. He then deletes the user's account at project X and removes all emails sent during this process. The user was not crunching for this project at the time, so the BOINC client did not complain, and only later (e.g. after a month) does the user find that the account has somehow disappeared. If the hacker used the "right to erasure", the project admin may also have no idea what happened. |
@TheAspens complex, but sound. @SETIguy while I appreciate the simplicity of your approach, it would certainly defeat the whole purpose of account managers, right? That is, manage multiple downstream project-accounts via a single interface. Your example sounds like there is only one project.
|
The discussion about account manager integration should continue in issue #2507 |
I was testing the handling of results returned but not validated and ran into problems. The logic of the validator and credit is complex and trying to add proper handling for the case where we are trying to validate a result returned by a host and user that have been deleted adds a lot of edge cases to this code. Since we do not expect this feature to be used often, and therefore the number of results in this state will be low, I am moving forward with the following proposal for how to handle this. It is documented here: https://boinc.berkeley.edu/trac/wiki/RightToErasure#ResultTable but also included below:
Please let me know if anyone has any thoughts on this. I think that the work discarded will be extremely small and it avoids adding some significant complexity to the code. |
I suppose there is still a race window: a back-end daemon has loaded the result and other records and updated them, but by the time it goes to write to the database the records are gone. With the deadline approaching fast I'm not sure you need to handle this case perfectly for v1. If you are not going to delete results immediately, then scrub stderr. stderr may sometimes contain personal data, and in worst-case scenarios it may take several months before the result gets removed. |
GDPR allows for the retention of data that has a legitimate purpose. stderr is often needed for various review by the project and is removed when that purpose is complete. As a result, I think that it needs to be left in place. |
The PHP code has been implemented without any concept of transactions (everything is done with autocommit for each statement). I would have to take a deep look to see if the C code handles this any differently. Without transactions and lock in place (pessimistic or optimistic) throughout the system, I don't know how I could address this. I'd be open to ideas. |
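To make the "optimistic" option mentioned above concrete, here is one possible shape of it. It is only a sketch under assumptions: it uses Python's sqlite3 rather than the project's MySQL/PHP/C code, and the result table with its granted_credit column and the function name are hypothetical. The idea is that the writer repeats, in the WHERE clause, what it believed about the row when it loaded it, and treats "zero rows changed" as "the row was deleted or reassigned in the meantime".

```python
import sqlite3


def grant_credit_if_unchanged(db: sqlite3.Connection, result_id: int,
                              expected_userid: int, credit: float) -> bool:
    """Write credit only if the result still belongs to the expected user.

    Returns False when the row vanished or its userid changed after the
    caller loaded it, so the caller can skip the result instead of failing.
    """
    with db:  # single short transaction around the conditional update
        cursor = db.execute(
            "UPDATE result SET granted_credit = ? WHERE id = ? AND userid = ?",
            (credit, result_id, expected_userid),
        )
        return cursor.rowcount == 1
```

This needs no long-held locks, but every writer that can race with account deletion would have to follow the same pattern for it to help.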
A bit of news from Reuters The pan-EU law comes into effect this month and will cover companies that collect large amounts of customer data including Facebook (FB.O) and Google (GOOGL.O). It won’t be overseen by a single authority but instead by a patchwork of national and regional watchdogs across the 28-nation bloc. Seventeen of 24 authorities who responded to a Reuters survey said they did not yet have the necessary funding, or would initially lack the powers, to fulfill their GDPR duties. and Many watchdogs lack powers because their governments have yet to update their laws to include the Europe-wide rules, a process that could take several months after GDPR takes effect on May 25. |
I don't have any better ideas either. |
Maybe it would suffice to start transaction in backend daemon, and use Edit: see also https://stackoverflow.com/questions/6066205/when-using-mysqls-for-update-locking-what-is-exactly-locked |
I added some additional text about the new project config |
#2472 has been merged to master which closes this issue. |
@TheAspens Since this now got merged I'm wondering about the periodic cleanup scripts that are mandatory for this to cover the process end to end. As far as I can tell these are Thanks |
Never mind, just found your "recommendations" and I'm presumably not missing any either 👍 |
I've been trying to keep https://boinc.berkeley.edu/trac/wiki/ServerUpdates updated as well |
I haven't followed all the code changes, but the delete account option went live on the BOINC forums sometime in the past days. Silently again, no notification. I wonder though, if this also works for a user whose account is (temporarily) banished. Can they also still use the delete account option, or is it locked on only active -usable- accounts? |
@Ageless93 - I would open a new issue for that. I don't know what the behavior would be. |
Another aspect of the GDPR law is the Right to Erasure. I've created a proposed implementation that might meet the requirements of this provision of the law. This continues the work outlined in #2332 and #2413.
The proposal is documented here: https://boinc.berkeley.edu/trac/wiki/RightToErasure
I would appreciate a review of this and feedback on the implementation. In particular, I would like feedback from people who are doing their own compliance work on whether this is likely to be the minimum steps necessary to comply with this provision of the law or if some lesser action (like scrubbing user fields such as email address, name, IP address etc.) would be permitted. In particular I would appreciate it if @lfield and @brevilo would take a look and provide feedback.