
Prevent linking of user identity and uploaded diagnosis keys #147

Closed
togermer opened this issue May 25, 2020 · 32 comments

@togermer

Where to find the issue

Solution Architecture, TAN exchange as described in figure 3 (9-14)

Describe the issue

In order to facilitate widespread use of the app it is of highest importance to guarantee anonymity when users upload their diagnosis keys. Users should stay anonymous even if the health authorities/laboratories and the verification server collude.

In case of collusion between the LIS and the verification server, the current design allows the user identity to be linked to their uploaded keys:

  • labs / LIS know the personal user data which is linked to the GUID
  • the verification server can link the GUID (or its hash) to the TAN
  • the verification server can link the TAN to the diagnosis keys

This is possible because the current system links the patient/GUID directly to the TAN.
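
As a toy illustration of this chain (the identifiers, data, and choice of SHA-256 here are invented for the example, not taken from the actual system):

```python
# Toy illustration: a hashed GUID is only a pseudonym. Any party that
# knows the GUID (the lab/LIS) can recompute the hash and join its
# records with the verification server's. All data here is made up.
import hashlib

def h(guid: str) -> str:
    return hashlib.sha256(guid.encode()).hexdigest()

lis_records = {"guid-1234": "Alice Example"}      # LIS: GUID -> patient
server_log  = {h("guid-1234"): "TAN-5678"}        # server: hash(GUID) -> TAN
uploads     = {"TAN-5678": ["diagnosis-key-1"]}   # server: TAN -> keys

# Colluding parties join on the recomputed hash and follow the TAN:
for guid, person in lis_records.items():
    tan = server_log.get(h(guid))
    if tan is not None:
        print(person, "->", uploads[tan])  # Alice Example -> ['diagnosis-key-1']
```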

Suggested change

There is a standard way to address this issue and to improve anonymity: Use blind signatures. In short:

  1. When polling for the test result, the app creates a signature request and blinds it with a random blinding factor, which it encrypts with the public key of the verification server.
  2. If the verification server finds the test to be positive, it signs (request × blinding factor) with its private key and returns the result to the app.
  3. If the user decides to upload their keys, the app unblinds the signature by inverting the blinding factor. (Note that the inversion uses the plain, unencrypted blinding factor, which only the app knows.) This results in an authorization code that is signed by the verification server.
  4. The app attaches the authorization code to the upload.
  5. The verification server verifies the signature on the authorization code with its own key to authorize the upload.

Because the verification server cannot link the blind signature to the authorization code, it cannot link the upload to the user identity. As an additional benefit, it doesn't have to store TANs.
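
For illustration, a minimal sketch of steps 1–5 using textbook RSA blind signatures (the names, key size, and bare-RSA construction are assumptions made for this sketch, not part of the CWA design; a production scheme would need proper padding and one-use accounting):

```python
# Minimal sketch of the blind-signature flow with textbook RSA.
# Illustrative only - not the CWA design, and not secure as-is.
import hashlib
import secrets

from cryptography.hazmat.primitives.asymmetric import rsa

# Verification server key pair: (n, e) public, d private.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pub = key.public_key().public_numbers()
n, e, d = pub.n, pub.e, key.private_numbers().d

# 1. App: hash the signature request and blind it; pow(r, e, n) is the
#    blinding factor "encrypted" with the server's public key.
m = int.from_bytes(hashlib.sha256(b"upload-authorization-request").digest(), "big")
r = secrets.randbelow(n - 2) + 2             # random blinding factor (coprime to n w.h.p.)
blinded = (m * pow(r, e, n)) % n

# 2. Server (test is positive): sign the blinded request with d.
blind_sig = pow(blinded, d, n)               # equals m^d * r (mod n)

# 3. App: unblind by inverting the plain blinding factor.
auth_code = (blind_sig * pow(r, -1, n)) % n  # equals m^d (mod n)

# 4./5. Server: verify the signature on the upload; it cannot tell which
#       blinded request this authorization code came from.
assert pow(auth_code, e, n) == m
```

In practice the server would recompute m from the request submitted alongside the upload, and would have to track used codes so that one signature cannot authorize more than one upload (see the discussion of authorization reuse below).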

This approach was also proposed in DP3T #210. A PoC is available at https://github.com/think-cell/corona.

@togermer togermer added bug Something isn't working documentation Improvements or additions to documentation labels May 25, 2020
@mh-

mh- commented May 26, 2020

@togermer Looking at the current Solution Architecture, TAN exchange as described in figure 3 (9-14),
even if LIS and Verification Server colluded, they could not get access to Diagnosis Keys - would you agree?

Regarding your suggested change: how would you ensure that the authorization code can only be used once (or a defined number of times, as required by the Exposure Notifications scheme), including the point from the Wikipedia article that "In practice, the property that signing one blinded message produces at most one valid signed messages is usually desired ..."?

@togermer
Author

Maybe I was imprecise: I assume that all central components possibly collude. In particular, this also includes the Corona-Warn-App server. In fact, I think it's fair to assume that CWA server and verification server are run by the same entity. Obviously the CWA server has the uploaded diagnosis keys, because the user explicitly uploads them along with the TAN.

Regarding authorization reuse: The verification server has to keep track of used authorization codes, as described in DP3T #210. It has to check new uploads against the set of used codes and refuse double submissions.

In order to prune old authorization codes from this list, the authorization request could be annotated with a timestamp that is also covered by the blind signature. The verification server would then only accept authorizations created within the last x days, and could therefore remove authorizations older than x days from its list.
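
A rough sketch of that server-side bookkeeping (the 21-day window, the names, and the in-memory store are assumptions for illustration; a real server would persist this):

```python
# Sketch of replay protection with timestamped, pruned authorization codes.
import time

MAX_AGE_DAYS = 21                       # assumed retention window ("x days")
used_codes: dict[str, float] = {}       # hash(auth code) -> first-seen time

def accept_upload(code_hash: str, signed_timestamp: float) -> bool:
    """Accept an upload iff its authorization is fresh and unused.
    signed_timestamp must be covered by the blind signature, so the
    client cannot backdate or refresh it."""
    now = time.time()
    if now - signed_timestamp > MAX_AGE_DAYS * 86400:
        return False                    # too old: reject outright
    if code_hash in used_codes:
        return False                    # double submission
    used_codes[code_hash] = now
    return True

def prune() -> None:
    # Codes older than the window can no longer validate anyway,
    # so dropping them from the set is safe.
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for h, ts in list(used_codes.items()):
        if ts < cutoff:
            del used_codes[h]
```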

@mh-

mh- commented May 26, 2020

@togermer quoting from source: https://github.com/corona-warn-app/cwa-verification-server/blob/master/docs/architecture-overview.md
"Measures to increase data privacy - Separate Operation of Verification Server and Corona-Warn-App Server - The Verification Server and the Corona-Warn-App Server are operated by different people and run in different cloud subscriptions."

Regarding authorization reuse: The verification server has to keep track of used authorization codes

ok, I was just confused that you mentioned an additional benefit of not having to store TANs.

Also, I think it's indeed required to read DP3T #210 to understand what this proposal intends to prevent:

  • Even if all central entities (including labs, Verification Server and Corona-Warn-App Server) colluded,
    1. prevent them from knowing the TEKs/Diagnosis Keys of a diagnosed user
    2. prevent them from knowing which diagnosed user uploaded their TEKs/Diagnosis Keys and which one did not

@alstiefel

alstiefel commented May 26, 2020

The Verification Server and the Corona-Warn-App Server will run in the same subscription but in different namespaces. People are organized along those namespaces. The goal is to separate the people running the system. Separate subscriptions will not provide significant benefits.

I will update the verification server document accordingly.

@togermer
Author

It's certainly nice that the verification server and the CWA server will be operated separately, and it's laudable if you promise that they are run by different people. But it's a matter of trust that these people don't collude and share any information. It's best if we don't require this trust from app users and instead give privacy guarantees. There's no need to rely on this "social" solution if there's a simple technical solution that gives hard guarantees.

@oiime

oiime commented May 26, 2020

The Verification Server and the Corona-Warn-App Server will run in the same subscription but in different namespaces

For all practical privacy purposes, any piece of software not under the user's control should be assumed to be compromised. Otherwise you expect the user to trust your ability and intentions without being able to vet either, which is likely to limit the adoption of the app.

@alstiefel

alstiefel commented May 26, 2020

Coming back to the initial comment.

  • The statement "the current system links the patient/GUID directly to the TAN" is not correct. The GUID is always hashed in all systems (besides the app), so compromising the system will not reveal a two-way link between the patient and the diagnosis keys.

  • The link between hashed GUID and TAN is only transient, never persistent.

  • Regarding blind signatures: having a technical solution that keeps the link between the (hashed) GUID and the diagnosis keys in the space controlled by the user is surely beneficial. The discussion in DP-3T is a good starting point to evaluate the use of blind signatures.

I will take this suggestion into the team as a potential future enhancement.

@togermer
Author

Regarding your first point: It is enough to link the hashed GUID to the TAN, because the LIS has the patient data and links it to the hashed GUID.

@alstiefel

I'll try to be more specific.

  • The testing lab/hospital/doctor can map the GUID to a person. These entities are outside the system boundaries of the Corona-Warn-App (server). The hashed GUID by itself is a pseudonym, not as anonymous as we would like it to be.
  • The testing lab/hospital/doctor is not part of the Corona-Warn-App.
  • Linking a person to a TAN requires two systems to be compromised.
  • By compromising the Corona-Warn-App (server), you will be able to link only the GUIDs from the time frame during which you control the whole system, because the link between GUID and TAN is not persisted.

I hope this makes my point clearer about how strongly patient (identity) and TAN are linked. And of course, as soon as you have a GUID, everything linked to it becomes pseudonymous.

@tklingbeil
Member

The (hashed) TANs are stored in their own DB table without a link to the hashed GUID.

@mh-

mh- commented May 26, 2020

The proposal that was made in the DP-3T context is based on the assumption that the systems are not merely compromised by hackers temporarily,
but that the owners of the systems collude to break the anonymity of RPIs/TEKs that users have advertised somewhere and that the owners of the systems (presumably an evil government) have collected as well.
In this context, all colluding parties already know/share the real-life identity of the user; they should just be prevented from linking that identity to RPIs/TEKs and to whether or not this user uploads them.

@togermer
Author

Linking a person to a TAN requires two systems to be compromised.
By compromising the Corona-Warn-App (server), you will be able to link only the GUIDs from the time frame during which you control the whole system, because the link between GUID and TAN is not persisted.

I think you are talking about an external attacker compromising the system. I'm talking about the general possibility of de-anonymizing tracing data/diagnosis keys. If laboratories and the CWA/verification server operators collude (or the government forces them to do so), they can correlate diagnosis keys and patient identities and effectively spy on the user. By leaving this possibility open, users will rightfully get suspicious and not use the app.

@sventuerpe

@togermer

  • Who could realistically execute such an attack?
  • What’s in it for the attacker, how can they benefit from a successful attack?

@mh-

mh- commented May 26, 2020

users will rightfully get suspicious and not use the app.

I don't think this will happen in Germany, because the attack is completely unrealistic.
It might happen in countries where citizens feel oppressed, but in Germany only very few users will have these (in my opinion unfounded) suspicions, and these persons will not trust or use such a system anyway, regardless of the actual level of privacy that the system offers.

@togermer
Author

  • Who could realistically execute such an attack?

As already stated, this is less about an attack and more about possible collusion of the health authorities / the government. It's about preserving privacy against central institutions.

  • What’s in it for the attacker, how can they benefit from a successful attack?

They could spy on app users. If you combine this with a distributed network of beacon listeners, you can trace user locations, for instance.

@oiime

oiime commented May 26, 2020

I don't think this will happen in Germany, because the attack is completely unrealistic.
It might happen in countries where citizens feel oppressed, but in Germany only very few users will have these (in my opinion unfounded) suspicions, and these persons will not trust or use such a system anyway, regardless of the actual level of privacy that the system offers.

To be honest, I first looked at this project hoping it would do its best to go above and beyond to prove that privacy would be integral to the implementation. I do not know how big the segment of the population that shares my view is, but I definitely am not part of "these persons will not trust or use such a system anyway", and that approach in general is quite worrisome.

Minimizing the possible use of the system outside its mandate is crucial: if there is no legitimate reason to divulge identifiable data, then don't. Mindsets such as "Who could realistically execute such an attack?" are unhealthy when approaching security.

@togermer
Author

I don't think this will happen in Germany, because the attack is completely unrealistic.
It might happen in countries where citizens feel oppressed, but in Germany only very few users will have these (in my opinion unfounded) suspicions, and these persons will not trust or use such a system anyway, regardless of the actual level of privacy that the system offers.

This depends on the story the press will make of this. Even if unfounded, the suspicion alone can severely limit adoption. And why take this risk when there is a simple technical solution?

@sventuerpe

sventuerpe commented May 26, 2020

@togermer

  1. The government is already in a position to ruin everyone’s day. How much would your collusion scenario add to its capabilities?
  2. Our society has various safeguards in place to keep the power of our government under control. These safeguards seem to work reasonably well; we are not invading other countries or deporting people to death camps any more. Are there reasons to assume these safeguards would turn out ineffective in your scenario?
  3. More specifically, the immediate outcome in your scenario seems to be that an entity obtains information it is not supposed to use.
    1. Is this statement correct and comprehensive or am I overlooking any important aspect?
    2. What could motivate an entity to expend the necessary effort to collect this information? How would they use it?

@HolgerMayer

HolgerMayer commented May 26, 2020

@togermer

  • Who could realistically execute such an attack?

This is the wrong question, because with "realistically" you imply there is close to nobody. This is a view of IT security that should have been gone long ago. Ever read 2600?

  • What’s in it for the attacker, how can they benefit from a successful attack?

Fame or fortune. Question: Why does one climb a mountain? Because it's there. Same for intrusion attacks.

@sventuerpe

@HolgerMayer With “realistically” I imply that motivation and capability vary across possible adversaries. Gone by now should be the illusion that security can be achieved by addressing every remote possibility with technical safeguards. Nowadays we base security decisions on risk assessment, and a realistic adversary model is an essential component of it.

Security risk assessment is not really as new an approach as I just made it sound. In real life we do it every day when we leave the house. Various deadly attacks against the human body are very well known and easy to execute, not only by the government but by everyone. Since the risk of anyone actually trying remains low, however, we do not usually wear bulletproof vests and helmets. Doing so would be too expensive in implementation and side effects in relation to the actual security gain.

There will be residual risks that we should accept rather than address and there will also be mechanisms of risk management other than technological safeguards. I believe everyone needs to acknowledge this – limits and criteria can still be debated – to have a constructive discussion. The line between valid security concerns and freewheeling conspiracy theories is unfortunately rather thin sometimes.

@oiime

oiime commented May 26, 2020

@sventuerpe While all that is a fine theoretical argument, it's not particularly pertinent. If you put full trust in the server side, it renders most of this project meaningless. The main driving principle behind anonymizing the data is to limit the government's ability to misuse it; if by design you allow them to reconstruct the data, that goes a bit beyond some far-fetched theorizing: it negates a design principle. A principle that should guide the development, because ignoring it is the reason people would not want to install this app.

@strubbi77

@sventuerpe While all that is a fine theoretical argument, it's not particularly pertinent. If you put full trust in the server side, it renders most of this project meaningless. The main driving principle behind anonymizing the data is to limit the government's ability to misuse it; if by design you allow them to reconstruct the data, that goes a bit beyond some far-fetched theorizing: it negates a design principle. A principle that should guide the development, because ignoring it is the reason people would not want to install this app.

For me it's clear that revealing my keys carries a high chance that many people and white hats can create a trace of me, or at least uncover me,
especially when using video surveillance and beacon listeners. But as long as I don't upload my keys, no one can create a trace of me.

If this discussion is about the amount of information a white hat gets, bear in mind that they will additionally check the information from §9 IfSG and §5 (2) 5 PAuswG (photograph; https://www.gesetze-im-internet.de/pauswg/__5.html).

If it's about a black hat, the effort vs. gain ratio is very poor, because to get traces of people they need the keys, and the keys have to be uploaded by people voluntarily. Without the keys, the whole infrastructure is not useful. (A key upload could be triggered by an evil beacon sender together with valid TANs, but this vector is quite small.) If the risk can be reduced without high effort, it should be done; otherwise not.

@sventuerpe

@oiime First, we are not talking about full trust here, but rather about a specific concern within an architecture that nevertheless leaves a lot of control to its individual users.

Second, I do not despise trust altogether, as long as it does not degenerate into an inescapable dependency. Genuine, earned and tentative trust reduces complexity. And I do trust my state and its institutions to some extent, not blindly and unconditionally but sufficiently to abstain from fundamental opposition.

Third, as a consequence, I am not convinced that a lopsided focus solely on technological security controls is appropriate. While I see good reasons to go with the flow and use Apple’s and Google’s APIs rather than try something else, I believe institutional and legal safeguards deserve the same amount of attention as they complement the technical ones.

Can we now return to the scenario and continue risk assessment within the application context?

@oiime

oiime commented May 26, 2020

@sventuerpe

First, we are not talking about full trust here, but rather about a specific concern within an architecture that nevertheless leaves a lot of control to its individual users.

You're allowing the operator the ability to tie an individual to their keys; that's pretty much as full as it gets. That connection should be severed as early as possible in the notification chain.

Second, I do not despise trust altogether, as long as it does not degenerate into an inescapable dependency. Genuine, earned and tentative trust reduces complexity. And I do trust my state and its institutions to some extent, not blindly and unconditionally but sufficiently to abstain from fundamental opposition.

So far this implementation is more complex and less private, so I'm not sure what the benefit is. If there is no good reason to trust them, then don't; it should be that simple. Why introduce an attack vector when you have nothing to gain?

Third, as a consequence, I am not convinced that a lopsided focus solely on technological security controls is appropriate. While I see good reasons to go with the flow and use Apple’s and Google’s APIs rather than try something else, I believe institutional and legal safeguards deserve the same amount of attention as they complement the technical ones.

The Exposure Notification API in no way mandates doing it this way. I fully support using their API for practical reasons, and AFAIK nobody is objecting to that.

@sventuerpe

@oiime

You're allowing the operator the ability to tie an individual to their keys; that's pretty much as full as it gets. That connection should be severed as early as possible in the notification chain.

I doubt that.

First, we need to bear in mind that Covid-19 is a notifiable disease and conventional contact tracing will continue in parallel to the use of CWA. The application context limits information hiding even by those not using the app at all. The point of designing privacy controls into the architecture is not to guarantee perfect confidentiality, but rather to make mission creep less likely. The most effective control in this respect is a protocol that limits what the server side can learn at all:

Unless you as an app user or any of your recent contacts has tested positive, the server side receives no information about you at all. This makes the CWA system unsuitable for general surveillance. If the server side does learn information about you, this information remains limited and of low value. In particular, your contacts and social network cannot realistically be reconstructed.

If I understand @togermer’s collusion scenario correctly – feel free to correct me if I do not – only the identity of the infected person using a TAN to upload their diagnosis keys would be revealed. I see not much of a gain here for a conspiracy of labs and server operators, hence my request to assess risks and not only technical possibility. If there is a real problem I shall be the last to protest it being solved. I am just not yet convinced there is.

Second, I see no good reason to limit analysis to an arbitrarily chosen subsystem. A comprehensive treatment should consider all involved stakeholders including, for example, platform providers Apple and Google. This takes us back to questions of trust, risk, and trade-offs.

@oiime

oiime commented May 26, 2020

@sventuerpe You are correct: only the identity of the infected person using a TAN to upload their diagnosis keys would be revealed. The point of contention was the risk and cost of severing the connection between the individual and the TAN at a layer closer to the user.
I guess at this point it is for you to decide whether it's worth the effort to implement. But bear in mind that going beyond what is necessary here is important not only from a privacy and security perspective but also from a PR perspective: maximizing adoption is key for this to have a chance of working.

Second, I see no good reason to limit analysis to an arbitrarily chosen subsystem. A comprehensive treatment should consider all involved stakeholders including, for example, platform providers Apple and Google. This takes us back to questions of trust, risk, and trade-offs.

Well, you cannot control Apple/Google, nor are they associated with this repository, so I'm focusing on these subsystems and what the people here can do to offer a more privacy-conscious solution.

Thanks for your time

@sev71

sev71 commented May 26, 2020

I already implemented a first version of a blind signatures extension for the DP3T Java/Spring based backend (DP-3T/dp3t-sdk-backend/pull/73).

Feel free to have a look.

@keugens

keugens commented May 28, 2020

In order to facilitate widespread use of the app it is of highest importance to guarantee anonymity when users upload their diagnosis keys. Users should stay anonymous even if the health authorities/laboratories and the verification server collude.

Is there any study about this, and more generally, about the reasons why people would use the app and why not?

Also, I think it's indeed required to read DP3T #210 to understand what this proposal intends to prevent:

  • Even if all central entities (including labs, Verification Server and Corona-Warn-App Server) colluded,
    i. prevent them from knowing the TEKs/Diagnosis Keys of a diagnosed user
    ii. prevent them from knowing which diagnosed user uploaded their TEKs/Diagnosis Keys and which one did not

Example 1: the user has a positive test result and a meeting with the health service (HS). The user and the HS agree to do the upload to the app server and to do a function check. Then, on the HS phone, the history of app contacts is deleted. The user's phone and the HS phone are kept close together for about 10 minutes. Authorization takes place and the data is uploaded from the user's phone to the app server. Then the app on the HS phone downloads data from the app server and should get a notification. If not, something went wrong.

Example 2: the user has a testing appointment. Before the actual test, he signs a document about rules and privacy. The person doing the testing has a work phone with the app installed and the app's contact history deleted. After 10 minutes the test is performed and further recording on the work phone is disabled. One day later, the user is informed about his positive test result. But, very shaken by this message, he forgets to start the upload process. Fortunately, one day later he is reminded by phone to do so.

@whythecode

Jumping in here a bit late, but what if, instead of disclosing one's own rolling IDs, the IDs that one has come into contact with were disclosed instead? In that case, each app checks whether one of its own IDs has been affected instead of one of its contacts' IDs.
This would still be problematic if you put beacons and cameras in place, though.
Ideally, in addition to that, each phone wouldn't keep track of other rolling IDs; rather, a unique token would be generated and kept track of by the two phones. That way, the only way to know they were in contact is to compromise both phones.
Not sure I was able to explain that very clearly, but maybe it's an idea.

@oiime

oiime commented May 31, 2020

@whythecode If I understood you correctly, it would just mean you'd (potentially) need to release a larger pool of IDs; it would have no influence on the de-anonymization of the user itself. I think the problem with having an actual interaction between the phones to generate paired tokens is that the protocol relies on the Bluetooth broadcast payload, so you can't really have a back and forth.
It might be possible to publish a public key in that payload and have the phone store the observed IDs encrypted with that key, though I doubt it. Either way, this implementation relies on the Google/Apple protocol, so they don't really have the option to change that layer.

@whythecode

@oiime you're totally right, scratch what I said. Releasing a pool of connections is potentially much more dangerous information and does not depend on there being beacons up at the right time in order to do some matching.
Preventing de-anonymization of the user would solve most issues, though I can't see how that's supposed to work without at least some kind of trust in the processes and frameworks put in place.
I'm looking forward to seeing how this develops - thanks for your time and effort, everyone.

@SebastianWolf-SAP
Member

The discussion stopped almost two weeks ago, our architects and experts have already replied to the original question regarding the possibility of linking, and we have received a new issue covering the topic in a much broader sense. We will therefore close this issue now and ask you all to follow up in #223.

Thank you very much for your understanding!

Mit freundlichen Grüßen/Best regards,
SW
Corona Warn-App Open Source Team
