Prevent linking of user identity and uploaded diagnosis keys #147
Comments
@togermer Looking at the current Solution Architecture (TAN exchange as described in figure 3, steps 9-14): regarding your suggested change, how would you ensure that the authorization code can only be used once (or a defined number of times, as required by the Exposure Notifications scheme)?
Maybe I was imprecise: I assume that all central components possibly collude. In particular, this also includes the Corona-Warn-App server. In fact, I think it's fair to assume that the CWA server and the verification server are run by the same entity. Obviously the CWA server has the uploaded diagnosis keys, because the user explicitly uploads them along with the TAN.

Regarding authorization reuse: the verification server has to keep track of used authorization codes, as described in DP3T #210. It has to check new uploads against the set of used codes and refuse double submissions. In order to prune old authorization codes from this list, the authorization request could be annotated with a timestamp which is also subject to the blind signature. Then the verification server could only accept authorizations that were created within the last x days, and can therefore remove authorizations older than x days from its list.
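A minimal sketch of the replay-protection bookkeeping described above, assuming the signed timestamp is available when a code is redeemed (names and structure are hypothetical, not taken from the actual verification server):

```python
import time

MAX_AGE_DAYS = 14
MAX_AGE_SECONDS = MAX_AGE_DAYS * 24 * 3600

class AuthorizationRegistry:
    """Tracks redeemed authorization codes and prunes expired entries."""

    def __init__(self):
        self._used = {}  # code -> creation timestamp (from the signed request)

    def redeem(self, code: str, created_at: float, now: float = None) -> bool:
        """Accept a code at most once; refuse replays and expired codes."""
        now = time.time() if now is None else now
        if now - created_at > MAX_AGE_SECONDS:
            return False  # outside the acceptance window: refuse
        if code in self._used:
            return False  # double submission: refuse
        self._used[code] = created_at
        return True

    def prune(self, now: float = None) -> None:
        """Drop codes older than the window; they would be refused anyway."""
        now = time.time() if now is None else now
        self._used = {c: t for c, t in self._used.items()
                      if now - t <= MAX_AGE_SECONDS}
```

Because expired codes are refused regardless of whether they appear in the set, pruning is purely a storage optimization and can run on any schedule.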
@togermer quoting from source: https://github.com/corona-warn-app/cwa-verification-server/blob/master/docs/architecture-overview.md
ok, I was just confused that you mentioned an additional benefit of not having to store TANs. Also, I think it's indeed required to read DP3T #210 to understand what this proposal intends to prevent.
The Verification Server and the Corona-Warn-App Server will run in the same subscription but in different namespaces. People are organized along those namespaces. The goal is to separate the people running the system. Separate subscriptions will not provide significant benefits. I will update the verification server document accordingly.
It's certainly nice that the verification server and the CWA server will be operated separately, and it's laudable if you promise that they are run by different people. But it's a matter of trust that these people don't collude and share any information. It's best if we don't require this trust from app users and instead give privacy guarantees. There's no need to rely on this "social" solution if there's a simple technical solution that gives hard guarantees.
For all practical privacy purposes, any piece of software not under the user's control should be assumed to be compromised; otherwise you expect the user to trust your ability and intentions without being able to vet either, which is likely to limit the adoption of the app.
Coming back to the initial comment.
I will take this suggestion into the team as a potential future enhancement.
Regarding your first point: It is enough to link the hashed GUID to the TAN, because the LIS has the patient data and links it to the hashed GUID.
Let me try to be more specific.
I hope this makes my point clearer about how the patient (identity) and the TAN are strongly linked. And of course, as soon as you have a GUID, everything linked to it becomes pseudonymous.
The (hashed) TANs are stored in their own DB table without a link to the hashed GUID.
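As an illustration of that storage scheme (a sketch only, not the actual cwa-verification-server schema): the TAN is persisted solely as a hash, in a structure with no GUID column, and is consumed on use.

```python
import hashlib

def hash_tan(tan: str) -> str:
    """Store only a one-way hash of the TAN, never the TAN itself."""
    return hashlib.sha256(tan.encode("utf-8")).hexdigest()

# Hypothetical stand-in for the TAN table: hashed TANs only, no GUID link.
tan_table = set()

def store_tan(tan: str) -> None:
    tan_table.add(hash_tan(tan))

def redeem_tan(tan: str) -> bool:
    """A TAN is valid only if its hash is present; consume it on use."""
    h = hash_tan(tan)
    if h in tan_table:
        tan_table.remove(h)
        return True
    return False
```

With this layout, someone reading the table alone learns neither the plaintext TANs nor which GUID (and hence which patient) any entry belongs to.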
The proposal that was made in the DP-3T context is based on the assumption that the systems are not only compromised by hackers temporarily,
I think you are talking about an external attacker compromising the system. I'm talking about the general possibility to de-anonymize tracing data/diagnosis keys. If laboratories and CWA/verification server operators collude (or the government forces them to do so), they can correlate diagnosis keys and patient identities and effectively spy on the user. By opening this possibility, users will rightfully get suspicious and not use the app.
I don't think this will happen in Germany, because the attack is completely unrealistic.
As already stated, this is less about an attack but about possible collusion of health authorities / the government. It's about preserving privacy against central institutions.
It could spy on the app user. If you combine this with a distributed network of beacon listeners you can trace user locations, for instance.
To be honest, I first looked at this project hoping it would do its best to go above and beyond to prove that privacy is integral to the implementation. I do not know how big the segment of the population that shares my approach is, but I am definitely not part of "these persons will not trust or use such a system anyway", and that approach in general is quite worrisome. Minimizing the possible use of the system outside its mandate is crucial: if there is no legitimate reason to divulge identifiable data, then don't. Mindsets such as "Who could realistically execute such an attack?" are unhealthy when approaching security.
This depends on the story the press will make of this. Even if unfounded, the suspicion alone can severely limit the adoption. And why take this risk when there is a simple technical solution?
This is the wrong question, because with „realistically" you imply there is close to nobody. This is a view on IT security that should have been gone long ago. Ever read 2600?
Fame or fortune. Question: why does one climb a mountain? Because it's there. The same goes for intrusion attacks.
@HolgerMayer With "realistically" I imply that motivation and capability vary across possible adversaries. Gone by now should be the illusion that security be achieved by addressing every remote possibility with technical safeguards. Nowadays we base security decisions on risk assessment, and a realistic adversary model is an essential component of it.

Security risk assessment is not really as new an approach as I just made it sound. In real life we do it every day when we leave the house. Various deadly attacks against the human body are very well known and easy to execute, not only by the government but by everyone. Since the risk of anyone actually trying remains low, however, we do not usually wear bulletproof vests and helmets. Doing so would be too expensive in implementation and side effects in relation to the actual security gain.

There will be residual risks that we should accept rather than address, and there will also be mechanisms of risk management other than technological safeguards. I believe everyone needs to acknowledge this – limits and criteria can still be debated – to have a constructive discussion. The line between valid security concerns and freewheeling conspiracy theories is unfortunately rather thin sometimes.
@sventuerpe While all that is a fine theoretical argument, it's not particularly pertinent. If you put full trust in the server side, then it renders most of this project meaningless. The main driving principle behind anonymizing the data is to limit the government's ability to misuse it; if by design you allow them to reconstruct the data, then that goes a bit beyond some far-fetched theorizing: it negates a design principle, a principle that should guide the development, because ignoring it is the reason people would not want to install this app.
For me it's clear that revealing my keys means there is a high chance that many people and white hats can create a trace of me, or at least uncover me. If this discussion is about the amount of information a white hat gets, keep in mind that he will additionally check the information from §9 IfSG and §5 (2) 5. Lichtbild PAuswG (https://www.gesetze-im-internet.de/pauswg/__5.html). If it's about a black hat, the effort vs. gain is very low, because to get traces from people he needs the keys, and they have to be uploaded by the people voluntarily. Without the keys the whole infrastructure is not useful (key upload could be triggered by an evil beacon sender together with valid TANs). But this vector is quite small; if the risk can be reduced without high effort then it should be done, otherwise not.
@oiime First, we are not talking about full trust here, but rather about a specific concern within an architecture that nevertheless leaves a lot of control to its individual users.

Second, I do not despise trust altogether, as long as it does not degenerate into an inescapable dependency. Genuine, earned and tentative trust reduces complexity. And I do trust my state and its institutions to some extent, not blindly and unconditionally but sufficiently to abstain from fundamental opposition.

Third, as a consequence, I am not convinced that a lopsided focus solely on technological security controls is appropriate. While I see good reasons to go with the flow and use Apple's and Google's APIs rather than try something else, I believe institutional and legal safeguards deserve the same amount of attention as they complement the technical ones. Can we now return to the scenario and continue risk assessment within the application context?
You're allowing the operator the ability to tie an individual to their keys; that's pretty much as full as it gets. That connection should be severed as early as possible in the notification chain.
So far this implementation is more complex and less private, so I'm not sure what the benefit is. If there is no good reason to trust them, then don't; it should be that simple. Why introduce an attack vector when you have nothing to gain?
The Exposure Notification API in no way mandates doing it this way. I fully support using their API for practical reasons, and AFAIK nobody is objecting to using their API.
I doubt that. First, we need to bear in mind that Covid-19 is a notifiable disease and conventional contact tracing will continue in parallel to the use of CWA. The application context limits information hiding even by those not using the app at all. The point of designing privacy controls into the architecture is not to guarantee perfect confidentiality, but rather to make mission creep less likely. The most effective control in this respect is a protocol that limits what the server side can learn at all: Unless you as an app user or any of your recent contacts has tested positive, the server side receives no information about you at all. This makes the CWA system unsuitable for general surveillance. If the server side does learn information about you, this information remains limited and of low value. In particular, your contacts and social network cannot realistically be reconstructed. If I understand @togermer's collusion scenario correctly – feel free to correct me if I do not – only the identity of the infected person using a TAN to upload their diagnosis keys would be revealed. I see not much of a gain here for a conspiracy of labs and server operators, hence my request to assess risks and not only technical possibility. If there is a real problem I shall be the last to protest it being solved. I am just not yet convinced there is.

Second, I see no good reason to limit analysis to an arbitrarily chosen subsystem. A comprehensive treatment should consider all involved stakeholders including, for example, platform providers Apple and Google. This takes us back to questions of trust, risk, and trade-offs.
@sventuerpe You are correct, only the identity of the infected person using a TAN to upload their diagnosis keys would be revealed; the point of contention was the risk and cost of severing the connection between the individual and the TAN at a layer closer to the user.
Well, you cannot control Apple/Google, nor are they associated with this repository, so I'm focusing on this subsystem and what the people here can do to offer a more privacy-conscious solution. Thanks for your time.
I already implemented a first version of a blind signatures extension for the DP3T Java/Spring based backend (DP-3T/dp3t-sdk-backend/pull/73). Feel free to have a look. |
Is there any study about this, and more generally, about the reasons why people would use the app and why not?
Example 1: the user has a positive test result and a meeting with the health service (HS). The user and the HS agree to do the upload to the app server and to do a function check. Then, on the HS phone, the history of app contacts is deleted. The user phone and the HS phone stay close together for about 10 minutes. Authorization takes place and the data is uploaded from the user phone to the app server. Then the app on the HS phone downloads data from the app server and should get a notification. If not, something went wrong.

Example 2: the user has a testing date. Before actual testing, he signs a document about rules and privacy. The person doing the testing has a work phone with the app installed and the app contact history deleted. After 10 minutes the test is performed and further recording on the work phone is disabled. One day later, the user is informed about his positive test result. But very shaken by this message, he forgets to start the upload process. Fortunately, one day later, he is reminded by phone to do this.
Jumping in here a bit late, but what if, instead of disclosing one's own rolling IDs, the IDs that one has come into contact with were disclosed instead? In that case, each app checks whether one of their own IDs has been affected instead of one of their contact's IDs.
@whythecode If I understood you correctly, then it'll just mean you'd (potentially) need to release a larger pool of IDs; it'll have no influence over the de-anonymization of the user itself. I think the problem with having an actual interaction between the phones to generate paired tokens is that the protocol relies on using the broadcast payload on Bluetooth; you can't really have a back and forth.
@oiime you're totally right, scratch what I said. Releasing a pool of connections is potentially much more dangerous information and does not depend on there being beacons up at the right time in order to do some matching. |
As the discussion stopped almost two weeks ago, our architects and experts have already replied to the original question regarding the possibility of linking, and as we received a new issue covering the topic in a much broader sense, we will close this issue here now and ask you all to follow up in #223. Thank you very much for your understanding! Mit freundlichen Grüßen/Best regards,
Where to find the issue
Solution Architecture, TAN exchange as described in figure 3 (9-14)
Describe the issue
In order to facilitate widespread use of the app, it is of the highest importance to guarantee anonymity when users upload their diagnosis keys. Users should stay anonymous even if the health authorities/laboratories and the verification server collude.
In case of collusion between the LIS and the verification server, the current design allows linking the user identity to their uploaded keys:
This is possible because the current system links the patient/GUID directly to the TAN.
Suggested change
There is a standard way to address this issue and to improve anonymity: Use blind signatures. In short:
Because the verification server cannot link the blind signature to the authorization code, it cannot link the upload to the user identity. As an additional benefit, it doesn't have to store TANs.
This approach was also proposed in DP3T #210. A PoC is available at https://github.com/think-cell/corona.
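To illustrate the blind-signature idea (a toy textbook-RSA sketch with tiny parameters, not the linked PoC and nowhere near secure; a real deployment would use a vetted library with proper padding and hashing): the user blinds the upload token before the server signs it, so the server never sees which token it signed and cannot later match the unblinded signature to the signing request.

```python
import math
import random

# Signer (verification server) key pair: the classic textbook RSA example.
p, q = 61, 53
n = p * q                 # 3233, public modulus
e = 17                    # public exponent
d = 2753                  # private exponent (e * d == 1 mod phi(n))

def blind(msg: int, n: int, e: int):
    """User: hide msg behind a random factor r before sending it off."""
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    return (msg * pow(r, e, n)) % n, r

def sign_blinded(blinded: int, n: int, d: int) -> int:
    """Signer: signs the blinded value without learning msg."""
    return pow(blinded, d, n)

def unblind(blind_sig: int, r: int, n: int) -> int:
    """User: strip the blinding factor, yielding a signature on msg."""
    return (blind_sig * pow(r, -1, n)) % n

def verify(msg: int, sig: int, n: int, e: int) -> bool:
    return pow(sig, e, n) == msg

# Flow: blind -> sign -> unblind -> verify.
msg = 123                 # stand-in for a hashed upload token
blinded, r = blind(msg, n, e)
sig = unblind(sign_blinded(blinded, n, d), r, n)
assert verify(msg, sig, n, e)
```

The unblinding works because `(msg * r^e)^d = msg^d * r (mod n)`, so dividing out `r` leaves an ordinary RSA signature on `msg`; the server only ever saw `blinded`, which is uniformly distributed for a random `r`.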