Single encounter problem #13
Couldn't this be solved by using a sort of Bloom filter / spatio-temporal Bloom filter to represent the infected IDs?
@panisson Clever use of Bloom filters would make it easier to publish EphIDs, and this is a good idea, but functionally it is what I meant by collisions.
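To make the collision point concrete, here is a toy Bloom-filter sketch of how infected EphIDs might be published (class, names, and parameters are all made up for illustration; this is not the DP-3T design). The key property is that membership queries can return false positives, which is exactly the collision/deniability trade-off discussed above:

```python
import hashlib
import math

class BloomFilter:
    """Toy Bloom filter for publishing infected EphIDs.

    Membership queries can return false positives (hash collisions),
    so a positive match does not prove a specific EphID was uploaded.
    """

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(math.ceil(num_bits / 8))

    def _positions(self, item: bytes):
        # Derive k independent positions by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# The server publishes a filter over infected EphIDs instead of a raw list.
infected = BloomFilter(num_bits=8192, num_hashes=4)
infected.add(b"ephid-of-infected-user")

assert infected.might_contain(b"ephid-of-infected-user")  # members always match
# A never-added EphID is usually rejected, but the false-positive rate
# (roughly (1 - e^(-kn/m))^k for n items, m bits, k hashes) is what gives
# an observer room for doubt.
```

The filter size and hash count trade bandwidth against the false-positive rate, so the deniability discussed here is tunable, at the cost of utility.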
I think that's the risk of the fully decentralized approach. There is a lot of debate about whether a centralized or a decentralized approach is better. I think a hybrid approach (or maybe just centralized, depending on how you understand it), where most of the information gathering and exchange happens on the phones and only some centralized data is stored (fully anonymously), makes the system much less vulnerable to the single-encounter case, and it has the additional benefit of algorithm testability. I think we can modify the whole approach: the problem can be solved by centralizing the algorithm that detects whether a person is "endangered" or not, taking bits and pieces of the ProteGO app implementation (and discussion, in Polish): ProteGO-Safe/specs#34. Some thoughts:
Of course it means that the algorithm might potentially be manipulated. However, since the IDs are fully anonymous, I think that might be a reasonable compromise (as a bad actor on the server side, you do not even know whom you would want to compromise). On the other hand, it has the benefit that the algorithm can be tested. Before deploying new algorithms, it will be possible to do a dry run on existing data and check for errors. Making a mistake in such algorithms could be disastrous, so I think being able to modify and test the algorithms on the server is valuable. WDYT?
I'm pretty sure that code verification while sending the history will be implemented in the app, to be confirmed by @jakublipinski though. I am wondering how we will know whether a person should be notified. This algorithm is still in progress and we do not know how it will work: should we notify people who were within x radius and spent x time next to a person who has coronavirus? This is really important in terms of transparency, I think. Or maybe I missed something; correct me if I am wrong.
I think there should be a team of data scientists, doctors, and technologists working on it. I do not know exactly what will work, but I have a strong belief that something will. That's why I also think that keeping FULLY ANONYMOUS data on the server and running/testing various algorithms there is the best approach. I've been involved in Machine Learning / Data Science projects (I worked at https://nomagic.ai, a robotics and AI startup) and I know that such algorithms require a lot of iteration, testing, etc. Having fully anonymous BT encounter data on the server, linked to information about who is diagnosed (without de-anonymising even the sick people), should provide good test/verification data for that. That's why I think keeping some data in a central location might make sense (as long as it is not de-anonymisable). The great thing is that in order to train/try such algorithms we do not have to know at all who is who. We just have to know that given IDs have been diagnosed, learn the spread pattern from there, and fine-tune the algorithms. Completely anonymously. If AI/Machine Learning is involved (I will try to involve some of the best specialists I know from NoMagic), then we will not know the details of such algorithms anyway. AI algorithms are so far mostly "black boxes" that are not easily explainable, but they can be tested and verified on real anonymous data: when you run them on historical data, you can verify that they produce results correlating with reality, so they might predict risk much better than any "well described" algorithm. But again, I am not a specialist in this area; what I would do is provide the anonymous data to people who know what they are doing and let them work with it.
@potiuk Providing such data for research is one thing, but deciding whether you are at risk (as now proposed in ProteGO) is another. It can't be a black box if we want to ensure trust.
Let's wait and see how it evolves. I think it would be great to see the algorithm, but for me it is not a blocker as long as the data it operates on is truly anonymous. Maybe I am wrong here, but I do not see a risk (at least from the point of view of "infiltration", "manipulation", and "preserving privacy", which were my main concerns about ProteGO before). Assuming the data will be anonymous (and this is still a big if; we have to observe and shout if not), there might be other risks involved that make it necessary for the algorithm to be public. I believe for now we do not even have enough data to make any assumptions about the algorithm, its accuracy, or its correctness, because... it does not exist yet (no data -> no algorithm). This algorithm will have to be worked out by data scientists, not software engineers, and I think it might be really complex to verify. But let's see what information ProteGO provides. I think at this moment it is important that the app is anonymous by default (you should only optionally add your phone number) and opt-in, not opt-out. Let's see what the UX will be.
And I hope the algorithm will be made public eventually. |
@cloudyfuel Agreed, pseudonymous != anonymous, and I think at this step it is important to fight for anonymity. Algorithms should come next in line.
The documents in this repo mention this case as a remaining risk that cannot be mitigated. Even without an app, having had limited contact helps to pinpoint the source of infection. I don't see how this can be solved with technology.
@nicorikken The problem is that an attacker can simulate having limited contact by changing their own identity frequently. Maybe this is an inevitable part of any proximity-tracking algorithm of this kind, as stated in the original doc, but I believe that if we allow the system to have some false positives, we can at least say there is some chance it was not a true contact (and therefore not a true source of infection), but just a false positive. This may not be enough, though, and there might be better ideas.
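The "allow some false positives" idea can be quantified with a quick Bayes'-rule sketch (my own back-of-the-envelope illustration, not from the DP-3T documents; the function name and rates are made up):

```python
def posterior_infected(prior: float, fp_rate: float) -> float:
    """P(contact infected | their EphID matches the published set),
    by Bayes' rule. A truly infected EphID always matches; a healthy
    one matches with probability fp_rate (e.g. a Bloom-filter collision
    or a deliberately injected false positive)."""
    return prior / (prior + fp_rate * (1.0 - prior))

# Exact matching (fp_rate = 0): a single-encounter match is conclusive.
print(posterior_infected(0.01, 0.0))                 # 1.0
# With a 5% false-positive rate and a 1% prior, the attacker's
# confidence drops sharply:
print(round(posterior_infected(0.01, 0.05), 3))      # 0.168
```

This shows why false positives add deniability in the single-encounter case, and also why they hurt utility: every honest user's risk notification carries the same uncertainty.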
Hi all, thanks for your very interesting inputs! The thread goes in many directions, so please bear with me; I'll try to summarize. Initial problem: single encounter problem (please start your comment with this if you're answering this point). This is indeed a valid concern, which I believe is not solvable in the most extreme case. Even without considering such a dummy case, we believe that false positives (@jasisz's first proposal) ultimately cannot prevent the attack, but they do add uncertainty for the attacker (Bob in this case). One counter-argument is that false positives are undesirable for the overall utility of the system. @jasisz, I'm sorry but I'm not sure I understand your second proposition:
2nd point: Bloom filters, discussed in #24 and in our new design (see the whitepaper); if we could move the discussion to #24 it would be great. 3rd point: centralized algorithm (please start your comment with this if you're answering this point). Thanks for the many interesting comments on the topic. It is obviously a broad topic; just to highlight some of our past decisions: we decided that it is very hard to truly anonymize uploaded "graph" data; this is why our design uploads infected identities and not contacts (also see our FAQ, P1), hence the design we propose. Another comment is that even if it is possible, truly anonymizing uploads is costly (it requires an anonymous communication network), see our FAQ, P5; hence our design avoids this by only uploading non- or less-sensitive data to the backend.
@lbarman I could've made it more clear. There is a possibility that app would re-use some EphIDs it have seen in the past and advertise with them by design. It does not fit into your Design 1, but somehow fits into Design 2. Alice has seen EphID-B belonging to Bob and it was long enough time that it is a valid contact, potentially an infectious one. Alice can present herself with EphID-B in some next epochs. In case EphID-B is reported as infected it is not clear if it was Alice own EphID or it belongs to someone Alice has seen in the past. Of course it also leads to false-positives of two kinds:
@jasisz Thanks for clarifying this issue. I think we in the DP-3T project are well aware of the issue you highlight; indeed, it is discussed in the whitepaper, see Section 5.3 beginning on page 27. We consider it unavoidable in a system of the type we are aiming for. Proposals for concrete ways of addressing it that we may have missed are positively encouraged.
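The re-advertising idea from this exchange could be sketched roughly as follows (a toy model with made-up names and probabilities, not part of any DP-3T implementation):

```python
import random
import secrets

EPHID_LEN = 16  # bytes, matching DP-3T's 16-byte EphIDs

def fresh_ephid() -> bytes:
    """Stand-in for the app's normal per-epoch EphID generation."""
    return secrets.token_bytes(EPHID_LEN)

def choose_advertised_ephid(own_ephid: bytes,
                            observed_ephids: list[bytes],
                            reuse_prob: float = 0.1) -> bytes:
    """With probability reuse_prob, re-advertise an EphID observed in a
    past epoch instead of our own. If such an EphID later appears in the
    infected list, an observer cannot tell whether it belonged to us or
    to the contact we copied it from."""
    if observed_ephids and random.random() < reuse_prob:
        return random.choice(observed_ephids)
    return own_ephid

# Alice observed Bob's EphID in an earlier epoch; in later epochs she
# sometimes presents it as her own, creating the ambiguity described above.
ephid_b = fresh_ephid()     # Bob's EphID, as seen by Alice
alice_own = fresh_ephid()   # Alice's genuine EphID for this epoch
advertised = choose_advertised_ephid(alice_own, [ephid_b])
assert advertised in (alice_own, ephid_b)
```

As noted in the thread, the ambiguity this buys is paid for in false positives: third parties who hear Alice rebroadcast EphID-B will record a contact with Bob that never happened.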
If in a given epoch one is sure they've met only one person, and they later find that person's ID in the public repository of infected IDs, they can be sure that person was infected.
This is a privacy concern and the workaround is not trivial... but I believe it is still possible, although the documents state this is not possible for any proximity-tracing mechanism.
I have two ideas to fix that: