Unnecessary delay in exposure notification due to delayed fetching of Diagnosis Keys? #466
Further on this point: retrieval of Diagnosis Keys seems to be triggered every 24 hours, which further worsens worst-case performance of the system in a situation where:
There seem to be no privacy or technical limitations that would prevent the contact person from learning about their exposure status already on the morning of 01-07 (two days earlier). The change from checking for new Diagnosis Keys every 2 hours to every 24 hours was introduced recently in this pull request. The solution architecture document states a one-hour interval in Data Format, Data Transfer and Data Processing and Bandwidth Estimations. |
There seems to be one limitation that would prevent running RPI matching every hour: calls to provideDiagnosisKeys() are limited to 20 per day (for privacy protection reasons, to prevent an app from querying the API about contact with single users, and obviously there’s also the battery consumption thing...) |
@mh- thanks for additional info, it seems that original design was already taking this into account by capping calls to provideDiagnosisKeys() at 12 per day. So to recap, in original design we had:
And now we seem to have:
|
This starts to look more like Android implementation issue, since iOS updates diagnosis keys every 2 hours, and queries both days endpoint and hours endpoint for current day (function definition here, called from here). This would mean that delay in exposure notification introduced for iOS users is up to 3 hours, while for Android users up to 48 hours. |
We will provide further details in the next days; right now only some preliminary information with a big disclaimer: it will probably become more concrete or even be corrected in the next days when we have the final confirmation from the respective colleagues. It's Sunday and many colleagues are taking their well-deserved day off. To my knowledge, both the Android and iOS apps behave consistently when it comes to updating the OS-internal Exposure Notification Framework with diagnosis keys: this is done once per day on all platforms. The code which you saw on iOS also triggers the risk calculation, which can be done more often per day (but always with the same diagnosis keys from the current day). You know, the epidemiological parameters for the risk calculation might change, so two different calculations with the same diagnosis keys might actually yield different results... The reason for that once-per-day frequency is mainly API rate limiting, as already outlined by @mh-. On Android you have 20 calls per day, on iOS only 15. This rate limiting doesn't only apply to the calls per day, but also to the number of data files which you can send to the OS-internal framework per day. As we always need to present the complete, epidemiologically relevant data of all diagnosis keys to the framework, we already have 14 signed data files if we retrieve the diagnosis keys once per day for the relevant period of 2 weeks. So there is no chance of updating the diagnosis keys more often than that. Of course, the backend could provide larger chunks of data (e.g. always the complete data for the respective last 2 weeks), so that the number of files for the relevant period of 2 weeks is smaller, which would then also make it possible to call the framework more often, but that would certainly increase the overall load on the backend infrastructure, as overall more data would need to be served to clients. It probably also has other drawbacks...
Thus, the current approach is from my perspective a balance between the rate limiting of the API, efficient handling of server infrastructure and actions that make sense from an epidemiological point of view. Even the delay mentioned by @kbobrowski might still be OK when the usual incubation period is taken into account. But once again: Take this statement with a grain of salt. We will see further updates in the next days and of course keep this issue open for further discussions. Until then: Enjoy your Sunday! Mit freundlichen Grüßen/Best regards, |
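The rate-limit argument above can be sketched as simple arithmetic. This is a minimal sketch, assuming (as stated in the comment) that every file handed to the framework counts against the daily limit; the function name `refreshesPerDay` is hypothetical and not part of either app.

```kotlin
/**
 * Number of complete refreshes that fit into the daily rate limit,
 * assuming every submitted file counts as one call (assumption from the thread).
 */
fun refreshesPerDay(dailyLimit: Int, filesPerRefresh: Int): Int {
    require(dailyLimit >= 0) { "dailyLimit must not be negative" }
    require(filesPerRefresh > 0) { "filesPerRefresh must be positive" }
    return dailyLimit / filesPerRefresh
}
```

With 14 daily files per refresh, both the Android limit of 20 and the iOS limit of 15 allow only one refresh per day. If the backend served a single 14-day bundle instead, the same limits would allow 20 and 15 refreshes respectively.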
@SebastianWolf-SAP thanks for the quick response (on a Sunday!). The thing I'm missing is why the background task (whatever it is doing, even if just updating epidemiological parameters) on iOS is triggered every 2 hours while on Android it is triggered every 24 hours, but perhaps it will be explained / corrected later. I fully understand that rate limiting on iOS and Android is the driving factor for the refresh rate. On Android it is limited to every 1.2 hours (24 h / 20 calls), on iOS to every 1.6 hours (24 h / 15 calls). Why choose a refresh rate of every 24 hours though? This seems like quite an arbitrary number (obviously related to the human circadian rhythm, but this seems to have no relevance here). The initial choice of refreshing every 2 hours seemed reasonable.
I'm missing something here, let me use the following user story to illustrate it: I meet an infected person on 01-07 and become infected on that day. This person receives a positive test result on 07-07 and uploads Diagnosis Keys on that day (I'm already contagious by then). The full 14 days' worth of Diagnosis Keys of the person who infected me will become available at the /date/2020-07-07 endpoint. These Diagnosis Keys also become available at /date/2020-07-07/hour/12. Now the problem is that I cannot yet fetch /date/2020-07-07 (it becomes available on the next day) and I keep infecting my family / friends. I only learn on 08-07 that I was exposed. Why can't I fetch the fresh /date/2020-07-07/hour/12 package with the 14 Diagnosis Keys of the person who infected me and subsequently isolate myself? This seems like a design decision that may result in loss of health / life, as in the worst case I can be spreading the virus for about 45 hours longer than if the 2-hour refresh and fetching of hourly keys were implemented. To summarize, I think this issue is in fact two separate but closely related issues:
Each of these issues introduces 24 hours worst-case delay in receiving exposure notification (independently, so it may result in 48 hours delay), compared to original design which had worst-case 3 hours delay. But let's leave it for working days, enjoy your Sunday as well! |
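The worst-case figures in this comment follow from a simple additive model: keys wait up to one bundling interval before they are published, then up to one refresh interval before the app fetches them. A minimal sketch under that assumption (processing and network time are ignored; the function name is hypothetical):

```kotlin
/**
 * Worst-case delay in hours between key upload and the app fetching the keys,
 * assuming the bundling delay and the refresh delay simply add up.
 */
fun worstCaseDelayHours(bundleIntervalHours: Int, refreshIntervalHours: Int): Int =
    bundleIntervalHours + refreshIntervalHours
```

Daily bundles fetched once a day give `worstCaseDelayHours(24, 24)`, i.e. 48 hours, while hourly bundles fetched every 2 hours give `worstCaseDelayHours(1, 2)`, i.e. 3 hours.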
@kbobrowski If I understood @SebastianWolf-SAP correctly, he has the assumption that you cannot just feed the EN API on the device with a set of DKs that have been uploaded in the last 2 hours, but that you must feed it with all DKs that have been uploaded in the last 14 days, and that each such DK file transaction counts towards the rate limit. However, this feels like a strange restriction, and it's not really obvious from the API doc,
|
@mh- not sure if @SebastianWolf-SAP meant this but I don't believe that this is how a server bundles data. Bundle from 12-06-2020 contains 78 Diagnosis Keys which were live between 2020-05-30 and 2020-06-11, and bundle from 13-06-2020 contains 84 DKs from time period between 2020-05-31 and 2020-06-12, but there is no single DK shared between these two bundles. Hourly DKs are also signed so there is no need to combine them and sign again, and I also believe that API works as you described:
|
Yes, I think we all agree on this, and that's why you need to deliver all of them to the API in one "session" (using one token), before you can ask it for a meaningful risk scoring. |
Ok so I think I'm starting to understand what's going on here. Information about paths of downloaded DK bundles is stored in this KeyCacheDao. We have RetrieveDiagnosisKeysTransaction, which fetches keys from the server using the asyncFetchFiles function. This function simply checks whether files for the available dates have already been downloaded; if not, it blocks until the missing days are downloaded, and returns a list of files corresponding to all existing .zip files with DKs. These files are then passed by the mentioned transaction to the executeAPISubmission function, which feeds all of them one by one to provideDiagnosisKeys. There is an implicit assumption here that provideDiagnosisKeys will be called fewer than 20 times a day, which is ensured by fetching only daily keys, only every 24 hours, and deleting outdated files. As a result executeAPISubmission is called once a day with a list of at most 14 files, and provideDiagnosisKeys is then called at most 14 times a day (in one short burst, by iterating over the provided list). This seems like a strange design; I thought that it's possible to simply feed the Exposure Notification framework with new DKs, which will land in its internal LevelDB database, and the EN framework will simply recalculate the risk score. Or just pass a list of all files (daily and hourly bundles) in one call. There is this statement in the docs of executeAPISubmission:
Why feed only single-element batches? It seems that we can call this function 20 times a day (each time with multiple files) as stated in Google's documentation:
On the other hand Apple documentation linked by @SebastianWolf-SAP states that:
This is a big difference in functionality between limiting "calls to a function" and "data files passed"; is there a typo in one of these documents? Or do the Apple and Google frameworks really differ this much in functionality? In any case, if this discrepancy really exists, I don't think it is a reason for downgrading the functionality of one app in order to match the lower functionality level of the other app. It is in the best interest of iPhone users that all Android users are notified as fast as possible about their exposure status, so that they have less chance of potentially infecting people around them (including iPhone users). |
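The flow described earlier in this comment (check the cache, download only missing days, then submit every cached file in its own call) can be sketched as follows. This is an in-memory illustration only: `DailyBundleCache`, `fetchFiles` and `submitOneByOne` are hypothetical stand-ins, not the app's actual KeyCacheDao, asyncFetchFiles or executeAPISubmission, and deletion of outdated files is not modeled.

```kotlin
/** In-memory stand-in for the cache of downloaded daily bundles (hypothetical). */
class DailyBundleCache {
    private val filesByDate = mutableMapOf<String, String>()

    /** Downloads only the dates that are not cached yet and returns all cached files. */
    fun fetchFiles(availableDates: List<String>, download: (String) -> String): List<String> {
        for (date in availableDates) {
            if (date !in filesByDate) {
                filesByDate[date] = download(date)
            }
        }
        return filesByDate.values.toList()
    }
}

/** Feeds every file to the framework in its own single-element call; returns the number of calls. */
fun submitOneByOne(files: List<String>, provideDiagnosisKeys: (List<String>) -> Unit): Int {
    var calls = 0
    for (file in files) {
        provideDiagnosisKeys(listOf(file))
        calls++
    }
    return calls
}
```

With 14 cached daily files, one run of `submitOneByOne` makes 14 calls, which is why a second run on the same day would exceed a limit of 20.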
@SebastianWolf-SAP wrote:
I think @kbobrowski is right to point out unnecessarily long delays. One main value this app provides is faster time-to-notification after a positive test. But noticing symptoms (or receiving an exposure notification) and getting a test result take 2-3 days even in good times. During that time, contacts are already infectious themselves. If another 2-3 day delay is introduced, the time advantage is essentially lost and the infection has hopped to the next level of contacts. (Where second-degree notification corona-warn-app/cwa-wishlist#24 would come in handy) I hope this delay issue can be clarified in the coming days. |
Wow, there has been quite a few additional comments in the last hours. :) Let me just confirm quickly that we will definitely clarify that issue in the upcoming days with all the required details. Mit freundlichen Grüßen/Best regards, |
@SebastianWolf-SAP thanks for keeping us updated :) just one additional comment: Google's reference Android implementation provides files with Diagnosis Keys in batches and removes these files after confirming that they were submitted to Exposure Notification framework. I also can see by using apps from different countries that Diagnosis Keys are stored inside EN framework in app_en_diagnosis_keys directory (simply as export.bin files). This led me to believe that intended way of interacting with EN is to simply feed it new Diagnosis Keys (in batches, up to 20 times a day) and just let it do the job and notify app in case contact with infected person was determined. |
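The batch-and-delete pattern described in this comment could look roughly like the sketch below. It is not taken from Google's reference implementation; `PendingBundles` and `submitBatch` are hypothetical names, and the `submit` callback stands in for whatever call hands the files to the Exposure Notification framework.

```kotlin
/** Tracks downloaded bundle files so that each one is handed to the framework only once (hypothetical helper). */
class PendingBundles {
    private val pending = mutableListOf<String>()

    /** Registers a newly downloaded bundle file. */
    fun add(file: String) {
        pending.add(file)
    }

    /**
     * Submits all pending files as one batch. Files are dropped only after the
     * callback reports success, so a failed batch is retried on the next run.
     * Returns the number of files submitted.
     */
    fun submitBatch(submit: (List<String>) -> Boolean): Int {
        if (pending.isEmpty()) return 0
        val batch = pending.toList()
        if (!submit(batch)) return 0
        pending.clear()
        return batch.size
    }
}
```

Under this approach each run costs one call regardless of how many new files arrived, provided the limit applies to calls rather than to files, which is the open question in this thread.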
Some final thoughts: what still confuses me is this part of @SebastianWolf-SAP's response:
This is reflected in the current implementation, where the last 14 daily bundles of DKs are kept in local storage and provided every day to the provideDiagnosisKeys function. What is really confusing is why we need to repeatedly provide the same daily bundle to the framework 14 times over the course of 14 days. The only reason that comes to my mind is that the epidemiological configuration may change and DKs need to be re-evaluated, but this looks like solving the right problem in the wrong place, let me explain:
Let's consider alternative approach:
The latter solution seems to provide the following advantages over the existing one:
|
I guess one aspect in this is also: Will the |
@mh- I don't believe that retrospective updating of transmission_risk_level is in scope because current design effectively prevents it - apps store daily bundles for 14 days and there seem to be no mechanism to trigger re-downloading of an old bundle. The apps also feed all of the bundles they have stored to provideDiagnosisKeys every day. Even if some DKs would be assigned new transmission_risk_levels and published in a new day bundle, the same DKs but with old transmission_risk_level would still be provided to EN along the updated ones. In "alternative approach" described in my previous comment there would be no problem to implement this feature though should it be needed, DKs with updated transmission risk would just be pushed to new hourly bundle (there would be no previously downloaded DKs with same TEK but different transmission risk level passed alongside to provideDiagnosisKeys) |
Both frameworks currently limit the number of files we can submit. The risk calculated for a certain exposure incident might also change over time: not because the data changes, but because of the number of days it lies in the past (e.g. an exposure event from yesterday is weighted differently than one from 10 days ago). To take this into consideration for the calculation of the risk (and to properly include the new attenuation buckets), we need to feed all keys into the API every day... And this is how we already hit the API rate limit by doing the matching once per day. |
If you have direct line to the Apple and Google Exposure Notification teams, it might be worth highlighting this to them. I can't imagine that calling the API once a day at most was what they had in mind with their rate limits. |
@tklingbeil thank you for the response, I have some further questions:
Thank you in advance for your patience, I understand that you are very busy this week but it seems that the current worst-case 48 hours delay is bound to have serious consequences for someone's life / health, and this seems like something that can be at least partially avoided. |
Indeed we did reach out to them already, and they are looking into the issue & see what they can do. They also noted that when they make changes, that it will take some time until this is shipped to all the phones - so we can't expect any adjustments within the next 2 weeks or so. |
I've asked Google about it at their exposure-notifications-feedback mailing list:
And got response (also got permission to publish it here):
|
This study: https://www.csh.ac.at/wp-content/uploads/2020/06/CSH-Studie-%C3%84rztefunktdienst.pdf concludes that starting isolation 6-12 hours earlier can reduce the infection rate by up to 58%. Against this background I would be in favor of updating the keys several times a day. Up to 48 hours of additional delay is simply unacceptable here, because in the worst case 4-5 days then pass between test -> notification. "Our study shows that measures which [...] the effective duration of infectiousness by only a few [...] Suggestion:
|
@ChristopherSchmitz I guess you can answer this best. |
I'm surprised this issue has not gotten more attention, it seems extremely important. I can't imagine that there isn't a better way of doing this than the current implementation here. All comments are good, only one I find questionable:
Is backend load really an issue for this project? Such claims should be quantified: does a change cost 1k, 10k or 100k € extra per day? And they should be weighed against the potential benefit (earlier notification, earlier test, reduced transmission), somehow quantified (I don't feel I know enough for now), rather than just decided by gut feeling. How can a delay of 24 hours, or even 12 hours, be OK? I think getting the RKI in on this would be helpful too, @BenzlerJ, maybe? Some assumptions need to be checked here. I don't want to blame past design decisions, you had to get it out quickly and managed to do that. But that doesn't stop us from rectifying potentially suboptimal decisions in order to save lives. Given the life/death importance of this issue, it would be great to have another update from maintainers on their current views after 3 days: @SebastianWolf-SAP @tkowark @tklingbeil @cfritzsche @christian-kirschnick I come in peace. |
The currently released app version 1.0.2 makes exactly 3 attempts (BackgroundConstants.WORKER_RETRY_COUNT_THRESHOLD) in less than 1 minute, once per day (DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY), to try and download Diagnosis Keys. (@corneliusroemer thanks for the information about 1.0.2) |
@mh- Small correction: At least for Android 1.0.2 was pushed through play store yesterday/today Update: @mh- raises a point about keys only being attempted to be downloaded once a day, 3 times. The fact that no further attempts are made when the initial attempt failed may be a cause behind all sorts of issues that report lack of exposure key updates: e.g.
@mh- @kbobrowski Do you think it would make sense to open a new issue to suggest that more tries to key retrieval be made throughout the day? It's related to the issue here but can be fixed independently. |
@corneliusroemer maintainers are already pointed from the issues you cited to this issue, perhaps one issue focusing on this specific problem will be created, I won't do it for now since I have not gathered enough information myself. It's important to keep signal to noise ratio high so until I have enough info I tend not to open new issues. If you or @mh- feel you are in a position to raise new specific issue then it's up to you of course :) |
@mh- that's an interesting finding, also quite interesting is this part of the code:

/**
 * Get maximum calls count to Google API
 *
 * @return Long
 *
 * @see BackgroundConstants.DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY
 * @see BackgroundConstants.GOOGLE_API_MAX_CALLS_PER_DAY
 */
fun getDiagnosisKeyRetrievalMaximumCalls() =
    BackgroundConstants.DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY
        .coerceAtMost(BackgroundConstants.GOOGLE_API_MAX_CALLS_PER_DAY)

It indicates that the initial intention was indeed to call provideDiagnosisKeys() multiple times a day, each time with a list of Diagnosis Keys. Right now if DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY was set to anything higher than 1, let's say 2:
so it may not be easily fixable without some deeper changes. The crucial thing regarding this issue is once again to understand the limits on provideDiagnosisKeys, and why we need to feed single-element batches. The value of DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY was changed from 12 (as we could have expected, fetching every 2 hours) to 1 in this pull request. |
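To illustrate why clamping the number of retrievals is not enough on its own, here is a minimal sketch. It assumes the limit of 20 quoted in this thread and one provideDiagnosisKeys call per cached file; the constant and both functions are simplified stand-ins, not the app's BackgroundConstants.

```kotlin
// Limit quoted in this thread; a stand-in for BackgroundConstants.GOOGLE_API_MAX_CALLS_PER_DAY.
val GOOGLE_API_MAX_CALLS_PER_DAY = 20

/** Mirrors the clamping in the quoted function: never plan more retrievals than the API limit. */
fun retrievalsPerDay(triesPerDay: Int): Int =
    triesPerDay.coerceAtMost(GOOGLE_API_MAX_CALLS_PER_DAY)

/** Calls to provideDiagnosisKeys per day if each retrieval submits every cached file separately. */
fun callsPerDay(triesPerDay: Int, cachedFiles: Int): Int =
    retrievalsPerDay(triesPerDay) * cachedFiles
```

With 14 cached files, `callsPerDay(1, 14)` is 14 and stays under the limit, while `callsPerDay(2, 14)` is already 28. The clamp only bounds the number of retrievals, not the number of framework calls they trigger.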
Ideally, the "cwa serial interval", i.e. the delay between receiving an alert, then getting a test done, receiving the test result, uploading your keys and then finally the next "generation" of alerts being received should be less than the COVID-19 serial interval, which appears to be 4-5 days. I have some doubts about the feasibility of this, since laboratories and public authorities might not be fast enough for this to work in all cases. If the purely technical delay between uploading keys and alerts popping up is something like 24 hours on average, 48 hours in a bad case, and even more in worst case (backend or network issues during the 3 minute daily "attempt to update" interval) then it's even more unlikely.
Edit: Thank you Christian for pointing that out to me (in the next comment). It's a good approach, and I've rushed to an incorrect conclusion there. Looks like that part of my concern is invalid and we should see some keys published soon, as uploads include more and more keys. Sorry for the noise. I still feel like it's quite important to reduce the not-extremely-uncommon-bad-case of 48 hours delay as much as possible, even if that means the backend needs to be 14x as beefy. Also the "3 attempts in a 3 minute time window once a day" issue is very important, but it looks like this is already being worked on. |
CWA Version 1.7.0 (for Android: https://github.com/corona-warn-app/cwa-app-android/releases/tag/v1.7.0) and CWA Version 1.7.1 (for iOS: https://github.com/corona-warn-app/cwa-app-ios/releases/tag/v1.7.1) were released today, which introduced key fetching every 4 hours (if the phone is connected to the internet via Wi-Fi). 🎉 |
Dear community, Regarding the update to CWA 1.7: Please note that it is a staged rollout. Over the next two days, 100% of the users should have received the update. We would appreciate very much your feedback regarding this issue over the next few days! Best wishes, Corona-Warn-App Open Source Team |
I am seeing background updates spaced 4 hours apart which is excellent. I didn't see yet if the hourly endpoint is checked though. Thanks for finally making this work with all parties :-) |
@cfritzsche you have Android, yes? On iPhone we can see more details in the ENF log. Thanks! |
Nope, iOS here. Sure I can see that the keys change every few hours. But I can't verify they are fetched from the hourly endpoint. |
@cfritzsche did you have a look at your ENF log file? Either export it as json, or click through it directly in the settings. What do I see here:
So the pattern is like expected for hourly files of today, and daily files for the past 14 days. |
Thanks @ndegendogo . Actually, isn't it a waste to keep checking the old dailies after the first check of the day? Can they still be added to? Then TSI's concerns over infrastructure load were correct after all: the same daily keys are now downloaded by the same devices much more often. |
@cfritzsche well, this is same behaviour as before 1.7. As for exact reasons why the same files are checked over and over again, I can only guess:
|
@kbobrowski can't we close this issue now with 1.7.1 out? |
Dear @kbobrowski, and community, Thanks to all for contributing here. We think that this issue has been mitigated with the latest release of CWA 1.7.1. As @cfritzsche suggests, we are about to close this issue. Thanks again. Best wishes, Corona-Warn-App Open Source Team |
Dear all, Thanks again, we will close this issue now. Best wishes, Corona-Warn-App Open Source Team |
As (well, it's not a proof, but let's call it) an anecdotal indication of how fast the key distribution part can be now, let me refer to this ticket:
This is how it should be! |
Current API allows for querying days on which Diagnosis Keys became available:
Apps use this endpoint to check if these dates are already in cache, and if not to download Diagnosis Keys for missing days using endpoint:
Let's assume a scenario where someone had contact with infected person. Infected person uploads Diagnosis Keys on 2020-06-12, but these Diagnosis Keys are bundled into package that is only available to download on 2020-06-13. This introduces a day of delay, during which a person who had contact with diagnosed person is unaware of this contact and is not able to self-isolate, potentially endangering people around.
Interestingly it seems to be possible to get Diagnosis Keys uploaded on current day, but only using hour API endpoint:
This endpoint seems to be only used in a function which is marked to be dropped before release. Functionality to fetch hours for current day existed, but it was removed in this pull request.
Is there a reason not to continuously download Diagnosis Keys which are being made available during current day, and instead wait until package for complete day is ready?
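A sketch of what combining both endpoints could look like: complete bundles for past days plus the hourly bundles published so far today. The relative path shapes `/date/<date>` and `/date/<date>/hour/<hour>` are taken from examples in this thread; the server prefix is omitted, and the function name, its parameters and the unpadded hour format are assumptions.

```kotlin
/**
 * Relative paths of the bundles to fetch: complete past days first,
 * then the hours of the current day that have been published so far.
 */
fun bundlePaths(today: java.time.LocalDate, completedDays: Int, lastPublishedHour: Int): List<String> {
    // LocalDate.toString() yields the ISO form yyyy-MM-dd used in the paths.
    val days = (completedDays downTo 1).map { "/date/${today.minusDays(it.toLong())}" }
    val hours = (0..lastPublishedHour).map { "/date/$today/hour/$it" }
    return days + hours
}
```

For 2020-07-07 with two completed days and hours 0 and 1 published, this yields the two daily paths for 2020-07-05 and 2020-07-06 followed by `/date/2020-07-07/hour/0` and `/date/2020-07-07/hour/1`.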
As far as I understand this would not introduce any privacy issues since they are already handled by upload mechanism documented here, and would just let people know faster that they were exposed. Privacy of this solution can also be confirmed by looking at latest Diagnosis Key in each bundle, e.g. in a bundle from current day (2020-06-13, hour 12) latest Diagnosis Key has interval number 2653200, meaning it was generating RPIs yesterday:
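The interval number can be checked by hand: in the Exposure Notification specification an interval number counts 10-minute windows since the Unix epoch, so multiplying by 600 gives seconds. A minimal sketch (the function name is hypothetical):

```kotlin
/** Converts an Exposure Notification interval number (10-minute units since the Unix epoch) to a UTC instant. */
fun intervalStart(intervalNumber: Long): java.time.Instant =
    java.time.Instant.ofEpochSecond(intervalNumber * 600)
```

For 2653200 this gives 2020-06-12T00:00:00Z, i.e. the key started generating RPIs at midnight UTC on the day before the 2020-06-13 bundle.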
Internal Tracking ID: EXPOSUREAPP-1567