This repository has been archived by the owner on May 16, 2023. It is now read-only.

Difference in trends for 7-day incidence and 7-day average #528

Closed
2 tasks done
nilsalex opened this issue Feb 16, 2021 · 67 comments
Labels
bug (Something isn't working), mirrored-to-jira (This item is also tracked internally in JIRA)


@nilsalex

nilsalex commented Feb 16, 2021

Avoid duplicates

  • Bug is not mentioned in the FAQ
  • Bug is not already reported in another issue

Technical details

  • Device name: iPhone 11
  • iOS Version: 14.4 (18D52)
  • App Version: 1.12.1 (0)

Describe the bug

As of now (16.02.2021, 17:11 CET), CWA shows a 7-day average of 7,274 confirmed infections and a 7-day incidence of 58.7/100,000. For the 7-day average, an arrow pointing towards the lower right indicates a downward trend, while for the 7-day incidence, an arrow pointing to the right indicates a stable trend. Yesterday, the difference was even higher: a downward trend vs an upward trend.

My understanding is that both numbers are related by a factor like

(7-day incidence) = (7-day average) * 7 * 100,000 / (about 83,000,000)

and therefore, the trend should always be the same. Or is there more to it?
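
As a rough check of the relation above, the following Python sketch converts the 7-day average quoted in this report into the incidence it would imply. The function name and the rounded population of 83,000,000 are assumptions taken from the approximation in the formula, not values confirmed by the app.

```python
def implied_incidence(seven_day_average: float, population: float) -> float:
    """Cases per 100,000 inhabitants implied by a 7-day average of daily cases."""
    seven_day_sum = seven_day_average * 7
    return seven_day_sum * 100_000 / population


APPROX_POPULATION = 83_000_000  # assumption: rounded figure from the formula above

implied = implied_incidence(7_274, APPROX_POPULATION)
displayed = 58.7  # 7-day incidence shown in the app at the time of the report

print(f"implied by the 7-day average: {implied:.1f}")
print(f"displayed in the app:         {displayed:.1f}")
```

If the implied value differs noticeably from the displayed one, the two tiles are presumably not computed from the same set of case counts, which would also allow their trends to differ.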

Steps to reproduce the issue

Open the app and swipe through the widgets.

image

image

Expected behaviour

Same trend for both indicators.


Internal Tracking ID: EXPOSUREAPP-5225

@Ein-Tim
Contributor

Ein-Tim commented Feb 16, 2021

@nilsalex

I think the cause for this is the following:

The number of cases (and their difference from the previous day) and the number of deaths refer to cases that are transmitted to the RKI daily. This includes cases that were reported to the health authority on the same day or on earlier days. The cases in the last 7 days and the 7-day incidence are based on the reporting date at the health authority, that is, the date on which the local health authority became aware of the case and recorded it electronically.

(Source: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html).

This would explain the difference, wouldn't it?

(pinging @MikeMcC399 since he has a great understanding of such things)

@MikeMcC399
Contributor

@Ein-Tim
I'm definitely not an expert on these statistics, but I can Google!

Start first by tapping the ℹ️ icon in the app for the definitions.

Then access the raw data through
https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html and a link in that page to
https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Daten.html

According to that Excel file in tab "BL_7-Tage-Inzidenz" the 7-Day Incidence on Feb 16, 2021 of confirmed new infections was 58.7 and 7 days before that on Feb 9, 2021 it was 72.8. So that was a downwards trend of 14.1 or -19% based on the Feb 9 data.

Using the tab "BL_7-Tage-Fallzahlen" I couldn't find values which matched the ones in the app, so I used the tab "Fälle-Todesfälle-gesamt" instead.
The sum of Differenz Vortag Fälle for Feb 10 to Feb 16, 2021 is 50919, divided by 7 is 7274.
The sum for Feb 3 to Feb 9, 2021 is 63839, divided by 7 is 9120.
This is a difference of 1846 or -20% compared to the Feb 9 data.


Based on that I don't understand why the app is showing Trend: Steady for the 7-Day Incidence when, according to the figure I quoted, the trend is 19% down and this is more than the 5% threshold to declare it as Trend: Downwards and mark it with a green arrow.
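
The week-over-week figures above can be reproduced with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and the symmetric 5% band is taken from the threshold mentioned in this comment, not from the backend's actual code.

```python
def percent_change(current: float, previous: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100


def classify(change_percent: float, band_percent: float) -> str:
    """Return 'up', 'down' or 'steady' for a symmetric band around zero."""
    if change_percent > band_percent:
        return "up"
    if change_percent < -band_percent:
        return "down"
    return "steady"


WEEKLY_BAND = 5.0  # percent, the threshold mentioned in the comment above

# 7-Day Incidence: Feb 16 vs Feb 9 (tab "BL_7-Tage-Inzidenz")
incidence_change = percent_change(58.7, 72.8)

# 7-Day Average: sums of "Differenz Vortag Fälle" for Feb 10-16 vs Feb 3-9
average_change = percent_change(50_919 / 7, 63_839 / 7)

for name, change in [("incidence", incidence_change), ("average", average_change)]:
    print(f"{name}: {change:+.1f}% -> {classify(change, WEEKLY_BAND)}")
```

Under a 7-day comparison both values fall well outside the 5% band, so a "Steady" arrow for the incidence cannot come from this rule.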

This needs to be looked at.

Thanks to @nilsalex for bringing this up!

@Ein-Tim
Contributor

Ein-Tim commented Feb 16, 2021

Thank you @MikeMcC399 for checking (I can Google too, but I have to admit that you are often better at explaining such things than I am 😅)

I assume this also affects Android, doesn't it?

If yes, please move it to the documentation repo.

@MikeMcC399
Contributor

@Ein-Tim
Yes, this also affects Android, so it should be in the documentation repo.

I think it should be looked at urgently because the 7-Day Incidence value and its trend are the figures that everybody, including politicians, is looking at to inform the decision about easing the lockdown.

@MikeMcC399
Contributor

For ease of reference here are the RKI daily reports for Feb 16, 2021 and for 7 days previously on Feb 9, 2021.

2021-02-09-en.pdf
2021-02-16-en.pdf

These show the figures

Date            7-Day Incidence per 100,000 population
Feb 9, 2021     73
Feb 16, 2021    59

which is a clear downwards trend (that I am sure we are all happy to be seeing 👏!)

@MikeMcC399
Contributor

MikeMcC399 commented Feb 17, 2021

The value today, Feb 17, 2021, for 7-Day Incidence is 57.0 and the trend is down, which looks good.

Date            7-Day Incidence per 100,000 population
Feb 10, 2021    68
Feb 17, 2021    57

The data for yesterday should still be investigated though.

@Ein-Tim
Contributor

Ein-Tim commented Feb 17, 2021

@dsarkar Could you take a look at this and transfer it to the correct repo?

Thanks!

dsarkar transferred this issue from corona-warn-app/cwa-app-ios on Feb 17, 2021
dsarkar added the bug (Something isn't working) and in review (Moderators are investigating how to best proceed with the issue) labels on Feb 17, 2021
@MikeMcC399
Contributor

MikeMcC399 commented Feb 18, 2021

The value today, Feb 18, 2021, for 7-Day Incidence is 57.1 with "Trend: Steady".

Date            7-Day Incidence per 100,000 population *
Feb 11, 2021    64.2
Feb 18, 2021    57.1

The incidence has decreased by 7.1 or 11% of 64.2, so why does it show "Trend: Steady" not "Trend: Downwards"?

* Values from Fallzahlen_Kum_Tab.xlsx

@MikeMcC399
Contributor

It looks like the trend indicator is just comparing to the value from the previous day, whereas the help text says "The trend compares the value from the previous day with the value from two days ago or, for the 7-day trends, the average value from the last 7 days with the average value from the 7 days prior to that." So the displayed comparison does not correspond to the method described in the help text. (Or I have misunderstood!)

Date          7-Day Incidence per 100,000 population
04.02.2021    80.7
05.02.2021    79.9
06.02.2021    77.3
07.02.2021    75.6
08.02.2021    76.0
09.02.2021    72.8
10.02.2021    68.0
11.02.2021    64.2
12.02.2021    62.2
13.02.2021    60.1
14.02.2021    57.4
15.02.2021    58.9
16.02.2021    58.7
17.02.2021    57.0
18.02.2021    57.1
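
To test this hypothesis against the table, the sketch below classifies each of the last four days twice: once against the previous day with a 1% band and once against the value 7 days earlier with a 5% band. The band values come from the help text; the function name and the exact comparison rule are assumptions, since the backend calculation is not available here.

```python
# 7-Day Incidence per 100,000 population, copied from the table above.
INCIDENCE = {
    "04.02.2021": 80.7, "05.02.2021": 79.9, "06.02.2021": 77.3,
    "07.02.2021": 75.6, "08.02.2021": 76.0, "09.02.2021": 72.8,
    "10.02.2021": 68.0, "11.02.2021": 64.2, "12.02.2021": 62.2,
    "13.02.2021": 60.1, "14.02.2021": 57.4, "15.02.2021": 58.9,
    "16.02.2021": 58.7, "17.02.2021": 57.0, "18.02.2021": 57.1,
}
DATES = list(INCIDENCE)  # insertion order is chronological


def trend(date: str, days_back: int, band_percent: float) -> str:
    """Classify the change between `date` and the value `days_back` days earlier."""
    earlier = DATES[DATES.index(date) - days_back]
    change = (INCIDENCE[date] - INCIDENCE[earlier]) / INCIDENCE[earlier] * 100
    if change > band_percent:
        return "up"
    if change < -band_percent:
        return "down"
    return "steady"


for date in DATES[-4:]:
    daily = trend(date, days_back=1, band_percent=1.0)
    weekly = trend(date, days_back=7, band_percent=5.0)
    print(f"{date}: previous day -> {daily:6}  7 days earlier -> {weekly}")
```

Under these assumptions the previous-day comparison gives steady, down and steady for 16.02., 17.02. and 18.02., which agrees with the arrows reported in this thread for those days, while the 7-days-earlier comparison gives down every time.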

The full help text from statistics_explanation_trend_text is:

EN

"Trend"

"The arrow direction indicates whether the trend is increasing, decreasing, or remaining steady – that is, demonstrates a deviation of less than 1% compared to the previous day or 5% compared to the previous week. The color indicates this trend as positive (green), negative (red), or neutral (gray). The trend compares the value from the previous day with the value from two days ago or, for the 7-day trends, the average value from the last 7 days with the average value from the 7 days prior to that."


DE

"Die Pfeilrichtung zeigt an, ob der Trend nach oben oder nach unten geht oder relativ stabil ist, d.h. eine Abweichung von weniger als 1% im Vortagesvergleich bzw. 5% im Vorwochenvergleich aufweist. Die Farbe bewertet diesen Trend als positiv (grün), negativ (rot) oder neutral (grau). Der Trend vergleicht den Wert vom Vortag mit dem Wert von vor zwei Tagen bzw. für die 7-Tage-Trends den Mittelwert der letzten 7 Tage mit dem der vorausgegangenen 7 Tage."

dsarkar added the mirrored-to-jira (This item is also tracked internally in JIRA) label on Feb 18, 2021
dsarkar removed the in review (Moderators are investigating how to best proceed with the issue) label on Feb 18, 2021
@dsarkar
Member

dsarkar commented Feb 18, 2021

@MikeMcC399 regarding your last comment:

  • I understand these values are already 7-day averages
  • I think I can follow you: you are saying one should compare 17.2. (57.0) with 10.2. (68.0), which is clearly trending down.

@MikeMcC399
Contributor

@dsarkar

> I understand these values are already 7-day averages
> I think I can follow you: you are saying one should compare 17.2. (57.0) with 10.2. (68.0), which is clearly trending down.

Correct, yes, that is what I am saying. That is how I understand the explanation in the help text. Is that the way you understand the help text as well?

@dsarkar
Member

dsarkar commented Feb 18, 2021

@MikeMcC399 Yes, I think I can follow. For today and today-7 days I also get -11%; for yesterday and yesterday-7 I get -16%.

Even when averaging the already averaged values (which I think would be wrong), I get a change of -21.1%: the average for 11-17 Feb is 59.8, compared with 75.8 for 4-10 Feb.
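
For reference, these three percentages can be recomputed from the incidence table posted earlier in the thread. This is only an arithmetic sketch with hypothetical variable names, not the backend's calculation.

```python
from statistics import mean

# 7-Day Incidence values from the table earlier in this thread.
week_11_to_17_feb = [64.2, 62.2, 60.1, 57.4, 58.9, 58.7, 57.0]
week_04_to_10_feb = [80.7, 79.9, 77.3, 75.6, 76.0, 72.8, 68.0]


def percent_change(current: float, previous: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100


today_vs_week_ago = percent_change(57.1, 64.2)        # 18.02. vs 11.02.
yesterday_vs_week_ago = percent_change(57.0, 68.0)    # 17.02. vs 10.02.
average_of_averages = percent_change(
    mean(week_11_to_17_feb), mean(week_04_to_10_feb)
)

print(f"today vs today-7:         {today_vs_week_ago:+.1f}%")
print(f"yesterday vs yesterday-7: {yesterday_vs_week_ago:+.1f}%")
print(f"average of averages:      {average_of_averages:+.1f}%")
```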

@MikeMcC399
Contributor

MikeMcC399 commented Feb 18, 2021

@dsarkar

> For today and today-7 days I also get -11%; for yesterday and yesterday-7 I get -16%.

Agreed! 👍

> Even when averaging the already averaged values (which I think would be wrong), I get a change of -21.1%: the average for 11-17 Feb is 59.8, compared with 75.8 for 4-10 Feb.

From my hazy memory of statistics, averages of averages is not a good thing. I think you should discard those numbers and stick with the first line.

Could you pass the issue on to the originators of the statistics?

I assume that the statistics are calculated by RKI and transferred to the CWA infrastructure. I couldn't find any new documentation in https://github.com/corona-warn-app/cwa-documentation covering the statistics calculations and distribution. It looks to me like there is a binary file pulled from /version/v1/stats on the DOWNLOAD_CDN_URL which suggests that the app just has the job of displaying the data, not calculating it. So if there is an issue with what is displayed then something further upstream needs to be looked at.

@dsarkar
Member

dsarkar commented Feb 18, 2021

@MikeMcC399 indeed, I was told that the app only displays statistical data, it does not calculate it. I created an internal ticket 5225, and additionally, I will bring this up today in a meeting.

@GisoSchroederSAP

GisoSchroederSAP commented Feb 18, 2021

All,
due to a number of questions regarding our statistics, I re-calculated all values for "Neuinfektionen" (new infections), the respective average values and the incidence values, and double-checked the trends, going back to January 25.
Based on the results, let me emphasize the following points:

  1. The CWA just presents the data; the calculation happens on the backend side.
  2. All numbers presented in the CWA can be reproduced in MS Excel; those numbers are all correct.
  3. All arrows and the respective coloring in the app can be explained; they are correct.
  4. Still, the wording referenced above seems to lead to confusion about the meaning of the value, the aggregation of the value, and the translation of the dynamics into the arrow indicator.
  5. The naming of each statistics tile in the CWA is clear, but it will still be interpreted differently by different people.
  6. Yes, there are days when the arrows don't follow each other: one is rising while the other goes down or stays steady. Again, this can be proven and explained statistically.

Therefore, we decided to start a new communication task. It is not yet clear whether it will become a blog post, an FAQ entry or some other kind of medium. We'll try to "translate" the intention of the statistical metrics shown in the CWA and explain the key drivers of the "trend arrow" indicator.

Believe me, this will not be an easy and fast task, as it challenges us to gain trust by "translating" the statistics into consumable portions of knowledge - how to read the tiles. So, I kindly ask you to stay patient.
Furthermore, I want to encourage you to give feedback, once we provide first results in this matter.

@GisoSchroederSAP

One more word to @MikeMcC399 and @nilsalex: I cannot comment on the full issue here. But I want to let you know (and hope you can adjust your viewpoint and accept): The 7-Day Incidence is not a 7-day trend. Instead, the 7-Day Incidence is a normalized value accurate to the current day only, but based on the sum of new infections during the last 7 days. Therefore, this value must not be compared to the incidence value of "day-7" but simply to the incidence value of yesterday (which is, in fact, also based on the new infections of the preceding 7 days).

@MikeMcC399
Contributor

@GisoSchroederSAP

Thank you for the response and information!

It seems that the help text is difficult to interpret correctly concerning what falls under the category of a "7-day trend". Could you help us out so that we understand this better?

For each of the four values which have a trend arrow:

  1. Confirmed New Infections: 7-Day Average
  2. Warnings by App Users: 7-Day Average
  3. 7-Day Incidence
  4. 7-Day R Value

... could you let us know if the arrow (Upwards, Downwards or Steady) is calculated based on comparing to the corresponding number displayed the previous day or the number displayed 7 days previously?

For "7-Day Incidence" you told us in the previous post that the trend depends on the number displayed from the previous day.

@GisoSchroederSAP

GisoSchroederSAP commented Feb 18, 2021

We are going to write that down, I promise.
The naming of "7-Day Incidence" may mislead the reader; it is to be read as
"Today's Incidence (based on the sum of nationwide infections of the last 7 days normalized to 100,000 of all German citizens)". Admittedly, this is much longer than the original name, and maybe not even easier to understand, sorry.

@nilsalex
Author

nilsalex commented Feb 19, 2021

@GisoSchroederSAP Thanks for looking into this!

> The naming of "7-Day Incidence" may mislead the reader; it is to be read as "Today's Incidence (based on the sum of nationwide infections of the last 7 days normalized to 100,000 of all German citizens)"

I don't think there is any confusion about the definition of the 7-day incidence. And because this metric is defined as above, I really don't get how it can follow a different trend than the 7-day average, which is also -- please correct me if I'm wrong -- based on the sum of nationwide infections of the last 7 days. So I guess my question really is:

Is it not the case that both numbers are the same up to a constant relative factor (of about 7*100,000/83,000,000)? If so, a user cannot expect to see different trends for both numbers, right?

@MikeMcC399
Contributor

@nilsalex
The number used for the population of Germany by RKI is close to the 83 Million which you assumed. In https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Daten.html => https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx
Tab "Tageswerte berechnet"
Cell A36
it uses the number 83166711 (which is the number displayed on https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bevoelkerung/Bevoelkerungsstand/Tabellen/zensus-geschlecht-staatsangehoerigkeit-2019.html for the date 31.12.2019).

I would also like to understand the difference in the two trends. I agree that it is not intuitively obvious that they should be different, so I'll be waiting with interest for the details of the calculations. I accept the statement from @GisoSchroederSAP that the calculations are correct, so I expect the differences to be caused by the calculation methods used.

@nilsalex
Author

nilsalex commented Feb 20, 2021

@MikeMcC399 Yes, I agree. Such an effect can be an explanation for the discrepancy. However, it should not be the reason, because the expectation is clear:

I = S/N
I'/I = S'/S

This does not change for values calculated from regional values. Any objections to this basic fact by @GisoSchroederSAP are wrong on the merits.

We cannot dispute proven mathematical facts.

Now, if there are different data sources for both values, we should settle on one of them, unless there is a good reason not to. But what reason would that be?

Edit: Sorry, I did not see your latest comment. So it is the explanation. Thanks for digging into this! I would suggest consolidating the metrics. The current state breaks the expectations of any reasonable user.

@GisoSchroederSAP

Thanks for making this double-check, @MikeMcC399 .
And yes, the wording of the help text was the very first thing I raised internally with the product owner. This is already under review.

@nilsalex
Author

nilsalex commented Feb 20, 2021

I cannot state this enough:

> The Incidence is "bound" to the weighted number of regional new infections (based on population), it is not a rolling average number across the nation.

Is just false. Assuming both metrics refer to the same set, of course---that is, both or none are correct w.r.t. symptom onset.

(To be perfectly clear: Yes, it is the weighted average of local incidences. But incidentally (pun intended), this translates into the nationwide incidence which is the ratio of nationwide totals. By multiplication with national population, you have the nationwide infections over the last 7 days.)
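
The identity described in the parenthesis can be checked numerically. The sketch below uses made-up regions and counts (they are not RKI data) and assumes that the regional and national figures are built from the same case counts and populations.

```python
# Hypothetical regions: (cases in the last 7 days, population). Not RKI data.
regions = {
    "Region A": (1_200, 2_000_000),
    "Region B": (300, 1_000_000),
    "Region C": (4_500, 5_000_000),
}


def incidence(cases: float, population: float) -> float:
    """Cases per 100,000 inhabitants."""
    return cases / population * 100_000


total_cases = sum(cases for cases, _ in regions.values())
total_population = sum(pop for _, pop in regions.values())

# Nationwide incidence as the ratio of nationwide totals (I = S/N).
national = incidence(total_cases, total_population)

# Population-weighted average of the regional incidences.
weighted = sum(
    pop / total_population * incidence(cases, pop)
    for cases, pop in regions.values()
)

# With a fixed population, the ratio of two incidences equals the ratio of
# the underlying 7-day sums (I'/I = S'/S). The earlier sum is hypothetical.
previous_total_cases = 6_600
incidence_ratio = national / incidence(previous_total_cases, total_population)
sum_ratio = total_cases / previous_total_cases

print(national, weighted)
print(incidence_ratio, sum_ratio)
```

The identity only says that both figures move together when they are derived from the same 7-day sums and compared over the same interval; it does not cover figures built on different reporting dates or compared against different reference days.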

The hostility towards me because you disagree with this basic fact has no place here. I am truly disappointed that people are treated this way in this community.

Now, you say "won't fix" because you have a good reason for using different numbers (one corrected, one not corrected, whatever). That is kind of acceptable, although not optimal. But your entire argument and personal attacks did not revolve around this.

@GisoSchroederSAP

GisoSchroederSAP commented Feb 20, 2021

@nilsalex ,

> This does not change for values calculated from regional values. Any objections to this basic fact by @GisoSchroederSAP are wrong on the merits.

Then just explain the differences between all these numbers I'/I and S'/S for any given day. They are calculated directly from a single source (in fact, the source numbers are all in the one table above, not from different sources), and yes, these numbers are quite close together.

image

If you excuse me, I'm going to stop the discussion here.
We have a different view on this, I can live with that and will return to my task.

@MikeMcC399
Contributor

@GisoSchroederSAP

When the facts have been checked with the product owner, we should also consider updating the FAQ https://www.coronawarn.app/en/faq/#further_details including the point about how the data movements of 7-Day Average and 7-Day Incidence are only loosely coupled with an explanation of why this is so.

Probably this has not been obvious before because the RKI daily situation reports do not show a trend for these two indicators. The press tends to use the 7-Day Incidence alone. This may be the first time that the two values have been displayed together closely and with trends. The display is likely to cause confusion to other people even though it is technically correct.

@GisoSchroederSAP

Thanks, @MikeMcC399, I can already state that the FAQ is also under review. We will definitely enhance this communication over time.

@Ein-Tim
Contributor

Ein-Tim commented Feb 20, 2021

I'm curiously reading this, and I really don't understand anything about these numbers, etc, so I won't make any statement here.

But I want to ask:

What should we do now? IIUC, @nilsalex does not consider this solved, but @GisoSchroederSAP does.
Maybe the best way is what has been proposed above by @GisoSchroederSAP:

> You may address your statement of mathematical inconsistency of the data directly to the RKI and to the T-Systems data analysts. I'm happy to help you with finding the right contacts, if you wish

Would that be a good solution for all parties involved here?

@GisoSchroederSAP

I never accused you of hostility or insults. I never used the expressions mentioned above.
I only explained what I think is right and what I think is wrong with your argumentation. Please excuse me if this felt threatening to you; that was definitely not my intention.

Again: I offer support, getting you contacts at the source of the data and calculations. You may discuss and resolve this there.

Good evening.

@nilsalex
Author

> @nilsalex ,
>
> > This does not change for values calculated from regional values. Any objections to this basic fact by @GisoSchroederSAP are wrong on the merits.
>
> Then just explain the differences between all these numbers I'/I and S'/S for any given day. They are calculated directly from a single source (in fact, the source numbers are all in the one table above, not from different sources), and yes, these numbers are quite close together.
>
> image
>
> If you excuse me, I'm going to stop the discussion here.
> We have a different view on this, I can live with that and will return to my task.

Well, in fact:
table

@nilsalex
Author

nilsalex commented Feb 20, 2021

@Ein-Tim
If all calculations are correct and the discrepancy is just due to different underlying numbers, there are two options:

  1. Decide that this is for a good reason (I'd be curious as to what this reason would be) and communicate this clearly within the app.
  2. Fix this. Use the source that is better by some metric.

If there are errors in calculations (I mean, well, the excel screenshot above clearly contains rounding errors, as pointed out in my previous comment, but I trust that this is unrelated to the actual production calculation), fix them.

So, discussing 1) or 2) may warrant getting RKI or similar involved for your discussions. For me, I don't see the need to discuss anything, as 1) or 2) really is your decision.

The expectation any reasonable user has, namely

I = S/N
I'/I = S'/S

for comparable datasets, is correct. That is a fact for which I don't see the need for further clarification.

Again, you may break with this expectation for a good reason (that is, consider this as solved). But I would be very curious about this reason.

@Ein-Tim
Contributor

Ein-Tim commented Feb 20, 2021

Just to make this clear, I'm neither a Developer/Community Manager nor related to the RKI/SAP/T-Systems in any way.
I'm just a user/community member and want everybody here to be happy in the end.

The Corona-Warn-App is showing the official numbers published by the RKI, so if there is any problem regarding these numbers (or the trend indicators), I would speak to the RKI.

So IMHO the best option for you would be to talk to the experts, as offered by @GisoSchroederSAP.

@nilsalex
Author

> Just to make this clear, I'm neither a Developer/Community Manager nor related to the RKI/SAP/T-Systems in any way.
> I'm just a user/community member and want everybody here to be happy in the end.

Oops, sorry :-)

> The Corona-Warn-App is showing the official numbers published by the RKI, so if there is any problem regarding these numbers (or the trend indicators), I would speak to the RKI.
>
> So IMHO the best option for you would be to talk to the experts, as offered by @GisoSchroederSAP.

Well, I don't need to do that. It may be necessary for the decision the developers have to make.

@nilsalex
Author

nilsalex commented Feb 20, 2021

Oh, one more thing: Does anyone have the population data for federal states used by the RKI and by the App? I would very much like to know them. Is it verified that those numbers match? This may in fact be a proposal I would bring towards the RKI: Include the population data in the daily numbers or at least document the data at a prominent place.

Also, is the code where the calculations are performed publicly available? I am not able to find it.

@Ein-Tim
Contributor

Ein-Tim commented Feb 20, 2021

@nilsalex

> Oops, sorry :-)

No need to apologize, I should have made this clearer 🙂

> Well, I don't need to do that. It may be necessary for the decision the developers have to make.

Okay, since @GisoSchroederSAP is one of the developers (at least he is part of the CWA development team), the decision seems to have been made already...

@GisoSchroederSAP

GisoSchroederSAP commented Feb 20, 2021

Sorry, I have not been a developer for decades. I am just working for the Community and with the Community, trying to answer questions, to follow up on issues, provide additional input, and to translate proposals into development requests.

As mentioned earlier, I already involved other data analysts and product management in this issue. Besides the fact that the CWA just presents the values coming from the servers, I tried to explain the calculation method here. As we disagree, @nilsalex, I invite you once more to convince the experts at the source of the data.

So far, I don't see a calculation issue/bug here. However, multiple times I agreed:

  • Yes, we will review your concerns deeper.
  • Yes, we are going to provide additional communication/documentation on how to read the statistics, the numbers and trends, just to avoid misinterpretations.
  • If possible, we will even change the help text within the app (which will be the hardest task, as this needs to go through translation and three approval layers. Anyway, we try).

So, if you want to question the trend indicators, feel free to ping me and I will try to connect you to the experts.
Cheers, Giso

@Ein-Tim
Contributor

Ein-Tim commented Feb 20, 2021

@GisoSchroederSAP

> Sorry, I have not been a developer for decades. I am just working for the Community and with the Community, trying to answer questions, to follow up on issues, provide additional input, and to translate proposals into development requests.

Thank you so much for this information, I did not know this 🙂

Everybody, have a good night.

@nilsalex
Author

nilsalex commented Feb 21, 2021

> As mentioned earlier, I already involved other data analysts and product management in this issue. Besides the fact that the CWA just presents the values coming from the servers, I tried to explain the calculation method here.

Oh, that is an important clarification. Of course, the mobile app does not perform any calculations.

What I understand is:
The distribution service seems to parse a JSON. The properties relevant for this discussion are

  @JsonProperty("infections_effective_7days_avg")
  private Double infectionsReported7daysAvg;
  @JsonProperty("infections_effective_7days_avg_growthrate")
  private Double infectionsReported7daysGrowthrate;
  @JsonProperty("infections_effective_7days_avg_trend_5percent")
  private Integer infectionsReported7daysTrend5percent;

  @JsonProperty("seven_day_incidence_1st_reported_daily")
  private Double sevenDayIncidence;
  @JsonProperty("seven_day_incidence_1st_reported_growthrate")
  private Double sevenDayIncidenceGrowthrate;
  @JsonProperty("seven_day_incidence_1st_reported_trend_1percent")
  private Integer sevenDayIncidenceTrend1percent;

Now, I was under the assumption that the backend performs some calculations to provide these values, because @GisoSchroederSAP talked at great length about the bottom-up calculation, etc.

My question: What is the exact source for each of these values? Does the CWA backend perform any calculations itself?

I would be grateful to anyone who can answer this.

@GisoSchroederSAP

I already mentioned in an earlier statement here, with a summary similar to the last one above, that I could reproduce all the numbers and trends from the publicly available data sources that we discussed earlier.

But to detach the discussion from my personal view, I just transferred your request to the product owner and to one of the T-Systems data analysts, @nilsalex. Let's see what comes out of that. Maybe they will forward this to the RKI directly. As soon as I get a response, I'll share it here.

All, enjoy the weekend.

@MikeMcC399
Contributor

MikeMcC399 commented Feb 22, 2021

Checking the values and the trends today, they are consistent with what we already found out.

Statistics 2021 02 22

Using the historical data from https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx the 7-Day Average of 7,420 can be confirmed. The value of the 7-Day Average 7 days before that on reporting day Feb 15, 2021 was 7,206 (50,442 / 7) - that is adding the values from Feb 9 to Feb 15, 2021 "Differenz Vortag Fälle" in "Fälle-Todesfälle-gesamt". So the 7-Day Average has gone up by 214 cases, or 3.0% of 7,206. The trend of 3% is less than the 5% hurdle, so it is categorized as a Steady trend.

From the same Excel file the value of the 7-Day Incidence 60.2 from yesterday Feb 21, 2021 can be extracted. Today's value of 61.0 is an increase of 0.7 or 1.2% of yesterday's value of 60.2. The trend hurdle for comparisons with the previous day is 1%, so this trend of 1.2% is classed as Upwards.

So the data and the display in the app agree with the base data from the Excel sheet published by RKI. 👍
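
The two classifications described above can be written down as a short sketch. The helper names are hypothetical and the symmetric bands (5% against the value 7 days earlier, 1% against the previous day) are taken from this comment, not from the backend code. The inputs are the rounded values quoted above, so the incidence change comes out at roughly 1.3% instead of 1.2%; the difference presumably stems from rounding and does not affect the result.

```python
def percent_change(current: float, previous: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100


def classify(change_percent: float, band_percent: float) -> str:
    """Return 'up', 'down' or 'steady' for a symmetric band around zero."""
    if change_percent > band_percent:
        return "up"
    if change_percent < -band_percent:
        return "down"
    return "steady"


# 7-Day Average: Feb 22 value vs the value 7 days earlier, 5% band.
average_change = percent_change(7_420, 7_206)
average_trend = classify(average_change, band_percent=5.0)

# 7-Day Incidence: Feb 22 value vs the previous day's value, 1% band.
incidence_change = percent_change(61.0, 60.2)
incidence_trend = classify(incidence_change, band_percent=1.0)

print(f"7-Day Average:   {average_change:+.1f}% -> {average_trend}")
print(f"7-Day Incidence: {incidence_change:+.1f}% -> {incidence_trend}")
```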

Edit: Sorry about the decimal point and thousands separator in the screenshot. I had the locale on the device set to English (Germany) which produces strange results. I updated the text above to use comma as thousands separator and dot as decimal point, which is the usual way for English texts.

@MikeMcC399
Contributor

@nilsalex

> My question: What is the exact source for each of these values? Does the CWA backend perform any calculations itself?

I asked and received an answer in corona-warn-app/cwa-server#1223 (comment)

"the 'cwa-server' doesn't collect any nor calculates any statistics, but it reads in a json file coming from CWA-Analytics framework and transforms it into protobuf structure, which is then consumed by the mobile clients, when you open your app.

Unfortunately I don't have all the details where the CWA-Analytics framework gets its information from. But for sure its using the RKI as one of the data-sources."

@MikeMcC399
Contributor

MikeMcC399 commented Feb 26, 2021

To summarize the findings:

  1. It is correct that the trends for 7-Day Average and 7-Day Incidence can be different.
  2. The information text ℹ️ is misleading regarding the 7-Day Incidence trend, which is calculated based on a comparison to the previous day's value, not the value 7 days prior.
  3. The data-set for the 7-Day Average trend is based on two sets of adjacent 7-Day periods, using the date reported to RKI, a ±5% Steady trend band and a total of 14 days of data.
  4. The data-set for the 7-Day Incidence trend is based on two sets of overlapping 7-Day periods, one day apart, using the date reported to the local Gesundheitsamt, a ±1% Steady trend band and a total of 8 days of data.
  5. Reporting chain delays mean that RKI dates and Gesundheitsamt dates can differ.
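
Points 3 and 4 can be illustrated with a small sketch. The daily case counts are invented for illustration, and the same series is used for both indicators, so the difference in reporting dates (point 5) is deliberately not modelled. The function name is hypothetical.

```python
# Hypothetical daily case counts for 14 consecutive days, oldest first.
daily_cases = [9000, 9500, 10000, 9800, 9200, 6000, 5000,
               7500, 8000, 8200, 8100, 7900, 5500, 5200]


def classify(current: float, previous: float, band_percent: float) -> str:
    """Return 'up', 'down' or 'steady' for a symmetric band around zero."""
    change = (current - previous) / previous * 100
    if change > band_percent:
        return "up"
    if change < -band_percent:
        return "down"
    return "steady"


# 7-Day Average trend: two adjacent 7-day windows (14 days of data), 5% band.
average_trend = classify(
    sum(daily_cases[-7:]) / 7, sum(daily_cases[-14:-7]) / 7, band_percent=5.0
)

# 7-Day Incidence trend: two overlapping 7-day windows one day apart
# (8 days of data), 1% band. The population factor cancels in the ratio,
# so the 7-day sums are compared directly.
incidence_trend = classify(
    sum(daily_cases[-7:]), sum(daily_cases[-8:-1]), band_percent=1.0
)

print(f"7-Day Average trend:   {average_trend}")
print(f"7-Day Incidence trend: {incidence_trend}")
```

Even with identical input data, the different window layout and band width are enough for the two arrows to disagree on a given day.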

There is a more detailed write-up in corona-warn-app/cwa-website#904 which is open for review.

I hope that the information text regarding Trend will be acknowledged as a documentation bug and addressed through Internal Tracking ID: EXPOSUREAPP-5225. This is the "Key Figures, Explanation of Statistics" text which is shown by tapping on the ℹ️ icon in any of the statistics tiles in the app. More specifically the string statistics_explanation_trend_text:

"Trend"

"The arrow direction indicates whether the trend is increasing, decreasing, or remaining steady – that is, demonstrates a deviation of less than 1% compared to the previous day or 5% compared to the previous week. The color indicates this trend as positive (green), negative (red), or neutral (gray). The trend compares the value from the previous day with the value from two days ago or, for the 7-day trends, the average value from the last 7 days with the average value from the 7 days prior to that."

@MikeMcC399
Contributor

@nilsalex

Could we close this issue now?

The trend for Confirmed New Infections is calculated based on a comparison to the value of the 7-Day Average one week previously whereas the trend for the 7-Day Incidence is calculated using the value one day previously. So that difference on its own is enough reason that the trends will not necessarily be the same on any one day.

In your original post, you wrote under Expected Behaviour "Same trend for both indicators." Through the research we did, we now know that the trends are not expected to be the same, for all the reasons I gave in #528 (comment).

I made a suggestion in the open issue #550 about changing the help text to explain better. Also there is a note in #535 (comment) that the FAQs will be updated.

@nilsalex
Author

> @nilsalex
>
> Could we close this issue now?

Sure. It is certainly not a bug because the behaviour is intended, as you explained.

Let me, however, just note: as a user, I do not expect this behaviour, as I have laid out in great detail, and it is odd to tell users what to expect :-) The question should really be: How does the user benefit from seeing different numbers and trends?

But this is more an issue for the RKI as data source and the stakeholders as the ones who decide what information to present in the widgets. People have pointed out this inconsistency elsewhere (CWA is of course not the only medium where the data is published) but apparently it has been decided not to act on this.

@GisoSchroederSAP

GisoSchroederSAP commented Mar 16, 2021

Hi @nilsalex, you are free to call it an "inconsistency". That is your opinion; I still don't agree.
Instead, I call them "different" metrics (though indirectly related), for which the trend indicators can differ on a given date.
Just saying.

I wanted to make this clear to avoid the impression that we agree with your point of view. I hope you understand and accept our standpoint as well.

@MikeMcC399
Contributor

@nilsalex

> > Could we close this issue now?
>
> Sure. It is certainly not a bug because the behaviour is intended, as you explained.

Thank you very much for raising this issue. I learned a lot trying to understand it myself!

You should see a button at the bottom so you can close it yourself. I'm not a moderator, just a Contributor so I can't close it for you.

5 participants