This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Death count by reporting, instead of death date, for Israel #5339

Closed
yuval-harpaz opened this issue Feb 4, 2022 · 5 comments

Comments

@yuval-harpaz

It seems that JH is showing deaths by reporting date (blue), and not by death date (orange).
There were changes in the dashboard's backend access to the data, but deaths (by death date) are now reported in a new JSON.
[image: deaths by reporting date (blue) compared with deaths by death date (orange)]
The source above contains other hospital-related measures (I will open a separate issue for other fields), but not cases and tests.
Alternative sources:
@erasta's table draws data from the same source, but runs some checks to avoid duplicate rows and also includes case and test numbers. This table is updated every 30 minutes by GitHub Actions.
@dancarmoz also maintains a table with the most up-to-date data, here.

Since this is the data OWID visualizes, the issue is making waves in Israel, see here, and here.

@CSSEGISandData
Owner

Hello,

Thank you for your message. The data in our repository is by date of report rather than date of event. It appears that the Israeli COVID-19 dashboard published inflated total death counts over the past two days, which have now been resolved (total deaths dropped from 9238 to 9080 at 5 AM EST this morning). We're working to correct the data errors published over the past two days as quickly as we can.

@dancarmoz

The Ministry of Health did not report 9238 at any stage (and thus there was no such drop). Rather, in the updates on Feb. 2nd and 3rd, some of the data contained duplicates of seven days (Jan. 2-7 & 9, 2021). The daily death counts were affected by this duplication, and the total death count for those seven days was 225 deaths. The reported death count on the last update on Feb. 3rd was indeed 9013, exactly 225 less than 9238: to get 9238 one would have to add up all the daily values (including duplication), but this is not the way the dashboard reported it.

The reported totals can be found in the JSON linked by Yuval above, in the "countDeathCum" field for the last day (currently 9111; it was 9080 this morning and 9013 yesterday evening), as well as in a more specific JSON in the "total" field. Using these values directly should be more robust than summing the daily values, as temporarily duplicated dates do occur in the data from time to time (in the past it was not uncommon for the current day, in particular, to be duplicated).

Both jsons also contain the daily information (by date of event), under the "countDeath" and "amount" fields, respectively.
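To illustrate the difference between the two ways of obtaining a total, here is a minimal Python sketch. The field names "countDeath" and "countDeathCum" are the ones mentioned above; the "date" key, the helper function names, and all numbers are assumptions made up for the example and are not Ministry of Health data.

```python
# Illustrative records only: the values are made up, not real dashboard data.
# "date" is an assumed key name; "countDeath" and "countDeathCum" are the
# field names mentioned in this thread.
records = [
    {"date": "2022-01-01", "countDeath": 3, "countDeathCum": 3},
    {"date": "2022-01-02", "countDeath": 5, "countDeathCum": 8},
    {"date": "2022-01-02", "countDeath": 5, "countDeathCum": 8},  # duplicated day
    {"date": "2022-01-03", "countDeath": 2, "countDeathCum": 10},
]


def total_by_summation(rows):
    """Add up every daily value; a duplicated row is counted twice."""
    return sum(row["countDeath"] for row in rows)


def total_from_latest_cumulative(rows):
    """Read the cumulative field of the row with the latest date.

    ISO-formatted date strings sort chronologically, so max() on the
    string is enough here.
    """
    latest = max(rows, key=lambda row: row["date"])
    return latest["countDeathCum"]


summed = total_by_summation(records)
reported = total_from_latest_cumulative(records)
print(summed, reported, summed - reported)
```

In this toy data the summed total exceeds the cumulative field by exactly the daily count of the duplicated day, which mirrors the 225-death gap described above.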

@CSSEGISandData
Owner

CSSEGISandData commented Feb 4, 2022

Thank you. We can confirm the totals we had reported were accessed using the data published on the backend of the Israeli COVID-19 dashboard. We have back corrected February 2 and 3 in #5341 using the data linked on the repository managed by @erasta.

The 'countDeathCum' field in the original linked JSON is cumulative by date of event and not date of report, so it is not a suitable metric for a back distribution in this repository (i.e., the value for Feb 3 is 9095, which is higher than the 9013 reported). We will, however, investigate using it as validation for the other metrics accessed from the dashboard.

@erasta

erasta commented Feb 4, 2022

Thank you. You are welcome to use all the data accumulated on my repo.
The data is collected automatically every 30 minutes, where the latest is saved in this CSV and is also saved as indexed by date in the history folder to allow traversing history.
To avoid bugs, this data is not altered and is saved in its raw form as collected, besides converting JSON to CSV format.

Beyond that, @yuval-harpaz and I worked on automatically fusing the raw data from multiple tables, filtering duplicate rows and completing missing data. Here is the latest fused table and also one of its history entries, for comparison.
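As a rough idea of what filtering duplicate rows can look like, here is a small Python sketch. It is not the code used in the repository mentioned above: the "date" key, the function name, and the sample rows are all hypothetical, and keeping the last row seen for each date is just one possible policy.

```python
def drop_duplicate_dates(rows, key="date"):
    """Keep one row per date (the last one seen), returned in date order."""
    by_date = {}
    for row in rows:
        by_date[row[key]] = row  # a later row for the same date replaces the earlier one
    return [by_date[d] for d in sorted(by_date)]


# Made-up sample rows with one duplicated date.
raw_rows = [
    {"date": "2022-01-01", "countDeath": 3},
    {"date": "2022-01-02", "countDeath": 5},
    {"date": "2022-01-02", "countDeath": 5},  # duplicated day
    {"date": "2022-01-03", "countDeath": 2},
]

clean_rows = drop_duplicate_dates(raw_rows)
clean_total = sum(row["countDeath"] for row in clean_rows)
print(len(clean_rows), clean_total)
```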

@dancarmoz

I just want to clarify that my suggestion was not to retroactively use 'countDeathCum' for each date, but to use its latest/current entry as the reported deaths for each day instead of the sum over all days (which I assume is how it is currently done, since that is the only method I see to get 9238 on Feb. 3). It should be equivalent to the current method as long as there are no duplicate days, but more correct in cases where dates are duplicated in the data, as happened on Feb. 2 and 3.

As you can see for example here and here, this value had been 9013 for Feb. 3 on the last (and only) update of Feb. 3. It did later change to 9095 (and likely would change again in the future), but that shouldn't be an issue for you if you only look at the latest/current daily value.
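A sketch of how a by-report-date series could be built from this suggestion, in Python: store the latest cumulative value seen on each report date, then take differences between consecutive report dates. The snapshot dictionary, the function name, and the numbers are hypothetical and only illustrate the idea; they are not real dashboard values.

```python
# Hypothetical snapshots: for each report date, the latest cumulative total
# read on the last update of that day. Values are illustrative only.
latest_total_by_report_date = {
    "2022-02-01": 100,
    "2022-02-02": 104,
    "2022-02-03": 109,
}


def daily_reported_deaths(totals):
    """Difference between consecutive snapshots, keyed by the later report date."""
    dates = sorted(totals)  # ISO date strings sort chronologically
    return {cur: totals[cur] - totals[prev] for prev, cur in zip(dates, dates[1:])}


new_by_report_date = daily_reported_deaths(latest_total_by_report_date)
print(new_by_report_date)
```

Because each snapshot is taken as reported on its own day, a later revision of an older cumulative value does not change the numbers already recorded.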
