The number of confirmed cases in Spain on 2020-04-24 is a negative number #52

Closed
savethecathode opened this issue May 21, 2020 · 1 comment

savethecathode commented May 21, 2020

In the CRAN version of the coronavirus package the number of confirmed cases in Spain on 2020-04-24 is -10034, which seems like a mistake.

To visualize the number of confirmed cases in Spain, I used the following:
library(coronavirus); library(dplyr); library(ggplot2)
coronavirus %>% filter(country == "Spain" & type == "confirmed") %>% ggplot(aes(date, cases)) + geom_line()

To identify the specific data point, I used the following:
coronavirus[which.min(coronavirus$cases), ]

@RamiKrispin
Owner

Hi @savethecathode

Because I compute the daily values as the diff of the cumulative values in the raw data, negative values will occur whenever a correction to the data (e.g., removing false positives, misclassifications, errors, etc.) is not applied retroactively (i.e., removed from the day on which it was originally added). This is the usual situation, as in most cases the data is anonymized. When that happens, you will see a drop in the cumulative values in the raw data.

In the case of Spain, you can see this in the cumulative values of the confirmed cases in the raw data:

  • April 23 - 213024
  • April 24 - 202990
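
As a minimal sketch of the arithmetic behind that value (the variable names are illustrative and this is not the package's actual processing code), taking the first difference of the two cumulative counts above reproduces the reported number:

```r
# Cumulative confirmed cases for Spain, as quoted above from the raw data
cumulative <- c("2020-04-23" = 213024, "2020-04-24" = 202990)

# Daily cases are the first difference of the cumulative series,
# so a downward revision of the cumulative count yields a negative day
daily <- diff(cumulative)
daily
```

The single resulting value, 202990 - 213024 = -10034, matches the figure reported for 2020-04-24.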

This issue in the data is related to the use of different data sources, and Johns Hopkins is trying to fix it. More information is available in the following issue
