USA cases decreased by ~90,000? #2093

Open
louchios opened this issue Apr 9, 2020 · 21 comments

@louchios louchios commented Apr 9, 2020

Operations Dashboard as well as feature layers display USA cases as 363,000. It displayed well over 400,000 earlier today. Thank you.

Worldometer has 455,000.

USA

@ahnick ahnick commented Apr 9, 2020

Came here to say the same thing. What's going on?

@yetzt yetzt commented Apr 9, 2020

they got wrong data for NYC

Screenshot

@yetzt yetzt commented Apr 9, 2020

i mean, how difficult to implement is a sanity check? "don't automatically update the data if the number goes down" is not that difficult of a rule. sorry to say so, but this jhu csse is not operating with the care required for their level of responsibility and even their very low standards are degrading lately.
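The rule quoted above could look roughly like the following. This is only a minimal sketch, not how the JHU CSSE pipeline works; the function name `accept_update` and the idea of holding back a suspicious value for manual review are assumptions made for illustration.

```python
def accept_update(previous_total, new_total):
    """Return True if a cumulative count may be published automatically.

    Hypothetical guard: a cumulative total should never decrease, so a
    lower value is held back for manual review instead of being published.
    """
    if previous_total is None:
        # No earlier value to compare against, so there is nothing to check.
        return True
    return new_total >= previous_total


# Rough figures from this thread: the dashboard showed over 400,000,
# then dropped to 363,000.
print(accept_update(400_000, 363_000))  # False: the count went down
print(accept_update(363_000, 400_000))  # True
```

A legitimate downward correction would also be held back by this rule, which is why the sketch only blocks the automatic update rather than rejecting the value outright.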

@yetzt yetzt commented Apr 9, 2020

Screenshot

NYC is now either 160083 or 78221. but who cares about little discrepancies.

@asleeis asleeis commented Apr 9, 2020

It seems to be a global issue on the data. 4/8/2020 daily data file (at a minimum) seems to be seriously off in all regards.

@feiwsteven feiwsteven commented Apr 9, 2020

New York's cumulative death number is wrong
4/4/20 4290
4/5/20 4951
4/6/20 4698
4/7/20 5489
4/8/20 6268

death count on 4/6 is smaller than the one on 4/5.

@asleeis asleeis commented Apr 9, 2020

Just FYI. I did a fresh pull, and the numbers for 4/8 now seem much more reasonable. I'm not sure when the data I had was from, but I first noticed the big decreases people were reporting sometime yesterday evening. Now it is looking better.

@feiwsteven feiwsteven commented Apr 9, 2020

New York's cumulative death number is still not right.
4/4/20 4290
4/5/20 4951
4/6/20 4698
4/7/20 5489
4/8/20 6268

death count on 4/6 is smaller than the one on 4/5.
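The dip in the series above can be found mechanically. The sketch below uses only the five values posted in this comment; the helper name `find_decreases` is made up for illustration.

```python
# Cumulative New York deaths as posted above (date, count).
cumulative_deaths = [
    ("4/4/20", 4290),
    ("4/5/20", 4951),
    ("4/6/20", 4698),
    ("4/7/20", 5489),
    ("4/8/20", 6268),
]


def find_decreases(series):
    """Return (date, previous, current) for every day the total drops."""
    drops = []
    # Pair each entry with the one that follows it.
    for (_, prev), (date, curr) in zip(series, series[1:]):
        if curr < prev:
            drops.append((date, prev, curr))
    return drops


for date, prev, curr in find_decreases(cumulative_deaths):
    print(f"{date}: {prev} -> {curr} (change {curr - prev})")
# prints: 4/6/20: 4951 -> 4698 (change -253)
```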

@tchelle tchelle commented Apr 9, 2020

@yetzt They can really use your help. please volunteer

@yetzt yetzt commented Apr 10, 2020

@tchelle they could but they don't want it, so i stick to pointing out where their data is wrong. see the issues i've raised in this repository in the past.

@tchelle tchelle commented Apr 11, 2020

@yetzt they definitely need some experienced people guiding them. this data is so dirty there is no way people can track daily trends with any level of confidence. I offered and haven't heard back yet. I see your point... very frustrating

@tchelle tchelle commented Apr 11, 2020

several major networks including CNN are reporting off this data. facepalm

@jjbenes jjbenes commented Apr 11, 2020

@tchelle @yetzt Can also use data from USAFacts aside from JHU. (USAFacts slices NYC data differently.) See these two Jupyter notebooks: https://nbviewer.jupyter.org/github/jjbenes/covid19/tree/master/jupyter/. I showed cumulative and daily-new data at the country, state, and county levels in the U.S. You can run sanity checks with these notebooks, and maybe correct the data based on multiple sources.
The code at https://github.com/jjbenes/covid19 generated these notebooks, which differ only in one data import line. You just need to add front-end code to deal with the slightly different CSV formats; the back-end code stays the same. Maybe this code, coupled with other U.S. data sources (USAFacts, NYT, your own elbow grease, and so on), would be useful for getting the data quality you want.
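As a rough illustration of a cross-source sanity check, the sketch below compares counts from two sources and flags any that disagree by more than a tolerance. It is not taken from the notebooks above; the function names and the 5% tolerance are assumptions, and the two US totals are the approximate figures quoted at the top of this thread.

```python
def relative_gap(a, b):
    """Difference between two counts as a fraction of the larger one."""
    larger = max(a, b)
    if larger == 0:
        return 0.0
    return abs(a - b) / larger


def flag_disagreements(source_a, source_b, tolerance=0.05):
    """Return {key: gap} for keys in both sources that differ by more than tolerance."""
    flagged = {}
    # Only compare regions that both sources report.
    for key in source_a.keys() & source_b.keys():
        gap = relative_gap(source_a[key], source_b[key])
        if gap > tolerance:
            flagged[key] = gap
    return flagged


# Approximate US totals quoted at the top of this thread.
dashboard = {"US": 363_000}
worldometer = {"US": 455_000}

for region, gap in flag_disagreements(dashboard, worldometer).items():
    print(f"{region}: sources differ by {gap:.0%}")
```

A flagged region does not say which source is wrong, only that the two should be reconciled before either is trusted.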

@jjbenes jjbenes commented Apr 11, 2020

Back to the title of this issue, truly cumulative data must be hard to come by.
Screen Shot 2020-04-11 at 4 55 44 AM
Screen Shot 2020-04-11 at 4 54 18 AM

@HashemDeveloper HashemDeveloper commented Apr 11, 2020

Yup, their data is completely wrong! The current COVID-19 status is 159,937 total cases, 13,000 recovered, and 7,067 deaths (source: WHO). Their API says 172348 confirmed cases, 7867 deaths, and 0 recovered. Recovered is always 0 for the entire US as far as I know...
Screenshot_20200407-212830

@tchelle tchelle commented Apr 11, 2020

@jjbenes Thank you! much appreciated. I'll take a look.

@HashemDeveloper HashemDeveloper commented Apr 11, 2020

@jjbenes thank you 😊

@jjbenes jjbenes commented Apr 12, 2020

Been working on per capita data and noticed that JHU's New York State population is 4174504 more than what I got from USAFacts and the U.S. Census Bureau.
Screen Shot 2020-04-11 at 6 32 09 PM

Both USAFacts (https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_county_population_usafacts.csv) and the U.S. Census Bureau have 19,453,561 as the NY State population (https://www.census.gov/quickfacts/NY), not a surprise because USAFacts says they use the 2019 Census estimates.

"New York" can be a city, a county, or a state. (The city is a collection of boroughs and spans multiple counties. Ran into a similar counting problem before with Kansas City, MO, when I was working with the NYT CSV files.) Both JHU and USAFacts/Census Bureau use FIPS in the CSV files. The FIPS code 36061 is the code for New York County (Manhattan). That happens to be the only NY county line item in the JHU population file that differs from USAFacts. 4.2 million is not a small difference. For U.S. population, I suggest we use the data directly from the U.S. Census Bureau or USAFacts. Too much hassle to clean up the JHU data.
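To see why a 4.2 million difference matters for per capita figures, here is a small worked calculation. It uses only the two numbers given above (the 19,453,561 state population and the 4174504 excess); the variable names are made up for illustration.

```python
CENSUS_NY_STATE = 19_453_561  # NY State population cited above
REPORTED_EXCESS = 4_174_504   # how much larger the JHU figure was reported to be

# State population implied by the JHU file.
implied_jhu_total = CENSUS_NY_STATE + REPORTED_EXCESS

# Any per capita rate computed with the larger denominator is
# understated by this fraction relative to the Census-based rate.
understatement = 1 - CENSUS_NY_STATE / implied_jhu_total

print(implied_jhu_total)          # 23628065
print(f"{understatement:.1%}")    # 17.7%
```

So a statewide per capita rate computed with the inflated population would come out roughly 18% too low, whatever the case count.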

@HashemDeveloper HashemDeveloper commented Apr 12, 2020

@jjbenes Sorry to ask, but does that mean we should not use the JHU API? Or are you planning to update the JHU API with data from the U.S. Census Bureau or USAFacts?

@jjbenes jjbenes commented Apr 12, 2020

@HashemDeveloper The idea behind the Python API (currently one for JHU and one for USAFacts) is to keep data sources separate. If we start hacking the JHU class to make it grab population data from the USAFacts class behind your back, we'll end up with a spaghetti data structure. There are three viable options.

  1. Switch to USAFacts. I just did that with the per capita map here and everything is now self-consistent. I used to see a hole in Queens and Bronx Counties in New York because not all the counties were covered in the JHU data. Now the holes are gone.

  2. Wait until issue 1970 is fixed. This is really a data problem and shouldn't be fixed in the API.

  3. If you need good U.S. population data but you still want to use the JHU U.S. confirmed and deaths files, explicitly call get_us_population() in the USAFacts module. We intend to keep using FIPS as the unique ID for the U.S. population file, so your population look-up code will be immune to other changes.

The nice thing about the JHU database is that it's global. But if you're only interested in U.S. data, Option 1 should work well. Option 3 is a viable short-term solution until the data bug is fixed.
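The idea behind Option 3, joining counts from one source to population from another by FIPS, can be sketched as follows. This does not call the actual modules, since the return type of get_us_population() is not shown here; `cases_per_100k` is a hypothetical helper and the numbers are placeholders, not real data.

```python
def cases_per_100k(confirmed_by_fips, population_by_fips):
    """Join counts to population on FIPS and return cases per 100,000.

    FIPS codes with no population entry (or a zero population) are
    skipped rather than guessed.
    """
    rates = {}
    for fips, confirmed in confirmed_by_fips.items():
        population = population_by_fips.get(fips)
        if not population:
            continue
        rates[fips] = confirmed * 100_000 / population
    return rates


# Placeholder values for illustration only, not real data.
confirmed = {"36061": 2_500, "99999": 10}   # e.g. counts from one source
population = {"36061": 1_000_000}           # e.g. population from another

print(cases_per_100k(confirmed, population))  # {'36061': 250.0}
```

Keeping FIPS as a string preserves leading zeros for codes that have them, which keeps the look-up key stable across CSV formats.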
Screen Shot 2020-04-11 at 10 04 47 PM
Screen Shot 2020-04-11 at 10 07 32 PM

@HashemDeveloper HashemDeveloper commented Apr 12, 2020

@jjbenes thank you for the detailed information! I am going to wait until the issue 1979 is fixed. Meanwhile, option 3 seems to be a good choice for me.
