Number of US states are missing deaths/tested #546
Comments
And In the
but in "2020-3-26": {
"cases": 37258,
"growthFactor": 1.209243452013891
} Same for Texas |
They're also reported with |
@lazd sorry to poke you, but I think this is pretty severe and I'd like to make sure it doesn't escape your attention before the next update. Can you mark it with appropriate labels? |
No worries @zbraniecki. As part of covidatlas/coronadatascraper#410, I will try to get |
COVIDTracking has deaths for those states:
Will that also fix it? |
And covidatlas/coronadatascraper#410 seems to be about combining data from two sources. My suspicion is that this bug is about two sources for two different things (county vs. state) ending up conflated as two sources of the same thing and the
Here's what's in report.json for "NY, USA":
"NY, USA": [
{
"country": "USA",
"url": "https://covidtracking.com/api/states",
"type": "json",
"curators": [
{
"name": "The COVID Tracking Project",
"url": "https://covidtracking.com/",
"twitter": "@COVID19Tracking",
"github": "COVID19Tracking"
}
],
"aggregate": "state",
"priority": -0.5,
"timeseries": false,
"headless": false,
"certValidation": true,
"state": "NY",
"deaths": 385,
"tested": 122104,
"cases": 37258,
"ssl": true,
"rating": 0.49019607843137253
},
{
"state": "NY",
"country": "USA",
"type": "table",
"aggregate": "county",
"timeseries": false,
"headless": false,
"certValidation": true,
"priority": 0,
"url": "https://coronavirus.health.ny.gov/county-county-breakdown-positive-cases",
"cases": 37258,
"ssl": true,
"rating": 0.3137254901960784
}
],
I think this is a different issue from covidatlas/coronadatascraper#410. Mainly, those counties should not be conflated with states, and "NY, USA" should be a state and collect state data. |
No, those counties cases are rolled up into state totals, which is exactly what we want to do. However, testing numbers aren't being reported on a per-county basis, so they're not getting rolled up. So what we want to do is take our rolled up case numbers and take COVIDTracking's testing numbers, which is what covidatlas/coronadatascraper#410 is about. |
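A minimal sketch of the rollup described in the comment above, not the scraper's actual code. The function name `rollup_state`, the record shapes, and the two county rows are hypothetical placeholders; only the state-level figures (385 deaths, 122104 tested, 37258 cases) come from the report.json excerpt quoted earlier in the thread.

```python
def rollup_state(county_records, state_record, fields=("cases", "deaths", "tested")):
    """Sum per-county values for each field; if no county reports a field,
    fall back to the value from the state-level source."""
    result = {}
    for field in fields:
        county_values = [r[field] for r in county_records if r.get(field) is not None]
        if county_values:
            # At least one county reports this field, so roll the counties up.
            result[field] = sum(county_values)
        else:
            # No county-level data: take the state-level number if it exists.
            result[field] = state_record.get(field)
    return result


# Placeholder county rows (made up for illustration); NY counties report cases only.
counties = [
    {"county": "County A", "cases": 30000},
    {"county": "County B", "cases": 7258},
]
# State-level record with the figures quoted from report.json.
state = {"state": "NY", "cases": 37258, "deaths": 385, "tested": 122104}

rolled = rollup_state(counties, state)
```

With this input, `cases` comes from the county sum while `deaths` and `tested` are filled in from the state-level record, which is the behaviour the comment asks for.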
Just as another data point, I'm still seeing deaths == nan for all of NY state and city.
(Personally I'm less interested in tested, except to the extent that it's caused by the same underlying issue... reports on number tested have been inconsistent across most aggregators; cases+deaths have been more reliable). Also, thanks for putting this dataset together; I've been lurking for a while and am impressed with the work y'all're putting in. Unfortunately I just migrated the daily updates I send to friends and family on cases+deaths (in the states we live in) to use this timeseries data; bad timing I guess :) Good luck with the fix, and thanks again! |
@mvanmidd we don't have a source for deaths in NY on a per-county basis: https://coronavirus.health.ny.gov/county-county-breakdown-positive-cases NYC only notes deaths for the entire city: https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary.pdf We can pull deaths for NYC from the daily update PDF, but we're out of luck for the rest of the New York counties. Until we implement covidatlas/coronadatascraper#410, we won't be pulling deaths for NY state either, unfortunately. |
Gotcha, thanks for the update. covidatlas/coronadatascraper#410 seems like a big one, good luck! y'all are going to have a fully featured generic/configurable ETL framework pretty soon :) In all seriousness, I think the auditable data aggregation is the biggest strength of this project... there's plenty of frontend work going on elsewhere (e.g. the explosion of "babby's first plotly visualizations," including my own), and on the backend, lots of data sources that are either incomplete or opaque. Keep up the good work! |
For US state/county data, how about NYT repo: https://github.com/nytimes/covid-19-data ? |
@cristipp - good find. I haven't looked at the actual data, but the README is encouraging. |
And for county level:
They also have NYC as a separate entry [empty fips]:
|
I'd be happy to. Though I see it already appearing in the https://coronadatascraper.com/#crosscheck for many [all?] counties. Perhaps you don't have it for state-level data? |
Oh, I found NY at state level too: https://coronadatascraper.com/#crosscheck:iso2:US-NY-iso1:US. It appears the scraper prefers the arcgis dataset for some reason. FWIW, looks like most recent data + deaths + tested for iso2:US-NY is coming from https://covidtracking.com, see https://coronadatascraper.com/#crosscheck:iso2:US-NY-iso1:US.
|
I believe that @hyperknot has closed out this issue by upping the priority of the covidtracking scraper. @cristipp, what's your feeling? |
Good to have the state level data fixed. We're still lacking county level data for NY fatalities [0]. The county level fatalities can be pulled from NYT [1] or USAFacts [2], with the quirk that NYT sums the 5 counties of NYC together. Also note these sources don't report 'tested', which CoronaDataScraper does.
|
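To compare a per-county dataset against one that reports New York City as a single entry, the five NYC counties can be collapsed into one row first. This is only a sketch: the function name `collapse_nyc`, the key format, and every number below are placeholders made up for illustration, not values from NYT, USAFacts, or this project.

```python
# The five counties that make up New York City.
NYC_COUNTIES = {
    "New York County",
    "Kings County",
    "Queens County",
    "Bronx County",
    "Richmond County",
}


def collapse_nyc(county_deaths):
    """Return a new dict in which the five NYC counties are replaced
    by a single 'New York City' entry holding their sum."""
    collapsed = {}
    nyc_total = 0
    saw_nyc = False
    for county, deaths in county_deaths.items():
        if county in NYC_COUNTIES:
            saw_nyc = True
            nyc_total += deaths
        else:
            collapsed[county] = deaths
    if saw_nyc:
        collapsed["New York City"] = nyc_total
    return collapsed


# Placeholder numbers, for illustration only.
per_county = {
    "Kings County": 10,
    "Queens County": 20,
    "Albany County": 1,
}
combined = collapse_nyc(per_county)
```

After collapsing, both datasets have the same set of rows for NY, so a crosscheck can compare them entry by entry.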
Hi @cristipp - getting back to this one after a long delay! The reports from Li at https://covidatlas.com/data merge data sources by priority. If a lower-priority source supplies a data point that no higher-pri source has, that value is preserved, and we also give the final source selected for each data point (see timeseries-byLocation.json). I believe we're doing what you've suggested. I believe this issue can be closed -- thoughts? |
Hi @cristipp and @zbraniecki - getting back to this one after a long delay! The reports from Li at https://covidatlas.com/data merge data sources by priority. If a lower-priority source supplies a data point that no higher-pri source has, that value is preserved, and we also give the final source selected for each data point (see timeseries-byLocation.json). I believe we're doing what you've suggested. I believe this issue can be closed -- thoughts? |
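The merge rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Li's implementation: the function name `merge_by_priority` is hypothetical, and it assumes a larger `priority` number wins. The two input records reuse the fields and values from the report.json excerpt quoted earlier in the thread.

```python
def merge_by_priority(sources, fields=("cases", "deaths", "tested")):
    """For each field, keep the value from the highest-priority source that
    has one, and record which source supplied it."""
    merged = {}
    chosen = {}
    # Assumption: a larger "priority" means a more preferred source.
    # sorted() is stable, so sources with equal priority keep their input order.
    for src in sorted(sources, key=lambda s: s["priority"], reverse=True):
        for field in fields:
            value = src.get(field)
            if value is not None and field not in merged:
                merged[field] = value
                chosen[field] = src["url"]
    return merged, chosen


ny_sources = [
    {
        "url": "https://covidtracking.com/api/states",
        "priority": -0.5,
        "cases": 37258,
        "deaths": 385,
        "tested": 122104,
    },
    {
        "url": "https://coronavirus.health.ny.gov/county-county-breakdown-positive-cases",
        "priority": 0,
        "cases": 37258,
    },
]

merged, chosen = merge_by_priority(ny_sources)
```

Here `cases` is attributed to the higher-priority ny.gov source, while `deaths` and `tested` are preserved from the lower-priority COVID Tracking record because no higher-priority source supplies them.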
States with reported deaths that are not in today's data:
Compared to https://coronavirus.1point3acres.com/en