Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A Lookup Table with UID, ISO and FIPS Uploaded #1791

Open
CSSEGISandData opened this issue Mar 31, 2020 · 27 comments
Open

A Lookup Table with UID, ISO and FIPS Uploaded #1791

CSSEGISandData opened this issue Mar 31, 2020 · 27 comments

Comments

@CSSEGISandData
Copy link
Owner

@CSSEGISandData CSSEGISandData commented Mar 31, 2020

We newly uploaded a lookup table including UID, ISO alpha 2, ISO alpha 3, ISO 3-digit, FIPS, and location names used in our Dashboard and GitHub. The UID is a combination of ISO numeric code, FIPS, and some artificial numbers. This table will be updated along with more locations joining in. Thanks!

Update 3/31: Add UID for US states and out of state entries. Adjust UID for unassigned locations in the US.

@clyde7
Copy link

@clyde7 clyde7 commented Mar 31, 2020

Thank you so much!

@chrisdane
Copy link

@chrisdane chrisdane commented Mar 31, 2020

the longitude column is called Long_?

@greg-minshall
Copy link

@greg-minshall greg-minshall commented Mar 31, 2020

very nice -- thanks!

@gkrag
Copy link

@gkrag gkrag commented Mar 31, 2020

Padding values in FIPS code US territories looks like are flipped between daily case counts and lookup table files. For e.g. below are the FIPS codes in lookup table,
00060
00066
00069
00078
and these are the values in case counts file,
66000
60000
69000
78000

@CSSEGISandData
Copy link
Owner Author

@CSSEGISandData CSSEGISandData commented Mar 31, 2020

@gkrag Good catch and yes, they are the same. We first defined them as 66000 and then changed to the 00060 format.

@tautme
Copy link

@tautme tautme commented Mar 31, 2020

What is the FIPS code version you are using? FIPS 6-4? INCITS 631?

@oltdaniel
Copy link

@oltdaniel oltdaniel commented Apr 2, 2020

@CSSEGISandData You have uploaded the CSV table and I think this is enough, so close this one and link it better in the README. 😉

@nbuonin
Copy link

@nbuonin nbuonin commented Apr 2, 2020

I think the padding for FIPS codes with a leading zero is off for the US time series files. In the lookup table these counties do have the leading zero, but in the time series files they don't. They are also represented as floats regardless of the leading zero.

For example Autauga County has a FIPS of 01001 in the lookup table: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv#L417

But it has a FIPS of 1001.0 in this time series file: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv#L7

@payamazadi
Copy link

@payamazadi payamazadi commented Apr 4, 2020

can i ask why there is a separate table for this?.. is it because of folks who have hardcoded their systems against the existing column definitions? this just points to the need of creating an API layer for users..

@matthiasgeihs
Copy link

@matthiasgeihs matthiasgeihs commented Apr 4, 2020

Did you think about adding population values?

@greg-minshall
Copy link

@greg-minshall greg-minshall commented Apr 4, 2020

@payamazadi having skinnier daily (changing) tables makes for less data download, etc. the UID/etc. table is presumably pretty static.

@matthiasgeihs the static nature of the UID table, and its "authoritariness/arbitrariness" maybe differentiates it from things like population size, which a) varies over time, and b) varies depending on your chosen authority.

my two cents. cheers!

@tautme
Copy link

@tautme tautme commented Apr 4, 2020

Population data and more at data.census.gov

@texadactyl
Copy link

@texadactyl texadactyl commented Apr 4, 2020

@CSSEGISandData
That CSV went immediately into my database. Thanks!

@jessehamner
Copy link

@jessehamner jessehamner commented Apr 5, 2020

the longitude column is called Long_?

If, for instance, one tries to use a variable named long in an *SQL, it won't be allowed, because long is a data type and is reserved. So Long_ or lon avoids the issue.

@texadactyl
Copy link

@texadactyl texadactyl commented Apr 5, 2020

Translate, reorder, and/or filter the labels.
Good tool for Python programs: Pandas read_csv() and DataFrames .
Javascript, Java, C/C++, etc. all have equivalents.

Free unsolicited advice: Don't get tied down to CSV column headers. Make the CSV fit your design through translation.

MatMoore added a commit to andreagrandi/covid-api that referenced this issue Apr 10, 2020
This corresponds to the lookup table provided here
CSSEGISandData/COVID-19#1791
MatMoore added a commit to andreagrandi/covid-api that referenced this issue Apr 10, 2020
This stores all regions published in the lookup table (CSSEGISandData/COVID-19#1791)

This will enable us to link the daily reports to a well defined location.
@MatthiasBlume
Copy link

@MatthiasBlume MatthiasBlume commented Apr 11, 2020

The population of "New York City, New York, US" is off by about 2.7 million. The value should include all five boroughs since that is how you report cases in New York City and per Exception Type 2 in https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/README.md. The number currently in the file is 5,803,210. Per a 2016 U.S. Census Bureau report, the value is 8,537,673. I suspect that someone transposed the first two digits in your CSV file, so I am proposing the change to 8,503,210.

@Lucas-Czarnecki
Copy link

@Lucas-Czarnecki Lucas-Czarnecki commented Apr 14, 2020

It appears that a recent commit has changed how the Lookup Table encodes FIPS. Recent changes omit leading zeros for the following states (those with a FIPS between 01 and 09):

Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut

Consequently, it appears that all counties for these states are also affected (e.g., Autauga Alabama is 1001, but should be 01001). I am wonder whether this is also causing problems in JHU's workflow; for example, see issue #2183.

@leboblack
Copy link

@leboblack leboblack commented Apr 15, 2020

Thank you

@texadactyl
Copy link

@texadactyl texadactyl commented Apr 15, 2020

@Lucas-Czarnecki
True for the last 2 reports (April 13th and 14th). Technically, you have a point. But, why do you think it is causing trouble? Extracting the FIPS values with or without being filled out to 5 digits is straight-forward in any programming language (E.g. Python, R).

This file is a feed to further processing. The only time an end user would care about the value formatting is in presentation (report or plot).

@Lucas-Czarnecki
Copy link

@Lucas-Czarnecki Lucas-Czarnecki commented Apr 15, 2020

Hey @texadactyl, It's hard to say given how JHU updates and modifies its daily reports (excel, R, all of the above... who knows). How I found out about this error (and it is an error), was using the Lookup table to map FIPS codes to older daily reports.

I don't want to speculate about all the ways that folks might end up using the Lookup Table and the FIPS codes, but updating this csv with accurate information would be ideal.

@texadactyl
Copy link

@texadactyl texadactyl commented Apr 15, 2020

@Lucas-Czarnecki
The FIPS county code values are accurate (I've seen no value errors). Their format in the CSV handoff file is not suitable for reporting - agreed. My advice is to process/scrub/filter/extract using your own programs (E.g. Python Pandas read_csv) any way you like; store the results in your own database. Don't expect JHU to address format-related issues too quickly as they are up to their ears.

@Lucas-Czarnecki
Copy link

@Lucas-Czarnecki Lucas-Czarnecki commented Apr 15, 2020

@texadactyl Fair enough. Yes, that's what I did. Agreed, I'd say this is at the bottom of JHU's priorities.

@texadactyl
Copy link

@texadactyl texadactyl commented Apr 15, 2020

@Lucas-Czarnecki Hopefully, I wasn't lecture-ish. I've been programming since punched cards & paper tape and do not mind addressing any data challenge that presents itself.

@Lucas-Czarnecki
Copy link

@Lucas-Czarnecki Lucas-Czarnecki commented Apr 15, 2020

@texadactyl Not at all. You came off very nice. I always appreciate the info!

@jessehamner
Copy link

@jessehamner jessehamner commented Apr 16, 2020

Agreed this sort of issue should be near the bottom of the priorities list. Here's what I do:

For the third CSV format type (starting 2020-Mar-22), in R, one can ensure the leading zero gets included in FIPS codes with, for instance:

covid3$newfips <- sprintf("%05.0f", as.integer(covid3$FIPS))

and to separate the 5-character FIPS code into state and county:
covid3$stfips <- substr(covid3$newfips, 1,2)
covid3$cofips <- substr(covid3$newfips, 3,5)

@Lucas-Czarnecki
Copy link

@Lucas-Czarnecki Lucas-Czarnecki commented Apr 16, 2020

Thanks @jessehamner. I'm taking a slightly different approach. I modify the Lookup Table and then map the data to each csv. The reason is to avoid issues like this one #2116

I also work in R. Here is my code (which requires the tidyverse). It has the added benefit of avoiding having to create new columns.
Capture

Replaces blank cells with missing values.
Lookup_Table <- Lookup_Table %>% mutate_all(na_if,"")

Conditionally pads leading zeros (by state and county).
Lookup_Table$FIPS <- ifelse(is.na(Lookup_Table$Admin2), formatC(as.numeric(Lookup_Table$FIPS),width=2,format='f',digits=0,flag='0'), formatC(as.numeric(Lookup_Table$FIPS),width=5,format='f',digits=0,flag='0'))

The only kink I can find is having to fix missing values.
y$FIPS[y$FIPS == " NA" | y$FIPS == "NA"] <- NA

EDIT: You can find my cleaned Lookup Table HERE

@Pinkmei
Copy link

@Pinkmei Pinkmei commented Apr 20, 2020

We newly uploaded a lookup table including UID, ISO alpha 2, ISO alpha 3, ISO 3-digit, FIPS, and location names used in our Dashboard and GitHub. The UID is a combination of ISO numeric code, FIPS, and some artificial numbers. This table will be updated along with more locations joining in. Thanks!
Update 3/31: Add UID for US states and out of state entries. Adjust UID for unassigned locations in the US.

Thank you for updating. Do you have df_Counties2020.csv file, which might be in US/Data_tables/JHU_USCountymap folder? I incidentally deleted from my computer when updating the analysis. Now couldn't open my analysis without this file. Is there any possibility to help? Thank you very much! Best Regards and blessings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet