Naming Convention Moving Forward #504
|
Could you please not use any names at all but just a code system? |
|
People need names. This is a resource that's being used/watched by non-(data)scientists all over the world. @CSSEGISandData, just confirming that this is the source: |
|
Developers need to compare fields between documents. Excel managers need to do the same. Matching cells with mixed-case and character contents isn't optimal. |
|
Can the intra-US state names be standardized to counties? Here's the US Census FIPS listing, which has county names: https://www.census.gov/geographies/reference-files/2017/demo/popest/2017-fips.html Here's a sample of the first 10 places alphabetically in the US: |
|
@designbyadrian That's totally fair -- no one* cares what unique IDs are settled upon to handle the data on the back end. The issue causing the outrage is the display on the map/visualization that users see. Frankly, I'm not happy with their current solution, but I can deal with it, as it helps that there's now transparency about where the names are coming from. *Not no one -- obviously people trying to model the data care. |
|
Thank you @CSSEGISandData for responding promptly and with clarity to all the people invested in this topic. As a Hong Konger, I'd like to see HK listed as a separate entry. I hope my wish aligns with the US State Department naming scheme. |
|
@julietchen above: the solution is at least to add a code field so we don't have to worry about free text changing.
This will help all of us developers and analysts whenever the text of the names changes. Any standard code would do: GENC 2A and GENC 3A are both used by the US State Dept, and either would be a great addition to both the CSVs and the JSON feature layer(s). Unfortunately, it will not help with the political issues of what is labeled a separate country. |
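A minimal sketch of why a code field helps, assuming each row carried a hypothetical `code` column next to the display name (GENC 3A-style three-letter values are used purely for illustration). When two daily snapshots are compared by code, a rename like "Mainland China" to "China" shows up as a changed label on the same key instead of one row vanishing and a new one appearing.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def index_by_code(rows: List[Row]) -> Dict[str, Row]:
    """Index rows by the (hypothetical) stable `code` column, not the free-text name."""
    return {row["code"]: row for row in rows}


def renamed_entries(old: List[Row], new: List[Row]) -> Dict[str, Tuple[str, str]]:
    """Return {code: (old_name, new_name)} for codes whose display name changed."""
    old_idx, new_idx = index_by_code(old), index_by_code(new)
    return {
        code: (old_idx[code]["name"], new_idx[code]["name"])
        for code in old_idx.keys() & new_idx.keys()
        if old_idx[code]["name"] != new_idx[code]["name"]
    }


# Illustrative snapshots: the label changes, the code does not.
march_10 = [{"code": "CHN", "name": "Mainland China"}, {"code": "ITA", "name": "Italy"}]
march_11 = [{"code": "CHN", "name": "China"}, {"code": "ITA", "name": "Italy"}]

print(renamed_entries(march_10, march_11))  # {'CHN': ('Mainland China', 'China')}
```

Joining on the code keeps each place's history in one series no matter how often the label is edited.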
The current CSV output seems to be the worst of all worlds. Instead of picking one system and going with it, the data seems to be using multiple systems semi-arbitrarily. The United States data offers the best example. Originally, the team was using counties, such as King County in Washington. That was probably too precise for a lot of use cases (King and Snohomish are probably really one event, for example), so now the change seems to be to use states. But the data for Washington state in the same file is wrong: it says there were 0 cases from the beginning of time and then 267 cases starting yesterday. That is incorrect. I get the desire to use more useful "buckets" for data, but without migrating the previous data to the new system, the output becomes difficult to work with. Now I have to figure out how to dedupe this stuff. But wait, it gets worse: now cities are popping up starting today. What do I do with these? Are they in addition to the overall Washington state number? Are they duplicative? What county is Kitsap in, and is it duplicative with that county's number? Unfortunately, the CSV output just became effectively useless because it contains duplicate information and incomplete history on many rows, and I cannot see any good way for a consumer of this data to write code to properly sort it out. Edit: My complaint aside -- thank you all for what you are doing here. |
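The overlap can at least be detected mechanically. The sketch below assumes sub-state rows are labelled "<place>, <ST>" (e.g. "King County, WA") while state-level rows use the full state name; the two-entry `STATE_ABBREV` table and the `find_overlaps` helper are hypothetical. It only flags states where both granularities coexist. It cannot tell whether the numbers double-count, which is exactly the part the comment says consumers can't sort out.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical lookup; a real one would cover every state.
STATE_ABBREV = {"Washington": "WA", "California": "CA"}


def find_overlaps(labels: List[str]) -> Dict[str, Set[str]]:
    """Return {state: sub-state labels} for states that have both a state-level
    row and at least one sub-state row.

    Assumes sub-state labels look like "<place>, <ST>" and state-level labels
    are full state names.
    """
    state_level: Set[str] = set()
    sub_state: Dict[str, Set[str]] = defaultdict(set)
    for label in labels:
        if label in STATE_ABBREV:
            state_level.add(STATE_ABBREV[label])
        elif ", " in label:
            abbrev = label.rsplit(", ", 1)[1]
            sub_state[abbrev].add(label)
    return {st: sub_state[st] for st in state_level if sub_state.get(st)}


labels = ["Washington", "King County, WA", "Snohomish County, WA", "Kitsap, WA", "California"]
print(find_overlaps(labels))
```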
|
@lukesneeringer Totally agree. @CSSEGISandData What are you going to do about the overlapping data between city/state and county in the US? |
|
I'm so torn on this dataset. I love the concept of it. Providing a public-facing and near real-time dataset on a rapidly evolving pandemic is powerful and transformative. But the execution and strategic vision concern me. Is there a project manager? Is there a database engineer? Are both of these resources so concerned with the internal host server that this public repository is an afterthought? That's really what it feels like, which is unfortunate for the thousands of developers around the world trying to leverage this repository to generate near real-time insight. I love the effort and the incredible precedent this may set for future (god help us) outbreaks, but the data management aspect of this public repository is underwhelming, to say the least. |
|
@CSSEGISandData Unfortunately, Hong Kong is still merged into China, whereas yesterday it was presented as a State/Region. |
|
Conversion to ISO-3166 alpha-3 from Johns-Hopkins list: https://github.com/AnthonyEbert/COVID-19_ISO-3166/blob/master/JohnsHopkins-to-A3.csv I intend to keep this updated. |
|
@AnthonyEbert See #470 (comment): ISO-3166 may not be the best code system to use. |
|
They're just three letters; it couldn't be less political. Some "countries" on this list are universally recognized as countries, and some are universally recognized as not-countries (e.g. Christmas Island). This means that you can't guess my politics from TWN, IRN, or ISR, which is perfect. |
|
Please, see #415 |
|
@CSSEGISandData You should have added a new table (file) for name conversion. You can't just start using a new naming convention every day by changing country names in the data files while those names are used as IDs. Thanks for sharing the data, but you clearly have issues with data management. |
Seconded. Quoting someone else on the issue of ISO country codes: Unicode CLDR is more equitable.
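One way to read the "name conversion table" suggestion, sketched under assumptions: a hypothetical `ALIASES` dict maps every spelling that has appeared in the data to a single canonical key, and lookups fail loudly on anything unmapped, so the next rename breaks a consumer's pipeline visibly instead of silently starting a new series.

```python
# Hypothetical conversion table: every name that has appeared -> one canonical key.
ALIASES = {
    "Mainland China": "China",
    "China": "China",
    "Czech Republic": "Czechia",
    "Czechia": "Czechia",
}


def canonical(name: str) -> str:
    """Return the canonical key for a raw name; raise on names not in the table."""
    try:
        return ALIASES[name.strip()]
    except KeyError:
        raise KeyError(f"unmapped country name: {name!r}; update ALIASES") from None


print(canonical("Mainland China"))  # China
```

If the maintainers published such a table alongside the data files, a rename would become one new line in the table rather than a breaking change for every consumer.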
Originally posted by @Eclipsed830 in #372 (comment) |
|
In the interest of keeping data-integrity issues in one place, I have a follow-up to #504 (comment): there are now duplicate rows for the same areas. The most recent commit (2033baa) added duplicate rows for all of China (from what I spot-checked), and probably for some other areas too. For example, Hubei on line 14 has data through March 10, while Hubei on line 349 has the March 11 data with zeroes before that. It appears that everything from lines 349 through 405 is affected. This seems to have happened because of a rename from "Mainland China" to "China" (further illustrating the need for unique keys, as many others have pointed out). |
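A sketch of how a consumer might collapse the duplicated rows, assuming the only change is the country label (the hypothetical `RENAMES` dict below covers just "Mainland China" to "China") and that both rows span the same date columns. Because the counts are cumulative, overlapping dates keep the larger value rather than the sum. The numbers in the example are made up.

```python
from typing import Dict, List, Tuple

# Assumed rename, taken from the comment above; extend as needed.
RENAMES = {"Mainland China": "China"}

Key = Tuple[str, str]  # (province/state, country/region)


def merge_renamed(rows: List[Tuple[str, str, List[int]]]) -> Dict[Key, List[int]]:
    """Collapse rows that describe the same place under an old and a new country name.

    Assumes every series covers the same date columns. Counts are cumulative, so
    where both rows have a value for a date the larger one is kept (summing
    would double-count).
    """
    merged: Dict[Key, List[int]] = {}
    for province, country, series in rows:
        key = (province, RENAMES.get(country, country))
        if key in merged:
            if len(merged[key]) != len(series):
                raise ValueError(f"series length mismatch for {key}")
            merged[key] = [max(a, b) for a, b in zip(merged[key], series)]
        else:
            merged[key] = list(series)
    return merged


# Made-up numbers: the old row stops after two days, the new row starts on day three.
rows = [
    ("Hubei", "Mainland China", [100, 200, 0]),
    ("Hubei", "China", [0, 0, 300]),
]
print(merge_renamed(rows))  # {('Hubei', 'China'): [100, 200, 300]}
```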
|
Channel Islands isn't in the state department list, but is used today. The "countries" are Jersey (JEY) and Guernsey (GGY). |
|
One point to add: a coding system is essential for localization. This could save lots of developer time and enable more applications. Thanks in advance for fixing it. P.S. A country code also helps generate the emoji flag. |
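The P.S. works because flag emoji are built from Unicode regional indicator symbols: each letter of a two-letter ISO 3166-1 alpha-2 code maps to one symbol (U+1F1E6 for A through U+1F1FF for Z), and most platforms render the pair as a flag. A minimal sketch follows; `flag_emoji` is a hypothetical helper, and if the dataset adopted GENC 2A instead, any codes that differ from ISO alpha-2 would need mapping first.

```python
def flag_emoji(alpha2: str) -> str:
    """Turn a two-letter ISO 3166-1 alpha-2 code into a flag emoji string.

    Each letter A-Z maps to a Unicode regional indicator symbol
    (U+1F1E6..U+1F1FF); a pair of them renders as a flag on most platforms.
    """
    code = alpha2.strip().upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected a two-letter code, got {alpha2!r}")
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in code)


print(flag_emoji("cz"))
```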
|
Could you also specify exactly which list is being used for the "U.S. State Department names"?
I ask because the State Department (Bureau of Consular Affairs) also has its own list on its COVID-19 Country Specific Information page. |
|
@CSSEGISandData That's understandable, but please be consistent with the naming and historical data. |
Oh, but then how else will we have the fun of forensically analyzing the data to figure out the new problems and then rewriting our code for the tenth time to accommodate the latest violent change? |
|
#543 – Yet another reason to have a separate code column |
|
This now seems to be fixed? |
It is better than it was before, but still not fixed. |
|
There are still counties, states, and cities all listed, with the latter two missing historical numbers.
Sincerely,
George Sibble
|
|
Czech Republic has now become Czechnia... (Chechnya is a completely different region). Great work nonetheless! |
|
I would appreciate it if the cell were at the very least renamed to "Province/State/Region"; if not, as others have mentioned above, Hong Kong should be listed as a separate region under "Country/Region" (1. the State Department lists HK as a separate entry; 2. HK has an independent COVID-19 response, healthcare system, and immigration borders; 3. the WHO singles out HK, Taiwan, and Macau in its reporting (WHO WPRO link); 4. singling HK out allows people to recognise the success HK has so far seen in containing COVID despite its close proximity to China, and gives academics, policy makers, and general citizens the incentive to learn from HK's practices. This adds value to your interactive map.) I understand that, as engineers based in the States, the team may not be as familiar with the political nuances of China-Taiwan-HK issues, but I nonetheless hope that the error of listing Hong Kong as a "Province/State" of China can be rectified to respect Hong Kong's special administrative status. Thank you for establishing this initiative early on! |
|
Hi guys, the United Kingdom data mapping appears broken; it's showing 3 cases. Did the naming convention changes separate United Kingdom 'GB' from Gibraltar 'G' and break something? Thanks guys, you're doing a great job and providing a world service to a worried world. :o) |
|
On place names: ...and if someone in China would like to take over more than Taiwan on your spreadsheet, they can easily do it just by changing the naming convention. Actually, they could take over the entire world! |
|
HKSAR and Macau have health administrations separate from Mainland China's that report their own data. Citizens across the globe are also relying on this dashboard. If you're going to stick to the US State Department guidelines on independent nations/regions, please reclassify HK and Macau as independent SARs for easier report monitoring. |
|
Merging the figures for Hong Kong and Macau with China makes it really hard to track and compare how these two administrations fare among other governing bodies. They both have administrations and medical systems separate from mainland China's. The merge also breaks consistency. It doesn't make sense to merge. |
|
The way I see it, the CSSE team should at least (1) rectify its heading error by changing cell A1 to Province/State/Region, to respect Hong Kong’s special administrative status, or better yet, (2) list Hong Kong under Column B. The reasons for (2) are as follows: Hong Kong, as a special administrative region, has an independent COVID-19 response system, healthcare system, and immigration borders. It is also important to note that CSSE extracts Hong Kong’s data directly from HK’s Department of Health, not from China’s CDC. If the map is meant for surveillance purposes, folding Hong Kong’s situation into the bigger Chinese picture would dilute and misrepresent the local situation. In fact, Hong Kong (along with Taiwan and Singapore) has so far implemented effective measures, through the government or local health experts, in containing COVID-19, despite its close proximity to China. Singling out Hong Kong’s low infection rates thus far would give academics, policy makers, and civil society the incentive to learn from HK’s practices. |
|
I have worked with data from a 911 emergency system in the past, and in a fast-paced environment such as this, you don't get the luxury of developing a nice data model up front, nor do you have people entering data in a consistent manner. I have worked with this dataset for at least a month now and have "rolled with the punches" on all the name changes etc. If other people want to collect this data from across the globe at their own granularity level and "QA" the info as best they can, go for it. I, for one, am grateful for what the group is doing. |
|
First, thank you to everyone here.
I agree with aligning names with the World Health Organization's naming. |
From its start in January, the COVID-19 tracking map has been an open and transparent public health resource. Nearly TK billion people from nearly every nation on the planet have visited the map. During a comprehensive review of the dashboard on March 10, Professor Lauren Gardner and her team decided to align the names of nations with the World Health Organization’s naming conventions to achieve consistency in reporting. Upon reconsideration, the team is now using U.S. State Department names.