Naming Convention Moving Forward #504

Open
CSSEGISandData opened this issue Mar 11, 2020 · 38 comments

Comments

@CSSEGISandData
Owner

@CSSEGISandData CSSEGISandData commented Mar 11, 2020

From its start in January, the COVID-19 tracking map has been an open and transparent public health resource. Nearly TK billion people from nearly every nation on the planet have visited the map. During a comprehensive review of the dashboard on March 10, Professor Lauren Gardner and her team decided to align the names of nations with the World Health Organization’s naming conventions to achieve consistency in reporting. Upon reconsideration, the team is now using U.S. State Department names.

@designbyadrian

@designbyadrian designbyadrian commented Mar 11, 2020

Could you please not use any names at all but just a code system?

@julietchen

@julietchen julietchen commented Mar 11, 2020

People need names. This is a resource that's being used/watched by non-(data)scientists all over the world.

@CSSEGISandData, just confirming that this is the source:
US State Department - Independent States in the World

@designbyadrian

@designbyadrian designbyadrian commented Mar 11, 2020

Developers need to compare fields between documents. Excel managers need to do the same. Matching cells with mixed-case and character contents isn't optimal.

@bbrewington

@bbrewington bbrewington commented Mar 11, 2020

Can the intra-US state names be standardized to County? Here are the US Census FIPS listings, which include county names: https://www.census.gov/geographies/reference-files/2017/demo/popest/2017-fips.html

Here's a sample of first 10 places alphabetically in US:
City - Adams, IN
County - Alameda County, CA
City - Arapahoe, CO
City - Arlington, VA
County - Bennington County, VT
County - Bergen County, NJ
County - Berkshire County, MA
City - Boone, IN
County - Broward County, FL
City - Burlington, NJ

@julietchen

@julietchen julietchen commented Mar 11, 2020

@designbyadrian That's totally fair -- No one* cares what unique ids are settled upon to deal with data in the back end. The issue causing the outrage is the display on the map/visualization that users see.

Frankly, I'm not happy with their current solution, but I can deal as it helps that there's now transparency about where the names are coming from.

*Not no one -- obviously people trying to model the data care

@duncanmak

@duncanmak duncanmak commented Mar 11, 2020

Thank you @CSSEGISandData for responding promptly and with clarity to all the people invested in this topic.

As a Hong Konger, I'd like to see HK listed as a separate entry. I hope my wish aligns with the US State Department naming scheme. 🇭🇰

@designbyadrian

@designbyadrian designbyadrian commented Mar 11, 2020

@julietchen above 👆 is what I'm talking about. Every time they update the names, a bunch of applications crash.

The solution is to at least add a code field so we don't have to worry about free text changing.

@rossarmer

@rossarmer rossarmer commented Mar 11, 2020

The solution is to at least add a code field so we don't have to worry about free text changing.

This will help all of us developers and analysts as they change the text of the names. Any standard code system would do: GENC 2A and GENC 3A are both used by the US State Dept, and either would be a great addition to both the CSVs and the JSON feature layer(s).

Unfortunately it will not help the political issues with what is labeled a separate country.
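
The code-field suggestion above can be sketched roughly as follows. This is only an illustration: the `Code` column does not exist in the published CSVs, and the counts are invented. The two renames (Mainland China to China, Czech Republic to Czechia) are the ones reported in this thread.

```python
import csv
import io

# Hypothetical daily files: display names change between days, but the
# assumed "Code" column stays fixed. Counts are invented for illustration.
DAY_1 = """Country/Region,Code,Confirmed
Mainland China,CHN,100
Czech Republic,CZE,5
"""
DAY_2 = """Country/Region,Code,Confirmed
China,CHN,120
Czechia,CZE,7
"""


def load_by_code(text):
    """Index confirmed counts by the stable code, not the free-text name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Code"]: int(row["Confirmed"]) for row in reader}


def daily_change(before, after):
    """Per-code difference in counts; names never take part in the join."""
    return {code: after[code] - before.get(code, 0) for code in after}


changes = daily_change(load_by_code(DAY_1), load_by_code(DAY_2))
print(changes)
```

Because the join key is the code, a rename in the `Country/Region` column changes only what is displayed, not how rows from different days are matched.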

@lukesneeringer

@lukesneeringer lukesneeringer commented Mar 11, 2020

During a comprehensive review of the dashboard on March 10, Professor Lauren Gardner and her team decided to align the names of nations with the World Health Organization’s naming conventions to achieve consistency in reporting. Upon reconsideration, the team is now using U.S. State Department names.

The current CSV output seems to be the worst of all worlds. Instead of picking one system and going with it, the data seems to be using multiple systems semi-arbitrarily.

The United States data seems to offer the best example. Originally, the team was using counties, such as King County in Washington. That was probably too precise for a lot of use cases (King and Snohomish are probably really one event, for example), so now the change seems to be to use states.

But the data for Washington state in the same file is wrong. It says there are 0 cases since the beginning of time and then 267 cases starting yesterday. That is incorrect. I get the desire to use more useful "buckets" for data, but without migrating the previous data to the new system, the output becomes difficult to work with. Now I have to figure out how to dedupe this stuff.

But wait, it gets worse: Now cities are popping up starting today. What do I do with this? Is this in addition to the overall Washington state number? Is it duplicative? What county is Kitsap in, and is it duplicative with that county's number?

Unfortunately, the CSV output just became effectively useless because it contains duplicate information and incomplete history on many rows, and I can not see any good way for a consumer of this data to write code to properly sort it out.

Edit: My complaint aside -- thank you all for what you are doing here.
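
As a rough sketch of how a consumer might at least detect the overlap described above, the snippet below separates state-level rows from "Place, ST" rows and flags states that appear at both levels. The rows and most counts are hypothetical (only the 267 figure comes from this comment), and `STATE_NAMES` is an assumed lookup table. It cannot tell whether a state total already includes its counties; it only shows where double counting is possible.

```python
import re
from collections import defaultdict

# Hypothetical rows: (Province/State value, latest confirmed count).
ROWS = [
    ("King County, WA", 190),
    ("Kitsap, WA", 2),
    ("Washington", 267),
    ("New York", 173),
]

# Assumed lookup from postal abbreviation to state name (sample only).
STATE_NAMES = {"WA": "Washington", "NY": "New York"}

# Matches values such as "King County, WA" or "Kitsap, WA".
SUB_STATE = re.compile(r"^(?P<place>.+),\s*(?P<abbr>[A-Z]{2})$")


def split_levels(rows):
    """Separate state-level rows from county/city rows, grouped by state."""
    state_level = {}
    sub_state = defaultdict(list)
    for name, count in rows:
        match = SUB_STATE.match(name)
        if match:
            state = STATE_NAMES.get(match["abbr"], match["abbr"])
            sub_state[state].append((match["place"], count))
        else:
            state_level[name] = count
    return state_level, dict(sub_state)


state_level, sub_state = split_levels(ROWS)
# States reported at both levels are the ones at risk of double counting.
overlapping = sorted(set(state_level) & set(sub_state))
print(overlapping)
```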

@sibblegp

@sibblegp sibblegp commented Mar 11, 2020

@lukesneeringer Totally agree.

@CSSEGISandData What are you going to do about the overlapping data between city/state and county in the US?

@Sitius86

@Sitius86 Sitius86 commented Mar 11, 2020

I'm so torn on this dataset. I love the concept of it. Providing a public-facing and near real-time dataset on a rapidly evolving pandemic is powerful and transformative. But the execution and strategic vision concern me.

Is there a project manager? Is there a database engineer? Are both of these resources so concerned with the internal host server that this public repository is an afterthought?

That's really what it feels like, which is unfortunate for the thousands of developers around the world trying to leverage this repository to generate near real-time insight.

I love the effort and the incredible precedent this may set for future (god help us) outbreaks, but the data management aspect of this public repository is underwhelming, to say the least.

@ShawTim

@ShawTim ShawTim commented Mar 11, 2020

@CSSEGISandData Unfortunately, Hong Kong is still merged into China, while Hong Kong is presented as a State/Region as of yesterday.
Merging Hong Kong into China doesn't help to prevent the spread of COVID-19, because they have different immigration and borders, and also different policies for disease control and prevention.

@AnthonyEbert

@AnthonyEbert AnthonyEbert commented Mar 11, 2020

Conversion to ISO-3166 alpha-3 from Johns-Hopkins list: https://github.com/AnthonyEbert/COVID-19_ISO-3166/blob/master/JohnsHopkins-to-A3.csv

I intend to keep this updated.
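
A minimal sketch of how such a conversion table might be applied, assuming a two-column layout. The column names `name` and `alpha3` are hypothetical and may not match the linked file. Names missing from the table are reported rather than silently dropped, since new names keep appearing in the data.

```python
import csv
import io

# Hypothetical mapping file; the real file's column names may differ.
MAPPING_CSV = """name,alpha3
Iran,IRN
Israel,ISR
Taiwan,TWN
"""


def load_mapping(text):
    """Read a name -> alpha-3 lookup from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: row["alpha3"] for row in reader}


def attach_codes(names, mapping):
    """Return (name -> code for known names, list of names with no code)."""
    coded = {}
    unmapped = []
    for name in names:
        if name in mapping:
            coded[name] = mapping[name]
        else:
            unmapped.append(name)
    return coded, unmapped


coded, unmapped = attach_codes(
    ["Iran", "Taiwan", "Channel Islands"], load_mapping(MAPPING_CSV)
)
print(coded)
print(unmapped)
```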

@designbyadrian

@designbyadrian designbyadrian commented Mar 12, 2020

@AnthonyEbert See #470 (comment)

ISO-3166 may not be the best code system to use.

@AnthonyEbert

@AnthonyEbert AnthonyEbert commented Mar 12, 2020

They're just three letters, it couldn't be less political. Some "countries" on this list are universally recognized as countries, and some are universally recognized as not-countries (e.g. Christmas Island). This means that you can't guess my politics from TWN, IRN, or ISR - which is perfect.

@scottmcdaniel

@scottmcdaniel scottmcdaniel commented Mar 12, 2020

Please, see #415

@avatorl

@avatorl avatorl commented Mar 12, 2020

@CSSEGISandData You should have added a new table (file) for name conversion. You can't just start using a new naming convention every day by changing country names in the data files while these names are used as an ID. Thanks for sharing the data, but you clearly have issues with data management.

@eacvo67347

@eacvo67347 eacvo67347 commented Mar 12, 2020

@AnthonyEbert See #470 (comment)

ISO-3166 may not be the best code system to use.

second.

quoting someone else about the issue of ISO country codes. Unicode CLDR is more equitable.

The problem with using ISO country codes is that they are politicized, as being involved with the ISO requires UN membership. This is why using ISO 3166-2 country codes and naming is considered bad practice and most software developers instead use Unicode CLDR. http://cldr.unicode.org/translation/displaynames/country-names

https://github.com/unicode-cldr/cldr-localenames-full/blob/master/main/en/territories.json

Originally posted by @Eclipsed830 in #372 (comment)

@lukesneeringer

@lukesneeringer lukesneeringer commented Mar 12, 2020

In the interest of keeping issues with data integrity in one place, I have a follow-up to #504 (comment):

There are now duplicate rows for the same areas. The most recent commit (2033baa) added duplicative rows for all of China that I spot checked and probably some other areas too. For example, Hubei on line 14 has data through March 10, while Hubei on line 349 has the March 11 data with zeroes before that. It appears that everything from lines 349 through 405 are instances of this issue. This appears to have happened because of a rename from "Mainland China" to "China" (further illustrating the need for unique keys as many others have illustrated).
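
One way a consumer might stitch such split rows back together is sketched below. The counts are invented, the alias table is an assumption that would have to be maintained by hand, and the merge relies on each date being populated on at most one of the duplicate rows (zeros elsewhere), as in the Hubei example above.

```python
# Hypothetical excerpt: one province split across two rows by a rename.
# Counts are invented; a zero stands for "no data on this row".
DATES = ["3/9/20", "3/10/20", "3/11/20"]
ROWS = [
    ("Hubei", "Mainland China", [100, 110, 0]),
    ("Hubei", "China", [0, 0, 120]),
    ("Hong Kong", "China", [5, 6, 7]),
]

# Assumed alias table mapping old country names to current ones.
ALIASES = {"Mainland China": "China"}


def merge_renamed(rows, aliases):
    """Collapse rows that differ only by an aliased country name."""
    merged = {}
    for province, country, counts in rows:
        key = (province, aliases.get(country, country))
        if key in merged:
            # Each date is assumed to be filled in on only one of the rows,
            # so the element-wise maximum recovers the full series.
            merged[key] = [max(a, b) for a, b in zip(merged[key], counts)]
        else:
            merged[key] = list(counts)
    return merged


merged = merge_renamed(ROWS, ALIASES)
print(dict(zip(DATES, merged[("Hubei", "China")])))
```

This only papers over the problem on the consumer side. A unique key per row in the source data would make the alias table unnecessary.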

@rtwfroody

@rtwfroody rtwfroody commented Mar 12, 2020

Channel Islands isn't in the state department list, but is used today. The "countries" are Jersey (JEY) and Guernsey (GGY).

@vincentlaucy

@vincentlaucy vincentlaucy commented Mar 12, 2020

One point to add: a coding system is essential for localization.
A huge part of the world's population doesn't speak English, and it will be hard to map translations from non-web-standard English country names, which even change.

This could save lots of developer time and enable more applications. Thanks in advance for fixing it.

p.s. code name helps to generate the emoji flag too
countryCode.toUpperCase().replace(/./g, char => String.fromCodePoint(char.charCodeAt(0) + 127397));
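
For readers working in Python rather than JavaScript, the same offset trick can be sketched as below. It assumes a two-letter (alpha-2) code; three-letter codes such as the alpha-3 ones discussed earlier will not produce a flag. The function name `flag_emoji` is just an illustrative choice.

```python
# Offset from an ASCII capital letter to its regional indicator symbol.
# U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A, so this equals 127397.
REGIONAL_INDICATOR_OFFSET = 0x1F1E6 - ord("A")


def flag_emoji(country_code):
    """Turn a two-letter country code into its flag emoji sequence."""
    code = country_code.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a two-letter code, got {country_code!r}")
    return "".join(chr(ord(ch) + REGIONAL_INDICATOR_OFFSET) for ch in code)


print(flag_emoji("hk"))
```

Whether the result renders as a flag or as two boxed letters depends on the font and platform.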

@vincentlaucy

@vincentlaucy vincentlaucy commented Mar 12, 2020

Could you also advise which exact list is being used as the "U.S. State Department names"?

  • Is it the one mentioned above, US State Department - Independent States in the World?

I ask because in the State Department's COVID-19 Country Specific Information list (Bureau of Consular Affairs), Hong Kong has a separate entry.

@analyzewithpower

@analyzewithpower analyzewithpower commented Mar 12, 2020

@CSSEGISandData That's understandable, but please be consistent with the naming and historical data.

@soilstack

@soilstack soilstack commented Mar 12, 2020

@CSSEGISandData You should have added a new table (file) for name conversion. You can't just start using a new naming convention every day by changing country names in the data files while these names are used as an ID. Thanks for sharing the data, but you clearly have issues with data management.

Oh, but then how else will we have the fun of forensically analyzing the data to figure out the new problems and then rewriting our code for the tenth time to accommodate the latest violent change?

@designbyadrian

@designbyadrian designbyadrian commented Mar 12, 2020

#543 – Yet another reason to have a separate code column

@ccn000

@ccn000 ccn000 commented Mar 12, 2020

This now seems to be fixed?

@lukesneeringer

@lukesneeringer lukesneeringer commented Mar 12, 2020

This now seems to be fixed?

It is better than it was before, but still not fixed.

@sibblegp

@sibblegp sibblegp commented Mar 12, 2020

@phurichai

@phurichai phurichai commented Mar 12, 2020

Czech Republic now becomes Czechnia... (Czechnya is a completely different country/region).
Standardising country names sounds good in principle, but this seems to have created more problems than it solves. I agree with using country codes.

Great work nonetheless!

@yagnes11

@yagnes11 yagnes11 commented Mar 13, 2020

🇭🇰 To add on regarding naming issues for Hong Kong, currently in the dataset Hong Kong is listed as a "Province/State" of China. Hong Kong, under the one-country, two-systems principle, is neither a province, nor a state, of China. It is officially the Hong Kong Special Administrative Region (HKSAR).

Would appreciate it if the cell was at the very least renamed to "Province/State/Region", if not, as others have mentioned above, Hong Kong should be listed as a separate region under "Country/Region" (1. State department lists HK as a separate entry; 2. HK has an independent COVID-19 response, healthcare system, immigration borders; 3. WHO singles out HK, Taiwan, and Macau in its reporting (WHO WPRO link); 4. singling HK out allows people to recognise the success HK so far has seen in containing COVID despite its close proximity with China, and gives academics, policy makers, general citizens, the incentive to learn from HK's practices. This adds value to your interactive map.)

I understand that perhaps as engineers based in the States, the team isn't as familiar with the political nuances in China-Taiwan-HK issues, but nonetheless hope that the error with listing Hong Kong as a "Province/State" of China can be rectified to respect Hong Kong's special administrative status.

Thank you for establishing this initiative early on!

@brianmorrisuk

@brianmorrisuk brianmorrisuk commented Mar 14, 2020

Hi Guys,

United Kingdom data mapping appears broken. Showing 3 cases. Did naming convention changes separate United Kingdom 'GB' from Gibraltar 'G' and break something?

Thanks guys, you're doing a great job and providing a world service to a worried world.

:o)

@Waldo79

@Waldo79 Waldo79 commented Mar 14, 2020

On place names:
How about on the sheets containing the data just assign a number to the row - no name.
Then on an extra sheet specify the names of the day to the numbers.
That way you can change the names every day and not screw up people trying to use the data.

...and if someone in China would like to take over more than Taiwan on your spreadsheet they can easily do it by just changing the naming convention - actually they could take over the entire world!

@AaronForce1

@AaronForce1 AaronForce1 commented Mar 15, 2020

HKSAR and Macau have different health administrations than Mainland China to report their data. Similarly, citizens are leveraging this dashboard across the globe. If you're going to stick to the US State Department guidelines on independent nations/regions, please reclassify HK and Macau as independent SARs for easier report monitoring.

@yookoala

@yookoala yookoala commented Mar 16, 2020

Merging the figures of Hong Kong and Macau with China's makes it really hard to track and compare how these 2 administrations fare amongst other governing bodies. They both have administrations and medical systems separate from mainland China's. The merge also kills off the consistency. It doesn't make sense to merge.

@emhaitch

@emhaitch emhaitch commented Mar 16, 2020

The way I see it, the CSSE team should at least (1) rectify its heading error by changing cell A1 to Province/State/Region, to respect Hong Kong’s special administrative status, or better yet, (2) list Hong Kong under Column B.

The reasons for (2) are as follows:
The JHU CSSE statement on March 12 supports including Hong Kong under Country/Region. They justified including Taiwan as an entity by citing the US State Department’s naming conventions. Although the State Department does not maintain an entry on Hong Kong on its main website, it hosts a bilateral relations fact sheet highlighting relations with Hong Kong. On the Department’s Bureau of Consular Affairs site, an independent entry is available for travel advice in Hong Kong. And so I question - what exactly are the US State Department’s naming conventions, and would Hong Kong not be filed under a separate category?
The World Health Organisation Western Pacific Regional Office (which Hong Kong is a member of) currently singles out Hong Kong, Taiwan, and Macau in its daily case updates.

Hong Kong, as a special administrative region, has an independent COVID-19 response system, healthcare system, and immigration borders. It is also important to note that CSSE extracts Hong Kong’s data directly from HK’s Department of Health, and not from China’s CDC. If the map is meant for surveillance purposes, putting Hong Kong’s situation in the bigger Chinese picture would dilute and misrepresent the local situation. In fact, Hong Kong (along with Taiwan and Singapore) has so far been putting in place effective measures, by the government or local health experts, to contain COVID-19, despite its close proximity to China. Singling out Hong Kong’s low infection rates thus far would give academics, policy makers, and civil society the incentive to learn from HK’s practices.

@JimBudde

@JimBudde JimBudde commented Mar 16, 2020

Having worked with data from a 911 emergency system in the past, I know that when working in a fast-paced environment such as this, you don't get the luxury of developing a nice data model up front, nor do you have people entering data in a consistent manner. I have worked with this data set for at least a month now and have "rolled with the punches" on all the name changes etc. If other people want to collect this data from across the globe at their own granularity level and "QA" the info as best they can, go for it. I, for one, am grateful for what the group is doing.

@ZodinDevelopment

@ZodinDevelopment ZodinDevelopment commented Mar 16, 2020

First thing, thank you to everyone here. Thank you to the men and women working hard to make this possible. It's clear that there is work to be done here, and it's also clear that universal access is a factor here. I may be out of my reckoning, but I am interested in finding a creative solution and indexing by GPS coordinates at varying levels of granularity. If this would help I would be happy to dedicate my full time and energy to this. Words can't express how glad I was to see this repo. Let's keep up the good work everyone!
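
One crude reading of "indexing by GPS coordinates at varying levels of granularity" is to snap each coordinate to a grid whose resolution is set by the number of decimal places. The sketch below does only that; `grid_key` is a hypothetical helper, the sample point is an approximate location in Hong Kong used purely as input, and the resulting cells are not equal-area.

```python
def grid_key(lat, lon, decimals=1):
    """Snap a coordinate to the nearest grid point and return it as text.

    Fewer decimals give a coarser grid: 0 is whole degrees, 1 is tenths.
    """
    def fmt(value):
        # Adding 0.0 turns a rounded -0.0 into 0.0 so keys stay consistent.
        return f"{round(value, decimals) + 0.0:.{decimals}f}"

    return f"{fmt(lat)},{fmt(lon)}"


# Approximate sample coordinates, keyed at two granularities.
print(grid_key(22.3193, 114.1694, decimals=1))
print(grid_key(22.3193, 114.1694, decimals=0))
```

Such keys sidestep naming disputes, but they would still need a separate lookup to attach display names.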

@forzagreen

@forzagreen forzagreen commented Mar 18, 2020

From its start in January, the COVID-19 tracking map has been an open and transparent public health resource. Nearly TK billion people from nearly every nation on the planet have visited the map. During a comprehensive review of the dashboard on March 10, Professor Lauren Gardner and her team decided to align the names of nations with the World Health Organization’s naming conventions to achieve consistency in reporting. Upon reconsideration, the team is now using U.S. State Department names.

I agree with aligning names with the World Health Organization's naming.
It's not the case now. For example, check this issue: #977
