Consider checking the accuracy of conversion data against additional sources #25

Open
rymach opened this issue Mar 2, 2020 · 6 comments

Comments

@rymach
Contributor

rymach commented Mar 2, 2020

http://api.geonames.org/countryInfoJSON?formatted=true&lang=en&username=demo&style=full
(may need to make an account)
https://nsgreg.nga.mil/restApi/GeopoliticalCorrelationResources.jsp

@rymach
Contributor Author

rymach commented Mar 3, 2020

I checked our mappings.csv against the nsgreg.nga.mil data from above. Here are the geospatial discrepancies:

AS AUS AUS The correlation is part-to-whole, because the territory included under the GENC entry named "AUSTRALIA" is only a part of the territory included under the ISO 3166-1 entry named "AUSTRALIA".
CH CHN CHN The correlation is part-to-whole, because the GENC Standard contains separate top-level entries for three territories that ISO 3166-2 includes as administrative subdivisions of China. In the GENC Standard, those territories are represented exclusively by the entries named "HONG KONG", "MACAU" and "TAIWAN". The GENC Standard considers top-level entries to be territorially mutually exclusive. Therefore, the GENC Standard considers the entry named "CHINA" as specifically excluding the geospatial extent of Hong Kong, Macau, and Taiwan, and thus the GENC entry covers only a part of the territory covered by the ISO 3166-1 entry named "CHINA". Note that territorial overlaps are allowed among the entries of ISO 3166-1 (see ISO 3166-1:2006, Clause 4.3 "Overlaps"; and ISO 3166-2:2007, Clause 4.1.2 (on subdivision)). Hong Kong, Macau, and Taiwan are represented in ISO 3166-1, but are also indicated in ISO 3166-2 as administrative subdivisions of the ISO 3166-1 entry named "CHINA". Note also that the GENC Standard specifically excludes the geospatial extent of the Paracel Islands and Spratly Islands from coverage under the entry named "CHINA"; the GENC Standard represents them with distinct entries ("PARACEL ISLANDS" and "SPRATLY ISLANDS").
CU CUB CUB The correlation is part-to-whole, because the GENC Standard specifically excludes the geospatial extent of the U.S.-leased area housing the U.S. Naval Base at Guantanamo Bay from the GENC entry named "CUBA". ISO 3166 does not contain a distinct entry for the leased area around Guantanamo Bay; therefore, the GENC Standard entry named "CUBA" does not include all of the territory included under the ISO 3166-1 entry named "CUBA".
FR FRA FRA The correlation is part-to-whole, because the GENC Standard considers the geopolitical entity named "FRANCE" as excluding the geospatial extent of the dependency of Clipperton Island; the overseas departments of French Guiana, Guadeloupe, Martinique, Mayotte, and Reunion; the overseas territorial collectivities of French Polynesia, New Caledonia, Saint Barthelemy, Saint Martin, Saint Pierre and Miquelon, and Wallis and Futuna; and the overseas territory of the French Southern Lands. Each of those geographic territories is represented by a distinct entry in the GENC Standard. Note that ISO 3166-1 states that the entry named "FRANCE" includes all of those geographic territories, and ISO 3166-2 specifies code elements for them as administrative subdivisions of France.
FS ATF ATF Non-primary correlation to the ISO 3166-2 entry named ('fr'/'fra') "Terres australes françaises".
MY MYS MYS The correlation is part-to-whole, because the GENC Standard specifically excludes the geospatial extent of the Spratly Islands from the entry named "MALAYSIA", while the ISO 3166/MA may consider some territory of the Spratly Islands to be included under the ISO 3166-1 entry named "MALAYSIA".
NL NLD NLD The correlation is part-to-whole, because the GENC Standard considers the entry named "NETHERLANDS" as specifically excluding the geospatial extents of the countries Aruba, Curaçao, and Sint Maarten, and excluding the geospatial extent of the group of Caribbean special municipalities (Bonaire, Sint Eustatius and Saba); each of those four areas has a separate entry at the top-level of the GENC Standard (which assumes no overlap among areas denoted by top-level entries). In contrast, the ISO 3166/MA considers the ISO 3166-1 entry for "NETHERLANDS" to include all of those territories, because all are included in ISO 3166-2 as administrative subdivisions of the Netherlands (Aruba, Curaçao, and Sint Maarten as type "country"; and Bonaire, Sint Eustatius, and Saba individually as type "special municipality").
NO NOR NOR The correlation is part-to-whole because the GENC Standard considers the entry named "NORWAY" as specifically excluding the geospatial extent of Jan Mayen and Svalbard, each of which has a separate entry in the top level of the GENC Standard.
RP PHL PHL The correlation is part-to-whole, because the GENC Standard specifically excludes the geospatial extent of the Spratly Islands from the entry named "PHILIPPINES"; the GENC Standard represents them with a distinct entry ("SPRATLY ISLANDS"). The ISO 3166/MA may consider some territory of the Spratly Islands to be included under the ISO 3166-1 entry named "PHILIPPINES".
RI SRB SRB The correlation is part-to-whole, because the GENC Standard considers the entry named "SERBIA" as specifically excluding the geographic territory of Kosovo, which has a separate top-level entry in the GENC Standard ("KOSOVO"), while the ISO 3166/MA considers the entry named "SERBIA" to include the territory of Kosovo, with ISO 3166-2 code elements specified for an administrative subdivision of Serbia (of type "autonomous province") named ('sr'/'srp') "Kosovo-Metohija".
SV XSV SJM This primary correlation is part-to-whole because the GENC Standard includes a separate entry for Svalbard, while ISO 3166-1 contains a single composite entry named "SVALBARD AND JAN MAYEN". A non-primary correlation is also established from the GENC entry named "SVALBARD" to the ISO 3166-2 entry named ('nn'/'nno') "Svalbard".
TW TWN TWN The correlation is part-to-whole, because the GENC Standard specifically excludes the geospatial extent of the Paracel Islands and of the Spratly Islands from the GENC entry named "TAIWAN"; the GENC Standard represents both with distinct entries ("PARACEL ISLANDS" and "SPRATLY ISLANDS"). The GENC Standard does not include Taiwan as a province under the entry named "CHINA". The GENC Standard does not include Taiwan in the geospatial extent of the entry named "CHINA", while in contrast, the ISO 3166/MA considers Taiwan to be included under the entry named "CHINA" and in ISO 3166-2 specifies code elements for a corresponding administrative subdivision (of type "province").
US USA USA The correlation is geospatially part-to-whole, because the GENC Standard considers the entry named "UNITED STATES" as specifically excluding U.S. overseas areas (that is: the geospatial extent of the United States includes the 50 U.S. states and the capital District of Columbia), while ISO 3166-2 includes those territories as subdivisions of the United States.
VM VNM VNM The correlation is part-to-whole, because the GENC Standard specifically excludes the geospatial extent of the Paracel Islands and of the Spratly Islands from the GENC entry named "VIETNAM". The Paracel Islands in the South China Sea are claimed by China (including Taiwan) and by Vietnam. The Spratly Islands in the South China Sea are claimed in their entirety by China (including Taiwan) and Vietnam, and in part by Brunei, Malaysia, and Philippines. The GENC Standard represents both with distinct entries ("PARACEL ISLANDS" and "SPRATLY ISLANDS"). The ISO 3166/MA may consider some territory of the Paracel Islands or Spratly Islands to be included under the ISO 3166-1 entry named "VIET NAM".
IP CPT CPT The GENC entry named "CLIPPERTON ISLAND" is correlated as part-to-whole with the ISO 3166-1 entry named "FRANCE". The GENC Standard uses user-assigned code elements to refer specifically to the entry named "CLIPPERTON ISLAND", while ISO 3166-1 states that Clipperton Island is included under the entry named "FRANCE", and ISO 3166-2 specifies code elements for an administrative subdivision (of type "dependency") named ('fr'/'fra') "Clipperton".
JN XJM SJM This primary correlation is part-to-whole because the GENC Standard includes a separate entry for Jan Mayen, while ISO 3166-1 contains a single composite entry named "SVALBARD AND JAN MAYEN". Note: A non-primary correlation is established from the GENC entry named "JAN MAYEN" to the ISO 3166-2 entry named ('nn'/'nno') "Jan Mayen".
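
For anyone who wants to repeat this check, here is a minimal sketch of how rows like the ones above could be split and filtered. It assumes each row is three whitespace-separated codes (a two-letter code, a GENC three-letter code, an ISO 3166-1 alpha-3 code) followed by a free-text note; that column reading is my interpretation of the list, and the names `Correlation`, `parse_line`, `is_part_to_whole` and `codes_differ` are made up for illustration. The sample notes are abbreviated.

```python
from typing import NamedTuple


class Correlation(NamedTuple):
    two_letter: str  # assumed: two-letter code used in mappings.csv
    genc3: str       # assumed: GENC three-letter code
    iso3: str        # assumed: ISO 3166-1 alpha-3 code
    note: str        # free-text explanation from the NGA registry


def parse_line(line: str) -> Correlation:
    """Split a row into its three leading codes and the trailing note."""
    two_letter, genc3, iso3, note = line.split(maxsplit=3)
    return Correlation(two_letter, genc3, iso3, note)


def is_part_to_whole(row: Correlation) -> bool:
    """True when the note describes a part-to-whole correlation."""
    return "part-to-whole" in row.note


def codes_differ(row: Correlation) -> bool:
    """True when the GENC and ISO three-letter codes are not identical."""
    return row.genc3 != row.iso3


# Abbreviated copies of three rows from the list above.
SAMPLE = [
    "SV XSV SJM This primary correlation is part-to-whole because ...",
    "FS ATF ATF Non-primary correlation to the ISO 3166-2 entry ...",
    "NO NOR NOR The correlation is part-to-whole because ...",
]

rows = [parse_line(line) for line in SAMPLE]
part_to_whole = [row.two_letter for row in rows if is_part_to_whole(row)]
mismatched = [row.two_letter for row in rows if codes_differ(row)]
```

The two filters answer different questions: `part_to_whole` lists entries whose territory differs between the standards even when the codes match, while `mismatched` lists entries where the codes themselves differ.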

rymach changed the title from "Consider checking the accuracy of conversion data against geonames" to "Consider checking the accuracy of conversion data against additional sources" on Mar 3, 2020

@bdeining
Member

bdeining commented Mar 5, 2020

I wonder if we might be able to pull the country info file down during the build process and use it at runtime. That might eliminate the need for us to maintain our own mappings.

@rymach
Contributor Author

rymach commented Mar 5, 2020

> I wonder if we might be able to pull the country info file down during the build process and use it at runtime. That might eliminate the need for us to maintain our own mappings.

That's what I was thinking. I was working on something like that here:
https://github.com/rymach/countrycode-python/blob/master/process.py

Right now it outputs JSON: https://github.com/rymach/countrycode-python/blob/master/cc.json

But I think it should output a .java file with one big enum, like:
https://github.com/TakahikoKawasaki/nv-i18n/blob/master/src/main/java/com/neovisionaries/i18n/CountryCode.java
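
A rough sketch of what that generation step might look like, in Python since process.py already is. This is not what process.py does today. It assumes each record is a dict with `countryCode`, `isoAlpha3`, `fipsCode` and `countryName` keys (the field names I believe the GeoNames countryInfoJSON response uses, so check them against a real response), and `render_enum`, `java_string` and `GeneratedCountryCode` are hypothetical names.

```python
import json


def java_string(value: str) -> str:
    """Quote a value as a Java string literal.

    json.dumps escapes quotes, backslashes and non-ASCII characters in a
    form that Java also accepts for ordinary text.
    """
    return json.dumps(value)


def render_enum(records, enum_name="GeneratedCountryCode"):
    """Render country records as the source text of one Java enum."""
    constants = []
    for rec in sorted(records, key=lambda r: r["countryCode"]):
        constants.append(
            "    {}({}, {}, {})".format(
                rec["countryCode"],
                java_string(rec["isoAlpha3"]),
                java_string(rec.get("fipsCode", "")),
                java_string(rec["countryName"]),
            )
        )
    lines = [
        f"public enum {enum_name} {{",
        ",\n".join(constants) + ";",
        "",
        "    private final String alpha3;",
        "    private final String fips;",
        "    private final String displayName;",
        "",
        f"    {enum_name}(String alpha3, String fips, String displayName) {{",
        "        this.alpha3 = alpha3;",
        "        this.fips = fips;",
        "        this.displayName = displayName;",
        "    }",
        "",
        "    public String getAlpha3() { return alpha3; }",
        "    public String getFips() { return fips; }",
        "    public String getDisplayName() { return displayName; }",
        "}",
        "",
    ]
    return "\n".join(lines)


# Hand-written sample records, not downloaded data.
sample_records = [
    {"countryCode": "NO", "isoAlpha3": "NOR", "fipsCode": "NO", "countryName": "Norway"},
    {"countryCode": "FR", "isoAlpha3": "FRA", "fipsCode": "FR", "countryName": "France"},
]
enum_source = render_enum(sample_records)
```

Writing `enum_source` to a .java file during the build would give compile-time constants, at the cost of having to regenerate and recompile whenever the upstream data changes.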

@khlj

khlj commented Mar 31, 2020

Looks like the demo username has a daily limit of 2000 calls.
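
One way to stay under a limit like that is to cache the download on disk and only refetch when the copy is stale. A minimal sketch, assuming a one-day cache lifetime is acceptable; `load_cached` and `fetch_country_info` are hypothetical helpers, and the URL is the one from the first comment.

```python
import time
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_country_info(username: str = "demo") -> str:
    """Download the GeoNames country info JSON as text (one API call)."""
    query = urllib.parse.urlencode(
        {"formatted": "true", "lang": "en", "username": username, "style": "full"}
    )
    url = "http://api.geonames.org/countryInfoJSON?" + query
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_cached(
    cache_path: Path,
    fetch: Callable[[], str],
    max_age_seconds: float = 24 * 3600,
) -> str:
    """Return the cached text if it is fresh enough, else fetch and store it."""
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < max_age_seconds:
            return cache_path.read_text(encoding="utf-8")
    text = fetch()
    cache_path.write_text(text, encoding="utf-8")
    return text
```

A build would call something like `load_cached(Path("countryInfo.json"), fetch_country_info)`, so repeated builds on the same day cost at most one API call. Registering a dedicated username instead of using demo is probably still advisable.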
