colinmorris/unique-country-prefixes

What is the shortest prefix that uniquely identifies the name of each country? (I'm using "prefix" in the computer science sense, so, for example, S, SP, SPA, and SPAI are all prefixes of SPAIN, as well as SPAIN itself, and the empty string, ε.)

An alternative formulation: if you had an autocomplete field for choosing a country, what is the shortest sequence of letters you would have to type before your options are narrowed down to one specific country?

Example: SWE is the shortest unique prefix for Sweden. Sweden is the only country that begins with this letter sequence, and there is no shorter prefix that has this property (because, for example, SW is shared with Switzerland).
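As a rough illustration of the definition above, here is a minimal Python sketch that counts how many names share each prefix and then picks, for each name, the first prefix that belongs to it alone. The function name `shortest_unique_prefixes` and the five-name sample are mine, not taken from the notebook, and comparing names in upper case is an assumption on my part.

```python
from collections import Counter

def shortest_unique_prefixes(names):
    """Map each name to its shortest unique prefix, or None if it has none."""
    # Assumption: names are compared case-insensitively, in upper case.
    upper = [n.upper() for n in names]
    # Count how many names start with each possible non-empty prefix.
    counts = Counter(u[:i] for u in upper for i in range(1, len(u) + 1))
    result = {}
    for name, u in zip(names, upper):
        # The first prefix seen exactly once is the shortest unique one.
        result[name] = next(
            (u[:i] for i in range(1, len(u) + 1) if counts[u[:i]] == 1),
            None,
        )
    return result

# Hand-picked sample, not the full UN list.
sample = ["Sweden", "Switzerland", "Spain", "Niger", "Nigeria"]
print(shortest_unique_prefixes(sample))
```

A name that is itself a prefix of another name (Niger and Nigeria in this sample) never reaches a count of one, so it maps to `None`.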

Some possibly surprising facts:

  • There are 3 countries that are uniquely specified by their first letter
  • The 2 countries with the longest shortest unique prefixes require 13 characters (including spaces) to distinguish: REPUBLIC OF _
  • There are 2 countries whose shortest unique prefix is not "proper" - i.e. it is the whole name of the country. (3 if you count Iran - see information on data sources below.)
  • There are 3 countries that have no unique prefix!
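The categories in the list above can be tallied with a sketch like the following. It relies on the observation that, in sorted order, the name sharing the longest common prefix with a given name is always one of its two neighbours. The names `prefix_lengths` and `summarize` and the short sample list are hypothetical; this is not necessarily how the notebook does it.

```python
def prefix_lengths(names):
    """Return {NAME: length of its shortest unique prefix, or None}."""
    ordered = sorted(n.upper() for n in names)

    def lcp(a, b):
        # Length of the longest common prefix of a and b.
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    lengths = {}
    for i, name in enumerate(ordered):
        shared = 0
        if i > 0:
            shared = max(shared, lcp(name, ordered[i - 1]))
        if i + 1 < len(ordered):
            shared = max(shared, lcp(name, ordered[i + 1]))
        needed = shared + 1  # one character past the longest shared prefix
        lengths[name] = needed if needed <= len(name) else None
    return lengths

def summarize(names):
    """Group names into the categories discussed in the list above."""
    lengths = prefix_lengths(names)
    return {
        "single_letter": sorted(n for n, k in lengths.items() if k == 1),
        "whole_name": sorted(n for n, k in lengths.items() if k == len(n)),
        "no_unique_prefix": sorted(n for n, k in lengths.items() if k is None),
    }

# Hand-picked sample, not the full UN list.
sample = ["Oman", "Iraq", "Iran (Islamic Republic of)", "Mali", "Malta",
          "Niger", "Nigeria"]
print(summarize(sample))
```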

Data

For the purposes of this experiment, I used the 193 United Nations member states as of July 2022. I used the English name listed on the UN website, which may differ from the country's endonym or official full English name (e.g. 'Germany', rather than 'Deutschland' or 'Federal Republic of Germany'). Most of the forms used here are the recognizable ones used in everyday conversation, though there are a few exceptions (e.g. the country commonly known as Turkey has requested to be referred to as Türkiye as of May 2022).

I used the repository cristiroma/countries for two purposes:

  • For the flag icons used in the generated infographics. To load these images locally, you'll need to clone the countries repo under the root of this repo.
  • To generate countries.csv. I started from the file at countries/data/csv/countries.csv, then manually winnowed it down to just the UN member states, and manually updated the first 'name' column for a couple states to match the form currently used by the UN.
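For reference, reading the names back out of countries.csv might look like the sketch below. It assumes the file has a header row and that the name is the first column, as described above. The helper name `load_names` and the `code` column in the demo data are hypothetical.

```python
import csv
import io

def load_names(csv_file):
    """Read country names from the first column of an open CSV file."""
    reader = csv.reader(csv_file)
    next(reader, None)  # assumption: the first row is a header
    # Keep the first column of every non-empty row.
    return [row[0].strip() for row in reader if row and row[0].strip()]

# Stand-in for the real file; with the real data you would pass
# open("countries.csv", newline="", encoding="utf-8") instead.
demo = io.StringIO("name,code\nSweden,SE\nSwitzerland,CH\n")
print(load_names(demo))
```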

Infographic generation

The IPython notebook prefixes.ipynb generates an html file pres.html.

I then convert this into a pdf using Chrome's 'print to pdf' feature.

Then I convert that to an image using an ImageMagick invocation along these lines:

convert -density 150 -trim pres.pdf -quality 100 pres.png

(Grossly circuitous, I know.)

The notebook also generates another html file, spoilers.html, which is the 'answer' key version of the infographic with the full name of each country. It differs from the other html file in a few ways:

  • It includes the css file sstyles.css, which has some specific rules, e.g. making visible the "suffix" elements that show the remainder of a country's name after its shortest unique prefix.
  • It excludes the explanatory "preamble" text under the title

There is some ad-hoc fine-tuning that goes into the image conversion process. When printing to pdf, I'll generally set margins to 'none', and may fiddle with a custom 'scale' setting. When running ImageMagick, I may or may not need to do some cropping of margins (either automatically using the -trim flag, or manually with -crop).

