This is the notebook used to create the country code reference, matching the country name strings in Bookworm (BW) to the proper ISO 3166-1 alpha-3 codes.

I use historic country codes here for Burma, East/West Germany, the Soviet Union, etc. For a map that spans decades but shows modern borders, it is probably safe to group some countries under their modern parallels, like `("East Germany", "DEU"), ("West Berlin", "DEU"), ("Burma", "MMR")`. I wouldn't normalize countries that have since split (e.g. don't choose a modern country to point Serbia and Montenegro or Czechoslovakia to), except perhaps `('Soviet Union', 'RUS')`.
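
The grouping described above could be applied as a post-processing step on the finished table. The following is only a sketch: `HISTORIC_TO_MODERN` and `modernize_codes` are hypothetical names that do not appear in this notebook, and the example frame is made-up data with the same two columns as the output CSV.

```python
import pandas as pd

# Hypothetical mapping from historic codes to modern parallels.
# Split countries (SCG, CSK) are deliberately left out.
HISTORIC_TO_MODERN = {
    "DDR": "DEU",  # East Germany -> Germany
    "BUR": "MMR",  # Burma -> Myanmar
}

def modernize_codes(codes_df, mapping=HISTORIC_TO_MODERN):
    """Return a copy of codes_df with historic codes swapped for modern ones."""
    out = codes_df.copy()
    out["code"] = out["code"].replace(mapping)
    return out

# Made-up rows in the same shape as country_codes_df
example = pd.DataFrame(
    [("East Germany", "DDR"), ("West Berlin", "DEU"),
     ("Burma", "BUR"), ("Czechoslovakia", "CSK"), ("Soviet Union", "SUN")],
    columns=["publication_country", "code"],
)

modern = modernize_codes(example)

# Opting in to the lossy Soviet Union -> Russia grouping
modern_with_sun = modernize_codes(example, {**HISTORIC_TO_MODERN, "SUN": "RUS"})
```

Keeping the historic codes in the CSV and modernizing only at plot time means the lossy choice stays reversible.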

Currently, Republic of China is mapped to Taiwan (`TWN`), which is accurate today but likely misleading for pre-revolution publications, when the name referred to mainland China.

In [None]:
import bwypy
import pandas as pd
bwypy.set_options(database='Bookworm2016', endpoint='https://bookworm.htrc.illinois.edu/cgi-bin/dbbindings.py')
bw = bwypy.BWQuery(verify_fields=False)

# Grab all the countries in BW
bw.counttype = ['TextCount']
bw.search_limits = { 'publication_country__id': {'$lt': 300} }
bw.groups = ['publication_country']
results = bw.run()
df = results.frame(index=False, drop_unknowns=True)
df.head()

# A pretty comprehensive list from Github
country_codes = (pd.read_csv('https://raw.githubusercontent.com/datasets/country-codes/master/data/country-codes.csv')
                 .rename(columns={'official_name_en':'publication_country', 'ISO3166-1-Alpha-3':'code'}))

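# Fold the country names from three sources:
#   a: BW names that match the reference list's short 'name' column
#   b: BW names that match the official English name (renamed to 'publication_country' above)
#   c: manual mappings for names that match neither, including historic countries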
a = pd.merge(df, country_codes[['name','code']], left_on='publication_country', right_on='name')[['publication_country', 'code']]
b = pd.merge(df, country_codes[['publication_country','code']])[['publication_country', 'code']]
c = pd.DataFrame([('USA', 'USA'), ('United States', 'USA'),
                  ('United Kingdom', 'GBR'),
                  ('Republic of China', 'TWN'),
                  ('Soviet Union', 'SUN'),
                  ('Democratic Republic of Congo', 'COD'),
                  ("East Germany", "DDR"),
                  ("West Berlin", "DEU"),
                  ("Republic of Georgia", "GEO"),
                  ("Burma", "BUR"),
                  ('Serbia and Montenegro', 'SCG'),
                  ('Czechoslovakia', 'CSK'),
                  ('Armenia (Republic)', 'ARM'),
                  ('Macao', 'MAC')
                  ],
                 columns=['publication_country','code'])

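# Combine all three sources; a name matched by both a and b collapses to one row
country_codes_df = pd.concat([a,b,c]).drop_duplicates()
# BW countries that still have no code; inspect these to extend the manual list in c
remaining_uncoded = df[~df.publication_country.isin(country_codes_df.publication_country)]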
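
Before writing the CSV, it may be worth checking the combined table for gaps and conflicts. The sketch below reproduces the merge, concat and anti-join pattern from the cell above on toy data so it runs without the Bookworm endpoint. All frames and the name `Atlantis` are made up for illustration, and the `conflicts` check is an addition that is not part of the original cell.

```python
import pandas as pd

# Toy stand-ins for the BW results, the reference list, and the manual list
bw_countries = pd.DataFrame({"publication_country": ["France", "Burma", "Atlantis"]})
reference = pd.DataFrame({"publication_country": ["France"], "code": ["FRA"]})
manual = pd.DataFrame([("Burma", "BUR")], columns=["publication_country", "code"])

# Same pattern as above: merge on the name, then append the manual rows
matched = pd.merge(bw_countries, reference, on="publication_country")
coded = pd.concat([matched, manual]).drop_duplicates()

# Names that still have no code (the equivalent of remaining_uncoded)
uncoded = bw_countries[~bw_countries.publication_country.isin(coded.publication_country)]

# Names that ended up with more than one distinct code
conflicts = coded[coded.duplicated("publication_country", keep=False)]

print(uncoded)
print(conflicts)
```

`drop_duplicates()` only removes fully identical rows, so a name that picks up two different codes from different sources would survive it and show up in `conflicts`.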

In [None]:
country_codes_df.to_csv('data/country_codes.csv', index=False)