# Countries

**[Work in progress]**

This notebook creates a .csv file with country information for ingestion into the Knowledge Graph.

Data source: [GeoNames.org](https://download.geonames.org/export/dump/)

Author: Peter Rose (pwrose@ucsd.edu)

In [1]:
import os
from pathlib import Path
import pandas as pd

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columsns

In [3]:
NEO4J_IMPORT = Path(os.getenv('NEO4J_IMPORT'))
print(NEO4J_IMPORT)

/Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-19636412-9e74-4bac-8a4c-c6c8b49bb9d3/installation-4.1.0/import


### Create countries

In [4]:
country_url = 'https://download.geonames.org/export/dump/countryInfo.txt'

In [5]:
names = ['ISO','ISO3','ISO-Numeric','fips','Country','Capital','Area(in sq km)','Population',
         'Continent','tld','CurrencyCode','CurrencyName','Phone','Postal Code Format',
         'Postal Code Regex','Languages','geonameid','neighbours','EquivalentFipsCode'
        ]

In [6]:
countries = pd.read_csv(country_url, sep='\t',comment='#', dtype='str', names=names)

### Add missing data

Add missing iso code for Namibia

In [7]:
index = countries.query("ISO3 == 'NAM'").index
countries.at[index, 'ISO'] = 'NA'
countries.head()

Unnamed: 0,ISO,ISO3,ISO-Numeric,fips,Country,Capital,Area(in sq km),Population,Continent,tld,CurrencyCode,CurrencyName,Phone,Postal Code Format,Postal Code Regex,Languages,geonameid,neighbours,EquivalentFipsCode
0,AD,AND,20,AN,Andorra,Andorra la Vella,468,77006,EU,.ad,EUR,Euro,376,AD,,,,,
1,AE,ARE,784,AE,United Arab Emirates,Abu Dhabi,82880,9630959,AS,.ae,AED,Dirham,971,,,"ar-AE,fa,en,hi,ur",290557.0,"SA,OM",
2,AF,AFG,4,AF,Afghanistan,Kabul,647500,37172386,AS,.af,AFN,Afghani,93,,,"fa-AF,ps,uz-AF,tk",1149361.0,"TM,CN,IR,TJ,PK,UZ",
3,AG,ATG,28,AC,Antigua and Barbuda,St. John's,443,96286,,.ag,XCD,Dollar,+1-268,,,en-AG,3576396.0,,
4,AI,AIA,660,AV,Anguilla,The Valley,102,13254,,.ai,XCD,Dollar,+1-264,,,en-AI,3573511.0,,


### Standardize column names for Knowlege Graph
* id: unique identifier for country
* name: name of node
* parentId: unique identifier for continent
* properties: camelCase

In [8]:
# https://www.iso.org/obp/ui/#iso:code:3166:BQ

In [9]:
countries['id'] = countries['ISO'] # standard id column to link nodes
countries.rename(columns={'ISO': 'iso'}, inplace=True)
countries.rename(columns={'ISO3': 'iso3'}, inplace=True)
countries.rename(columns={'ISO-Numeric': 'isoNumeric'}, inplace=True)
countries.rename(columns={'Country': 'name'}, inplace=True)
countries.rename(columns={'Population': 'population'}, inplace=True)
countries.rename(columns={'Area(in sq km)': 'areaSqKm'}, inplace=True)
countries.rename(columns={'geonameid': 'geonameId'}, inplace=True)

### Export a minimum subset for now

In [10]:
countries = countries[['id','name','iso','iso3','isoNumeric','areaSqKm','geonameId']].copy()
countries.fillna('', inplace=True)

In [11]:
countries.head(300)

Unnamed: 0,id,name,iso,iso3,isoNumeric,areaSqKm,geonameId
0,AD,Andorra,AD,AND,20,468.0,
1,AE,United Arab Emirates,AE,ARE,784,82880.0,290557.0
2,AF,Afghanistan,AF,AFG,4,647500.0,1149361.0
3,AG,Antigua and Barbuda,AG,ATG,28,443.0,3576396.0
4,AI,Anguilla,AI,AIA,660,102.0,3573511.0
5,AL,Albania,AL,ALB,8,28748.0,
6,AM,Armenia,AM,ARM,51,29800.0,
7,AO,Angola,AO,AGO,24,1246700.0,3351879.0
8,AQ,Antarctica,AQ,ATA,10,14000000.0,6697173.0
9,AR,Argentina,AR,ARG,32,2766890.0,


In [12]:
countries.to_csv(NEO4J_IMPORT / "00e-GeoNamesCountry.csv", index=False)