# First Administrative Divisions of Countries

**[Work in progress]**

This notebook creates a CSV file of first-level administrative divisions (e.g., states, provinces, municipalities) of countries for ingestion into the Knowledge Graph.

Data source: [GeoNames.org](https://download.geonames.org/export/dump/)

Author: Peter Rose (pwrose@ucsd.edu)

In [1]:
import os
from pathlib import Path
import pandas as pd

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

In [3]:
NEO4J_IMPORT = Path(os.getenv('NEO4J_IMPORT'))
print(NEO4J_IMPORT)

/Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b9d10363-6d59-4deb-9595-2cb904a99d1d/installation-4.1.0/import
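If `NEO4J_IMPORT` is not set, `os.getenv` returns `None` and `Path(None)` raises a `TypeError` that does not name the cause. The following is a minimal sketch of a more explicit lookup; the helper name `resolve_import_dir` is hypothetical and not part of the original notebook.

```python
import os
from pathlib import Path


def resolve_import_dir(env_var="NEO4J_IMPORT"):
    """Return the directory named by env_var, or raise a readable error.

    Hypothetical helper, not part of the original notebook.
    """
    value = os.getenv(env_var)
    if not value:
        # Path(None) would raise a TypeError without naming the variable.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    path = Path(value)
    if not path.is_dir():
        raise RuntimeError(f"{env_var} points to a missing directory: {path}")
    return path
```

With this helper, the cell above could read `NEO4J_IMPORT = resolve_import_dir()`.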


### Create admin1

In [4]:
admin1_url = 'https://download.geonames.org/export/dump/admin1CodesASCII.txt'

In [5]:
names = ['code', 'name', 'name_ascii', 'geonameid']

In [6]:
admin1 = pd.read_csv(admin1_url, sep='\t', dtype='str', names=names)
admin1 = admin1[['code', 'name_ascii', 'geonameid']]
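To try the parsing step without a network connection, the same `read_csv` call can be run on a few in-memory rows. This is a sketch: the codes, ASCII names, and GeoNames ids are taken from the outputs shown in this notebook, while the values in the `name` column are illustrative. The `keep_default_na=False` argument is an addition that is not in the notebook; it stops pandas from converting strings such as `NA` into missing values.

```python
import io

import pandas as pd

# Tab-separated rows in the four-column layout used above.
# The 'name' values are illustrative; the other values appear in this notebook.
sample_text = (
    "AD.06\tSant Julia de Loria\tSant Julia de Loria\t3039162\n"
    "US.DC\tWashington, D.C.\tWashington, D.C.\t4138106\n"
    "US.MO\tMissouri\tMissouri\t4398678\n"
)

names = ["code", "name", "name_ascii", "geonameid"]

sample_df = pd.read_csv(
    io.StringIO(sample_text),
    sep="\t",
    dtype="str",            # keep ids as strings
    names=names,            # the file has no header row
    keep_default_na=False,  # assumption: treat 'NA' and similar as plain text
)
sample_df = sample_df[["code", "name_ascii", "geonameid"]]
print(sample_df)
```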

### Standardize column names for Knowledge Graph
* id: unique identifier for the first administrative division (e.g., `US.DC`)
* name: name of node
* parentId: unique identifier for the parent country (e.g., `US`)
* properties: property names in camelCase

In [7]:
# 'id' is the standard column used to link nodes
admin1 = admin1.rename(columns={'code': 'id', 'name_ascii': 'name', 'geonameid': 'geonameId'})
id_parts = admin1['id'].str.split('.', expand=True)  # e.g., 'US.DC' -> 'US', 'DC'
admin1['code'] = id_parts[1]
admin1['parentId'] = id_parts[0]
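The split above assumes every id has the form `<country>.<division>`. As a sketch of how that assumption could be checked, the example below splits a few ids and lists any that lack a dot. The id `XX` is a made-up malformed value, and the `n=1` argument (split at the first dot only) is an assumption not used in the notebook.

```python
import pandas as pd

# 'XX' is a hypothetical malformed id; the others appear in this notebook.
df = pd.DataFrame({"id": ["AD.06", "US.DC", "US.MO", "XX"]})

# Split once at the first dot; rows without a dot get a missing second part.
parts = df["id"].str.split(".", n=1, expand=True)
df["code"] = parts[1]
df["parentId"] = parts[0]

# Ids that could not be split into country and division.
malformed = df.loc[df["code"].isna(), "id"].tolist()
print(malformed)
```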

### Use "District of Columbia" to be consistent with US Census

In [8]:
admin1['name'] = admin1['name'].str.replace('Washington, D.C.', 'District of Columbia', regex=False)  # literal match, '.' is not a wildcard
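Whether `str.replace` treats the pattern as a regular expression depends on the `regex` argument, whose default changed between pandas versions. The sketch below shows the difference on made-up values: with `regex=True`, each `.` in the pattern matches any character, so the invented string `Washington, DXCY` is also replaced.

```python
import pandas as pd

# 'Washington, DXCY' is an invented value used only to show the regex pitfall.
names_s = pd.Series(["Washington, D.C.", "Washington, DXCY", "Missouri"])

# Literal replacement: only the exact string is changed.
literal = names_s.str.replace("Washington, D.C.", "District of Columbia", regex=False)

# Regex replacement: '.' matches any character, so 'DXCY' matches as well.
as_regex = names_s.str.replace("Washington, D.C.", "District of Columbia", regex=True)

print(literal.tolist())
print(as_regex.tolist())
```

Passing `regex=False` explicitly keeps the result the same regardless of the installed pandas version.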

### Example

In [9]:
admin1.query("id == 'US.DC'")

| | id | name | geonameId | code | parentId |
|---|---|---|---|---|---|
| 3658 | US.DC | District of Columbia | 4138106 | DC | US |


In [10]:
admin1.query("name == 'Missouri'")

| | id | name | geonameId | code | parentId |
|---|---|---|---|---|---|
| 3665 | US.MO | Missouri | 4398678 | MO | US |


### Export a minimum subset for now

In [12]:
admin1 = admin1[['id', 'name', 'code', 'parentId', 'geonameId']]
admin1 = admin1.fillna('')  # replace missing values with empty strings

In [13]:
admin1.head()

| | id | name | code | parentId | geonameId |
|---|---|---|---|---|---|
| 0 | AD.06 | Sant Julia de Loria | 6 | AD | 3039162 |
| 1 | AD.05 | Ordino | 5 | AD | 3039676 |
| 2 | AD.04 | La Massana | 4 | AD | 3040131 |
| 3 | AD.03 | Encamp | 3 | AD | 3040684 |
| 4 | AD.02 | Canillo | 2 | AD | 3041203 |


In [14]:
admin1.to_csv(NEO4J_IMPORT / "00f-GeoNamesAdmin1.csv", index=False)
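Because `id` links nodes in the Knowledge Graph, a few sanity checks on the table can be useful before it is written with `to_csv`. The following is a minimal sketch under that assumption; the function `check_admin1` and its specific checks are hypothetical and not part of the original notebook. The sample rows come from the outputs shown above.

```python
import pandas as pd

REQUIRED = ["id", "name", "code", "parentId", "geonameId"]


def check_admin1(df):
    """Return a list of problems found in an admin1 table (empty if none).

    Hypothetical helper, not part of the original notebook.
    """
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if df["id"].duplicated().any():
        problems.append("duplicate id values")
    if (df["id"] != df["parentId"] + "." + df["code"]).any():
        problems.append("id does not equal parentId.code")
    if (df["name"] == "").any():
        problems.append("empty name")
    return problems


sample = pd.DataFrame(
    {
        "id": ["US.DC", "US.MO"],
        "name": ["District of Columbia", "Missouri"],
        "code": ["DC", "MO"],
        "parentId": ["US", "US"],
        "geonameId": ["4138106", "4398678"],
    }
)
problems = check_admin1(sample)
print(problems)
```

An empty list means that none of the checks failed; in the notebook, the call would be `check_admin1(admin1)`.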