# Cities

**[Work in progress]**

This notebook creates a CSV file with city information (cities > 1,000 citizens) for ingestion into the Knowledge Graph.

Data source: [GeoNames.org](https://download.geonames.org/export/dump/)

Author: Peter Rose (pwrose@ucsd.edu)

In [1]:
import os
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
from pathlib import Path
import pandas as pd

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

In [3]:
# NEO4J_HOME must be set in the environment; Path(None) raises a TypeError otherwise
NEO4J_HOME = Path(os.getenv('NEO4J_HOME'))
print(NEO4J_HOME)

/Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-4af96121-2328-4e2f-ba60-6d8b728a26d5/installation-4.0.3


### Read City data

In [4]:
names = [
        'geonameid','name','asciiname','alternatenames','latitude','longitude','feature class',
        'feature code','country code','cc2','admin1 code','admin2 code','admin3 code','admin4 code',
        'population','elevation','dem','timezone','modification date'
]

Read city data (> 15,000 citizens)

In [5]:
url = 'https://download.geonames.org/export/dump/cities15000.zip'
file_name = "cities15000.txt"
resp = urlopen(url)
zipfile = ZipFile(BytesIO(resp.read()))
city_15k = pd.read_csv(zipfile.open(file_name), sep="\t", low_memory=False, names=names)

Read city data (> 5,000 citizens)

In [6]:
url = 'https://download.geonames.org/export/dump/cities5000.zip'
file_name = "cities5000.txt"
resp = urlopen(url)
zipfile = ZipFile(BytesIO(resp.read()))
city_5k = pd.read_csv(zipfile.open(file_name), sep="\t", low_memory=False, names=names)

Read city data (> 1,000 citizens)

In [7]:
url = 'https://download.geonames.org/export/dump/cities1000.zip'
file_name = "cities1000.txt"
resp = urlopen(url)
zipfile = ZipFile(BytesIO(resp.read()))
city_1k = pd.read_csv(zipfile.open(file_name), sep="\t", low_memory=False, names=names)
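
The three download cells above repeat the same unzip-and-parse steps, so they could be factored into one helper. The following is only a sketch: the function name `read_geonames_table` is hypothetical, and the tiny in-memory archive with made-up rows stands in for a real download so the snippet runs without network access.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd


def read_geonames_table(zip_bytes, file_name, names):
    """Parse one tab-separated member of an in-memory zip archive (hypothetical helper)."""
    with ZipFile(BytesIO(zip_bytes)) as archive:
        with archive.open(file_name) as member:
            return pd.read_csv(member, sep="\t", low_memory=False, names=names)


# Build a tiny archive with made-up rows instead of downloading a real file.
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("demo.txt", "1\tAlpha\t100\n2\tBeta\t200\n")

demo = read_geonames_table(
    buffer.getvalue(), "demo.txt", ["geonameid", "asciiname", "population"]
)
print(demo)
```

With a real download, the call would presumably look like `read_geonames_table(urlopen(url).read(), "cities1000.txt", names)`.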

In [8]:
# TODO read city data (> 500 citizens)

In [9]:
city = pd.concat([city_15k, city_5k, city_1k])

In [10]:
city = city[['geonameid', 'asciiname', 'country code', 'admin1 code', 'admin2 code', 'population', 'elevation']]
city = city.fillna('')

#### Remove duplicates

In [11]:
city = city.drop_duplicates('geonameid')
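
As a small illustration of this step, the sketch below concatenates two made-up frames that share one `geonameid` and removes the repeated row. The data is invented for the example; `drop_duplicates` keeps the first occurrence by default, so rows from the frame listed first in `pd.concat` win.

```python
import pandas as pd

# Two made-up frames that overlap on geonameid 2.
big = pd.DataFrame({"geonameid": [1, 2], "asciiname": ["A", "B"]})
small = pd.DataFrame({"geonameid": [2, 3], "asciiname": ["B", "C"]})

combined = pd.concat([big, small])
deduplicated = combined.drop_duplicates("geonameid")  # keep='first' is the default

print(len(combined), len(deduplicated))
print(deduplicated)
```

Note that `pd.concat` keeps the original index labels of each frame, so the index of the result is not consecutive and may repeat; `reset_index(drop=True)` would renumber it if that matters.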

In [12]:
print('Number of cities', city.shape[0])

Number of cities 136582


In [13]:
def get_location_id(country, admin1, admin2):
    """Build a dot-separated location id, e.g. 'US.CA.073', skipping empty admin codes."""
    location = country
    if admin1 != '':
        location = location + '.' + admin1
    if admin2 != '':
        location = location + '.' + admin2

    return location
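
To make the behavior of `get_location_id` concrete, here is a short usage sketch. The function body is repeated so the snippet runs on its own, and the sample codes are chosen for illustration only.

```python
def get_location_id(country, admin1, admin2):
    # Same logic as the cell above, repeated so this snippet is self-contained.
    location = country
    if admin1 != '':
        location = location + '.' + admin1
    if admin2 != '':
        location = location + '.' + admin2
    return location


full_id = get_location_id('US', 'CA', '073')    # all three levels present
admin1_only = get_location_id('GT', '22', '')   # no admin2 code
country_only = get_location_id('AD', '', '')    # no admin codes at all

print(full_id, admin1_only, country_only)
```

Empty codes are simply skipped, so the id is only as deep as the available administrative levels.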

### Standardize column names for Knowledge Graph
* id: unique identifier for city (GeoNames id)
* name: name of node
* parentId: unique identifier of the parent location (country, admin1, or admin2 level)
* properties: camelCase

In [14]:
city.rename(columns={'geonameid': 'id'}, inplace=True)
city.rename(columns={'asciiname': 'name'}, inplace=True)
city['parentId'] = city.apply(lambda row: get_location_id(row['country code'], 
                                                         row['admin1 code'], 
                                                         row['admin2 code']), axis=1)

### Example

In [15]:
city.query("name == 'San Diego'")

Unnamed: 0,id,name,country code,admin1 code,admin2 code,population,elevation,parentId
4216,3621926,San Diego,CR,02,303.0,16991,,CR.02.303
23222,5391811,San Diego,US,CA,73.0,1394928,20.0,US.CA.073
7927,3669947,San Diego,CO,10,20750.0,8014,,CO.10.20750
52441,3590312,San Diego,GT,22,,557,,GT.22
52827,3602368,San Diego,HN,07,,1306,,HN.07
80408,3827294,San Diego,MX,17,,1463,,MX.17
80567,3973609,San Diego,MX,24,24.0,1065,,MX.24.024
81257,3987339,San Diego,MX,25,6.0,1240,,MX.25.006
82883,4024589,San Diego,MX,11,37.0,1407,,MX.11.037
83811,8858713,San Diego,MX,21,174.0,2026,,MX.21.174


### Export a minimal subset for now

In [16]:
city = city[['id', 'name', 'population', 'elevation', 'parentId']]
city = city.fillna('')  # reassign instead of inplace to avoid modifying a slice

In [17]:
city.head()

Unnamed: 0,id,name,population,elevation,parentId
0,3040051,les Escaldes,15853,,AD.08
1,3041563,Andorra la Vella,20430,,AD.07
2,290594,Umm Al Quwain City,62747,,AE.07
3,291074,Ras Al Khaimah City,351943,,AE.05
4,291580,Zayed City,63482,,AE.01.103


In [18]:
city.to_csv(NEO4J_HOME / "import/00h-GeoNamesCity.csv", index=False)
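
One way to sanity-check an export like this is to read the file back and confirm that empty fields and identifiers survive the round trip. The sketch below is an assumption-laden illustration, not part of the original workflow: it writes two sample rows (values taken from the `city.head()` output above) to a temporary file with a hypothetical name, then reads them back as strings.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two sample rows in the exported column layout.
sample = pd.DataFrame({
    "id": [3040051, 291580],
    "name": ["les Escaldes", "Zayed City"],
    "population": [15853, 63482],
    "elevation": ["", ""],
    "parentId": ["AD.08", "AE.01.103"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cities_sample.csv"  # hypothetical file name
    sample.to_csv(path, index=False)
    # dtype=str and keep_default_na=False keep empty fields as '' instead of NaN.
    restored = pd.read_csv(path, dtype=str, keep_default_na=False)

print(restored)
```

Reading everything as strings avoids pandas guessing numeric types, which matters when codes carry leading zeros.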