## Storing the countries in Neo4j

We have a master list of ISO countries that we can use to create our base country nodes.

In [1]:
import json
import pandas as pd

In [2]:
iso_data = pd.read_json('./data/iso3166_country_codes.json')

In [3]:
iso_data.head()

Unnamed: 0,alt_name,iso3166_code,name
0,Afghanistan,AF,Afghanistan
1,Aland,AX,Aland Islands
2,Albanie,AL,Albania
3,Algerie,DZ,Algeria
4,Samoa americaines,AS,American Samoa


### Now we can loop over the data and insert each country with its corresponding code and name

In [4]:
from neo4j.v1 import GraphDatabase
user = "neo4j"
password = "mypassword"
connection_path = "bolt://xx.yy.z.a:35240"
driver = GraphDatabase.driver(connection_path, auth=(user, password))

First we build a list of the nodes we want to create, together with their properties. Then we connect to the Neo4j database and run a Cypher command to create them; we use MERGE rather than CREATE so that duplicate entries in the list do not produce duplicate nodes.

In [5]:
countryList = [{'code': row.iso3166_code, 'name': row['name'].strip().upper()} for ind, row in iso_data.iterrows()]

In [49]:
with driver.session() as session:
    session.run(("UNWIND {list} AS d "
                 "MERGE (c:Country {code: d.code, name: d.name})"),
                {"list": countryList})
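MERGE matches on the whole pattern, so merging on both `code` and `name` creates a second node whenever the same code arrives with a differently spelled name. The following is a minimal sketch of one alternative, assuming that `code` alone identifies a country and that the server accepts the `$list` parameter syntax (newer Neo4j versions no longer accept `{list}`). The names `MERGE_COUNTRIES` and `merge_countries` are illustrative and not part of the original notebook.

```python
# Hypothetical variant of the insert above: match on code only, then set the name.
MERGE_COUNTRIES = (
    "UNWIND $list AS d "
    "MERGE (c:Country {code: d.code}) "
    "SET c.name = d.name"
)


def merge_countries(session, countries):
    """Run the MERGE once for the whole list.

    `session` is any object with a neo4j-style run(query, parameters) method.
    """
    return session.run(MERGE_COUNTRIES, {"list": countries})


# Usage with the driver created earlier (assumed, not executed here):
# with driver.session() as session:
#     merge_countries(session, countryList)
```

Matching on the code alone keeps one node per country and lets a later run correct a name in place.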

### Country & Nationality mapping

In many instances we will need to be able to convert between country and nationality when connecting people and companies to countries.

We have three different source files that we can use. They are stored natively as JSON; we will convert them to pickles for easy use within Python. The three files are:

- Country names to country code mappings
- Country code to country name mappings
- Nationality to country code mappings

### Country codes to names

In [12]:
# Use a context manager so the file handle is closed after loading
with open('data/clean_country_code_map.json', 'r') as f:
    country_codes_2_names = json.load(f)

From our data we created clean country codes. Do they match the ISO data we have?
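The comparison uses a `code2name` lookup whose construction is not shown in this notebook. A minimal sketch of how such a lookup could be built, assuming it maps each ISO code to the upper-cased country name from `iso_data`; the helper `build_code2name` and the `sample_` names are hypothetical.

```python
def build_code2name(records):
    """Map ISO code -> upper-cased country name from an iterable of row dicts."""
    return {
        row["iso3166_code"]: row["name"].strip().upper()
        for row in records
    }


# Sample rows copied from the iso_data.head() output above.
sample_rows = [
    {"iso3166_code": "AF", "name": "Afghanistan"},
    {"iso3166_code": "AX", "name": "Aland Islands"},
]
sample_code2name = build_code2name(sample_rows)

# With the full DataFrame, the rows could come from pandas directly (assumed):
# code2name = build_code2name(iso_data.to_dict("records"))
```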

In [26]:
# Print every code in our cleaned map that is missing from the ISO lookup
for key, name in country_codes_2_names.items():
    if key not in code2name:
        print(key, name)

DQ DQ
XK KOSOVO


It looks like we are missing Kosovo, which has code XK. DQ seems to be a mistake: it should be Dominica, whose real code is DM. So let's add Kosovo and then save the output to a pickle file.

In [29]:
code2name['XK'] = 'KOSOVO'

In [30]:
pd.to_pickle(code2name, "data/clean_country_code_map.pkl")

Let's also fix the DQ in the country_codes_2_names file ...

In [32]:
country_codes_2_names.pop('DQ', None)
country_codes_2_names['DM'] = code2name.get('DM')
print(country_codes_2_names['DM'])

DOMINICA


### Country names to codes

We also need a reverse lookup. Since a country can be referred to by a variety of names, let's map every name we have to its respective code.

In [36]:
# Use a context manager so the file handle is closed after loading
with open("data/country_name_2_code_map.json", "r") as f:
    country_names_2_codes = json.load(f)

In [40]:
# Normalise both names and codes to upper case
country_names_2_codes = {str(k).upper(): str(v).upper() for k, v in country_names_2_codes.items()}

In [42]:
pd.to_pickle(country_names_2_codes, "data/combined_country_map.pkl")
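One way to integrate further name variants is to fold the `name` and `alt_name` columns of the ISO data into the same lookup. This is only a sketch under that assumption; the notebook does not show this step, and `add_iso_names` is a hypothetical helper.

```python
def add_iso_names(name2code, records):
    """Return a copy of name2code extended with each row's name and alt_name."""
    combined = dict(name2code)
    for row in records:
        for field in ("name", "alt_name"):
            key = str(row[field]).strip().upper()
            # setdefault keeps any mapping that was already present
            combined.setdefault(key, row["iso3166_code"])
    return combined


# Sample row copied from the iso_data.head() output above.
sample_names_2_codes = add_iso_names(
    {"ALGERIA": "DZ"},
    [{"iso3166_code": "DZ", "name": "Algeria", "alt_name": "Algerie"}],
)
```

Using `setdefault` means the existing map wins whenever the two sources disagree on a name.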

### Nationality

In [45]:
# Use a context manager so the file handle is closed after loading
with open("data/nationality_map.json", "r") as f:
    nationality_2_codes = json.load(f)

In [51]:
pd.to_pickle(nationality_2_codes, "data/nation_map.pkl")
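With the lookups saved, converting a nationality to a country is a matter of chaining two dictionaries. A minimal sketch, assuming the nationality keys are upper-cased strings that map to ISO codes (the notebook does not show the file's contents); `nationality_to_country` and the sample dictionaries are hypothetical.

```python
def nationality_to_country(nationality, nationality2code, code2name):
    """Return the country name for a nationality, or None if either lookup fails."""
    code = nationality2code.get(nationality.strip().upper())
    if code is None:
        return None
    return code2name.get(code)


# Illustrative data only; the real maps come from the pickles saved above.
sample_nationality_2_codes = {"ALBANIAN": "AL"}
sample_codes_2_names = {"AL": "ALBANIA"}

country = nationality_to_country(
    " albanian ", sample_nationality_2_codes, sample_codes_2_names
)
```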