# GEONOM

> Nomenclature of Countries and Territories
> for the External Trade Statistics of the Community
> and Statistics of Trade between Member States

GEONOM is published as a PDF document with tables giving the definitions of the alpha/numeric codes used to identify countries and territories.

In 2010, a correspondence table was published, defining the relationships between GEONOM identifiers and the ISO country codes. We'll use this for now.

In [1]:
import requests
from pathlib import Path
from io import BytesIO
from cachecontrol import CacheControl
from cachecontrol.caches.file_cache import FileCache
from cachecontrol.heuristics import LastModified
import pandas as pd

session = CacheControl(requests.Session(),
                       cache=FileCache('.cache'),
                       heuristic=LastModified())

inputURL = 'http://ec.europa.eu/eurostat/ramon/other_documents/geonom/concordances/geonom_2010-ISO.xls'
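# keep_default_na=False keeps empty cells as '' and stops the string 'NA'
# (Namibia's alpha-2 code) from being read as a missing value.
geonom = pd.read_excel(BytesIO(session.get(inputURL).content), na_values=[], keep_default_na=False)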
geonom.drop(geonom.index[0], inplace=True)
geonom

Unnamed: 0,Note,ALPHA ISO,NUM ISO,ALPHA EU,NUM EU,COUNTRY,DESCRIPTION
1,*,AD,20,AD,43,Andorra,
2,*,AE,784,AE,647,United Arab Emirates,"Abu Dhabi, Dubai, Sharjah, Ajman, Umm al Qaiwa..."
3,*,AF,4,AF,660,Afghanistan,
4,*,AG,28,AG,459,Antigua and Barbuda,
5,*,AI,660,AI,446,Anguilla,
6,*,AL,8,AL,70,Albania,
7,*,AM,51,AM,77,Armenia,
8,*,AN,530,AN,478,Netherlands Antilles,"Curaçao, Bonaire, St Eustatius, Saba and south..."
9,*,AO,24,AO,330,Angola,Including Cabinda
10,*,AQ,10,AQ,891,Antarctica,Territory south of 60° south latitude; not inc...


The first part of the table is the main correspondence list, which runs up to the row whose `Note` cell reads `ISO 3166 codes included in EU codes`. For now we keep only the rows above that marker and drop the blank rows.

In [2]:
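# Find the label of the marker row and keep only the rows above it
marker = geonom[geonom['Note'] == 'ISO 3166 codes included in EU codes'].index[0]
geonom = geonom.loc[:marker - 1]
geonom = geonom[geonom['COUNTRY'] != ''].copy()  # copy so later in-place edits are safe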
geonom

Unnamed: 0,Note,ALPHA ISO,NUM ISO,ALPHA EU,NUM EU,COUNTRY,DESCRIPTION
1,*,AD,20,AD,43,Andorra,
2,*,AE,784,AE,647,United Arab Emirates,"Abu Dhabi, Dubai, Sharjah, Ajman, Umm al Qaiwa..."
3,*,AF,4,AF,660,Afghanistan,
4,*,AG,28,AG,459,Antigua and Barbuda,
5,*,AI,660,AI,446,Anguilla,
6,*,AL,8,AL,70,Albania,
7,*,AM,51,AM,77,Armenia,
8,*,AN,530,AN,478,Netherlands Antilles,"Curaçao, Bonaire, St Eustatius, Saba and south..."
9,*,AO,24,AO,330,Angola,Including Cabinda
10,*,AQ,10,AQ,891,Antarctica,Territory south of 60° south latitude; not inc...


Note that pandas has read the Excel cells as numbers, so the leading zeros in the `NUM ISO` and `NUM EU` columns have been lost. Both should hold three-digit codes, although because this is a correspondence table, some `NUM ISO` cells list more than one code, e.g. `246 + 248`.

Before fixing that, note that the column headers have picked up newlines and stray spaces, so rename them first.

In [3]:
geonom.columns

Index(['Note', 'ALPHA\nISO', 'NUM\nISO', 'ALPHA\nEU', 'NUM \nEU', 'COUNTRY',
       'DESCRIPTION'],
      dtype='object')

In [4]:
geonom.rename(columns={
    'ALPHA\nISO': 'ALPHA_ISO', 'NUM\nISO': 'NUM_ISO',
    'ALPHA\nEU': 'ALPHA_EU', 'NUM \nEU': 'NUM_EU'}, inplace=True)
geonom.columns

Index(['Note', 'ALPHA_ISO', 'NUM_ISO', 'ALPHA_EU', 'NUM_EU', 'COUNTRY',
       'DESCRIPTION'],
      dtype='object')
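
The explicit mapping above works for this one file. As a more general alternative, the headers could be cleaned with a small function. This is only a sketch: `normalise_header` is a hypothetical helper, not part of the original notebook, and it assumes that every run of whitespace in a header should become a single underscore.

```python
import re

def normalise_header(name: str) -> str:
    """Replace each run of whitespace (spaces, newlines) with one underscore."""
    return re.sub(r'\s+', '_', name.strip())

# Headers as reported by geonom.columns before renaming
headers = ['Note', 'ALPHA\nISO', 'NUM\nISO', 'ALPHA\nEU', 'NUM \nEU',
           'COUNTRY', 'DESCRIPTION']
cleaned = [normalise_header(h) for h in headers]
print(cleaned)
```

Since `DataFrame.rename` also accepts a function for `columns`, the same helper could be applied with `geonom.rename(columns=normalise_header)`.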

Restore the leading zeros in `NUM_EU` by formatting each value as a three-digit string.

In [5]:
geonom['NUM_EU'] = geonom['NUM_EU'].apply(lambda x: "%03d" % int(x))

In [6]:
geonom

Unnamed: 0,Note,ALPHA_ISO,NUM_ISO,ALPHA_EU,NUM_EU,COUNTRY,DESCRIPTION
1,*,AD,20,AD,043,Andorra,
2,*,AE,784,AE,647,United Arab Emirates,"Abu Dhabi, Dubai, Sharjah, Ajman, Umm al Qaiwa..."
3,*,AF,4,AF,660,Afghanistan,
4,*,AG,28,AG,459,Antigua and Barbuda,
5,*,AI,660,AI,446,Anguilla,
6,*,AL,8,AL,070,Albania,
7,*,AM,51,AM,077,Armenia,
8,*,AN,530,AN,478,Netherlands Antilles,"Curaçao, Bonaire, St Eustatius, Saba and south..."
9,*,AO,24,AO,330,Angola,Including Cabinda
10,*,AQ,10,AQ,891,Antarctica,Territory south of 60° south latitude; not inc...
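
The `NUM_ISO` column is left untouched here. The same `int(x)` approach would fail on its combined values such as `246 + 248`, so one possible way to pad it is to zero-fill each run of digits separately. This is a minimal sketch with a hypothetical helper, `pad_codes`, and it assumes the cells hold either integers or strings of digit groups (not floats).

```python
import re

def pad_codes(value) -> str:
    """Zero-pad every run of digits in `value` to at least three digits."""
    # str() handles integer cells; the regex leaves ' + ' separators untouched.
    return re.sub(r'\d+', lambda m: m.group().zfill(3), str(value).strip())

examples = [20, 4, 784, '246 + 248', '']
padded = [pad_codes(v) for v in examples]
print(padded)
```

With the DataFrame above this could be applied as `geonom['NUM_ISO'].apply(pad_codes)`.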


In [7]:
from rdflib import Graph, Literal, BNode, Namespace, RDF, URIRef, RDFS, OWL, XSD
from rdflib.namespace import SKOS
from rdflib.collection import Collection
import numpy as np

GN = Namespace('http://gss-data.org.uk/def/geonom_2012#')

g = Graph()
g.bind('gn', GN)
g.bind('skos', SKOS)

scheme = URIRef('http://gss-data.org.uk/def/geonom_2012')
g.add((scheme, RDF.type, SKOS.ConceptScheme))
g.add((scheme, RDFS.label, Literal('Geonomenclature, 2010')))

for dt in [{'subj': GN.Alpha2,
            'label': 'Alpha 2',
            'comment': 'Two letter country code',
            'pattern': '[A-Z]{2}'
           },
           {'subj': GN.Num3,
            'label': 'Numeric 3',
            'comment': 'Numeric three digit code',
            'pattern': '[0-9]{3}'
           }
          ]:
    g.add((dt['subj'], RDF.type, RDFS.Datatype))
    g.add((dt['subj'], RDFS.label, Literal(dt['label'])))
    g.add((dt['subj'], RDFS.comment, Literal(dt['comment'])))
    g.add((dt['subj'], OWL.onDatatype, XSD.string))
    restriction_resource = BNode()
    restrictions = Collection(g, restriction_resource)
    g.add((dt['subj'], OWL.withRestrictions, restriction_resource))
    pattern_resource = BNode()
    restrictions.append(pattern_resource)
    g.add((pattern_resource, XSD.pattern, Literal(dt['pattern'])))

for i, row in geonom.iterrows():
    term = GN.term(row['ALPHA_EU'])
    g.add((term, RDF.type, SKOS.Concept))
    g.add((term, RDFS.label, Literal(row['COUNTRY'].strip())))
    g.add((term, SKOS.inScheme, scheme))
    if row['DESCRIPTION'].strip() != '':
        g.add((term, RDFS.comment, Literal(row['DESCRIPTION'].strip())))
    g.add((term, SKOS.notation, Literal(row['ALPHA_EU'], datatype=GN.Alpha2)))
    g.add((term, SKOS.notation, Literal(row['NUM_EU'], datatype=GN.Num3)))
    
print(g.serialize(format='n3').decode('utf-8'))

@prefix gn: <http://gss-data.org.uk/def/geonom_2012#> .
@prefix ns1: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

gn:AD a skos:Concept ;
    rdfs:label "Andorra" ;
    skos:inScheme <http://gss-data.org.uk/def/geonom_2012> .

gn:AE a skos:Concept ;
    rdfs:label "United Arab Emirates" ;
    rdfs:comment "Abu Dhabi, Dubai, Sharjah, Ajman, Umm al Qaiwain, Ras al Khaima and Fujairah" ;
    skos:inScheme <http://gss-data.org.uk/def/geonom_2012> .

gn:AF a skos:Concept ;
    rdfs:label "Afghanistan" ;
    skos:inScheme <http://gss-data.org.uk/def/geonom_2012> .

gn:AG a skos:Concept ;
    rdfs:label "Antigua and Barbuda" ;
    skos:inScheme <http://gss-data.org.uk/def/geonom_2012> .

gn:AI a skos:Concept ;
    rdfs:label "Angu

In [8]:
out = Path('out')
out.mkdir(exist_ok=True, parents=True)
with open(out / 'geonom_2010.ttl', 'wb') as f:
    g.serialize(f, format='turtle')  # Turtle matches the .ttl extension
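
The `pattern` facets declared for the `Alpha2` and `Num3` datatypes can also serve as a quick sanity check on the codes themselves. The sketch below is not part of the original notebook: `PATTERNS` and `conforms` are hypothetical names, and the sample rows are copied from the table shown earlier. XML Schema patterns match the whole value, so `re.fullmatch` is used rather than `re.search`.

```python
import re

# The same patterns that are attached to the two datatypes in the graph
PATTERNS = {'Alpha2': r'[A-Z]{2}', 'Num3': r'[0-9]{3}'}

def conforms(value: str, datatype: str) -> bool:
    """Return True if the whole of `value` matches the datatype's pattern."""
    return re.fullmatch(PATTERNS[datatype], value) is not None

# (ALPHA_EU, NUM_EU) pairs taken from the padded table
rows = [('AD', '043'), ('AL', '070'), ('AO', '330')]
bad = [(alpha, num) for alpha, num in rows
       if not (conforms(alpha, 'Alpha2') and conforms(num, 'Num3'))]
print(bad)
```

An unpadded value such as `'43'` would fail the `Num3` check, which is why the leading zeros are restored before the notations are added.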