# Generation of contours of United Nations Geoscheme sub-regions + intermediate regions

The United Nations divides the world into regions and sub-regions: https://en.wikipedia.org/wiki/United_Nations_geoscheme

## Main aims

> * Get the official listing of countries from the United Nations
> * Use the M49 codes and geopandas to build the contours of the UN subregions.
> * Export to GeoJSON for use in any app (mapbox, leaflet, dash, etc.)

## Generated files

> * `ungeoscheme_country_list.csv` -> list of all nations from the United Nations database, with their ISO codes and UN subregions
> * `un_subregion_contours.xlsx` -> UN subregion contours exported from geopandas to Excel
> * `un_subregion_contours.geojson` -> UN subregions in geojson format

## TODO

N/A

## Initialisation

In [2]:
import pandas as pd
import geopandas

# import requests

## Get UN geoscheme data

The Geoscheme is based on M49 codes (https://unstats.un.org/unsd/methodology/m49/). The database can be downloaded at https://unstats.un.org/unsd/methodology/m49/overview/ or accessed through the first API described at https://unstats.un.org/sdgapi/swagger/

**To run after downloading** the file at https://unstats.un.org/unsd/methodology/m49/overview/ with the name `UNSD — Methodology.xlsx`

In [3]:
df_import = pd.read_excel("UNSD — Methodology.xlsx")


def keep_regions_only(df):

    df = df[
        [
            "ISO-alpha3 Code",
            "M49 Code",
            "Country or Area",
            "Region Code",
            "Region Name",
            "Sub-region Code",
            "Sub-region Name",
            "Intermediate Region Code",
            "Intermediate Region Name",
        ]
    ].copy()

    # display(df)

    # Display regions
    # display(df.groupby(by=['Region Name','Sub-region Name','Intermediate Region Name'], dropna=False).count())

    # NB: the only entry without any Region Name is Antarctica
    # NB: Sark (Channel Islands) has an M49 code but no ISO-alpha3 code

    # Fill the missing region and sub-region names for Antarctica
    df["Region Name"] = df["Region Name"].fillna("Antartica")
    df["Sub-region Name"] = df["Sub-region Name"].fillna("Antartica")

    # Dissolve "Channel Islands" within "Northern Europe"
    to_update = df[df["Intermediate Region Name"] == "Channel Islands"].index
    df.loc[to_update, "Intermediate Region Name"] = None

    # Delete Sark, identified by its missing ISO-alpha3 code
    to_delete = df[df["ISO-alpha3 Code"].isnull()].index
    df.drop(to_delete, inplace=True)

    # Fill Intermediate Region with Sub Regions when empty
    df["Intermediate Region Name"] = df["Intermediate Region Name"].combine_first(
        df["Sub-region Name"]
    )

    # Keep only useful data, with snake_case column names
    df = df[["ISO-alpha3 Code", "Intermediate Region Name"]].rename(
        columns={
            "ISO-alpha3 Code": "iso-alpha3_code",
            "Intermediate Region Name": "region_name",
        }
    )
    # Index by ISO code so that the frame can be merged on it later
    df = df.set_index("iso-alpha3_code")

    return df


df_geoscheme = keep_regions_only(df_import)
display(df_geoscheme)
display(df_geoscheme.reset_index().groupby(by="region_name").count())
df_geoscheme.to_csv("../data/ungeoscheme_country_list.csv")

iso-alpha3_code,region_name
DZA,Northern Africa
EGY,Northern Africa
LBY,Northern Africa
MAR,Northern Africa
SDN,Northern Africa
...,...
WSM,Polynesia
TKL,Polynesia
TON,Polynesia
TUV,Polynesia


region_name,iso-alpha3_code
Antartica,1
Australia and New Zealand,6
Caribbean,28
Central America,8
Central Asia,5
Eastern Africa,22
Eastern Asia,7
Eastern Europe,10
Melanesia,5
Micronesia,8
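
The fallback used in `keep_regions_only` (take the intermediate region when it exists, otherwise the sub-region) can be illustrated on a toy frame. This is only a sketch: the three rows below are illustrative and typed by hand, not read from the UN file.

```python
import pandas as pd

# Illustrative rows (hand-written for this example, not loaded from the UN file)
toy = pd.DataFrame(
    {
        "ISO-alpha3 Code": ["KEN", "FRA", "JEY"],
        "Sub-region Name": ["Sub-Saharan Africa", "Western Europe", "Northern Europe"],
        "Intermediate Region Name": ["Eastern Africa", None, None],
    }
)

# Keep the intermediate region when present, else fall back to the sub-region
toy["region_name"] = toy["Intermediate Region Name"].combine_first(
    toy["Sub-region Name"]
)

print(toy[["ISO-alpha3 Code", "region_name"]])
```

`combine_first` only fills the positions where the left-hand series is missing, which is why the "Channel Islands" label is set to `None` beforehand: those rows then fall back to "Northern Europe".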


## Get contours of UN regions

### Source = embedded dataset in geopandas (not reliable enough)

Beware: the dataset bundled with geopandas is not reliable enough. Although it is sourced from Natural Earth (like many other datasets), it appears incomplete: 5 of the 175 countries have no ISO-alpha3 code (France, Norway, N. Cyprus, Somaliland, Kosovo).
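
A check of this kind can be sketched as follows. It assumes that a missing code is stored either as a null value or as the placeholder string `-99`; the helper name `rows_without_iso`, the column name `iso_a3` and the sample rows are illustrative assumptions, so adapt them to the dataset at hand.

```python
import pandas as pd


def rows_without_iso(df, column="iso_a3", placeholder="-99"):
    """Return the rows whose ISO code is null or equal to the placeholder."""
    missing = df[column].isna() | (df[column] == placeholder)
    return df[missing]


# Toy sample (hypothetical values mimicking the problem described above)
sample = pd.DataFrame(
    {
        "name": ["France", "Germany", "Kosovo"],
        "iso_a3": ["-99", "DEU", "-99"],
    }
)

print(rows_without_iso(sample)["name"].tolist())
```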

In [3]:
# List all data sets available in geopandas
# display(geopandas.datasets.available)

In [4]:
# df_world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))[['iso_a3','name','geometry']]
# display(df_world)

In [5]:
# Merge polygons with the UN geoscheme region names
# df_world = df_world.merge(df_geoscheme, how="left", left_on="iso_a3", right_index=True)
# display(df_world)
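
When merging contours with the geoscheme names, it can help to see which countries found no match. The sketch below uses the `indicator=True` option of pandas `merge` on two small hand-made frames; the frames and their values are hypothetical stand-ins for `df_world` and `df_geoscheme`.

```python
import pandas as pd

# Hypothetical contours table (geometry omitted for brevity)
contours = pd.DataFrame(
    {"iso_a3": ["DZA", "EGY", "-99"], "name": ["Algeria", "Egypt", "France"]}
)

# Hypothetical extract shaped like df_geoscheme (indexed by ISO code)
geoscheme = pd.DataFrame(
    {"region_name": ["Northern Africa", "Northern Africa", "Western Europe"]},
    index=pd.Index(["DZA", "EGY", "FRA"], name="iso-alpha3_code"),
)

# indicator=True adds a "_merge" column telling where each row came from
merged = contours.merge(
    geoscheme, how="left", left_on="iso_a3", right_index=True, indicator=True
)

# Rows flagged "left_only" have a contour but no region name
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```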

### Other sources

#### Natural Earth based

- [Natural Earth](http://www.naturalearthdata.com/downloads/) is the original source of the datasets listed below. It is the main open source for country and region contours.
- [GeoJSON Generator with 2017 data](https://geojson-maps.ash.ms/), based on Natural Earth, offers a low-resolution package (0.5 MB). Its GitHub repository provides a direct link to a comprehensive dataset with many attributes and classifications, including UN codes, regions and subregions. **<---- The one that we use below**
- [DataHub](https://datahub.io/core/geo-countries), based on Natural Earth, with a high-resolution map (23 MB GeoJSON)
- [GeoJSON](https://github.com/datasets/geo-countries), same as the DataHub one, 23 MB

#### Eurostat based

- [Eurostat](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/countries) provides many formats, including GeoJSON, as well as API access, but the data is not aggregated and the files are large. Last update: 2020

#### United Nations based

- [Human Data from United Nations](https://data.humdata.org/dataset/united-nations-map#), but not publicly accessible

### Build the UN map

1. Merge country contours with region names
2. Aggregate areas to make region contours

In [6]:
# Load the source of the contours
df_world = geopandas.read_file(
    "https://raw.githubusercontent.com/AshKyd/geojson-regions/master/countries/110m/all.geojson"
)
df_world = df_world[["gu_a3", "name_long", "continent", "subregion", "geometry"]]
display(df_world)
# df_world.to_excel('country_contours.xlsx')

,gu_a3,name_long,continent,subregion,geometry
0,AFG,Afghanistan,Asia,Southern Asia,"POLYGON ((61.21082 35.65007, 62.23065 35.27066..."
1,AGO,Angola,Africa,Middle Africa,"MULTIPOLYGON (((16.32653 -5.87747, 16.57318 -6..."
2,ALB,Albania,Europe,Southern Europe,"POLYGON ((20.59025 41.85540, 20.46318 41.51509..."
3,ARE,United Arab Emirates,Asia,Western Asia,"POLYGON ((51.57952 24.24550, 51.75744 24.29407..."
4,ARG,Argentina,South America,South America,"MULTIPOLYGON (((-65.50000 -55.20000, -66.45000..."
...,...,...,...,...,...
172,VUT,Vanuatu,Oceania,Melanesia,"MULTIPOLYGON (((167.84488 -16.46633, 167.51518..."
173,YEM,Yemen,Asia,Western Asia,"POLYGON ((53.10857 16.65105, 52.38521 16.38241..."
174,ZAF,South Africa,Africa,Southern Africa,"POLYGON ((31.52100 -29.25739, 31.32556 -29.401..."
175,ZMB,Zambia,Africa,Eastern Africa,"POLYGON ((32.75938 -9.23060, 33.23139 -9.67672..."


In [4]:
# As the source already includes UN subregion names, no need to merge it with the UN geoscheme names.

In [8]:
# Dissolve into region contours + centers
# available help at https://geopandas.org/aggregation_with_dissolve.html

df_world_buffered = df_world.copy()

# Add a small buffer so that neighbouring shapes overlap, which removes inner lines in the aggregated shapes
df_world_buffered["geometry"] = df_world_buffered["geometry"].buffer(0.0001)

df_un_regions = df_world_buffered.dissolve(by="subregion")[["geometry"]]
display(df_un_regions)

subregion,geometry
Antarctica,"MULTIPOLYGON (((-159.208 -79.497, -159.208 -79..."
Australia and New Zealand,"MULTIPOLYGON (((143.562 -13.764, 143.922 -14.5..."
Caribbean,"MULTIPOLYGON (((-61.680 10.760, -61.680 10.760..."
Central America,"POLYGON ((-87.793 13.384, -87.904 13.149, -87...."
Central Asia,"POLYGON ((61.211 35.650, 61.211 35.650, 61.211..."
Eastern Africa,"MULTIPOLYGON (((49.544 -12.470, 49.544 -12.470..."
Eastern Asia,"MULTIPOLYGON (((110.339 18.678, 110.339 18.678..."
Eastern Europe,"MULTIPOLYGON (((143.648 50.748, 144.654 48.976..."
Melanesia,"MULTIPOLYGON (((165.780 -21.080, 166.600 -21.7..."
Middle Africa,"POLYGON ((11.094 -3.979, 10.066 -2.970, 10.066..."
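
The effect of the buffer can be reproduced on a toy case. The sketch below assumes that the inner lines come from tiny gaps between neighbouring polygons, which is one plausible cause but is not verified against the real data; the two squares and the region name "A" are hypothetical.

```python
import geopandas
from shapely.geometry import box

# Two hypothetical "countries" of the same region, separated by a tiny gap
toy = geopandas.GeoDataFrame(
    {"subregion": ["A", "A"]},
    geometry=[box(0, 0, 1, 1), box(1.00001, 0, 2, 1)],
)

# Without a buffer, the gap keeps the two squares apart after dissolving
plain = toy.dissolve(by="subregion")

# With a small buffer, the squares overlap and merge into a single outline
buffered = toy.copy()
buffered["geometry"] = buffered["geometry"].buffer(0.0001)
merged = buffered.dissolve(by="subregion")

print(plain.geometry.iloc[0].geom_type)
print(merged.geometry.iloc[0].geom_type)
```

The buffer slightly enlarges every shape, so the value should stay small compared with the resolution of the contours.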


In [9]:
# Remove Antarctica and "Seven seas (open ocean)", which we don't need for our project
df_un_regions.drop(index="Antarctica", inplace=True)
df_un_regions.drop(index="Seven seas (open ocean)", inplace=True)

# Display the regions (geopandas plots with matplotlib)
# df_un_regions.plot()

In [10]:
# Export to excel
df_un_regions.to_excel("../data/un_subregion_contours.xlsx")
# Export to geojson
df_un_regions.reset_index().to_file(
    "../data/un_subregion_contours.geojson", driver="GeoJSON"
)
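
An app that consumes the exported file only needs a JSON parser. The sketch below lists the region names of a FeatureCollection with the standard library. It relies on the `subregion` column becoming a feature property after `reset_index()`; the miniature collection is a hypothetical stand-in for the real file.

```python
import json

# Hypothetical miniature of the exported file (real geometries are far larger)
geojson_text = json.dumps(
    {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"subregion": "Central Asia"},
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
                },
            }
        ],
    }
)

# Parse the text and collect the region name stored in each feature
collection = json.loads(geojson_text)
region_names = [
    feature["properties"]["subregion"] for feature in collection["features"]
]
print(region_names)
```

With the real export, `geojson_text` would be replaced by the content of `un_subregion_contours.geojson`.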

## Optional: display with ipyleaflet

In [11]:
# Only the names used below are imported
from ipyleaflet import GeoData, Map, basemaps

In [14]:
# Basemap

center = [38.128, 2.588]
zoom = 2
# basemap = basemaps.Esri.NatGeoWorldMap # Nice map with colored country contours
# basemap = basemaps.Stamen.Terrain # Nice map, but a bit too contrasted for our needs
basemap = basemaps.Esri.WorldTopoMap  # Nice map, light

m = Map(basemap=basemap, center=center, zoom=zoom)

In [15]:
# Adding the contour layer

geo_data = GeoData(
    geo_dataframe=df_un_regions.reset_index(drop=True),
    style={
        "color": "black",
        "opacity": 0.3,
        #'dashArray':'2',
        "fillColor": "#3366cc",
        "fillOpacity": 0.05,
        "weight": 2,
    },
    hover_style={
        "fillColor": "red",
        "fillOpacity": 0.2,
    },
)
m.add_layer(geo_data)
display(m)
