# Load OpenStreetMap Data

This notebook demonstrates how we obtain spatial data on power transmission lines. Our main data source is the OpenStreetMap (OSM) dataset. The `download_osm_data.py` script extracts OSM data for a world region requested by the user. The `config_osm_data.py` file contains the configuration data needed for such an extraction.

## Set working folder

In [1]:
import sys
sys.path.append('../')  # to import helpers

from scripts._helpers import _sets_path_to_root
_sets_path_to_root("pypsa-africa")

This is the repository path:  ./
Had to go 1 folder(s) up.


## Import necessary packages

Load Python packages and set visibility options:

In [2]:
import logging
import sys
import pandas as pd
import requests
import urllib3
import time

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 70)

logger = logging.getLogger(__name__)

Load local packages written to load OSM data:

In [3]:
from scripts.config_osm_data import continent_regions
from scripts.config_osm_data import continents
from scripts.config_osm_data import iso_to_geofk_dict
from scripts.config_osm_data import world_iso
from scripts.config_osm_data import world_geofk

## Management of geographical data

OSM data are organized by continents, macro regions and countries. Input country codes should follow the ISO standard and are transformed into a valid OSM data request.

The `world_geofk` and `world_iso` two-level Python dictionaries store this organization according to the OSM and ISO conventions, respectively. We define a couple of supplementary functions to work with these data structures. The first one, `list_countries()`, transforms an input dictionary into a list of per-continent lists of country codes, while the second, `getContinentCountryIso()`, retrieves the continent and country names for a given country code.

In [4]:
def list_countries(w_dc):
    # Collect one list of country codes per continent
    countries_list = []

    for continent in w_dc:
        country = w_dc[continent]
        countries_list.append(list(country.keys()))

    return countries_list

def getContinentCountryIso(code):
    # Search every continent of the ISO-based dictionary for the code
    for continent in world_iso:
        country = world_iso[continent].get(code, 0)
        if country:
            return continent, country
    # The code was not found in any continent
    return None, None

list_word_iso_countries = list_countries(world_iso)
list_word_geofk_countries = list_countries(world_geofk)
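To make the data layout concrete, here is a minimal sketch on a toy dictionary. The contents of `toy_world` and the helper names `toy_list_countries` and `lookup_country` are assumptions made for illustration only; the real `world_iso` and `world_geofk` dictionaries are defined in `config_osm_data.py`.

```python
# Hypothetical two-level dictionary: continent -> {country code -> country name}
toy_world = {
    "africa": {"DZ": "algeria", "AO": "angola"},
    "europe": {"DE": "germany"},
}

def toy_list_countries(w_dc):
    """Return one list of country codes per continent."""
    return [list(countries.keys()) for countries in w_dc.values()]

def lookup_country(w_dc, code):
    """Return (continent, country name), or (None, None) for an unknown code."""
    for continent, countries in w_dc.items():
        if code in countries:
            return continent, countries[code]
    return None, None

print(toy_list_countries(toy_world))    # [['DZ', 'AO'], ['DE']]
print(lookup_country(toy_world, "AO"))  # ('africa', 'angola')
print(lookup_country(toy_world, "XX"))  # (None, None)
```

Note that the first helper returns a list of lists, which is why the notebook flattens the result before comparing code sets.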

The ISO and OSM conventions differ in some details, as can be seen from the following analysis.

Flatten each of the country lists with `sum(a_list, [])` and keep only the unique elements by converting the result to a `set()`. Set subtraction then gives the differences between the country codes used by ISO and OSM:

In [5]:
iso_set = set(sum(list_word_iso_countries, []))
geofk_set = set(sum(list_word_geofk_countries, []))

iso_not_in_geofk = iso_set - geofk_set
geofk_not_in_iso = geofk_set - iso_set
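The same flatten-and-subtract pattern can be tried on toy data. The sketch below uses `itertools.chain.from_iterable` as an alternative to `sum(a_list, [])`; both flatten one level of nesting, but `chain` avoids building intermediate lists. The nested lists are hypothetical and do not reproduce the real dictionaries.

```python
from itertools import chain

# Hypothetical nested lists of country codes, one inner list per continent
iso_nested = [["DZ", "SN", "GM"], ["DE", "SM"]]
geofk_nested = [["DZ", "SNGM"], ["DE"]]

# Flatten one level of nesting and keep unique codes only
iso_codes = set(chain.from_iterable(iso_nested))
geofk_codes = set(chain.from_iterable(geofk_nested))

# Set difference keeps the codes that appear on the left-hand side only
print(sorted(iso_codes - geofk_codes))  # ['GM', 'SM', 'SN']
print(sorted(geofk_codes - iso_codes))  # ['SNGM']
```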

Translate the codes into human-readable tuples and see for which **countries GeoFabrik naming differs from ISO**:

In [6]:
for cnt in list(iso_not_in_geofk):
    print(getContinentCountryIso(cnt))

('africa', 'western-sahara')
('asia', 'singapore')
('africa', 'gambia')
('asia', 'brunei')
('europe', 'san-marino')
('asia', 'malaysia')
('asia', 'israel')
('asia', 'united-arab-emirates')
('africa', 'senegal')
('asia', 'saudi-arabia')
('asia', 'kuwait')
('asia', 'macao')
('asia', 'qatar')
('latin_america', 'guyane')
('asia', 'hong kong')
('asia', 'bahrain')
('asia', 'palestine')
('asia', 'oman')


These differences between ISO and OSM are handled by the `iso_to_geofk_dict` dictionary, which transforms ISO inputs into codes that can be processed by the OSM server. Each ISO country code that is not directly accessible in OSM should be included in the `iso_to_geofk_dict` transformation dictionary; otherwise the code would be lost for processing. Let's check how it works.

In [7]:
# Keys of the transform dictionary that are not among the ISO codes missing from GeoFabrik
lost_codes = set(iso_to_geofk_dict.keys()) - set(iso_not_in_geofk)

print("Any ISO codes not resolved by GeoFabrik and the transform dictionary?")
if len(lost_codes) > 0:
    print(lost_codes)
    for cnt in list(lost_codes):
        print(getContinentCountryIso(cnt))
else:
    print("...everything seems to be all right")

Any ISO codes not resolved by GeoFabrik and the transform dictionary?
...everything seems to be all right
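The cell above subtracts the ISO-only codes from the keys of the transform dictionary. The opposite direction, ISO-only codes that have no entry in the transform dictionary, may also be worth checking, since those are the codes that would be dropped. A minimal sketch on toy data follows; `toy_iso_not_in_geofk` and `toy_iso_to_geofk` are hypothetical stand-ins, not the real contents of `config_osm_data.py`.

```python
# Hypothetical stand-ins for iso_not_in_geofk and iso_to_geofk_dict
toy_iso_not_in_geofk = {"SN", "GM", "SM"}
toy_iso_to_geofk = {"SN": "SNGM", "GM": "SNGM"}

# ISO-only codes without a mapping: these would be lost for processing
unmapped = toy_iso_not_in_geofk - set(toy_iso_to_geofk)

# Mapping keys that are not ISO-only codes: possibly superfluous entries
extra_keys = set(toy_iso_to_geofk) - toy_iso_not_in_geofk

print(sorted(unmapped))    # ['SM']
print(sorted(extra_keys))  # []
```

When both sets are empty, the transform dictionary covers exactly the codes that GeoFabrik does not provide directly.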


Look at the macro regions:

In [8]:
macro_regions_list = list(dict(**continent_regions).values())
# flatten list and keep unique elements only
macro_reg_set = set(sum(macro_regions_list, []))

ISO codes not included in the macro regions dictionary:

In [15]:
print(len(macro_reg_set))
print(len(iso_set))

print(list(iso_set - macro_reg_set))

167
170
['GW', 'GY', 'SO']


...which can be translated into plain language with the `getContinentCountryIso()` function:

In [10]:
for cnt in list(iso_set - macro_reg_set):
    print(getContinentCountryIso(cnt))

('africa', 'guinea-bissau')
('latin_america', 'guyane')
('africa', 'somalia')


In [11]:
print(set(continents))

print(set(continents) - iso_set)
print(set(continents) - geofk_set)

{'SA', 'OC', 'EU', 'AS', 'LA', 'CA', 'AF'}
{'OC', 'EU', 'AS'}
{'SA', 'OC', 'EU', 'AS'}


# Check Availability of OSM data

Before downloading, check that the requested OSM data actually exist on the server.

The focus is now on how OSM data are organized: `getContinentCountry()` works on the `world_geofk` dictionary, which follows the OSM conventions, and `build_url()` forms the URL of the country extract on the GeoFabrik server.

In [12]:
def getContinentCountry(code):
    # Search every continent of the GeoFabrik-based dictionary for the code
    for continent in world_geofk:
        country = world_geofk[continent].get(code, 0)
        if country:
            return continent, country
    # The code was not found in any continent
    return None, None

def build_url(country_code, update, verify):
    # `update` and `verify` are kept in the signature but are not used here
    continent, country_name = getContinentCountry(country_code)
    geofabrik_filename = f"{country_name}-latest.osm.pbf"
    geofabrik_url = f"https://download.geofabrik.de/{continent}/{geofabrik_filename}"
    return geofabrik_url
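As an illustration of the URL pattern, the sketch below rebuilds the same kind of link from a toy dictionary and returns `None` for an unknown code instead of producing a broken URL. The dictionary `toy_geofk` and the function `toy_build_url` are hypothetical names introduced for this example only.

```python
# Hypothetical excerpt with the same layout as world_geofk
toy_geofk = {"africa": {"DZ": "algeria"}, "europe": {"DE": "germany"}}

def toy_build_url(country_code, w_dc=toy_geofk):
    """Return the download URL for a country code, or None if it is unknown."""
    for continent, countries in w_dc.items():
        if country_code in countries:
            filename = f"{countries[country_code]}-latest.osm.pbf"
            return f"https://download.geofabrik.de/{continent}/{filename}"
    return None

print(toy_build_url("DZ"))
# https://download.geofabrik.de/africa/algeria-latest.osm.pbf
print(toy_build_url("XX"))  # None
```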


Go through some locations:

In [13]:
problem_urls = []
problem_codes = []
problem_domain = []

# flatten list
world_geofk_codes = sum(list_word_geofk_countries, [])

# check only the first two countries to keep the demo short
for cnt in world_geofk_codes[0:2]:
    print(getContinentCountry(cnt))
    url = build_url(country_code=cnt, update=False, verify=False)
    print(url)
    time.sleep(1)  # pause between requests to avoid overloading the server

    # stream=True defers the body download, so only the headers are fetched here
    with requests.get(url, stream=True, verify=True) as r:
        if r.status_code == 200:
            print("URL '" + url + "' is working")
        else:
            problem_urls.append(url)
            problem_codes.append(cnt)
            problem_domain.append(getContinentCountry(cnt))

            if r.status_code == 429:
                # the Retry-After header may be absent, so avoid direct indexing
                print("Error code: " + str(r.status_code) + " Retry after: " + r.headers.get("Retry-After", "unknown"))
            else:
                print("There are some troubles with " + url + " Error code: " + str(r.status_code))

if len(problem_urls) > 0:
    print("There were troubles in reaching the following urls:")
    print(problem_urls)
    print("Country codes to be checked:")
    print(problem_codes)
    print(problem_domain)

('africa', 'algeria')
https://download.geofabrik.de/africa/algeria-latest.osm.pbf
URL 'https://download.geofabrik.de/africa/algeria-latest.osm.pbf' is working
('africa', 'angola')
https://download.geofabrik.de/africa/angola-latest.osm.pbf
URL 'https://download.geofabrik.de/africa/angola-latest.osm.pbf' is working
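The status handling in the loop can also be factored into a small pure function, which makes it easy to try without any network access. This is only a sketch: `describe_status` is a hypothetical helper, and the `example.org` URL is a placeholder.

```python
def describe_status(url, status_code, headers):
    """Turn an HTTP status code and response headers into a short report line."""
    if status_code == 200:
        return f"URL '{url}' is working"
    if status_code == 429:
        # Retry-After is optional, so fall back to a placeholder value
        wait = headers.get("Retry-After", "unknown")
        return f"Error code: 429. Retry after: {wait}"
    return f"There are some troubles with {url}. Error code: {status_code}"

# Placeholder URL used for demonstration only
demo_url = "https://example.org/demo-latest.osm.pbf"
print(describe_status(demo_url, 200, {}))
print(describe_status(demo_url, 429, {"Retry-After": "60"}))
print(describe_status(demo_url, 404, {}))
```

In the notebook loop, such a helper could be fed with `r.status_code` and `r.headers` from the response object. A `requests.head(url, allow_redirects=True)` call would be one way to obtain them without opening a download stream.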


# Quick visualization


# Acknowledgments

The project relies on [OpenStreetMap](https://www.geofabrik.de/) data provided via the Geofabrik service. Many thanks to all the service contributors.