# Data model

The OSM world model consists of 3 main elements: __nodes__, __ways__ and __relations__. Each of them has its own unique __ID__.
- A node represents a point on the map and is given by its __coordinates__.
- A way can be either a polyline or a polygon. It is specified by a list of the IDs of the nodes that define it, and it carries information about its properties in __tags__ (key-value pairs). We are interested in the [keys](https://wiki.openstreetmap.org/wiki/Map_features): 'barrier', 'building', 'geological', 'military', 'natural', 'highway', 'landuse', 'railway', 'waterway', 'water'.
- A relation generally consists of nodes, ways and other relations, specified by a list of their IDs, and also carries information about properties in tags. A [multipolygon relation](https://wiki.openstreetmap.org/wiki/Relation:multipolygon#Examples_in_XML) consists of ways, each with the role __outer__ or __inner__. In this case the ways must form polygons, but a single polygon can be made up of multiple polylines. These polylines are listed in the order in which they form the polygon. However, the list of polylines defining one polygon can be interrupted by the list of polylines defining another. At the same time, it is not specified which of the outer polygons each of the inner ones belongs to. Of the keys of interest, a multipolygon relation may contain 'building', 'natural', 'highway', 'landuse', 'waterway'.
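
The polyline joining described for multipolygon relations can be sketched as follows. This is only a minimal illustration, not how any particular parser does it: the function name `assemble_rings` is hypothetical, ways are assumed to be given as lists of node IDs, and the sketch does not decide which outer ring an inner ring belongs to (that needs a geometric containment test).

```python
def assemble_rings(polylines):
    """Join polylines (lists of node IDs) end to end into closed rings.

    Pieces of different rings may be interleaved in the input, and a
    piece may be stored in reverse direction.
    """
    remaining = [list(p) for p in polylines]
    rings = []
    while remaining:
        ring = remaining.pop(0)
        # keep attaching pieces until the ring closes on itself
        while ring[0] != ring[-1]:
            for i, piece in enumerate(remaining):
                if piece[0] == ring[-1]:
                    # piece continues the ring in its stored direction
                    ring.extend(piece[1:])
                elif piece[-1] == ring[-1]:
                    # piece continues the ring when walked backwards
                    ring.extend(reversed(piece[:-1]))
                else:
                    continue
                del remaining[i]
                break
            else:
                raise ValueError("open ring: no polyline continues it")
        rings.append(ring)
    return rings


# Two rings; the pieces of the first one are interrupted by the second one.
example = [[1, 2, 3], [10, 11, 12, 10], [3, 4, 1]]
print(assemble_rings(example))
```

The same function can be applied separately to the outer and to the inner members of a relation.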

# Storing and parsing data

To store and work with OSM data, 2 formats are mainly used: __.pbf (.osm.pbf)__ - binary format and __.xml (.osm)__ - text format.
- PBF files are smaller and faster to write and parse. Services for download: [parts of the world](https://download.geofabrik.de/), [cities](https://download.bbbike.org/osm/bbbike/), [adjustable area](https://extract.bbbike.org/) (via mail), [adjustable area](https://export.hotosm.org/en/v3/) (online), [planet](https://planet.maps.mail.ru/pbf/). There are many parsers for this format. In addition to the [ones mentioned](https://wiki.openstreetmap.org/wiki/PBF_Format#See_also) on the OSM website, there is a nice parser for Python: [pyrosm](https://pyrosm.readthedocs.io/en/latest/), which solves the problem of relations described above by forming and returning [shapely](https://shapely.readthedocs.io/en/stable/manual.html) geometric objects, and it runs very quickly. The problem with this format is that I haven't found a way to download a selected area at runtime.
- XML files are much larger and take much longer to parse. However, it is possible to download a selected area at runtime using curl and the OSM API (for small areas). Alternatively, you can use the Overpass API (for slightly larger areas). The main problem is that an area of 1 * 1 degree already takes about 150 MB.

In [None]:
bbox = [36.0, 56.45, 36.1, 56.5]  # [west, south, east, north] in degrees
addr = '"https://api.openstreetmap.org/api/0.6/map?bbox=' \
    + str(bbox[0])  + ',' + str(bbox[1]) + ',' + str(bbox[2]) + ',' + str(bbox[3]) + '"'
!curl -o request_map.osm $addr

In [None]:
bbox = [36.0, 56.45, 36.1, 56.5]  # [west, south, east, north] in degrees
addr = '"http://www.overpass-api.de/api/xapi_meta?*[bbox=' \
    + str(bbox[0])  + ',' + str(bbox[1]) + ',' + str(bbox[2]) + ',' + str(bbox[3]) + ']"'
!curl -g -o request_map.osm $addr

It is possible to customize XML parsing using [xml.etree.ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html). Here I extract all ways with the selected keys into a ways dataframe, replacing node IDs with coordinates. Then I extract relations with the selected keys into a relations dataframe, replacing way IDs with coordinate lists and dividing them into inner and outer ways. Ways that also appear in a relation are removed from the ways dataframe. __Parsing takes too long for almost any area.__

In [None]:
import xml.etree.ElementTree as ET
from pandas import DataFrame
root = ET.parse('request_map.osm').getroot()

# find node with id and return its coords
def node_coords(ID):
    node_info = root.find('node[@id="' + ID + '"]').attrib
    return (float(node_info['lat']), float(node_info['lon']))

# return tuple of coords of all nodes in a way
def way_coords(way):
    if way is None:
        return None
    coords = list()
    for node in way.findall('nd'):
        coords.append(node_coords(node.attrib['ref']))
    return tuple(coords)

# extract all ways to a dataframe
def process_way():
    ways_df = DataFrame(columns=['geometry', 'tag', 'type'])
    keys = ('barrier', 'building', 'geological', 'military', 'natural', 'highway', 'landuse', 'railway', 'waterway', 'water')
    for key in keys:
        ways = root.findall('way/tag[@k="' + key + '"]/..')
        for way in ways:
            ways_df.loc[way.attrib['id']] = (way_coords(way), key, way.find('tag[@k="' + key + '"]').attrib['v'])
    return ways_df

# find way with id
def find_way(ID):
    way = root.find('way[@id="' + ID + '"]')
    return way_coords(way)

# extract relations and its inner & outer ways from ways
def process_relations():
    relations_df = DataFrame(columns=['inner', 'outer'])
    keys = ('building', 'natural', 'highway', 'landuse', 'waterway')
    for key in keys:
        relations = root.findall('relation/tag[@k="' + key + '"]/..')
        for relation in relations:
            relation_id = relation.attrib['id']
            relations_df.loc[relation_id] = [[], []]
            outer = relation.findall('member[@role="outer"]')
            outer = [way.attrib['ref'] for way in outer]
            for way in outer:
                if way in ways_df.index:
                    relations_df.loc[relation_id].outer.append(ways_df.loc[way].geometry)
                    ways_df.drop(way, inplace=True)
                else:
                    relations_df.loc[relation_id].outer.append(find_way(way))
            inner = relation.findall('member[@role="inner"]')
            inner = [way.attrib['ref'] for way in inner]
            for way in inner:
                if way in ways_df.index:
                    relations_df.loc[relation_id].inner.append(ways_df.loc[way].geometry)
                    ways_df.drop(way, inplace=True)
                else:
                    relations_df.loc[relation_id].inner.append(find_way(way))
    return relations_df

ways_df = process_way()
relations_df = process_relations()
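
One likely reason for the long parsing time is that `node_coords` calls `root.find(...)` for every node reference, and each call scans the node elements from the start. A possible remedy, sketched here under that assumption, is to index all nodes in a dictionary once and look coordinates up from it. The names `build_node_index` and `way_coords_indexed` are hypothetical and not part of the code above.

```python
import xml.etree.ElementTree as ET


def build_node_index(root):
    """Map node ID -> (lat, lon) in a single pass over the <node> elements."""
    return {
        node.attrib['id']: (float(node.attrib['lat']), float(node.attrib['lon']))
        for node in root.findall('node')
    }


def way_coords_indexed(way, node_index):
    """Return a tuple of coords of all nodes in a way, using the index.

    Raises KeyError if the way references a node missing from the file.
    """
    return tuple(node_index[nd.attrib['ref']] for nd in way.findall('nd'))


# Tiny stand-in document so the sketch runs without a downloaded file.
sample_root = ET.fromstring(
    '<osm>'
    '<node id="1" lat="56.45" lon="36.0"/>'
    '<node id="2" lat="56.5" lon="36.1"/>'
    '<way id="10"><nd ref="1"/><nd ref="2"/><tag k="highway" v="path"/></way>'
    '</osm>'
)
sample_index = build_node_index(sample_root)
print(way_coords_indexed(sample_root.find('way'), sample_index))
```

With a real file, the index would be built once from `root` and reused for every way and relation member.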

# Combination of XML and PBF

Since it is preferable to parse PBF files, the task is to obtain a selected area at runtime. There are 2 solutions:
- extracting a bounding box from a larger PBF file (e.g. part of the world or planet)
- downloading an XML file (shown above) and converting it to PBF format  

These tasks can be solved using [Osmosis](https://wiki.openstreetmap.org/wiki/Osmosis) and [Osmconvert](https://wiki.openstreetmap.org/wiki/Osmconvert).

#### Converting source.osm to target.osm.pbf

In [None]:
!osmosis --read-xml source.osm --write-pbf target.osm.pbf

In [None]:
!osmconvert source.osm -o=target.osm.pbf

#### Extracting a bounding box from planet.osm.pbf

In [None]:
!osmosis --read-pbf planet.osm.pbf --bounding-box top=56.5 left=36.0 bottom=56.45 right=36.1 --write-pbf target.osm.pbf  # NWSE format

In [None]:
!osmconvert planet.osm.pbf -b=36.0,56.45,36.1,56.5 -o=target.osm.pbf  # WSEN format
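
Because the two tools expect the bounding box in different orders, it may help to build both commands from the same `[west, south, east, north]` list used in the download cells. The helper below is only a sketch; the name `bbox_commands` and the default file names are assumptions taken from the examples above.

```python
def bbox_commands(bbox, source="planet.osm.pbf", target="target.osm.pbf"):
    """Build osmosis and osmconvert argument lists for one bounding box.

    bbox is [west, south, east, north] in degrees.
    """
    west, south, east, north = bbox
    osmosis = [
        "osmosis", "--read-pbf", source,
        "--bounding-box",
        f"top={north}", f"left={west}", f"bottom={south}", f"right={east}",
        "--write-pbf", target,
    ]
    osmconvert = [
        "osmconvert", source,
        f"-b={west},{south},{east},{north}",
        f"-o={target}",
    ]
    return osmosis, osmconvert


osmosis_cmd, osmconvert_cmd = bbox_commands([36.0, 56.45, 36.1, 56.5])
print(" ".join(osmosis_cmd))
print(" ".join(osmconvert_cmd))
```

Either list can then be passed to `subprocess.run(cmd, check=True)` if the corresponding tool is installed.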

#### Table of nodes

In [None]:
# extract a csv table of all nodes' IDs and coordinates
!osmconvert source.osm.pbf --drop-ways --drop-relations --csv="@id @lon @lat" --csv-headline -o=nodes.csv
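
The resulting table can be loaded into a dictionary for fast coordinate lookup. The sketch below assumes that osmconvert separates columns with a tab by default and that the first line is the headline requested with `--csv-headline`; both should be checked against the actual `nodes.csv`. The function name `read_nodes` is hypothetical.

```python
import csv
import io


def read_nodes(lines, delimiter='\t'):
    """Read rows of (id, lon, lat) and return {id: (lat, lon)}.

    `lines` is any iterable of text lines, e.g. an open file.
    The first line is treated as a headline and skipped.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the headline
    nodes = {}
    for row in reader:
        if len(row) < 3:
            continue  # ignore empty or malformed lines
        node_id, lon, lat = row[:3]
        nodes[node_id] = (float(lat), float(lon))
    return nodes


# In-memory stand-in for nodes.csv (assumed layout).
sample_csv = io.StringIO("@id\t@lon\t@lat\n1\t36.05\t56.47\n2\t36.06\t56.48\n")
sample_nodes = read_nodes(sample_csv)
print(sample_nodes)
```

For a real file, `read_nodes(open('nodes.csv', newline=''))` would be the equivalent call.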

#### Filtering tags

Tag filtering can be extremely useful to speed up parsing.

In [None]:
# extract natural, landuse, highway relations and only their ways and nodes
!osmosis --read-pbf source.osm.pbf --tf accept-relations natural=* landuse=* highway=* --used-way --used-node --write-pbf target1.osm.pbf

# extract natural, landuse, highway ways and only their nodes
!osmosis --read-pbf source.osm.pbf --way-key keyList="natural,landuse,highway" --used-node --write-pbf target2.osm.pbf

# merge 2 files
!osmosis --read-pbf target1.osm.pbf --read-pbf target2.osm.pbf --merge --write-pbf target.osm.pbf

In [None]:
# extract relations and ways except boundary, place, public_transport, route and only their nodes
!osmosis --read-pbf source.osm.pbf --tf reject-ways boundary=* place=* public_transport=* route=* --used-node --write-pbf target.osm.pbf

Filtering can also be done using [Osmfilter](https://wiki.openstreetmap.org/wiki/Osmfilter), which may well be even faster. All these tasks and more can be accomplished using the [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API). The Python package [OSMPythonTools](https://github.com/mocnik-science/osm-python-tools) provides easy access to the Overpass API ([examples](https://github.com/mocnik-science/osm-python-tools/blob/master/docs/overpass.md)) and the OSM API.
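
As an illustration of doing the tag filtering on the server side, the sketch below builds an Overpass QL query for ways and relations with selected keys inside a bounding box. The helper name `overpass_query` is hypothetical. Note that Overpass QL expects the box as (south, west, north, east), which differs from the `[west, south, east, north]` list used earlier. The query is only built here, not sent.

```python
def overpass_query(bbox, keys):
    """Build an Overpass QL query string.

    bbox is [west, south, east, north]; keys is an iterable of tag keys.
    """
    west, south, east, north = bbox
    area = f"({south},{west},{north},{east})"  # Overpass QL order: S, W, N, E
    parts = []
    for key in keys:
        parts.append(f'  way["{key}"]{area};')
        parts.append(f'  relation["{key}"]{area};')
    # (._;>;) adds the member ways and nodes of everything selected above
    return "[out:xml];\n(\n" + "\n".join(parts) + "\n);\n(._;>;);\nout meta;"


query = overpass_query([36.0, 56.45, 36.1, 56.5], ["natural", "landuse", "highway"])
print(query)
# Assumption: the query text would be sent as the POST body to an Overpass
# "interpreter" endpoint, e.g. https://overpass-api.de/api/interpreter
```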