# OpenStreetMap parser definition

We use [`pyosmium`](https://github.com/osmcode/pyosmium), an open-source library for parsing OSM objects in Python (see the [documentation](http://docs.osmcode.org/pyosmium/latest/)).

In [1]:
import pandas as pd
import osmium as osm

## OSM history parsing

First, we define a class that inherits from the `pyosmium` handler class `SimpleHandler`.

In [2]:
class TimelineHandler(osm.SimpleHandler):
    def __init__(self):
        osm.SimpleHandler.__init__(self)
        self.timeline = []  # List of rows, one per OSM element version
        
    def add_element(self, elem, elem_type):
        self.timeline.append([elem_type,
                              elem.id,
                              elem.version,
                              elem.visible,
                              pd.Timestamp(elem.timestamp),
                              elem.uid,
                              elem.changeset])
        
    def node(self, n):
        self.add_element(n, "node")

    def way(self, w):
        self.add_element(w, "way")
                                     
    def relation(self, r):
        self.add_element(r, "relation")

Then we create an instance of this class and call its `apply_file` method with the path to an OSM input file. A file in the `osh.pbf` format gives the full local OSM history, whereas a file in the `osm.pbf` format gives only an up-to-date snapshot of the data.

In [3]:
tlhandler = TimelineHandler()

In [4]:
tlhandler.apply_file("/home/rde/data/osm-history/raw/bordeaux-metropole.osh.pbf")
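
The file-naming convention described above can be sketched as a small check run before parsing. The helper name `osm_file_kind` and its return labels are hypothetical, introduced only for illustration; the function simply inspects the file name and does not read the file.

```python
from pathlib import Path


def osm_file_kind(path):
    """Guess the kind of OSM extract from its file name (hypothetical helper)."""
    name = Path(path).name.lower()
    if name.endswith(".osh.pbf"):
        return "history"   # every version of every object
    if name.endswith(".osm.pbf"):
        return "snapshot"  # latest version of each object only
    return "unknown"


kind = osm_file_kind("/home/rde/data/osm-history/raw/bordeaux-metropole.osh.pbf")
print(kind)
```

Such a check is only a naming heuristic; the content of the file is what actually determines how many versions the handler will see.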

Because the handler stores the OSM objects as a list of rows, we can easily turn it into a `pandas` DataFrame and work with it however we like.

In [5]:
colnames = ['elem', 'id', 'version', 'visible', 'ts', 'uid', 'chgset']
elements = pd.DataFrame(tlhandler.timeline, columns=colnames)
elements = elements.sort_values(by=['elem', 'id', 'version'])

In [6]:
elements.head(10)

Unnamed: 0,elem,id,version,visible,ts,uid,chgset
0,node,21457126,2,False,2008-01-17 16:40:56+00:00,24281,653744
1,node,21457126,3,False,2008-01-17 16:40:56+00:00,24281,653744
2,node,21457126,4,False,2008-01-17 16:40:56+00:00,24281,653744
3,node,21457126,5,False,2008-01-17 16:40:57+00:00,24281,653744
4,node,21457126,6,False,2008-01-17 16:40:57+00:00,24281,653744
5,node,21457126,7,True,2008-01-17 16:40:57+00:00,24281,653744
6,node,21457126,8,False,2008-01-17 16:41:28+00:00,24281,653744
7,node,21457126,9,False,2008-01-17 16:41:28+00:00,24281,653744
8,node,21457126,10,False,2008-01-17 16:41:49+00:00,24281,653744
9,node,21457126,11,False,2008-01-17 16:41:49+00:00,24281,653744


In [7]:
elements.sample(10)

Unnamed: 0,elem,id,version,visible,ts,uid,chgset
2482539,node,2362532957,1,True,2013-06-26 23:24:47+00:00,886721,16720292
2644759,node,2811684380,1,True,2014-04-23 18:40:15+00:00,1238664,21892916
2714459,node,3228658095,2,True,2014-12-17 15:50:32+00:00,1137406,27533125
1011908,node,1668572387,2,False,2012-09-12 20:00:45+00:00,53048,13088145
2173158,node,2242138810,2,True,2017-01-30 15:48:13+00:00,107257,45649120
203403,node,289234631,1,True,2008-08-18 22:16:03+00:00,54026,319632
659039,node,1257357197,1,True,2011-04-23 23:34:28+00:00,354363,7948485
598472,node,1249161571,1,True,2011-04-17 21:53:44+00:00,354363,7892049
647014,node,1257341824,1,True,2011-04-23 23:22:03+00:00,354363,7948262
2933009,way,78350407,1,True,2010-09-20 19:21:37+00:00,53048,5831957
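
As one possible way of working with this DataFrame, the sketch below keeps the most recent version of each element and counts the versions per element. It runs on a toy DataFrame with made-up values that only mimics the columns of `elements`; the names `toy`, `latest` and `n_versions` are assumptions introduced for the example.

```python
import pandas as pd

# Toy rows with made-up values, using the same columns as `elements`
colnames = ['elem', 'id', 'version', 'visible', 'ts', 'uid', 'chgset']
rows = [
    ['node', 1, 1, True, pd.Timestamp('2010-01-01', tz='UTC'), 10, 100],
    ['node', 1, 2, False, pd.Timestamp('2011-01-01', tz='UTC'), 11, 101],
    ['node', 2, 1, True, pd.Timestamp('2012-01-01', tz='UTC'), 10, 102],
    ['way', 1, 1, True, pd.Timestamp('2012-06-01', tz='UTC'), 12, 103],
]
toy = pd.DataFrame(rows, columns=colnames)

# Sort by version, then keep the last row of each (elem, id) pair
latest = (toy.sort_values(['elem', 'id', 'version'])
             .drop_duplicates(subset=['elem', 'id'], keep='last'))

# Number of recorded versions per element
n_versions = toy.groupby(['elem', 'id']).size()

print(latest[['elem', 'id', 'version', 'visible']])
print(n_versions)
```

The grouping key is the pair `(elem, id)` rather than `id` alone, because a node and a way may share the same identifier.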


If desired, this structure can also be saved to disk as a CSV file.

In [8]:
elements.to_csv("bordeaux-metropole.csv", date_format='%Y-%m-%d')
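
The following sketch illustrates what the `date_format='%Y-%m-%d'` argument implies when the file is read back: only the day is written, so the time of day is not recoverable. It uses an in-memory buffer and a one-row toy DataFrame with made-up content instead of the real file; `toy`, `buffer` and `restored` are names assumed for the example.

```python
import io

import pandas as pd

# One-row toy DataFrame standing in for `elements`
toy = pd.DataFrame({
    'elem': ['node'],
    'id': [1],
    'ts': [pd.Timestamp('2008-01-17 16:40:56', tz='UTC')],
})

# Write to an in-memory buffer with the same date format as above
buffer = io.StringIO()
toy.to_csv(buffer, date_format='%Y-%m-%d')

# Read it back, using the first column as index and parsing `ts` as a date
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0, parse_dates=['ts'])

print(restored)
```

If the time of day matters for later analysis, leaving out `date_format` keeps the full timestamp in the CSV file.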

## OSM tag set parsing

As before, we define a handler class. The only difference lies in the appending method, which now records the key and value of every tag attached to an element.

In [9]:
class TagGenomeHandler(osm.SimpleHandler):
    
    def __init__(self):
        osm.SimpleHandler.__init__(self)
        self.taggenome = []

    def tag_inventory(self, elem, elem_type):
        for tag in elem.tags:
            self.taggenome.append([elem_type,
                                   elem.id,
                                   elem.version,
                                   tag.k,
                                   tag.v])

    def node(self, n):
        self.tag_inventory(n, "node")

    def way(self, w):
        self.tag_inventory(w, "way")

    def relation(self, r):
        self.tag_inventory(r, "relation")

In [10]:
taghandler = TagGenomeHandler()

In [11]:
taghandler.apply_file("/home/rde/data/osm-history/raw/bordeaux-metropole.osh.pbf")

In [12]:
tag_genome = pd.DataFrame(taghandler.taggenome)
tag_genome.columns = ['elem', 'id', 'version', 'tagkey', 'tagvalue']
tag_genome = tag_genome.sort_values(['elem', 'id', 'version'], ascending=False)

In [13]:
tag_genome.head(10)

Unnamed: 0,elem,id,version,tagkey,tagvalue
2171854,way,475339500,1,access,customers
2171855,way,475339500,1,amenity,parking
2171852,way,475166883,1,oneway,yes
2171853,way,475166883,1,highway,cycleway
2171850,way,475166882,1,name,Rue de la Morandière
2171851,way,475166882,1,highway,tertiary
2171848,way,475166377,1,oneway,yes
2171849,way,475166377,1,highway,cycleway
2171844,way,475166376,1,name,Avenue de Magudas
2171845,way,475166376,1,oneway,yes


In [14]:
tag_genome.sample(10)

Unnamed: 0,elem,id,version,tagkey,tagvalue
2018295,way,226893947,1,wall,no
1657217,way,161740470,1,building,yes
1896841,way,189221699,6,highway,tertiary
1059425,way,100851548,2,source,cadastre-dgi-fr source : Direction Générale de...
1610661,way,159274734,1,source,cadastre-dgi-fr source : Direction Générale de...
1510843,way,155724987,1,source,cadastre-dgi-fr source : Direction Générale de...
295814,node,2242256716,1,taxon,Apitol
1195711,way,109951919,2,wall,no
894777,way,52111280,1,highway,secondary_link
1646171,way,161531863,1,building,yes
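
A quick way to get a feel for the tag set is to count how often each tag key occurs. The sketch below does this with the standard library on a handful of made-up rows laid out like those appended by `tag_inventory` (element type, id, version, tag key, tag value); `rows` and `key_counts` are names assumed for the example.

```python
from collections import Counter

# Toy rows with made-up ids: [elem_type, id, version, tag key, tag value]
rows = [
    ["way", 1, 1, "highway", "tertiary"],
    ["way", 1, 1, "name", "Rue de la Morandière"],
    ["way", 2, 1, "highway", "cycleway"],
    ["way", 2, 1, "oneway", "yes"],
    ["node", 3, 1, "amenity", "parking"],
]

# Count the occurrences of each tag key (fourth field of every row)
key_counts = Counter(row[3] for row in rows)
most_common_key, occurrences = key_counts.most_common(1)[0]

print(key_counts)
print(most_common_key, occurrences)
```

With a history file, each element version contributes its own tags, so these counts are per version rather than per element.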


In [15]:
tag_genome.to_csv("bordeaux-metropole-tags.csv")