# README

A few things to note before we begin:

1. **Dataset download**: https://www.geofabrik.de

* To access the OpenStreetMap historical data, follow the link above; the region can be selected at download time. For example, to explore London, download the data for London and save it locally or in Google Drive.

* The downloaded file is in .osh.pbf format, which we use for data extraction. It includes all information about the nodes, ways and relations within the chosen region. In this example, we focus on extracting data from nodes only.

* Specifically, we look at all nodes that carry an amenity tag and retrieve their entire editing history to understand how amenities have evolved over time in a city (here, London).

2. **Data Extraction**:

* After extracting the data, it is highly recommended to save the extracted tabular data for further analysis, as the extraction itself is computationally expensive.
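
The "extract once, then reuse the saved table" advice can be sketched as a small caching helper. This is only an illustration under stated assumptions: `load_or_extract` and its `extract_rows` argument are hypothetical names, and `extract_rows` stands in for whatever slow extraction step is used. It relies on the standard library only, so values read back from the cache are strings.

```python
import csv
from pathlib import Path


def load_or_extract(cache_path, extract_rows):
    """Return rows from cache_path if it exists, else extract and cache them.

    extract_rows is a hypothetical zero-argument callable standing in for the
    slow .osh.pbf extraction; it must return a list of row lists.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cheap path: reuse the table saved by an earlier run.
        with cache_path.open(newline="", encoding="utf-8") as f:
            return list(csv.reader(f))
    # Expensive path: run the extraction once and save the result.
    rows = extract_rows()
    with cache_path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows
```

On the first call the extraction runs and the result is written to disk; later calls with the same path skip the extraction entirely.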

# Connect to Google Drive

In [1]:
from google.colab import drive
drive.mount('/content/gdrive/')

Mounted at /content/gdrive/


# Set-up

In [3]:
# Install required packages
!pip install osmium

Collecting osmium
  Downloading osmium-3.3.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.3 MB)

# Data Extraction

## Defining the OSM handler for feature extraction

In [5]:
import osmium as osm
import pandas as pd

# Define the handler that extracts the wanted information
# (type, id, version, timestamp, latitude/longitude, amenity type, name).
# To extract another kind of feature (such as office or aeroway),
# change 'amenity' in the tag checks below to the wanted tag key.



class TimelineHandler(osm.SimpleHandler):
    """
    This TimelineHandler collects the entire editing history of every
    node that represents an amenity within the input file.
    """
    def __init__(self):
        osm.SimpleHandler.__init__(self)
        self.elemtimeline = []
        
    def node(self, n):
        # Keep only nodes that carry an amenity tag
        if 'amenity' in n.tags:
            self.elemtimeline.append(["node",
                                      n.id,
                                      n.version,
                                      pd.Timestamp(n.timestamp),
                                      n.location.lat,
                                      n.location.lon,
                                      n.tags["amenity"],
                                      # "N/A" when the amenity has no name
                                      n.tags.get("name", "N/A")])

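The comment about switching from 'amenity' to another tag key (such as office or aeroway) can be illustrated with a small helper that takes the key as a parameter. This is a minimal sketch, not part of the handler above: `tag_row` is a hypothetical function, and a plain dict is used in place of an osmium tag list so the example runs without the .osh.pbf file.

```python
def tag_row(kind, elem_id, version, timestamp, lat, lon, tags, key="amenity"):
    """Build one output row if `key` is among the tags, else return None.

    `tags` is any mapping-like object supporting `in` and `[]`; a plain dict
    is used here in place of an osmium tag list.
    """
    if key not in tags:
        return None
    # Fall back to "N/A" when the element has no name tag.
    name = tags["name"] if "name" in tags else "N/A"
    return [kind, elem_id, version, timestamp, lat, lon, tags[key], name]


# Made-up values, for illustration only.
office_row = tag_row("node", 1, 2, "2021-05-09T23:14:56Z", 51.5, -0.14,
                     {"office": "company"}, key="office")
skipped = tag_row("node", 3, 1, "2021-05-09T23:14:56Z", 51.5, -0.14,
                  {"highway": "crossing"}, key="office")
```

Inside a handler, the `node` method could call such a helper and append the result whenever it is not `None`.
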
## Extract the Information and Transform It into Tabular Format

In [7]:
tlhandler = TimelineHandler()

# Change the file name to the OSM history file downloaded for the place under study
file_name = "./gdrive/MyDrive/Target Folder/greater-london-internal.osh.pbf"

tlhandler.apply_file(file_name)

# Transform the extracted data into a DataFrame for further manipulation
colnames = ['type','id','Version','TS',"Lat","Lon",'amenity','name']
elements = pd.DataFrame(tlhandler.elemtimeline, columns=colnames)
elements = elements.sort_values(by=['type','TS'],ascending=False)
elements = elements.reset_index(drop=True)
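
Because the table holds one row per edited version of a node, a common next step is to collapse the history to the most recent version of each node. The sketch below shows the idea on plain Python lists that follow the column order used above. `latest_versions` is a hypothetical helper and the sample rows are made up for illustration.

```python
def latest_versions(rows):
    """Keep, for each (type, id), the row with the highest version number.

    Each row follows the column order used above:
    type, id, Version, TS, Lat, Lon, amenity, name.
    """
    latest = {}
    for row in rows:
        key = (row[0], row[1])
        if key not in latest or row[2] > latest[key][2]:
            latest[key] = row
    return list(latest.values())


# Made-up rows: node 1 has two versions, node 2 has one.
sample_rows = [
    ["node", 1, 1, "2015-01-01", 51.5, -0.1, "cafe", "N/A"],
    ["node", 1, 2, "2018-06-01", 51.5, -0.1, "cafe", "Example Cafe"],
    ["node", 2, 1, "2019-03-01", 51.6, -0.2, "pub", "N/A"],
]
current = latest_versions(sample_rows)
```

Note that only versions carrying the amenity tag are in the table, so "latest" here means the latest amenity-tagged version, which is not necessarily the node's current state.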

In [9]:
elements.head()

Unnamed: 0,type,id,Version,TS,Lat,Lon,amenity,name
0,node,185743749,7,2021-05-09 23:14:56+00:00,51.550833,-0.138445,post_box,
1,node,303198052,3,2021-05-09 23:14:56+00:00,51.550804,-0.14039,bicycle_parking,
2,node,8715968899,1,2021-05-09 23:14:56+00:00,51.550761,-0.1356,public_bookcase,Leighton Road Community Book Swap
3,node,8716017943,1,2021-05-09 23:14:56+00:00,51.550776,-0.140567,bicycle_parking,
4,node,8716017952,1,2021-05-09 23:14:56+00:00,51.550088,-0.140706,waste_basket,


In [10]:
elements.to_csv("./gdrive/MyDrive/Target Folder/London_Extracted.csv", header=False)
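
Since the file is written without a header row, and the DataFrame index is written as the first column by default, the column names have to be supplied again when the file is read back. The following is a minimal sketch using only the standard library. `read_extracted` and the `COLUMNS` list (including the name "index" for the first column) are assumptions made for illustration.

```python
import csv

# The index column comes first, followed by the columns defined during extraction.
COLUMNS = ["index", "type", "id", "Version", "TS", "Lat", "Lon", "amenity", "name"]


def read_extracted(path):
    """Read the headerless CSV back as a list of dicts keyed by COLUMNS."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, fieldnames=COLUMNS))
```

All values come back as strings, so numeric columns such as `Lat` and `Lon` need converting before analysis. With pandas, passing `header=None` and `names=COLUMNS` to `pd.read_csv` should achieve the same labelling.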