# OpenStreetMap Data

This notebook extracts hiking route data from OpenStreetMap via the Overpass API ([Link](https://overpass-turbo.eu/)).

First, we request hiking routes from the API using Overpass QL (short for "Overpass Query Language").
In OpenStreetMap, hiking routes are defined as relations. We search for relations tagged `route=hiking` and `network=lwn` (local walking network) whose `osmc:symbol` tag matches specific signage,
within a bounding box slightly larger than Switzerland. With the `out center` output mode, Overpass returns a single center point for each route.
Since the `name` tag is often missing, we also retrieve the `from` and `to` tags, so that a name can be constructed by concatenating the start and end points of a hiking route.
Finally, we store the ID, name, start and end points, signage symbol, latitude, and longitude of each route.
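
Constructing a name from the start and end points could look like the following minimal sketch. The helper `build_route_name` and the `" - "` separator are assumptions made for illustration; they are not part of the notebook's code.

```python
from typing import Optional


def build_route_name(
    name: Optional[str], start: Optional[str], end: Optional[str]
) -> Optional[str]:
    """Return the OSM name if present, otherwise a name built from 'from' and 'to'.

    Hypothetical helper: the separator " - " is an assumption.
    """
    if name:
        return name
    if start and end:
        return f"{start} - {end}"
    # Only one endpoint (or none) is tagged: use whatever is available
    return start or end


print(build_route_name(None, "Uetliberg", "Uetliberg Uto Kulm"))
```

Routes that carry neither a `name` nor a `from`/`to` tag keep `None`, so they remain easy to filter out later.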

The data is then converted into a DataFrame object, and a table is created in an SQL database (hosted on Microsoft Azure).

In [50]:
# Import required libraries
import os
import json
import overpy
import pyodbc
import urllib
import pymssql
import pandas as pd 
from sqlalchemy import Integer, String, Float, DATETIME, create_engine

### Connect to API

In [51]:
# Initialize the Overpass API with a custom URL
api = overpy.Overpass(url="http://overpass.osm.ch/api/interpreter")

# Overpass query for hiking trails within Switzerland. Using 'center', we obtain the coordinates in the middle of a hiking trail
query = """
[out:json];
relation
["route"="hiking"]
//["name"!~"fixme", i]
["network"="lwn"]
["osmc:symbol"~"yellow::yellow_diamond|red:white:red_bar|yellow:white:yellow_diamond|blue:white:blue_bar"]
(45.8899, 6.0872, 47.8085, 10.4921);
out center tags;
"""

# Execute the request
result = api.query(query)
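
The query above hard-codes the bounding box and the signage filter. As a sketch, the same query text could be assembled from parameters; `build_hiking_query` is a hypothetical helper, and it assumes the Overpass bounding-box order (south, west, north, east).

```python
def build_hiking_query(bbox, symbols, network="lwn"):
    """Assemble an Overpass QL query for hiking relations (hypothetical helper).

    bbox:    (south, west, north, east) in decimal degrees
    symbols: osmc:symbol values, joined into one regular expression
    """
    south, west, north, east = bbox
    symbol_regex = "|".join(symbols)
    return (
        "[out:json];\n"
        "relation\n"
        '["route"="hiking"]\n'
        f'["network"="{network}"]\n'
        f'["osmc:symbol"~"{symbol_regex}"]\n'
        f"({south}, {west}, {north}, {east});\n"
        "out center tags;\n"
    )


example_query = build_hiking_query(
    bbox=(45.8899, 6.0872, 47.8085, 10.4921),
    symbols=["yellow::yellow_diamond", "red:white:red_bar"],
)
print(example_query)
```

The resulting string can be passed to `api.query(...)` in the same way as the literal query.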

### Save Data to DataFrame

In [53]:
# Record the date and time of the API call
timestamp_apicall = pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S")

# Collect one record (dict) per hiking route
rows = []

# Iterate over all relations
for relation in result.relations:

    # Extract relevant tags
    org_name = relation.tags.get('name')
    org_to = relation.tags.get('to')
    org_from = relation.tags.get('from')
    org_symbol = relation.tags.get('osmc:symbol')

    # overpy exposes the center as two separate attributes
    lat = relation.center_lat
    lon = relation.center_lon

    # The name is stored as-is (None if the tag is missing); 'from' and 'to'
    # are kept as separate columns so a name can be constructed from them later
    row = {
        'id': relation.id,
        'name': org_name,
        'von': org_from,
        'bis': org_to,
        'lat': lat,
        'lon': lon,
        'symbol': org_symbol,
        # 'distance': distance,
        # 'ascent': ascent,
        'timestamp_apicall': timestamp_apicall,
    }

    # Append the record as a new row
    rows.append(row)

# Once all data is processed, create the DataFrame
df_wanderwege = pd.DataFrame(rows)

# Print the DataFrame
print(df_wanderwege.head())

       id                                          name        von  \
0   22614  Nationalpark Wanderroute 15 (Munt la Schera)       None   
1  103607                                 Wanderwege SG       None   
2  112830                                          None  Uetliberg   
3  112831                                          None  Folenweid   
4  112833                                          None  Felsenegg   

                  bis         lat         lon                  symbol  \
0                None  46.6501430  10.2301984       red:white:red_bar   
1                None  47.4309774   9.6201700  yellow::yellow_diamond   
2  Uetliberg Uto Kulm  47.3511680   8.4897796  yellow::yellow_diamond   
3             Baldern  47.3291235   8.5007261  yellow::yellow_diamond   
4            Balderen  47.3152439   8.5050559  yellow::yellow_diamond   

     timestamp_apicall  
0  2024-11-17 15:11:16  
1  2024-11-17 15:11:16  
2  2024-11-17 15:11:16  
3  2024-11-17 15:11:16  
4  2024-11-17 1
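
The `symbol` column holds the raw `osmc:symbol` value. As a rough sketch, it can be split into its components, assuming the common `waycolor:background:foreground` layout of that tag; `parse_osmc_symbol` is a hypothetical helper and ignores any further optional parts (such as text and text color).

```python
def parse_osmc_symbol(symbol):
    """Split an osmc:symbol value into its first three components.

    Hypothetical helper; assumes the layout waycolor:background:foreground.
    Returns None for a missing value.
    """
    if not symbol:
        return None
    parts = symbol.split(":")
    # Pad short values so that all three keys are always present
    parts += [""] * (3 - len(parts))
    return {
        "waycolor": parts[0],
        "background": parts[1],
        "foreground": parts[2],
    }


print(parse_osmc_symbol("red:white:red_bar"))
print(parse_osmc_symbol("yellow::yellow_diamond"))
```

An empty `background`, as in `yellow::yellow_diamond`, simply means that component was left blank in the tag.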

In [5]:
# Optional: store the data as a CSV file
df_wanderwege.to_csv("../data/processed/overpass.csv")

In [54]:
# Convert lat and lon to numeric, timestamp to datetime
df_wanderwege['lat'] = pd.to_numeric(df_wanderwege['lat'], errors='coerce')
df_wanderwege['lon'] = pd.to_numeric(df_wanderwege['lon'], errors='coerce')
df_wanderwege['timestamp_apicall'] = pd.to_datetime(df_wanderwege['timestamp_apicall'], errors='coerce')

### Connect to SQL Server

In [55]:
# Load configuration from config/db_config.json
with open('../config/db_config.json', 'r') as f:
    db_config = json.load(f)

# Get database credentials
server = db_config['server']
database = db_config['database']
db_user = db_config['db_user']
db_password = db_config['db_password']
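
The cell above assumes that `db_config.json` contains the four keys `server`, `database`, `db_user`, and `db_password`. A minimal sketch of a loader that fails early when one of them is missing or empty could look like this; `load_db_config` is a hypothetical helper, not part of the notebook.

```python
import json
from pathlib import Path

# Keys read by the notebook from config/db_config.json
REQUIRED_KEYS = ("server", "database", "db_user", "db_password")


def load_db_config(path):
    """Load the database config and check that all required keys are set.

    Hypothetical helper: raises KeyError listing missing or empty keys.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return config


# Usage (path as used in the notebook):
# db_config = load_db_config("../config/db_config.json")
```

Keeping the credentials in a separate file also makes it easy to exclude them from version control.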

### Create empty SQL table

In [56]:
# Create table if it doesn't exist
table_name = "OVRP_HikingRoutes"
query = f"""
    IF OBJECT_ID(N'dbo.{table_name}', N'U') IS NULL
    BEGIN
        CREATE TABLE dbo.{table_name} (
            id                      INT         NOT NULL,
            name                    VARCHAR(255) NULL,
            von                     VARCHAR(255) NULL,
            bis                     VARCHAR(255) NULL,
            lat                     FLOAT       NOT NULL,
            lon                     FLOAT       NOT NULL,
            symbol                  VARCHAR(255) NULL,
            timestamp_apicall       DATETIME    NULL,
            PRIMARY KEY (id)
        );
    END
    """

conn = pymssql.connect(server, db_user, db_password, database)
cursor = conn.cursor()
cursor.execute(query)

conn.commit()
conn.close()

In [57]:
# Create connection string for SQLAlchemy
connection_string = f"mssql+pymssql://{db_user}:{db_password}@{server}/{database}"
engine = create_engine(connection_string)
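
The connection string above inserts the credentials unchanged, which can break if the password contains characters such as `@` or `/`. One possible workaround is to percent-encode the credentials first. The sketch below uses hypothetical example values and a hypothetical helper `build_connection_string`; only `urllib.parse.quote` from the standard library is used.

```python
from urllib.parse import quote


def build_connection_string(user, password, server, database):
    """Build an mssql+pymssql URL with percent-encoded credentials.

    Hypothetical helper; safe="" also encodes '/', which quote() would
    otherwise leave untouched.
    """
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"mssql+pymssql://{user_enc}:{password_enc}@{server}/{database}"


# Example values are placeholders, not real credentials
print(build_connection_string("hiker", "p@ss/word", "example-server", "example-db"))
```

The returned string can be passed to `create_engine` in place of the plain f-string.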

In [58]:
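# Ingest data into the database table.
# Note: if_exists='replace' drops and recreates the table, so the schema
# created above (including the primary key) is not preserved.
df_wanderwege.to_sql(table_name, con=engine, if_exists='replace', index=False)
print("DataFrame successfully loaded into the MSSQL database!")

DataFrame successfully loaded into the MSSQL database!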
