# OSM - Charging Station Data

## 0 Imports and Paths

The following packages will be needed for this notebook.

In [1]:
# Imports:
import overpy
import geopandas as gpd
from shapely.geometry import Point
import os

Paths for input and output files:

In [2]:
OUT_DIR = os.path.abspath("./data/")
DATA_FILENAME = "osm_cs.geojson"

## 1 Introduction

This notebook retrieves OSM data for all charging stations in Munich. To retrieve custom OSM data, we use the Overpass API, which you already know from the theory part of this lab course. Overpy, the module used in this notebook, sends Overpass queries to the Overpass API and converts the server response into a Python object. From this response object, a GeoDataFrame can be built and processed for further use. Hence, the following steps are executed in this notebook:

    1. Writing an overpass query, capable of fetching all charging station data in Munich (Section 2)
    2. Using overpy to send the query to the overpass api (Section 2)
    3. Converting the response to a GeoDataFrame (Section 3)
    4. Saving the GeoDataFrame for later use (Section 3)

## 2 Retrieving OSM Data via Overpass

To retrieve charging stations from OSM, you first have to find the OSM tags that define charging stations. A good resource for finding tags is https://taginfo.openstreetmap.org/.
Use this website to identify the tags (key/value pairs) associated with charging stations and build an Overpass query that fetches all nodes representing charging stations in Munich. Only fetch nodes for this exercise (no ways or relations!). You can test your query at https://overpass-turbo.eu/.
Once you are sure you have the right query, store it as a Python string in a variable called 'overpass_query'. This query can then be sent to the Overpass API from Python to obtain a Python data structure that we can use for further data processing.

For further information on the Overpass query language, see the theory slides of this lab course, search the web, or have a look at the following resource: https://wiki.tum.de/display/smartemobilitaet/Overpass

In [3]:
# Build a python String that contains the overpass query
# overpass_query = ....

#<<solution>>
# The query that is going to be sent to overpass as a String
overpass_query = '[out:json]; area["name"="München"]->.muc; ( node["amenity"="charging_station"](area.muc); node["amenity"="charging station"](area.muc); node["parking_space"="charging"](area.muc); node["capacity:charging"](area.muc););out geom;'
#<</solution>>
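
For readability, a query like the one above can also be assembled from its parts. The following is a minimal sketch and not part of the original solution; `build_node_query` and its parameters are hypothetical names chosen for illustration. It builds a union of node filters restricted to a named area.

```python
def build_node_query(area_name, tag_filters, area_alias="muc"):
    """Assemble an Overpass QL query that fetches nodes inside a named area.

    tag_filters is a list of (key, value) pairs; a value of None matches
    any node that carries the key, regardless of its value.
    """
    statements = []
    for key, value in tag_filters:
        if value is None:
            tag = f'["{key}"]'
        else:
            tag = f'["{key}"="{value}"]'
        statements.append(f"node{tag}(area.{area_alias});")

    return (
        "[out:json];"
        f'area["name"="{area_name}"]->.{area_alias};'
        f"({''.join(statements)});"
        "out geom;"
    )


example_query = build_node_query(
    "München",
    [("amenity", "charging_station"), ("capacity:charging", None)],
)
print(example_query)
```

The resulting string can be pasted into https://overpass-turbo.eu/ for testing before it is sent from Python.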

To send the query to Overpass, we use overpy (https://pypi.org/project/overpy/), a Python wrapper for the Overpass API. Using the documentation linked above, create an Overpass instance and use it to send the query to the Overpass API. Save the server response in a variable called 'response'.

Note: This step may fail because of a large number of requests from the same IP (you are essentially all working on the same machine). If that happens, ask one of the lab course supervisors for advice.

In [4]:
# Send the overpass query to the overpass server using overpy. Store the result in a variable called "response"

#<<solution>>
api = overpy.Overpass()
response = api.query(overpass_query)
# In case the Overpass API is temporarily unavailable due to a large number of requests from your IP
# (fetch_overpy_backup is defined in the Appendix; run that cell first):
#response = fetch_overpy_backup()
#<</solution>>
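
Before falling back to stored data, it can be worth retrying the request after a short pause. The sketch below is an illustration under stated assumptions, not part of the original notebook: `query_with_retry` is a hypothetical helper, and the demo uses a stand-in function instead of a real server so that it runs offline.

```python
import time


def query_with_retry(send_query, query, retries=3, wait_seconds=5.0,
                     retry_on=(Exception,)):
    """Call send_query(query), retrying after a pause if it raises.

    send_query -- callable that takes the query string (e.g. api.query)
    retry_on   -- tuple of exception types that should trigger a retry
    """
    for attempt in range(1, retries + 1):
        try:
            return send_query(query)
        except retry_on:
            if attempt == retries:
                raise  # give up and let the caller see the last error
            time.sleep(wait_seconds)


# Stand-in for an overloaded server: fails twice, then answers.
calls = {"count": 0}


def flaky_send(query):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("server busy")
    return "response for: " + query


result = query_with_retry(flaky_send, "demo", retries=3, wait_seconds=0,
                          retry_on=(RuntimeError,))
print(result, "after", calls["count"], "attempts")
```

With overpy, one would presumably pass `api.query` as `send_query` and restrict `retry_on` to the exception raised for rate limiting (assumed here to be `overpy.exception.OverpassTooManyRequests`; check this against the installed version).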

## 3 Converting and Saving Data

The overpy.Result stored in the 'response' variable contains the OSM data returned for the query sent to the Overpass server. You can easily have a look at the data (remember: we only asked for nodes):

In [5]:
# Nodes fetched from the server
response.nodes

[<overpy.Node id=401301237 lat=48.1589335 lon=11.5739817>,
 <overpy.Node id=1271358134 lat=48.1330014 lon=11.6896930>,
 <overpy.Node id=1330522157 lat=48.1326571 lon=11.5460300>,
 <overpy.Node id=1475946405 lat=48.1532363 lon=11.5375666>,
 <overpy.Node id=1895124498 lat=48.1333465 lon=11.5280605>,
 <overpy.Node id=2409259910 lat=48.1553706 lon=11.5751999>,
 <overpy.Node id=2413225221 lat=48.1446564 lon=11.5577649>,
 <overpy.Node id=2415163823 lat=48.1560591 lon=11.5549218>,
 <overpy.Node id=2427086060 lat=48.1875182 lon=11.5531779>,
 <overpy.Node id=2590211683 lat=48.1750570 lon=11.5656546>,
 <overpy.Node id=2610743040 lat=48.1857331 lon=11.5728826>,
 <overpy.Node id=2617872045 lat=48.0883280 lon=11.5050421>,
 <overpy.Node id=2647411854 lat=48.1890971 lon=11.5738285>,
 <overpy.Node id=2703924735 lat=48.1893184 lon=11.5703479>,
 <overpy.Node id=2787301365 lat=48.1594532 lon=11.5568430>,
 <overpy.Node id=2888806085 lat=48.1859274 lon=11.5581572>,
 <overpy.Node id=2888806086 lat=48.176881

Apparently, overpy comes with a dedicated overpy.Node class to store OSM nodes. Let's have a look at the non-private fields of one of the nodes in the result:

In [6]:
# Display the public (non-underscore) attributes of a Python object
[field for field in dir(response.nodes[0]) if not field.startswith('_')]

['attributes',
 'from_json',
 'from_xml',
 'get_center_from_json',
 'get_center_from_xml_dom',
 'id',
 'lat',
 'lon',
 'tags']

As you can see, an overpy.Node object comprises the node's OSM id, its latitude, its longitude and its OSM tags. The other fields are not relevant for this notebook. We already know that we want to build a GeoDataFrame from the OSM data we fetched.
Since each GeoDataFrame has to have a geometry column, we will use the latitude and longitude to generate shapely Points for the geometry column. The id is an obvious candidate for another column of the GeoDataFrame, but what do we do with the tags?

First, let's see how tags are stored in overpy responses:

In [7]:
# Let's see what kind of structure a tag is
type(response.nodes[0].tags)

dict

The tags of an OSM node/way/relation are stored as a dictionary. As such, it is easy to access the tags of a node.

In [8]:
# Show all the tags of one of the nodes
example_tags = response.nodes[0].tags
print("The tags of this charging station are:\n" + str(example_tags))
# Access the value of one of the tags
example_key = list(example_tags.keys())[0]
example_val = example_tags[example_key]
print("\nOne key and value combination in the tags of this charging station is: {}:{}".format(example_key, example_val))

The tags of this charging station are:
{'amenity': 'charging_station', 'fee': 'yes', 'network': 'ladenetz.de', 'opening_hours': '24/7', 'operator': 'Stadt München', 'payment:coins': 'yes', 'payment:debit_cards': 'ladenetz.de', 'ref': 'DESWME010101'}

One key and value combination in the tags of this charging station is: amenity:charging_station


In contrast to "id", "lat" and "lon", tags can have arbitrary keys and values in OSM and do not need to be set at all. Hence, we should not save all node tags to the GeoDataFrame, to avoid creating a large number of columns that we may not need. Instead, it is better to have a look at the tags present in the data and decide which information is needed and which is not.

In [9]:
# Display the tag keys present in the data we fetched
{key for n in response.nodes for key in n.tags}

{'access',
 'addr:city',
 'addr:country',
 'addr:housenumber',
 'addr:postcode',
 'addr:street',
 'amenity',
 'amperage',
 'authentication:app',
 'authentication:membership_card',
 'authentication:membership_card:types',
 'authentication:nfc',
 'authentication:none',
 'authentication:phone_call',
 'authentication:short_message',
 'bicycle',
 'brand',
 'brand:wikidata',
 'capacity',
 'capacity:car',
 'capacity:charging',
 'car',
 'cars',
 'charging_station:output',
 'contact:phone',
 'contact:website',
 'covered',
 'description',
 'description:en',
 'disused',
 'fee',
 'fixme',
 'layer',
 'level',
 'maxstay',
 'maxstay:conditional',
 'motorcar',
 'name',
 'network',
 'note',
 'opening_hours',
 'operator',
 'parking:fee',
 'payment:coins',
 'payment:debit_cards',
 'payment:membership_card',
 'power',
 'ref',
 'scooter',
 'service:bicycle:charging',
 'service:bicycle:pump',
 'socket:chademo',
 'socket:chademo:output',
 'socket:schuko',
 'socket:schuko:current',
 'socket:type2',
 'socket:t

The list above comprises all tag keys present in the response nodes. After careful consideration of the available options, the relevant keys are:

    - "operator": The operator of the charging station (company)
    - "capacity": The number of charging points at the charging station 
      (number of cars that can charge simultaneously)
    - "amperage": The amperage of the charging points
    - "voltage": The voltage of the charging points
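
One way to support such a selection is to count in how many nodes each tag key actually occurs. The following is a minimal sketch with made-up stand-in nodes; in the notebook, `response.nodes` would take the place of `demo_nodes`.

```python
from collections import Counter
from types import SimpleNamespace

# Stand-in nodes for illustration; in the notebook this would be response.nodes.
demo_nodes = [
    SimpleNamespace(tags={"amenity": "charging_station",
                          "operator": "Stadtwerke München",
                          "capacity": "2"}),
    SimpleNamespace(tags={"amenity": "charging_station", "operator": "E.On"}),
    SimpleNamespace(tags={"amenity": "charging_station",
                          "amperage": "32",
                          "voltage": "400"}),
]

# Count in how many nodes each tag key appears.
key_counts = Counter(key for node in demo_nodes for key in node.tags)

for key, count in key_counts.most_common():
    print(f"{key}: {count} of {len(demo_nodes)} nodes")
```

Keys that occur in only a handful of nodes would yield mostly empty columns, which is worth knowing before adding them to the GeoDataFrame.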
    
We can now create a GeoDataFrame with the following columns and fill it with data from response.nodes: 

    1. "id"
    2. "geometry"
    3. "operator"
    4. "outlets" (filled from the "capacity" tag)
    5. "amperage"
    6. "voltage"

In [10]:
# Create an empty GeoDataFrame and call it "osm_CS"
# osm_CS = ....

#<<solution>>
osm_CS = gpd.GeoDataFrame()
#<</solution>>

# Create an empty column for each piece of information that shall be stored in the GeoDataFrame:
# "id", "geometry", "operator", "outlets" (from the "capacity" tag), "amperage", "voltage"

#<<solution>>
osm_CS["id"] = None
osm_CS["geometry"] = None
osm_CS["operator"] = None
osm_CS["outlets"] = None
osm_CS["amperage"] = None
osm_CS["voltage"] = None
#<</solution>>

In [11]:
# Fill the GeoDataFrame with data from the response. If a tag is not present in a response.node, fill in a None value.
# Important hint: Always start filling a row with the geometry column, otherwise it won't work, because GeoPandas
# does not accept rows without geometries!

#<<solution>>
for i, node in enumerate(response.nodes):
    
    # Retrieve the node tags
    tags = node.tags

    # Write the information to the DataFrame
    # Shapely expects (x, y), i.e. (longitude, latitude)
    osm_CS.loc[i, "geometry"] = Point(node.lon, node.lat)
    osm_CS.loc[i, "id"] = node.id
    # dict.get returns None if the tag is not set
    osm_CS.loc[i, "operator"] = tags.get("operator")
    osm_CS.loc[i, "outlets"] = tags.get("capacity")
    osm_CS.loc[i, "amperage"] = tags.get("amperage")
    osm_CS.loc[i, "voltage"] = tags.get("voltage")
#<</solution>>
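
As an alternative to filling the frame row by row, each node can first be turned into a flat record. This is a sketch of that idea, not the notebook's method; `COLUMN_TO_TAG` and `node_to_record` are hypothetical names, and the demo node is a stand-in for an entry of `response.nodes`.

```python
from types import SimpleNamespace

# Maps each output column to the OSM tag key it is read from.
COLUMN_TO_TAG = {
    "operator": "operator",
    "outlets": "capacity",
    "amperage": "amperage",
    "voltage": "voltage",
}


def node_to_record(node):
    """Turn one node into a flat dict; missing tags become None."""
    record = {"id": node.id, "lon": float(node.lon), "lat": float(node.lat)}
    for column, tag_key in COLUMN_TO_TAG.items():
        record[column] = node.tags.get(tag_key)
    return record


# Stand-in node for illustration only.
demo_node = SimpleNamespace(
    id=1, lon="11.5", lat="48.1",
    tags={"operator": "Example operator", "capacity": "2"},
)
record = node_to_record(demo_node)
print(record)
```

A list of such records could then be passed to a DataFrame constructor in one go, with the geometry column built from the "lon" and "lat" values (for example via `geopandas.points_from_xy`), which tends to be faster than enlarging the frame one row at a time.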

Let's have a short look at the result to make sure the generated GeoDataFrame is plausible:

In [12]:
osm_CS

|   | id | geometry | operator | outlets | amperage | voltage |
|---|---|---|---|---|---|---|
| 0 | 401301237 | POINT (11.5739817 48.1589335) | Stadt München | | | |
| 1 | 1271358134 | POINT (11.689693 48.1330014) | | | 32 | 400 |
| 2 | 1330522157 | POINT (11.54603 48.1326571) | E.On | | | |
| 3 | 1475946405 | POINT (11.5375666 48.1532363) | Stadtwerke München | 2 | | |
| 4 | 1895124498 | POINT (11.5280605 48.1333465) | | | | |
| 5 | 2409259910 | POINT (11.5751999 48.1553706) | Stadtwerke München | 2 | | |
| 6 | 2413225221 | POINT (11.5577649 48.1446564) | Stadtwerke München | 1 | | |
| 7 | 2415163823 | POINT (11.5549218 48.1560591) | Stadtwerke München | | | |
| 8 | 2427086060 | POINT (11.5531779 48.1875182) | | | | |
| 9 | 2590211683 | POINT (11.5656546 48.175057) | Stadtwerke München | 2 | | |


After the GeoDataFrame has been filled successfully, it needs to be saved to the filesystem. Most of the time, a GeoJSON file is suitable for storing GeoDataFrames. It is especially handy because it can be read by humans. Only for large amounts of data are binary formats such as ESRI Shapefiles better suited, because they keep file sizes smaller.
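
To illustrate why GeoJSON is human-readable, here is a hand-built sketch of one charging station from the table above as a GeoJSON feature, using only the standard library. The exact layout GeoPandas writes may differ; this only shows the general structure.

```python
import json

# One charging station as a GeoJSON Feature. GeoJSON stores coordinates
# in (longitude, latitude) order.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [11.5375666, 48.1532363]},
    "properties": {
        "id": 1475946405,
        "operator": "Stadtwerke München",
        "outlets": "2",
    },
}
collection = {"type": "FeatureCollection", "features": [feature]}

# ensure_ascii=False keeps "ü" readable instead of escaping it as \u00fc.
geojson_text = json.dumps(collection, ensure_ascii=False, indent=2)
print(geojson_text)
```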

You are already familiar with the way data is loaded into GeoDataFrames. Writing files from GeoPandas is just as easy and works analogously to reading files. The syntax for writing files is:

my_df.to_file("<target_path>", driver = "<driver>", **kwargs)
    
'encoding' is one of the keyword arguments (https://stackoverflow.com/questions/1419046/normal-arguments-vs-keyword-arguments) understood by the .to_file method. It explicitly defines the encoding of the text file that is written. We will set this encoding to 'UTF-8' throughout the lab course, because it handles umlauts (ä, ö, ü) correctly.

Note: The 'encoding' keyword is also understood by the .read_file method.
Best practice: Always supply an encoding to .to_file and .read_file, and make sure the two match.
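
The effect of mismatched encodings can be demonstrated with plain text files. This is a small standard-library sketch, independent of GeoPandas: the same file is read once with the encoding it was written in and once with a different one.

```python
import os
import tempfile

text = "Stadtwerke München"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "operator.txt")

    # Write with an explicit encoding ...
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # ... and read back with the same one: the text survives.
    with open(path, "r", encoding="utf-8") as f:
        matching = f.read()

    # Reading with a different encoding silently garbles the umlaut.
    with open(path, "r", encoding="latin-1") as f:
        mismatched = f.read()

print(matching)    # Stadtwerke München
print(mismatched)  # Stadtwerke MÃ¼nchen
```

No error is raised in the mismatched case, which is why such problems often go unnoticed until the data is inspected.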

In [13]:
osm_CS.to_file(os.path.join(OUT_DIR, DATA_FILENAME), driver='GeoJSON', encoding='UTF-8')

## Appendix  

In [14]:
import json

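def fetch_overpy_backup():
    """Load a stored Overpass response from disk as a fallback for the live query."""
    BACKUP_FILE = os.path.abspath("./checkpoint_data/charging_stations_overpy_response.json")

    with open(BACKUP_FILE, "r", encoding="utf-8") as backup_file:
        data = json.load(backup_file)
    # from_json is a classmethod, so no Result instance is needed
    return overpy.Result.from_json(data)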
    