<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction-to-Serialization-formats" data-toc-modified-id="Introduction-to-Serialization-formats-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction to Serialization formats</a></span><ul class="toc-item"><li><span><a href="#JSON" data-toc-modified-id="JSON-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>JSON</a></span><ul class="toc-item"><li><span><a href="#Exercise" data-toc-modified-id="Exercise-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li><li><span><a href="#CSV" data-toc-modified-id="CSV-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>CSV</a></span><ul class="toc-item"><li><span><a href="#Exercise" data-toc-modified-id="Exercise-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li><li><span><a href="#XML" data-toc-modified-id="XML-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>XML</a></span><ul class="toc-item"><li><span><a href="#Exercise" data-toc-modified-id="Exercise-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li></ul></li><li><span><a href="#Introduction-to-GeoData-formats:" data-toc-modified-id="Introduction-to-GeoData-formats:-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Introduction to GeoData formats:</a></span><ul class="toc-item"><li><span><a href="#GeoJSON" data-toc-modified-id="GeoJSON-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>GeoJSON</a></span></li><li><span><a href="#GeoPackage" data-toc-modified-id="GeoPackage-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>GeoPackage</a></span></li></ul></li></ul></div>

# Introduction to Serialization formats 

## JSON
Let's talk about JSON (JavaScript Object Notation). It is a syntax for storing and exchanging data as text, which makes it both machine- and human-readable.  
There are a few packages in Python for working with JSON files.  

1. One package is called [json](https://docs.python.org/3/library/json.html)  
2. Then there is [pandas](https://pandas.pydata.org/). As you will see, pandas can read many formats, because it is a package specialized for data analysis 

The following code shows how to read a JSON file into a DataFrame using Python (here with GeoPandas, which returns a GeoDataFrame):

In [None]:
import json
import pandas as pd 
import geopandas as gpd

# load the file prepared in the data folder
grillpl = gpd.read_file(r'./data/grillplaetze.json')
# optional slicing, in case we only want specific information such as the names
#grillpl = grillpl['name']
grillpl.head(10)

We just printed the first 10 rows (grillpl.head(10)). If you change the 10 to any other number, you will get that many rows.
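
The built-in json module from the list above works without any extra packages. The following is a minimal sketch of how it converts between JSON text and Python objects; the sample record and its field names are made up for illustration and are not taken from the course data.

```python
import json

# Made-up sample record, only for illustration (not from the course data)
raw = '{"name": "Example site", "seats": 12, "covered": false, "tags": ["bbq", "park"]}'

# json.loads turns JSON text into plain Python objects (dict, list, str, int, bool, None)
record = json.loads(raw)
print(record["name"], record["seats"], record["covered"])

# json.dumps goes the other way; indent=2 makes the text easier to read
text = json.dumps(record, indent=2)
print(text)
```

To read from a file instead of a string, `json.load` accepts an open file object in the same way.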

### Exercise
Now it is your turn. We provided a JSON file called "markante_bauw.json". Load the file and extract the building height and the number of floors. After extracting the data, use the **matplotlib** library to plot the two values against each other in a graph. 

## CSV

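The CSV format (comma-separated values) describes a text file for the exchange of structured data. As the name says, the values are separated by a specific separator. This is usually a comma, but other separators such as **;** are common as well, and the file used below is separated by **;**. 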

How do you work with CSV in Python? There are lots of possibilities:  

1. The built-in [csv](https://docs.python.org/3/library/csv.html) module 
2. Another package you can use to read CSV files is [numpy](https://numpy.org/doc/stable/user/how-to-io.html?highlight=csv)  
3. The one we use here is [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)  
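
As a rough sketch of the first option, the built-in csv module can read semicolon-separated data row by row. The sample text and its column names below are invented stand-ins, so they will not match the real file.

```python
import csv
import io

# Made-up sample data using ';' as separator (stand-in for a real file)
sample = "Name;Ort;Baujahr\nBridge A;Aachen;1965\nBridge B;Stolberg;1980\n"

# DictReader uses the first row as column names; delimiter sets the separator
reader = csv.DictReader(io.StringIO(sample), delimiter=";")
rows = list(reader)

for row in rows:
    print(row["Name"], row["Ort"], row["Baujahr"])
```

Note that the csv module returns every value as a string, so numbers such as the year have to be converted explicitly.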

So you see, we are not really bound to a specific library, at least not as long as we only use basic operations. So let's have a look at how to actually load a CSV file. When loading a file, you have lots of options to choose from; for an overview, have a look at the documentation. Run the following code to see what happens.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt 

# read the semicolon-separated file into a DataFrame
bauwerk_ac = pd.read_csv(r"./data/bauwerksliste_aachen.csv", sep=';', engine='python')
# select single columns by their label
ort = bauwerk_ac.loc[:, 'Ort:']
bauwerksart = bauwerk_ac.loc[:, 'Bauwersart:']
print(bauwerksart)

The code above not only loads a CSV file but also does some slicing. If you print `ort` instead, you will get the **Ort** column.  


In [None]:
ort = bauwerk_ac.loc[:, 'Ort:']
# count how often each place occurs in the column
counts = ort.value_counts()

# draw the counts as a pie chart with percentage labels
fig1, ax1 = plt.subplots()
ax1.pie(counts.values, labels=counts.index, autopct='%1.1f%%', startangle=90)
ax1.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.show()

### Exercise

1. Exercise  
The code above reads a CSV file and counts how often each place occurs. This information is visualized in a diagram. The diagram is hard to read, so check out matplotlib and try to visualize the output in different ways.
 
2. Exercise  
Take the CSV file **bauwerksliste_aachen.csv**, slice it, and extract the construction year only for buildings with "Ort" = Aachen.

## XML

XML, the Extensible Markup Language, encodes documents in a format that is readable for both humans and machines. At least in theory: you are welcome to try reading an XML file with more than 10000 lines, but there are much more interesting things to read. Anyway, XML files are a little harder to work with, as they are based on the parent-child principle and contain **namespaces**. 

As for the other two formats, there are a few XML modules in Python as well.  

1. There is a built-in package called [xml](https://docs.python.org/3/library/xml.html)  
2. The [ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html) module, which is part of that package   
3. The **lxml** "toolkit is a Pythonic binding for the C libraries libxml2 and libxslt" and is mostly compatible with, but superior to, the ElementTree API. 

Let's move on and have a look at an XML file in Python using ElementTree:

In [None]:
import xml.etree.ElementTree as ET

# parse the file and get the root element of the tree
tree = ET.parse("./data/markante_bauw_wuppert.xml")
root = tree.getroot()
#print(root.tag, root.attrib)

# map short prefixes to the namespace URIs used in the file
ns = {"gmd": "http://www.isotc211.org/2005/gmd", "gco": "http://www.isotc211.org/2005/gco", "csw": "http://www.opengis.net/cat/csw/2.0.2"}
# print the western longitude of every bounding box in the document
for child in root.findall(".//gmd:EX_GeographicBoundingBox", namespaces=ns):
    west = child.find('./gmd:westBoundLongitude/gco:Decimal', namespaces=ns)
    if west is not None:  # skip bounding boxes without this element
        print(west.text)
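
To see how the namespace mapping used above works in isolation, here is a small sketch on an inline document. The document itself and its value are made up; only the two namespace URIs are taken from the cell above.

```python
import xml.etree.ElementTree as ET

# Tiny made-up document; only the namespace URIs match the notebook example
doc = """<root xmlns:gmd="http://www.isotc211.org/2005/gmd"
      xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:EX_GeographicBoundingBox>
    <gmd:westBoundLongitude><gco:Decimal>7.0</gco:Decimal></gmd:westBoundLongitude>
  </gmd:EX_GeographicBoundingBox>
</root>"""

root = ET.fromstring(doc)
ns = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}

# Internally ElementTree stores tags as "{namespace-uri}localname"
box = root.find("gmd:EX_GeographicBoundingBox", ns)
print(box.tag)

# The prefix in the search path is looked up in ns and replaced by the URI
west = root.find(".//gmd:westBoundLongitude/gco:Decimal", ns)
west_value = float(west.text)
print(west_value)
```

The prefixes in the dictionary are only local shortcuts: what has to match the file is the namespace URI, not the prefix name.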

### Exercise
1. Exercise  
Try to get a list of the buildings mentioned in this XML file by using ElementTree. You may have to open the XML file and read through it first. 
2. Exercise  
Try to write an XML file yourself. We already had a file with museums in Düsseldorf, so let's create one for Aachen as XML by using Python. Search for the museums and add a short summary of each one's focus.

# Introduction to GeoData formats:


## GeoJSON
GeoJSON is, after all, a JSON file, so it can be read as one. Geo-specialized Python packages such as GeoPandas are therefore able to read GeoJSON.
As the following example shows, files are sometimes not set up correctly and need to be fixed. If you use the following points as they are right now, you end up in the ocean, which usually indicates a wrong or missing coordinate reference system (CRS).

In [None]:
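import geopandas as gp
import matplotlib.pyplot as plt

new_geoj = gp.read_file('./data/knotenpunkte-wald_ac.geojson')
# set_crs only declares which CRS the coordinates are in (no coordinates change)
#new_geoj = new_geoj.set_crs(epsg=25832)
# to_crs reprojects the coordinates to WGS 84 (EPSG:4326)
new_geoj = new_geoj.to_crs(epsg=4326)
new_geoj.head()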

In [None]:
for elem in new_geoj['geometry']:
    print(elem)

In [None]:
t = new_geoj.plot(figsize=(6, 6))
plt.show()

In [None]:
type(new_geoj.geometry[0])

In [None]:
ax = new_geoj.set_geometry('geometry')\
                .plot('knotennr', 
                      markersize=10)

In [None]:
new_geoj.plot(
    column='id', 
    legend=True, 
    edgecolor='none', 
    figsize=(12, 12)
)
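
Because GeoJSON is plain JSON, as noted at the start of this section, it can also be inspected with the built-in json module alone. The sketch below uses a made-up FeatureCollection with a single point; the coordinates and the property value are illustrative only.

```python
import json

# Made-up GeoJSON FeatureCollection with one point (values are illustrative)
raw = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"knotennr": 1},
     "geometry": {"type": "Point", "coordinates": [6.08, 50.77]}}
  ]
}"""

data = json.loads(raw)

# GeoJSON stores positions as [x, y], i.e. [longitude, latitude] for WGS 84
for feature in data["features"]:
    lon, lat = feature["geometry"]["coordinates"]
    print(feature["properties"]["knotennr"], lon, lat)
```

Reading the file this way gives you plain dictionaries and lists; packages such as GeoPandas add geometry objects, CRS handling and plotting on top.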

## GeoPackage
To work with spatial data such as GeoPackages, we need geo-specialized Python packages such as GeoPandas. First we can analyse the data by checking the columns and the shape, and then we can plot the table. The 'aachen_network' file contains a node network and an edge network, so make sure to select the layer you want with the `layer` argument; by default the node layer is read.
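
If you are unsure which layers a GeoPackage contains, one option is to look inside the file directly: a GeoPackage is an SQLite database, and its `gpkg_contents` table lists the layers. The following is a minimal sketch under that assumption. The helper name `list_gpkg_layers` is hypothetical, and the demo file only mimics the `gpkg_contents` table so that the code runs without the course data.

```python
import os
import sqlite3
import tempfile

def list_gpkg_layers(path):
    """Return (table_name, data_type) pairs from a GeoPackage's gpkg_contents table."""
    con = sqlite3.connect(path)
    try:
        return con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()

# Stand-in file that only mimics the gpkg_contents table (NOT a valid GeoPackage),
# so the sketch runs without the course data
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gpkg")
con = sqlite3.connect(demo_path)
con.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
con.executemany("INSERT INTO gpkg_contents VALUES (?, ?)",
                [("nodes", "features"), ("edges", "features")])
con.commit()
con.close()

layers = list_gpkg_layers(demo_path)
print(layers)
```

On the real data you would call `list_gpkg_layers("./data/aachen_network.gpkg")` and pass one of the returned names as `layer` to `read_file`.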

In [None]:
import geopandas as gp
import contextily
file_gb = gp.read_file("./data/aachen_network.gpkg", layer="nodes")
file_gb.columns

In [None]:
file_gb.shape

In [None]:
file_gb.head(10)

In [None]:
file_gb.plot(markersize=0.1)

In [None]:
# reproject to Web Mercator (EPSG:3857), the projection used by common web map tiles
file_gb = file_gb.to_crs(epsg=3857)
file_gb.head()

In [None]:
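ax = file_gb.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')
# Stamen.Toner may no longer be available in recent contextily/xyzservices
# releases, so a CartoDB basemap is used here instead
contextily.add_basemap(
    ax,
    crs=file_gb.crs.to_string(), 
    source=contextily.providers.CartoDB.Positron
)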