<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction-to-Serialization-formats" data-toc-modified-id="Introduction-to-Serialization-formats-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction to Serialization formats</a></span><ul class="toc-item"><li><span><a href="#JSON" data-toc-modified-id="JSON-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>JSON</a></span><ul class="toc-item"><li><span><a href="#JSON-and-package-JSON" data-toc-modified-id="JSON-and-package-JSON-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>JSON and package JSON</a></span></li><li><span><a href="#JSON-and-package-pandas" data-toc-modified-id="JSON-and-package-pandas-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>JSON and package pandas</a></span></li><li><span><a href="#Exercise" data-toc-modified-id="Exercise-1.1.3"><span class="toc-item-num">1.1.3&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li><li><span><a href="#CSV" data-toc-modified-id="CSV-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>CSV</a></span><ul class="toc-item"><li><span><a href="#CSV-and-package-CSV" data-toc-modified-id="CSV-and-package-CSV-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>CSV and package CSV</a></span></li><li><span><a href="#CSV-and-package-pandas" data-toc-modified-id="CSV-and-package-pandas-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>CSV and package pandas</a></span></li><li><span><a href="#Exercise" data-toc-modified-id="Exercise-1.2.3"><span class="toc-item-num">1.2.3&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li></ul></li><li><span><a href="#To-Do:-XML" data-toc-modified-id="To-Do:-XML-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>To-Do: XML</a></span><ul class="toc-item"><li><span><a href="#XML" data-toc-modified-id="XML-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>XML</a></span><ul class="toc-item"><li><span><a href="#Exercise" data-toc-modified-id="Exercise-2.1.1"><span 
class="toc-item-num">2.1.1&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li></ul></li><li><span><a href="#Resaving-files" data-toc-modified-id="Resaving-files-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Resaving files</a></span></li></ul></div>

# Introduction to Serialization formats 

## JSON
Let's talk about JSON - **J**ava**S**cript **O**bject **N**otation. It is a data-interchange format written in plain text. It is used to store data and to send it between computers, it is readable by both machines and humans, and it is language independent.  
The JSON format is similar to the code for creating JavaScript objects, so JSON data can easily be converted into JavaScript objects by a JavaScript program. Here, however, we are going to talk about JSON and Python.

There are a few packages in Python for working with JSON files.  

1. One package is called [json](https://docs.python.org/3/library/json.html)  
2. Then there is [pandas](https://pandas.pydata.org/). As you will see, pandas can read many formats because it is a package specialized in data analysis.

### JSON and package JSON  
The following code shows how to read a JSON file (into a dictionary) using the Python package **json**:

In [None]:
#import JSON using the JSON package
import json
with open(r'./data/markante_bauw.json') as jsonFile:
    jsonObject = json.load(jsonFile)
    
print(jsonObject)

In the first code cell we loaded the JSON file with the Python package **json**. In the following cell we access specific information in this file: the entries stored under the key "records" inside "result".

In [None]:
# 'result' holds the payload of the file, 'records' the individual entries
features = jsonObject['result']
ids = features['records']
print(ids)

### JSON and package pandas
Another way to read a JSON file is offered by the Python package **pandas**. Here, the information read is transferred into a DataFrame, and we print its first 10 rows to the screen:

In [None]:
#import JSON using Pandas
import pandas as pd
#loading the file prepared in data
grillpl = pd.read_json(r'./data/markante_bauw.json')
grillpl.head(10)

JSON file image example and DataFrame

When loading a JSON file it is important to check the file itself, because the file's structure determines the layout of the DataFrame. To give an idea of this connection, we show the same JSON file opened in a text editor and loaded with pandas.  
One thing you will see when turning to the geodata formats is that the files are loaded differently. For a normal JSON file pandas is absolutely fine, but to meet the requirements of a GeoJSON file you will need GeoPandas.
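How the structure of a file ends up in the DataFrame can be sketched with a small, made-up nested object. The keys and values below are assumptions for illustration and not the layout of the actual file. `pd.json_normalize` flattens nested records into dotted column names, which is one option when the plain `read_json` result is hard to work with:

```python
import pandas as pd

# Made-up nested data; only the 'result' -> 'records' nesting mirrors the cell above
data = {
    "result": {
        "records": [
            {"id": 1, "name": "Tower", "size": {"height": 40, "floors": 10}},
            {"id": 2, "name": "Hall", "size": {"height": 15, "floors": 3}},
        ]
    }
}

# Each record becomes one row, each (nested) key becomes one column
records = data["result"]["records"]
df = pd.json_normalize(records)

print(df.columns.tolist())
print(df)
```

Each dictionary in the list turns into a row, and the nested `size` dictionary is spread over the columns `size.height` and `size.floors`.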

You can convert Python to JSON. Python objects are converted into JSON as the following table shows:

|Python    |JSON    |
|:---      |    ---:|
|dict      |Object  |
|list      |Array   |
|tuple     |Array   |
|str       |String  |
|int       |Number  |
|float     |Number  |
|True      |true    |
|False     |false   |
|None      |null    |
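
The conversions in the table can be tried with `json.dumps` and `json.loads` from the standard library. The record below is a made-up example, not data from the course files:

```python
import json

# Hypothetical sample record; names and values are invented for illustration
building = {
    "name": "Example Tower",        # str   -> String
    "floors": 12,                   # int   -> Number
    "height_m": 45.5,               # float -> Number
    "coordinates": (6.08, 50.77),   # tuple -> Array
    "tags": ["landmark", "old"],    # list  -> Array
    "listed": True,                 # True  -> true
    "demolished": False,            # False -> false
    "architect": None,              # None  -> null
}

# Serialize the dictionary to a JSON string
json_text = json.dumps(building)
print(json_text)

# Parse the string back into Python objects
restored = json.loads(json_text)
print(type(restored["coordinates"]))
```

Note that the round trip is not symmetric: JSON has only one array type, so the tuple comes back as a list.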

### Exercise
Now it is your turn. Take the JSON file called "markante_bauw.json". Load the file and extract the building height and the number of floors. After extracting the data, use the **matplotlib** library to plot one value against the other in a graph.

## CSV

The CSV format (comma-separated values) describes a text file for the exchange of structured data. As the name says, the values are separated by a specific separator: usually a comma, although the files used here are separated by **;**.

There are many ways to work with CSV in Python:  

1. The already integrated [csv](https://docs.python.org/3/library/csv.html) module 
2. Another package you can use to read CSV files is [numpy](https://numpy.org/doc/stable/user/how-to-io.html?highlight=csv)  
3. The one we use here is [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)  

So you see, we are not really bound to a specific library, at least not as long as we are using basic operations. So let's have a look at how to actually load a CSV file. When you load a file, there are many options you can add; have a look at the documentation for an overview. Run the following code to see what happens.

### CSV and package CSV
First of all, let us look at how to load a CSV file using the Python package **csv**:

In [None]:
import csv

In [None]:
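# newline='' is recommended by the csv documentation when opening CSV files
with open(r"./data/bauwerksliste_aachen.csv", newline='') as csv_file:
    # The file uses ';' as separator instead of the default ','
    csv_reader = csv.reader(csv_file, delimiter=';')
    # Every row is returned as a list of strings
    for elem in csv_reader:
        print(elem)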
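
If the first row of a file contains the column headings, `csv.DictReader` is an alternative to `csv.reader` that returns every row as a dictionary keyed by those headings. The following is a minimal sketch on made-up data held in a string, so the column names and values are assumptions and not taken from the course file:

```python
import csv
import io

# Made-up sample data standing in for a semicolon-separated file
sample = "Name;Ort;Baujahr\nRathaus;Aachen;1350\nKapelle;Vetschau;1900\n"

# io.StringIO lets csv read from a string as if it were an open file
with io.StringIO(sample, newline="") as csv_file:
    reader = csv.DictReader(csv_file, delimiter=";")
    rows = list(reader)

# Columns can now be addressed by name instead of by position
for row in rows:
    print(row["Name"], "-", row["Ort"])
```

All values are still strings; numbers such as the year have to be converted explicitly, for example with `int(row["Baujahr"])`.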

### CSV and package pandas
It is a little easier to load and parse a CSV file using the Python package **pandas**. You just need one line of code and the pandas function **read_csv()**:

In [None]:
import pandas as pd
import os
import matplotlib.pyplot as plt 
bauwerk_ac = pd.read_csv(r"./data/bauwerksliste_aachen.csv", sep=';', engine='python')
display(bauwerk_ac)

Let us take a look at how a CSV file opened in a text editor looks in comparison to a DataFrame in pandas.  
![csv_dataframe](./img/csv_dataframe_show.png)

In the following code cell we access a specific group of rows and columns by label(s). pandas provides the **.loc[ ]** indexer for this. You could change the label from *Ort* to another column heading.

In [None]:
ort = bauwerk_ac.loc[:, 'Ort']
bauwerksart = bauwerk_ac.loc[:, 'Bauwersart']
print(bauwerksart)

One piece of information can also be selected on the basis of another, e.g. buildings are selected on the basis of their associated location. Take a look at the following code cell: you could change the place from *Aachen* to *Vetschau* and see how the output changes.

In [None]:
col = ['Ort', 'Bauwersart']
new = bauwerk_ac[col]
new_tab = new[new["Ort"] == "Aachen"]
print(new_tab)
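
The two steps above (choosing columns, then filtering rows) can also be combined in a single `.loc[rows, columns]` call. This is a sketch on a small made-up DataFrame; the columns `Name` and `Typ` are hypothetical and only `Ort` is borrowed from the real file:

```python
import pandas as pd

# Small made-up table standing in for the real CSV
df = pd.DataFrame({
    "Ort": ["Aachen", "Vetschau", "Aachen"],
    "Name": ["Bridge A", "Chapel B", "Tower C"],
    "Typ": ["bridge", "chapel", "tower"],
})

# Boolean mask: True for every row whose 'Ort' is 'Aachen'
is_aachen = df["Ort"] == "Aachen"

# .loc takes the row selection first and the column selection second
selection = df.loc[is_aachen, ["Name", "Typ"]]
print(selection)
```

The mask is an ordinary Series of `True`/`False` values, so several conditions can be combined with `&` and `|` before passing them to `.loc`.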

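The cells above do not just load a CSV file but do some slicing as well. If you print `ort` instead of `bauwerksart`, you will get the output for the **Ort** column. The following cell counts how often each place occurs in that column and draws the result as a pie chart.  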


In [None]:
# Count how often each place occurs in the 'Ort' column
ort = bauwerk_ac.loc[:, 'Ort']
x = ort.value_counts()

# Draw the counts as a pie chart, labelled with the place names
fig1, ax1 = plt.subplots()
ax1.pie(x.values, labels=x.index, autopct='%1.1f%%', startangle=90)
ax1.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.show()

### Exercise

1. Exercise  
The code above reads a CSV file and counts how often each place occurs. This information is visualized in a diagram. The diagram is hard to read, so check out matplotlib and try to visualize the output in different ways.
 
2. Exercise  
Take the CSV file **bauwerksliste_aachen.csv**, slice it and extract the year of construction only for buildings with "Ort" = Aachen.

# To-Do: XML

## XML

XML, the Extensible Markup Language, encodes documents in a format that is readable for both humans and machines. At least in theory: you are welcome to try and read an XML file of more than 10000 lines, but there are much more interesting things to read. In any case, XML files are a little harder to work with, as they are based on the parent-child principle and use **namespaces**.

As with the other two formats, there are a few XML modules in Python as well.  

1. There is a built-in package called [xml](https://docs.python.org/3/library/xml.html)  
2. The [ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html) module, which is part of the xml package   
3. The **lxml** toolkit "is a Pythonic binding for the C libraries libxml2 and libxslt" and, according to its documentation, is mostly compatible with but superior to the ElementTree API.

Let's move on and have a look at an XML file in Python using ElementTree:

In [None]:
import xml.etree.ElementTree as ET

# Parse the file and get the root element of the tree
tree = ET.parse("./data/markante_bauw_wuppert.xml")
root = tree.getroot()

# Map short prefixes to the namespace URIs used in the file
ns = {"gmd": "http://www.isotc211.org/2005/gmd", "gco": "http://www.isotc211.org/2005/gco", "csw": "http://www.opengis.net/cat/csw/2.0.2"}

# Print the western longitude of every geographic bounding box
for child in root.findall(".//gmd:EX_GeographicBoundingBox", namespaces=ns):
    point = child.find('./gmd:westBoundLongitude/gco:Decimal', namespaces=ns).text
    print(point)
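
The namespace lookup used above can be tried on a small inline document. The XML and the namespace URI below are made up for illustration and have nothing to do with the metadata file; the sketch only shows how a prefix dictionary is resolved by `findall`:

```python
import xml.etree.ElementTree as ET

# Tiny made-up document; the namespace URI is a placeholder
xml_text = """
<root xmlns:ex="http://example.com/ns">
  <ex:building><ex:name>Town Hall</ex:name></ex:building>
  <ex:building><ex:name>Cathedral</ex:name></ex:building>
</root>
""".strip()

root = ET.fromstring(xml_text)

# The prefix used in the search path is looked up in this dictionary
ns = {"ex": "http://example.com/ns"}
names = [el.text for el in root.findall(".//ex:building/ex:name", namespaces=ns)]
print(names)

# Internally ElementTree stores each tag with the full URI in braces
print(root[0].tag)
```

The prefix in the dictionary does not have to match the prefix in the file. Only the URI is compared, which is why searching without the `namespaces` argument finds nothing in a namespaced document.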

### Exercise
1. Exercise  
Try to get a list of the buildings mentioned in this XML file by using ElementTree. You may have to open the XML file and read through it. 
2. Exercise  
Try to write an XML file yourself. We already had a file with museums in Düsseldorf, so let's create one for Aachen as XML by using Python. Search for the museums and add a short summary of its focus for each of them.

# Resaving files
A loaded and parsed file does not necessarily have to stay in its initial format. The DataFrame and GeoDataFrame formats allow you to save the data again in a different format. Let us take a look at an example.

In [None]:
import geopandas as gpd
example_shape = gpd.read_file("./data/aachen/StatistischeBezirkeAachen.shp")
example_shape.head()

In [None]:
example_shape.to_csv('example_csv.csv')

In [None]:
import pandas as pd
bauwerk_ac = pd.read_csv(r"./data/bauwerksliste_aachen.csv", sep=';', engine='python')
display(bauwerk_ac)

In [None]:
bauwerk_ac.to_json('example_json.json')
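
What the saved files contain can be inspected without writing to disk: when no path is given, `to_csv` and `to_json` return the text instead. The following sketch uses a small made-up DataFrame, and the choice of `sep=";"`, `index=False` and `orient="records"` is an assumption for illustration, not the setting used in the cells above:

```python
import io
import json

import pandas as pd

# Small made-up table; column names and values are invented
df = pd.DataFrame({"Ort": ["Aachen", "Vetschau"], "Anzahl": [3, 1]})

# Without a path, both methods return the serialized text
csv_text = df.to_csv(sep=";", index=False)
json_text = df.to_json(orient="records")
print(csv_text)
print(json_text)

# The CSV text can be read back into a DataFrame
restored = pd.read_csv(io.StringIO(csv_text), sep=";")
print(restored)

# The JSON text is a list with one object per row
print(json.loads(json_text))
```

With `index=False` the row index is left out of the CSV file. The default, used in the cell above, writes it as an additional first column.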