# Introduction to Serialization formats 
## JSON
Let's talk about JSON - **J**ava**S**cript **O**bject **N**otation. It is a data-interchange format written in plain text. It is used to store and exchange data between computers, is readable by both humans and machines, and is language-independent.  
The JSON format is similar to the code for creating JavaScript objects. Therefore, JSON data can easily be converted into JavaScript objects by a JavaScript program. Here, however, we are going to talk about JSON and Python.

There are a few packages in Python for working with JSON files. Let us have a look at two of them.  

1. One package is called [json](https://docs.python.org/3/library/json.html) 
2. Then there is [pandas](https://pandas.pydata.org/), you will see pandas can read a lot of formats, as it is a package specialized for data analysis. 

### JSON and package JSON 
The following code shows how to read a JSON file (into a dictionary) using the Python package **json**:

In [None]:
#import JSON using the JSON package
import json
with open(r'./data/markante_bauw.json') as jsonFile:
    jsonObject = json.load(jsonFile)
    
print(jsonObject)

In the first code cell, we loaded the JSON file with the Python package **json**. In the following cell, we access specific information in this file by key: the *records* stored under *result*.

In [None]:
# Navigate the nested dictionary key by key
features = jsonObject['result']
ids = features['records']
print(ids)

You can also convert Python objects to JSON using **json.dumps()**. Python objects are converted into their JSON equivalents as the following table shows:

|Python    |JSON    |
|:---      |    ---:|
|dict      |Object  |
|list      |Array   |
|tuple     |Array   |
|str       |String  |
|int       |Number  |
|float     |Number  |
|True      |true    |
|False     |false   |
|None      |null    |

In [None]:
print(json.dumps({"city":"Aachen", "Uni":"RWTHAachen"}))
print(json.dumps(True))
print(json.dumps(42))

In [None]:
test = {
    "city":"Aachen",
    "Uni":"RWTHAachen"
}
new = json.dumps(test)
print(new)
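
The type mapping in the table above can be checked with a short round trip. The following is a minimal sketch that uses a made-up dictionary (the keys and values are illustrative only, not taken from the course files) together with `json.loads`, the counterpart of `json.dumps`. It shows that a tuple is written as an array and comes back as a list:

```python
import json

# Illustrative data only; not taken from the course files
record = {"city": "Aachen", "coords": (50.78, 6.08), "open": True, "note": None}

# Python -> JSON string (tuple becomes an array, True -> true, None -> null)
text = json.dumps(record)
print(text)

# JSON string -> Python (arrays always come back as lists)
restored = json.loads(text)
print(type(restored["coords"]))
```

Because JSON has no tuple type, a round trip does not always return exactly the object you started with.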

### JSON and package pandas
Another way to read a JSON file is offered by the Python package **pandas**. Here, the information read is transferred into a DataFrame, and we print its first 10 rows to the screen. The following code shows how to read a JSON file (into a DataFrame) using **pandas**:

In [None]:
#import JSON using Pandas
import pandas as pd
#loading the file prepared in data
grillpl = pd.read_json(r'./data/markante_bauw.json')
grillpl.head(10)

Let us take a look at how a JSON file opened in a text editor compares to a DataFrame in pandas.  
<div>
<img src="./img/json_fig.png" width="1000"/>
</div>

When loading a JSON file, it is important to check the file itself, because there is a direct connection between the file's structure and the resulting DataFrame. To give you an idea of this connection, the figure above shows the same JSON file opened in a text editor and loaded with pandas.  
One thing you will notice when turning to the geodata formats is the difference in how the files are loaded. For a regular JSON file pandas is absolutely fine, but to meet the requirements of GeoJSON you will need GeoPandas. 
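
How the nesting of a JSON document maps to rows and columns can be tried out without a file. The following sketch uses a small, made-up structure that imitates the `result`/`records` layout accessed earlier (the field names `name` and `height` are assumptions for illustration) and flattens it with `pandas.json_normalize`:

```python
import pandas as pd

# Made-up data imitating a nested JSON document
data = {
    "result": {
        "records": [
            {"name": "Building A", "height": 12.5},
            {"name": "Building B", "height": 30.0},
        ]
    }
}

# Each dictionary in the list of records becomes one row,
# and each key becomes one column
df = pd.json_normalize(data["result"]["records"])
print(df)
```

If the interesting data sits several levels deep, it is often easier to load the file with the json package first and pass only the relevant part to pandas.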

### Exercise
Now it is your turn. Take the JSON file called "markante_bauw.json". Load the file and extract the building height and the number of floors. After extracting the data, use the **matplotlib** library to plot the two values against each other in a graph. 

## CSV
The CSV format (comma-separated values) describes a text file for the exchange of structured data. As the name suggests, the values are separated by a specific separator. This is usually a comma, but other characters are common as well; the files used here are separated by **;**. 

There are many ways to work with CSV in Python: 

1. The already integrated [csv](https://docs.python.org/3/library/csv.html) module 
2. Another package you can use to read CSV files is [numpy](https://numpy.org/doc/stable/user/how-to-io.html?highlight=csv)
3. The one we use here is [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)  

So you see, we are not really bound to a specific library, at least not as long as we are only using basic operations. So let's have a look at how to actually load a CSV file. When loading the file, there are lots of options you can pass. To get an overview, have a look at the documentation. Run the following code to see what happens.

### CSV and package CSV
First, let us have a look at how to load a CSV file using the Python package **csv**:

In [None]:
import csv

In [None]:
with open(r"./data/bauwerksliste_aachen.csv", newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';')
    for elem in csv_reader: 
        print(elem)
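
`csv.reader` returns every row as a plain list. As a small sketch of an alternative, `csv.DictReader` uses the first row as column names and returns each following row as a dictionary. The CSV content below is made up for illustration (it is not the content of the course file) and is read from a string via `io.StringIO` so the example runs on its own:

```python
import csv
import io

# Made-up CSV content; the column names are illustrative only
sample = "Name;Ort\nBuilding A;Aachen\nBuilding B;Vetschau\n"

# io.StringIO lets the csv module read from a string as if it were a file
reader = csv.DictReader(io.StringIO(sample), delimiter=";")
rows = list(reader)

# Values can now be accessed by column name instead of by position
for row in rows:
    print(row["Name"], "->", row["Ort"])
```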

### CSV and package pandas
It is a little easier to load and parse a CSV file using the Python package **pandas**. You just need one line of code and the pandas function **read_csv()**:

In [None]:
import pandas as pd
import os
import matplotlib.pyplot as plt 
bauwerk_ac = pd.read_csv(r"./data/bauwerksliste_aachen.csv", sep=';', engine='python')
display(bauwerk_ac)

Let us take a look at how a CSV file opened in a text editor compares to a DataFrame in pandas.  
<div>
<img src="./img/csv_dataframe_show.png" width="1250"/>
</div>

In the following code cell, we access a specific group of rows and columns by label(s). pandas provides the **.loc[ ]** indexer for this kind of selection in DataFrames. You could change the label from *Ort* to another column heading.

In [None]:
ort = bauwerk_ac.loc[:, 'Ort']
bauwerksart = bauwerk_ac.loc[:, 'Bauwersart']
print(bauwerksart)

One piece of information can also be selected on the basis of another, e.g. buildings can be selected on the basis of their associated location. Take a look at the following code cell. You could change the place from *Aachen* to *Vetschau* and see how the output changes.

In [None]:
col = ['Ort', 'Bauwersart']
new = bauwerk_ac[col]
new_tab = new[new["Ort"] == "Aachen"]
print(new_tab)
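
The row filter and the column selection can also be combined in a single **.loc[ ]** call. The following is a minimal sketch on a made-up table (only the column name `Ort` is borrowed from the cells above; the other names and all values are illustrative):

```python
import pandas as pd

# Made-up table; only the column name 'Ort' is borrowed from the text above
df = pd.DataFrame({
    "Name": ["Building A", "Building B", "Building C"],
    "Ort": ["Aachen", "Vetschau", "Aachen"],
})

# The comparison yields one True/False value per row (a boolean mask)
mask = df["Ort"] == "Aachen"

# .loc accepts the mask for the rows and a list of labels for the columns
selection = df.loc[mask, ["Name"]]
print(selection)
```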

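The code above does not just load a CSV file, it does some slicing as well. If you change the printed variable to **ort**, you will get the output for the **Ort** column. The next cell counts how often each place occurs in that column and shows the result as a pie chart.  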

In [None]:
# Count how often each place occurs in the 'Ort' column (missing values are ignored)
ort = bauwerk_ac.loc[:, 'Ort']
x = ort.value_counts()

# Draw the counts as a pie chart
fig1, ax1 = plt.subplots()
ax1.pie(x.values, labels=x.index, autopct='%1.1f%%', startangle=90)
ax1.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.show()

### Exercise
1. Exercise 
The code above reads a CSV file and counts how often each place occurs. This information is visualized in a diagram. The diagram is hard to read, so check out Matplotlib and try to visualize the output in different ways.
 
2. Exercise 
Take the CSV file **bauwerksliste_aachen.csv**, slice it, and extract the construction year only for buildings with "Ort" = Aachen.

## XML
XML, the Extensible Markup Language, encodes documents in a format that is readable by both humans and machines. At least in theory: you are welcome to try reading an XML file of more than 10000 lines, but there are much more interesting things to read. In any case, XML files are a little harder to work with, as they are based on the parent-child principle and use **namespaces**. 

As for the other two, there are a few XML modules in Python as well.  

1. There is a built-in package called [xml](https://docs.python.org/3/library/xml.html)  
2. The [ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html) module, which is part of the xml package   
3. **lxml**: this "toolkit is a Pythonic binding for the C libraries libxml2 and libxslt" and is mostly compatible with, but superior to, the ElementTree API. 

Well, let's move on and have a look at an XML in Python using *ElementTree*:

In [None]:
import xml.etree.ElementTree as ET

In [None]:
tree = ET.parse("./data/markante_bauw_wuppert.xml")
html = tree.getroot()
html

You just saw the output, and it hardly says anything about the content. To make sense of it, you need to understand how the XML file is set up. What you get when asking for the root element is exactly that: the very first element. Every other element is nested inside it and is therefore a child element. Let us have a look at the actual XML file and check whether you can find the output from above in it.

Let us use some methods to get specific information. What you see in the curly braces is the namespace, and the word after it is the name of the first tag. Compare this with the XML file.

In [None]:
html.tag

In [None]:
for elem in html:
    print(elem.tag, elem.attrib)

If you want to get a specific value out of that XML file and do not want to dig through the child elements, use the *.findall* method with the right tag names. That can be tricky, so the best thing to do is to open the XML file in a good text editor (Visual Studio Code or Notepad++) and read it. Yes, an XML file is meant to be readable by humans, so if you are bored, feel free to try. Usually nobody wants to read a few thousand lines of XML, but you need an idea of how the file is structured to use it in Python.  
What we do next is we extract a specific coordinate value.

In [None]:
#Namespace
ns = {"gmd": "http://www.isotc211.org/2005/gmd", "gco": "http://www.isotc211.org/2005/gco", "csw" : "http://www.opengis.net/cat/csw/2.0.2"}
#Getting value of bounding box
for child in html.findall(".//gmd:EX_GeographicBoundingBox", namespaces = ns):
    point = child.find('./gmd:westBoundLongitude/gco:Decimal', namespaces = ns).text
    print(point)
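
The way the namespace dictionary works can be tried out on a much smaller document. The following sketch parses a made-up XML string (the namespace URI `http://example.com/ns` and all tag names are purely illustrative and not part of the course file) and shows both the curly-brace form of a tag and a prefixed search with *.findall*:

```python
import xml.etree.ElementTree as ET

# Made-up document; the namespace URI and tag names are purely illustrative
xml_text = """
<ex:catalog xmlns:ex="http://example.com/ns">
  <ex:building><ex:name>Building A</ex:name></ex:building>
  <ex:building><ex:name>Building B</ex:name></ex:building>
</ex:catalog>
""".strip()

root = ET.fromstring(xml_text)
# The full tag name carries the namespace URI in curly braces
print(root.tag)

# Map a prefix of our choice to the namespace URI for searching
ns = {"ex": "http://example.com/ns"}
names = [el.text for el in root.findall(".//ex:building/ex:name", namespaces=ns)]
print(names)
```

The prefix used in the dictionary does not have to match the prefix used in the file. Only the URI has to be identical.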

### Exercise
1. Exercise 
Try to get a list of the buildings mentioned in this XML file by using ElementTree. You may have to open the XML file and read through it. 
2. Exercise  
Try to write an XML file yourself. We already had a file with museums in Düsseldorf, let's create one for Aachen. Search the museums and add for each of them a little summary of their focus.  

# Resaving files
A loaded and parsed file does not necessarily have to stay in its initial format. The DataFrame or GeoDataFrame format allows the data to be saved again in a different format. Let us take a look at an example.

In [None]:
import geopandas as gpd
example_shape = gpd.read_file("./data/aachen/StatistischeBezirkeAachen.shp")
example_shape.head()

In [None]:
example_shape.to_csv('example_csv.csv')

In [None]:
import pandas as pd
bauwerk_ac = pd.read_csv(r"./data/bauwerksliste_aachen.csv", sep=';', engine='python')
display(bauwerk_ac)

In [None]:
bauwerk_ac.to_json('example_json.json')
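
When saving to JSON, the layout of the output can be chosen with the `orient` parameter of `to_json`. The following sketch uses a made-up DataFrame (column names and values are illustrative only) and returns the JSON text instead of writing a file, so the two layouts can be compared directly:

```python
import json
import pandas as pd

# Made-up table for illustration
df = pd.DataFrame({"Name": ["Building A", "Building B"], "Floors": [3, 10]})

# Without a path, to_json returns the JSON text instead of writing a file
as_columns = df.to_json()                  # default layout: one object per column
as_records = df.to_json(orient="records")  # one object per row
print(as_columns)
print(as_records)

# The row-wise text can be parsed again with the json package
rows = json.loads(as_records)
```

The row-wise layout is often easier to read and to process with other tools, while the default layout keeps the index labels.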