# Workshop Part 1: Data processing
---
## Assignments notebook
---

## Contents of the notebook
- Crane datasets
    - Import the Crane datasets
    - Basic analysis of the Crane datasets
    - Visualize the Crane datasets
    - Export the Crane datasets to JSON
- GPS-route datasets
    - Import the GPS-route datasets
    - Basic analysis of the GPS-route datasets
    - Visualize the GPS-route datasets
    - Export the GPS-route datasets to JSON

## Assignments notebook
Assignments related to the Crane datasets:

1. Find the number of transmissions related to the Crane: "Frida".
2. Find the names of the columns in the Crane dataset: "Frida".
3. Visualize the flight path of the Crane: "Frida", using Matplotlib and Cartopy.
4. Export the dataframe related to the Crane: "Frida" to the JSON file format.

Assignments related to the GPS-route datasets:

5. Find the number of signals related to the GPS-route: "Zeeland_Car_1".
6. Find the names of the columns in the GPS-route dataset: "Zeeland_Car_1".
7. Visualize the GPS-route: "Zeeland_Car_1", using Matplotlib and Cartopy.
8. Export the dataframe related to the GPS-route: "Zeeland_Car_1" to the JSON file format.

#### NOTE: To run a cell, select the cell and press the Run button at the top of the screen.

#### NOTE 2: For convenience, you can type the first letters of a variable name and press TAB to autocomplete it.
    


### Importing the required modules.
---

In [None]:
import pandas as pd

import gpxpy

import matplotlib.pyplot as plt

import cartopy

import cartopy.crs as ccrs 

import cartopy.feature as cfeature 

# Crane datasets

---
### Importing the Crane Datasets


The Crane datasets are stored in the CSV file format and can be found in the folder: '../../Datasets/CSV/'.
To read a CSV dataset into a Pandas dataframe, we use the Pandas function: "read_csv()".

As a parameter we pass the location of the CSV file we want to read. We do this for each Crane dataset in the folder: '../../Datasets/CSV/'. We assign each resulting dataframe to a variable named after the Crane the dataset belongs to.
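
To get a feeling for what "read_csv()" returns without touching the real files, the minimal sketch below reads a tiny CSV from memory. The two rows and the "timestamp" column are made up for illustration; only the column names "location-long" and "location-lat" are taken from the Crane datasets.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the real Crane files contain more columns and rows.
csv_text = (
    "timestamp,location-long,location-lat\n"
    "2018-01-01 00:00:00,13.5,55.7\n"
    "2018-01-01 01:00:00,13.6,55.8\n"
)

# read_csv() accepts a file location or a file-like object.
# StringIO stands in for a CSV file on disk here.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())
```

Passing a file location, as is done in the cell below, works the same way: the first line of the file becomes the column names and every following line becomes one row.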

---

In [None]:
Agnetha = pd.read_csv('../../Datasets/CSV/20181003_Dataset_SV_GPS_Crane_9407_STAW_Crane_RRW-BuGBk_Agnetha.csv')
Frida = pd.read_csv('../../Datasets/CSV/20181003_Dataset_SV_GPS_Crane_9381_STAW_Crane_RRW-BuGBk_Frida.csv')
Cajsa = pd.read_csv('../../Datasets/CSV/20181003_Dataset_SV_GPS_Crane_9472_STAW_Crane_RRW-BuGR_Cajsa.csv')


---
### Basic analysis of the Crane datasets

Below we are going to perform a basic analysis on each of the Crane datasets. After performing the analysis we can answer the following questions:
- How big is each dataset (how many transmissions)?
- How many columns does each dataset have?
- Which columns does each dataset have?
- What do these columns represent?

---
First we would like to know the number of transmissions belonging to each Crane. This number is equal to the number of rows in the dataframe representing that Crane.

To find out how many rows a dataframe contains, we use the Pandas attribute: ".shape" on the dataframe in question. This attribute returns the dimensions of the dataframe as a tuple.

The tuple consists of:
- The number of rows, at index 0 (.shape[0])
- The number of columns, at index 1 (.shape[1])
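
As a small illustration of the two indexes, the sketch below builds a made-up dataframe with three rows and two columns and unpacks its ".shape". The values are invented; only the column names come from the Crane datasets.

```python
import pandas as pd

# Hypothetical sample with three transmissions and two columns.
sample = pd.DataFrame({
    "location-long": [13.5, 13.6, 13.7],
    "location-lat": [55.7, 55.8, 55.9],
})

# .shape is a tuple: (number of rows, number of columns).
rows, columns = sample.shape

print(rows)     # number of rows, same as sample.shape[0]
print(columns)  # number of columns, same as sample.shape[1]
```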

The line below shows the number of transmissions belonging to the Crane: "Agnetha".

In [None]:
Agnetha.shape[0]

#### Assignment 1: Find the number of transmissions related to the Crane: "Frida".

In [None]:
#TODO

The line below shows the number of transmissions belonging to the Crane: "Cajsa".

In [None]:
Cajsa.shape[0]

---

Now we want to find out the number of columns in each Crane dataset, together with their names and datatypes. <br>
The line below shows the column names and their datatypes belonging to the Crane: "Agnetha".
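
The sketch below shows, on a made-up dataframe, what ".dtypes" gives you and how the column names can be read separately through ".columns". The "tag" column and all values are hypothetical.

```python
import pandas as pd

# Hypothetical sample; "tag" is an invented column used only for illustration.
sample = pd.DataFrame({
    "location-long": [13.5, 13.6],
    "location-lat": [55.7, 55.8],
    "tag": ["a", "b"],
})

# .dtypes lists every column name together with its datatype.
print(sample.dtypes)

# .columns holds only the names; list() turns them into a plain Python list.
column_names = list(sample.columns)
print(column_names)
```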

In [None]:
Agnetha.dtypes

#### Assignment 2: Find the names of the columns in the Crane dataset: "Frida".

In [None]:
#TODO

The line below shows the column names and their datatypes belonging to the Crane: "Cajsa".

In [None]:
Cajsa.dtypes

---

Now we want to create a simple visualization of the Crane datasets.<br>
For this we are going to use the Python libraries Matplotlib and Cartopy.

#### NOTE: The first time you run the cell below, some files will be downloaded. Don't worry about this and please be patient.
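
Before adding the map, it can help to see the scatter call on its own. The sketch below is a simplified illustration that uses plain Matplotlib axes instead of a Cartopy map and a made-up three-point track; only the column names are taken from the Crane datasets.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical three-point track; the column names follow the Crane datasets.
sample = pd.DataFrame({
    "location-long": [13.5, 13.6, 13.7],
    "location-lat": [55.7, 55.8, 55.9],
})

# Plain Matplotlib axes instead of a Cartopy map, to keep the sketch small.
fig, ax = plt.subplots(figsize=(6, 4))

# Longitude goes on the x-axis, latitude on the y-axis.
# "s" is the marker size, "color" the marker color.
points = ax.scatter(sample["location-long"], sample["location-lat"], color="red", s=1)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
```

In the cell below the same call is made on a Cartopy map, so the points appear on top of coastlines, borders and rivers.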

In [None]:
'''
Below we create a new figure using Matplotlib.
We pass the size of the figure (width and height in inches) as a parameter.
'''
plt.figure(figsize = (20, 12))

'''
Below we create a new Cartopy map.

We pass the projection of the Cartopy map as a parameter.
The projection we are going to use is called: "PlateCarree". The "crs" in "ccrs" stands for: "Coordinate Reference System".
The type of CRS used in the Cartopy map defines the way the map will be shown. PlateCarree uses the
equirectangular projection, in which longitude and latitude are mapped directly to x and y.

For more info related to this type of projection you should visit the URL:
https://en.wikipedia.org/wiki/Equirectangular_projection

We assign the map axes to a variable called: "cartopyMapCranes".
'''
cartopyMapCranes = plt.axes(projection=ccrs.PlateCarree())

'''
Below we add the coastlines to the Cartopy map. We pass the resolution: "10m" as a parameter. This value refers
to the scale of the Natural Earth data that is used (1:10 million). Cartopy also offers "50m" and "110m".
The higher the value, the coarser the coastlines and the larger their deviation from the real location.
'''
cartopyMapCranes.coastlines(resolution='10m')

'''
Below we add the land surface to the Cartopy map.
We give the land surface (face) the color white.
We give the edges of the land surface (edge) the color black.
'''
cartopyMapCranes.add_feature(cartopy.feature.LAND.with_scale('10m'), edgecolor='black', facecolor = "white")
'''
Below we add the lakes to the cartopy map. 
We give the edges of the lake the color black.
'''
cartopyMapCranes.add_feature(cfeature.LAKES.with_scale('10m'), edgecolor = 'black')

'''
Below we add the sea surface to the Cartopy Map
'''
cartopyMapCranes.add_feature(cfeature.OCEAN) 

'''
Below we add the rivers to the Cartopy Map
'''
cartopyMapCranes.add_feature(cfeature.RIVERS.with_scale('10m')) 

'''
Below we add the borders to the Cartopy Map
'''
cartopyMapCranes.add_feature(cfeature.BORDERS.with_scale('10m'))

'''
Now we want to create a plot which shows the flight paths of the Cranes. We plot this data on the Cartopy map,
which we created above, using Matplotlib. The plot we are going to create is called a scatter plot. This is one
of the many types of Matplotlib plots.

For more information regarding the types of Matplotlib plots, visit the following URL:
https://matplotlib.org/3.1.1/tutorials/introductory/sample_plots.html

To plot a dataset we first have to declare on which Cartopy map we want to plot the data.
Next we have to declare the type of plot which we want to create.
Finally we need to pass values to the plot function. These values are as follows:
1) The longitude coordinates (the dataframe column that contains these values).
2) The latitude coordinates (the dataframe column that contains these values).
3) The color which the datapoints are going to have.
4) The size which the datapoints are going to have (the parameter "s", the marker area in points squared).

The line of code below is used to plot the flight path of the Crane: "Agnetha" on the Cartopy map.
The following applies to this line of code:
1) First we declare the Cartopy map, which is the variable: "cartopyMapCranes".
2) Next we declare the type of Matplotlib plot that is going to be created, which is a scatter plot.
3) Next we pass the dataframe and the name of the column representing the longitude coordinates of the
   Crane. The dataframe containing the data of the Crane: "Agnetha" is called: "Agnetha".
   The longitude column is called: "location-long".
4) Then we do the same for the latitude coordinates (the column: "location-lat").
5) Next we declare the color of the datapoints, which is red in this case.
6) Finally we set the size of the datapoints, which is 1 in this case.
'''
cartopyMapCranes.scatter(Agnetha['location-long'], Agnetha['location-lat'],color="red", s = 1)


# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#                                                                                       #
#    ASSIGNMENT 3: Visualize the Flightpath of the Crane: "Frida" on the Cartopy Map    #
#                                                                                       #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 

#TODO

'''
The line of code below is used to plot the flight path of the Crane: "Cajsa" on the Cartopy map.
The following applies to this line of code:
1) First we declare the Cartopy map, which is the variable: "cartopyMapCranes".
2) Next we declare the type of Matplotlib plot that is going to be created, which is a scatter plot.
3) Next we pass the dataframe and the name of the column representing the longitude coordinates of the
   Crane. The dataframe containing the data of the Crane: "Cajsa" is called: "Cajsa".
   The longitude column is called: "location-long".
4) Then we do the same for the latitude coordinates (the column: "location-lat").
5) Next we declare the color of the datapoints, which is green in this case.
6) Finally we set the size of the datapoints, which is 1 in this case.
'''
cartopyMapCranes.scatter(Cajsa['location-long'], Cajsa['location-lat'],color="green", s = 1)

---
Now that we know what the flight paths of the Cranes look like, we want to export the dataframes to the JSON file format.

To export a dataframe to JSON, we call the Pandas function: "to_json()" on the dataframe we want to export.<br>

To this function we pass the location where we want to save the JSON file and the orientation in which the records need to be written to the file.

---
The line below exports the dataframe of the Crane: "Agnetha" to the JSON file format. <br>
The dataframe on which we call the function: ".to_json()" is: Agnetha.<br>
The values which we pass to this function are as follows:
- The folder in which we save the file: ../../Datasets/JSON/
- The name of the file: Crane-Agnetha.json
- The orientation in which the records are written to the file: records (each row becomes one JSON object and all rows together form a list, which is easy to read)
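
To see what the "records" orientation produces, the sketch below exports a made-up two-row dataframe. It assumes nothing about the real datasets apart from the two column names. When no file location is passed, "to_json()" returns the JSON as a string instead of writing a file, which makes the result easy to inspect.

```python
import json

import pandas as pd

# Hypothetical two-row sample using the coordinate column names of the Crane datasets.
sample = pd.DataFrame({
    "location-long": [13.5, 13.6],
    "location-lat": [55.7, 55.8],
})

# Without a file location, to_json() returns the JSON text as a string.
json_text = sample.to_json(orient="records")

# Parse the text again to show its structure: a list with one dictionary per row.
records = json.loads(json_text)
print(records[0])
```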

In [None]:
Agnetha.to_json('../../Datasets/JSON/Crane-Agnetha.json',orient = 'records')

#### Assignment 4: Export the dataframe related to the Crane: "Frida" to the JSON file format.

In [None]:
#TODO

---
The line below exports the dataframe of the Crane: "Cajsa" to the JSON file format. <br>
The dataframe on which we call the function: ".to_json()" is: Cajsa.<br>
The values which we pass to this function are as follows:
- The folder in which we save the file: ../../Datasets/JSON/
- The name of the file: Crane-Cajsa.json
- The orientation in which the records are written to the file: records (each row becomes one JSON object and all rows together form a list, which is easy to read)

In [None]:
Cajsa.to_json('../../Datasets/JSON/Crane-Cajsa.json',orient = 'records')

---

---

## End of assignments 1 to 4
### Go back to the presentation for information related to GPX datasets.

---

---

---
# GPS-route datasets

---
### Below, a generic function is defined to read GPX files into a dataframe.
#### NOTE: The way this is done is described in the complete GeoStack Course.
---

In [None]:
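def create_dataframe(gpx):
    # Take the points of the first segment of the first track in the parsed GPX object.
    points = gpx.tracks[0].segments[0].points

    # Collect one row per point. DataFrame.append() was removed in pandas 2.0,
    # so the rows are gathered in a list and converted to a dataframe in one step.
    rows = [{'lon' : point.longitude,
             'lat' : point.latitude,
             'alt' : point.elevation,
             'time': point.time} for point in points]
    return pd.DataFrame(rows, columns=['lon', 'lat', 'alt', 'time'])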

---
### Importing the GPS-route datasets

The GPS-route datasets are stored in the GPX file format and can be found in the folder: '../../Datasets/GPX/'.
To read a GPX dataset into a Pandas dataframe, the Python library: "gpxpy" is used. This library was created for the processing of GPX data. For more info related to gpxpy, visit the URL: <br>

https://github.com/tkrajina/gpxpy

To read a GPX dataset we use the function: gpxpy.parse(). As a parameter we pass the opened GPX file we want to read. We do this for each GPX dataset, convert the result to a dataframe with the function create_dataframe() defined above, and assign that dataframe to a fitting variable.
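
A GPX file is an XML file in which a track contains segments and a segment contains points. The sketch below is not how the workshop reads its files; it uses Python's standard xml library instead of gpxpy, on a hand-written two-point GPX string, purely to show the structure that "tracks[0].segments[0].points" walks through. All coordinates and times are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical GPX content: one track, one segment, two points.
gpx_text = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <trkseg>
      <trkpt lat="51.75" lon="4.75"><ele>1.0</ele><time>2019-01-01T10:00:00Z</time></trkpt>
      <trkpt lat="51.76" lon="4.76"><ele>2.0</ele><time>2019-01-01T10:00:05Z</time></trkpt>
    </trkseg>
  </trk>
</gpx>"""

# GPX 1.1 elements live in this XML namespace.
ns = {"gpx": "http://www.topografix.com/GPX/1/1"}

root = ET.fromstring(gpx_text)

# The first segment of the first track.
first_segment = root.find("gpx:trk/gpx:trkseg", ns)

# One dictionary per track point, with the same keys as the dataframe columns.
rows = [
    {
        "lon": float(point.get("lon")),
        "lat": float(point.get("lat")),
        "alt": float(point.find("gpx:ele", ns).text),
        "time": point.find("gpx:time", ns).text,
    }
    for point in first_segment.findall("gpx:trkpt", ns)
]

print(rows)
```

gpxpy does this parsing for us and additionally converts the time values to datetime objects, which is why the workshop uses it.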

---

In [None]:
with open('../../Datasets/GPX/Biesbosch.gpx', 'r') as gpx_file:
    Biesbosch = create_dataframe(gpxpy.parse(gpx_file))
with open('../../Datasets/GPX/Zeeland_Car_1.gpx', 'r') as gpx_file:
    Zeeland_Car_1 = create_dataframe(gpxpy.parse(gpx_file))
with open('../../Datasets/GPX/Zeeland_Car_2.gpx', 'r') as gpx_file:
    Zeeland_Car_2 = create_dataframe(gpxpy.parse(gpx_file))


---
### Basic analysis of the GPS-route datasets

Below we are going to perform a basic analysis on each of the GPS-route datasets. After performing the analysis we can answer the following questions:
- How big is each dataset (how many signals are in each route)?
- How many columns does each dataset have?
- Which columns does each dataset have?
- What do these columns represent?

In part 2 of the workshop we are going to create data models using the answers to these questions.

---
The line below shows the number of signals belonging to the GPS-route: "Biesbosch". As mentioned above, we use the attribute: ".shape" on the dataframe representing the Biesbosch route.

In [None]:
Biesbosch.shape[0]

#### Assignment 5: Find the number of signals related to the GPS-route: "Zeeland_Car_1".

In [None]:
#TODO

The line below shows the number of signals belonging to the GPS-route: "Zeeland_Car_2". As mentioned above, we use the attribute: ".shape" on the dataframe representing the Zeeland Car 2 route.

In [None]:
Zeeland_Car_2.shape[0]

---
Now we want to find out the number of columns in each GPS-route dataset, together with their names and datatypes. <br>

---
The line below shows the column names and their datatypes belonging to the route: "Biesbosch".

In [None]:
Biesbosch.dtypes

#### Assignment 6: Find the names of the columns in the GPS-route dataset: "Zeeland_Car_1".

In [None]:
#TODO

The line below shows the column names and their datatypes belonging to the route: "Zeeland_Car_2".

In [None]:
Zeeland_Car_2.dtypes

---
Now we want to create a simple visualization of the GPS-route datasets.<br>
For this we are going to use the Python libraries Matplotlib and Cartopy.

In [None]:
'''
First we create a new Cartopy map called: "cartopyMapRoutes".
In case you forgot what the code below does, you can always go back to assignment 3.
'''
plt.figure(figsize = (20, 12))
cartopyMapRoutes = plt.axes(projection=ccrs.PlateCarree())
cartopyMapRoutes.coastlines(resolution='10m')
cartopyMapRoutes.add_feature(cartopy.feature.LAND.with_scale('10m'), edgecolor='black', facecolor = "white")
cartopyMapRoutes.add_feature(cfeature.LAKES.with_scale('10m'), edgecolor = 'black')
cartopyMapRoutes.add_feature(cfeature.RIVERS.with_scale('10m')) 
cartopyMapRoutes.add_feature(cfeature.BORDERS.with_scale('10m'))


'''
Now we want to plot the GPS-routes on the new Cartopy map.
Just like with the Crane flight paths, we are going to create scatter plots and add them to the Cartopy map called:
"cartopyMapRoutes".

The line below adds the GPS-route: "Biesbosch" to the Cartopy map.
The following applies to this line of code:
1) First we declare the Cartopy map, which is the variable: "cartopyMapRoutes".
2) Next we declare the type of Matplotlib plot that is going to be created, which is a scatter plot.
3) Next we pass the dataframe and the name of the column representing the longitude coordinates of the
   GPS-route. The dataframe containing the data of the route: "Biesbosch" is called: "Biesbosch".
   The longitude column is called: "lon".
4) Then we do the same for the latitude coordinates (the column: "lat").
5) Next we declare the color of the datapoints, which is red in this case.
6) Finally we set the size of the datapoints, which is 1 in this case.
'''
cartopyMapRoutes.scatter(Biesbosch['lon'],Biesbosch['lat'],color="red", s = 1)


# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#                                                                                       #
#    ASSIGNMENT 7: Visualize the GPS-Route: "Zeeland_Car_1" on the Cartopy Map.         #
#                                                                                       #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 

#TODO

'''
The line below adds the GPS-route: "Zeeland_Car_2" to the Cartopy map.
The following applies to this line of code:
1) First we declare the Cartopy map, which is the variable: "cartopyMapRoutes".
2) Next we declare the type of Matplotlib plot that is going to be created, which is a scatter plot.
3) Next we pass the dataframe and the name of the column representing the longitude coordinates of the
   GPS-route. The dataframe containing the data of the route: "Zeeland_Car_2" is called: "Zeeland_Car_2".
   The longitude column is called: "lon".
4) Then we do the same for the latitude coordinates (the column: "lat").
5) Next we declare the color of the datapoints, which is green in this case.
6) Finally we set the size of the datapoints, which is 1 in this case.
'''
cartopyMapRoutes.scatter(Zeeland_Car_2['lon'],Zeeland_Car_2['lat'],color="green", s = 1)

---
Now that we know what the GPS-routes look like, we want to export the dataframes to the JSON file format.

As mentioned above:

To export a dataframe to JSON, we call the Pandas function: "to_json()" on the dataframe we want to export.<br>

To this function we pass the location where we want to save the JSON file and the orientation in which the records need to be written to the file.
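
One way to check that an export worked is to read the file back in. The sketch below writes a made-up two-point route to a temporary folder and loads it again with "read_json()". The file name "Route-Sample.json" and the coordinates are hypothetical; a temporary folder is used so that the workshop's Datasets folder is not touched.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-point route with the column names used for the GPS-route dataframes.
sample = pd.DataFrame({"lon": [4.5, 4.75], "lat": [51.5, 51.75]})

# Write to a temporary folder that is removed again afterwards.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Route-Sample.json")

    # Export with the same orientation as used in this notebook.
    sample.to_json(path, orient="records")

    # Read the file back into a new dataframe.
    restored = pd.read_json(path, orient="records")

# The restored dataframe should have the same dimensions as the original.
print(restored.shape == sample.shape)
```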
 
 ---
The line below exports the dataframe of the GPS-route: "Biesbosch" to the JSON file format. <br>
The dataframe on which we call the function: ".to_json()" is: Biesbosch.<br>
The values which we pass to this function are as follows:
- The folder in which we save the file: ../../Datasets/JSON/
- The name of the file: Route-Biesbosch.json
- The orientation in which the records are written to the file: records (each row becomes one JSON object and all rows together form a list, which is easy to read)

In [None]:
Biesbosch.to_json('../../Datasets/JSON/Route-Biesbosch.json',orient = 'records')

#### Assignment 8: Export the dataframe related to the GPS-route: "Zeeland_Car_1" to the JSON file format.

In [None]:
#TODO

The line below exports the dataframe of the GPS-route: "Zeeland_Car_2" to the JSON file format. <br>
The dataframe on which we call the function: ".to_json()" is: Zeeland_Car_2.<br>
The values which we pass to this function are as follows:
- The folder in which we save the file: ../../Datasets/JSON/
- The name of the file: Route-Zeeland_Car_2.json
- The orientation in which the records are written to the file: records (each row becomes one JSON object and all rows together form a list, which is easy to read)

In [None]:
Zeeland_Car_2.to_json('../../Datasets/JSON/Route-Zeeland_Car_2.json',orient = 'records')