# Runbook
A notebook that runs other notebooks using `papermill`. It tries to simulate a KNIME workflow by passing an `input_table` and an `output_table` from node to node.

Run *this* notebook in the `base` environment with the Python 3 kernel. `papermill` is installed there, and `ipywidgets` is installed and enabled.

*Note:* The other notebooks (the KNIME nodes) should run in the `geopandas` virtual environment, which has all the packages for geodata processing. `ipykernel` is installed there so that the environment is available as a kernel.

In [1]:
import papermill as pm
import pandas as pd

## Passing data from node to node
Input parameters can be supplied to the notebooks as a `dict`. These parameters have to be JSON serializable, and pandas data frames unfortunately are not.

Data frames should therefore be converted to a JSON string with `to_json()`. These JSON strings are passed from cell to cell as `input_json` and `output_json`. Inside the nodes (notebooks) there are some development cells for parsing.
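
The round trip described above can be sketched as follows. This is a minimal illustration, not the code of the node notebooks: the two-row table is made up, and `io.StringIO` is used only because recent pandas releases prefer a file-like object over a literal JSON string in `read_json`.

```python
import io
import json

import pandas as pd

# Made-up example table, not the real shapefile data
table = pd.DataFrame({"id": [1, 2], "provincien": ["Noord-Holland", "Groningen"]})

# A data frame cannot be serialized by the json module directly
try:
    json.dumps({"input_table": table})
    serializable = True
except TypeError:
    serializable = False

# A JSON string can be, so that is what goes into the parameters dict
parameters = {"input_json": table.to_json()}
json.dumps(parameters)  # succeeds, the value is a plain string

# On the receiving side the string is parsed back into a data frame
restored = pd.read_json(io.StringIO(parameters["input_json"]))
```

Passing a string keeps the parameters JSON serializable, at the cost of parsing the table again in every node.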

In [3]:
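# Generic notebook run
def knime_node(notebook_name, parameters):
    """Run a notebook as a KNIME-like node and return its output as (JSON string, data frame)."""
    # Execute the notebook in place: the executed copy overwrites the input notebook
    pm.execute_notebook(
        notebook_name,
        notebook_name,
        parameters=parameters
    )

    # Read the recorded `output_table` value (a JSON string) back from the executed notebook.
    # Note: `pm.read_notebook` and `nb.dataframe` belong to papermill releases before 1.0.
    nb = pm.read_notebook(notebook_name)
    output_json = nb.dataframe[nb.dataframe.name == 'output_table']['value'].iat[0]
    output_table = pd.read_json(output_json)
    return (output_json, output_table)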
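
The lookup in the last lines of `knime_node` can be tried without running a notebook. The sketch below uses a hand-made data frame as a stand-in for `nb.dataframe`; only the column names `name` and `value` are taken from the function above, the rows themselves are made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for `nb.dataframe`: one row per recorded value
records = pd.DataFrame({
    "name": ["decimals", "output_table"],
    "value": [2, '{"id":{"0":1},"provincien":{"0":"Noord-Holland"}}'],
})

# Select the row named 'output_table' and take its value (a JSON string)
output_json = records[records["name"] == "output_table"]["value"].iat[0]

# Parse the JSON string back into a data frame
output_table = pd.read_json(io.StringIO(output_json))
```

If no row is named `output_table`, `.iat[0]` raises an `IndexError`, so a node that forgets to record its output fails at this point.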

## Source

In [4]:
# First set up the parameters we need
parameters = dict(
    folder = r'/home/ab/Documents/Open-data/shapefiles/shp-provincie',
    filename = r'provincie-grenzen.shp'
)

# Run the KNIME node
(output_json, output_table) = knime_node('Source Node.ipynb', parameters)
output_table.head()

| | id | provincien | wkt |
|---|---|---|---|
| 0 | 1 | Noord-Holland | MULTIPOLYGON (((140119.7695681169 557037.39335... |
| 1 | 2 | Groningen | MULTIPOLYGON (((214930.61 595370.8100000001, 2... |
| 10 | 11 | Gelderland | MULTIPOLYGON (((170028.343 445109.103, 169993.... |
| 11 | 12 | Noord-Brabant | MULTIPOLYGON (((122802.846 383738.555, 122804.... |
| 2 | 3 | Overijssel | MULTIPOLYGON (((204118.188 494995.02, 204081.2... |


## Reduce the precision
Specify the number of decimal places to keep. The unit is the same as the unit of the projection.
* RD New: in meters
* WGS84: in degrees. In the Netherlands, 1 m ~ 1e-5 degrees
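
The rule of thumb for WGS84 can be checked with a short calculation. This is a rough sketch that assumes a spherical Earth with a mean radius of 6,371 km and takes 52° north as a representative latitude for the Netherlands; both values are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)
LATITUDE_DEG = 52.0         # assumed representative latitude for the Netherlands

# Length of one degree along a meridian (north-south)
meters_per_degree_lat = 2 * math.pi * EARTH_RADIUS_M / 360

# Length of one degree along a parallel (east-west) shrinks with cos(latitude)
meters_per_degree_lon = meters_per_degree_lat * math.cos(math.radians(LATITUDE_DEG))

# Size on the ground of a step of 1e-5 degrees
step = 1e-5
north_south_m = step * meters_per_degree_lat
east_west_m = step * meters_per_degree_lon
print(round(north_south_m, 2), round(east_west_m, 2))
```

Under these assumptions a step of 1e-5 degrees is roughly 1.1 m north-south and roughly 0.7 m east-west, so 5 decimals in WGS84 give about meter accuracy.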

In [13]:
# With RD New, 2 decimals gives centimeter accuracy
parameters = dict(input_json = output_json, decimals = 2)

# Run the KNIME node
(output_json, output_table) = knime_node('Reduce Precision Node.ipynb', parameters)
output_table.head()

| | id | provincien | wkt |
|---|---|---|---|
| 0 | 1 | Noord-Holland | MULTIPOLYGON (((140119.76 557037.39, 140142.83... |
| 1 | 2 | Groningen | MULTIPOLYGON (((214930.61 595370.81, 214743.01... |
| 10 | 11 | Gelderland | MULTIPOLYGON (((170028.34 445109.10, 169993.08... |
| 11 | 12 | Noord-Brabant | MULTIPOLYGON (((122802.84 383738.55, 122804.31... |
| 2 | 3 | Overijssel | MULTIPOLYGON (((204118.18 494995.02, 204081.24... |


## Convert to AC

In [17]:
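# Pass the reduced-precision JSON on as input
parameters = dict(input_json = output_json)

# Run the KNIME node; only the output table is needed here
(_, output_table) = knime_node('WKT To AC Node.ipynb', parameters)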
output_table.head()

| | id | provincien | wkt | AC |
|---|---|---|---|---|
| 0 | 1 | Noord-Holland | MULTIPOLYGON (((140119.76 557037.39, 140142.83... | 140119.76,557037.39:140142.83,557010.64:140170... |
| 1 | 2 | Groningen | MULTIPOLYGON (((214930.61 595370.81, 214743.01... | 214930.61,595370.81:214743.01,595404.8:214635.... |
| 10 | 11 | Gelderland | MULTIPOLYGON (((170028.34 445109.10, 169993.08... | 170028.34,445109.10:169993.08,445096.36:169963... |
| 11 | 12 | Noord-Brabant | MULTIPOLYGON (((122802.84 383738.55, 122804.31... | 122802.84,383738.55:122804.31,383742.16:122807... |
| 2 | 3 | Overijssel | MULTIPOLYGON (((204118.18 494995.02, 204081.24... | 204118.18,494995.02:204081.24,495019.16:204010... |
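
Judging by the `AC` column above, each point is written as `x,y` and the points are joined with `:`. The sketch below reproduces that layout for the first ring of a WKT geometry only. It is not the code of `WKT To AC Node.ipynb`: the function name `first_ring_to_ac` is hypothetical, the sample WKT is made up, and how that notebook treats further rings or polygons is not known from this runbook.

```python
import re


def first_ring_to_ac(wkt):
    """Convert the first coordinate ring of a WKT geometry to 'x,y:x,y:...'."""
    # The first innermost pair of parentheses holds the first ring
    match = re.search(r'\(([^()]+)\)', wkt)
    if match is None:
        raise ValueError("no coordinate ring found in WKT")

    # Points are separated by commas, x and y by whitespace
    points = [point.split() for point in match.group(1).split(',')]
    return ':'.join(','.join(point) for point in points)


# Made-up, very short geometry for illustration
wkt = "MULTIPOLYGON (((140119.76 557037.39, 140142.83 557010.64, 140119.76 557037.39)))"
ac = first_ring_to_ac(wkt)
```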


## Development below
Mind your step.

In [None]:
output_json

In [None]:
import re
# Match every number with more digits after the decimal point than wanted (here: 1)
# and replace the match with the captured group, which drops the excess decimals
input_json = re.sub(r'([0-9]+\.[0-9]{1})[0-9]+', r'\1', output_json)
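
The same substitution can take the number of decimals as a parameter. This is a small sketch with a hypothetical helper name, `truncate_decimals`; it truncates the digits as text and does not round.

```python
import re


def truncate_decimals(text, decimals):
    """Cut every number in `text` down to `decimals` digits after the dot."""
    # Group 1 keeps the integer part, the dot and the first `decimals` digits;
    # the trailing [0-9]+ matches the excess digits that are dropped
    pattern = r'([0-9]+\.[0-9]{%d})[0-9]+' % decimals
    return re.sub(pattern, r'\1', text)


truncated = truncate_decimals("POINT (140119.7695681169 557037.39335)", 2)
unchanged = truncate_decimals("POINT (214930.61 595370.81)", 2)
```

Numbers that already have `decimals` digits or fewer do not match the pattern and are left as they are. Because the digits are cut off rather than rounded, `140119.7695681169` becomes `140119.76`, not `140119.77`.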

In [None]:
re.findall(r'[0-9]\.([0-9]*)', input_json)

In [None]:
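import numpy as np  # was missing; `wkt` is assumed to hold a WKT string (not defined in this notebook)
np.mean([len(decimals) for decimals in re.findall(r'\.([0-9]*)', wkt)])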

In [None]:
output_table.head()

In [None]:
pd.read_json(input_json).head()