**Foreword**

- This script is a [Jupyter notebook](https://jupyter.org/) that mixes Markdown narrative with Python code chunks
	- Python is one of the most popular programming languages in the world and the backbone of Arches
- This script is hosted on GitHub and mirrored here on [Google Colab](https://colab.google/): each change made to the GitHub script will appear here
	- the script can also be downloaded and executed as an independent copy
- The Google Colab platform offers free storage and computing time for online collaborative development of Jupyter scripts
	- another option could be [MyBinder](https://mybinder.org/)

The purpose of this presentation is to:
1. show the joint use of Jupyter/Python and GitHub to query the DB, for users with intermediate skills in IT (or Arches)
2. collect ideas on the Enhanced record minimum standard (ERMS)

# Enhanced record minimum standard compliance of Heritage Places

The Enhanced record minimum standard (ERMS) is the minimum standard of data enhancement for heritage places. The ERMS report is produced downstream, once the heritage places (HP) have been recorded in the database.

### Import libraries

In [4]:
import psycopg2 as pg
import pandas as pd
import numpy as np
import re
import requests
import json
import ipywidgets as widgets
from IPython.display import display
import matplotlib.pyplot as plt
import plotly.express as px

### Constants

Load:
- the UUID of HP in its resource model (RM)
- the read-only user `eamenar` parameters (see: [creating-a-read-only-user](https://github.com/eamena-project/eamena-arches-dev/tree/main/dev/postgres#creating-a-read-only-user)) on the training EAMENA instance
- ...

The GeoJSON URL and query

In [5]:
GEOJSON_URL = "https://database.eamena.org/api/search/export_results?paging-filter=1&tiles=true&format=geojson&reportlink=false&precision=6&total=326&language=en&advanced-search=%5B%7B%22op%22%3A%22and%22%2C%2234cfea78-c2c0-11ea-9026-02e7594ce0a0%22%3A%7B%22op%22%3A%22~%22%2C%22lang%22%3A%22en%22%2C%22val%22%3A%22Sistan%22%7D%2C%2234cfea87-c2c0-11ea-9026-02e7594ce0a0%22%3A%7B%22op%22%3A%22%22%2C%22val%22%3A%22%22%7D%7D%2C%7B%22op%22%3A%22or%22%2C%2234cfea69-c2c0-11ea-9026-02e7594ce0a0%22%3A%7B%22op%22%3A%22%22%2C%22val%22%3A%22%22%7D%2C%2234cfea73-c2c0-11ea-9026-02e7594ce0a0%22%3A%7B%22op%22%3A%22%22%2C%22val%22%3A%22%22%7D%2C%2234cfea43-c2c0-11ea-9026-02e7594ce0a0%22%3A%7B%22op%22%3A%22%22%2C%22val%22%3A%224ed99706-2d90-449a-9a70-700fc5326fb1%22%7D%2C%2234cfea5d-c2c0-11ea-9026-02e7594ce0a0%22%3A%7B%22op%22%3A%22%22%2C%22val%22%3A%22%22%7D%2C%2234cfea95-c2c0-11ea-9026-02e7594ce0a0%22%3A%7B%22op%22%3A%22~%22%2C%22lang%22%3A%22en%22%2C%22val%22%3A%22%22%7D%7D%5D&resource-type-filter=%5B%7B%22graphid%22%3A%2234cfe98e-c2c0-11ea-9026-02e7594ce0a0%22%2C%22name%22%3A%22Heritage%20Place%22%2C%22inverted%22%3Afalse%7D%5D&map-filter=%7B%22type%22%3A%22FeatureCollection%22%2C%22features%22%3A%5B%7B%22id%22%3A%22e84886109295dcb2d515f9ab792832bf%22%2C%22type%22%3A%22Feature%22%2C%22properties%22%3A%7B%22buffer%22%3A%7B%22width%22%3A10%2C%22unit%22%3A%22m%22%7D%2C%22inverted%22%3Afalse%7D%2C%22geometry%22%3A%7B%22coordinates%22%3A%5B%5B%5B61.5629662657594%2C31.341070427554456%5D%2C%5B61.39269902363566%2C31.226740239181964%5D%2C%5B61.52316353383432%2C30.977760218239922%5D%2C%5B61.773036239808164%2C30.92940344148805%5D%2C%5B61.89244443558445%2C31.037461248216815%5D%2C%5B61.933352798951745%2C31.22484931983834%5D%2C%5B61.5629662657594%2C31.341070427554456%5D%5D%5D%2C%22type%22%3A%22Polygon%22%7D%7D%5D%7D"
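
The search URL above packs several filters into percent-encoded JSON parameters (`advanced-search`, `resource-type-filter`, `map-filter`). As a rough sketch, the query string of such a URL can be decoded with the standard library to inspect what is being asked. The `sample_url` below is a shortened, hypothetical URL with the same layout, and `decode_search_url` is an illustrative helper, not part of Arches.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Shortened, hypothetical URL with the same parameter layout as GEOJSON_URL
sample_url = (
    "https://database.eamena.org/api/search/export_results"
    "?format=geojson&precision=6"
    "&resource-type-filter=%5B%7B%22name%22%3A%22Heritage%20Place%22"
    "%2C%22inverted%22%3Afalse%7D%5D"
)

def decode_search_url(url):
    """Return the query parameters of a search URL, JSON-decoding them when possible."""
    params = {}
    for key, values in parse_qs(urlsplit(url).query).items():
        raw = values[0]  # each parameter appears once in this kind of URL
        try:
            params[key] = json.loads(raw)
        except json.JSONDecodeError:
            params[key] = raw  # plain strings such as 'geojson' stay as-is
    return params

decoded = decode_search_url(sample_url)
print(decoded["resource-type-filter"][0]["name"])
```

Calling `decode_search_url(GEOJSON_URL)` in the notebook should list the same kind of filters for the real query.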

Verbose

In [23]:
verbose = False

In [31]:
resp = requests.get(GEOJSON_URL)
hps = resp.json()
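
The rest of the notebook assumes that the response is a GeoJSON `FeatureCollection` with a `features` list. A small check like the following could fail early when that is not the case (for example when the server returns an error payload). `check_feature_collection` and the sample payload are hypothetical and only illustrate the idea.

```python
def check_feature_collection(geojson):
    """Return the list of features, or raise ValueError if the payload is not a FeatureCollection."""
    if not isinstance(geojson, dict) or geojson.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    features = geojson.get("features")
    if not isinstance(features, list):
        raise ValueError("'features' should be a list")
    return features

# Hypothetical payload mimicking the structure used later in the notebook
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"EAMENA ID": "EAMENA-0000001"},
         "geometry": None},
    ],
}
features = check_feature_collection(sample)
print(len(features))
```

In the notebook, the same function could be applied to `hps` right after `resp.json()`.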

ℹ️ cells are editable, for example `verbose` can be changed to `True`

## Heritage place selection

Loop through the features and collect the EAMENA ID of each heritage place

In [68]:
verbose = True
# collect the EAMENA ID of every feature returned by the search
selected_hp = [feature['properties']['EAMENA ID'] for feature in hps['features']]
if verbose:
	print("first HPs:")
	print(selected_hp[:5])

first HPs:
['EAMENA-0192340', 'EAMENA-0192357', 'EAMENA-0182044', 'EAMENA-0182044', 'EAMENA-0182035', 'EAMENA-0182038', 'EAMENA-0192567', 'EAMENA-0231502', 'EAMENA-0192442', 'EAMENA-0192441']
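
The list above contains `EAMENA-0182044` twice. Since the IDs are used later as dictionary keys, a duplicate silently overwrites its earlier entry. A minimal sketch for dropping duplicates while keeping the original order, using only the first IDs printed above:

```python
# First IDs from the output above, including the duplicated 'EAMENA-0182044'
ids = ['EAMENA-0192340', 'EAMENA-0192357', 'EAMENA-0182044',
       'EAMENA-0182044', 'EAMENA-0182035']

# dict keys are unique and keep insertion order, so this removes repeats in place
unique_ids = list(dict.fromkeys(ids))
print(unique_ids)
```

Whether duplicates should be dropped or investigated (the same HP returned twice by the search) is left open here.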


## Heritage places field with their UUIDs

Read the [erms-template-readonly.tsv](https://github.com/eamena-project/eamena-arches-dev/blob/main/dev/data_quality/erms-template-readonly.tsv) file (see: [README.md](https://github.com/eamena-project/eamena-arches-dev/tree/main/dev/data_quality#erms)). Only the complete rows are shown (rows with `NA` values are dropped).

In [69]:
tsv_file = "https://raw.githubusercontent.com/eamena-project/eamena-arches-dev/main/dev/data_quality/erms-template-readonly.tsv"
df = pd.read_csv(tsv_file, delimiter = '\t')
df = df[["level1", "level2", "level3", "uuid_sql", "Enhanced record minimum standard"]]
df_listed = df.dropna()
if verbose:
    print(df_listed.to_markdown())

|    | level1                    | level2                            | level3                                             | uuid_sql                             | Enhanced record minimum standard                                                                                                                         |
|---:|:--------------------------|:----------------------------------|:---------------------------------------------------|:-------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|
|  1 | ASSESSMENT SUMMARY        | ASSESSMENT ACTIVITY               | Investigator Role Type                             | d2e1ab96-cc05-11ea-a292-02e7594ce0a0 | Yes                                                                                                                                                      |
|  5 | ASSESSMENT SUMMARY        | ASSESSMENT 

ℹ️ Pros and cons of the TSV and XLSX formats:
	- TSV is a plain text format; GitHub renders it automatically and makes it searchable
	- XLSX is easy to edit (filter, sort, conditional formatting)
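
To make the TSV reading step concrete, here is a small sketch that reads a tab-separated table from an in-memory string and drops the incomplete rows, as done above with `dropna()`. The second row (`Hypothetical Field`) is invented for the illustration; the column names follow the ERMS template.

```python
import io
import pandas as pd

# Two-row sample: the second row is hypothetical and has empty cells
sample_tsv = (
    "level1\tlevel3\tuuid_sql\tEnhanced record minimum standard\n"
    "ASSESSMENT SUMMARY\tInvestigator Role Type\t"
    "d2e1ab96-cc05-11ea-a292-02e7594ce0a0\tYes\n"
    "ASSESSMENT SUMMARY\tHypothetical Field\t\t\n"
)

df_sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
# empty cells are read as NaN, so dropna() keeps only the complete rows
df_complete = df_sample.dropna()
print(len(df_sample), len(df_complete))
```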

Select the level of aggregation (`level1`, `level2` or `level3`) on which the spider plot will be done

In [70]:
options=['level1', 'level2', 'level3']
radio_button = widgets.RadioButtons(
    options=options,
    description='Select an option:'
)
display(radio_button)


RadioButtons(description='Select an option:', options=('level1', 'level2', 'level3'), value='level1')

Plot the ERMS dataframe for this level

In [222]:
# mylevel = 'level3'
mylevel = radio_button.value
erms_col = 'Enhanced record minimum standard'
df_erms = df_listed.copy()
# flag the fields required by the ERMS (1) and the other fields (0)
df_erms[erms_col] = df_erms[erms_col].str.contains('Yes', case = False, na = False).astype(int)
# count the required fields for each value of the selected level
df_erms = df_erms.groupby(mylevel)[erms_col].sum()
print(f'You selected: {mylevel}')
df_erms = pd.DataFrame({
	'field': df_erms.index,
	'value' : df_erms.values
})
print(df_erms.to_markdown(index=False))

You selected: level3
| field                                              |   value |
|:---------------------------------------------------|--------:|
| Cadastral Reference                                |       0 |
| Cultural Period Certainty                          |       1 |
| Damage Extent Type                                 |       0 |
| Designation                                        |       0 |
| Designation From Date                              |       0 |
| Designation To Date                                |       0 |
| Disturbance Cause Assignment Assessor Name - Actor |       0 |
| Disturbance Cause Category Type                    |       1 |
| GE Imagery Acquisition Date                        |       1 |
| General Description                                |       0 |
| General Description Type                           |       0 |
| Geometry Extent Certainty                          |       0 |
| Grid ID                                            |       1 |
| He
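
The aggregation above can be hard to follow on the full template. The sketch below reproduces the same idea on a toy table (the values `A`, `B`, `f1`... are invented): the ERMS column is turned into 0/1 flags, then summed per value of the chosen level. `erms_counts` is a hypothetical helper name.

```python
import pandas as pd

# Toy template: three fields grouped under two level1 headings
toy = pd.DataFrame({
    "level1": ["A", "A", "B"],
    "level3": ["f1", "f2", "f3"],
    "Enhanced record minimum standard": ["Yes", "No", "yes (if known)"],
})

def erms_counts(df, level):
    """Count, for each value of `level`, how many fields are required by the ERMS."""
    required = (df["Enhanced record minimum standard"]
                .str.contains("yes", case=False, na=False)
                .astype(int))
    out = required.groupby(df[level]).sum().reset_index()
    out.columns = ["field", "value"]
    return out

print(erms_counts(toy, "level1"))
print(erms_counts(toy, "level3"))
```

At `level3` each field appears once, so the value is 0 or 1; at `level1` or `level2` it is the number of required fields under that heading.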

Gather data from the HP and the ERMS: for each selected HP (`selected_hp`), create an empty dataframe, loop over the fields of the selected level to read their values from the HP, and fill the dataframe

In [None]:
verbose = True
level_values = df_listed[mylevel].unique()
l_erms = []
frames = {}
# use len(selected_hp) instead of 5 to process every HP
for i in range(5):
    a_hp = selected_hp[i]
    if verbose:
        print("read: " + a_hp)
    # create a dataframe with one row per field, all counts set to 0
    df_res = pd.DataFrame({'field': level_values,
                           'recorded': np.repeat(0, len(level_values)).tolist()})
    for j in range(len(df_res)):
        a_field = df_res.iloc[j]["field"]
        try:
            a_value = hps['features'][i]['properties'][a_field]
            if verbose:
                print("field: '" + a_field + "' has value: '" + str(a_value) + "'")
        except KeyError:
            # reset the value so that the one read for the previous field is not reused
            a_value = None
            if verbose:
                print(" /!\\ '" + a_field + "' listed in the ERMS dataframe is not a field listed in the database")
        if a_value is not None:
            df_res.at[j, 'recorded'] += 1
    l_erms.append(df_res)
    frames[a_hp] = df_res
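
The inner loop boils down to one question per field: does the HP have a value for it? As a sketch, this can be written as a small function working on the `properties` dictionary of one feature. `recorded_flags` and the sample properties are hypothetical; like the cell above, a field counts as recorded when its value is not `None`.

```python
def recorded_flags(properties, fields):
    """Map each field to 1 if `properties` holds a value for it, else 0."""
    flags = {}
    for field in fields:
        # .get() returns None when the field is absent from the HP properties
        flags[field] = int(properties.get(field) is not None)
    return flags

# Hypothetical properties of one heritage place
props = {"Heritage Place Type": "Settlement", "Designation": None}
fields = ["Heritage Place Type", "Designation", "Grid ID"]
flags = recorded_flags(props, fields)
print(flags)
```

The resulting dictionary can be turned into the same two-column dataframe with `pd.DataFrame({'field': list(flags), 'recorded': list(flags.values())})`.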

In [81]:
frames.keys()
frames['EAMENA-0192357']

|    | field                       | recorded |
|---:|:----------------------------|---------:|
|  0 | Investigator Role Type      |        1 |
|  1 | GE Imagery Acquisition Date |        1 |
|  2 | Resource Name               |        0 |
|  3 | Name Type                   |        0 |
|  4 | Heritage Place Type         |        1 |
|  5 | General Description Type    |        1 |
|  6 | General Description         |        1 |
|  7 | Heritage Place Function     |        1 |
|  8 | Designation                 |        0 |
|  9 | Designation From Date       |        0 |


## Spider diagram

Show a spider diagram of the number of fields recorded for each HP. If `level3` has been selected, the spider plot also shows the ERMS.

In [84]:
import math
ncol = 3
nrow = math.ceil(len(frames.keys()) / ncol)

2
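
With `ncol` columns, the number of rows is the ceiling of the number of HPs divided by `ncol`. The row and column of each subplot can be derived from its index in the same way. A minimal sketch, where `grid_positions` is a hypothetical helper and positions are 1-based (as Plotly subplots expect):

```python
import math

def grid_positions(n_items, ncol):
    """Return the number of rows and the 1-based (row, col) of each item, filled row by row."""
    nrow = math.ceil(n_items / ncol)
    positions = [(i // ncol + 1, i % ncol + 1) for i in range(n_items)]
    return nrow, positions

nrow, positions = grid_positions(4, 3)
print(nrow, positions)
```

Computing the position from the index avoids keeping separate row and column counters in the plotting loop.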

In [90]:
frames.keys()

dict_keys(['EAMENA-0192340', 'EAMENA-0192357', 'EAMENA-0182044', 'EAMENA-0182035'])

In [130]:
import plotly.express as px
# `melted_df` is the long-format dataframe ('field', 'Value Set', 'Value') built in a later cell
colors = {'recorded': 'blue', 'Enhanced record minimum standard': 'red'}
fig = px.scatter_polar(melted_df, r='Value', theta='field', color = 'Value Set', color_discrete_map = colors)
fig.show()

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import math

ncol = 3
nrow = math.ceil(len(frames.keys()) / ncol)

# `specs` needs one dictionary per cell: nrow lists of ncol entries
fig = make_subplots(rows=nrow, cols=ncol, specs=[[{"type": "polar"}] * ncol for _ in range(nrow)])
fig.add_trace(go.Scatterpolar(
                  r = melted_df['Value'],
                  theta = melted_df['field'],
                  fill = 'toself',
                  marker_color='rgb(47,138,196)',
                  name = "Player_data"), row=1, col=1)
fig.show()
# # fig = px.scatter_polar(df, r="frequency", theta="direction")
# colors = {'recorded': 'blue', 'Enhanced record minimum standard': 'red'}
# fig_polar = px.scatter_polar(melted_df, r='Value', theta='field', color = 'Value Set', color_discrete_map = colors)
# fig.add_trace(fig_polar, row=1, col=1)
# # fig = px.scatter_polar(melted_df, r='Value', theta='field', color = 'Value Set', color_discrete_map = colors)
# fig.show()

In [187]:
melted_df

|    | field                               | Value Set                        | Value |
|---:|:------------------------------------|:---------------------------------|------:|
| 55 | Related Detailed Condition Resource | Enhanced record minimum standard |     0 |
| 29 | GE Imagery Acquisition Date         | Enhanced record minimum standard |     1 |
| 30 | Resource Name                       | Enhanced record minimum standard |     0 |
| 31 | Name Type                           | Enhanced record minimum standard |     0 |
| 32 | Heritage Place Type                 | Enhanced record minimum standard |     0 |
| 33 | General Description Type            | Enhanced record minimum standard |     0 |
| 34 | General Description                 | Enhanced record minimum standard |     0 |
| 35 | Heritage Place Function             | Enhanced record minimum standard |     1 |
| 36 | Designation                         | Enhanced record minimum standard |     0 |
| 37 | Designation From Date               | Enhanced record minimum standard |     0 |


In [137]:
data = go.Scatterpolar(
        r = melted_df['Value'],
        theta = melted_df['field'],
        mode = 'markers',
    )
fig = go.Figure(data)
fig.show()

In [167]:
fig = make_subplots(rows=2, cols=1, specs=[[{'type': 'polar'}]*1]*2)
fig.add_trace(go.Scatterpolar(
        name = "my name",
        r = melted_df['Value'],
        theta = melted_df['field'],
        mode = 'markers',
    ), 1, 1)

In [182]:
mylevel

'level3'

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

ncol = 3
nrow = math.ceil(len(frames.keys()) / ncol)

# fig = make_subplots(rows=nrow, cols=ncol)
fig = make_subplots(rows=nrow, cols=ncol, specs=[[{'type': 'polar'}]*ncol]*nrow, subplot_titles=tuple(frames.keys()))
# fig = make_subplots(rows=nrow, cols=ncol, specs=[[{'type': 'polar'}]*nrow]*ncol)
# fig = make_subplots(rows=nrow, cols=ncol, start_cell="top-left")
colors = {'recorded': 'blue', 'Enhanced record minimum standard': 'red'}

current_column = 1
current_row = 1
# frames.keys()
for a_hp in frames.keys():
	df = frames[a_hp]
	print(a_hp)
	if mylevel == 'level3':
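		# name the ERMS column after its key in `colors` so that both value sets get a color
		merged_df = pd.merge(df, df_erms.rename(columns = {'value': 'Enhanced record minimum standard'}), on = 'field')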
		melted_df = pd.melt(merged_df, id_vars = ['field'], var_name = 'Value Set', value_name = 'Value')
		melted_df.sort_values('Value Set', inplace = True)
		# TODO distinguish ERMS from data with colors on 'Value Set'
		melted_df_color = melted_df['Value Set'].map(colors)
		fig.add_trace(go.Scatterpolar(
			name = a_hp,
			r = melted_df['Value'],
			theta = melted_df['field'],
			mode = 'markers',
			marker=dict(color = melted_df_color),
			# marker_color = melted_df['Value Set'], # "blue",
			hovertemplate="<br>".join([
			"value: %{r}",
			"field: %{theta}"
		])), 
			current_row, current_column)
	else:
		# plot only the fields recorded for the current HP
		fig.add_trace(go.Scatterpolar(
			name = a_hp,
			r = df['recorded'],
			theta = df['field'],
			mode = 'markers',
			marker_color = "blue",
			hovertemplate="<br>".join([
			"value: %{r}",
			"field: %{theta}"
		])), 
			current_row, current_column)
		
	current_column = current_column + 1
	# end of line: once the last column is filled, move to the start of the next row
	if current_column > ncol:
		current_row = current_row + 1
		current_column = 1
fig.show()

In [236]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

ncol = 3
nrow = math.ceil(len(frames.keys()) / ncol)

# fig = make_subplots(rows=nrow, cols=ncol)
fig = make_subplots(rows=nrow, cols=ncol, specs=[[{'type': 'polar'}]*ncol]*nrow, subplot_titles=tuple(frames.keys()))
# fig = make_subplots(rows=nrow, cols=ncol, specs=[[{'type': 'polar'}]*nrow]*ncol)
# fig = make_subplots(rows=nrow, cols=ncol, start_cell="top-left")
colors = {'recorded': 'blue', 'Enhanced record minimum standard': 'red'}

current_column = 1
current_row = 1
# frames.keys()
for a_hp in frames.keys():
	df = frames[a_hp]
	print(a_hp)
	if mylevel == 'level3':
		fig.add_trace(go.Scatterpolar(
			name =  "  erms",
			r = df_erms['value'],
			theta = df_erms['field'],
			# mode = 'markers',
			# marker=dict(color = melted_df_color),
			# marker_color = "red",
			fill='toself',
			fillcolor='red',
			line_color='red',
			hovertemplate="<br>".join([
			"value: %{r}",
			"field: %{theta}"]),
			showlegend=False), 
			current_row, current_column)		
		fig.add_trace(go.Scatterpolar(
			name = a_hp,
			r = df['recorded'],
			theta = df['field'],
			mode = 'markers',
			# marker=dict(color = melted_df_color),
			marker_color = "blue",
			hovertemplate="<br>".join([
			"value: %{r}",
			"field: %{theta}"])
			), 
			current_row, current_column)
	else:
		fig.add_trace(go.Scatterpolar(
			name = a_hp,
			r = df['recorded'],
			theta = df['field'],
			mode = 'markers',
			marker_color = "blue",
			hovertemplate="<br>".join([
			"value: %{r}",
			"field: %{theta}"]),
			showlegend=False), 
			current_row, current_column)
	current_column = current_column + 1
	# end of line: once the last column is filled, move to the start of the next row
	if current_column > ncol:
		current_row = current_row + 1
		current_column = 1
fig.show()

EAMENA-0192340
EAMENA-0192357
EAMENA-0182044
EAMENA-0182035
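
Besides the visual overlay, the comparison between the ERMS and the recorded fields can be summarised in numbers. The sketch below, using invented values for one HP at `level3`, lists the required fields that are not recorded and the share of required fields that are. The variable names and the compliance ratio are assumptions made for the illustration, not an agreed ERMS metric.

```python
import pandas as pd

# Hypothetical level3 data for one HP: what is recorded vs what the ERMS requires
recorded = pd.DataFrame({
    "field": ["Grid ID", "Designation", "Heritage Place Type"],
    "recorded": [1, 0, 0],
})
erms = pd.DataFrame({
    "field": ["Grid ID", "Designation", "Heritage Place Type"],
    "value": [1, 0, 1],
})

merged = recorded.merge(erms, on="field")
required = merged[merged["value"] > 0]          # fields required by the ERMS
missing = required.loc[required["recorded"] == 0, "field"].tolist()
compliance = (required["recorded"] > 0).mean()  # share of required fields recorded
print(missing, compliance)
```

In the notebook, `recorded` would be one entry of `frames` and `erms` the `df_erms` dataframe.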


In [201]:
frames['EAMENA-0192340']
type(df_erms)
dd = df_erms.to_frame()
dd.columns

Index(['Enhanced record minimum standard'], dtype='object')

In [55]:
# `selected_hp` is a list: use the first HP, whose counts are stored in `l_erms[0]`
tit = selected_hp[0] + " - " + mylevel
if mylevel == 'level3':
    # plot the ERMS
    colors = {'recorded': 'blue', 'Enhanced record minimum standard': 'red'}
    # name the ERMS column after its key in `colors`
    merged_df = pd.merge(l_erms[0], df_erms.rename(columns = {'value': 'Enhanced record minimum standard'}), on = 'field')
    melted_df = pd.melt(merged_df, id_vars = ['field'], var_name = 'Value Set', value_name = 'Value')
    melted_df.sort_values('Value Set', inplace = True)
    if verbose:
        print(melted_df.to_markdown())
    fig = px.line_polar(melted_df, r='Value', theta='field', color = 'Value Set',
                        line_close = False, color_discrete_map = colors, title = tit)
    fig.show()
else:
    df = pd.DataFrame(dict(
        value = l_erms[0]['recorded'].tolist(),
        variable = l_erms[0]['field'].tolist()))
    fig = px.line_polar(df, r = 'value', theta = 'variable', 
                        line_close = True, title = tit)
    fig.show()

## Development

* add a loop to work with 1..n HP
* improve the spider plot output (intervals, grid layout, etc.)
* connect to the main DB

## Questions

* What kind of structure should we select to provide a list of HP (dataframe, list, etc.)?
* Do we want to have this ERMS assessment upstream (on the BU)?
* Are the code chunks useful in this document? If not, they can be grouped into functions, and these functions called from the Jupyter notebook (e.g. `!python myfunction.py`)