## Enhanced record minimum standard compliance of Heritage Places

Enhanced record minimum standard (ERMS) is the minimum standard of data enhancement for heritage places (HP). ERMS reporting is done downstream, once the heritage places have been recorded in the database.

Import libraries

In [1]:
import psycopg2 as pg
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
import plotly.express as px

### Constants

Load:
- the UUID of HP in its resource model (RM)
- the read-only user `eamenar` parameters (see: [creating-a-read-only-user](https://github.com/eamena-project/eamena-arches-dev/tree/main/dev/postgres#creating-a-read-only-user)) on the training EAMENA instance
- ...

In [2]:
# Heritage Place Resource Model UUID
uuid_hp = '34cfe992-c2c0-11ea-9026-02e7594ce0a0'
# connection parameters
dbname = "eamena"
user = "eamenar"
password = "eamenar"
host = "52.50.27.140"
port = "5432"
# verbose
verbose = False

Connect to the database

In [21]:
try:
    connection = pg.connect(
        dbname = dbname,
        user = user,
        password = password,
        host = host,
        port = port
    )
    cur = connection.cursor()
    if verbose:
        print("Connection established successfully!")
except pg.Error as e:
    print(f"Error: {e}")

## Heritage place selection

Select an HP and get its UUID

In [22]:
selected_hp = 'EAMENA-0500002'
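sqll = """
SELECT
      resourceinstanceid AS resourceid
      FROM tiles
      WHERE tiledata -> %s -> 'en' ->> 'value' LIKE %s
"""
# pass the values as query parameters instead of formatting them into the SQL string
cur.execute(sqll, (uuid_hp, selected_hp))
row = cur.fetchone()
# fetchone() returns None when no tile matches the selected HP
if row is None:
    raise ValueError("no heritage place found for '" + selected_hp + "'")
hpid = row[0]
if verbose:
    print("the UUID of '" + selected_hp + "' is '" + hpid + "'")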

## Heritage place fields and their UUIDs

Read the [output.tsv](https://github.com/eamena-project/eamena-arches-dev/blob/main/dev/data_quality/output.tsv) file, which lists the UUIDs linked to fields. This TSV file is exported automatically (by [this GitHub Action](https://github.com/eamena-project/eamena-arches-dev/actions/workflows/update-trigger.yml)) from the [erms_template.xlsx](https://github.com/eamena-project/eamena-arches-dev/blob/main/dev/data_quality/template.xlsx) file in the same directory. The file 'template.xlsx' is considered to be the authoritative document for ERMS.

In [23]:
tsv_file = "https://raw.githubusercontent.com/eamena-project/eamena-arches-dev/main/dev/data_quality/output.tsv"
df = pd.read_csv(tsv_file, delimiter='\t')
df = df[["level1", "level2", "level3", "uuid_sql", "Enhanced record minimum standard"]]
df_listed = df.dropna()
if verbose:
    print(df_listed.to_markdown())

Build and print the ERMS dataframe, after selecting the level of aggregation (`level1`, `level2` or `level3`) on which the spider plot will be based

In [24]:
mylevel = 'level3'
df_erms = df_listed.copy()
df_erms['Enhanced record minimum standard'] = df_erms['Enhanced record minimum standard'].str.contains(r'Yes', case=False, na=False, regex=True).astype(int)
df_erms = df_erms[[mylevel, "Enhanced record minimum standard"]]
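# rename the aggregation column explicitly (writing to columns.values in place is not reliable)
df_erms = df_erms.rename(columns={mylevel: "field"})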
df_erms = df_erms.groupby(['field'])['Enhanced record minimum standard'].sum()
print(df_erms.to_markdown())

| field                                      |   Enhanced record minimum standard |
|:-------------------------------------------|-----------------------------------:|
| Cadastral Reference                        |                                  0 |
| Cultural Period Certainty                  |                                  1 |
| Damage Extent Type                         |                                  0 |
| Designation                                |                                  0 |
| Designation From Date                      |                                  0 |
| Designation To Date                        |                                  0 |
| Disturbance Cause Assignment Assessor Name |                                  0 |
| Disturbance Cause Category Type            |                                  1 |
| GE Imagery Acquisition Date                |                                  1 |
| General Description                        |                              
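
To make the aggregation step concrete, here is a minimal sketch on a hand-made dataframe. The field names and the "Yes"/"No" values are invented for illustration and are not rows from `output.tsv`; only the column name `Enhanced record minimum standard` comes from the notebook.

```python
import pandas as pd

# toy stand-in for df_listed (all values are invented for illustration)
toy = pd.DataFrame({
    "level2": ["Assessment", "Assessment", "Geometry"],
    "level3": ["Assessor Name", "Assessment Date", "Geometry Type"],
    "Enhanced record minimum standard": ["Yes", "No", "yes"],
})

col = "Enhanced record minimum standard"
# 1 when the cell contains "yes" (case-insensitive), 0 otherwise
toy[col] = toy[col].str.contains("yes", case=False, na=False).astype(int)

# number of ERMS fields in each level2 group
erms_per_group = toy.groupby("level2")[col].sum()
print(erms_per_group)
```

Aggregating on `level1` or `level2` therefore gives the number of ERMS fields in each group, while on `level3` each field would normally count for 0 or 1.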

Create an empty dataframe, loop over the UUIDs to collect data from the selected HP (`selected_hp`), and fill the dataframe

In [25]:
# empty dataframe
level_values = df_listed[mylevel].unique()
data = {'field': level_values,
        'recorded': np.repeat(0, len(level_values)).tolist()}
df_res = pd.DataFrame(data)
# loop and fill it
for i in df_listed.index:
    if verbose:
        print("read: " + df_listed[mylevel][i] + ' | ' + df_listed['uuid_sql'][i])
    df_field = df_listed[mylevel][i]
    df_field_sql = re.sub(" ", "_", df_field) # rm space
    df_uuid = df_listed['uuid_sql'][i]
    sqll = """
    SELECT value FROM values 
    WHERE valueid::text IN
    (
    SELECT tiledata ->> '%s' AS %s
    FROM tiles 
    WHERE resourceinstanceid::text LIKE '%s'
    AND tiledata -> '%s' IS NOT NULL
    )
    """ % (df_uuid, df_field_sql, hpid, df_uuid)
    if verbose:
        print(sqll)
    cur.execute(sqll)
    outvalue = cur.fetchall()
    if len(outvalue) > 0:
        row_num = df_res[df_res['field'] == df_field].index.tolist()
        df_res.at[row_num[0], 'recorded'] += 1
        if verbose:
            print("recorded values: " + str(outvalue))
if verbose:
    print(df_res.to_markdown())
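
The counting logic of the loop above can be tried without a database connection. The sketch below is only an illustration: the field names, the placeholder UUIDs and the `fake_results` dictionary (standing in for what `cur.fetchall()` would return) are all hypothetical.

```python
import pandas as pd

# hypothetical field/UUID pairs (stand-in for df_listed)
fields = pd.DataFrame({
    "field": ["Site Name", "Site Name", "Damage"],
    "uuid_sql": ["uuid-a", "uuid-b", "uuid-c"],
})
# hypothetical query results per UUID (stand-in for cur.fetchall())
fake_results = {"uuid-a": [("Tell X",)], "uuid-b": [], "uuid-c": [("Erosion",)]}

# one row per distinct field, counter initialised to 0
df_counts = pd.DataFrame({"field": fields["field"].unique(), "recorded": 0})

for field, uuid in zip(fields["field"], fields["uuid_sql"]):
    rows = fake_results[uuid]
    # a field counts as recorded each time one of its UUIDs returns at least one value
    if len(rows) > 0:
        df_counts.loc[df_counts["field"] == field, "recorded"] += 1

print(df_counts)
```

When several UUIDs share the same field name at the chosen level, the counter for that field can exceed 1, which is why the aggregation level matters for the plot.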

## Spider diagram

Show a spider diagram of the number of fields recorded. If `level3` has been selected, the diagram also shows the ERMS.

In [26]:
if mylevel == 'level3':
    colors = {'recorded': 'blue', 'Enhanced record minimum standard': 'red'}
    merged_df = pd.merge(df_res, df_erms, on='field')
    melted_df = pd.melt(merged_df, id_vars=['field'], var_name='Value Set', value_name='Value')
    melted_df.sort_values('Value Set', inplace=True)
    if verbose:
        print(melted_df.to_markdown())
    fig = px.line_polar(melted_df, r='Value', theta='field', color='Value Set',
                        line_close = False, color_discrete_map=colors, title = selected_hp)
    fig.show()
else:
    variable = df_res['field'].tolist()
    value = df_res['recorded'].tolist()
    df = pd.DataFrame(dict(
        value = value,
        variable = variable))
    fig = px.line_polar(df, r = 'value', theta = 'variable', line_close = True)
    fig.show()
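
The `level3` branch relies on reshaping the merged table into long format, so that each value set becomes one trace in the polar plot. A minimal sketch of that reshape, using invented counts and calling `reset_index()` on the ERMS series before merging (an assumption made here for clarity, not the notebook's exact call):

```python
import pandas as pd

# invented counts for illustration
recorded = pd.DataFrame({"field": ["A", "B"], "recorded": [1, 0]})
erms = pd.Series([1, 1], index=pd.Index(["A", "B"], name="field"),
                 name="Enhanced record minimum standard")

# one row per field, one column per value set
merged = pd.merge(recorded, erms.reset_index(), on="field")

# long format: one row per (field, value set) pair
long_df = pd.melt(merged, id_vars=["field"], var_name="Value Set", value_name="Value")
print(long_df)
```

The resulting `Value Set` column is what `color='Value Set'` uses in `px.line_polar` to draw the recorded values and the ERMS as separate lines.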