- implement labcutoff approach (based on calculation examples for all cases?)
  - LOD/LOQ values

- review formulas
  - persistence strategy
    - default reference values (come with the formula)
    - default calculation workflow(s) for an ObservableProperty
    - study configuration / overrides for calculation & reference values
    - calculation design persistence structure
      - calculation_name
      - calculation_implementation > e.g. point to named python function in hbm module
      - calculation_implementation_as_string
      - calculation_arguments > list of inputs: mapping name + type + expected unit + source/discovery path
      - calculation_results > list of outputs: mapping name + type + expected unit + destination path
      - conditionals: validity conditions (and/or check at input level)?
  - implement python sum function, configure with input parameters and combine with imputation/normalisation functionality to handle current existing formulas?
  - imp, meb (medium bound) > multiple cases to be persisted simultaneously?
  - extensions: mol (core), lip/crt/sg (prop)
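
A minimal sketch of how the calculation design persistence structure listed above could look, combined with a named sum function that handles imputation. All names here (`CalculationArgument`, `CalculationDesignSketch`, `CALCULATION_REGISTRY`, `sum_with_imputation`, the `lod / 2` imputation rule and the `observations/...` paths) are hypothetical illustrations, not part of the `peh` model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CalculationArgument:
    """One input or output mapping: name, type, expected unit and path."""
    mapping_name: str
    value_type: str
    expected_unit: str
    path: str  # source/discovery path for inputs, destination path for outputs


@dataclass
class CalculationDesignSketch:
    calculation_name: str
    calculation_implementation: str  # name of a registered Python function
    calculation_arguments: List[CalculationArgument] = field(default_factory=list)
    calculation_results: List[CalculationArgument] = field(default_factory=list)


def sum_with_imputation(values: List[Optional[float]], lod: float) -> float:
    """Sum the inputs, replacing missing values (None) by lod / 2 (assumed rule)."""
    return sum(lod / 2 if v is None else v for v in values)


# Hypothetical registry that resolves calculation_implementation to a callable.
CALCULATION_REGISTRY: Dict[str, Callable[..., float]] = {
    "sum_with_imputation": sum_with_imputation,
}

design = CalculationDesignSketch(
    calculation_name="sum_example",
    calculation_implementation="sum_with_imputation",
    calculation_arguments=[
        CalculationArgument("component_a", "decimal", "µg/L", "observations/component_a"),
        CalculationArgument("component_b", "decimal", "µg/L", "observations/component_b"),
    ],
    calculation_results=[
        CalculationArgument("component_sum", "decimal", "µg/L", "observations/component_sum"),
    ],
)

# Look up the implementation by name and run it on one measured and one missing value.
implementation = CALCULATION_REGISTRY[design.calculation_implementation]
result = implementation([1.5, None], lod=0.2)
print(result)
```

Storing the function name rather than the function itself keeps the design serialisable; a study-level override would then only need to swap the name or the arguments.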

- review indicators / varnames / constraints approach
  - properties were previously defined at samplegroup level, indicators not necessarily
  - Unit_keys preventing direct prop <> indicator conversion:
    - Note: varname is unique at core level, prop has composite key: varname, samplegroup, unit 
    - Should these be separate indicators/observable properties (& varnames)?
      - 4 blood items: 2x ['µg/L', 'pg BEQ/g'], 2x ['µg/L', 'pmol/g globin']
      - 81 air items: [('ng/filter', 'µg/kg dm')]
    - find historical data solution for "duplicated" units: ["ng/filter", "pg BEQ/g", "pmol/g globin"]
    - Are there any conversions within props/indicators that convert between non-compatible units?
  - anything to be handled with the historical "tl" situation?
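
The composite-key issue above can be made visible with a small check that lists every (varname, samplegroup) pair carrying more than one unit. This is a sketch on made-up rows; `example_props` and `find_multi_unit_props` are hypothetical names, and the real input would be the prop records loaded further down.

```python
from collections import defaultdict

# Hypothetical prop rows, shaped like the records read from the prop table.
example_props = [
    {"varname": "x1", "samplegroup_key": "blood", "unit_key": "µg/L"},
    {"varname": "x1", "samplegroup_key": "blood", "unit_key": "pg BEQ/g"},
    {"varname": "x2", "samplegroup_key": "urine", "unit_key": "µg/L"},
]


def find_multi_unit_props(props):
    """Return {(varname, samplegroup_key): units} for pairs with more than one unit."""
    units_per_pair = defaultdict(set)
    for p in props:
        units_per_pair[(p["varname"], p["samplegroup_key"])].add(p["unit_key"])
    return {pair: sorted(units) for pair, units in units_per_pair.items() if len(units) > 1}


multi_unit = find_multi_unit_props(example_props)
print(multi_unit)
```

Each pair reported here is a candidate for either a separate indicator per unit or an explicit conversion.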

- review categorisation and links:
  - Close_match > rather use the conceptual "broader"/"narrower" relations ?
    - related: dd_core > relatedvarnames
  - complete indicator_type list (exposuremarker, effectmarker, observation, ...) and ensure correct translation
  - biochementity_links from effectmarkers, core <> prop, base <> core
  - groupings from effectmarkers, base_category, core_category
  - use calculated/measured sum and compound relations to deduce "types" of biochementities
    - (compound, linear/branched/combination, measured/conceptual group, -deterministic- sum, metabolite, ...)

- find non-biochementity indicators > change type to IndicatorType.observation + remove biochementity & links?
- perclpfos-lpfos_perc: sampleobscore_key vs varname > sampleobscore_key is ignored
- Statistics are per indicator with constraint > create indicator&property at detailed matrix level
  - create new indicators for 2 'pg BEQ/g' blood items, 2 'pmol/g globin' blood items and 81 'ng/filter' air items
  - create indicators for each of the entries in forYaml_sampleobsprop_statistics.csv


In [None]:
import yaml
import json
import decimal

import pandas as pd
import numpy as np

from peh import BioChemEntity, Translation, BioChemEntityLink, BioChemEntityLinkType, BioChemIdentifier
from peh import Indicator, IndicatorType, ObservableEntityType, QudtQuantityKind
from peh import ObservableProperty, ObservablePropertyMetadataElement, ObservationType, ObservationResultType, CalculationDesign
from peh import ValidationStatus, ValidationHistoryRecord

from linkml_runtime.dumpers import json_dumper, yaml_dumper

In [None]:
UNIT_CONVERSION = {
 '-': {"property": "observation", "quantity_kind": "Dimensionless", "default_unit": "UNITLESS"},
 'mitochondrial/nuclear DNA': {"property": "mitochondrial/nuclear DNA", "quantity_kind": "Dimensionless", "default_unit": "UNITLESS"},
 'number of cells': {"property": "number of cells", "quantity_kind": "Dimensionless", "default_unit": "NUM"},

 '%': {"property": "percentage", "quantity_kind": "DimensionlessRatio", "default_unit": "PERCENT"},
 '% (g/100g)': {"property": "mass percentage", "quantity_kind": "DimensionlessRatio", "default_unit": "PERCENT"},
 '% dm': {"property": "percentage dry matter", "quantity_kind": "DimensionlessRatio", "default_unit": "PERCENT"},
 'index %': {"property": "index percentage", "quantity_kind": "DimensionlessRatio", "default_unit": "PERCENT"},
 'ppb': {"property": "parts per billion", "quantity_kind": "DimensionlessRatio", "default_unit": "PPB"},

 'ng/g': {"property": "mass ratio", "quantity_kind": "DimensionlessRatio", "default_unit": "PPB"},
 'pg/mg': {"property": "mass ratio", "quantity_kind": "DimensionlessRatio", "default_unit": "PPB"},
 'µg/kg dm': {"property": "mass ratio dry matter", "quantity_kind": "DimensionlessRatio", "default_unit": "PPB"},
 'µg/kg ww': {"property": "mass ratio wet weight", "quantity_kind": "DimensionlessRatio", "default_unit": "PPB"},
 'pg BEQ/g': {"property": "bioanalytical equivalent mass ratio", "quantity_kind": "DimensionlessRatio", "default_unit": "PPTR"},
 'mmol/mol': {"property": "amount of substance ratio", "quantity_kind": "DimensionlessRatio", "default_unit": "PPTH"},

 'pg': {"property": "mass", "quantity_kind": "Mass", "default_unit": "PicoGM"},

 'L': {"property": "volume", "quantity_kind": "Volume", "default_unit": "L"},
 'fL': {"property": "volume", "quantity_kind": "Volume", "default_unit": "FemtoL"},

 'amount/µL': {"property": "amount per volume", "quantity_kind": "NumberDensity", "default_unit": "NUM-PER-MicroL"},
 'milj/µL': {"property": "amount per volume", "quantity_kind": "NumberDensity", "default_unit": "NUM-PER-PicoL"},
 'IU/L': {"property": "amount per volume", "quantity_kind": "SerumOrPlasmaLevel", "default_unit": "IU-PER-L"},
 'kU/L': {"property": "amount per volume", "quantity_kind": "SerumOrPlasmaLevel", "default_unit": "IU-PER-MilliL"},
 'µLU/mL': {"property": "amount per volume", "quantity_kind": "SerumOrPlasmaLevel", "default_unit": None},

 'Osm/L': {"property": "osmotic concentration", "quantity_kind": "AmountOfSubstanceConcentration", "default_unit": "MOL-PER-L"},

 'g/L': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "MilliGM-PER-MilliL"},
 'g/dL': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "GM-PER-DeciL"},
 'mg/L': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "MilliGM-PER-L"},
 'mg/dL': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "MilliGM-PER-DeciL"},
 'ng/L': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "NanoGM-PER-L"},
 'ng/dL': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "NanoGM-PER-DeciL"},
 'ng/mL': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "NanoGM-PER-MilliL"},
 'pg/mL': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "PicoGM-PER-MilliL"},
 'µg/L': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "MicroGM-PER-L"},
 'µg/mL': {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "MicroGM-PER-MilliL"},
 
 'nmol/L': {"property": "substance concentration", "quantity_kind": "AmountOfSubstanceConcentration", "default_unit": "NanoMOL-PER-L"},
 'µmol/L': {"property": "substance concentration", "quantity_kind": "AmountOfSubstanceConcentration", "default_unit": "MicroMOL-PER-L"},
 
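 # NOTE: 1 pmol/g equals 1 nmol/kg, so "FemtoMOL-PER-KiloGM" is off by a factor of 1e6; verify the intended default unit
 'pmol/g globin': {"property": "substance per mass", "quantity_kind": "AmountOfSubstancePerUnitMass", "default_unit": "FemtoMOL-PER-KiloGM"},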

 'ng/filter': {"property": "mass per filter unit", "quantity_kind": "Mass", "default_unit": "GM"},
}
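
Because later cells index `UNIT_CONVERSION` directly, an unknown `unit_key` surfaces as a bare `KeyError`. A minimal sketch of a lookup helper with a more descriptive error, assuming the three-field entry layout used above; `resolve_unit` and `example_table` are hypothetical names, and the two entries are copied from the table to keep the sketch self-contained.

```python
REQUIRED_FIELDS = {"property", "quantity_kind", "default_unit"}


def resolve_unit(unit_key, conversion_table):
    """Return the conversion entry for unit_key or raise a descriptive KeyError."""
    try:
        entry = conversion_table[unit_key]
    except KeyError:
        raise KeyError(f"unit_key {unit_key!r} has no entry in the conversion table") from None
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        raise KeyError(f"entry for {unit_key!r} lacks fields: {sorted(missing)}")
    return entry


# Two entries copied from UNIT_CONVERSION above.
example_table = {
    "µg/L": {"property": "mass concentration", "quantity_kind": "MassConcentration", "default_unit": "MicroGM-PER-L"},
    "%": {"property": "percentage", "quantity_kind": "DimensionlessRatio", "default_unit": "PERCENT"},
}

entry = resolve_unit("µg/L", example_table)
print(entry["property"], entry["default_unit"])
```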

In [None]:
# forYaml_sampleobsbase.csv
df_base = pd.read_csv("../source_tables/indicator_input/forYaml_sampleobsbase.csv", sep=';', encoding='utf-8')
dl_base = df_base.replace({np.nan:None}).to_dict(orient="records")
dd_base = {e["sampleobsbase_key"]:e for e in dl_base}
base_translation_dict = {e["sampleobsbase_id"]: e["sampleobsbase_key"] for e in dl_base}
print("[base keys]: ", " - ".join(list(dd_base.values())[0].keys()))

biochementity_dict = {
    e["sampleobsbase_key"]: BioChemEntity(
        id = e["sampleobsbase_key"],
        unique_name = e["sampleobsbase_key"],
        name = e["name_en"],
        label = e["label_en"],
        molweight_grampermol = round(decimal.Decimal(e["molweight_grampermol"]), 2) if e["molweight_grampermol"] else None,
        translations = [
            Translation(property_name="name", language="nl-be", translated_value=e["name_nl"]),
            Translation(property_name="label", language="nl-be", translated_value=e["label_nl"]),
        ]
    )
    for e in dl_base
}

In [None]:
# forYaml_sampleobsbase_related.csv
df_base_relations = pd.read_csv("../source_tables/indicator_input/forYaml_sampleobsbase_related.csv", sep=';', encoding='utf-8')
dl_base_relations = df_base_relations.replace({np.nan:None}).to_dict(orient="records")
dd_base_relations = {d:[br for br in dl_base_relations if br["sampleobsbase_id"]==d] for d in set([r["sampleobsbase_id"] for r in dl_base_relations])}

def get_linktype(db_name):
    translation_dict = {
        "branched version of": BioChemEntityLinkType.branched_version_of,
        "parentcompound": BioChemEntityLinkType.has_parent_compound,
        "parent compound": BioChemEntityLinkType.has_parent_compound,
        "exact_match": BioChemEntityLinkType.exact_match,
        "close_match": BioChemEntityLinkType.close_match,
    }
    return translation_dict[db_name]

for br in set([r["sampleobsbase_id"] for r in dl_base_relations]):
    rl = [r for r in dl_base_relations if r["sampleobsbase_id"]==br]
    biochementity_dict[base_translation_dict[br]].biochementity_links = [
        BioChemEntityLink(biochementity_linktype=get_linktype(r["relation"]), biochementity=biochementity_dict[base_translation_dict[r["sampleobsbase_relatedto_id"]]].id) for r in rl
    ]
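
The relation lists above are rebuilt by rescanning `dl_base_relations` once per id. A single-pass grouping gives the same result with less work; this is a sketch on made-up rows, and `group_records` and `example_relations` are hypothetical names.

```python
from collections import defaultdict


def group_records(records, key):
    """Group a list of dict records by the value found under key, in one pass."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


# Hypothetical rows, shaped like the records in the relations table.
example_relations = [
    {"sampleobsbase_id": 1, "sampleobsbase_relatedto_id": 2, "relation": "close_match"},
    {"sampleobsbase_id": 1, "sampleobsbase_relatedto_id": 3, "relation": "exact_match"},
    {"sampleobsbase_id": 4, "sampleobsbase_relatedto_id": 1, "relation": "parentcompound"},
]

relations_by_base = group_records(example_relations, "sampleobsbase_id")
print({base_id: len(rows) for base_id, rows in relations_by_base.items()})
```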

In [None]:
# forYaml_sampleobsbase_validation.csv
df_base_validation = pd.read_csv("../source_tables/indicator_input/forYaml_sampleobsbase_validation.csv", sep=';', encoding='utf-8')
dl_base_validation = df_base_validation.replace({np.nan:None}).to_dict(orient="records")

In [None]:
# check dl_base_validation duplications (should return empty list)
unique_keys = list(set([v["sampleobsbase_key"] for v in dl_base_validation]))
base_validation_dict = {k:[bv for bv in dl_base_validation if bv["sampleobsbase_key"]==k] for k in unique_keys}
list(set([k for k in list(base_validation_dict.keys()) if len(base_validation_dict[k]) > 1]))
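
The same duplication check can be written with `collections.Counter`, which avoids building the intermediate dict of lists. A minimal sketch on made-up rows; `duplicated_keys` and `example_validation` are hypothetical names.

```python
from collections import Counter


def duplicated_keys(records, key):
    """Return the sorted values of key that occur in more than one record."""
    counts = Counter(record[key] for record in records)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical validation rows; only the key column matters for this check.
example_validation = [
    {"sampleobsbase_key": "a"},
    {"sampleobsbase_key": "b"},
    {"sampleobsbase_key": "a"},
]

duplicates = duplicated_keys(example_validation, "sampleobsbase_key")
print(duplicates)
```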

In [None]:
# Check validation records without corresponding sampleobsbase records (should return empty list)
[bv["sampleobsbase_key"] for bv in dl_base_validation if bv["sampleobsbase_key"] not in biochementity_dict.keys()]

In [None]:
def get_validation_status(db_name):
    translation_dict = {
        "Validated": ValidationStatus.validated,
        "Unvalidated": ValidationStatus.unvalidated,
        "InProgress_Expert": ValidationStatus.in_progress,
        "InProgress_VITOInternal": ValidationStatus.in_progress,
    }
    return translation_dict[db_name]

def make_validation_record(bv):
    # build a fresh record per call so the entity and its identifier do not share one object
    return ValidationHistoryRecord(
        validation_status=get_validation_status(bv["validationStatus"]), validation_remark=bv["validationReference"],
        validation_actor=bv["validationEmail"], validation_institute=bv["validationInstitute"])

# validationID value > (identifier schema, column holding the identifier code)
IDENTIFIER_SCHEMAS = {"inchikey_id": ("INCHIKEY", "inchikey_id"), "chebi_id": ("CHEBI", "chebi_id")}

for bv in [r for r in dl_base_validation if r["sampleobsbase_key"] in biochementity_dict.keys()]:
    entity = biochementity_dict[bv["sampleobsbase_key"]]
    entity.current_validation_status = get_validation_status(bv["validationStatus"])
    entity.validation_history = [make_validation_record(bv)]
    # only validated records contribute an identifier
    if bv["validationStatus"] == "Validated" and bv["validationID"] in IDENTIFIER_SCHEMAS:
        schema, code_column = IDENTIFIER_SCHEMAS[bv["validationID"]]
        entity.biochemidentifiers = [
            BioChemIdentifier(identifier_schema=schema, identifier_code=bv[code_column], validation_history=[make_validation_record(bv)])
        ]


In [None]:
# get unique set of cores and full list of props from forYaml_sampleobscore_sampleobsprop.csv
df_core_prop = pd.read_csv("../source_tables/indicator_input/forYaml_sampleobscore_sampleobsprop.csv", sep=';', encoding='utf-8')
dl_core_prop = df_core_prop.replace({np.nan:None}).to_dict(orient="records")

core_fields = ['sampleobscore_id', 'sampleobscore_key', 'varname', 'sort', 'vartype_key', 'datatype_key', 'relatedvarnames',
  'vartypedetail_key', 'label_en', 'name_en', 'label_nl', 'name_nl']
# keep the first row encountered for each sampleobscore_id (single pass instead of one rescan per id)
first_row_per_core = {}
for cp in dl_core_prop:
    first_row_per_core.setdefault(cp['sampleobscore_id'], cp)
dl_core = [{k:v for k,v in cp.items() if k in core_fields} for cp in first_row_per_core.values()]
dd_core = {e["varname"]:e for e in dl_core}
core_translation_dict = {e["sampleobscore_id"]: e["varname"] for e in dl_core}
print("[core keys]: ", ", ".join(list(dd_core.values())[0].keys()))
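# str() keeps the join from failing on None (empty cells were replaced by None above)
print("datatype_key: ", ", ".join(sorted(set([str(c["datatype_key"]) for c in dl_core]))))
print("vartype_key: ", ", ".join(sorted(set([str(c["vartype_key"]) for c in dl_core]))))
print("vartypedetail_key: ", ", ".join(sorted(set([str(c["vartypedetail_key"]) for c in dl_core]))))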

dl_prop = [{k:v for k,v in p.items() if k in ["sampleobsprop_id", "sampleobscore_key", "varname", "samplegroup_key", "extensions", "unit_key", "significantdecimals", "zeroallowed", "formula"]} for p in dl_core_prop]
dd_prop = {"|".join([e["varname"], e["samplegroup_key"], e["unit_key"]]):e for e in dl_prop}
prop_translation_dict = {e["sampleobsprop_id"]: "|".join([e["varname"], e["samplegroup_key"], e["unit_key"]]) for e in dl_prop}
print("[prop keys]: ", ", ".join(list(dd_prop.values())[0].keys()))
print("samplegroup_key: ", ", ".join(sorted(set([c["samplegroup_key"] for c in dl_prop]))))
print("unit_key: ", ", ".join(sorted(set([c["unit_key"] for c in dl_prop]))))

In [None]:
def get_indicator_identifier(property, varname, matrix):
    return f"{property} of {varname} in {matrix}"

indicator_dict = {
    get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]): Indicator(
        id = get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]),
        unique_name = get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]),
        name = get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]),
        label = get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]),
        indicator_type = IndicatorType.exposuremarker,
        varname = e["varname"],
        property = UNIT_CONVERSION[e["unit_key"]]["property"],
        quantity_kind = UNIT_CONVERSION[e["unit_key"]]["quantity_kind"],
        matrix = e["samplegroup_key"],
        relevant_observable_entity_types = [ObservableEntityType.person, ObservableEntityType.sample]
    )
    for e in dl_prop
    if e["unit_key"] not in ["ng/filter", "pg BEQ/g", "pmol/g globin"]
}
print(len(dl_prop))
print(len(set([get_indicator_identifier(UNIT_CONVERSION[p["unit_key"]]["property"], p["varname"], p["samplegroup_key"]) for p in dl_prop])))
print(len([p for p in dl_prop if p["unit_key"] not in ["ng/filter", "pg BEQ/g", "pmol/g globin"]]))
print(len(indicator_dict))

In [None]:
# get core <> base relations from forYaml_sampleobsbase_sampleobscore.csv
df_base_core_relations = pd.read_csv("../source_tables/indicator_input/forYaml_sampleobsbase_sampleobscore.csv", sep=';', encoding='utf-8')
dl_base_core_relations = df_base_core_relations.replace({np.nan:None}).to_dict(orient="records")
dd_base_core_relations = {base_translation_dict[d]:[br for br in dl_base_core_relations if br["sampleobsbase_id"]==d] for d in base_translation_dict.keys()}
dd_core_base_relations = {core_translation_dict[c]:[br for br in dl_base_core_relations if br["sampleobscore_id"]==c] for c in core_translation_dict.keys()}
print(list(set([bcr["linktype"] for bcr in dl_base_core_relations])))

In [None]:
# create BioChemEntity objects for base groups defined as core
dl_base_core_group_relations = [bcr for bcr in dl_base_core_relations if bcr["linktype"] == "group_contains"]
core_group_relation_ids = list(set([bcr["sampleobscore_id"] for bcr in dl_base_core_group_relations]))
core_group_relation_keys = [dd_core[core_translation_dict[sampleobscore_id]]["varname"] for sampleobscore_id in core_group_relation_ids]

print(f"{len(core_group_relation_ids)} core groups being added to biochementity_dict")

for sampleobscore_id in core_group_relation_ids:
    core = dd_core[core_translation_dict[sampleobscore_id]]
    linked_base_ids = [bcr["sampleobsbase_id"] for bcr in dl_base_core_relations if bcr["sampleobscore_id"] == sampleobscore_id and bcr["linktype"] == "group_contains"]
    if core["varname"] in biochementity_dict.keys():
        print(f"{core['varname']} already exists in biochementity_dict")
    else:
        biochementity_dict[core["varname"]] = BioChemEntity(
            id = core["varname"],
            unique_name = core["varname"],
            name = core["name_en"],
            label = core["label_en"],
            translations = [
                Translation(property_name="name", language="nl-be", translated_value=core["name_nl"]),
                Translation(property_name="label", language="nl-be", translated_value=core["label_nl"]),
            ],
            biochementity_links=[
                BioChemEntityLink(biochementity_linktype=BioChemEntityLinkType.group_contains, biochementity=biochementity_dict[base_translation_dict[base_id]].id) for base_id in linked_base_ids
            ]
        )

In [None]:
# get effect marker, their grouping and BioChemEntity relation from effectmarkersandprop.csv
df_effectmarkers = pd.read_csv("../source_tables/indicator_input/effectmarkersandprop.csv", sep=';', encoding='utf-8')
dl_effectmarkers = df_effectmarkers.replace({np.nan:None}).to_dict(orient="records")
dd_effectmarkers = {k:[em for em in dl_effectmarkers if "|".join([em["varname"], em["samplegroup_key"]])==k] for k in set(["|".join([emk["varname"], emk["samplegroup_key"]]) for emk in dl_effectmarkers])}
print(len(set(["|".join([em["varname"], em["samplegroup_key"]]) for em in dl_effectmarkers])))
print(len(dl_effectmarkers))

In [None]:
biochementity_keys = ['varname', 'vartype_key', 'name_en', 'linktype', 'chebi_id', 'inchikey_id']
unique_effectmarker_biochementity_list = list(set([tuple(pv for pn,pv in em.items() if pn in biochementity_keys) for em in dl_effectmarkers]))
print(len(unique_effectmarker_biochementity_list))
em_biochementity_inclusion = [em for em in unique_effectmarker_biochementity_list if em[3]]
print(len(em_biochementity_inclusion), set([em[1] for em in em_biochementity_inclusion]), set([em[3] for em in em_biochementity_inclusion]))
em_biochementity_exclusion = [em for em in unique_effectmarker_biochementity_list if not em[3]]
print(len(em_biochementity_exclusion), set([em[1] for em in em_biochementity_exclusion]), set([em[3] for em in em_biochementity_exclusion]))

indicator_keys = ['varname', 'vartype_key', 'name_en', 'samplegroup_key', 'linktype', 'chebi_id', 'inchikey_id']
unique_effectmarker_indicator_list = list(set([tuple(pv for pn,pv in em.items() if pn in indicator_keys) for em in dl_effectmarkers]))
print(len(unique_effectmarker_indicator_list))
print(len(dl_effectmarkers))

In [None]:
inchi_to_base = {v['inchikey_id']:v["sampleobsbase_key"] for v in dl_base_validation if v['validationID']=="inchikey_id"}
chebi_to_base = {v['chebi_id']:v["sampleobsbase_key"] for v in dl_base_validation if v['validationID']=="chebi_id"}

# indicator ids have the form "<property> of <varname> in <matrix>", while dd_effectmarkers is keyed on "varname|samplegroup_key";
# map each indicator id to its effect marker key so the lookup below can match
indicator_effectmarker_keys = {
    get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]): "|".join([e["varname"], e["samplegroup_key"]])
    for e in dl_prop
}

for indicator_key, indicator in indicator_dict.items():
    effectmarker_key = indicator_effectmarker_keys[indicator_key]
    if effectmarker_key in dd_effectmarkers:
        indicator.indicator_type = IndicatorType.effectmarker
        for em in dd_effectmarkers[effectmarker_key]:
            if em["linktype"] is None:
                # no link expected: report indicators that already carry links
                if len(indicator.biochementity_links):
                    print(indicator_key, indicator.biochementity_links)
            elif em["chebi_id"] and em["chebi_id"] in chebi_to_base:
                indicator.biochementity_links = [
                    BioChemEntityLink(biochementity_linktype=BioChemEntityLinkType.exact_match, biochementity=biochementity_dict[chebi_to_base[em["chebi_id"]]].id)
                ]
            elif em["inchikey_id"] and em["inchikey_id"] in inchi_to_base:
                indicator.biochementity_links = [
                    BioChemEntityLink(biochementity_linktype=BioChemEntityLinkType.exact_match, biochementity=biochementity_dict[inchi_to_base[em["inchikey_id"]]].id)
                ]
    elif indicator.varname in core_group_relation_keys:
        # core groups were added to biochementity_dict under their varname
        indicator.biochementity_links = [
            BioChemEntityLink(biochementity_linktype=BioChemEntityLinkType.exact_match, biochementity=biochementity_dict[indicator.varname].id)
        ]
    else:
        indicator.biochementity_links = [
            BioChemEntityLink(biochementity_linktype=get_linktype(cbr['linktype']), biochementity=biochementity_dict[base_translation_dict[cbr['sampleobsbase_id']]].id)
            for cbr in dd_core_base_relations[indicator.varname]
        ]

In [None]:
# forYaml_sampleobscore_category.csv > add grouping_id_list in Indicator dict
dl_core_cat = pd.read_csv("../source_tables/indicator_input/forYaml_sampleobscore_category.csv", sep=';', encoding='utf-8').replace({np.nan:None}).to_dict(orient="records")
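for core_cat in dl_core_cat:
    core_varname = core_translation_dict[core_cat["sampleobscore_id"]]
    # match on varname: unique_name has the form "<property> of <varname> in <matrix>", so a "varname|" prefix test never matches
    for indicator in [i for i in indicator_dict.values() if str(i.varname) == core_varname]:
        indicator.grouping_id_list = list(set(indicator.grouping_id_list + [core_cat['category_key']]))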


In [None]:
# forYaml_sampleobscore_sampleobsprop.csv
# - create ObservableProperty objects, one for each Indicator, adding: unit, etc, ...
observable_property_dict = {
    get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]): ObservableProperty(
        id = get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]),
        unique_name = get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]),
        name = " concentration in ".join([e["varname"], e["samplegroup_key"]]),
        label = " concentration in ".join([e["varname"], e["samplegroup_key"]]),
        categorical=False, multivalued=False,
        default_required=False, default_significantdecimals=e["significantdecimals"], default_zeroallowed=bool(e["zeroallowed"]),
        value_type = dd_core[e["varname"]]["datatype_key"],
        default_unit = UNIT_CONVERSION[e["unit_key"]]["default_unit"],
        default_observation_result_type = ObservationResultType.calculation if dd_core[e["varname"]]["vartype_key"]=="derived" else ObservationResultType.measurement,
        relevant_observation_types = [ObservationType.sampling],
        indicator=get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]),
        calculation_designs = [
            CalculationDesign(calculation_name="*", calculation_implementation_as_string=e["formula"])
        ] if e["formula"] else None
    )
    for e in dl_prop
    if e["unit_key"] not in ["ng/filter", "pg BEQ/g", "pmol/g globin"]
}

In [None]:
# forYaml_sampleobsprop_statistics.csv > add constraint to ObservableProperty value_metadata (as field & value)
dl_prop_stat = pd.read_csv("../source_tables/indicator_input/forYaml_sampleobsprop_statistics.csv", sep=';', encoding='utf-8').replace({np.nan:None}).to_dict(orient="records")

# preliminary solution for forYaml_sampleobsprop_statistics.csv not containing units
stats_id_dict = {
    stats_id: list(set([get_indicator_identifier(UNIT_CONVERSION[e["unit_key"]]["property"], e["varname"], e["samplegroup_key"]) for e in dl_prop if "|".join([e["varname"], e["samplegroup_key"]])==stats_id]))
    for stats_id in set(["|".join([p["varname"], p["samplegroup_key"]]) for p in dl_prop_stat if p["varname"] and p["samplegroup_key"]])
}

for prop_stat in dl_prop_stat:
    # compare statsvalue against None so that a legitimate value of 0 is kept
    if prop_stat["varname"] and prop_stat["samplegroup_key"] and prop_stat["statswhat"] and prop_stat["statsvalue"] is not None:
        for indicator_id in stats_id_dict["|".join([prop_stat["varname"], prop_stat["samplegroup_key"]])]:
            # props with an excluded unit ("ng/filter", "pg BEQ/g", "pmol/g globin") have no ObservableProperty yet
            if indicator_id not in observable_property_dict:
                continue
            observable_property_dict[indicator_id].value_metadata.append(ObservablePropertyMetadataElement(
                field=prop_stat["statswhat"],
                value=str(prop_stat["statsvalue"])
            ))
            observable_property_dict[indicator_id].value_metadata.append(ObservablePropertyMetadataElement(
                field=prop_stat["statswhat"]+"_provenance",
                value=json.dumps({
                    "contact": prop_stat["statsprovenance_who"],
                    "source": prop_stat["statsprovenance_source"],
                    "source_detail": prop_stat["statsprovenance_source_detail"],
                    "source_info": prop_stat["statsprovenance_source_info"],
                })
            ))
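
Since each statistic is stored as a plain field/value pair next to a `<field>_provenance` pair holding JSON, a consumer has to split and decode them again. A minimal sketch of that read-back, using plain dicts in place of `ObservablePropertyMetadataElement`; `split_value_metadata`, `example_metadata` and the field name `min` are hypothetical.

```python
import json

PROVENANCE_SUFFIX = "_provenance"


def split_value_metadata(elements):
    """Split field/value pairs into statistics and decoded provenance records."""
    stats, provenance = {}, {}
    for element in elements:
        name = element["field"]
        if name.endswith(PROVENANCE_SUFFIX):
            # strip the suffix so both dicts share the statistic name as key
            provenance[name[: -len(PROVENANCE_SUFFIX)]] = json.loads(element["value"])
        else:
            stats[name] = element["value"]
    return stats, provenance


# Hypothetical metadata, shaped like the pairs appended in the loop above.
example_metadata = [
    {"field": "min", "value": "0.1"},
    {"field": "min_provenance", "value": json.dumps({
        "contact": None, "source": "example source", "source_detail": None, "source_info": None,
    })},
]

stats, provenance = split_value_metadata(example_metadata)
print(stats, provenance)
```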


In [None]:
yaml_dumper.dump({"biochementities": [v for v in biochementity_dict.values() if v]}, "../extract/BioChemEntityList_data.yaml")
yaml_dumper.dump({"indicators": [v for v in indicator_dict.values() if v]}, "../extract/IndicatorList_data.yaml")
yaml_dumper.dump({"observable_properties": [v for v in observable_property_dict.values() if v]}, "../extract/ObservablePropertyList_data.yaml")
