# Tutorial - Preparing measurement table

This tutorial takes you through the entire workflow of the [Biology][biology] module.

In [None]:
import eds_scikit
import pandas as pd

## 1 - Load data <a id="load-data"></a>

!!! tip "Big volume"
    The measurement table can be large. Do not forget to set a proper Spark configuration.

In [None]:
to_add_conf = [
    ("master", "yarn"),
    ("deploy_mode", "client"),
    ("spark.driver.memory", ...),
    ("spark.executor.memory", ...),
    ("spark.executor.cores", ...),
    ("spark.executor.memoryOverhead", ...),
    ("spark.driver.maxResultSize", ...),
    # ...
]

spark, sc, sql = eds_scikit.improve_performances(to_add_conf=to_add_conf)

from eds_scikit.io.hive import HiveData

In [None]:
data = HiveData(
    spark_session=spark,
    database_name="cse_xxxxxxx_xxxxxxx",
    tables_to_load=[
        "care_site",
        "concept",
        "visit_occurrence",
        "measurement",
        "concept_relationship",
    ],
)

## 2 - Quick use : Preparing measurement table <a id="quick-use"></a>

### a) Define biology concept-sets <a id="define-biology-concept-set"></a>

To work on the measurements of interest, build a list of concepts-sets by:

- Selecting [default concepts-sets](../../datasets/concepts-sets.md) provided in the library.
- Modifying the codes of a selected default concepts-set.
- Creating a concepts-set from scratch.

__Code selection can be tricky. See the <a href="#concept-codes-explorer">Concept codes relationships exploration</a> section for more details on how to select codes.__

In [None]:
from eds_scikit.biology import ConceptsSet

# Creating Concept-Set
custom_leukocytes = ConceptsSet("Custom_Leukocytes")

custom_leukocytes.add_concept_codes(
    concept_codes=['A0174', 'H6740', 'C8824'], 
    terminology='GLIMS_ANABIO' 
)
custom_leukocytes.add_concept_codes(
    concept_codes=['6690-2'], 
    terminology='ITM_LOINC'
)

# Importing a default Concept-Set (see the "Concepts-Sets" part of section 4 for the existing concepts-sets)
glucose_blood = ConceptsSet("Glucose_Blood_Concentration")

In [None]:
concepts_sets = [
    custom_leukocytes, 
    glucose_blood
]
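
As a mental model, the calls above suggest that a concepts-set is a named collection of codes grouped by terminology. The sketch below is a hypothetical stand-in written for illustration only (the class `ToyConceptsSet` is not part of eds-scikit, and the real `ConceptsSet` may store its codes differently):

```python
from collections import defaultdict


class ToyConceptsSet:
    """Hypothetical stand-in for illustration, NOT eds_scikit's ConceptsSet."""

    def __init__(self, name):
        self.name = name
        # terminology name -> list of codes, in insertion order
        self.concept_codes = defaultdict(list)

    def add_concept_codes(self, concept_codes, terminology):
        # Append each code once, keeping the order in which codes were added
        for code in concept_codes:
            if code not in self.concept_codes[terminology]:
                self.concept_codes[terminology].append(code)


toy = ToyConceptsSet("Custom_Leukocytes")
toy.add_concept_codes(["A0174", "H6740", "C8824"], terminology="GLIMS_ANABIO")
toy.add_concept_codes(["6690-2"], terminology="ITM_LOINC")

print(dict(toy.concept_codes))
```

Keeping codes grouped by terminology matters because the same measurement can be identified by different codes in each referential.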

### b) Prepare measurements <a id="prepare-measurements"></a>

!!! tip "Lazy execution"
    Execution will be lazy, except if ```convert_units=True```.
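
For readers unfamiliar with lazy execution, the following plain-Python sketch illustrates the general idea with a generator: defining the pipeline does no work, and the computation only runs when the result is materialised. It is an analogy under that assumption, not a description of how eds-scikit or Spark schedule their computations.

```python
# Records the ids of the rows that were actually processed
executed = []


def lazy_filter(rows, predicate):
    """Generator: nothing runs until the result is iterated."""
    for row in rows:
        executed.append(row["id"])
        if predicate(row):
            yield row


# Toy rows, made up for the example
rows = [
    {"id": 1, "value": 4.2},
    {"id": 2, "value": None},
    {"id": 3, "value": 5.1},
]

pipeline = lazy_filter(rows, lambda row: row["value"] is not None)
print(executed)  # still empty: defining the pipeline did no work

result = list(pipeline)  # materialising the result triggers the computation
print(executed)
print(len(result))
```

In practice this means a lazy call can return quickly while the actual cost is paid later, when the table is first displayed, counted or converted.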

In [None]:
from eds_scikit.biology import prepare_measurement_table

In [None]:
measurement = prepare_measurement_table(data,
                                        start_date="2022-01-01", end_date="2022-05-01",
                                        concept_sets=concepts_sets,
                                        convert_units=False,
                                        get_all_terminologies=True
                                       )

__You now have a measurement table mapped to the concept-set terminologies.__ The next sections cover measurement code analysis, unit correction and plots.

## 3 - Detailed use : Analysing measurement table <a id="detailed-use"></a>

### a) Measurements statistics table <a id="stat-table"></a>

In [None]:
from eds_scikit.biology import measurement_values_summary

In [None]:
stats_summary = measurement_values_summary(measurement, 
                                           category_cols=["concept_set", "GLIMS_ANABIO_concept_code", "GLIMS_LOINC_concept_code"], 
                                           value_column="value_as_number", 
                                           unit_column="unit_source_value")

stats_summary
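
The exact columns returned by `measurement_values_summary` are not shown here. As a rough idea of what a per-group summary looks like, the sketch below computes a few statistics per (concept set, unit) pair on made-up rows using only the standard library. The rows, column names and chosen statistics are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median

# Toy rows standing in for the measurement table (values are made up)
rows = [
    {"concept_set": "Glucose_Blood_Concentration", "unit": "mmol/l", "value": 5.0},
    {"concept_set": "Glucose_Blood_Concentration", "unit": "mmol/l", "value": 6.0},
    {"concept_set": "Glucose_Blood_Concentration", "unit": "g/l", "value": 0.9},
    {"concept_set": "Glucose_Blood_Concentration", "unit": "g/l", "value": 1.1},
]

# Group the values by (concept set, unit)
groups = defaultdict(list)
for row in rows:
    groups[(row["concept_set"], row["unit"])].append(row["value"])

# Compute a few summary statistics per group
summary = {
    key: {
        "count": len(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }
    for key, values in groups.items()
}

for key, stats in sorted(summary.items()):
    print(key, stats)
```

Grouping by unit is what reveals heterogeneous units within one concept set: here the two groups differ by a large factor even though they measure the same quantity, which motivates the unit correction step.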

### b) Measurements units correction <a id="units-correction"></a>

In [None]:
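# Declare the mol <-> g conversion factor for glucose (molar mass of about 180 g/mol)
glucose_blood.add_conversion("mol", "g", 180)
# Unit in which the normalized values will be expressed
glucose_blood.add_target_unit("mmol/l")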

concepts_sets = [glucose_blood, custom_leukocytes]
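
The factor 180 corresponds to the approximate molar mass of glucose (about 180 g/mol). The arithmetic behind a g/l to mmol/l conversion can be checked by hand with the sketch below. The helper names are hypothetical, and reading `add_conversion("mol", "g", 180)` as "1 mol corresponds to 180 g" is an assumption about the library's convention.

```python
# Approximate molar mass of glucose, matching the factor used in the tutorial
GLUCOSE_MOLAR_MASS_G_PER_MOL = 180.0


def g_per_l_to_mmol_per_l(value_g_per_l, molar_mass=GLUCOSE_MOLAR_MASS_G_PER_MOL):
    """Convert a mass concentration (g/l) to a molar concentration (mmol/l)."""
    # g/l divided by g/mol gives mol/l, then times 1000 gives mmol/l
    return value_g_per_l / molar_mass * 1000.0


def mmol_per_l_to_g_per_l(value_mmol_per_l, molar_mass=GLUCOSE_MOLAR_MASS_G_PER_MOL):
    """Inverse conversion: mmol/l back to g/l."""
    return value_mmol_per_l * molar_mass / 1000.0


print(round(g_per_l_to_mmol_per_l(0.9), 2))  # 0.9 g/l is about 5.0 mmol/l
print(round(mmol_per_l_to_g_per_l(5.0), 2))  # 5.0 mmol/l is about 0.9 g/l
```

A quick check of this kind on a few known values helps confirm that normalized values land in a plausible range for the target unit.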

In [None]:
measurement = prepare_measurement_table(data, 
                                        start_date="2022-01-01", end_date="2022-05-01",
                                        concept_sets=concepts_sets,
                                        convert_units=True, 
                                        get_all_terminologies=False
                                       )

In [None]:
stats_summary = measurement_values_summary(measurement, 
                                           category_cols=["concept_set", "GLIMS_ANABIO_concept_code"], 
                                           value_column="value_as_number_normalized", #converted
                                           unit_column="unit_source_value_normalized")

stats_summary

### c) Plot biology summary <a id="plot-summary"></a>

Applying ```plot_biology_summary``` to the computed measurement dataframe, merged with care sites, generates interactive exploration plots such as:

- [Interactive volumetry](../../_static/biology/viz/interactive_volumetry.html)

- [Interactive distribution](../../_static/biology/viz/interactive_distribution.html)

In [None]:
from eds_scikit.biology import plot_biology_summary

In [None]:
measurement = measurement.merge(data.visit_occurrence[["care_site_id", "visit_occurrence_id"]], on="visit_occurrence_id")
measurement = measurement.merge(data.care_site[["care_site_id", "care_site_short_name"]], on="care_site_id")

In [None]:
plot_biology_summary(measurement, value_column="value_as_number_normalized") 

## 4 - Further : Concept Codes, Concepts Sets and Units <a id="further"></a>

### 1 - Concept codes relationships exploration <a id="concept-codes-explorer"></a>

Relationships between concept codes can be tricky to understand and manipulate. The function ```prepare_biology_relationship_table``` builds a __mapping dataframe between the main AP-HP biology referentials__.

See ```io.settings.measurement_config["mapping"]``` and ```io.settings.measurement_config["source_terminologies"]``` configurations for mapping details.

In [None]:
from eds_scikit.biology import prepare_biology_relationship_table

biology_relationship_table = prepare_biology_relationship_table(data)
biology_relationship_table = biology_relationship_table.to_pandas()

The following cells show how codes from the different referentials relate to each other.

In [None]:
columns = [col for col in biology_relationship_table.columns if "concept_code" in col]

biology_relationship_table[biology_relationship_table.GLIMS_ANABIO_concept_code.isin(['A0174', 'H6740', 'C8824'])][columns].drop_duplicates()

|   ANALYSES_LABORATOIRE_concept_code | GLIMS_ANABIO_concept_code   | GLIMS_LOINC_concept_code   | ITM_ANABIO_concept_code   | ITM_LOINC_concept_code   |
|------------------------------------:|:----------------------------|:---------------------------|:--------------------------|:-------------------------|
|                                   0 | C8824                       | 33256-9                    | Unknown                   | Unknown                  |
|                                   1 | A0174                       | 6690-2                     | A0174                     | 6690-2                   |
|                                   1 | A0174                       | 26464-8                    | A0174                     | 6690-2                   |


In [None]:
biology_relationship_table[biology_relationship_table.GLIMS_LOINC_concept_code.isin(['33256-9', '6690-2', '26464-8'])][columns].drop_duplicates()

|   ANALYSES_LABORATOIRE_concept_code | GLIMS_ANABIO_concept_code   | GLIMS_LOINC_concept_code   | ITM_ANABIO_concept_code   | ITM_LOINC_concept_code   |
|------------------------------------:|:----------------------------|:---------------------------|:--------------------------|:-------------------------|
|                                   4 | E4358                       | 6690-2                     | Unknown                   | Unknown                  |
|                                   2 | C9097                       | 26464-8                    | Unknown                   | Unknown                  |
|                                   6 | K3232                       | 6690-2                     | Unknown                   | Unknown                  |
|                                   5 | E6953                       | 26464-8                    | Unknown                   | Unknown                  |
|                                   1 | C8824                       | 33256-9                    | Unknown                   | Unknown                  |
|                                   4 | E4358                       | 26464-8                    | Unknown                   | Unknown                  |
|                                   5 | E6953                       | 6690-2                     | Unknown                   | Unknown                  |
|                                   7 | K6094                       | 6690-2                     | Unknown                   | Unknown                  |
|                                   0 | C9784                       | 6690-2                     | C9784                     | 6690-2                   |
|                                   0 | C9784                       | 26464-8                    | C9784                     | 6690-2                   |
|                                   3 | A0174                       | 6690-2                     | A0174                     | 6690-2                   |
|                                   3 | A0174                       | 26464-8                    | A0174                     | 6690-2                   |
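
The two tables above show why code selection is tricky: starting from a few ANABIO codes, going through their LOINC codes and back reaches additional ANABIO codes. The sketch below reproduces that round trip in plain Python, using only the (GLIMS_ANABIO, GLIMS_LOINC) pairs listed in the second table. It illustrates the reasoning and is not a library function.

```python
from collections import defaultdict

# (GLIMS_ANABIO code, GLIMS_LOINC code) pairs copied from the second table above
pairs = [
    ("E4358", "6690-2"), ("C9097", "26464-8"), ("K3232", "6690-2"),
    ("E6953", "26464-8"), ("C8824", "33256-9"), ("E4358", "26464-8"),
    ("E6953", "6690-2"), ("K6094", "6690-2"), ("C9784", "6690-2"),
    ("C9784", "26464-8"), ("A0174", "6690-2"), ("A0174", "26464-8"),
]

anabio_to_loinc = defaultdict(set)
loinc_to_anabio = defaultdict(set)
for anabio, loinc in pairs:
    anabio_to_loinc[anabio].add(loinc)
    loinc_to_anabio[loinc].add(anabio)

initial_anabio = {"A0174", "H6740", "C8824"}

# Step 1: LOINC codes reachable from the initial ANABIO codes
reached_loinc = set().union(
    *(anabio_to_loinc.get(code, set()) for code in initial_anabio)
)

# Step 2: every ANABIO code sharing one of those LOINC codes
expanded_anabio = set().union(*(loinc_to_anabio[code] for code in reached_loinc))

print(sorted(reached_loinc))
print(sorted(expanded_anabio - initial_anabio))  # codes missed by the initial selection
```

Whether the additional codes belong in the concepts-set is a clinical decision. The point is only that selecting by one terminology can silently leave out measurements coded differently.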


### 2 - Concepts-Sets <a id="concepts-sets"></a>

To list all available concepts-sets, see `datasets.default_concepts_sets`. More details about their definition and how they are built can be found in this [section](#concepts-sets).


In [None]:
from eds_scikit import datasets
from eds_scikit.biology import ConceptsSet

In [None]:
print(ConceptsSet("Glucose_Blood_Concentration").concept_codes)

In [None]:
datasets.default_concepts_sets

### 3 - Units <a id="units"></a>

The `Units` class makes conversions between units easier. It relies on the configuration files `datasets.units` and `datasets.elements`.

In [None]:
from eds_scikit import datasets

In [None]:
from eds_scikit.biology import Units

In [None]:
units = Units()

print("L to ml : ", units.convert_unit("L", "ml"))
print("m/s to m/h : ", units.convert_unit("m/s", "m/h"))
print("g to mol : ", units.convert_unit("g", "mol"))
# Register a mol <-> g factor (180 g/mol, approximately glucose), then retry the same conversion
units.add_conversion("mol", "g", 180)
print("g to mol : ", units.convert_unit("g", "mol"))
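
To make the first two conversions concrete, here is a minimal sketch of how a factor between simple or `a/b` units can be derived from per-unit sizes. The table `UNIT_TABLE` and the function `conversion_factor` are hypothetical and written for this example. They do not reflect how `Units` is implemented or what `datasets.units` contains.

```python
# Hypothetical table: unit -> (dimension, size relative to the base unit)
UNIT_TABLE = {
    "l": ("volume", 1.0),
    "ml": ("volume", 1e-3),
    "m": ("length", 1.0),
    "s": ("time", 1.0),
    "h": ("time", 3600.0),
}


def conversion_factor(source, target):
    """Return f such that value_in_target = f * value_in_source.

    Only handles simple units ("l") and ratios ("m/s").
    """
    source_parts = source.lower().split("/")
    target_parts = target.lower().split("/")
    if len(source_parts) != len(target_parts):
        raise ValueError(f"Cannot convert {source} to {target}")

    factor = 1.0
    for position, (src, tgt) in enumerate(zip(source_parts, target_parts)):
        src_dim, src_size = UNIT_TABLE[src]
        tgt_dim, tgt_size = UNIT_TABLE[tgt]
        if src_dim != tgt_dim:
            raise ValueError(f"Incompatible dimensions: {src} and {tgt}")
        if position == 0:
            # Numerator: a larger source unit means more target units
            factor *= src_size / tgt_size
        else:
            # Denominator: the ratio is inverted
            factor *= tgt_size / src_size
    return factor


print("L to ml : ", conversion_factor("L", "ml"))
print("m/s to m/h : ", conversion_factor("m/s", "m/h"))
```

A g to mol conversion cannot be derived this way because it depends on the substance, which is why a factor such as 180 has to be registered explicitly.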