# ERDRI CDS
The "Set of common data elements for Rare Diseases Registration" is the first practical instrument released by the EU RD Platform aiming at increasing interoperability of RD registries.

It contains 16 data elements to be registered by each rare disease registry across Europe, which are considered essential for further research. They cover the patient's personal data, diagnosis, disease history and care pathway, information for research purposes, and disability.

The "Set of common data elements for Rare Diseases Registration" was produced by a Working Group coordinated by the JRC and composed of experts from EU projects which worked on common data sets: EUCERD Joint Action, EPIRARE and RD-Connect.

[Source](https://eu-rd-platform.jrc.ec.europa.eu/set-of-common-data-elements_en)

## 1. Defining the ERDRI CDS Data Model
Instead of loading in a data model from a file, we can also create a data model by defining it in code. This skips the parsing step and allows us to define the data model in a more flexible way.

![ERDRI CDS](../res/imgs/notebooks/erdri_cds1.png)
![ERDRI CDS](../res/imgs/notebooks/erdri_cds2.png)

## 1.1. Resources

Most data models in the modern digital health environment rely on at least one code system. Code systems include terminologies, classifications, and ontologies. Popular examples are LOINC, SNOMED CT, ICD-10, and the Orphanet Rare Disease Ontology (ORDO).

In this case, we will use the following code systems (in order as listed in the CODING column of the ERDRI CDS data model above):
- [ORDO](https://www.orpha.net/en/disease): Orphanet Rare Disease Ontology
- [Alpha-ID-SE](https://www.bfarm.de/EN/Code-systems/Terminologies/Alpha-ID-SE/_node.html): Simplified, uniform and standardised coding of rare diseases according to ICD-10-GM
- [ICD-9](https://iris.who.int/handle/10665/39473): International Classification of Diseases, Ninth Revision
- [ICD-9-CM](https://archive.cdc.gov/www_cdc_gov/nchs/icd/icd9cm.htm#:~:text=ICD%2D9%2DCM%20is%20the,10%20for%20mortality%20coding%20started.): International Classification of Diseases, Ninth Revision, Clinical Modification
- [ICD-10](https://www.who.int/classifications/icd/en/): International Classification of Diseases, Tenth Revision
- [HGVS](https://hgvs.org/): Human Genome Variation Society
- [HGNC](https://www.genenames.org/): HUGO Gene Nomenclature Committee
- [OMIM](https://www.omim.org/): Online Mendelian Inheritance in Man
- [HPO](https://hpo.jax.org/): Human Phenotype Ontology


There are already some popular code systems predefined in the `phenopacket-mapper` package. We can use them directly in our data model. Although optional, it is recommended to add the correct or most recent version to each code system. This will help to ensure that the code system is correctly identified and used in the data model.

The versions added below are the most recent at the time of writing this notebook. You can check for the most recent version of each code system on the respective website.

In [ ]:
from phenopacket_mapper.data_standards.code_system import ORDO, ICD9, HGVS, HGNC, OMIM, HPO 

In [ ]:
resources = [
    ORDO.set_version("1.0.19 (2024-08-02)"),
    ICD9,
    HGVS.set_version("21.0.4 (2024-08-15)"),
    HGNC.set_version("2024-08-23"),
    OMIM.set_version("2024-09-12"),
    HPO.set_version("2024-06-07")
]
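
Because versions are optional, it is easy to miss a resource that has none. The following is a minimal sketch of such a check. It uses a hypothetical stand-in class (`ResourceSketch`) that holds only a name and a version string; it is not the `CodeSystem` class from `phenopacket-mapper`, and the entries simply mirror the list above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResourceSketch:
    """Hypothetical stand-in: only a name and an optional version string."""
    name: str
    version: Optional[str] = None


# Mirrors the resources list above; ICD9 was added without a version
resources_sketch: List[ResourceSketch] = [
    ResourceSketch("ORDO", "1.0.19 (2024-08-02)"),
    ResourceSketch("ICD9"),
    ResourceSketch("HGVS", "21.0.4 (2024-08-15)"),
    ResourceSketch("HGNC", "2024-08-23"),
    ResourceSketch("OMIM", "2024-09-12"),
    ResourceSketch("HPO", "2024-06-07"),
]

# Collect the names of all resources that carry no version
unversioned = [r.name for r in resources_sketch if not r.version]
print(unversioned)  # ['ICD9']
```

ICD-9 is no longer updated, so leaving its version empty is a reasonable choice here.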

The keen-eyed among you might have spotted that we forgot to add the ICD-9-CM, ICD-10, and Alpha-ID-SE code systems. Since they are not predefined, we create a new `CodeSystem` object for each of them and then append these objects to the `resources` list:

In [ ]:
from phenopacket_mapper.data_standards.code_system import CodeSystem

In [ ]:
alpha = CodeSystem(name='Alpha-ID-SE', namespace_prefix='alpha', url='https://www.bfarm.de/EN/Code-systems/Terminologies/Alpha-ID-SE/_node.html')
icd9cm = CodeSystem(name='International Classification of Diseases 9 Clinical Modification (USA)', namespace_prefix='icd-9-cm', url='http://hl7.org/fhir/sid/icd-9-cm')
icd10 = CodeSystem(name='International Classification of Diseases 10 (WHO)', namespace_prefix='icd-10', url='http://hl7.org/fhir/sid/icd-10')

In [1]:
resources.append(alpha)
resources.append(icd9cm)
resources.append(icd10)


## 1.2. Fields of the data model and their value sets

As you can see above, the ERDRI CDS are made up of 8 sections with a total of 16 fields. Each field has a name, a description, and a value set. The value set is a list of possible values that the field can take, such as a list of codings, strings or numerical values to choose from. Another option is to restrict values in a field to a possible data type (e.g., string), code system (e.g., ORDO), a date, etc.

However, to limit the complexity of this example, we will omit the 7th and 8th sections, which are also difficult to model in phenopackets. We will focus on the first 6 sections, which contain 11 fields.

We will start here by defining the value sets of all the fields in the ERDRI CDS data model. We will use the code systems we defined above to define the value sets of the fields.
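
To make the idea of a value set concrete, here is a minimal sketch of how a list of elements can mix literal values and types, and how a candidate value could be checked against it. `SketchValueSet` and its `accepts` method are hypothetical stand-ins written for this illustration; they are not the `ValueSet` class from `phenopacket-mapper`, whose behaviour may differ.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SketchValueSet:
    """Hypothetical stand-in for illustration; not the phenopacket-mapper ValueSet."""
    elements: List[Any]
    name: str = ""

    def accepts(self, value: Any) -> bool:
        for element in self.elements:
            if isinstance(element, type):
                # A type entry (e.g. str) admits any value of that type
                if isinstance(value, element):
                    return True
            elif value == element:
                # A literal entry admits only that exact value
                return True
        return False


# A free-text field: any string is allowed
pseudonym = SketchValueSet(elements=[str], name="Pseudonym")
# A categorical field: only the listed strings are allowed
status = SketchValueSet(
    elements=["Alive", "Dead", "Lost in follow-up", "Opted-out"],
    name="Patient's status",
)

print(pseudonym.accepts("P-0042"))  # True
print(pseudonym.accepts(42))        # False
print(status.accepts("Alive"))      # True
print(status.accepts("alive"))      # False
```

The last line shows that literal entries are compared exactly in this sketch, so spelling and capitalisation matter.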

In [1]:
from phenopacket_mapper.data_standards.value_set import ValueSet

We will start by defining the value set for the first field of the ERDRI CDS data model, which is the pseudonym. The pseudonym is a string, so we will define a value set with the element type set to `str`. We will also add a name and description to the value set to make it easier to identify later on.

In [ ]:
# 1. Pseudonym
# 1.1. Pseudonym
vs_1_1 = ValueSet(
    elements=[str],
    name="Value set for 1.1. Pseudonym",
    description="Value set for field 1.1. Pseudonym of the ERDRI CDS data model in section 1. Pseudonym",
)

Next, we get to the section about personal information. Here the first field is the date of birth. The date of birth is a date, so we will define a value set with the element type set to the `Date` type provided by `phenopacket-mapper`.

In [ ]:
from phenopacket_mapper.data_standards import Date

In [ ]:
# 2. Personal information
# 2.1. Date of Birth
vs_2_1 = ValueSet(
    elements=[Date],
    name="Value set for 2.1. Date of Birth",
    description="Value set for field 2.1. Date of Birth of the ERDRI CDS data model in section 2. Personal information",
)

The next field, "Sex", is a categorical field with four possible values:
- Female,
- Male,
- Undetermined, and
- Foetus (Unknown). 

Note that it is generally recommended to encode concepts with an internationally recognized code system. In this case, however, the authors of the ERDRI CDS data model have not specified a code system for the field, which makes things easier for us.

The same can be said for field 3.1.

At the bottom of this notebook there is an exemplary implementation of this field using concepts from the SNOMED CT code system. The example also introduces you to `CodeableConcept` objects.

In [ ]:
# 2.2. Sex
vs_2_2 = ValueSet(
    elements=["Female", "Male", "Undetermined", "Foetus (Unknown)"],
    name="Value set for 2.2. Sex",
    description="Value set for field 2.2. Sex of the ERDRI CDS data model in section 2. Personal information",
)
# 3. Patient Status
# 3.1. Patient's status
vs_3_1 = ValueSet(
    elements=["Alive", "Dead", "Lost in follow-up", "Opted-out"],
    name="Value set for 3.1. Patient's status",
    description="Value set for field 3.1. Patient's status of the ERDRI CDS data model in section 3. Patient Status",
)

The field date of death relies on a date value, so we will define a value set with the element type set to the `Date` type provided by `phenopacket-mapper`.

In [ ]:
# 3.2. Date of death
vs_3_2 = ValueSet(
    elements=[Date],
    name="Value set for 3.2. Date of death",
    description="Value set for field 3.2. Date of death of the ERDRI CDS data model in section 3. Patient Status",
)

Note that we could also have created a single value set for dates and reused it for all the date fields. This would have reduced the amount of code we had to write. However, for the sake of clarity, we have defined separate value sets for each field so far. Let's try the generic approach for the next field.

In [ ]:
# 4. Care Pathway
# 4.1. First contact with specialised centre
vs_4_1 = ValueSet(
    elements=[Date],
    name="Date value set",
    description="Value set for date fields",
)

To implement the next field, age at onset, we can use the `extend` method of the `ValueSet` class to create a new value set based on the date value set but expanded by the values required by field 5.1:
- Antenatal
- At birth
- Undetermined

The `extend` method returns an expanded copy of the original value set with the new elements added. The original value set remains unchanged.

We can reuse this value set for the next field, age at diagnosis.

In [ ]:
# 5. Disease history
# 5.1. Age at onset
vs_5_1 = vs_4_1.extend(
    new_name="Onset value set",
    value_set=ValueSet(["Antenatal", "At birth", "Undetermined"])
)
# 5.2. Age at diagnosis
vs_5_2 = vs_5_1
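
The difference between the copy returned by `extend` and the plain assignment used for field 5.2 can be pictured with a small sketch. `SketchValueSet` is a hypothetical stand-in written for this illustration, with an `extend` method that only imitates the behaviour described above; it is not the implementation in `phenopacket-mapper`.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, List


@dataclass
class SketchValueSet:
    """Hypothetical stand-in for illustration; not the phenopacket-mapper ValueSet."""
    elements: List[Any] = field(default_factory=list)
    name: str = ""

    def extend(self, new_name: str, value_set: "SketchValueSet") -> "SketchValueSet":
        # Concatenation builds a new list, so the original elements stay untouched
        return SketchValueSet(elements=self.elements + value_set.elements, name=new_name)


date_vs = SketchValueSet(elements=[date], name="Date value set")
onset_vs = date_vs.extend(
    new_name="Onset value set",
    value_set=SketchValueSet(["Antenatal", "At birth", "Undetermined"]),
)

print(len(date_vs.elements))   # 1
print(len(onset_vs.elements))  # 4

# Plain assignment does not copy: both names refer to the same object
diagnosis_vs = onset_vs
print(diagnosis_vs is onset_vs)  # True
```

Sharing one object between two fields is fine as long as the value set is not modified afterwards; otherwise a change made for one field would silently apply to the other.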

The fields in the following section, "Diagnosis", are all defined using codings from code systems. We will define the value sets for these fields using the code systems from the list of resources we defined above.

In [ ]:
# 6. Diagnosis
# 6.1. Diagnosis of the rare disease
vs_6_1 = ValueSet(
    elements=[ORDO, ICD9, icd9cm, alpha, icd10],
    name="Value set for 6.1. Diagnosis of the rare disease",
    description="Value set for field 6.1. Diagnosis of the rare disease of the ERDRI CDS data model in section 6. Diagnosis",
)
# 6.2. Genetic diagnosis
vs_6_2 = ValueSet(
    elements=[HGVS, HGNC, OMIM],
    name="Value set for 6.2. Genetic diagnosis",
    description="Value set for field 6.2. Genetic diagnosis of the ERDRI CDS data model in section 6. Diagnosis",
)
# 6.3. Undiagnosed case
vs_6_3 = ValueSet(
    elements=[HPO, HGVS],
    name="Value set for 6.3. Undiagnosed case",
    description="Value set for field 6.3. Undiagnosed case of the ERDRI CDS data model in section 6. Diagnosis",
)
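
A value set made of code systems restricts a field to codings from those systems. One simple way to picture such a restriction is a prefix check on a CURIE-style string such as `PREFIX:ID`. The sketch below is only an illustration under stated assumptions: `prefix_allowed` is a hypothetical helper, the prefixes `ORPHA` and `icd-9` are assumed for this example, and only `alpha`, `icd-9-cm`, and `icd-10` are taken from the `CodeSystem` definitions above. It does not show how `phenopacket-mapper` validates codings.

```python
from typing import Iterable


def prefix_allowed(curie: str, allowed_prefixes: Iterable[str]) -> bool:
    """Return True if the CURIE has the form PREFIX:ID and PREFIX is allowed."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        # No colon or an empty identifier: not a usable CURIE
        return False
    # Compare prefixes case-insensitively
    return prefix.lower() in {p.lower() for p in allowed_prefixes}


# Assumed prefixes for the code systems of field 6.1 (partly hypothetical)
diagnosis_prefixes = ["ORPHA", "icd-9", "icd-9-cm", "alpha", "icd-10"]

print(prefix_allowed("ORPHA:558", diagnosis_prefixes))   # True
print(prefix_allowed("HP:0001250", diagnosis_prefixes))  # False
print(prefix_allowed("ORPHA", diagnosis_prefixes))       # False
```

A check like this only confirms that a coding comes from a permitted code system, not that the code itself exists in that system.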

## Define the `DataModel` object
After defining value sets for each field in the ERDRI CDS data model, we can now define the `DataModel` object. The `DataModel` object is the main object that represents the data model. 

In [ ]:
# 1. Pseudonym
# 1.1. Pseudonym

# 2. Personal information
# 2.1. Date of Birth
# 2.2. Sex

# 3. Patient Status
# 3.1. Patient's status
# 3.2. Date of death

# 4. Care Pathway
# 4.1. First contact with specialised centre

# 5. Disease history
# 5.1. Age at onset
# 5.2. Age at diagnosis

# 6. Diagnosis
# 6.1. Diagnosis of the rare disease
# 6.2. Genetic diagnosis
# 6.3. Undiagnosed case
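
The outline above lists the fields in order, each of which needs to be paired with its value set. As a rough picture of the information the assembled model has to hold, the following sketch records each field together with the name of the value set variable defined earlier. `FieldSketch` and `group_by_section` are hypothetical stand-ins written for this illustration; they are not the `DataModel` API of `phenopacket-mapper`, whose constructor arguments should be taken from the package documentation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical stand-in for one field; not a phenopacket-mapper class."""
    section: str
    ordinal: str
    name: str
    value_set: str  # name of the value set variable defined earlier


FIELDS: Tuple[FieldSketch, ...] = (
    FieldSketch("1. Pseudonym", "1.1", "Pseudonym", "vs_1_1"),
    FieldSketch("2. Personal information", "2.1", "Date of Birth", "vs_2_1"),
    FieldSketch("2. Personal information", "2.2", "Sex", "vs_2_2"),
    FieldSketch("3. Patient Status", "3.1", "Patient's status", "vs_3_1"),
    FieldSketch("3. Patient Status", "3.2", "Date of death", "vs_3_2"),
    FieldSketch("4. Care Pathway", "4.1", "First contact with specialised centre", "vs_4_1"),
    FieldSketch("5. Disease history", "5.1", "Age at onset", "vs_5_1"),
    FieldSketch("5. Disease history", "5.2", "Age at diagnosis", "vs_5_2"),
    FieldSketch("6. Diagnosis", "6.1", "Diagnosis of the rare disease", "vs_6_1"),
    FieldSketch("6. Diagnosis", "6.2", "Genetic diagnosis", "vs_6_2"),
    FieldSketch("6. Diagnosis", "6.3", "Undiagnosed case", "vs_6_3"),
)


def group_by_section(fields: Iterable[FieldSketch]) -> Dict[str, List[str]]:
    """Group field ordinals by section, keeping the original order."""
    grouped: Dict[str, List[str]] = {}
    for f in fields:
        grouped.setdefault(f.section, []).append(f.ordinal)
    return grouped


by_section = group_by_section(FIELDS)
print(len(FIELDS))      # 11
print(len(by_section))  # 6
```

The counts match the 6 sections and 11 fields covered in this notebook, which makes the sketch a quick way to confirm that no field was left out.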

#### Continuation of the discussion raised by field 2.2. above

For the sake of completeness, we could use SNOMED Clinical Terms (SNOMED CT) to encode this variable using a value set containing:
- [SNOMED:248152002](https://browser.ihtsdotools.org/?perspective=full&conceptId1=248152002&edition=MAIN/2024-09-01&release=&languages=en) Female
- [SNOMED:248153007](https://browser.ihtsdotools.org/?perspective=full&conceptId1=248153007&edition=MAIN/2024-09-01&release=&languages=en) Male
- [SNOMED:373068000, SNOMED:734000001](https://browser.ihtsdotools.org/?perspective=full&conceptId1=404684003&edition=MAIN/2024-09-01&release=&languages=en) Undetermined biological sex
- [SNOMED:303112003, SNOMED:373068000, SNOMED:734000001](https://browser.ihtsdotools.org/?perspective=full&conceptId1=303112003&edition=MAIN/2024-09-01&release=&languages=en) Fetal period, biological sex unknown

To include multiple codings as a single value, one can use a `CodeableConcept` (`phenopacket_mapper.data_standards.code.CodeableConcept`) object. This object can contain multiple codings, each from a different code system if desired.

E.g.:

In [3]:
from phenopacket_mapper.data_standards import Coding, CodeableConcept
from phenopacket_mapper.data_standards.code_system import SNOMED_CT

sct_coding_female = Coding(code="248152002", system=SNOMED_CT, display="Female (finding)")
sct_coding_male = Coding(code="248153007", system=SNOMED_CT, display="Male (finding)")
sct_coding_undetermined = Coding(code="373068000", system=SNOMED_CT, display="Undetermined (qualifier value)")
sct_coding_bio_sex = Coding(code="734000001", system=SNOMED_CT, display="Biological sex (property) (qualifier value)")
sct_coding_foetus = Coding(code="303112003", system=SNOMED_CT, display="Fetal period (qualifier value)")

cc_und_bio_sex = CodeableConcept(coding=[sct_coding_undetermined, sct_coding_bio_sex], text="Undetermined biological sex")
cc_foetus = CodeableConcept(coding=[sct_coding_foetus, sct_coding_undetermined, sct_coding_bio_sex], text="Fetal period, biological sex unknown")

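vs_bio_sex = ValueSet(
    elements=[sct_coding_female, sct_coding_male, cc_und_bio_sex, cc_foetus],
    name="Value set for 2.2. Sex (SNOMED CT)",
    description="SNOMED CT based value set for field 2.2. Sex of the ERDRI CDS data model",
)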