# ERDRI CDS
The "Set of common data elements for Rare Diseases Registration" is the first practical instrument released by the EU RD Platform, aimed at increasing the interoperability of RD registries.

It contains 16 data elements to be registered by each rare disease registry across Europe, which are considered essential for further research. They cover the patient's personal data, diagnosis, disease history and care pathway, information for research purposes, and disability.

The "Set of common data elements for Rare Diseases Registration" was produced by a Working Group coordinated by the JRC and composed of experts from EU projects which worked on common data sets: EUCERD Joint Action, EPIRARE and RD-Connect.

[Source](https://eu-rd-platform.jrc.ec.europa.eu/set-of-common-data-elements_en)

## 1. Defining the ERDRI CDS Data Model
To create a data model definition using this package, it helps to first define the data model in a tabular format such as CSV or Excel. We have transcribed the first six sections of the ERDRI CDS into an Excel file. Take a look:

In [5]:
import pandas as pd

from pathlib import Path

import rarelink_phenopacket_mapper as rlpm

In [6]:
erdri_cds_excel_path = Path('../res/test_data/erdri/erdri_cds.xlsx')
erdri_cds_tabular = pd.read_excel(erdri_cds_excel_path)

erdri_cds_tabular.head(15)

| Unnamed: 0 | section | name | description | data_type | required | specification |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1. Pseudonym | 1.1. Pseudonym | Patient's pseudonym | string | True | |
| 1 | 2. Personal information | 2.1. Date of Birth | Patient's date of birth | date | True | dd/mm/yy |
| 2 | 2. Personal information | 2.2. Sex | Patient's sex at birth | string | True | Female, Male, Undetermined, Foetus (Unknown) |
| 3 | 3. Patient Status | 3.1. Patient's status | Patient alive or dead | | True | Alive, Dead, Lost in follow-up, Opted-out |
| 4 | 3. Patient Status | 3.2. Date of death | Patient's date of death | date | True | dd/mm/yy |
| 5 | 4. Care Pathway | 4.1. First contact with specialised centre | Date of first contact with specialised centre | date | True | dd/mm/yy |
| 6 | 5. Disease history | 5.1. Age at onset | Age at which symptoms/signs first appeared | string, date | True | Antenatal, At birth, Date (dd/mm/yyyy), Undete... |
| 7 | 5. Disease history | 5.2. Age at diagnosis | Age at which diagnosis was made | | True | Antenatal, At birth, Date (dd/mm/yyyy), Undete... |
| 8 | 6. Diagnosis | 6.1. Diagnosis of the rare disease | Diagnosis retained by the specialised centre | orpha code, alpha code, icd-9, icd-9-cm, icd-10 | True | Orpha code (strongly recommended – see link) ... |
| 9 | 6. Diagnosis | 6.2. Genetic diagnosis | Genetic diagnosis retained by the specialised... | hgvs code, hgnc code, omim code | True | International classification of mutations (HG... |
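
In the table above, two rows (3.1. Patient's status and 5.2. Age at diagnosis) have an empty `data_type` cell. Gaps like these are easy to miss in a spreadsheet, so it can be worth checking for them before importing. The following is a minimal sketch in plain pandas, not part of `rarelink_phenopacket_mapper`. It uses a small stand-in DataFrame with three rows copied from the table instead of reading the Excel file, and the variable names are illustrative.

```python
import pandas as pd

# Stand-in for the tabular definition: three rows copied from the table above.
# Empty spreadsheet cells are read by pandas as missing values (NaN/None).
definition = pd.DataFrame(
    {
        "name": ["2.2. Sex", "3.1. Patient's status", "5.2. Age at diagnosis"],
        "data_type": ["string", None, None],
        "required": [True, True, True],
    }
)

# Select the rows whose data_type cell is empty
missing_type = definition[definition["data_type"].isna()]

# Collect the names of the affected data elements
names_missing_type = missing_type["name"].tolist()
print(names_missing_type)
```

The same check can be applied to `erdri_cds_tabular` directly, since it has the same `name` and `data_type` columns.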


### Data Model Definition
Now we can import this tabular data model definition into the package and create a data model definition object.

We start by defining a dictionary that holds the names of the fields of the `DataField` class as keys and maps them onto columns of the file we want to import our data model from. Conveniently, we have named the columns the same as the fields, which is recommended but not necessary.
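
Because the mapping is an ordinary dictionary, it can be sanity-checked against the spreadsheet header before it is used. The sketch below is plain Python and not part of `rarelink_phenopacket_mapper`. The helper `missing_columns` is hypothetical, and it assumes that an empty string means "this field has no corresponding column", which is how the `ordinal` entry is used in this notebook.

```python
def missing_columns(column_names, available_columns):
    """Return the mapped column names that are absent from available_columns.

    Empty strings are skipped, on the assumption that they mark fields
    without a corresponding column in the file.
    """
    return [
        column
        for column in column_names.values()
        if column and column not in available_columns
    ]


# Header of the ERDRI CDS spreadsheet as shown in this notebook
available = ["section", "name", "description", "data_type", "required", "specification"]

# Illustrative mapping with a deliberate typo ('datatype' instead of 'data_type')
mapping = {"name": "name", "section": "section", "data_type": "datatype", "ordinal": ""}

problems = missing_columns(mapping, available)
print(problems)
```

With a real file, `available` could be taken from the `columns` attribute of the DataFrame returned by `pd.read_excel`.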

We pass the path to the tabular data model definition, its file type, and the `column_names` dictionary to the `rlpm.pipeline.read_data_model` function.

In [7]:
column_names = {
    'name': 'name',
    'section': 'section',
    'description': 'description',
    'data_type': 'data_type',
    'required': 'required',
    'specification': 'specification',
    'ordinal': ''
}

erdri_cds_data_model = rlpm.pipeline.read_data_model(erdri_cds_excel_path, file_type='excel', column_names=column_names)

print(erdri_cds_data_model)

                    section                                        name  \
0              1. Pseudonym                              1.1. Pseudonym   
1   2. Personal information                          2.1. Date of Birth   
2   2. Personal information                                    2.2. Sex   
3         3. Patient Status                       3.1. Patient's status   
4         3. Patient Status                          3.2. Date of death   
5           4. Care Pathway  4.1. First contact with specialised centre   
6        5. Disease history                           5.1. Age at onset   
7        5. Disease history                       5.2. Age at diagnosis   
8              6. Diagnosis          6.1. Diagnosis of the rare disease   
9              6. Diagnosis                      6.2. Genetic diagnosis   
10             6. Diagnosis                       6.3. Undiagnosed case   

                                          description  \
0                                Patient's