# DRUG_EXPOSURE

See [drug_exposure](https://ohdsi.github.io/CommonDataModel/cdm54.html#drug_exposure):

*"This table captures records about the exposure to a Drug ingested or otherwise introduced into the body. A Drug is a biochemical substance formulated in such a way that when administered to a Person it will exert a certain biochemical effect on the metabolism. Drugs include prescription and over-the-counter medicines, vaccines, and large-molecule biologic therapies. Radiological devices ingested or applied locally do not count as Drugs."*

```{mermaid}
erDiagram
    OMOP_DRUG_EXPOSURE {
        integer drug_exposure_id
        integer person_id
        integer drug_concept_id
        date drug_exposure_start_date
        datetime drug_exposure_start_datetime
        date drug_exposure_end_date
        datetime drug_exposure_end_datetime
        date verbatim_end_date
        integer drug_type_concept_id
        varchar(20) stop_reason
        integer refills
        float quantity
        integer days_supply
        varchar(MAX) sig
        integer route_concept_id
        varchar(50) lot_number
        integer provider_id
        integer visit_occurrence_id
        integer visit_detail_id
        varchar(50) drug_source_value
        integer drug_source_concept_id
        varchar(50) route_source_value
        varchar(50) dose_unit_source_value
    }
```

The transformation is carried out by the script [genomop_drug_exposure.py](../examples/genomop_drug_exposure.py).

This script performs the following steps:
1. Loads the parameters.
2. Loads the vocabulary tables.
   - In this case it also loads the CIMA codes, which are used to map the drug codes from the CIMA database.
3. Loads each input file and, for each one:
   1. Assigns a new `vocabulary_id` column if needed. See the `append_vocabulary` parameter.
   2. Renames the column containing the drug names or codes to `drug_source_value`. See the `column_map` parameter.
   3. Maps the source codes to a specific column in the CONCEPT table to retrieve the `source_concept_id`. See the `vocabulary_config` parameter.
4. Creates a single table from all files.
5. Maps the `source_concept_id` codes to `concept_id`.
6. Checks for codes that were not mapped.
   - The `unmapped_drug` parameter can be used to define additional mappings for drug codes.
7. Creates the primary key (`drug_exposure_id`).
8. Finds any entries that are contained within a visit in the VISIT_OCCURRENCE table and assigns the corresponding `visit_occurrence_id`.
9. Adapts the table to the schema of the DRUG_EXPOSURE table.
10. Saves the OMOP table to the defined output folder.
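
Steps 5 to 8 can be illustrated with a minimal plain-Python sketch. It is not the code of `genomop_drug_exposure.py`: the concept ids `1001` and `2001`, the source code `600000`, the helper names, and the choice to key the fallback on `drug_source_value` are all assumptions made for the example. It also assumes that "contained within a visit" means both exposure dates fall inside the visit dates of the same person, and that unmapped rows receive the OMOP convention `concept_id = 0`.

```python
from datetime import date

# Toy stand-in for the "Maps to" rows of CONCEPT_RELATIONSHIP:
# source concept_id -> standard concept_id (ids are made up for the example).
MAPS_TO = {1001: 2001}

# Same shape as the `unmapped_drug` parameter: source code -> standard concept_id.
UNMAPPED_DRUG = {"698083": 1501700}

# Toy VISIT_OCCURRENCE table.
VISITS = [
    {"visit_occurrence_id": 10, "person_id": 1,
     "visit_start_date": date(2020, 1, 1), "visit_end_date": date(2020, 1, 10)},
]


def map_to_standard(row):
    """Return the standard concept_id, falling back to UNMAPPED_DRUG, else 0."""
    concept_id = MAPS_TO.get(row["source_concept_id"])
    if concept_id is None:
        # Assumption: the fallback is keyed on the source code, not the concept id.
        concept_id = UNMAPPED_DRUG.get(row["drug_source_value"], 0)
    return concept_id


def find_visit(row):
    """Return the id of a visit of the same person that contains the exposure."""
    for visit in VISITS:
        if (visit["person_id"] == row["person_id"]
                and visit["visit_start_date"] <= row["drug_exposure_start_date"]
                and row["drug_exposure_end_date"] <= visit["visit_end_date"]):
            return visit["visit_occurrence_id"]
    return None


rows = [
    {"person_id": 1, "drug_source_value": "600000", "source_concept_id": 1001,
     "drug_exposure_start_date": date(2020, 1, 5),
     "drug_exposure_end_date": date(2020, 1, 7)},
    {"person_id": 1, "drug_source_value": "698083", "source_concept_id": 0,
     "drug_exposure_start_date": date(2020, 3, 1),
     "drug_exposure_end_date": date(2020, 3, 10)},
    {"person_id": 2, "drug_source_value": "999999", "source_concept_id": 0,
     "drug_exposure_start_date": date(2020, 1, 5),
     "drug_exposure_end_date": date(2020, 1, 7)},
]

for new_id, row in enumerate(rows, start=1):
    row["concept_id"] = map_to_standard(row)      # steps 5 and 6
    row["drug_exposure_id"] = new_id              # step 7
    row["visit_occurrence_id"] = find_visit(row)  # step 8

# Codes that are still unmapped and would need an `unmapped_drug` entry.
unmapped = [r["drug_source_value"] for r in rows if r["concept_id"] == 0]
print(unmapped)  # ['999999']
```

Any code reported as unmapped in step 6 is a candidate for a new `unmapped_drug` entry in the configuration file.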
 

The configuration file is [genomop_drug_exposure_params.yaml](../examples/genomop_drug_exposure_params.yaml). It must have the following structure:

```yaml
input_dir: rare/03_omop_initial/
output_dir: rare/04_omop_intermediate/DRUG_EXPOSURE/
input_files:
  - Hepatocarcinoma_drug_dispensation.parquet
  - Hepatocarcinoma_drug_medication.parquet
vocab_dir: raw/omop_vocab/
visit_dir: rare/04_omop_intermediate/VISIT_OCCURRENCE/
append_vocabulary:
  Hepatocarcinoma_drug_dispensation.parquet: CIMA
  Hepatocarcinoma_drug_medication.parquet: CIMA
column_map:
  Hepatocarcinoma_drug_dispensation.parquet:
    cima_code: drug_source_value
  Hepatocarcinoma_drug_medication.parquet:
    cima_code: drug_source_value
vocabulary_config:
  Hepatocarcinoma_drug_dispensation.parquet:
    CIMA: concept_code
  Hepatocarcinoma_drug_medication.parquet:
    CIMA: concept_code
unmapped_drug:
  "698083": 1501700
```
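
Because three of these parameters are keyed by file name, a typo in one key can silently leave a file unconfigured. The sketch below shows one way such a mistake could be caught before running the script. It is an illustration only: `find_unknown_files` is a hypothetical helper, not part of the package, and the parameters are written as a Python dict mirroring the YAML above rather than parsed from the file.

```python
# Parameters as they would look once the YAML above is loaded into a dict.
params = {
    "input_files": [
        "Hepatocarcinoma_drug_dispensation.parquet",
        "Hepatocarcinoma_drug_medication.parquet",
    ],
    "append_vocabulary": {
        "Hepatocarcinoma_drug_dispensation.parquet": "CIMA",
        "Hepatocarcinoma_drug_medication.parquet": "CIMA",
    },
    "column_map": {
        "Hepatocarcinoma_drug_dispensation.parquet": {"cima_code": "drug_source_value"},
        "Hepatocarcinoma_drug_medication.parquet": {"cima_code": "drug_source_value"},
    },
    "vocabulary_config": {
        "Hepatocarcinoma_drug_dispensation.parquet": {"CIMA": "concept_code"},
        "Hepatocarcinoma_drug_medication.parquet": {"CIMA": "concept_code"},
    },
    "unmapped_drug": {"698083": 1501700},
}

# Parameters whose keys are expected to be file names from `input_files`.
PER_FILE_KEYS = ("append_vocabulary", "column_map", "vocabulary_config")


def find_unknown_files(params):
    """Return (parameter, file name) pairs whose file is not in `input_files`."""
    known = set(params["input_files"])
    problems = []
    for key in PER_FILE_KEYS:
        # `append_vocabulary` is optional, so a missing parameter is not an error.
        for file_name in params.get(key, {}):
            if file_name not in known:
                problems.append((key, file_name))
    return problems


print(find_unknown_files(params))  # []
```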

The parameters are:
- `input_dir` is the path, relative to `data_dir`, of the directory that contains the input data.
- `output_dir` is the path, relative to `data_dir`, of the directory where the output will be saved.
- `input_files` is the list of files to be used, given as paths relative to `data_dir / input_dir`.
- `vocab_dir` is the path, relative to `data_dir`, of the directory that contains the vocabulary tables (CONCEPT, CONCEPT_RELATIONSHIP, etc.).
- `visit_dir` is the path, relative to `data_dir`, of the directory that contains the VISIT_OCCURRENCE table.
- `append_vocabulary` is a dict that defines, for each file in `input_files`, the name of the vocabulary to be added as a new uniform column.
  - That is, a new column, `vocabulary_id`, will be added to the table with the provided value in every row.
  - This entry is not mandatory.
- `column_map` is a dict that defines, for each file in `input_files`, the column that will be renamed to `drug_source_value` so that its codes can be looked up in the CONCEPT table.
  - It can also be used to rename any other column if needed.
- `vocabulary_config` is a dict that defines, for each file in `input_files`, how to map each vocabulary to the CONCEPT table.
  - It maps each vocabulary present in the file (key) to the column in the CONCEPT table, i.e. `concept_name` or `concept_code`, that should be used for the lookup (value).
  - Each source value has to be mapped to a concept in the CONCEPT table. Usually the source values are either descriptors (which map to `concept_name`) or codes (which map to `concept_code`).
- `unmapped_drug` is a dict that maps each **drug** source code (key) to the standard OMOP code (value). It can be used when the `source_value` or `source_concept_id` columns have no mapping in the CONCEPT_RELATIONSHIP table.
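
To make the interplay of `append_vocabulary`, `column_map` and `vocabulary_config` concrete, here is a small plain-Python sketch of the per-file preparation. It is a simplification under stated assumptions, not the script's implementation: `prepare_rows`, the toy CONCEPT rows, the file name `example.parquet` and the concept id `1001` are all hypothetical, and rows without a match are given `source_concept_id = 0`.

```python
# Toy CONCEPT table; the concept id and name are made up for the example.
CONCEPT = [
    {"concept_id": 1001, "vocabulary_id": "CIMA",
     "concept_code": "600000", "concept_name": "Hypothetical drug"},
]

params = {
    "append_vocabulary": {"example.parquet": "CIMA"},
    "column_map": {"example.parquet": {"cima_code": "drug_source_value"}},
    "vocabulary_config": {"example.parquet": {"CIMA": "concept_code"}},
}


def prepare_rows(rows, file_name, params, concept_rows):
    """Apply the three per-file parameters to a list of row dicts."""
    vocabulary = params.get("append_vocabulary", {}).get(file_name)
    renames = params["column_map"].get(file_name, {})
    lookup_columns = params["vocabulary_config"][file_name]

    prepared = []
    for row in rows:
        # column_map: rename columns, leaving unlisted ones untouched.
        new_row = {renames.get(name, name): value for name, value in row.items()}
        # append_vocabulary: add the same vocabulary_id to every row.
        if vocabulary is not None:
            new_row["vocabulary_id"] = vocabulary
        # vocabulary_config: pick the CONCEPT column used for the lookup.
        lookup_column = lookup_columns[new_row["vocabulary_id"]]
        new_row["source_concept_id"] = next(
            (c["concept_id"] for c in concept_rows
             if c["vocabulary_id"] == new_row["vocabulary_id"]
             and c[lookup_column] == new_row["drug_source_value"]),
            0,  # assumption: 0 marks a source value with no matching concept
        )
        prepared.append(new_row)
    return prepared


raw = [{"person_id": 1, "cima_code": "600000"},
       {"person_id": 2, "cima_code": "111111"}]
prepared = prepare_rows(raw, "example.parquet", params, CONCEPT)
print([r["source_concept_id"] for r in prepared])  # [1001, 0]
```

If a file holds descriptors instead of codes, the same sketch would work with `concept_name` as the value in `vocabulary_config`.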


Important notes:
- All directory parameters (`input_dir`, `output_dir`, `vocab_dir` and `visit_dir`) are defined relative to the `data_dir` folder defined in the `.env` file.
- Even though the `vocabulary_config` parameter allows mapping from multiple vocabularies, the assignment of multiple `vocabulary_id` values cannot currently be performed here and has to be done in the `process_rare_files` stage.
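
The way the relative paths combine with `data_dir` can be sketched with `pathlib`. This is only an illustration: `resolve_paths` is a hypothetical helper, and `/data` is a placeholder for whatever value `data_dir` has in the `.env` file.

```python
from pathlib import Path

params = {
    "input_dir": "rare/03_omop_initial/",
    "output_dir": "rare/04_omop_intermediate/DRUG_EXPOSURE/",
    "input_files": [
        "Hepatocarcinoma_drug_dispensation.parquet",
        "Hepatocarcinoma_drug_medication.parquet",
    ],
    "vocab_dir": "raw/omop_vocab/",
    "visit_dir": "rare/04_omop_intermediate/VISIT_OCCURRENCE/",
}


def resolve_paths(data_dir, params):
    """Join every relative path in `params` onto `data_dir`."""
    base = Path(data_dir)
    input_dir = base / params["input_dir"]
    return {
        # Input files are relative to data_dir / input_dir.
        "input_files": [input_dir / name for name in params["input_files"]],
        # The remaining directories are relative to data_dir itself.
        "output_dir": base / params["output_dir"],
        "vocab_dir": base / params["vocab_dir"],
        "visit_dir": base / params["visit_dir"],
    }


paths = resolve_paths("/data", params)  # "/data" is a placeholder data_dir
print(paths["input_files"][0].as_posix())
```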