# CONDITION_OCCURRENCE

See [condition_occurrence](https://ohdsi.github.io/CommonDataModel/cdm54.html#condition_occurence). 

This table records events that suggest the presence of a diagnosis, symptom, or other sign of disease. A condition can be reported by the patient or recorded by a healthcare professional.

```{mermaid}
erDiagram
    OMOP_CONDITION_OCCURRENCE {
        integer condition_occurrence_id
        integer person_id
        integer condition_concept_id
        date condition_start_date
        datetime condition_start_datetime
        date condition_end_date
        datetime condition_end_datetime
        integer condition_type_concept_id
        integer condition_status_concept_id
        varchar(20) stop_reason
        integer provider_id
        integer visit_occurrence_id
        integer visit_detail_id
        varchar(50) condition_source_value
        integer condition_source_concept_id
        varchar(50) condition_status_source_value
    }
```

This stage is run by the script [genomop_condition_occurrence.py](../example/genomop_condition_occurrence.py).

This script performs the following steps:
1. Loads parameters
2. Loads vocabulary tables
3. Loads each file
   1. Assigns a new `vocabulary_id` column if needed. See the `append_vocabulary` parameter.
   2. Renames the column containing the condition names or codes to `source_value`. See the `column_map` parameter.
   3. Matches the source values against the configured column of the CONCEPT table to retrieve the `source_concept_id`. See the `vocabulary_config` parameter.
4. Creates a single table with all files.
5. Maps the `source_concept_id` to the corresponding standard `concept_id`.
6. Creates the primary key of the table: `condition_occurrence_id`.
7. Finds any entries that are contained within a visit in the VISIT_OCCURRENCE table and assigns the corresponding `visit_occurrence_id`.
8. Adapts the table to the schema of the CONDITION_OCCURRENCE table.
9. Saves the OMOP table to the defined output folder.
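
The steps above can be sketched in plain Python. This is only an illustration under several assumptions, not the code of `genomop_condition_occurrence.py`: lists of dicts stand in for the real tables, all concept ids are made up, the helper names `prepare_file` and `build_condition_occurrence` are hypothetical, unmapped values get concept id `0` (the usual OMOP convention for "no matching concept"), and an entry counts as contained in a visit when its start date falls between the visit start and end dates of the same person.

```python
from datetime import date

# Hypothetical miniature vocabulary tables; every id below is made up.
CONCEPT = [
    {"concept_id": 101, "vocabulary_id": "BPS Pathology",
     "concept_name": "Asthma", "concept_code": "BPS-01"},
    {"concept_id": 102, "vocabulary_id": "BPS Pathology",
     "concept_name": "Migraine", "concept_code": "BPS-02"},
]
# "Maps to" rows of CONCEPT_RELATIONSHIP as {source_concept_id: standard concept_id}.
MAPS_TO = {101: 9001, 102: 9002}


def prepare_file(rows, vocabulary, column_map, vocabulary_config):
    """Steps 3.1 to 3.3 for a single input file."""
    # (vocabulary_id, value of the configured CONCEPT column) -> concept_id
    lookup = {}
    for concept in CONCEPT:
        column = vocabulary_config.get(concept["vocabulary_id"])
        if column is not None:
            lookup[(concept["vocabulary_id"], concept[column])] = concept["concept_id"]
    prepared = []
    for row in rows:
        # Step 3.2: rename columns according to column_map.
        new = {column_map.get(key, key): value for key, value in row.items()}
        # Step 3.1: add a uniform vocabulary_id column if one was configured.
        if vocabulary is not None:
            new["vocabulary_id"] = vocabulary
        # Step 3.3: look up the source concept; 0 marks an unmapped value.
        key = (new.get("vocabulary_id"), new["source_value"])
        new["source_concept_id"] = lookup.get(key, 0)
        prepared.append(new)
    return prepared


def build_condition_occurrence(rows, visits):
    """Steps 5 to 8 on the concatenated rows."""
    table = []
    for index, row in enumerate(rows, start=1):
        visit_id = None
        for visit in visits:
            same_person = visit["person_id"] == row["person_id"]
            inside = (visit["visit_start_date"]
                      <= row["condition_start_date"]
                      <= visit["visit_end_date"])
            if same_person and inside:
                visit_id = visit["visit_occurrence_id"]
                break
        table.append({
            "condition_occurrence_id": index,  # step 6: sequential primary key
            "person_id": row["person_id"],
            "condition_concept_id": MAPS_TO.get(row["source_concept_id"], 0),  # step 5
            "condition_start_date": row["condition_start_date"],
            "visit_occurrence_id": visit_id,  # step 7
            "condition_source_value": row["source_value"],  # step 8: schema names
            "condition_source_concept_id": row["source_concept_id"],
        })
    return table


raw = [
    {"person_id": 1, "condition_start_date": date(2020, 3, 5), "desc_patologia": "Asthma"},
    {"person_id": 2, "condition_start_date": date(2021, 1, 10), "desc_patologia": "Unknown"},
]
visits = [{"visit_occurrence_id": 50, "person_id": 1,
           "visit_start_date": date(2020, 3, 1), "visit_end_date": date(2020, 3, 7)}]

prepared = prepare_file(raw, "BPS Pathology",
                        {"desc_patologia": "source_value"},
                        {"BPS Pathology": "concept_name"})
condition_occurrence = build_condition_occurrence(prepared, visits)
for row in condition_occurrence:
    print(row)
```

In this toy run the first row is mapped and linked to visit 50, while the second row keeps concept id `0` and no visit, which is the kind of entry the `fallback_vocabs` parameter is meant to help with.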

The configuration file is [genomop_condition_occurrence_params.yaml](../example/genomop_condition_occurrence_params.yaml). It must have the following structure:

```yaml
input_dir: preomop/03_omop_initial/
output_dir: preomop/04_omop_intermediate/CONDITION_OCCURRENCE/
input_files:
  - 02_Patologias_BPS.parquet
vocab_dir: raw/omop_vocab/
visit_dir: preomop/04_omop_intermediate/VISIT_OCCURRENCE/
fallback_vocabs:
  ICD10CM: concept_code
  ICD9CM: concept_code
append_vocabulary:
  02_Patologias_BPS.parquet: BPS Pathology
column_map:
  02_Patologias_BPS.parquet:
    desc_patologia: source_value
vocabulary_config:
  02_Patologias_BPS.parquet: 
    BPS Pathology: concept_name
```

The parameters are:
- `input_dir` is the path from `data_dir` to the directory where input data is.
- `output_dir` is the path from `data_dir` to the directory where data will be saved to.
- `input_files` is the list of files, as paths from `data_dir / input_dir`, to be used.
- `vocab_dir` is the path from `data_dir` to the directory where the vocabulary tables (CONCEPT, CONCEPT_RELATIONSHIP, etc.) are.
- `visit_dir` is the path from `data_dir` to the directory where the VISIT_OCCURRENCE table is.
- `fallback_vocabs` is an **optional** parameter used when some source values could not be mapped. It is a dict whose keys are vocabularies and whose values are one of:
  - `concept_code` to match source values against the `concept_code` column within the vocabulary.
  - `concept_name` to match source values against the `concept_name` column within the vocabulary.
- `append_vocabulary` is a dict that defines, for each file in `input_files`, the name of the vocabulary to be added as a new uniform column.
  - i.e. a new column, `vocabulary_id`, will be added to the table with the provided value for every row.
  - This entry is not mandatory.
- `column_map` is a dict that defines, for each file in `input_files`, the column that will be renamed to `source_value` to perform the identification of codes in the CONCEPT table.
  - It can be used to rename any other column if needed.
- `vocabulary_config` is a dict that defines, for each file in `input_files`, how to perform the mapping from each vocabulary to the CONCEPT table.
  - Defines a map between each vocabulary present in the file (key) and the column in the CONCEPT table, i.e. `concept_name` or `concept_code`, that should be used for mapping (value).
  - Each `source_value` has to be mapped to a concept in the CONCEPT table. Usually the source values are either descriptors (which map to `concept_name`) or codes (which map to `concept_code`).
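
As a rough illustration of how `fallback_vocabs` might be applied, the sketch below retries only the rows that are still unmapped. It rests on assumptions that the text above does not confirm: unmapped rows carry `source_concept_id == 0`, fallback vocabularies are tried in the order they are listed, and the first match wins. The function name `apply_fallbacks` and the concept ids are hypothetical.

```python
# Hypothetical CONCEPT rows; ids are made up for illustration.
CONCEPT = [
    {"concept_id": 201, "vocabulary_id": "ICD10CM",
     "concept_name": "Asthma", "concept_code": "J45"},
    {"concept_id": 202, "vocabulary_id": "ICD9CM",
     "concept_name": "Asthma", "concept_code": "493"},
]


def apply_fallbacks(rows, fallback_vocabs):
    """Retry rows with source_concept_id == 0 against each fallback vocabulary."""
    for row in rows:
        if row["source_concept_id"] != 0:
            continue  # already mapped by vocabulary_config
        for vocabulary, column in fallback_vocabs.items():
            # Compare the source value with the configured CONCEPT column.
            match = next(
                (c for c in CONCEPT
                 if c["vocabulary_id"] == vocabulary and c[column] == row["source_value"]),
                None,
            )
            if match is not None:
                row["source_concept_id"] = match["concept_id"]
                break  # assumed: first matching fallback wins
    return rows


rows = [
    {"source_value": "J45", "source_concept_id": 0},
    {"source_value": "493", "source_concept_id": 0},
    {"source_value": "Asthma", "source_concept_id": 101},  # mapped earlier, untouched
    {"source_value": "not a code", "source_concept_id": 0},
]
fallback_vocabs = {"ICD10CM": "concept_code", "ICD9CM": "concept_code"}
mapped = apply_fallbacks(rows, fallback_vocabs)
print([row["source_concept_id"] for row in mapped])
```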

Important notes:
- The parameters `input_dir` and `output_dir` are defined in relation to the `data_dir` folder defined in the `.env` file. 
- Even though mapping from multiple vocabularies is allowed with the `vocabulary_config` parameter, assigning multiple `vocabulary_id` values cannot currently be done here and has to be done in the `process_rare_files` stage.
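
To make the relation between `data_dir` and the relative parameters concrete, here is a minimal sketch using `pathlib`. The helper `resolve_paths` is hypothetical, `/data` is a placeholder for whatever `data_dir` is set to in `.env`, and reading the `.env` file itself is not shown.

```python
from pathlib import Path


def resolve_paths(data_dir, params):
    """Turn the relative paths of the parameter file into full paths."""
    data_dir = Path(data_dir)
    input_dir = data_dir / params["input_dir"]
    return {
        # input_files are relative to data_dir / input_dir
        "input_files": [input_dir / name for name in params["input_files"]],
        # the remaining directories are relative to data_dir
        "output_dir": data_dir / params["output_dir"],
        "vocab_dir": data_dir / params["vocab_dir"],
        "visit_dir": data_dir / params["visit_dir"],
    }


params = {
    "input_dir": "preomop/03_omop_initial/",
    "output_dir": "preomop/04_omop_intermediate/CONDITION_OCCURRENCE/",
    "input_files": ["02_Patologias_BPS.parquet"],
    "vocab_dir": "raw/omop_vocab/",
    "visit_dir": "preomop/04_omop_intermediate/VISIT_OCCURRENCE/",
}
paths = resolve_paths("/data", params)  # "/data" is a placeholder data_dir
for name, value in paths.items():
    print(name, value)
```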