# PROVIDER

See [provider](https://ohdsi.github.io/CommonDataModel/cdm54.html#provider). This table stores information about the personnel who provide health system services. When no personal identifiers are available, it can instead store generic provider specialties, which is how we use it here. The fields in the table are as follows:

```{mermaid}
erDiagram
    OMOP_PROVIDER {
        integer provider_id
        varchar(255) provider_name
        varchar(20) npi
        varchar(20) dea
        integer specialty_concept_id
        integer care_site_id
        integer year_of_birth
        integer gender_concept_id
        varchar(50) provider_source_value
        varchar(50) specialty_source_value
        integer specialty_source_concept_id
        varchar(50) gender_source_value
        integer gender_source_concept_id
    }
```

This table will help us to further specify the type of visit in the VISIT_OCCURRENCE table. It must be created before VISIT_OCCURRENCE.

The input is usually a file containing a list of visits, with a column that gives the specialty of each visit. We do not need every row of this file, only the unique values of the specialty column, which we use to build the PROVIDER table.
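As a minimal sketch of this deduplication step, the snippet below collects the distinct specialties from a list of visit rows. It uses plain dictionaries rather than any particular dataframe library; the function name `unique_specialties`, the `visit_id` field and the choice to skip missing values are assumptions made for illustration, not taken from the script.

```python
def unique_specialties(rows, column="specialty_source_value"):
    """Return the distinct non-missing values of `column`, in first-seen order."""
    values = (row.get(column) for row in rows)
    # dict.fromkeys keeps one entry per value and preserves insertion order.
    return list(dict.fromkeys(v for v in values if v is not None))


# Hypothetical visit rows: four visits, two distinct specialties.
visits = [
    {"visit_id": 1, "specialty_source_value": "ginecologia"},
    {"visit_id": 2, "specialty_source_value": "aparato digestivo"},
    {"visit_id": 3, "specialty_source_value": "ginecologia"},
    {"visit_id": 4, "specialty_source_value": None},
]
specialties = unique_specialties(visits)
```

Each value in `specialties` would then become one row of the PROVIDER table.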

The script that handles the transformation from the original file to the OMOP format is [genomop_provider.py](../examples/genomop_provider.py). 

The script performs the following actions:

1. Load the parameters.
2. Iterate over the files (see parameter `input_files`):
   1. Read the table.
   2. Rename specific columns to fit the OMOP standard (see parameter `column_name_map`).
      - At least one column in `column_name_map` must map to `specialty_source_value`.
   3. Retrieve the unique values of `specialty_source_value`.
   4. Iterate over the unique values:
      - Save a row of the PROVIDER table, defining `provider_id`, `specialty_source_value` and `specialty_concept_id`.
      - Specialty mappings are defined by the parameter `column_values_map`.
3. Check for unmapped values.
4. Fill the rest of the columns.
5. Save the file.
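The row-building and unmapped-value steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `genomop_provider.py`: the function name `build_provider_rows`, the sequential `provider_id` starting at 1, and the use of concept `0` (the OMOP "no matching concept" convention) for unmapped specialties are all choices made for this example. The value `cardiologia` is a made-up specialty used only to show an unmapped case.

```python
# Column order follows the PROVIDER diagram above.
PROVIDER_COLUMNS = [
    "provider_id", "provider_name", "npi", "dea", "specialty_concept_id",
    "care_site_id", "year_of_birth", "gender_concept_id",
    "provider_source_value", "specialty_source_value",
    "specialty_source_concept_id", "gender_source_value",
    "gender_source_concept_id",
]


def build_provider_rows(specialties, values_map, first_id=1):
    """Build one PROVIDER row per specialty and list the unmapped values."""
    rows, unmapped = [], []
    for provider_id, source_value in enumerate(specialties, start=first_id):
        concept_id = values_map.get(source_value)
        if concept_id is None:
            unmapped.append(source_value)
            concept_id = 0  # assumption: 0 marks "no matching concept"
        row = dict.fromkeys(PROVIDER_COLUMNS)  # remaining columns stay None
        row.update(
            provider_id=provider_id,
            specialty_source_value=source_value,
            specialty_concept_id=concept_id,
        )
        rows.append(row)
    return rows, unmapped


values_map = {"ginecologia": 38003902, "aparato digestivo": 38004455}
rows, unmapped = build_provider_rows(
    ["ginecologia", "aparato digestivo", "cardiologia"], values_map
)
```

Here `unmapped` holds the values that still need an entry in `column_values_map`, which is the kind of report the check in step 3 would produce.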

The parameters file is [genomop_provider_params.yaml](../examples/genomop_provider_params.yaml). It must have the following structure:


```yaml
input_dir: preomop/03_omop_initial/
output_dir: preomop/04_omop_intermediate/PROVIDER/
input_files:
  - 05_Consultas_externas.parquet
column_name_map:
  05_Consultas_externas.parquet:
    ESPECIALIDAD: specialty_source_value
column_values_map:
  05_Consultas_externas.parquet:
    specialty_source_value:
      ginecologia: 38003902
      aparato digestivo: 38004455
```

The necessary parameters are:

- `input_dir` is the path, relative to `data_dir`, of the directory that contains the input data.
- `output_dir` is the path, relative to `data_dir`, of the directory where the output will be saved.
- `input_files` is the list of files to use, given as paths relative to `data_dir / input_dir`.
- `column_name_map` is a dictionary that defines, for each file in `input_files`, the mapping from original column names to their OMOP counterparts, i.e. `original_name: omop_name`.
- `column_values_map` is a dictionary that defines, for each file in `input_files` and for each **new OMOP column**, the mapping from original values to OMOP standard codes.
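To make the expected structure concrete, here is a hedged sketch of a sanity check over the parameters once they are loaded into a Python dictionary. The function `check_params` and its messages are hypothetical and not part of the script. It only encodes the rules stated in this document: all five parameters are present, and every input file renames at least one column to `specialty_source_value`.

```python
REQUIRED_KEYS = (
    "input_dir", "output_dir", "input_files",
    "column_name_map", "column_values_map",
)


def check_params(params):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing parameter: {k}" for k in REQUIRED_KEYS if k not in params]
    for name in params.get("input_files", []):
        renames = params.get("column_name_map", {}).get(name, {})
        if "specialty_source_value" not in renames.values():
            problems.append(f"{name}: no column maps to specialty_source_value")
    return problems


# The same content as the YAML example, written as a Python dictionary.
params = {
    "input_dir": "preomop/03_omop_initial/",
    "output_dir": "preomop/04_omop_intermediate/PROVIDER/",
    "input_files": ["05_Consultas_externas.parquet"],
    "column_name_map": {
        "05_Consultas_externas.parquet": {"ESPECIALIDAD": "specialty_source_value"},
    },
    "column_values_map": {
        "05_Consultas_externas.parquet": {
            "specialty_source_value": {
                "ginecologia": 38003902,
                "aparato digestivo": 38004455,
            },
        },
    },
}
problems = check_params(params)
```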

The parameters `input_dir` and `output_dir` are defined in relation to the `data_dir` folder defined in the `.env` file.
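A short sketch of how these relative paths could be resolved is shown below. The helper `resolve_paths` is hypothetical, and `/data` is a placeholder for whatever `data_dir` is set to in the `.env` file; how the script actually reads `.env` is not shown here.

```python
from pathlib import Path


def resolve_paths(data_dir, params):
    """Resolve input files and the output directory relative to data_dir."""
    base = Path(data_dir)
    input_dir = base / params["input_dir"]
    output_dir = base / params["output_dir"]
    input_paths = [input_dir / name for name in params["input_files"]]
    return input_paths, output_dir


# Placeholder data_dir; the real value comes from the .env file.
input_paths, output_dir = resolve_paths(
    "/data",
    {
        "input_dir": "preomop/03_omop_initial/",
        "output_dir": "preomop/04_omop_intermediate/PROVIDER/",
        "input_files": ["05_Consultas_externas.parquet"],
    },
)
```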