# LOCATION

See [location](https://ohdsi.github.io/CommonDataModel/cdm54.html#location). The OMOP LOCATION table has the fields shown in the following diagram:

```{mermaid}
erDiagram
    OMOP_LOCATION {
        integer location_id
        varchar(50) address_1
        varchar(50) address_2
        varchar(50) city
        varchar(2) state
        varchar(9) zip 
        varchar(20) county
        varchar(50) location_source_value
        integer country_concept_id
        varchar(80) country_source_value
        float latitude
        float longitude
    }
```


The LOCATION table stores the physical location of patients and care sites. Its rows can be referenced from the PERSON and CARE_SITE tables.

The script that transforms the original file into OMOP format is [genomop_location.py](../src/genomop_location.py). It reads the input table, extracts the unique list of locations, and fills in the remaining columns.
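The deduplication step described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain Python records instead of the parquet input, `build_location_rows` is a hypothetical helper (not a function from `bps_to_omop`), and it assumes that the first record seen for each `location_id` wins and that missing columns are left empty.

```python
# Column names taken from the LOCATION diagram in this page.
LOCATION_COLUMNS = [
    "location_id", "address_1", "address_2", "city", "state", "zip",
    "county", "location_source_value", "country_concept_id",
    "country_source_value", "latitude", "longitude",
]


def build_location_rows(records):
    """Return one LOCATION row per distinct location_id (hypothetical helper)."""
    unique = {}
    for record in records:
        loc_id = record["location_id"]
        if loc_id in unique:
            continue  # assumption: keep the first occurrence of each location
        # Keep only LOCATION columns; anything not provided is filled with None.
        unique[loc_id] = {col: record.get(col) for col in LOCATION_COLUMNS}
    return list(unique.values())


# Illustrative input: three patients living in two districts.
records = [
    {"location_id": 1, "location_source_value": "District A", "person_id": 10},
    {"location_id": 1, "location_source_value": "District A", "person_id": 11},
    {"location_id": 2, "location_source_value": "District B", "person_id": 12},
]
rows = build_location_rows(records)
```

Here `rows` holds two entries, one per district, each with the full set of LOCATION columns and without the patient-level `person_id` field.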

To configure it, we create a YAML file, [genomop_location_params.yaml](../src/genomop_location_params.yaml), which has the following structure:

```YAML
input_dir: rare/03_omop_initial/
output_dir: rare/04_omop_intermediate/LOCATION/
input_files:
  - Hepatocarcinoma_Sociodemo.parquet
column_name_map:
  Hepatocarcinoma_Sociodemo.parquet:
    COD_DISTRITO: location_id
    DESC_DISTRITO: location_source_value
column_values_map:
  Hepatocarcinoma_Sociodemo.parquet:
    source_col: 
      source_value_1: destination_value_1
      source_value_2: destination_value_2
constant_values:
  Hepatocarcinoma_Sociodemo.parquet:
    country_concept_id: 42020824
    country_source_value: Spain
```

The parameters are:
- `input_dir` is the path, relative to `data_dir`, of the directory that contains the input data.
- `output_dir` is the path, relative to `data_dir`, of the directory where the output will be saved.
- `input_files` is the list of files to be used.
- `column_name_map` is a dictionary that defines, for each file in `input_files`, the mapping from original column names to their OMOP counterparts, i.e. `original_name: omop_name`. This parameter controls which columns are renamed during processing.
  - There has to be at least one mapping from a source column to `location_id`. Otherwise, the stage will fail.
  - If no suitable column exists in the source data, one should be created before running this stage.
- `column_values_map` is an optional dictionary that defines, for each file in `input_files`, the mapping from existing values in a column to new ones.
  - This is a helper parameter for making small changes before transforming to OMOP.
- `constant_values` is an optional dictionary that defines, for each file in `input_files`, constant columns to be added to the file.
  - This is a helper parameter to add the same value to every record.
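As a rough illustration of how the three mapping parameters interact, the sketch below applies them to plain Python records. It is not the code in `genomop_location.py`: `check_location_id_mapped` and `apply_params` are hypothetical helpers, the value mapping for `DESC_DISTRITO` is invented for the example, and the order of operations (replace values by source column name, then rename, then add constants) is an assumption based on the `source_col` key in the YAML structure.

```python
FILE = "Hepatocarcinoma_Sociodemo.parquet"

# Parameters mirroring the YAML structure; the values map is made up.
params = {
    "input_files": [FILE],
    "column_name_map": {
        FILE: {"COD_DISTRITO": "location_id",
               "DESC_DISTRITO": "location_source_value"},
    },
    "column_values_map": {FILE: {"DESC_DISTRITO": {"Distrito A": "District A"}}},
    "constant_values": {
        FILE: {"country_concept_id": 42020824, "country_source_value": "Spain"},
    },
}


def check_location_id_mapped(params):
    """Fail early if a file has no source column mapped to location_id."""
    for file_name in params["input_files"]:
        targets = params["column_name_map"].get(file_name, {}).values()
        if "location_id" not in targets:
            raise ValueError(f"{file_name}: no column is mapped to location_id")


def apply_params(records, file_name, params):
    """Apply value replacement, renaming and constants to each record."""
    name_map = params["column_name_map"].get(file_name, {})
    # Both of these parameters are optional, so default to empty mappings.
    values_map = params.get("column_values_map", {}).get(file_name, {})
    constants = params.get("constant_values", {}).get(file_name, {})
    out = []
    for record in records:
        row = {}
        for col, value in record.items():
            # Replace the value if a mapping exists for this source column.
            value = values_map.get(col, {}).get(value, value)
            # Rename the column if it appears in the name map.
            row[name_map.get(col, col)] = value
        row.update(constants)  # same constant columns on every record
        out.append(row)
    return out


check_location_id_mapped(params)
rows = apply_params([{"COD_DISTRITO": 1, "DESC_DISTRITO": "Distrito A"}], FILE, params)
```

With the parameters above, the single input record comes out with the renamed `location_id` and `location_source_value` columns plus the two constant country columns.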

### Add dvc stage

The relevant files are [genomop_location.py](../src/genomop_location.py) and [genomop_location_params.yaml](../src/genomop_location_params.yaml).
 
To add the stage, append the following entry to the [dvc.yaml](../dvc.yaml) file:

```yaml
  genomop_location:
    cmd: pixi run python3 src/genomop_location.py
    deps:
    - src/genomop_location.py
    - external/bps_to_omop/bps_to_omop/location.py
    - external/bps_to_omop/bps_to_omop/omop_schemas.py
    - external/bps_to_omop/bps_to_omop/utils/extract.py
    - external/bps_to_omop/bps_to_omop/utils/format_to_omop.py
    - external/bps_to_omop/bps_to_omop/utils/map_to_omop.py
    - ${repo_data_dir}rare/03_omop_initial
    params:
    - src/genomop_location_params.yaml:
    outs:
    - ${repo_data_dir}rare/04_omop_intermediate/LOCATION/
```