
# Processed Data Model and Value Sets for REDCap
This notebook derives a data model and value sets from the provided data dictionary for use in REDCap.
It processes the variable names, field labels, field types, and SNOMED annotations recorded in the data dictionary.

### Steps:
1. **Loading the Data Dictionary**.
2. **Extracting the Data Model** (Variable names, Field types, SNOMED codes).
3. **Defining Value Sets** (Choices or predefined values based on SNOMED codes).
4. **Exporting to REDCap format**.


In [None]:

import pandas as pd

# Load the provided data dictionary
file_path = '/mnt/data/RareLink_v2.0_DataDictionary - GenAdipositasALTDemo_DataDictionary_2024-09-05 (17).csv'
data_dictionary = pd.read_csv(file_path)

# Display first few rows to understand structure
data_dictionary.head()
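
Before going further, it can help to confirm that the loaded file actually contains the columns the later cells rely on. The following is a minimal sketch, not part of the original workflow: `REQUIRED_COLUMNS`, `missing_columns`, and the small `sample` frame are hypothetical names and data introduced only for illustration.

```python
import pandas as pd

# Columns used by the extraction step in this notebook.
REQUIRED_COLUMNS = [
    'Variable / Field Name', 'Field Label', 'Field Type',
    'Field Note', 'Field Annotation',
]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df, in order."""
    return [col for col in required if col not in df.columns]

# Hypothetical stand-in for the real data dictionary (two columns left out on purpose).
sample = pd.DataFrame({
    'Variable / Field Name': ['field_a', 'field_b'],
    'Field Label': ['Label A', 'Label B'],
    'Field Type': ['text', 'radio'],
})

missing = missing_columns(sample)
print(missing)
```

Running the same check against `data_dictionary` would surface a renamed or missing header before the column selection fails with a `KeyError`.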



### Extracting the Data Model
Here we extract the columns that define the data model: `Variable / Field Name`, `Field Label`, `Field Type`, `Field Note`, and `Field Annotation`. The SNOMED codes are stored in `Field Annotation`, so keeping only the annotated rows gives a clear mapping between each field and its annotation.


In [None]:

# Extract relevant columns for the data model
data_model = data_dictionary[['Variable / Field Name', 'Field Label', 'Field Type', 'Field Note', 'Field Annotation']]

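# Keep rows that have a field annotation (this is where SNOMED codes are stored).
# Note: this keeps every non-null annotation, whether or not it mentions SNOMED.
data_model_with_snomed = data_model[data_model['Field Annotation'].notnull()]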

# Display the data model with SNOMED codes
data_model_with_snomed.head()
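
A non-null `Field Annotation` does not necessarily contain a SNOMED code; it may hold other content such as REDCap action tags. If only SNOMED-annotated rows are wanted, one option is to match on the annotation text. This is a sketch under the assumption that such annotations contain the literal string `SNOMED`; the `sample` frame and its placeholder code are made up for illustration.

```python
import pandas as pd

# Hypothetical rows: one SNOMED annotation, one action tag, one empty annotation.
sample = pd.DataFrame({
    'Variable / Field Name': ['field_a', 'field_b', 'field_c'],
    'Field Annotation': ['SNOMED: 000000 | Placeholder concept', '@HIDDEN', None],
})

# na=False treats missing annotations as non-matching instead of propagating NaN.
has_snomed = sample['Field Annotation'].str.contains('SNOMED', case=False, na=False)
snomed_rows = sample[has_snomed]

print(snomed_rows['Variable / Field Name'].tolist())
```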



### Defining Value Sets
Value sets are predefined lists of possible values for certain fields. We'll identify the fields that have associated SNOMED codes or other predefined sets, and format them for use in REDCap.


In [None]:

# Extract the annotated fields; .copy() gives an independent DataFrame so the
# column assignments below do not trigger pandas' SettingWithCopyWarning
value_sets = data_model_with_snomed[['Variable / Field Name', 'Field Label', 'Field Annotation']].copy()

# Split an annotation of the form '<prefix><code> | <description>' into its parts
def extract_snomed_details(annotation):
    # Missing (non-string) annotations carry no mapping
    if not isinstance(annotation, str):
        return {'SNOMED Code': 'No SNOMED mapping', 'Description': ''}
    parts = annotation.split('|')
    if len(parts) > 1:
        return {
            'SNOMED Code': parts[0].replace('Variable:\nSNOMED:', '').strip(),
            'Description': parts[1].strip(),
        }
    # Annotation present but without a '|' separator: nothing to extract
    return {'SNOMED Code': '', 'Description': ''}

# Parse each annotation once, then assign both derived columns
details = value_sets['Field Annotation'].apply(extract_snomed_details)
value_sets['SNOMED Mapping'] = details.map(lambda d: d['SNOMED Code'])
value_sets['Description'] = details.map(lambda d: d['Description'])

# Display the value sets with SNOMED mappings
value_sets.head()
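
To make the parsing rule concrete, here is a small standalone sketch of the same split-on-`|` logic applied to two annotation strings. The example strings and the placeholder code `000000` are invented; the real annotations in the data dictionary may be laid out differently, in which case the prefix and separator would need adjusting. `split_annotation` is a hypothetical helper name.

```python
def split_annotation(annotation, prefix='Variable:\nSNOMED:'):
    """Split '<prefix><code> | <description>' into a code and a description."""
    parts = annotation.split('|')
    if len(parts) < 2:
        # No separator present: nothing can be extracted reliably.
        return {'SNOMED Code': '', 'Description': ''}
    return {
        'SNOMED Code': parts[0].replace(prefix, '').strip(),
        'Description': parts[1].strip(),
    }

# Invented annotation that follows the assumed layout.
with_separator = split_annotation('Variable:\nSNOMED: 000000 | Placeholder concept')
# Invented annotation without a '|' separator.
without_separator = split_annotation('Free-text note only')

print(with_separator)
print(without_separator)
```

Only the first two `|`-separated parts are used, so any further segments in an annotation are ignored by this rule.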



### Exporting Data Model and Value Sets to REDCap Format
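Finally, we export the processed value sets, including their SNOMED mappings, to a CSV file. The added `SNOMED Mapping` and `Description` columns are not part of the original data dictionary, so the file may need to be mapped back onto the data dictionary's own columns before it can be imported into REDCap.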


In [None]:

# Export the value sets with their SNOMED mappings to CSV
export_path = '/mnt/data/processed_data_model_with_snomed.csv'
value_sets.to_csv(export_path, index=False)
export_path
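
When a value set is written back into a REDCap data dictionary, the choices for a field are typically given as a single string of `code, label` pairs separated by `|`. The sketch below shows one way to build such a string; it assumes that layout applies to the target project, and `to_redcap_choices` and the example pairs are hypothetical.

```python
def to_redcap_choices(pairs):
    """Join (code, label) pairs into a 'code, label | code, label' string."""
    return ' | '.join(f'{code}, {label}' for code, label in pairs)

# Invented example value set with two choices.
choices = to_redcap_choices([('1', 'Yes'), ('0', 'No')])
print(choices)  # 1, Yes | 0, No
```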
