# What is a concept ID?

In the AIREADI dataset, variables such as HbA1c, visual acuity, and questionnaire answers are assigned OMOP concept IDs so that they are recorded in a standardized way. For more details, see the OMOP Clinical Data Structure documentation: https://docs.aireadi.org/docs/1/dataset/clinical-data/OMOP-Clinical-Data-Structure/. Because each variable is identified by its concept ID, you need to know that ID before you can search the data for a specific variable.

To see which variables are included, refer to the OMOP Mapping Table for clinical data (https://docs.aireadi.org/v1-omopTable) or the Data Domain Table for clinical lab tests (https://docs.aireadi.org/v1-dataDomainTable).
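Once you know a concept ID, you can use it to pull out every record for that variable. The sketch below uses a small made-up table in place of measurement.csv and assumes the standard OMOP measurement columns `person_id`, `measurement_concept_id`, and `value_as_number`; 3004410 is the concept ID listed for 'HbA1c (%)' in the measurement output later in this notebook. In your own analysis you would apply the same filter to the DataFrame loaded from measurement.csv.

```python
import pandas as pd

# Made-up rows standing in for measurement.csv (illustrative values only)
toy_measurements = pd.DataFrame({
    "person_id": [1001, 1001, 1002, 1003],
    "measurement_concept_id": [3004410, 3004501, 3004410, 3004410],
    "value_as_number": [5.6, 98.0, 7.1, 6.2],
})

HBA1C_CONCEPT_ID = 3004410  # listed as 'HbA1c (%)' in the measurement output

# Keep only the rows recorded under the HbA1c concept ID
hba1c = toy_measurements[toy_measurements["measurement_concept_id"] == HBA1C_CONCEPT_ID]
print(hba1c[["person_id", "value_as_number"]])
```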

# How do you find concept IDs?

To identify concept IDs, you can use the condition_occurrence.csv, measurement.csv, and observation.csv files in the clinical_data folder of the dataset.
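All three files follow a similar pattern: one column holds the original variable name (the source value) and another holds its concept ID. A small helper can turn any such pair of columns into a Python dictionary. `concept_id_map` is a hypothetical name, and the toy table only mimics the column layout of measurement.csv.

```python
import pandas as pd

def concept_id_map(df: pd.DataFrame, source_col: str, concept_col: str) -> dict:
    """Hypothetical helper: map each unique source value to the concept ID of its first row."""
    pairs = (
        df[[source_col, concept_col]]
        .dropna(subset=[source_col])         # skip rows without a source value
        .drop_duplicates(subset=source_col)  # keep the first row per source value
        .sort_values(source_col)
    )
    # .tolist() converts NumPy scalars into plain Python values
    return dict(zip(pairs[source_col].tolist(), pairs[concept_col].tolist()))

# Toy table shaped like measurement.csv
toy = pd.DataFrame({
    "measurement_source_value": ["HbA1c (%)", "Glucose (mg/dL)", "HbA1c (%)", None],
    "measurement_concept_id": [3004410, 3004501, 3004410, 0],
})
toy_map = concept_id_map(toy, "measurement_source_value", "measurement_concept_id")
print(toy_map)  # {'Glucose (mg/dL)': 3004501, 'HbA1c (%)': 3004410}
```

The same call works for the other files by passing their column names, for example `observation_source_value` and `observation_concept_id`.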

The code cells below show how to find the concept IDs in each of these files.

```python
import os

import pandas as pd
```

```python
# Custom path: change this to match your own file structure
data_root = r'path/to/your/clinical_data/'
```
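Before loading, a quick check that `data_root` points at the right folder can save a confusing FileNotFoundError later. This is a minimal sketch with `pathlib`; `missing_files` is a hypothetical helper, and the expected file names are the three used in this notebook.

```python
import tempfile
from pathlib import Path

def missing_files(root, names=("measurement.csv", "condition_occurrence.csv", "observation.csv")):
    """Hypothetical helper: return the expected file names that are not present in root."""
    root = Path(root)
    return [name for name in names if not (root / name).is_file()]

# Demonstration with a temporary folder that contains only measurement.csv
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "measurement.csv").write_text("measurement_concept_id\n")
    result = missing_files(tmp)

print(result)  # ['condition_occurrence.csv', 'observation.csv']
```

In the notebook you would call `missing_files(data_root)` and expect an empty list.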

```python
# Load the OMOP clinical data tables (CSV files)
measurement_df = pd.read_csv(os.path.join(data_root, 'measurement.csv'))
condition_occurrence_df = pd.read_csv(os.path.join(data_root, 'condition_occurrence.csv'))
observation_df = pd.read_csv(os.path.join(data_root, 'observation.csv'))
```
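The measurement and observation tables can be large. If you only need the concept lookup, one option is to pass `usecols` to `pd.read_csv` so that only the source-value and concept-ID columns are loaded. In this sketch a temporary file with invented rows stands in for the real measurement.csv.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Tiny stand-in CSV; the real file would be os.path.join(data_root, 'measurement.csv')
    path = os.path.join(tmp, "measurement.csv")
    pd.DataFrame({
        "measurement_id": [1, 2],
        "person_id": [1001, 1002],
        "measurement_concept_id": [3004410, 3004501],
        "measurement_source_value": ["HbA1c (%)", "Glucose (mg/dL)"],
        "value_as_number": [5.6, 98.0],
    }).to_csv(path, index=False)

    # Load only the two columns needed for the concept-ID lookup
    slim_df = pd.read_csv(path, usecols=["measurement_source_value", "measurement_concept_id"])

print(slim_df.columns.tolist())
```

Note that `read_csv` returns the selected columns in file order, not in the order given to `usecols`.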

```python
# Find concept IDs in measurement.csv:
# list each unique source value with the concept ID from its first row
measurement_concepts = (
    measurement_df[['measurement_source_value', 'measurement_concept_id']]
    .dropna(subset=['measurement_source_value'])
    .drop_duplicates(subset='measurement_source_value')
    .sort_values('measurement_source_value')
)
for value, concept_id in measurement_concepts.itertuples(index=False):
    print(f"'{value}': {concept_id},")
```

```
'A/G Ratio': 4288601,
'ALT (IU/L)': 3006923,
'AST (IU/L)': 3013721,
'Albumin (g/dL)': 3024561,
'Alkaline Phosphatase (IU/L)': 3035995,
'BUN (mg/dL)': 3013682,
'BUN/Creatinine ratio': 4112223,
'Bilirubin, Total (mg/dL)': 3024128,
'C-Peptide (ng/mL)': 3010084,
'CRP - HS (mg/L)': 3010156,
'Calcium (mg/dL)': 3006906,
'Carbon Dioxide, Total (mEq/L)': 3015632,
'Chloride (mEq/L)': 3014576,
'Creatinine (mg/dL)': 3016723,
'Globulin, Total (g/dL)': 3021886,
'Glucose (mg/dL)': 3004501,
'HDL Cholesterol (mg/dL)': 3007070,
'HbA1c (%)': 3004410,
'INSULIN (ng/mL)': 3016244,
'LDL Cholesterol Calculation (mg/dL)': 3028288,
'NT-proBNP (pg/mL)': 3029187,
'Potassium (mEq/L)': 3023103,
'Protein, Total (g/dL)': 3020630,
'Sodium (mEq/L)': 3019550,
'Total Cholesterol (mg/dL)': 3027114,
'Triglycerides (mg/dL)': 3022192,
'Troponin-T (ng/L)': 40769783,
'Urine Albumin (mg/dL)': 3001802,
'Urine Creatinine (mg/dL)': 3017250,
'bmi_vsorres, BMI': 4245997,
'bp1_diabp_vsorres, Diastolic (mmHg)': 3012888,
'bp1_sysbp_vso
```
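The sorted list can get long. To narrow it down, you can search the source values for a keyword instead of reading through every entry. This is a sketch rather than part of the original notebook: `find_concepts` is a hypothetical helper, and the toy rows reuse names and IDs from the measurement output above.

```python
import pandas as pd

def find_concepts(df, source_col, concept_col, keyword):
    """Hypothetical helper: return unique (source value, concept ID) pairs whose
    source value contains `keyword`, ignoring case."""
    hits = df[df[source_col].str.contains(keyword, case=False, na=False, regex=False)]
    return hits[[source_col, concept_col]].drop_duplicates().sort_values(source_col)

toy = pd.DataFrame({
    "measurement_source_value": [
        "HDL Cholesterol (mg/dL)",
        "Total Cholesterol (mg/dL)",
        "Glucose (mg/dL)",
        "HDL Cholesterol (mg/dL)",
    ],
    "measurement_concept_id": [3007070, 3027114, 3004501, 3007070],
})
cholesterol_hits = find_concepts(toy, "measurement_source_value", "measurement_concept_id", "cholesterol")
print(cholesterol_hits)
```

With the real data you would pass `measurement_df` instead of `toy`; `regex=False` treats the keyword literally, so characters such as `(` or `%` need no escaping.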

```python
# Split observation.csv into two parts:
# rows whose qualifier_concept_id also appears as a condition_concept_id in
# condition_occurrence.csv (medical-history items), and all remaining rows
is_condition = observation_df["qualifier_concept_id"].isin(condition_occurrence_df["condition_concept_id"])
matching_observations = observation_df[is_condition]
observation_remaining = observation_df[~is_condition]
```
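To see what this split does, the sketch below applies the same mask logic to two tiny invented tables. The concept IDs are borrowed from the outputs in this notebook, but the rows are not real data, and the 0 in the last row is an assumed placeholder for an observation without a matching qualifier. Toy variable names are used so the real DataFrames are not overwritten. Because the second part uses the negated mask, every observation row lands in exactly one of the two frames.

```python
import pandas as pd

# Invented stand-ins for condition_occurrence.csv and observation.csv
toy_conditions = pd.DataFrame({"condition_concept_id": [316866, 4329847]})
toy_observations = pd.DataFrame({
    "observation_source_value": [
        "mhoccur_hbp, High blood pressure",
        "mhoccur_mi, Heart attack",
        "ces3, I felt depressed",
    ],
    "qualifier_concept_id": [316866, 4329847, 0],  # 0: assumed "no qualifier" placeholder
})

# One boolean mask splits the observations into two non-overlapping parts
toy_mask = toy_observations["qualifier_concept_id"].isin(toy_conditions["condition_concept_id"])
toy_matching = toy_observations[toy_mask]
toy_remaining = toy_observations[~toy_mask]

print(toy_matching["observation_source_value"].tolist())
print(toy_remaining["observation_source_value"].tolist())
```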


```python
# For medical-history rows, take the concept ID from qualifier_concept_id
condition_concepts = (
    matching_observations[['observation_source_value', 'qualifier_concept_id']]
    .dropna(subset=['observation_source_value'])
    .drop_duplicates(subset='observation_source_value')
    .sort_values('observation_source_value')
)
for value, concept_id in condition_concepts.itertuples(index=False):
    print(f"'{value}': {concept_id},")
```

```
'mh_a1c, Elevated A1C levels (elevated blood sugar': 2005200547,
'mhoccur_ad, Dementia (Examples: Alzheimer's Disea': 4182210,
'mhoccur_amd, Age-related macular degeneration (AM': 374028,
'mhoccur_ca, Cancer (any type)': 4194405,
'mhoccur_circ, Circulation problems (Examples: art': 2005200015,
'mhoccur_clsh, High blood cholesterol': 4159131,
'mhoccur_cns, Other neurological conditions': 46271045,
'mhoccur_cogn, Mild cognitive impairment (known as': 439795,
'mhoccur_crt, Cataracts (in one or both eyes)': 4317977,
'mhoccur_cvdot, Other heart issues (Examples: pace': 2005200627,
'mhoccur_ded, Dry eye (in one or both eyes)': 4036620,
'mhoccur_ear, Hearing impairment': 439378,
'mhoccur_gi, Digestive problems (Examples: stomach': 4201745,
'mhoccur_glc, Glaucoma (in one or both eyes)': 437541,
'mhoccur_hbp, High blood pressure': 316866,
'mhoccur_lbp, Low blood pressure': 317002,
'mhoccur_mi, Heart attack': 4329847,
'mhoccur_ms, Multiple sclerosis': 374919,
'mhoccur_oa, Osteoporosis': 80502,
'
```
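The printed lines are formatted as dictionary entries, so it can be convenient to keep the mapping as a real dictionary instead of copying it from the output. A minimal sketch, assuming you want to save it as JSON for reuse in other notebooks; the file name `condition_concept_ids.json` is hypothetical, and the two toy rows are taken from the output above. Converting the columns with `.tolist()` matters here, because the `json` module cannot serialize NumPy integer types such as `int64`.

```python
import json
import os
import tempfile

import pandas as pd

# Toy stand-in for matching_observations, using two rows from the output above
toy = pd.DataFrame({
    "observation_source_value": ["mhoccur_hbp, High blood pressure", "mhoccur_mi, Heart attack"],
    "qualifier_concept_id": [316866, 4329847],
})

# .tolist() turns NumPy int64 values into plain Python ints, which json can serialize
condition_concept_map = dict(zip(
    toy["observation_source_value"].tolist(),
    toy["qualifier_concept_id"].tolist(),
))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "condition_concept_ids.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(condition_concept_map, f, indent=2)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded["mhoccur_hbp, High blood pressure"])  # 316866
```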

```python
# Find concept IDs for the remaining observation rows (questionnaires, demographics, etc.)
observation_concepts = (
    observation_remaining[['observation_source_value', 'observation_concept_id']]
    .dropna(subset=['observation_source_value'])
    .drop_duplicates(subset='observation_source_value')
    .sort_values('observation_source_value')
)
for value, concept_id in observation_concepts.itertuples(index=False):
    print(f"'{value}': {concept_id},")
```

```
'age_years_at_interview': 2005200369,
'brthyy, Year (e.g. 1967)': 4078999,
'c184390_dat, Date paper consent completed': 4160030,
'cage, Age (in years)': 4265453,
'ces1, I was bothered by things that usually don't': 1761605,
'ces10, I could not "get going"': 1761073,
'ces2, I had trouble keeping my mind on what I was': 1761791,
'ces3, I felt depressed': 1761895,
'ces4, I felt that everything I did was an effort': 1761503,
'ces5, I felt hopeful about the future': 1761675,
'ces6, I felt fearful': 1761739,
'ces7, My sleep was restless': 1761695,
'ces8, I was happy': 1761700,
'ces9, I felt lonely': 1761319,
'cesmpdat, CES-D-10 survey date': 4160030,
'cesstartts, CES-D-10 Survey Started Timestamp (fr': 2005200558,
'cestl, CESD-10 Score': 1761347,
'cl_maristat, Marital Status': 4053609,
'clock_visuospatial_executive': 21492218,
'cm_act, Have you taken acetaminophen medicines, s': 2005200146,
'cm_ant, Have you taken antihistamines, such as co': 2005200148,
'cm_asp, Have you taken aspirin in th
```
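A concept ID does not always identify a single question: in the observation output above, both 'c184390_dat' and 'cesmpdat' map to 4160030. Before filtering on a concept ID alone, it can help to list the IDs shared by several source values. The sketch below does this with `groupby` and `nunique` on a toy table built from three entries of that output; with the real data you would run it on `observation_remaining`.

```python
import pandas as pd

toy = pd.DataFrame({
    "observation_source_value": [
        "c184390_dat, Date paper consent completed",
        "cesmpdat, CES-D-10 survey date",
        "ces3, I felt depressed",
    ],
    "observation_concept_id": [4160030, 4160030, 1761895],
})

# Count distinct source values per concept ID and keep the IDs used more than once
counts = toy.groupby("observation_concept_id")["observation_source_value"].nunique()
shared_ids = counts[counts > 1].index.tolist()

# Show which source values share each of those IDs
shared = toy[toy["observation_concept_id"].isin(shared_ids)]
print(shared.sort_values("observation_concept_id"))
```

For IDs that turn up here, filtering on `observation_source_value` as well as the concept ID keeps the two questions apart.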