# UMLS Data Quick Pull 
This notebook pulls all SNOMEDCT_US, ICD10, CPT, and LOINC codes for the given concept(s) and exports them as an Excel file. Note that it **does not** pull descendants or children of the given concepts. As a result it runs more efficiently, but it may not return all of the codes you are looking for. I recommend taking a look at the original document if you want a more detailed data pull (it just takes a while to run).

Before you can use this script, you must create an account with UMLS. The website can be found [here](https://uts.nlm.nih.gov/uts/profile). After your account has been approved, navigate to your [profile](https://uts.nlm.nih.gov/uts/profile) and find your API key. The script requires it to run.

This script is very thorough and will frequently return more codes than required. Some of the returned codes may not be directly relevant to the diagnosis, so the analyst should review the returned data for accuracy with respect to their specific diagnosis.

## Notes and Limitations  
- If you run the code again without changing the Excel_Sheet_Name variable, the existing Excel file will be overwritten with the results for the current string_list.
- To ensure that all desired CPT codes are returned, the type of procedure associated with the diagnosis must be included in the string_list search terms.
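
One way to avoid overwriting an earlier export is to pick an unused file name before saving. The following is a minimal sketch and not part of the original script; `next_free_name` and its " (1)", " (2)" numbering scheme are hypothetical, and the helper assumes the file is written to the current folder unless another folder is passed.

```python
from pathlib import Path


def next_free_name(sheet_name, folder="."):
    """Return a .xlsx path in `folder` that does not exist yet (hypothetical helper)."""
    candidate = Path(folder) / f"{sheet_name}.xlsx"
    counter = 1
    # Append " (1)", " (2)", ... until the name is unused
    while candidate.exists():
        candidate = Path(folder) / f"{sheet_name} ({counter}).xlsx"
        counter += 1
    return candidate
```

The returned path could then be passed to `to_excel` in place of the name built from Excel_Sheet_Name.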

## Variable Descriptions
- "apikey" 
    - This must be entered as a string.
    - example: apikey = '123a4b56-7c8d-9d12-e3fg-4h5i67j89k0d'
    - You get this from "My Profile" on the [UMLS website](https://uts.nlm.nih.gov/uts/profile)
- "string_list" 
    - This must be inputted as a list of strings. Below is an example of proper structure:
    - ["apple","orange","cat","puppy"]
- "Excel_Sheet_Name" 
    - This must be entered as a string. The output file is saved under this name with ".xlsx" appended.
    - The sheet lists each code in 5 columns: "Data Concept", "Data Sub-Concept", "Coding Standard", "Code Value", "Code Description". Repeats between multiple parent inputs are filtered out. This means that if the two parents you are looking up are 'Pneumonitis (disorder)' and 'Pneumonia (disorder)', any code returned for both of them appears only once in the final Excel sheet.
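
The repeat filtering described above can be illustrated with a small sketch. It uses the same `pd.concat(...).drop_duplicates()` pattern as the notebook cells, but the rows are made-up placeholders (the `X-00n` values are not real codes) and the two DataFrame names are hypothetical.

```python
import pandas as pd

columns = ["Data Concept", "Data Sub-Concept", "Coding Standard",
           "Code Value", "Code Description"]

# Placeholder results for two parent inputs; row X-001 is returned for both
pneumonitis_rows = pd.DataFrame([
    ["Diagnosis Code", "N/A", "SNOMEDCT_US", "X-001", "Placeholder shared finding"],
    ["Diagnosis Code", "N/A", "SNOMEDCT_US", "X-002", "Placeholder pneumonitis finding"],
], columns=columns)
pneumonia_rows = pd.DataFrame([
    ["Diagnosis Code", "N/A", "SNOMEDCT_US", "X-001", "Placeholder shared finding"],
    ["Diagnosis Code", "N/A", "SNOMEDCT_US", "X-003", "Placeholder pneumonia finding"],
], columns=columns)

# Stack both result sets, keep the first copy of each identical row, renumber the index
combined = (pd.concat([pneumonitis_rows, pneumonia_rows])
              .drop_duplicates()
              .reset_index(drop=True))
print(combined["Code Value"].tolist())
```

Note that `drop_duplicates` only removes rows that match in every column, so the same code with a different description would be kept twice.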

## Common Errors
- No current common errors have been reported. If you run into an error with this script, please contact me at alyssa.warnock@amida.com and I will do my best to help troubleshoot any issues with the code.  
    
## Steps to Use: 
1. Make sure you have an active account with [UMLS](https://uts.nlm.nih.gov/uts/umls/home)
2. Change the apikey string to your personal API key from your UMLS profile page
3. Change the string_list list to reflect the data you want to pull.
    - **IMPORTANT:** Do not change variable names and keep the data in the string list format
4. Change the Excel_Sheet_Name string to the name you want the output Excel file to have.
5. Run the notebook using the double-triangle button in the toolbar, located between and below the "Kernel" and "Widgets" menus
6. Click "restart and run all cells"
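
Before running all cells, the three inputs from steps 2 to 4 can be checked with a few lines of Python. This is a minimal sketch, not part of the original notebook; `check_inputs` is a hypothetical helper, and the placeholder string it rejects is the one used in the input cell below.

```python
def check_inputs(apikey, string_list, excel_sheet_name):
    """Return a list of problems found in the three inputs (empty list means OK)."""
    problems = []
    if not isinstance(apikey, str) or not apikey.strip() or apikey == "YOUR API KEY HERE":
        problems.append("apikey must be your UMLS API key as a string")
    if (not isinstance(string_list, list) or not string_list
            or not all(isinstance(s, str) and s.strip() for s in string_list)):
        problems.append("string_list must be a non-empty list of non-empty strings")
    if not isinstance(excel_sheet_name, str) or not excel_sheet_name.strip():
        problems.append("Excel_Sheet_Name must be a non-empty string")
    return problems


# A bare string instead of a list, plus the unchanged placeholder key, gives two problems
print(check_inputs("YOUR API KEY HERE", "Asthma", "Asthma Sheet"))
```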

In [1]:
## CHANGE INPUTS HERE ##
apikey = 'YOUR API KEY HERE'
string_list = ["Asthma", "with Asthma"]
Excel_Sheet_Name = "Asthma Sheet"

In [2]:
# Code Structure Outline 
# SNOMED-CT UMLS CUI 
# SNOMED-CT UMLS CUI:SNOMEDCT Code Transformation
# (SNOMED-CT descendants are not pulled in this quick version)

# ICD10 UMLS CUI 
# ICD10 UMLS CUI:ICD10 Code Transformation
# (ICD10 descendants are not pulled in this quick version)

# CPT UMLS CUI 
# CPT UMLS CUI:CPT Code Transformation
# (CPT descendants are not pulled in this quick version)

# LNC UMLS CUI 
# LNC UMLS CUI:LNC Code Transformation
# (LNC descendants are not pulled in this quick version)

In [3]:
## DO NOT CHANGE BELOW THIS LINE ##
import requests 
import argparse
import numpy as np
import pandas as pd
version = 'current'

# Replace spaces with underscores so each search term can be placed in a URL
names = [list_item.replace(" ", "_") for list_item in string_list]

In [4]:
# Keep in mind this pulls the CUI code for SNOMEDCT from UMLS
# You will need to convert these CUI codes from UMLS codes into their associated SNOMEDCT, ICD10, LNC, CPT, etc codes

code = []
name = [] 
vocab_type = []

for x in np.arange(0, len(string_list),1):
    string = str(string_list[x])
    uri = "https://uts-ws.nlm.nih.gov"
    content_endpoint = "/rest/search/"+version
    full_url = uri+content_endpoint
    page = 0

    try:
        while True:
            page += 1
            query = {'string':string,'apiKey':apikey, 'pageNumber':page}
            query['includeObsolete'] = 'true'
            #query['includeSuppressible'] = 'true'
            #query['returnIdType'] = "sourceConcept"
            query['sabs'] = "SNOMEDCT_US"
            r = requests.get(full_url,params=query)
            r.raise_for_status()
            r.encoding = 'utf-8'
            outputs  = r.json()

            items = (([outputs['result']])[0])['results']

            if len(items) == 0:
                if page == 1:
                    #print('No results found.'+'\n')
                    break
                else:
                    break

            #print("Results for page " + str(page)+"\n")

            for result in items:
                code.append(result['ui'])
                name.append(result['name'])
                vocab_type.append(result['rootSource'])

    except Exception as except_error:
        print(except_error)
        
snomed_df = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": vocab_type, "Code Value": code, "Code Description": name})
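
The same paging loop is repeated for SNOMEDCT_US, ICD10, CPT, and LNC. As a hedged sketch of how it could be factored out, the block below separates the loop from the request. Both function names are hypothetical; the endpoint, the `'current'` version, and the query parameters are copied from the cell above, and `requests` is imported inside the function only so that the paging helper can be tried without a network connection.

```python
from typing import Callable, Dict, List


def collect_all_pages(fetch_page: Callable[[int], List[Dict]]) -> List[Dict]:
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    collected = []
    page = 1
    while True:
        items = fetch_page(page)
        if not items:
            break
        collected.extend(items)
        page += 1
    return collected


def umls_search_page(term: str, sab: str, apikey: str, page: int) -> List[Dict]:
    """Fetch one page of UMLS search results for `term`, limited to vocabulary `sab`."""
    import requests  # imported here so collect_all_pages works without it

    query = {"string": term, "apiKey": apikey, "sabs": sab,
             "includeObsolete": "true", "pageNumber": page}
    r = requests.get("https://uts-ws.nlm.nih.gov/rest/search/current", params=query)
    r.raise_for_status()
    return r.json()["result"]["results"]
```

With these two pieces, a call such as `collect_all_pages(lambda p: umls_search_page("Asthma", "SNOMEDCT_US", apikey, p))` would return the same result dictionaries that the loop above appends to `code`, `name`, and `vocab_type`.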

In [5]:
# Converts the SNOMED-CT CUI Codes from the chunk above into SNOMEDCT_US Codes
base_uri = 'https://uts-ws.nlm.nih.gov'
cui_list = snomed_df["Code Value"]

sabs = 'SNOMEDCT_US'
SNOMEDCT_name = []
SNOMEDCT_code = []
SNOMEDCT_root = []

for cui in cui_list:
        page = 0
        
        # o.write('SEARCH CUI: ' + cui + '\n' + '\n')
        
        while True:
            page += 1
            path = '/rest/search/'+version
            query = {'apiKey':apikey, 'string':cui, 'sabs':sabs, 'returnIdType':'code', 'pageNumber':page}
            output = requests.get(base_uri+path, params=query)
            output.encoding = 'utf-8'
            #print(output.url)
        
            outputJson = output.json()
        
            results = (([outputJson['result']])[0])['results']
            
            if len(results) == 0:
                if page == 1:
                    #print('No results found for ' + cui +'\n')
                    # o.write('No results found.' + '\n' + '\n')
                    break
                else:
                    break
                    
            for item in results:
                SNOMEDCT_code.append(item['ui'])
                SNOMEDCT_name.append(item['name'])
                SNOMEDCT_root.append(item['rootSource'])

snomed_trans_df = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": SNOMEDCT_root, "Code Value": SNOMEDCT_code, "Code Description": SNOMEDCT_name})

In [6]:
# Get ICD10 codes from a second API (NLM Clinical Table Search Service)
# This is because the UMLS search alone returns incomplete ICD10 results

clin_table_ICD10_code = []
clin_table_ICD10_name = []

for x in np.arange(0, len(names),1):
    value = names[x]
    URL = f"https://clinicaltables.nlm.nih.gov/api/icd10cm/v3/search?sf=code,name&terms={value}&maxList=500"
    response = requests.get(URL)
    variable = response.json()
    
    # variable[3] holds one [code, name] pair per match
    for row in variable[3]:
        clin_table_ICD10_code.append(row[0])
        clin_table_ICD10_name.append(row[1])
        
clin_table_test_pd = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": "ICD10", "Code Value": clin_table_ICD10_code, "Code Description": clin_table_ICD10_name}).drop_duplicates().reset_index(drop=True)
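
The cell above places each search term directly into the URL after swapping spaces for underscores. An alternative, sketched below under stated assumptions, is to percent-encode the original term and to split the `[code, name]` rows in one helper. `build_icd10cm_url` and `split_code_rows` are hypothetical names; the base URL and the `sf`, `terms`, and `maxList` parameters are taken from the cell above. Whether the service matches a term with spaces the same way as one with underscores is not stated in this notebook, so results may differ.

```python
from urllib.parse import urlencode

BASE_URL = "https://clinicaltables.nlm.nih.gov/api/icd10cm/v3/search"


def build_icd10cm_url(term, max_list=500):
    """Build the search URL with the term percent-encoded (hypothetical helper)."""
    params = {"sf": "code,name", "terms": term, "maxList": max_list}
    return BASE_URL + "?" + urlencode(params)


def split_code_rows(payload):
    """Split the fourth element of the response into parallel code and name lists."""
    rows = payload[3]
    codes = [row[0] for row in rows]
    names = [row[1] for row in rows]
    return codes, names


# Payload shaped like the response indexed in the cell above (variable[3]),
# filled with two rows from the ICD10 output shown in this notebook
sample = [2, ["J45.998", "J82.83"], None,
          [["J45.998", "Other asthma"], ["J82.83", "Eosinophilic asthma"]]]
print(build_icd10cm_url("with Asthma"))
print(split_code_rows(sample))
```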

In [7]:
# Keep in mind this pulls the CUI code for UMLS
# You will need to convert these CUI codes from UMLS codes into their associated SNOMEDCT, ICD10, LNC, CPT, etc codes

code_2 = []
name_2 = [] 
vocab_type_2 = []

for x in np.arange(0, len(string_list),1):
    string = str(string_list[x])
    uri = "https://uts-ws.nlm.nih.gov"
    content_endpoint = "/rest/search/"+version
    full_url = uri+content_endpoint
    page = 0

    try:
        while True:
            page += 1
            query = {'string':string,'apiKey':apikey, 'pageNumber':page}
            query['includeObsolete'] = 'true'
            #query['includeSuppressible'] = 'true'
            #query['returnIdType'] = "sourceConcept"
            query['sabs'] = "ICD10"
            r = requests.get(full_url,params=query)
            r.raise_for_status()
            r.encoding = 'utf-8'
            outputs  = r.json()

            items = (([outputs['result']])[0])['results']

            if len(items) == 0:
                if page == 1:
                    #print('No results found.'+'\n')
                    break
                else:
                    break

            #print("Results for page " + str(page)+"\n")

            for result in items:
                code_2.append(result['ui'])
                name_2.append(result['name'])
                vocab_type_2.append(result['rootSource'])

    except Exception as except_error:
        print(except_error)
        
icd_df = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": vocab_type_2, "Code Value": code_2, "Code Description": name_2})

In [8]:
# Converts the ICD10 CUI Codes from the chunk above into ICD10 Codes
base_uri = 'https://uts-ws.nlm.nih.gov'
cui_list = icd_df["Code Value"]

sabs = 'ICD10'
ICD10_name = []
ICD10_code = []
ICD10_root = []

for cui in cui_list:
        page = 0
        
        # o.write('SEARCH CUI: ' + cui + '\n' + '\n')
        
        while True:
            page += 1
            path = '/rest/search/'+version
            query = {'apiKey':apikey, 'string':cui, 'sabs':sabs, 'returnIdType':'code', 'pageNumber':page}
            output = requests.get(base_uri+path, params=query)
            output.encoding = 'utf-8'
            #print(output.url)
        
            outputJson = output.json()
        
            results = (([outputJson['result']])[0])['results']
            
            if len(results) == 0:
                if page == 1:
                    #print('No results found for ' + cui +'\n')
                    # o.write('No results found.' + '\n' + '\n')
                    break
                else:
                    break
                    
            for item in results:
                ICD10_code.append(item['ui'])
                ICD10_name.append(item['name'])
                ICD10_root.append(item['rootSource'])

icd10_trans_df = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": ICD10_root, "Code Value": ICD10_code, "Code Description": ICD10_name})

In [9]:
# Combine the ICD10 pull from a different API and merge it with the UMLS pull 
ICD10_full = pd.concat([clin_table_test_pd, icd10_trans_df.loc[:]]).drop_duplicates().reset_index(drop=True)
ICD10_full

|   | Data Concept | Data Sub-Concept | Coding Standard | Code Value | Code Description |
|---|---|---|---|---|---|
| 0 | Diagnosis Code | | ICD10 | J45.998 | Other asthma |
| 1 | Diagnosis Code | | ICD10 | J82.83 | Eosinophilic asthma |
| 2 | Diagnosis Code | | ICD10 | J45.909 | Unspecified asthma, uncomplicated |
| 3 | Diagnosis Code | | ICD10 | J45.991 | Cough variant asthma |
| 4 | Diagnosis Code | | ICD10 | J45.20 | Mild intermittent asthma, uncomplicated |
| 5 | Diagnosis Code | | ICD10 | J45.30 | Mild persistent asthma, uncomplicated |
| 6 | Diagnosis Code | | ICD10 | J45.40 | Moderate persistent asthma, uncomplicated |
| 7 | Diagnosis Code | | ICD10 | J45.50 | Severe persistent asthma, uncomplicated |
| 8 | Diagnosis Code | | ICD10 | J45.901 | Unspecified asthma with (acute) exacerbation |
| 9 | Diagnosis Code | | ICD10 | J45.902 | Unspecified asthma with status asthmaticus |


In [10]:
# Combine the transformed SNOMEDCT_US and ICD10 DataFrames
SNOMEDCT_ICD10_trans_decend = pd.concat([snomed_trans_df, ICD10_full.loc[:]]).drop_duplicates().reset_index(drop=True)

In [11]:
# Keep in mind this pulls the CUI code for UMLS
# You will need to convert these CUI codes from UMLS codes into their associated SNOMEDCT, ICD10, LNC, CPT, etc codes

code_3 = []
name_3 = [] 
vocab_type_3 = []

for x in np.arange(0, len(string_list),1):
    string = str(string_list[x])
    uri = "https://uts-ws.nlm.nih.gov"
    content_endpoint = "/rest/search/"+version
    full_url = uri+content_endpoint
    page = 0

    try:
        while True:
            page += 1
            query = {'string':string,'apiKey':apikey, 'pageNumber':page}
            query['includeObsolete'] = 'true'
            #query['includeSuppressible'] = 'true'
            #query['returnIdType'] = "sourceConcept"
            query['sabs'] = "CPT"
            r = requests.get(full_url,params=query)
            r.raise_for_status()
            r.encoding = 'utf-8'
            outputs  = r.json()

            items = (([outputs['result']])[0])['results']

            if len(items) == 0:
                if page == 1:
                    #print('No results found.'+'\n')
                    break
                else:
                    break

            #print("Results for page " + str(page)+"\n")

            for result in items:
                code_3.append(result['ui'])
                name_3.append(result['name'])
                vocab_type_3.append(result['rootSource'])

    except Exception as except_error:
        print(except_error)
        
cpt_df = pd.DataFrame({"Data Concept": "Procedure Code", "Data Sub-Concept": "N/A", "Coding Standard": vocab_type_3, "Code Value": code_3, "Code Description": name_3})

In [12]:
# Converts the CPT CUI Codes from the chunk above into CPT Codes
base_uri = 'https://uts-ws.nlm.nih.gov'
cui_list = cpt_df["Code Value"]

sabs = 'CPT'
CPT_name = []
CPT_code = []
CPT_root = []

for cui in cui_list:
        page = 0
        
        # o.write('SEARCH CUI: ' + cui + '\n' + '\n')
        
        while True:
            page += 1
            path = '/rest/search/'+version
            query = {'apiKey':apikey, 'string':cui, 'sabs':sabs, 'returnIdType':'code', 'pageNumber':page}
            output = requests.get(base_uri+path, params=query)
            output.encoding = 'utf-8'
            #print(output.url)
        
            outputJson = output.json()
        
            results = (([outputJson['result']])[0])['results']
            
            if len(results) == 0:
                if page == 1:
                    #print('No results found for ' + cui +'\n')
                    # o.write('No results found.' + '\n' + '\n')
                    break
                else:
                    break
                    
            for item in results:
                CPT_code.append(item['ui'])
                CPT_name.append(item['name'])
                CPT_root.append(item['rootSource'])

CPT_trans_df = pd.DataFrame({"Data Concept": "Procedure Code", "Data Sub-Concept": "N/A", "Coding Standard": CPT_root, "Code Value": CPT_code, "Code Description": CPT_name})

In [13]:
# Keep in mind this pulls the CUI code for UMLS
# You will need to convert these CUI codes from UMLS codes into their associated SNOMEDCT, ICD10, LNC, CPT, etc codes

code_4 = []
name_4 = [] 
vocab_type_4 = []

for x in np.arange(0, len(string_list),1):
    string = str(string_list[x])
    uri = "https://uts-ws.nlm.nih.gov"
    content_endpoint = "/rest/search/"+version
    full_url = uri+content_endpoint
    page = 0

    try:
        while True:
            page += 1
            query = {'string':string,'apiKey':apikey, 'pageNumber':page}
            query['includeObsolete'] = 'true'
            #query['includeSuppressible'] = 'true'
            #query['returnIdType'] = "sourceConcept"
            query['sabs'] = "LNC"
            r = requests.get(full_url,params=query)
            r.raise_for_status()
            r.encoding = 'utf-8'
            outputs  = r.json()

            items = (([outputs['result']])[0])['results']

            if len(items) == 0:
                if page == 1:
                    #print('No results found.'+'\n')
                    break
                else:
                    break

            #print("Results for page " + str(page)+"\n")

            for result in items:
                code_4.append(result['ui'])
                name_4.append(result['name'])
                vocab_type_4.append(result['rootSource'])

    except Exception as except_error:
        print(except_error)
        
loinc_df = pd.DataFrame({"Data Concept": "Observation Code", "Data Sub-Concept": "N/A", "Coding Standard": vocab_type_4, "Code Value": code_4, "Code Description": name_4})

In [14]:
# Converts the LOINC CUI codes from the chunk above into LOINC

base_uri = 'https://uts-ws.nlm.nih.gov'
cui_list = loinc_df["Code Value"]

sabs = 'LNC'
LOINC_name = []
LOINC_code = []
LOINC_root = []

for cui in cui_list:
        page = 0
        
        # o.write('SEARCH CUI: ' + cui + '\n' + '\n')
        
        while True:
            page += 1
            path = '/rest/search/'+version
            query = {'apiKey':apikey, 'string':cui, 'sabs':sabs, 'returnIdType':'code', 'pageNumber':page}
            output = requests.get(base_uri+path, params=query)
            output.encoding = 'utf-8'
            #print(output.url)
        
            outputJson = output.json()
        
            results = (([outputJson['result']])[0])['results']
            
            if len(results) == 0:
                if page == 1:
                    #print('No results found for ' + cui +'\n')
                    # o.write('No results found.' + '\n' + '\n')
                    break
                else:
                    break
                    
            for item in results:
                LOINC_code.append(item['ui'])
                LOINC_name.append(item['name'])
                LOINC_root.append(item['rootSource'])

loinc_trans_df = pd.DataFrame({"Data Concept": "Observation Code", "Data Sub-Concept": "N/A", "Coding Standard": LOINC_root, "Code Value": LOINC_code, "Code Description": LOINC_name})

In [15]:
# Combine the transformed CPT and LOINC DataFrames
CPT_LOINC_trans_decend = pd.concat([loinc_trans_df, CPT_trans_df.loc[:]]).drop_duplicates().reset_index(drop=True)

In [16]:
# Combine the transformed SNOMEDCT, ICD10, CPT, and LOINC DataFrames together
SNOMEDCT_ICD10_CPT_LOINC_trans_df = pd.concat([CPT_LOINC_trans_decend, SNOMEDCT_ICD10_trans_decend.loc[:]]).drop_duplicates().reset_index(drop=True)

In [25]:
## Uncomment to view dataframe
# SNOMEDCT_ICD10_CPT_LOINC_trans_df

|   | Data Concept | Data Sub-Concept | Coding Standard | Code Value | Code Description |
|---|---|---|---|---|---|
| 182 | Procedure Code | | CPT | 2016F | Asthma risk assessed (Asthma) |
| 183 | Procedure Code | | CPT | 2015F | Asthma impairment assessed (Asthma) |
| 184 | Procedure Code | | CPT | 5250F | Asthma discharge plan provided to patient (Ast... |
| 185 | Procedure Code | | CPT | 1038F | Persistent asthma (mild, moderate or severe) (... |
| 186 | Procedure Code | | CPT | 1039F | Intermittent asthma (Asthma) |
| 187 | Procedure Code | | CPT | 1005F | Asthma symptoms evaluated (includes documentat... |
| 188 | Procedure Code | | CPT | 4140F | Inhaled corticosteroids prescribed (Asthma) |
| 189 | Procedure Code | | CPT | 4144F | Alternative long-term control medication presc... |
| 190 | Procedure Code | | CPT | 4000F | Tobacco use cessation intervention, counseling... |
| 191 | Procedure Code | | CPT | 1032F | Current tobacco smoker or currently exposed to... |


In [None]:
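excel_name = f"{Excel_Sheet_Name}.xlsx"

# Write the combined DataFrame built above to Excel
# (the original referenced an undefined name ending in "_trans_decend")
SNOMEDCT_ICD10_CPT_LOINC_trans_df.to_excel(excel_name)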