# UMLS SNOMEDCT_US Quick Pull 
This notebook pulls the SNOMEDCT_US codes for the given parent concept(s) and their children from UMLS, then exports them as an Excel file.

Before you can use this script, you must create an account with UMLS on the [UMLS Terminology Services website](https://uts.nlm.nih.gov/uts/profile). After your account has been approved, navigate to your [profile](https://uts.nlm.nih.gov/uts/profile) and find your API key. The script cannot run without it.

This script casts a wide net and will frequently return more codes than required, some of which may not be directly relevant to the diagnosis. The analyst should review the returned data for accuracy against their specific diagnosis.

## Notes and Limitations  
- If you run the code again without changing the Excel_Sheet_Name variable, your existing Excel file will be overwritten with the results for the current string_list.
- To ensure that all desired codes are returned, include the type of procedure associated with the diagnosis in the string_list search terms.
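
One way to guard against the overwrite described in the first note is to pick a file name that is not yet taken before exporting. The sketch below is only an illustration: `next_free_name` is a hypothetical helper that is not part of the notebook, and the " (2)", " (3)" suffix scheme is an assumption.

```python
from pathlib import Path


def next_free_name(sheet_name, folder="."):
    """Return sheet_name, or sheet_name plus a numeric suffix, such that
    '<name>.xlsx' does not already exist in folder.

    Hypothetical helper, not part of the notebook.
    """
    candidate = sheet_name
    counter = 1
    # Keep trying "name (2)", "name (3)", ... until the file name is unused
    while (Path(folder) / f"{candidate}.xlsx").exists():
        counter += 1
        candidate = f"{sheet_name} ({counter})"
    return candidate
```

It could be used when setting the input, for example `Excel_Sheet_Name = next_free_name("Asthma SNOMED-CT Sheet")`, since the notebook appends ".xlsx" to that name itself.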

## Variable Descriptions
- "apikey" 
    - This must be entered as a string.
    - example: apikey = '123a4b56-7c8d-9d12-e3fg-4h5i67j89k0d'
    - You can find it under "My Profile" on the [UMLS website](https://uts.nlm.nih.gov/uts/profile)
- "string_list" 
    - This must be entered as a list of strings. Below is an example of the proper structure:
    - ["apple","orange","cat","puppy"]
- "Excel_Sheet_Name" 
    - This must be entered as a string. It names the output Excel file; the .xlsx extension is added automatically.
    - The output places each code into 5 columns: "Data Concept", "Data Sub-Concept", "Coding Standard", "Code Value", "Code Description", and filters out repeats between multiple parent inputs. For example, if the two parents you are looking up are 'Pneumonitis (disorder)' and 'Pneumonia (disorder)', any code returned for both will appear only once in the final Excel sheet.
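
The repeat filtering described for the output can be pictured with a small plain-Python sketch. The notebook itself does this with pandas (`pd.concat(...).drop_duplicates()`). The helper name `merge_without_repeats` and the rows below are made up for illustration and are not real SNOMEDCT_US codes.

```python
def merge_without_repeats(*row_lists):
    """Combine several lists of rows, keeping only the first copy of each row."""
    seen = set()
    merged = []
    for rows in row_lists:
        for row in rows:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged


# Placeholder rows: (Coding Standard, Code Value, Code Description)
pneumonitis_rows = [
    ("SNOMEDCT_US", "0001", "Example concept A"),
    ("SNOMEDCT_US", "0002", "Example concept B"),
]
pneumonia_rows = [
    ("SNOMEDCT_US", "0002", "Example concept B"),  # also returned for the first parent
    ("SNOMEDCT_US", "0003", "Example concept C"),
]

merged = merge_without_repeats(pneumonitis_rows, pneumonia_rows)
```

Here `merged` holds three rows: the row shared by both parents is kept once rather than dropped entirely.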

## Common Errors
- No common errors have been reported so far. If you run into an error with this script, please contact me at alyssa.warnock@amida.com and I will do my best to help troubleshoot it.
    
## Steps to Use: 
1. Make sure you have an active account with [UMLS](https://uts.nlm.nih.gov/uts/umls/home).
2. Change the apikey string to your personal API key from your UMLS profile page.
3. Change the string_list list to reflect the data you want to pull.
    - **IMPORTANT:** Do not change the variable names, and keep the data in the list-of-strings format.
4. Change the Excel_Sheet_Name string to the name you want for the output Excel sheet.
5. Run the notebook using the double-triangle button in the toolbar, located between and below the "Kernel" and "Widgets" menus.
6. Click "Restart and run all cells".
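
Before running everything, the three inputs can be sanity-checked against the formats described above. The following is a minimal sketch under stated assumptions: `check_inputs` is a hypothetical helper that is not part of the notebook, and the only rules it encodes are the ones stated in this document (strings for apikey and Excel_Sheet_Name, a list of strings for string_list).

```python
def check_inputs(apikey, string_list, excel_sheet_name):
    """Return a list of problems found; an empty list means the inputs look usable.

    Hypothetical helper, not part of the notebook.
    """
    problems = []
    # 'YOUR API KEY HERE' is the placeholder value shipped in the input cell
    if not isinstance(apikey, str) or not apikey.strip() or apikey == 'YOUR API KEY HERE':
        problems.append("apikey must be a string containing your UMLS API key")
    if not isinstance(string_list, list) or not string_list:
        problems.append("string_list must be a non-empty list")
    elif not all(isinstance(s, str) and s.strip() for s in string_list):
        problems.append("every entry in string_list must be a non-empty string")
    if not isinstance(excel_sheet_name, str) or not excel_sheet_name.strip():
        problems.append("Excel_Sheet_Name must be a non-empty string")
    return problems
```

Calling `check_inputs(apikey, string_list, Excel_Sheet_Name)` right after the input cell and printing the result would surface a forgotten API key or a bare string passed in place of a list before any request is sent.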

In [None]:
## CHANGE INPUTS HERE ##
apikey = 'YOUR API KEY HERE'
string_list = ["Asthma"]
Excel_Sheet_Name = "Asthma SNOMED-CT Sheet"

In [None]:
## DO NOT CHANGE BELOW THIS LINE ##
import requests
import numpy as np
import pandas as pd

# UMLS release to query; 'current' selects the latest release
version = 'current'

In [None]:
# Search UMLS for each term in string_list, restricted to the SNOMEDCT_US vocabulary
# Keep in mind that this returns UMLS CUIs, not SNOMEDCT_US codes
# The CUIs still need to be converted into their associated SNOMEDCT, ICD10, LNC, CPT, etc. codes

code = []
name = []
vocab_type = []

uri = "https://uts-ws.nlm.nih.gov"
content_endpoint = "/rest/search/" + version
full_url = uri + content_endpoint

for string in string_list:
    string = str(string)
    page = 0

    try:
        while True:
            page += 1
            query = {'string': string, 'apiKey': apikey, 'pageNumber': page}
            query['includeObsolete'] = 'true'
            #query['includeSuppressible'] = 'true'
            #query['returnIdType'] = "sourceConcept"
            query['sabs'] = "SNOMEDCT_US"
            r = requests.get(full_url, params=query)
            r.raise_for_status()
            r.encoding = 'utf-8'
            outputs = r.json()

            items = outputs['result']['results']

            # An empty page means there are no (more) results for this term
            if len(items) == 0:
                break

            for result in items:
                code.append(result['ui'])
                name.append(result['name'])
                vocab_type.append(result['rootSource'])

    except Exception as except_error:
        print(except_error)

snomed_df = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": vocab_type, "Code Value": code, "Code Description": name})
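
The paging pattern used in the cell above (request page 1, 2, 3, ... and stop at the first empty page) recurs in the later cells. As a sketch only, it could be factored into a generator. `iter_results` and `fetch_page` are hypothetical names, and the fake pages and identifiers below are made-up stand-ins for the real HTTP call so that the example runs without network access.

```python
def iter_results(fetch_page):
    """Yield every result, requesting pages 1, 2, 3, ... until a page comes back empty.

    fetch_page is a hypothetical callable: page number -> list of result dicts.
    """
    page = 0
    while True:
        page += 1
        items = fetch_page(page)
        if not items:
            return
        yield from items


# Made-up pages standing in for API responses (placeholder identifiers)
fake_pages = {
    1: [{"ui": "C0000001"}, {"ui": "C0000002"}],
    2: [{"ui": "C0000003"}],
}
found = [item["ui"] for item in iter_results(lambda page: fake_pages.get(page, []))]
```

In the notebook, `fetch_page` would wrap the `requests.get(...)` call and return `r.json()['result']['results']` for the given page number.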

In [None]:
# Converts the UMLS CUIs from the cell above into SNOMEDCT_US codes
# The UMLS REST API is served under /rest, so it is included in the base URI
base_uri = 'https://uts-ws.nlm.nih.gov/rest'
cui_list = snomed_df["Code Value"]

sabs = 'SNOMEDCT_US'
SNOMEDCT_name = []
SNOMEDCT_code = []
SNOMEDCT_root = []

path = '/search/' + version

for cui in cui_list:
    page = 0

    while True:
        page += 1
        # returnIdType='code' asks for source codes rather than CUIs
        query = {'apiKey': apikey, 'string': cui, 'sabs': sabs, 'returnIdType': 'code', 'pageNumber': page}
        output = requests.get(base_uri + path, params=query)
        output.encoding = 'utf-8'
        #print(output.url)

        outputJson = output.json()

        results = outputJson['result']['results']

        # An empty page means there are no (more) results for this CUI
        if len(results) == 0:
            break

        for item in results:
            SNOMEDCT_code.append(item['ui'])
            SNOMEDCT_name.append(item['name'])
            SNOMEDCT_root.append(item['rootSource'])

snomed_trans_df = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": SNOMEDCT_root, "Code Value": SNOMEDCT_code, "Code Description": SNOMEDCT_name})

In [None]:
# Get the children of each SNOMEDCT_US code
# This works, but it takes a while to run
# For more results, change 'children' to 'descendants'. This runs much slower but returns more SNOMEDCT_US values

decend_names = []
decend_values = []
decend_root = []

source = 'SNOMEDCT_US'
operation = 'children'
uri = "https://uts-ws.nlm.nih.gov"

for snomed_code in SNOMEDCT_code:
    identifier = str(snomed_code)
    content_endpoint = "/rest/content/" + version + "/source/" + source + "/" + identifier + "/" + operation

    pageNumber = 0

    try:
        while True:
            pageNumber += 1
            query = {'apiKey': apikey, 'pageNumber': pageNumber}
            r = requests.get(uri + content_endpoint, params=query)
            r.encoding = 'utf-8'

            # Treat any non-200 response as the end of the results for this code
            if r.status_code != 200:
                break

            items = r.json()

            for result in items["result"]:
                decend_values.append(result["ui"])
                decend_names.append(result["name"])
                decend_root.append(result["rootSource"])

    except Exception as except_error:
        print(except_error)

SNOMED_decend = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": decend_root, "Code Value": decend_values, "Code Description": decend_names})
# Combine the children with the parent codes and drop exact duplicate rows
SNOMED_trans_decend = pd.concat([SNOMED_decend, snomed_trans_df]).drop_duplicates().reset_index(drop=True)
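
The comment at the top of the cell above mentions switching 'children' to 'descendants'. A small sketch of how the endpoint for either operation is put together, following the same path layout the cell uses, is shown here. `hierarchy_url` is a hypothetical helper, "12345" is a placeholder identifier rather than a real code, and limiting the operation to these two values is a choice made for this example only.

```python
from urllib.parse import quote

BASE_URI = "https://uts-ws.nlm.nih.gov"


def hierarchy_url(identifier, operation="children", source="SNOMEDCT_US", version="current"):
    """Build the content endpoint URL for a source code's children or descendants.

    Hypothetical helper, not part of the notebook.
    """
    # Only the two operations discussed in this notebook are accepted here
    if operation not in ("children", "descendants"):
        raise ValueError("operation must be 'children' or 'descendants'")
    safe_id = quote(str(identifier), safe="")
    return f"{BASE_URI}/rest/content/{version}/source/{source}/{safe_id}/{operation}"


children_url = hierarchy_url("12345")                    # placeholder identifier
descendants_url = hierarchy_url("12345", "descendants")  # slower, returns more values
```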

In [None]:
excel_name = f'{Excel_Sheet_Name}.xlsx'

SNOMED_trans_decend.to_excel(excel_name)