# UMLS RxNorm Data Quick Pull 
This notebook pulls all RxNorm codes available in UMLS for the given generic or brand-name medications and exports them as an Excel file.

Before you can use this script, you must create an account with UMLS. The website can be found [here](https://uts.nlm.nih.gov/uts/profile). After your account has been approved, navigate to your [profile](https://uts.nlm.nih.gov/uts/profile) and find your API key. The script cannot run without it.

This script is very thorough and will frequently return more codes than required, particularly if you are searching for generic drugs. Some codes returned may not be directly relevant to the diagnosis. The analyst should review the returned data for accuracy with respect to their specific diagnosis.

## Notes, Limitations, and Recommendations   
- If you run the code again without changing the Excel_Sheet_Name variable, the script will overwrite your existing file with the results for the current string_list.
- Include the generic names of drugs in the search to improve the quality of the data returned.
- Depending on the length of the string_list variable, this script can take up to 20 minutes to run. This is normal. Leave the script running and come back to it later. If the run time exceeds 30 minutes, try running two or more smaller queries (remember to change the Excel_Sheet_Name variable to avoid overwriting your previous output).
- The RxNorm code for some brand names may not be returned on its own, even though the medications sold under that brand are returned.
    - Example: the script returns "14 ACTUAT Arnuity 0.2 MG/ACTUAT Dry Powder Inhaler" but not the RxNorm code for "Arnuity" alone.
- Include variations of the medication name.
    - Example: if you want to search for Advair, include "Advair", "Advair Diskus", "Advair HFA"
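
The advice above about splitting a long search into smaller queries can be sketched as follows. This is a minimal illustration and not part of the original script: the helper `chunked`, the batch size of 3, and the numbered sheet-name pattern are all assumptions made for the example.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical example values; substitute your own string_list and sheet name.
string_list = ["Advair", "Advair Diskus", "Advair HFA", "Fluticasone", "Salmeterol"]
Excel_Sheet_Name = "Asthma RxNorm Sheet"

batches = list(chunked(string_list, 3))

# Give every batch its own output name so that no run overwrites another.
batch_names = [f"{Excel_Sheet_Name} part {i}" for i in range(1, len(batches) + 1)]

for name, batch in zip(batch_names, batches):
    print(name, batch)
```

Each batch can then be run as its own query, with the matching name assigned to Excel_Sheet_Name.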

## Variable Descriptions
- "apikey" 
    - This must be entered as a string.
    - Example: apikey = '123a4b56-7c8d-9d12-e3fg-4h5i67j89k0d'
    - You can find it under "My Profile" on the [UMLS website](https://uts.nlm.nih.gov/uts/profile)
- "string_list" 
    - This must be entered as a list of strings. Below is an example of the proper structure:
    - ["apple","orange","cat","puppy"]
- "Excel_Sheet_Name" 
    - This must be entered as a string. It is the name of the output Excel file; the script appends ".xlsx" to it.
    - The output lists each code in 5 columns: "Data Concept", "Data Sub-Concept", "Coding Standard", "Code Value", "Code Description". Repeats between multiple search terms are filtered out. This means that if you search for both "Advair" and "Advair Diskus", results returned by both terms are kept only once.
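
The duplicate filtering described above can be illustrated in plain Python. The script itself relies on pandas `drop_duplicates`; this standard-library sketch only shows the idea, and the code values and drug names in it are made up for the example.

```python
# Column order used in the output sheet.
COLUMNS = ("Data Concept", "Data Sub-Concept", "Coding Standard",
           "Code Value", "Code Description")

# Hypothetical rows returned by two overlapping search terms.
rows = [
    ("RxNorm Code", "N/A", "RXNORM", "0001", "drug A"),
    ("RxNorm Code", "N/A", "RXNORM", "0002", "drug B"),
    ("RxNorm Code", "N/A", "RXNORM", "0001", "drug A"),  # repeat from the second term
]

# dict.fromkeys keeps the first occurrence of each row and preserves order.
unique_rows = list(dict.fromkeys(rows))

for row in unique_rows:
    print(dict(zip(COLUMNS, row)))
```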

## Common Errors
- No current common errors have been reported. If you run into an error with this script, please contact me at alyssa.warnock@amida.com and I will do my best to help troubleshoot any issues with the code.  
    
## Steps to Use: 
1. Make sure you have an account active with [UMLS](https://uts.nlm.nih.gov/uts/umls/home)
2. Change the apikey string to your personal API key from your UMLS profile page.
3. Change the string_list list to reflect the data you want to pull.
    - **IMPORTANT:** Do not change variable names, and keep the data in the list-of-strings format.
4. Change the Excel_Sheet_Name string to the name you want for the output Excel file.
5. Run the notebook using the double-triangle button in the toolbar, located below and between the "Kernel" and "Widgets" menus.
6. Click "Restart and Run All Cells".
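
Before running all cells, it can help to confirm that the three inputs have the expected shape. The check below is a hedged sketch rather than part of the original script; the function name `check_inputs` and its messages are hypothetical.

```python
def check_inputs(apikey, string_list, excel_sheet_name):
    """Return a list of problems found in the three user inputs (empty if none)."""
    problems = []
    # The placeholder text means the key was never filled in.
    if not isinstance(apikey, str) or not apikey.strip() or apikey == 'YOUR API KEY HERE':
        problems.append("apikey must be your UMLS API key as a string")
    if not isinstance(string_list, list) or not string_list:
        problems.append("string_list must be a non-empty list")
    elif not all(isinstance(s, str) and s.strip() for s in string_list):
        problems.append("every entry in string_list must be a non-empty string")
    if not isinstance(excel_sheet_name, str) or not excel_sheet_name.strip():
        problems.append("Excel_Sheet_Name must be a non-empty string")
    return problems

print(check_inputs('YOUR API KEY HERE', ["Advair", "Advair HFA"], "Asthma RxNorm Sheet"))
```

An empty list means the inputs look usable; it does not confirm that the API key itself is valid.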

In [2]:
## CHANGE INPUTS HERE ##
apikey = 'YOUR API KEY HERE'
string_list = ["Prednisone Intensol", "Adrenalin", "Ventolin HFA", "Xopenex", "Accolate", "Advair Diskus", "Advair", "Aerospan HFA",
               "Alvesco", "Asmanex Twisthaler", "Breo Ellipta", "Cinqair", "Dulera", "Dupixent", "Fasenra", "Flovent HFA", 
               "Flovent", "Nucala", "Pulmicort Flexhaler", "QVAR RediHaler", "Serevent Diskus", "Serevent", "Singulair", "Spiriva Respimat", 
               "Symbicort", "Trelegy Ellipta", "Xolair", "Zyflo", "Rayos", "Auvi-Q", "Proventil HFA", "Proventil", "Xopenex HFA", "Xopenex", "Advair HFA", 
               "Asmanex HFA", "Asmanex", "Flovent Diskus", "Pulmicort Respules", "Zyflo CR", "Medrol", "Epipen 2-Pak", "Epipen", "Proair HFA", "Proair",
               "Xopenex Concentrate", "Millipred", "EpiPen Jr 2-Pak", "ProAir RespiClick", "Orapred", "Symjepi", "ODT", 
               "AirDuo RespiClick", "AirDuo", "Pediapred", "Wixela Inhub", "ArmonAir RespiClick", "ArmonAir", "Arnuity Ellipta", "Prednisone", "Epinephrine", "Albuterol", "Levalbuterol", "Zafirlukast", "Fluticasone", "Flunisolide", "Ciclesonide", 
               "Reslizumab", "Mometasone", "Dupilumab", "Benralizumab", "Fluticasone", "Mepolizumab", "Budesonide", "Beclomethasone", 
               "Montelukast", "Tiotropium", "Budesonide", "Fluticasone Furoate", "Omalizumab", "Zileuton", "methylprednisolone", 
               "prednisolone", "Salmeterol", "Formoterol", "Umeclidinium", "Vilanterol"]
Excel_Sheet_Name = "Asthma RxNorm Sheet"

In [1]:
## DO NOT CHANGE CODE BELOW THIS LINE ## 
import requests
import numpy as np
import pandas as pd

# UMLS release to query; 'current' selects the latest release
version = 'current'

In [5]:
# Search UMLS for each term and collect the matching concepts.
# The identifiers returned here are UMLS CUIs; the next cell converts them into RxNorm codes.
ui_code_RxNorm = []
rootSource_RxNorm = []
name_RxNorm = []

for value in string_list:
    URL = f"https://uts-ws.nlm.nih.gov/rest/search/{version}"
    # Passing params lets requests URL-encode terms that contain spaces
    query = {'apiKey': apikey, 'string': value, 'sabs': 'RXNORM', 'returnIdType': 'concept', 'pageSize': 2000}
    response = requests.get(URL, params=query)
    variable = response.json()

    # Skip terms whose response carries no result block (for example, an error response)
    if 'result' not in variable:
        continue

    # Pull the ui code, rootSource, and name of every hit
    for item in variable['result']['results']:
        ui_code_RxNorm.append(item['ui'])
        rootSource_RxNorm.append(item['rootSource'])
        name_RxNorm.append(item['name'])

RxNorm_pd = pd.DataFrame({"Data Concept": "RxNorm Code", "Data Sub-Concept": "N/A", "Coding Standard": rootSource_RxNorm, "Code Value": ui_code_RxNorm, "Code Description": name_RxNorm}).drop_duplicates().reset_index(drop=True)
RxNorm_pd

,Data Concept,Data Sub-Concept,Coding Standard,Code Value,Code Description
0,RxNorm Code,,MTH,C1963284,Adrenalin
1,RxNorm Code,,RXNORM,C3820021,Adrenalin Injectable Product
2,RxNorm Code,,MTH,C3714560,epinephrine 1 MG/ML Injectable Solution [Adren...
3,RxNorm Code,,MTH,C3888403,1 ML epinephrine 1 MG/ML Injection [Adrenalin]
4,RxNorm Code,,RXNORM,C4047020,epinephrine Injection [Adrenalin]
...,...,...,...,...,...
1611,RxNorm Code,,RXNORM,C3709479,7 ACTUAT umeclidinium 0.0625 MG/ACTUAT / vilan...
1612,RxNorm Code,,RXNORM,C3709487,30 ACTUAT umeclidinium 0.0625 MG/ACTUAT / vila...
1613,RxNorm Code,,RXNORM,C2935023,vilanterol
1614,RxNorm Code,,RXNORM,C3644419,vilanterol trifenatate
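
Each search request above is a GET to the UMLS search endpoint, with the API key, the search term, and the RXNORM source filter sent as query parameters. The sketch below shows how such a query string is encoded using only the standard library, without sending anything. The helper name `build_search_url` is hypothetical and the key shown is a placeholder.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://uts-ws.nlm.nih.gov/rest/search/current"

def build_search_url(apikey, term, page_size=2000):
    """Return the search URL for one term; urlencode escapes spaces and symbols."""
    query = {"apiKey": apikey, "string": term, "sabs": "RXNORM", "pageSize": page_size}
    return f"{SEARCH_ENDPOINT}?{urlencode(query)}"

print(build_search_url("YOUR API KEY HERE", "Advair Diskus"))
```

Encoding the term this way matters for multi-word names such as "Advair Diskus", where the space must not appear raw in the URL.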


In [7]:
# Converts the UMLS CUIs from the cell above into RXNORM codes
base_uri = 'https://uts-ws.nlm.nih.gov'
cui_list = RxNorm_pd["Code Value"]

sabs = 'RXNORM'
RXNORM_name = []
RXNORM_code = []
RXNORM_root = []

for cui in cui_list:
    page = 0

    # Request successive result pages for this CUI until an empty page comes back
    while True:
        page += 1
        path = '/search/' + version
        query = {'apiKey': apikey, 'string': cui, 'sabs': sabs, 'returnIdType': 'code', 'pageNumber': page}
        output = requests.get(base_uri + path, params=query)
        output.encoding = 'utf-8'

        results = output.json()['result']['results']

        # An empty page means there are no (more) results for this CUI
        if len(results) == 0:
            break

        for item in results:
            RXNORM_code.append(item['ui'])
            RXNORM_name.append(item['name'])
            RXNORM_root.append(item['rootSource'])

RXNORM_trans_df = pd.DataFrame({"Data Concept": "RxNorm Code", "Data Sub-Concept": "N/A", "Coding Standard": RXNORM_root, "Code Value": RXNORM_code, "Code Description": RXNORM_name})

In [8]:
RXNORM_trans_df

,Data Concept,Data Sub-Concept,Coding Standard,Code Value,Code Description
0,RxNorm Code,,RXNORM,1490053,Adrenalin
1,RxNorm Code,,RXNORM,1490056,Adrenalin Injectable Product
2,RxNorm Code,,RXNORM,1490057,epinephrine 1 MG/ML Injectable Solution [Adren...
3,RxNorm Code,,RXNORM,1660016,1 ML epinephrine 1 MG/ML Injection [Adrenalin]
4,RxNorm Code,,RXNORM,1660015,epinephrine Injection [Adrenalin]
...,...,...,...,...,...
1611,RxNorm Code,,RXNORM,1487519,7 ACTUAT umeclidinium 0.0625 MG/ACTUAT / vilan...
1612,RxNorm Code,,RXNORM,1487527,30 ACTUAT umeclidinium 0.0625 MG/ACTUAT / vila...
1613,RxNorm Code,,RXNORM,1424884,vilanterol
1614,RxNorm Code,,RXNORM,1424883,vilanterol trifenatate


In [9]:
# Build the output file name from the user-supplied sheet name
excel_name = f"{Excel_Sheet_Name}.xlsx"

RXNORM_trans_df.to_excel(excel_name)
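
Because the export overwrites any existing file with the same name, one possible safeguard is to pick a file name that is not yet taken. The sketch below is an assumption-laden illustration, not part of the original script: the helper `next_free_name` and its " (2)", " (3)" suffix scheme are hypothetical.

```python
from pathlib import Path

def next_free_name(sheet_name, folder="."):
    """Return an .xlsx path in `folder` that does not exist yet."""
    candidate = Path(folder) / f"{sheet_name}.xlsx"
    counter = 2
    # Append " (2)", " (3)", ... until an unused file name is found.
    while candidate.exists():
        candidate = Path(folder) / f"{sheet_name} ({counter}).xlsx"
        counter += 1
    return candidate

print(next_free_name("Asthma RxNorm Sheet"))
```

The returned path could then be passed to `to_excel` in place of `excel_name`, so an earlier export is left untouched.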