# Note
This script is not recommended. SNOWSTORM limits the number of calls it allows each day, so it is recommended that you use the UMLS API search instead.

# SNOMED-CT and ICD-10 Diagnosis Quick Pull
This script pulls all the descendants of the given parent concept(s), collects all search results for a given search_name, looks up the existing ICD-10 codes for the SNOMED-CT codes gathered, and exports everything as an Excel file.

This script is very thorough and will frequently return more codes than required.  Some codes returned may not be directly relevant to the diagnosis.  It is recommended that the analyst reviews the data returned for accuracy concerning their specific diagnosis.

The ICD-10 codes returned will have N/A listed as their name. It is recommended that the returned ICD-10 codes are manually checked for relevance and updated with their official ICD-10 names.
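
Once the official names have been looked up, the N/A placeholders can be filled in programmatically. The sketch below is one possible approach, not part of the script: `fill_icd10_names` is a hypothetical helper, and the `icd10_names` lookup is assumed to be supplied by the analyst (the single entry shown is only an illustration). It works on plain row dictionaries, which a DataFrame can produce with `to_dict("records")`.

```python
def fill_icd10_names(rows, icd10_names):
    """Return copies of the rows with 'N/A' ICD-10 descriptions replaced where a name is known."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        if row["Coding Standard"] == "ICD-10" and row["Code Description"] == "N/A":
            # Codes missing from the lookup keep their 'N/A' placeholder
            row["Code Description"] = icd10_names.get(row["Code Value"], "N/A")
        filled.append(row)
    return filled


# Illustrative lookup only; the analyst supplies the real code-to-name mapping.
example_names = {"J45": "Asthma"}
example_rows = [
    {"Coding Standard": "ICD-10", "Code Value": "J45", "Code Description": "N/A"},
]
print(fill_icd10_names(example_rows, example_names)[0]["Code Description"])  # prints Asthma
```

Codes that remain N/A after this step are the ones that still need a manual check.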

Additional Note: This works well for small pulls. Pulls that return more than 500 rows tend to time out. It is recommended to use the UMLS script for pulls larger than 500 rows.
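
One possible way to reduce timeouts on larger pulls is to request the expansion in smaller pages. The sketch below assumes the server honours the standard FHIR `$expand` paging parameters `count` and `offset`; this has not been verified against SNOWSTORM here. `fetch_page` and `expand_in_pages` are hypothetical names, the page size of 100 is arbitrary, and the example uses only the standard library so it can run on its own. Each page is a separate API call, so paging counts against the daily limit.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://snowstorm-training.snomedtools.org/fhir/ValueSet/$expand"


def fetch_page(concept_id, offset, count, timeout=30):
    """Request one page of a concept and its descendants (assumes paging is supported)."""
    query = urlencode({
        "url": f"http://snomed.info/sct?fhir_vs=ecl/<<{concept_id}",
        "offset": offset,  # assumption: standard FHIR $expand paging parameter
        "count": count,    # assumption: standard FHIR $expand paging parameter
    })
    with urlopen(f"{BASE_URL}?{query}", timeout=timeout) as response:
        return json.load(response)


def expand_in_pages(concept_id, page_size=100, fetch=fetch_page):
    """Collect (code, display) pairs page by page until a short page is returned."""
    results = []
    offset = 0
    while True:
        page = fetch(concept_id, offset, page_size)
        items = page.get("expansion", {}).get("contains", [])
        results.extend((item["code"], item["display"]) for item in items)
        if len(items) < page_size:
            break  # a short (or empty) page means there is nothing left to fetch
        offset += page_size
    return results
```

The `fetch` argument exists so the paging logic can be tried with a stand-in function before any real calls are spent.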

## Variable Explanations ##
- parent_code
    - This must be inputted as a Python dictionary (JavaScript object).  Below is an example of proper structure: 
    - {"Name of SNOMED-CT Value 1": 123456, "Name of SNOMED-CT Value 2": 456789123, "Name of SNOMED-CT Value 3": 123789}
    - This variable is responsible for finding the descendants of the parent concepts. All descendants of every pair placed in the dictionary will be returned. Repeats will be deleted.

- Excel_Sheet_Name
    - This must be inputted as a string.  A string is required to have quotations around the text: "Apple", "Orange", and "Fruit" are examples of strings. 
    - This variable is responsible for naming the final exported Excel Sheet.
    - Note that the Excel file will be exported to the folder where this Python file is saved. If you do not know where that is on your computer, use your system's search bar to locate the file by the name that you assigned to the variable Excel_Sheet_Name.

- search_name
    - This must be inputted as a string. 
    - This variable is responsible for returning all SNOMED-CT search results from that name. 
    - For example, if you went to the SNOMED-CT Browser and searched "Asthma", you may get "Asthma (disorder)", "Asthma annual review (regime/therapy)", and "Asthma control step 5 (procedure)". "Asthma annual review (regime/therapy)" and "Asthma control step 5 (procedure)" would not be included in the data collected from the parent_code variable alone. This variable allows those additional search results to be included.
    - SNOMED-CT Browser Link: (https://browser.ihtsdotools.org/?perspective=full&conceptId1=78862003&edition=MAIN/2022-08-31&release=&languages=en&latestRedirect=false) 
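
To make the expected shapes of the three variables concrete, the sketch below checks them before any API call is made. `check_inputs` is a hypothetical helper, not part of the script, and the rule that each concept ID must consist only of digits is an assumption based on the dictionary example above.

```python
def check_inputs(parent_code, excel_sheet_name, search_name):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    if not isinstance(parent_code, dict) or not parent_code:
        problems.append("parent_code must be a non-empty dictionary of name: concept ID pairs")
    else:
        for name, concept_id in parent_code.items():
            if not isinstance(name, str):
                problems.append(f"parent_code key {name!r} should be a string")
            # Assumption: a concept ID is an integer or a string made up of digits
            if not str(concept_id).isdigit():
                problems.append(f"parent_code[{name!r}] = {concept_id!r} is not a numeric concept ID")
    for label, value in (("Excel_Sheet_Name", excel_sheet_name), ("search_name", search_name)):
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{label} must be a non-empty string")
    return problems


parent_code = {"Pneumonitis (disorder)": 205237003}
print(check_inputs(parent_code, "Pneumonitis Codes", "Pneumonitis"))  # prints []
```

Running a check like this first costs no API calls, which matters given the daily limit.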
 

## Notes and Limitations  
- If you do not change the Excel_Sheet_Name variable and run the code again, the code will overwrite your existing file with the results for the current parent_code values.
- This will return more codes than required. Some codes returned may not be directly relevant to the diagnosis. It is recommended that the analyst thoroughly reviews the data returned.
- The ICD-10 codes returned will have N/A listed as their name. It is recommended that the returned ICD-10 codes are manually checked for relevance and updated with their official ICD-10 names.
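
Because a second run with the same Excel_Sheet_Name replaces the earlier export, one option is to pick an unused file name before exporting. `unique_excel_name` below is a hypothetical helper, and the " (n)" suffix format is an arbitrary choice.

```python
from pathlib import Path


def unique_excel_name(sheet_name, folder="."):
    """Return '<sheet_name>.xlsx', or '<sheet_name> (n).xlsx' if that file already exists."""
    candidate = Path(folder) / f"{sheet_name}.xlsx"
    counter = 1
    while candidate.exists():
        # Try the next numbered name until one is free
        candidate = Path(folder) / f"{sheet_name} ({counter}).xlsx"
        counter += 1
    return str(candidate)
```

The returned path could then be passed to `to_excel` in place of the fixed file name.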

## Common Errors
- If you receive a KeyError when calling the data (running the large loop code), check your parent_code variable's structure.
    - This error is most likely due to a mistake in how the values are entered in the parent_code variable.
    - If the error persists, add each parent:code pair individually and try calling the data (running the large loop code) again. This will narrow down which parent:code pair is causing the issue or help identify any syntax issues with the parent_code variable.
    - If a single parent:code pair keeps causing the error, I recommend pulling for just that parent:code, searching an adjacent parent:code that includes the originally desired parent:code, or manually pulling the codes from the SNOMED-CT browser.
    
- If you receive a ConnectionError, you have made too many calls to the API and must wait ~24 hours before making another call.
    - If you consistently receive this error, check the parent_code variable for large overlaps between parent and child concepts. For example: if the two parents you are looking up are 'Pneumonitis (disorder)' and 'Pneumonia (disorder)', try entering only 'Pneumonitis (disorder)' and removing 'Pneumonia (disorder)' from your search, since 'Pneumonia (disorder)' is a descendant of 'Pneumonitis (disorder)'.
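
The one-pair-at-a-time check described above can be automated. The sketch below is only an illustration: `find_failing_pairs` is a hypothetical helper, and `fetch_expansion` stands for whatever function performs the request and returns the decoded JSON (for example a small wrapper around the `requests.get(URL).json()` call used in the main loop). It assumes a failed lookup shows up as a response without the `expansion` or `contains` keys, which is what would raise the KeyError.

```python
def find_failing_pairs(parent_code, fetch_expansion):
    """Try each parent:code pair on its own and report which ones fail.

    Returns (ok, failed): ok maps each name to the number of concepts returned,
    failed maps each name to a short reason.
    """
    ok, failed = {}, {}
    for name, concept_id in parent_code.items():
        try:
            payload = fetch_expansion(concept_id)
            contains = payload["expansion"]["contains"]
        except KeyError as missing:
            failed[name] = f"response has no {missing} key"
        except OSError as error:
            # Both the built-in ConnectionError and the one raised by requests
            # derive from OSError. Stop here so no further calls are spent.
            failed[name] = f"connection failed: {error}"
            break
        else:
            ok[name] = len(contains)
    return ok, failed
```

Any pair that ends up in `failed` with a missing-key reason is a candidate for the manual steps listed above.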
       

In [None]:
import pandas as pd
import numpy as np
import json
import requests
false = False
true = True

In [None]:
# CHANGE THE CODES HERE: 
parent_code = {"Pneumonitis (disorder)": 205237003}

## WHAT DO YOU WANT YOUR EXCEL SHEET NAMED? ##
Excel_Sheet_Name = "Pneumonitis Codes"

### What name do you want to do a general search by? ###
## This pulls all results as if you were to type in "Asthma" for example, all search results would be returned ##
search_name = "Pneumonitis"

In [None]:
#### DO NOT CHANGE BELOW THIS LINE ####

empty_pd_formatted = pd.DataFrame()
test_codes = []
test_values = []
    
for key in parent_code:
    # This collects the JSON for each value in parent_code
    URL = f"https://snowstorm-training.snomedtools.org/fhir/ValueSet/$expand?url=http://snomed.info/sct?fhir_vs=ecl/<<{parent_code[key]}"
    response = requests.get(URL)
    variable = response.json()

    # Store the code and display name of every concept returned
    for concept in variable["expansion"]["contains"]:
        test_codes.append(concept["code"])
        test_values.append(concept["display"])
        
# Build one dataframe from all the SNOMED-CT codes and descriptions collected above
new_new_row = pd.DataFrame({"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": "SNOMED-CT", "Code Value": test_codes, "Code Description": test_values})
empty_pd_new = pd.concat([new_new_row, empty_pd_formatted.loc[:]]).reset_index(drop=True)

In [None]:
## This pulls all the search_name results ##

URL = f"https://snowstorm-training.snomedtools.org/fhir/ValueSet/$expand?url=http://snomed.info/sct?fhir_vs&filter={search_name}"

response = requests.get(URL)

variable = response.json()

test_codes = []
test_values = []

# Store the code and display name of every search result returned
for concept in variable["expansion"]["contains"]:
    test_codes.append(concept["code"])
    test_values.append(concept["display"])

test_table = {"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": "SNOMED-CT", "Code Value": test_codes, "Code Description": test_values}

test_pandas_table = pd.DataFrame(test_table)

general_and_subset_request_pandas = pd.concat([empty_pd_new, test_pandas_table.loc[:]]).reset_index(drop=True)

In [None]:
general_and_subset_request_df = general_and_subset_request_pandas.drop_duplicates(keep="first")
general_and_subset_request_df_clean = general_and_subset_request_df.reset_index(drop=True)
general_and_subset_request_df_clean

In [None]:
empty_list_for_ICD_codes = []
    
# Look up the ICD-10 mapping for each SNOMED-CT code gathered so far
for snomed_code in general_and_subset_request_df_clean["Code Value"]:
    codes = int(snomed_code)
    URL = f"https://snowstorm-training.snomedtools.org/fhir/ConceptMap/$translate?code={codes}&system=http://snomed.info/sct&source=http://snomed.info/sct?fhir_vs&target=http://hl7.org/fhir/sid/icd-10&url=http://snomed.info/sct/900000000000207008/version/20200131?fhir_cm=447562003"
    response = requests.get(URL)
    variable = response.json()
    if 'code' in variable:
        # Collect every ICD-10 code listed in the mapping result
        for part in variable['parameter'][1]['part']:
            empty_list_for_ICD_codes.append(part['valueCoding']['code'])

test_ICD10_table = {"Data Concept": "Diagnosis Code", "Data Sub-Concept": "N/A", "Coding Standard": "ICD-10", "Code Value": empty_list_for_ICD_codes, "Code Description": "N/A"}

test_pandas_ICD10__table = pd.DataFrame(test_ICD10_table)

total_ICD10_SNOMED_pandas = pd.concat([general_and_subset_request_df_clean, test_pandas_ICD10__table.loc[:]]).reset_index(drop=True)

In [None]:
total_ICD10_SNOMED_pandas_no_duplicates = total_ICD10_SNOMED_pandas.drop_duplicates(keep="first")
total_ICD10_SNOMED = total_ICD10_SNOMED_pandas_no_duplicates.reset_index(drop=True)

In [None]:
#### DO NOT CHANGE BELOW THIS LINE ####

excel_name = f"{Excel_Sheet_Name}.xlsx"

total_ICD10_SNOMED.to_excel(excel_name)

#### DO NOT CHANGE ABOVE THIS LINE ####