# ICD-10-PCS conversion

---

This was initially an exercise meant to be done in R. The idea _(at least what I gathered from Melissa's explanation - hope I didn't get it wrong)_ is to convert the ICD-10-PCS codes that we currently have for the enhanced list in our database (which uses Bioportal ICD codes) into the official ones <sub>(also what I gathered from Melissa's explanation)</sub>.

This is going to be done in Python since I definitely can't do this using R. 🐍

## Content
 - [ETL](#ETL)
 - [Goal](#Goal)
 - [Creating dictionary](#Creating-dictionary)
 - [Creating function](#Creating-function)
 - [Implementing in enhanced specialty dataset](#Implementing-in-enhanced-specialty-dataset)
 - [Conclusion](#Conclusion)
 
 ---
 
## ETL

Packages that will be used (or not) throughout this exercise.

In [1]:
# Importing packages for data management 
import pandas as pd    # Importing pandas
import numpy as np     # Importing numpy
import datetime as dt  # Importing datetime
import re              # Importing regular expression
import warnings        # To suppress warning alert
warnings.filterwarnings('ignore')
# Change settings to keep dataframes from being truncated
pd.options.display.max_rows = 500
pd.options.display.width = 500
pd.options.display.max_colwidth = 500
pd.options.display.max_columns = 500

In [2]:
# Reading the datasets 
pcs = pd.read_csv("icd10_pcs_all_codes.csv", usecols = ['code','title'])
pedn = pd.read_excel("2020-04-27 Enhanced list for pediatric neurology.xlsx")

---

## Goal

```
Return the ICD-10-PCS title (in other words, the ICD-10 display name)
for a given ICD-10-PCS code (7 characters) or partial code (< 7 characters)
```

Quick look at the dataset:

In [3]:
pcs.sample(5)

| | code | title |
|---|---|---|
| 60985 | 0UWHX7Z | Medical and Surgical @Female Reproductive System @Revision @Vagina and Cul-de-sac @External @Autologous Tissue Substitute @No Qualifier |
| 34465 | 0FWB0KZ | Medical and Surgical @Hepatobiliary System and Pancreas @Revision @Hepatobiliary Duct @Open @Nonautologous Tissue Substitute @No Qualifier |
| 65610 | 0XJH3ZZ | Medical and Surgical @Anatomical Regions, Upper Extremities @Inspection @Wrist Region, Left @Percutaneous @No Device @No Qualifier |
| 64426 | 0WW4X0Z | Medical and Surgical @Anatomical Regions, General @Revision @Upper Jaw @External @Drainage Device @No Qualifier |
| 38711 | 0KN4XZZ | Medical and Surgical @Muscles @Release @Tongue, Palate, Pharynx Muscle @External @No Device @No Qualifier |


Initially, during the group discussion, there was a mention of using a join/merge function to solve this problem. But I don't think that would work, since some of the ICD-10-PCS codes that come from our list are partial codes, and the official dataset only contains full 7-character codes. If we want to populate partial display names, there's no way to do it with a join without modifying the official ICD-10-PCS dataset.
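To make the join problem concrete, here is a minimal sketch in plain Python. It assumes that an equality join on `code` behaves like an exact dictionary lookup. The two official rows are copied from the sample above, while `our_codes` and the partial code `0KN` are hypothetical values made up for illustration.

```python
# Two rows copied from the pcs.sample(5) output above (full 7-character codes).
official = {
    "0KN4XZZ": "Medical and Surgical @Muscles @Release @Tongue, Palate, Pharynx Muscle @External @No Device @No Qualifier",
    "0WW4X0Z": "Medical and Surgical @Anatomical Regions, General @Revision @Upper Jaw @External @Drainage Device @No Qualifier",
}

# Hypothetical codes from our side: one full code and one partial code.
our_codes = ["0KN4XZZ", "0KN"]

# An equality join on `code` only pairs rows whose codes are identical,
# which is what an exact dictionary lookup does.
joined = {code: official.get(code) for code in our_codes}

print(joined["0KN4XZZ"] is not None)  # the full code finds its title
print(joined["0KN"])                   # the partial code finds nothing
```

The partial code comes back empty even though it is a prefix of a code that exists, which is why the lookup has to work on prefixes rather than on equality.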

My first thought was to create a dictionary for each level of the hierarchy (the nth character) and use a for loop within a function, so that each character acts as a key returning the value for that position, and then put the pieces together into a complete or partial ICD-10 display name. This doesn't work because ICD-10-PCS has a tree structure: what a character means depends on the characters before it, so dictionaries built per position with a for loop would leave out a lot of values in ICD-10-PCS.
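A small sketch of why the per-position dictionaries break. The two prefixes and their official titles are taken from the comparison table further down in this notebook (`009U3Z` and `B030`); the names `rows`, `per_position` and `naive_lookup` are only for illustration.

```python
# Leading title levels of two official codes (first 3 characters only).
rows = {
    "009": ["medical and surgical", "central nervous system and cranial nerves", "drainage"],
    "B03": ["imaging", "central nervous system", "magnetic resonance imaging (mri)"],
}

# One dictionary per character position, filled with a for loop.
per_position = [{} for _ in range(3)]
for code, levels in rows.items():
    for i, (char, level) in enumerate(zip(code, levels)):
        per_position[i][char] = level  # a later row overwrites an earlier one

def naive_lookup(code):
    """Look up each character independently of the characters before it."""
    return [per_position[i][char] for i, char in enumerate(code)]

print(naive_lookup("009"))
print(rows["009"])
```

The second character `0` means something different under section `0` than under section `B`, so one of the two values is lost and the naive lookup for `009` returns the body system that belongs to `B0`.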

My next idea is to:
1. Create a dictionary out of all the rows in ICD-10-PCS, using each code as the key and its display name as the value.
2. Create a function that:
    - Takes a code or partial code and finds all matches among the dictionary keys using regex
    - Creates an empty list to store all the keys that match the input
    - Uses a for loop to iterate through the dictionary keys, appending the matches to the empty list
    - Returns the display name (the dictionary value) up to the length of the input (up to 7 levels depending on the code length)
3. Apply the function to a subset of the enhanced list dataframe for confirmation

[back to top](#ICD-10-PCS-conversion)

---

## Creating dictionary

In [4]:
# Creating new column to contain list for split display name  
pcs['title'] = pcs['title'].str.lower()
pcs['split_title']=pcs['title'].apply(lambda x: x.split(" @"))

In [5]:
# Creating dictionary for all ICD-10-PCS rows 
pcs_dict={}
for key, val in zip(pcs.code,pcs.split_title):
    pcs_dict[key] = val
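For reference, this is roughly what one entry of the dictionary looks like. The sketch below repeats the lower-casing and splitting on a single row copied from the sample output, using plain Python so it runs without the CSV file; `titles` and `toy_dict` are illustrative names, not part of the notebook.

```python
# One row copied from the pcs.sample(5) output above.
titles = {
    "0KN4XZZ": "Medical and Surgical @Muscles @Release @Tongue, Palate, Pharynx Muscle @External @No Device @No Qualifier",
}

# Same steps as cells 4 and 5: lower-case the title, split on " @", key by code.
toy_dict = {code: title.lower().split(" @") for code, title in titles.items()}

print(len(toy_dict["0KN4XZZ"]))  # one title level per code character
print(toy_dict["0KN4XZZ"][:3])   # levels for the partial code "0KN"
```

Because the list has one element per character of the code, slicing it to the length of a partial code gives the partial display name.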

---

## Creating function

In [6]:
# Writing function to lookup ICD-10-PCS display name in the dictionary  
def icd10pcs_lookup(x):
    '''
    This function takes a code/partial code that will be
    matched against the dictionary keys.
    The for loop iterates through the dictionary keys
    and appends each key that matches the input.
    A key is skipped if it doesn't start with the input
    or if its length is not 7.
    '''
    # re.escape keeps any special character in the input literal
    pattern = re.compile(re.escape(x) + r"\w*")
    keys = []
    for element in pcs_dict.keys():
        if len(element) != 7 or pattern.fullmatch(element) is None:
            continue
        keys.append(element)
    if not keys:
        # Clearer than the IndexError that keys[0] would raise
        raise KeyError("No ICD-10-PCS code starts with {!r}".format(x))
    # The result is the value of the first 7-character key that
    # matches the search pattern (the full or partial code),
    # cut to as many levels as the input has characters.
    result = pcs_dict[keys[0]][:len(x)]
    # Joining with " @ " follows the format used in the enhanced list
    return " @ ".join(result)

Quick test to see how the function works:

In [7]:
icd10pcs_lookup('0KB')

'medical and surgical @ muscles @ excision'

👆🏽 Looks good. The output gives us 3 levels of the code display name, just as we wanted. Next, applying it to the enhanced list.
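The function scans every dictionary key on each call. That is fine for a test, but if it ever gets slow on the full enhanced list, one possible alternative (not what this notebook uses) is to precompute every prefix once and then do plain dictionary lookups. A minimal sketch, with `build_prefix_index` and `toy_pcs_dict` as hypothetical names and a single row copied from the sample output:

```python
# Toy stand-in for pcs_dict: code -> list of lower-cased title levels.
toy_pcs_dict = {
    "0KN4XZZ": [
        "medical and surgical", "muscles", "release",
        "tongue, palate, pharynx muscle", "external", "no device", "no qualifier",
    ],
}

def build_prefix_index(code_to_levels):
    """Map every prefix of every code to its partial display name."""
    index = {}
    for code, levels in code_to_levels.items():
        for n in range(1, len(code) + 1):
            # setdefault keeps the first title seen for a prefix,
            # mirroring the "first matching key" rule of the scan.
            index.setdefault(code[:n], " @ ".join(levels[:n]))
    return index

prefix_index = build_prefix_index(toy_pcs_dict)
print(prefix_index["0KN"])
print(prefix_index.get("0KB"))  # not in the toy data, so None
```

The trade-off is memory: the index holds up to 7 entries per code, in exchange for not looping over all keys for every row.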

[back to top](#ICD-10-PCS-conversion)

---

## Implementing in enhanced specialty dataset

Since this is for testing purposes only, I will use only 1 column of ICD-10 codes from the pediatric neurology enhanced list. I will include the SNOMED display name as a reference and the ICD-10 display name to compare against the result given by the function.

To know whether there are different values or not, I'm going to create a new column to show the comparison between the two columns. I'll add a new column called 'no_discrepancy' that will contain a boolean value: ```True``` if there's no discrepancy and ```False``` otherwise. Rows with a discrepancy will be highlighted to help point them out if there are any.

To do this I will need to change all letters to lower case in the 'ICD-10 display name 1' column.

In [8]:
# Creating a new dataframe from the pediatric neurology enhanced list 
test_df = pedn[['SNOMED display name','ICD-10 code 1', 'ICD-10 display name 1']].dropna() #dropna to remove blank row
test_df['ICD-10 code 1'] = test_df['ICD-10 code 1'].astype(str) #changing column value to str 

In [9]:
# Applying the function to the dataset 
test_df["lookup_result"] = test_df['ICD-10 code 1'].apply(icd10pcs_lookup)

In [10]:
# changing the letter case & creating 'no_discrepancy' column 
test_df['ICD-10 display name 1'] = test_df['ICD-10 display name 1'].str.lower()
test_df['no_discrepancy'] = test_df['ICD-10 display name 1'] == test_df['lookup_result']

In [11]:
# def function for row highlighting 
def row_highlight(x):
    # One style string per column, so this works for any number of columns
    if not x['no_discrepancy']:
        return ['background-color: #ff7092'] * len(x)
    else:
        return ['background-color: white'] * len(x)
test_df.style.apply(row_highlight, axis=1)

| | SNOMED display name | ICD-10 code 1 | ICD-10 display name 1 | lookup_result | no_discrepancy |
|---|---|---|---|---|---|
| 1 | electroencephalogram | 4A00X4 | measurement and monitoring @ physiological systems @ measurement @ central nervous @ external @ electrical activity | measurement and monitoring @ physiological systems @ measurement @ central nervous @ external @ electrical activity | True |
| 2 | electromyography | 4A0FX | measurement and monitoring @ physiological systems @ measurement @ musculoskeletal @ external | measurement and monitoring @ physiological systems @ measurement @ musculoskeletal @ external | True |
| 3 | electronystagmography | F15Z1 | physical rehabilitation and diagnostic audiology @ diagnostic audiology @ vestibular assessment @ none @ bithermal, monaural caloric irrigation | physical rehabilitation and diagnostic audiology @ diagnostic audiology @ vestibular assessment @ none @ bithermal, monaural caloric irrigation | True |
| 4 | lumbar puncture | 009U3Z | medical and surgical @ central nervous system @ drainage @ spinal canal @ percutaneous @ no device | medical and surgical @ central nervous system and cranial nerves @ drainage @ spinal canal @ percutaneous @ no device | False |
| 5 | magnetoencephalography | B030 | imaging @ central nervous system @ magnetic resonance imaging (mri) @ brain | imaging @ central nervous system @ magnetic resonance imaging (mri) @ brain | True |
| 6 | biopsy of muscle | 0KB | medical and surgical @ muscles @ excision | medical and surgical @ muscles @ excision | True |
| 7 | myelogram | B03BY0 | imaging @ central nervous system @ magnetic resonance imaging (mri) @ spinal cord @ other contrast @ unenhanced and enhanced | imaging @ central nervous system @ magnetic resonance imaging (mri) @ spinal cord @ other contrast @ unenhanced and enhanced | True |
| 8 | biopsy of nerve | 01B54Z | medical and surgical @ peripheral nervous system @ excision @ median nerve @ percutaneous endoscopic @ no device | medical and surgical @ peripheral nervous system @ excision @ median nerve @ percutaneous endoscopic @ no device | True |
| 9 | nerve conduction study | 4A01X2 | measurement and monitoring @ physiological systems @ measurement @ peripheral nervous @ external @ conductivity | measurement and monitoring @ physiological systems @ measurement @ peripheral nervous @ external @ conductivity | True |
| 10 | transcutaneous electrical nerve stimulation | 01HY3M | medical and surgical @ peripheral nervous system @ insertion @ peripheral nerve @ percutaneous @ neurostimulator lead | medical and surgical @ peripheral nervous system @ insertion @ peripheral nerve @ percutaneous @ neurostimulator lead | True |


We can see that there actually is a difference, in the lumbar puncture row (code ```009U3Z```). What's written as ```central nervous system``` in our database is actually ```central nervous system and cranial nerves``` in the official ICD-10-PCS list. There are no other differences in this sample.
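With longer lists it gets hard to spot by eye which level differs. A minimal sketch of a helper that reports the differing levels, using the two lumbar puncture strings from the table above; `differing_levels` is a hypothetical name, not something defined elsewhere in this notebook.

```python
from itertools import zip_longest

def differing_levels(ours, official, sep=" @ "):
    """Return (level number, our value, official value) for each level that differs."""
    pairs = zip_longest(ours.split(sep), official.split(sep))
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]

# The two display names of row 4 (lumbar puncture, 009U3Z).
ours = "medical and surgical @ central nervous system @ drainage @ spinal canal @ percutaneous @ no device"
official = "medical and surgical @ central nervous system and cranial nerves @ drainage @ spinal canal @ percutaneous @ no device"

print(differing_levels(ours, official))
```

For this row it reports a single difference at level 2, the body system. `zip_longest` is used so that a display name with fewer levels than the other still shows up as a difference instead of being silently cut off.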

---

## Conclusion

Overall, I think I've covered pretty much everything and managed to make it work. 
Although the function seems to be working as intended, it would be good to test this solution further to make sure it's robust enough.
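One check that might be worth running as part of that testing: the lookup returns the title of the first matching key, which quietly assumes that all codes sharing a prefix also share the same leading title levels. Below is a sketch of a check for that assumption. `inconsistent_prefixes` is a hypothetical helper, the first two entries are copied from the sample output, and the third entry is synthetic, made up only to show what a conflict looks like.

```python
def inconsistent_prefixes(code_to_levels):
    """Return prefixes whose partial titles differ between codes sharing them."""
    seen = {}
    conflicts = set()
    for code, levels in code_to_levels.items():
        for n in range(1, len(code) + 1):
            partial = tuple(levels[:n])
            # setdefault stores the first partial title seen for this prefix.
            if seen.setdefault(code[:n], partial) != partial:
                conflicts.add(code[:n])
    return sorted(conflicts)

# Two rows copied from the sample output, lower-cased and split like pcs_dict.
sample = {
    "0KN4XZZ": "medical and surgical @muscles @release @tongue, palate, pharynx muscle @external @no device @no qualifier".split(" @"),
    "0WW4X0Z": "medical and surgical @anatomical regions, general @revision @upper jaw @external @drainage device @no qualifier".split(" @"),
}
print(inconsistent_prefixes(sample))

# SYNTHETIC entry, not a real ICD-10-PCS code: it shares the prefix "0"
# but carries a different first level, so the check should flag "0".
with_synthetic = dict(sample)
with_synthetic["0ZZZZZZ"] = ["synthetic section"] + ["x"] * 6
print(inconsistent_prefixes(with_synthetic))
```

If running this on the real `pcs_dict` returns an empty list, taking the first matching key is safe for every partial code.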

[back to top](#ICD-10-PCS-conversion)