<a href="https://colab.research.google.com/github/alofgran/yada_yada/blob/master/YadaYada_ICD10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ICD10 Codes >> Google Info
This notebook gathers more information on the 100 most common diagnoses issued in Russell's office: it pulls the ICD10 description for each code, organizes the results, and supplements them with information from Google searches.

[ICD10 API Documentation](https://pypi.org/project/icd10-cm/)

In [3]:
# !pip install icd10-cm

Collecting icd10-cm
  Downloading https://files.pythonhosted.org/packages/6b/a5/3059308d94513845e78d701b71a60c55ee4c37fab4b6442e4c58cdb70da1/icd10_cm-0.0.4-py2.py3-none-any.whl (675kB)
     |████████████████████████████████| 675kB 2.7MB/s
Installing collected packages: icd10-cm
Successfully installed icd10-cm-0.0.4


In [0]:
import icd10
import os
from google.colab import drive

In [2]:
#Open filepath to existing data
drive.mount('/content/drive', force_remount=True)
root = os.getcwd()
download_destination = 'drive/My Drive/Colab Notebooks'
cwd = os.path.join(root, download_destination)
os.chdir(cwd)
print('Current working directory: ', os.getcwd())

Mounted at /content/drive
Current working directory:  /content/drive/My Drive/Colab Notebooks


In [3]:
import pandas as pd
data_df = pd.read_csv('Pediatric Topics.csv')
data_df.head()

Unnamed: 0,Code,Diagnosis,Symptoms,Doctors
0,J00,Common Cold,"Runny nose/congestion, cough, fever",
1,R05,Cough,,
2,R50.9,Fever,,
3,R09.81,Nasal Congestion,,
4,H92.09,Ear Pain,,


In [0]:
import re
import numpy as np

# example_codes = ['J00', 'R05', 'R50.9', 'R09.81', 'H92.09', 'H66.9', 'R13.1', 'J02.9', 'J02.0', 'J98.8', 'R10.9', 'R10.10', 'R10.11', 'R10.12', 'R10.13', 'R10.30', 'R10.31', 'R10.32', 'R10.33', 'R10.84']

# NOTE: R11.15 is a new 2020 ICD-10-CM code that became effective on October 1, 2019.

def get_code_description(list_of_codes):
    """Look up each ICD10 code and build a Google-friendly query from its description.

    Returns (results_dict, recheck_list); recheck_list holds the entries
    that the icd10 package does not recognize.
    """
    results_dict = {}
    recheck_list = []
    for code in list_of_codes:  # iterate over the argument rather than the global data_df
        code = str(code).strip()
        if icd10.exists(code):  # values that aren't codes (words/blanks) go to recheck_list
            code_class_obj = icd10.find(code)  # ICD10 object for this code
            info = code_class_obj.description  # official description
            billability = code_class_obj.billable  # whether the code is billable
            # Turn the description into a query for the Google search
            query = info.replace(', unspecified', '')\
                        .replace('Unspecified ', '')\
                        .replace(' unspecified', '')\
                        .replace('  ', ' ')
            # Save results to dict
            results_dict[code] = {'ICD10_Description': info,
                                  'Query': query,
                                  'Billability': billability}
        else:
            recheck_list.append(code)

    return results_dict, recheck_list

results_dict, recheck_list = get_code_description(data_df['Code'])
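
The query-cleaning step inside `get_code_description` can be tried on its own, without the `icd10` package. The sketch below pulls the same chain of `replace` calls into a helper. The name `description_to_query` is hypothetical, and the final `strip()` is an addition that the original chain does not have.

```python
def description_to_query(description):
    """Strip 'unspecified' qualifiers so the text reads like a natural search query."""
    query = (description.replace(', unspecified', '')
                        .replace('Unspecified ', '')
                        .replace(' unspecified', '')
                        .replace('  ', ' '))
    return query.strip()  # assumption: also trim stray leading/trailing spaces


# Descriptions taken from the results table in this notebook
for text in ['Fever, unspecified', 'Otalgia, unspecified ear', 'Cough']:
    print(repr(text), '->', repr(description_to_query(text)))
```

Descriptions without an "unspecified" qualifier, such as `Cough`, pass through unchanged.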

In [0]:
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(results_dict)

In [5]:
billable_codes = [{k: v} for k, v in results_dict.items() if v['Billability']]

print('Number of codes:', len(results_dict))
print('Number of failed codes: ', len(recheck_list))
print('Number of billable codes: ', len(billable_codes))

Number of codes: 72
Number of failed codes:  20
Number of billable codes:  68
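
The 20 failed entries in `recheck_list` may be a mix of non-code values (words, blanks) and well-formed codes that the installed package does not know about, such as newer codes like the R11.15 case noted earlier. A minimal sketch for separating the two groups follows. The regular expression is an assumed approximation of the ICD-10-CM code shape, and `looks_like_icd10`, `triage`, and the sample values are hypothetical.

```python
import re

# Assumed shape of an ICD-10-CM code: a letter, a digit, one alphanumeric
# character, then optionally a dot followed by one to four alphanumerics.
ICD10_SHAPE = re.compile(r'^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$')


def looks_like_icd10(value):
    """Return True if value has the general shape of an ICD-10-CM code."""
    return bool(ICD10_SHAPE.match(str(value).strip().upper()))


def triage(failed_codes):
    """Split failed entries into (well_formed, malformed) lists."""
    well_formed = [c for c in failed_codes if looks_like_icd10(c)]
    malformed = [c for c in failed_codes if not looks_like_icd10(c)]
    return well_formed, malformed


# Illustrative values only; the real recheck_list contents are not shown in this notebook
well_formed, malformed = triage(['R11.15', 'Well child', ''])
print('Worth re-checking against a newer code set:', well_formed)
print('Not codes at all:', malformed)
```

In the notebook this would be called as `triage(recheck_list)`.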


In [0]:
import requests
from bs4 import BeautifulSoup
import time

query = "fever"
#Google serves different results to desktop and mobile browsers, so we set the user agent accordingly
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"
headers = {"user-agent" : MOBILE_USER_AGENT}

# Run a single Google search and return the parsed page
def scrape_google(query):
    query = query.replace(' ', '+')
    URL = f"https://google.com/search?q={query}"
    time.sleep(5)  # pause between requests
    resp = requests.get(URL, headers=headers)
    status_code = resp.status_code
    if status_code == 200:
        print('Successful query of {}: {}'.format(query, status_code))
        soup = BeautifulSoup(resp.content, "html.parser")  # lxml can be used instead of html.parser
    else:
        print('Unsuccessful query of {}: {}'.format(query, status_code))
        soup = BeautifulSoup("", "html.parser")  # empty document, so soup is always defined
    return soup

#Test Example
# scrape_google('fever')
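
Replacing spaces with `+` covers simple queries, but some descriptions contain characters such as square brackets (for example `Acute nasopharyngitis [common cold]`). As a sketch of one alternative, the standard library's `urlencode` escapes the whole query. The helper name `build_search_url` is hypothetical and is not part of the original notebook.

```python
from urllib.parse import urlencode


def build_search_url(query):
    """Build a Google search URL with the query percent-encoded."""
    return 'https://google.com/search?' + urlencode({'q': query})


print(build_search_url('Otalgia ear'))
print(build_search_url('Acute nasopharyngitis [common cold]'))
```

Passing `params={'q': query}` to `requests.get` performs the same encoding inside `requests`.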

In [0]:
#Extract the 'Self-treatment' and 'Seeking medical care' sections from a BeautifulSoup object

def extract_treatment_care_info(soup, div_class='swqYTd'):  # 'Self-treatment'=[0] and 'Seeking medical care'=[1]
    temp_results = {}
    for row in soup.find_all('div', attrs={'class': div_class}):
        # The capture groups keep the heading in the split output; the unmatched group comes back as None
        parts = re.split(r'^(Self-treatment)|^(Seeking medical care)', row.text)
        parts = [p for p in parts if p]  # drop empty strings and None
        if len(parts) == 2:  # skip divs that lack a recognized heading or a body
            temp_results[parts[0]] = parts[1]
    return temp_results

# print(temp_results, '\n')

# import pprint
# pp = pprint.PrettyPrinter(indent=4)
# pp.pprint(temp_results)
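
The heading/body split relies on how `re.split` treats capture groups: the matched heading is kept in the output, and the group that did not match appears as `None`. The sketch below shows that behavior on plain strings, with no network access or BeautifulSoup. The helper name `split_section` and the sample strings are made up for illustration.

```python
import re

SECTION_PATTERN = re.compile(r'^(Self-treatment)|^(Seeking medical care)')


def split_section(text):
    """Return (heading, body) if text starts with a known heading, else None."""
    parts = SECTION_PATTERN.split(text)
    parts = [p for p in parts if p]  # drop '' and None left over from the split
    if len(parts) == 2:
        return parts[0], parts[1]
    return None


print(SECTION_PATTERN.split('Self-treatmentRest and fluids'))  # raw split, before filtering
print(split_section('Self-treatmentRest and fluids'))
print(split_section('Some unrelated text'))
```

Text without a recognized heading yields `None` rather than an `IndexError`, which matters when Google returns a div with the expected class but different content.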

In [17]:
#Testing a single search
extract_treatment_care_info(scrape_google('fever'))

Successful query of fever: 200


{'Seeking medical care': "See a doctor immediately if you have a child:Younger than 3 months with a 100.4°F (38°C) or higher fever3 to 6 months old with 102°F (38.9°C) or higher fever6 to 24 months old with a 102°F (38.9°C) or higher fever that lasts more than a day2 years old or older with fever who is listless, irritable, or vomiting repeatedlyOr if you're an adult with a 103°F (39.4°C) or higher fever",
 'Self-treatment': 'Over-the-counter medications such as acetaminophen and ibuprofen may help ease discomfort. Avoid giving children aspirin because this may cause a rare, serious condition.'}

In [0]:
for key, value in results_dict.items():
    results_dict[key].update(extract_treatment_care_info(scrape_google(value['Query'])))
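
With a 5-second pause per request, the loop over 72 codes takes several minutes, and a single failed lookup stops it partway through. The sketch below shows one way to keep going and record which codes failed. `enrich` and `fake_fetch` are hypothetical names, and the stand-in fetcher and its return text exist only to make the example runnable offline.

```python
def enrich(results, fetch_info):
    """Update each record with fetch_info(query); return the codes whose lookup failed."""
    failed = []
    for code, record in results.items():
        try:
            record.update(fetch_info(record['Query']))
        except Exception as err:  # broad on purpose: one bad lookup should not stop the run
            failed.append((code, repr(err)))
    return failed


def fake_fetch(query):
    """Stand-in for the real scraper, for demonstration only."""
    if query == 'Cough':
        return {'Self-treatment': 'placeholder text'}
    raise ValueError('no result')


demo = {'R05': {'Query': 'Cough'},
        'J00': {'Query': 'Acute nasopharyngitis [common cold]'}}
failed = enrich(demo, fake_fetch)
print('Failed codes:', [code for code, _ in failed])
```

In the notebook, the call would be `enrich(results_dict, lambda q: extract_treatment_care_info(scrape_google(q)))`.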

In [27]:
#Get the number of treatment codes recorded from the google search
count = 0
for k, v in results_dict.items():
    # print(v)
    if 'Self-treatment' in v:
        count += 1

print('Percentage of codes with treatment/medical care info: {:.2f}%'.format(count/len(results_dict)*100))

Percentage of codes with treatment/medical care info: 16.67%


## __Data dictionary for our results table__

* __Index__ - ICD10 code taken from Russell's spreadsheet
* __ICD10_Description__ - Description of the code, pulled from the ICD10 dictionary via the API
* __Query__ - Modified version of `ICD10_Description` that gives better Google query results
* __Billability__ - Whether or not the code is billable (a billable code is detailed enough to be used to specify a medical diagnosis); this may help indicate whether a code is too specific
* __Self-treatment__ - The "Self-treatment" section of the Google search results, when present
* __Seeking medical care__ - The "Seeking medical care" section of the Google search results, when present
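
The data dictionary above can be written as a small check on each record of `results_dict`. This is only a sketch. Treating the two Google fields as optional is an assumption based on the fact that only some searches return them, and `check_record` is a hypothetical helper.

```python
REQUIRED_KEYS = {'ICD10_Description', 'Query', 'Billability'}
OPTIONAL_KEYS = {'Self-treatment', 'Seeking medical care'}  # present only when Google returns them


def check_record(record):
    """Return (missing, unexpected) key lists for one results_dict entry."""
    keys = set(record)
    missing = sorted(REQUIRED_KEYS - keys)
    unexpected = sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS)
    return missing, unexpected


# Record shaped like the R50.9 row of the results table, before the Google fields are added
example = {'ICD10_Description': 'Fever, unspecified', 'Query': 'Fever', 'Billability': True}
print(check_record(example))
```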

In [31]:
results_df = pd.DataFrame(results_dict).T
results_df.head()

Unnamed: 0,ICD10_Description,Query,Billability,Self-treatment,Seeking medical care
J00,Acute nasopharyngitis [common cold],Acute nasopharyngitis [common cold],True,,
R05,Cough,Cough,True,"Liquids, lozenges, cough drops, vaporizers, an...",Make an appointment to see a doctor if youDeve...
R50.9,"Fever, unspecified",Fever,True,Over-the-counter medications such as acetamino...,See a doctor immediately if you have a child:Y...
R09.81,Nasal congestion,Nasal congestion,True,Using a humidifier at home and rinsing the ins...,Make an appointment to see a doctor if youAre ...
H92.09,"Otalgia, unspecified ear",Otalgia ear,True,"Using a warm, moist compress on the ear may he...",Make an appointment to see a doctor if youDeve...
