## W.6 Wrangle Claim_ICD

Observation: The International Classification of Diseases (ICD) is a system of diagnosis codes representing conditions and diseases. It contains thousands of alphanumeric codes organized into 22 disease sections (categories). The database does not record the section for each code, which makes it difficult to track and measure activity by ICD category.
     
    
Goal: Extract the alpha and numeric components of the codes in the existing database and categorize them consistently with ICD. Rows with a missing ICD code are dropped. ICD sub-fields are also dropped because they are extraneous to the analytical objectives.
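
As a rough illustration of the goal above, the alpha and numeric components of a code can be separated with a regular expression. This is a minimal sketch, not the notebook's own method: the sample codes and the names `codes` and `parts` are hypothetical, and it assumes each code starts with one capital letter followed by two digits.

```python
import pandas as pd

# Hypothetical sample values; the real codes live in df['Claim_ICD1'].
codes = pd.Series(['J45.909', 'H60.1', 'Z00.00', None, 'PHA'])

# Assumption: one leading capital letter, then two digits.
# Values that do not fit this pattern (missing, 'PHA') come back as NaN.
parts = codes.str.extract(r'^(?P<alpha>[A-Z])(?P<numeric>\d{2})')

print(parts)
```

Codes whose third character is a letter would not match this pattern and would need a looser expression.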
    

### Table of Contents

1. <a href="#1.-Libraries-&-Environment">Libraries & Environment</a>
2. <a href="#2.-Settings">Settings</a>
3. <a href="#3.-Data-Source">Data Source</a>
4. <a href="#4.-ICD-Codes">ICD Codes</a>
5. <a href="#5.-Create-ICD-Dictionary">Create ICD Dictionary</a>
6. <a href="#6.-Remove-Not-Applicable-Rows">Remove Not Applicable Rows</a>
7. <a href="#7.-Extract-ICD-Alpha-Code-Characters">Extract ICD Alpha Code Characters</a>
8. <a href="#8.-ICD-Alpha-to-Numeric-Dictionary-Map-and-Replace">ICD Alpha-to-Numeric Dictionary Map and Replace</a>
9. <a href="#9.-ICD-Code-Description">ICD Code Description</a>
10. <a href="#10.-Quality-Control">Quality Control</a>


### 1. Libraries & Environment

In [1]:
import numpy as np
import pandas as pd


[<a href='#Table-of-Contents'>Table of Contents</a>]

### 2. Settings

In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

pd.options.display.width

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

pd.options.display.float_format = '{:,.0f}'.format

import winsound
duration = 1300
freq = 440


[<a href='#Table-of-Contents'>Table of Contents</a>]

### 3. Data Source

In [3]:
df = pd.read_pickle('DataFiles/AnnualpickleMar18')

winsound.Beep(freq,duration)


[<a href='#Table-of-Contents'>Table of Contents</a>]


### 4. ICD Codes



[<a href="https://www.aapc.com/codes/icd-10-codes-range/">American Academy of Professional Coders ICD Code Website</a>]

In [None]:
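# PURPOSE: Prepare ICD code table with the 22 codes and ICD category names.
# The table was originally pasted in with pd.read_clipboard; it is built inline
# here so the cell runs without clipboard contents.
# Column names match those used in section 5.

ICD_Code_Desc = pd.DataFrame(
    [
        (1, 'InfectiousDiseases'),
        (2, 'Neoplasms'),
        (3, 'Blood_Organs'),
        (4, 'Endocrine_Hormone'),
        (5, 'Mental_NeuroDisoders'),
        (6, 'Nervous_System'),
        (7, 'Eye_Adnexa'),
        (8, 'Ear_Mastoid'),
        (9, 'Circulatory'),
        (10, 'Respiratory'),
        (11, 'Digestive'),
        (12, 'Skin_Tissue'),
        (13, 'Musculoskeletal'),
        (14, 'Genitourinary'),
        (15, 'Pregnancy'),
        (16, 'Pregnancy'),
        (17, 'Congenital_Malformations'),
        (18, 'Abnormal_Discoveries'),
        (19, 'Injury_Poisoning'),
        (20, 'Other'),
        (21, 'Other'),
        (22, 'Other'),
    ],
    columns=['Claim_ICD1.1', 'Claim_ICD_Cat'],
)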


[<a href='#Table-of-Contents'>Table of Contents</a>]

### 5. Create ICD Dictionary

In [None]:
# PURPOSE: Build a dictionary from numeric ICD code (int) to category name.
# The alpha codes in the database are mapped to these numeric codes in section 8.
# Using datatype int will improve processing speed and flexibility in machine
# learning algorithms.

dict_ICD_Code_Desc = ICD_Code_Desc.set_index('Claim_ICD1.1').to_dict()['Claim_ICD_Cat']
dict_ICD_Code_Desc

[<a href='#Table-of-Contents'>Table of Contents</a>]

### 6. Remove Not Applicable Rows

In [8]:
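# PURPOSE:  Exclude pharma rows and rows with a missing ICD code.

df = df.loc[(df['Claim_ICD1'] != 'PHA')]
# Restrict dropna to the ICD column so that rows are not discarded
# because of missing values in unrelated columns.
df = df.dropna(subset=['Claim_ICD1'])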
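
The difference between dropping rows with a missing value in any column and dropping only rows with a missing ICD code can be sketched on a toy frame. The frame `demo` and its `Other_Field` column are hypothetical and stand in for the claims data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the claims DataFrame.
demo = pd.DataFrame({
    'Claim_ICD1': ['J45', None, 'PHA', 'Z00'],
    'Other_Field': [1.0, 2.0, 3.0, np.nan],
})

# Exclude pharma rows, as in the cell above.
demo = demo.loc[demo['Claim_ICD1'] != 'PHA']

# Drops every row that has a missing value in any column.
any_col = demo.dropna()

# Drops only rows where the ICD code itself is missing.
icd_only = demo.dropna(subset=['Claim_ICD1'])

print(len(any_col), len(icd_only))
```

Which variant is appropriate depends on whether missing values in other columns should also disqualify a claim.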

[<a href='#Table-of-Contents'>Table of Contents</a>]

### 7. Extract ICD Alpha Code Characters

In [None]:
df['Claim_ICD1.1'] = df['Claim_ICD1'].str[0]
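
Taking only the first character yields `'H'` for every H code, so the `'H+'` key in the section 8 dictionary (ear and mastoid) would never be matched. In ICD-10, H00-H59 covers the eye and adnexa and H60-H95 covers the ear and mastoid process. One possible way to separate them, shown on hypothetical sample codes and assuming the two characters after the letter are digits:

```python
import pandas as pd

# Hypothetical sample values; the real codes live in df['Claim_ICD1'].
codes = pd.Series(['H10.9', 'H60.1', 'H95.0', 'J45.909'])

alpha = codes.str[0]
# Two characters after the letter; non-numeric values become NaN.
num = pd.to_numeric(codes.str[1:3], errors='coerce')

# H60-H95 is ear and mastoid; relabel those rows to the 'H+' key.
is_ear = (alpha == 'H') & (num >= 60)
alpha = alpha.mask(is_ear, 'H+')

print(alpha.tolist())
```

The same idea could be applied to `df['Claim_ICD1']` before the dictionary replacement, if splitting eye and ear codes matters for the analysis.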

[<a href='#Table-of-Contents'>Table of Contents</a>]

### 8. ICD Alpha-to-Numeric Dictionary Map and Replace

In [None]:
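# Identify the "keys" for the dictionary that maps alpha to numeric codes.

df['Claim_ICD1.1'].value_counts().sort_index()

# NOTE: 'H+' (ear and mastoid) is only matched if the alpha column contains
# that value; str[0] in section 7 yields 'H' for all H codes.
ICD_dict = {
    'A': 1, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5, 'G': 6, 'H': 7, 'H+': 8,
    'I': 9, 'J': 10, 'K': 11, 'L': 12, 'M': 13, 'N': 14, 'O': 15, 'P': 16, 'Q': 17,
    'R': 18, 'S': 19, 'T': 19, 'U': 22, 'V': 20, 'W': 20, 'X': 20, 'Y': 20, 'Z': 21}

df = df.replace({'Claim_ICD1.1': ICD_dict})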
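
`replace` leaves any value that is not a dictionary key unchanged, so an unexpected first character would remain in the column as a string. A minimal sketch of how unmapped values could be detected with `map` instead; `ICD_dict_demo` and `letters` are hypothetical and use only a subset of the dictionary:

```python
import pandas as pd

# Hypothetical subset of the alpha-to-numeric dictionary.
ICD_dict_demo = {'A': 1, 'B': 1, 'J': 10}

# Hypothetical first characters, including one that is not a valid key.
letters = pd.Series(['A', 'J', '9', 'B'])

# map returns NaN for values that are not keys in the dictionary.
mapped = letters.map(ICD_dict_demo)
unmapped = letters[mapped.isna()].unique()

# Nullable integer dtype keeps the codes as integers despite the NaN.
mapped = mapped.astype('Int64')

print(unmapped)
```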


[<a href='#Table-of-Contents'>Table of Contents</a>]

### 9. ICD Code Description

In [None]:
# PURPOSE:  To map Code Descriptions to ICD numeric code
# Map with the dictionary from section 5, not the DataFrame itself.

df['Claim_ICD_Desc'] = df['Claim_ICD1.1'].map(dict_ICD_Code_Desc)
winsound.Beep(freq,duration)

[<a href='#Table-of-Contents'>Table of Contents</a>]

### 10. Quality Control

In [11]:
qc_ICD = df.filter(['Claim_ICD1.1', 'Claim_ICD_Desc'])

qc_ICD

| Index_Claim | Claim_ICD1.1 | Claim_ICD_Desc |
|---|---|---|
| 12 | 14 | Genitourinary |
| 13 | 10 | Respiratory |
| 14 | 10 | Respiratory |
| 15 | 10 | Respiratory |
| 16 | 18 | Abnormal_Discoveries |
| ... | ... | ... |
| 295377 | 21 | Other |
| 295378 | 21 | Other |
| 295379 | 21 | Other |
| 295380 | 5 | Mental_NeuroDisoders |
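
Beyond eyeballing the table, the quality control step could include a couple of programmatic checks. The sketch below uses a hypothetical frame `qc_demo` with a few rows like those shown above; it counts missing descriptions and confirms that each numeric code carries exactly one description.

```python
import pandas as pd

# Hypothetical rows in the same shape as qc_ICD.
qc_demo = pd.DataFrame({
    'Claim_ICD1.1': [14, 10, 10, 21, 5],
    'Claim_ICD_Desc': ['Genitourinary', 'Respiratory', 'Respiratory',
                       'Other', 'Mental_NeuroDisoders'],
})

# Rows whose numeric code did not receive a description.
missing_desc = int(qc_demo['Claim_ICD_Desc'].isna().sum())

# Each numeric code should map to exactly one description.
desc_per_code = qc_demo.groupby('Claim_ICD1.1')['Claim_ICD_Desc'].nunique()
inconsistent = desc_per_code[desc_per_code != 1]

print(missing_desc, inconsistent.empty)
```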


[<a href='#Table-of-Contents'>Table of Contents</a>]

In [132]:

df.to_pickle('DataFiles/AnnualpickleMar16')
winsound.Beep(freq,duration)