# ICD-9-CM and ICD-10-CM Codes

Python notebook for processing ICD-9-CM and ICD-10-CM codes, including bi-directional mappings, using publicly available datasets.

## 1. ICD-9-CM and ICD-10-CM data

* [ICD-9-CM Codes](https://www.cms.gov/medicare/coding-billing/icd-10-codes/icd-9-cm-diagnosis-procedure-codes-abbreviated-and-full-code-titles)
* [ICD-10-CM Codes](https://www.cdc.gov/nchs/icd/icd-10-cm/index.html)
* [Clinical Classifications Software Refined (CCSR)](https://hcup-us.ahrq.gov/toolssoftware/ccsr/ccs_refined.jsp)
  codes from:

**[Healthcare Cost & Utilization Project](https://hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp)**  
Agency for Healthcare Research and Quality  
U.S. Department of Health and Human Services

**Code Descriptions:**

* **[ICD-9-CM](https://hcup-us.ahrq.gov/toolssoftware/ccs/$DXREF%202008_Archive.csv)**
* **[ICD-10-CM](https://hcup-us.ahrq.gov/toolssoftware/ccsr/dxccsr.jsp)**

ICD-10-CM codes are specified with all periods (".") removed.
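
A minimal sketch of this normalization, assuming codes arrive as strings that may still contain periods. `normalize_icd_code` is a hypothetical helper for illustration and is not defined in the notebook.

```python
def normalize_icd_code(code: str) -> str:
    """Return an ICD code with periods and surrounding whitespace removed."""
    return code.replace(".", "").strip()


print(normalize_icd_code("E85.89"))    # E8589
print(normalize_icd_code(" A04.72 "))  # A0472
```

Because the period position is lost, the normalized string alone does not identify the ICD version, so a version column should be carried alongside it.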

Note that while ICD-9-CM and ICD-10-CM codes are mostly distinct, there is a small overlap of [duplicate ICD-9-CM and ICD-10-CM codes](https://www.hhs.gov/guidance/sites/default/files/hhs-guidance-documents/2015-duplicate_ICD10CM_codes.pdf).

## 2. Mappings Between ICD-9-CM and ICD-10-CM

Mappings between ICD-9-CM and ICD-10-CM codes are provided by:

**[The National Bureau of Economic Research (NBER)](https://www.nber.org/)**

NBER provides recommendations for a manual mapping process from ICD-9-CM to ICD-10-CM (*forward mapping*), and from ICD-10-CM to ICD-9-CM (*backward mapping*).

We use the 2018 [General Equivalence Mappings (GEMs)](https://www.cms.gov/medicare/coding-billing/icd-10-codes/2018-icd-10-cm-gem), the [latest year with available data](https://www.cms.gov/medicare/coding-billing/icd-10-codes/2019-icd-10-cm).

Mappings are not a simple 1-1 correspondence between ICD-9-CM and ICD-10-CM codes. Some codes have no match, some have multiple matches, and some have no match in either direction (9->10, 10->9).
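
These cases can be told apart by counting distinct targets per source code. The sketch below uses a toy forward mapping: the `00845` and `7703` rows mirror rows shown in this notebook's outputs, while `TOY1`, `TOY2`, `TOYA`, and the `NO_MAP` placeholder are made up for illustration.

```python
import pandas as pd

# Toy 9 -> 10 mapping; TOY* codes and NO_MAP are hypothetical placeholders.
toy = pd.DataFrame({
    "icd9cm": ["00845", "00845", "7703", "7703", "TOY1", "TOY2"],
    "icd10cm": ["A0471", "A0472", "P261", "P268", "TOYA", "NO_MAP"],
    "flag_nomap": [0, 0, 0, 0, 0, 1],
})

# Count distinct targets per source code, ignoring rows flagged as "no map".
n_targets = (toy[toy.flag_nomap == 0]
             .groupby("icd9cm")["icd10cm"]
             .nunique()
             .reindex(toy["icd9cm"].unique(), fill_value=0))

match_type = n_targets.map(
    lambda n: "no match" if n == 0
    else "single match" if n == 1
    else "multiple matches")
print(match_type)
```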

**FAQs from NBER:**

* [Basic FAQ](https://data.nber.org/gem/GEMs-CrosswalksBasicFAQ.pdf)
* [Technical FAQ](https://data.nber.org/gem/GEMs-CrosswalksTechnicalFAQ.pdf)
* [User's Guide](https://www.hhs.gov/guidance/sites/default/files/hhs-guidance-documents/GemsTechDoc_0.pdf)

Note: A large NBER page regarding GEMs can be found [here](https://www.nber.org/research/data/icd-9-cm-and-icd-10-cm-and-icd-10-pcs-crosswalk-or-general-equivalence-mappings), although it does not seem to have been updated with the most recent data.

**Problems with icd-mapper**

The appendix demonstrates several problems with [icd-mapper](https://pypi.org/project/icd-mappings/), an existing open-source tool for mapping between ICD codes (first Google result, 9/1/2024, when searching 'icd mapping python').

## 3. Project ICD Categories

Provides examples of identifying matching ICD-9-CM and ICD-10-CM codes for multiple projects. Note that the matching process involves manual review and is NOT automated.
Output files listed below.


### Notebook Outputs:

#### ICD Code Data

* [icd_9_10.csv](data/icd/icd_9_10.csv) - Descriptions of all ICD-9-CM and ICD-10-CM codes.
* [icd9to10.csv](data/icd/icd9to10.csv) - NBER-provided mapping from ICD-9-CM to ICD-10-CM, augmented with descriptions.
* [icd10to9.csv](data/icd/icd10to9.csv) - NBER-provided mapping from ICD-10-CM to ICD-9-CM, augmented with descriptions.

#### Project Specific Codes

ICD categories by project:

* [**National (State Data)**](data/icd/project_icd_categories/state_data.csv) - ICD codes and descriptions for all categories included in the State Data study.
* [**Network (CPQCC)**](data/icd/project_icd_categories/network_cpqcc.csv) - ICD codes and descriptions for ICD categories used in the CPQCC dataset.

In [1]:
#Imports
import os
import pickle
import numpy as np
import pandas as pd
from IPython.display import display, HTML

pd.set_option('display.max_rows', 500)

from open_neo import PATH_DATA_ICD, computeInfo

#Start compute timer and system memory information
compute_info = computeInfo()
compute_info.info()

#Filenames and paths
PATH_DATA = PATH_DATA_ICD
PATH_DATA_RAW = os.path.join(PATH_DATA,'raw')
FILE_ICD9 = os.path.join(PATH_DATA_RAW,
                         'icd_9_ccsr_hhs.csv')
FILE_ICD9_CCSR = os.path.join(PATH_DATA_RAW,
                         'icd_9_ccsr_desc.csv')
FILE_ICD10 = os.path.join(PATH_DATA_RAW,
                         'icd_10_ccsr_hhs.csv')

Elapsed Time: 0 hours, 0 minutes, 0 seconds
Current Memory Usage: 116 MB (0.1 GB)
Memory Change Since Last Call: 0 MB (0.0 GB)
Total Memory Change Since Instantiation: 0 MB (0.0 GB)


#### 1. ICD-9-CM and ICD-10-CM codes

##### ICD-9-CM

In [3]:
icd9 = pd.read_csv(FILE_ICD9)
#Remove quotes
for column in ['icd_code', 'ccsr_code', 'ccsr_category']:
    icd9[column] = icd9[column].str.replace("'", "").str.strip()

#Load ccsr definitions
icd9_ccsr = pd.read_csv(FILE_ICD9_CCSR)
icd9 = icd9\
    .drop(columns='ccsr_category')\
    .merge(icd9_ccsr, how='left')
icd9['icd_version'] = 9
icd9.icd_code.value_counts().value_counts()

count
1    15072
Name: count, dtype: int64

In [4]:
#Display a single ICD-9-CM code
icd9[icd9.icd_code=='E8720']

Unnamed: 0,icd_code,ccsr_code,icd_desc,ccsr_category,icd_version
14564,E8720,2616,FAILURE STERILE SURGERY,E Codes: Adverse effects of medical care,9


In [5]:
#Caution, some codes have leading 0's
icd9[icd9.icd_code=='0081']

Unnamed: 0,icd_code,ccsr_code,icd_desc,ccsr_category,icd_version
5453,81,135,ARIZONA ENTERITIS,Intestinal infection,9
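
Leading zeros survive only if the code column is kept as a string. The sketch below shows the difference on a one-row toy CSV. The inline CSV text is an assumption for illustration and does not reproduce the raw HCUP file layout.

```python
import io
import pandas as pd

csv_text = "icd_code,icd_desc\n0081,ARIZONA ENTERITIS\n"

# Default type inference parses the code as an integer and drops the zeros.
as_number = pd.read_csv(io.StringIO(csv_text))
# Forcing a string dtype keeps the code exactly as written.
as_string = pd.read_csv(io.StringIO(csv_text), dtype={"icd_code": str})

print(as_number.icd_code.iloc[0])  # 81
print(as_string.icd_code.iloc[0])  # 0081
```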


##### ICD-10-CM

In [6]:
icd10_full = pd.read_csv(FILE_ICD10)
# Note: this file associates ICD-10-CM codes with CCSR categories
# and CONTAINS DUPLICATED ICD CODE VALUES
# (e.g. each diagnosis has a primary category
#  and possibly secondary categories,
#  for both inpatient and outpatient settings).
# We use the primary inpatient category whenever applicable.
icd10_full['default_ccrs_inpatient'] = pd.Categorical(\
        icd10_full['default_ccrs_inpatient'],
        categories=['Y', 'N', 'X'], ordered=True)

icd10_full = icd10_full.sort_values(\
        by=['icd_code','default_ccrs_inpatient'])

icd10_full['icd_version'] = 10

icd10_full['inpatient_outpatient_match'] = np.where(\
    icd10_full.default_ccrs_inpatient==icd10_full.default_ccrs_outpatient, 1,0)
 
#Default to the CCSR inpatient value
icd10 = icd10_full\
        .copy()\
        .groupby('icd_code').first().reset_index()
icd10_full.icd_code.value_counts().value_counts()

count
1    66120
2     6521
3     1737
4      564
5       43
6        2
Name: count, dtype: int64
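
The primary-category selection can also be sketched on a toy frame. The example below reuses the two `O23512` rows shown in this notebook and is one possible alternative, not the notebook's method: it uses `drop_duplicates` after sorting, which keeps one whole row per code, whereas `groupby(...).first()` takes the first non-null value per column.

```python
import pandas as pd

toy = pd.DataFrame({
    "icd_code": ["O23512", "O23512"],
    "ccsr_code": ["GEN018", "PRG028"],
    "default_ccrs_inpatient": ["N", "Y"],
})

# Order the flag so that 'Y' (default category) sorts before 'N' and 'X'.
toy["default_ccrs_inpatient"] = pd.Categorical(
    toy["default_ccrs_inpatient"],
    categories=["Y", "N", "X"], ordered=True)

# After sorting, the first row per code is its default inpatient category.
primary = (toy.sort_values(["icd_code", "default_ccrs_inpatient"])
              .drop_duplicates("icd_code", keep="first"))
print(primary.ccsr_code.tolist())  # ['PRG028']
```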

In [7]:
#Example CCSR codes
#Primary, secondary
icd_code = 'O23512'
#icd_code = 'A5039'
#icd_code = 'O99825'
icd10_full[icd10_full.icd_code==icd_code]

Unnamed: 0,icd_code,icd_desc,ccsr_code,ccsr_category,default_ccrs_inpatient,default_ccrs_outpatient,icd_version,inpatient_outpatient_match
24578,O23512,"Infections of cervix in pregnancy, second trim...",PRG028,Other specified complications in pregnancy,Y,Y,10,1
24577,O23512,"Infections of cervix in pregnancy, second trim...",GEN018,Inflammatory diseases of female pelvic organs,N,N,10,1


In [8]:
#Extract info for a specific ICD-10-CM code
# (note, no periods in code)
icd10[icd10.icd_code=='E8589']

Unnamed: 0,icd_code,icd_desc,ccsr_code,ccsr_category,default_ccrs_inpatient,default_ccrs_outpatient,icd_version,inpatient_outpatient_match
4033,E8589,Other amyloidosis,END016,Other specified and unspecified nutritional an...,Y,Y,10,1


#### Create combined ICD-[9/10]-CM dataset

In [9]:
#Create description dataframes for ICD 9/10 codes,
# with version-specific column names for later merges
icd_desc_columns = ['icd_code','icd_desc', 'ccsr_category'] #'ccsr_code',
icd9_desc = icd9[icd_desc_columns]\
    .rename(columns={'icd_code':'icd9cm',
                     'icd_desc':'icd_desc9',
                     'ccsr_category':'ccsr_category9'})
icd10_desc = icd10[icd_desc_columns]\
    .rename(columns={'icd_code':'icd10cm',
                     'icd_desc':'icd_desc10',
                     'ccsr_category':'ccsr_category10'})

icd_desc_dict = {9:icd9_desc, 10:icd10_desc}
# Construct Single ICD dataframe
icd = pd.concat([icd9\
                 .rename(columns={'ccsr_code':'ccsr_code9',
                    'ccsr_category':'ccsr_category9'}), 
                icd10.rename(columns={'ccsr_code':'ccsr_code10',
                    'ccsr_category':'ccsr_category10'}) ], axis=0)\
                .sort_values(by='icd_code')

temp = icd.icd_code.value_counts()
duplicate_icd_codes = list(temp[temp>1].index)
icd['icd_duplicate'] = np.where(icd.icd_code.isin(duplicate_icd_codes),1,0)         
file_out = os.path.join(PATH_DATA, 'icd_9_10.csv')
icd.to_csv(file_out, index=False)   

### Duplicate ICD Values

In [10]:
icd9[icd9.icd_code=='E8589']

Unnamed: 0,icd_code,ccsr_code,icd_desc,ccsr_category,icd_version
14422,E8589,2613,ACC POISONING-DRUG NOS,E Codes: Poisoning,9


In [11]:
icd10[icd10.icd_code=='E8589']

Unnamed: 0,icd_code,icd_desc,ccsr_code,ccsr_category,default_ccrs_inpatient,default_ccrs_outpatient,icd_version,inpatient_outpatient_match
4033,E8589,Other amyloidosis,END016,Other specified and unspecified nutritional an...,Y,Y,10,1


We identify the same duplicate codes as those listed in:
https://www.hhs.gov/guidance/sites/default/files/hhs-guidance-documents/2015-duplicate_ICD10CM_codes.pdf


We also identify several additional duplicate codes, because we remove all periods from codes: for example, E8589 (*ICD-9-CM*) and E85.89 (*ICD-10-CM*) both become E8589.
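
A small sketch of the duplicate flag on toy data, assuming one row per code and version as in the combined dataset. It shows why a lookup on a duplicated code should filter on the version as well as the code string.

```python
import pandas as pd

toy = pd.DataFrame({
    "icd_code": ["E8589", "E8589", "A0472"],
    "icd_version": [9, 10, 10],
})

# Flag every code string that appears in more than one row.
toy["icd_duplicate"] = toy.duplicated("icd_code", keep=False).astype(int)
print(toy.icd_duplicate.tolist())  # [1, 1, 0]

# Disambiguate a duplicated code by also filtering on the version.
row = toy[(toy.icd_code == "E8589") & (toy.icd_version == 10)]
print(len(row))  # 1
```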

## 2. Mappings Between ICD-9-CM and ICD-10-CM

In [12]:
#Note that this is NOT a 1-1 mapping
#Filename, fixed width column specs
icd_cm_mapping_dict = {(9,10):("2018_I9gem.txt",
                         [(0, 6), (6, 14), (14, 19)]),
                    (10, 9):("2018_I10gem.txt",
                         [(0, 8), (8, 14), (14, 19)])}
icd_mapping_dict = dict()
for (icd_from, icd_to), (file_name, file_colspec) in icd_cm_mapping_dict.items():
    from_var=f'icd{icd_from}cm'
    to_var=f'icd{icd_to}cm'
    to_code = f'ccsr_code{icd_to}'

    path_file = os.path.join(PATH_DATA_RAW, file_name)
    icd_mapping = pd.read_fwf(path_file,
        sep='  ',
        colspecs=file_colspec,
        header=None,
        names=[from_var, to_var, 'flag_values'],
        dtype={'flag_values':str})
    # Split the five-character flag string into one integer column per flag
    flags = icd_mapping['flag_values']\
        .apply(lambda x: \
            pd.Series(list(x))\
                .astype(int))
    flags.columns = ['flag_approximate', 
                    'flag_nomap', 
                    'flag_combination',
                    'scenario', 
                    'choice']
    icd_mapping = pd.concat([icd_mapping, flags], axis=1)

    #Merge in descriptions for both the source and target codes
    icd_mapping = icd_mapping\
        .merge(icd_desc_dict[icd_from], how='left')\
        .merge(icd_desc_dict[icd_to], how='left')\
        .sort_values(by=from_var)
    
    out_file_mapping_desc = os.path.join(PATH_DATA, f'icd{icd_from}to{icd_to}.csv')
    icd_mapping.to_csv(out_file_mapping_desc, index=False)

    #Save mapping for later usage
    icd_mapping_dict[icd_from] = icd_mapping

icd_mapping.head(2)

Unnamed: 0,icd10cm,icd9cm,flag_values,flag_approximate,flag_nomap,flag_combination,scenario,choice,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9
0,A000,10,0,0,0,0,0,0,"Cholera due to Vibrio cholerae 01, biovar chol...",Intestinal infection,CHOLERA D/T VIB CHOLERAE,Intestinal infection
1,A001,11,0,0,0,0,0,0,"Cholera due to Vibrio cholerae 01, biovar eltor",Intestinal infection,CHOLERA D/T VIB EL TOR,Intestinal infection
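
The fixed-width parsing step can be sketched in isolation. The two toy lines below are laid out to match the `(0, 6), (6, 14), (14, 19)` column spec used for the 9->10 file; their exact spacing is an assumption and is not copied from the GEM file. Reading every column as a string is a defensive choice here, so that a column containing only numeric codes cannot be parsed as integers.

```python
import io
import pandas as pd

# Toy lines: 6-character source, 8-character target, 5-character flags.
lines = "00845 A0471   10000\n00845 A0472   10000\n"

gem = pd.read_fwf(
    io.StringIO(lines),
    colspecs=[(0, 6), (6, 14), (14, 19)],
    header=None,
    names=["icd9cm", "icd10cm", "flag_values"],
    dtype=str)  # keep codes and flags as strings

print(gem.icd9cm.tolist())  # ['00845', '00845']
```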


In [13]:
#ICD-version mapping helper function
def other_icd_version(icd_version):
    #Maps 9->10, 10->9
    return(-icd_version+19)

In [14]:
#Print all flags for inspection
for icd_version in [9, 10]:
    mapping_use = icd_mapping_dict[icd_version]
    print(icd_version, '->', other_icd_version(icd_version))
    print(mapping_use.flag_values.value_counts())
    print()

9 -> 10
flag_values
10000    18578
00000     3522
10112     1151
10111      853
11000      422
10122      115
10121       71
10132       48
10131       34
10142       26
10141       20
10161        6
10151        6
10152        2
10162        2
10123        2
10133        2
Name: count, dtype: int64

10 -> 9
flag_values
10000    68307
10111     4480
10112     4254
00000     3522
11000      731
10113      159
10122       53
10121       53
10114       26
10132        2
10131        2
10152        1
10142        1
10141        1
10151        1
Name: count, dtype: int64
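
Each five-character flag string above holds one digit per flag, in the column order used earlier in this notebook (approximate, no map, combination, scenario, choice). As a sketch, the digits can be split without a row-wise `apply`. This is an alternative to the notebook's parsing, shown on three flag strings taken from the counts above.

```python
import pandas as pd

flag_names = ["flag_approximate", "flag_nomap", "flag_combination",
              "scenario", "choice"]
flag_values = pd.Series(["10000", "10112", "11000"], name="flag_values")

# One list of characters per row, converted to integer columns.
flags = pd.DataFrame([list(v) for v in flag_values],
                     columns=flag_names,
                     index=flag_values.index).astype(int)
print(flags.loc[1].tolist())  # [1, 0, 1, 1, 2]
```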



In [15]:
temp = icd_mapping_dict[9]
temp[temp.icd9cm=='00845']

Unnamed: 0,icd9cm,icd10cm,flag_values,flag_approximate,flag_nomap,flag_combination,scenario,choice,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10
63,845,A0471,10000,1,0,0,0,0,CLOSTRIDIUM DIF (Begin 1992),Intestinal infection,"Enterocolitis due to Clostridium difficile, re...",Intestinal infection
64,845,A0472,10000,1,0,0,0,0,CLOSTRIDIUM DIF (Begin 1992),Intestinal infection,"Enterocolitis due to Clostridium difficile, no...",Intestinal infection


In [16]:
temp = icd_mapping_dict[10]
temp[temp.icd10cm=='A0472']

Unnamed: 0,icd10cm,icd9cm,flag_values,flag_approximate,flag_nomap,flag_combination,scenario,choice,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9
41,A0472,845,10000,1,0,0,0,0,"Enterocolitis due to Clostridium difficile, no...",Intestinal infection,CLOSTRIDIUM DIF (Begin 1992),Intestinal infection


### 3. ICD Categories by project

Import a list of variables defined by lists of ICD codes. Provides all matching ICD-9-CM and ICD-10-CM codes, for manual inspection.

In [17]:
icd_categories = pd.read_csv(
    os.path.join(PATH_DATA_RAW,
        'icd_categories_initial.csv'))

icd_categories = icd_categories\
    .merge(icd[['icd_code',
                'icd_version']],
         how = 'left')
print('Number of ICD codes per category:' ,
      icd_categories.icd_category.value_counts())

Number of ICD codes per category: icd_category
surgery_indicated    140
cmal_severe           24
pvl                   20
rop345                15
hydrocephalus         12
nec                    9
cld                    5
ivh34                  4
laparotomy             2
nec3                   2
ventilation            2
shunt                  1
Name: count, dtype: int64
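
A left merge silently keeps category codes that are missing from the reference table. As a sketch, the `indicator` option of `merge` can surface them for review. The category name `toy_category` and the code `BADCODE` are hypothetical.

```python
import pandas as pd

categories = pd.DataFrame({
    "icd_category": ["toy_category", "toy_category"],
    "icd_code": ["7703", "BADCODE"],
})
reference = pd.DataFrame({
    "icd_code": ["7703", "A0472"],
    "icd_version": [9, 10],
})

# indicator=True adds a '_merge' column marking where each row came from.
merged = categories.merge(reference, how="left",
                          on="icd_code", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "icd_code"].tolist()
print(unmatched)  # ['BADCODE']
```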


In [18]:
# Path to save HTML files
html_output_path = os.path.join('data','icd','icd_mapping')

# Ensure the output directory exists
os.makedirs(html_output_path, exist_ok=True)

# Initialize a list to keep track of all the HTML file names
html_links = []

# Inspect each ICD category
for icd_category in np.sort(list(set(icd_categories.icd_category))):
    temp = icd_categories[icd_categories.icd_category == icd_category] \
        .sort_values(by=['icd_version', 'icd_code']) \
        .merge(icd[['icd_code', 'icd_desc']], how='left')
    
    html_file_name = f'{icd_category}_inspection.html'
    html_file = os.path.join(html_output_path, html_file_name)
    html_links.append(f'<a href="{html_file}">{icd_category}</a>')
    
    with open(html_file, 'w') as file:
        file.write(f'<h1>ICD Category: {icd_category}</h1>')
        file.write('<h2>Number of codes from each ICD version:</h2>')
        file.write(temp.icd_version.value_counts().to_frame().to_html())
        
        file.write('<h2>ICD Codes:</h2>')
        file.write(temp[['icd_code']].to_html(index=False))
        
        # Show the translations 9 -> 10, 10 -> 9
        for icd_version in [9, 10]:
            temp_codes = temp[temp.icd_version == icd_version].icd_code
            icd_mapping_temp = icd_mapping_dict[icd_version]
            icd_var = f'icd{icd_version}cm'
            to_version = 10 if icd_version == 9 else 9
            to_var = f'icd{to_version}cm'

            out_print = icd_mapping_temp[icd_mapping_temp[icd_var].isin(temp_codes)].iloc[:, :3] \
                .merge(icd_desc_dict[icd_version], how='left') \
                .merge(icd_desc_dict[to_version], how='left') \
                .sort_values(by=icd_var)

            out_print['in_codes'] = np.where(out_print[to_var].isin(temp.icd_code.values), 1, 0)
            
            file_out = f'{icd_category}_{icd_version}to{to_version}.csv'
            #out_print.to_csv(os.path.join(html_output_path, file_out), index=False)
            
            file.write(f'<h2>ICD {icd_version} to ICD {to_version} Mapping:</h2>')
            file.write(out_print.to_html(index=False))
            file.write('<hr>')

# Display links in the Jupyter notebook output
display(HTML('<h3>ICD categories:</h3>'))
display(HTML('<br>'.join(html_links)))

In [19]:
#Read in manually edited ICD code category file,
# merge in ICD code information.
icd_categories_out = pd.read_csv(os.path.join(PATH_DATA_RAW, 
                             'icd_categories_manual_edit.csv'))\
    .merge(icd[['icd_code','icd_version',
                'icd_desc','icd_duplicate',
                'ccsr_category9','ccsr_category10']], 
           how='left', on = ['icd_code'])\
    .sort_values(by=['icd_category','icd_version'])

icd_categories_out['ccsr_category'] = icd_categories_out.ccsr_category9\
                .fillna(icd_categories_out.ccsr_category10)
icd_categories_out = icd_categories_out.drop(columns=['ccsr_category9','ccsr_category10'])

#Write out
file_out = os.path.join(PATH_DATA, 'icd_categories_desc.csv')
icd_categories_out.to_csv(file_out, index = False)

In [20]:
path_use = os.path.join(PATH_DATA_ICD,
                    'project_icd_categories')
# Loop through the directory and identify .txt files
for filename in os.listdir(path_use):
    if filename.endswith(".txt"):
        path_filename = os.path.join(path_use, filename)
        #file_categories = os.path.join(path_use, filename)
        category_name = os.path.splitext(filename)[0]
        file_out = os.path.join(path_use, f'{category_name}.csv')
        # file_out = os.path.join(file_path, f'{category_name}.txt')
        with open(path_filename, 'r') as file:
            project_icd_categories = file.read().splitlines()
            project_out = icd_categories_out[\
                icd_categories_out.icd_category.isin(project_icd_categories)]
            project_out.to_csv(file_out, index=False)

# Print the categories read from the last file processed
print(project_icd_categories)

['cld', 'hydrocephalus', 'ivh34', 'laparotomy', 'nec', 'nec3', 'pvl', 'rop345', 'shunt', 'ventilation']


In [21]:
print("""Ensure all ICD codes are non-duplicates, 
    for simplicity in identification.
If there are duplicates, a simple merge may cause errors.""")
icd_categories_out.icd_duplicate.value_counts()

Ensure all ICD codes are non-duplicates, 
    for simplicity in identification.
If there are duplicates, a simple merge may cause errors.


icd_duplicate
0    478
Name: count, dtype: int64
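
One way to guard against this, sketched on toy data, is the `validate` option of `merge`, which raises an error when the lookup table has repeated keys. The category name `toy_category` is hypothetical.

```python
import pandas as pd

categories = pd.DataFrame({
    "icd_category": ["toy_category"],
    "icd_code": ["E8589"],
})
# E8589 exists in both ICD versions, so the key is not unique here.
reference = pd.DataFrame({
    "icd_code": ["E8589", "E8589"],
    "icd_version": [9, 10],
})

try:
    categories.merge(reference, how="left", on="icd_code",
                     validate="many_to_one")
    merge_is_safe = True
except pd.errors.MergeError:
    merge_is_safe = False

print(merge_is_safe)  # False
```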

In [22]:
print('Number of ICD codes in each category:')
icd_categories_out.icd_category.value_counts()

Number of ICD codes in each category:


icd_category
surgery_indicated    358
pvl                   37
cmal_severe           28
rop345                15
hydrocephalus         11
nec                    9
cld                    5
ivh34                  4
ventilation            4
laparotomy             2
nec3                   2
shunt                  2
noscomial              1
Name: count, dtype: int64

In [23]:
print('Number of ICD codes by version')
pd.crosstab(icd_categories_out.icd_category, icd_categories_out.icd_version)

Number of ICD codes by version


icd_category,9,10
cld,1,4
cmal_severe,13,15
hydrocephalus,0,11
ivh34,2,2
laparotomy,1,1
nec,5,4
nec3,1,1
noscomial,0,1
pvl,17,20
rop345,3,12


### Appendix


#### Spot check mappings, using [icd-mapper](https://pypi.org/project/icd-mappings/)  
<a id="cell-icd-map"></a>

Note several shortcomings of this tool:

* It returns a 1-1 mapping, a simplification that NBER warns against ([FAQ #8](https://data.nber.org/gem/GEMs-CrosswalksBasicFAQ.pdf)).
* It uses an older version of the data.

In [24]:
from icdmappings import Mapper
mapper = Mapper()
#icd9code = 'E8614'
icd9code = '7703' #Hemorrhage; the mapper doesn't show multiple options.
#icd9code = '85401'
#9->10
print(icd9code, '->', 
      mapper.map(icd9code, source='icd9', target='icd10'))

7703 -> P268


In [25]:
temp = icd_mapping_dict[9]
temp[temp.icd9cm=='7703']

Unnamed: 0,icd9cm,icd10cm,flag_values,flag_approximate,flag_nomap,flag_combination,scenario,choice,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10
12413,7703,P261,10000,1,0,0,0,0,NB PULMONARY HEMORRHAGE,Other perinatal conditions,Massive pulmonary hemorrhage originating in th...,Respiratory perinatal condition
12414,7703,P268,10000,1,0,0,0,0,NB PULMONARY HEMORRHAGE,Other perinatal conditions,Other pulmonary hemorrhages originating in the...,Respiratory perinatal condition


[Here is the line of code that produces this behavior](https://github.com/snovaisg/ICD-Mappings/blob/main/icdmappings/mappers/icd9_to_icd10.py):


```
58: mapping[icd9] = icd10
```

That is, the mapper stores one value per key in a plain dictionary, so each new ICD-10 code for the same ICD-9 code overwrites the previous one and only the last value is kept.
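
The effect can be reproduced in a few lines. This sketch is not the package's code: it assumes the pairs are processed in the order they appear in the mapping shown above, and contrasts a plain dictionary with one that accumulates a list of targets.

```python
from collections import defaultdict

# The two 9 -> 10 pairs for ICD-9-CM code 7703 shown above.
pairs = [("7703", "P261"), ("7703", "P268")]

# Single-value dictionary: later assignments overwrite earlier ones.
single = {}
for icd9, icd10 in pairs:
    single[icd9] = icd10

# List-valued dictionary: every target is kept.
multi = defaultdict(list)
for icd9, icd10 in pairs:
    multi[icd9].append(icd10)

print(single["7703"])  # P268
print(multi["7703"])   # ['P261', 'P268']
```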

In [26]:
#10->9
icd10code = 'R402234'
icd10code = 'P269'
icd10code = 'A0472' 
#Package appears to use an [older version of the data](https://github.com/snovaisg/ICD-Mappings/tree/main/icdmappings/data_files)
#A0472 - Added 2018 -  https://www.icd10data.com/ICD10CM/Codes/A00-B99/A00-A09/A04-/A04.72
print(icd10code, '->',
       mapper.map(icd10code, source='icd10', target='icd9') )

A0472 -> None


In [27]:
compute_info.info()

Elapsed Time: 0 hours, 0 minutes, 41 seconds
Current Memory Usage: 206 MB (0.2 GB)
Memory Change Since Last Call: 91 MB (0.1 GB)
Total Memory Change Since Instantiation: 91 MB (0.1 GB)
