### Process ICD Codes

Python notebook for processing ICD-9-CM and ICD-10-CM codes, including mapping between them, using publicly available datasets.

#### 1. ICD-9-CM and ICD-10-CM

[ICD-9-CM](https://www.cms.gov/medicare/coding-billing/icd-10-codes/icd-9-cm-diagnosis-procedure-codes-abbreviated-and-full-code-titles), 
[ICD-10-CM codes](https://www.cdc.gov/nchs/icd/icd-10-cm/index.html), 
and Clinical Classification Software (CCS) codes are obtained from:

**[Healthcare Cost & Utilization Project](https://hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp)**  
Agency for Healthcare Research and Quality  
U.S. Department of Health and Human Services

* **[ICD-9-CM](https://hcup-us.ahrq.gov/toolssoftware/ccs/$DXREF%202008_Archive.csv)**
* **[ICD-10-CM](https://hcup-us.ahrq.gov/toolssoftware/ccsr/dxccsr.jsp)**

In these datasets, ICD-9-CM and ICD-10-CM codes are specified with all periods (".") removed.
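Sources that keep the period (e.g. `E85.89`) need to be normalized before they can be matched against these files. A minimal sketch, assuming codes are already strings; `normalize_icd_code` is a hypothetical helper, not part of the HCUP data. It also shows that removing periods can make an ICD-9-CM and an ICD-10-CM code identical (E858.9 and E85.89 both become `E8589`), so the version has to travel with the code.

```python
import pandas as pd

def normalize_icd_code(code: str) -> str:
    """Hypothetical helper: strip periods and whitespace from an ICD code string.

    Codes stay strings, so leading zeros (e.g. ICD-9-CM '001.0') survive.
    """
    return code.replace('.', '').strip()

# Illustrative codes: E858.9 (ICD-9-CM) and E85.89 (ICD-10-CM) collide once periods are removed
codes = pd.DataFrame({
    'icd_code_dotted': ['E858.9', 'E85.89', '001.0'],
    'icd_version': [9, 10, 9],
})
codes['icd_code'] = codes['icd_code_dotted'].map(normalize_icd_code)

# Flag every row whose period-free code also appears in another row
codes['collides'] = codes['icd_code'].duplicated(keep=False)
print(codes)
```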

Note that ICD-9-CM and ICD-10-CM codes are mostly distinct but have a small overlap (see [duplicate ICD-9, ICD-10 codes](https://www.hhs.gov/guidance/sites/default/files/hhs-guidance-documents/2015-duplicate_ICD10CM_codes.pdf)).

#### 2. Mappings Between ICD-9-CM and ICD-10-CM

Mappings between ICD-9-CM and ICD-10-CM codes are provided by:

**[The National Bureau of Economic Research](https://www.nber.org/)** (NBER)

NBER provides recommendations for a manual mapping process from ICD-9-CM to ICD-10-CM (*forward mapping*) and from ICD-10-CM to ICD-9-CM (*backward mapping*).

We use the 2018 [General Equivalence Mappings (GEMs)](https://www.cms.gov/medicare/coding-billing/icd-10-codes/2018-icd-10-cm-gem), the [latest year with available data](https://www.cms.gov/medicare/coding-billing/icd-10-codes/2019-icd-10-cm).

The mappings are NOT a simple one-to-one correspondence between ICD-9-CM and ICD-10-CM codes.
Some codes have no match, others have multiple matches,
and both situations occur in each direction of the mapping (9->10 and 10->9).
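One way to see the one-to-many structure is to count the distinct target codes per source code. A minimal pandas sketch on a toy excerpt of the forward mapping, built from the `7703`, `7707`, and `77750` rows that appear in the GEM output later in this notebook (the column names `icd9cm`/`icd10cm` match the mapping files written below):

```python
import pandas as pd

# Toy excerpt of a forward (ICD-9-CM -> ICD-10-CM) mapping
mapping_9to10 = pd.DataFrame({
    'icd9cm':  ['7703', '7703', '7707', '7707', '7707', '77750'],
    'icd10cm': ['P261', 'P268', 'P270', 'P271', 'P278', 'P779'],
})

# Number of distinct ICD-10-CM targets per ICD-9-CM source code
n_targets = mapping_9to10.groupby('icd9cm')['icd10cm'].nunique()
print(n_targets)

# Source codes with more than one target need a manual decision
one_to_many = n_targets[n_targets > 1].index.tolist()
print(one_to_many)
```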

**FAQs from NBER:**

* [Basic FAQ](https://data.nber.org/gem/GEMs-CrosswalksBasicFAQ.pdf)
* [Technical FAQ](https://data.nber.org/gem/GEMs-CrosswalksTechnicalFAQ.pdf)
* [User's guide](https://www.hhs.gov/guidance/sites/default/files/hhs-guidance-documents/GemsTechDoc_0.pdf)


Note: [NBER's general page on GEMs](https://www.nber.org/research/data/icd-9-cm-and-icd-10-cm-and-icd-10-pcs-crosswalk-or-general-equivalence-mappings) does not appear to have the most recent data.


#### 3. Tool for identifying matching ICD codes

Provides a tool to match ICD codes in user-specified lists.

Demonstrates several problems with an existing open-source tool for this task, [icd-mappings](https://pypi.org/project/icd-mappings/) (the first result when searching for 'icd mapping python').


#### 4. Augment manually edited ICD category file with descriptions

### Notebook Outputs (csv files):

* [icd_9_10.csv](data/icd/icd_9_10.csv) - Descriptions of all ICD-9-CM and ICD-10-CM codes.
* [icd9to10.csv](data/icd/icd9to10.csv) - NBER-provided mapping from ICD-9-CM to ICD-10-CM.
* [icd10to9.csv](data/icd/icd10to9.csv) - NBER provided mapping from ICD-10-CM to ICD-9-CM.
* [icd_categories_desc.csv](data/icd/icd_categories_desc.csv) - ICD codes and descriptions for all categories included in this study.

In [26]:
#Imports
import os
import pickle
import numpy as np
import pandas as pd
from IPython.display import display

pd.set_option('display.max_rows', 500)

#Filenames and paths
PATH_DATA = os.path.join('data','icd')
PATH_DATA_RAW = os.path.join(PATH_DATA,'raw')
FILE_ICD9 = os.path.join(PATH_DATA_RAW,
                         'icd_9_ccsr_hhs.csv')
FILE_ICD9_CCSR = os.path.join(PATH_DATA_RAW,
                         'icd_9_ccsr_desc.csv')
FILE_ICD10 = os.path.join(PATH_DATA_RAW,
                         'icd_10_ccsr_hhs.csv')

#### Import and process ICD-9-CM and ICD-10-CM codes

##### ICD-9-CM

In [27]:
icd9 = pd.read_csv(FILE_ICD9)

#Remove the quotes from values
for column in ['icd_code', 'ccsr_code', 'ccsr_category']:
    icd9[column] = icd9[column].str.replace("'", "").str.strip()

#Load ccsr definitions
icd9_ccsr = pd.read_csv(FILE_ICD9_CCSR)
icd9 = icd9\
    .drop(columns='ccsr_category')\
    .merge(icd9_ccsr, how='left')
icd9['icd_version'] = 9

# icd9.head(2)
icd9.icd_code.value_counts().value_counts()

count
1    15072
Name: count, dtype: int64
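The `value_counts().value_counts()` idiom above summarizes how often codes repeat: the first call counts rows per code, the second counts how many codes share each row count. An output containing only `1` means every ICD-9-CM code appears exactly once. A toy illustration:

```python
import pandas as pd

codes = pd.Series(['A', 'B', 'B', 'C', 'C', 'C'])

# First value_counts: rows per code -> A: 1, B: 2, C: 3
# Second value_counts: how many codes occur once, twice, three times
dist = codes.value_counts().value_counts().sort_index()
print(dist)
```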

In [28]:
#Display a single ICD-9-CM code
icd9[icd9.icd_code=='E8720']

Unnamed: 0,icd_code,ccsr_code,icd_desc,ccsr_category,icd_version
14564,E8720,2616,FAILURE STERILE SURGERY,E Codes: Adverse effects of medical care,9


In [29]:
#Caution: treat codes as strings; some have leading 0's (e.g. '0010'),
# and trailing 0's are significant ('85400' differs from '8540')
icd9[icd9.icd_code=='85400']

Unnamed: 0,icd_code,ccsr_code,icd_desc,ccsr_category,icd_version
10859,85400,233,BRAIN INJURY NEC,Intracranial injury,9
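Keeping codes as strings matters: parsing them as numbers silently drops leading zeros (ICD-9-CM `0010`, cholera, would become `10`). A minimal sketch of the difference, using a small hypothetical CSV excerpt and `dtype` to force string parsing:

```python
import io
import pandas as pd

# Hypothetical CSV excerpt without quotes around the codes
csv_text = "icd_code,icd_desc\n0010,CHOLERA D/T VIB CHOLERAE\n85400,BRAIN INJURY NEC\n"

as_numbers = pd.read_csv(io.StringIO(csv_text))                          # codes inferred as int64
as_strings = pd.read_csv(io.StringIO(csv_text), dtype={'icd_code': str})  # codes kept as text

print(as_numbers['icd_code'].tolist())  # leading zeros lost
print(as_strings['icd_code'].tolist())  # leading zeros preserved
```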


##### ICD-10-CM

In [30]:
icd10_full = pd.read_csv(FILE_ICD10)
# Note: this file associates ICD-10-CM codes with CCSR categories and
# therefore CONTAINS DUPLICATED ICD CODE VALUES: a diagnosis can belong to
# several CCSR categories, one of which is flagged as the default
# for inpatient data and one for outpatient data.
# We use the default inpatient category whenever applicable.

icd10_full['default_ccrs_inpatient'] = pd.Categorical(\
        icd10_full['default_ccrs_inpatient'],
        categories=['Y', 'N', 'X'], ordered=True)

icd10_full = icd10_full.sort_values(\
        by=['icd_code','default_ccrs_inpatient'])

icd10_full['icd_version'] = 10

icd10_full['inpatient_outpatient_match'] = np.where(\
    icd10_full.default_ccrs_inpatient==icd10_full.default_ccrs_outpatient, 1,0)
 
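#We use the default CCSR inpatient category: after sorting, it is the first row per code.
#drop_duplicates keeps whole rows (groupby().first() takes the first non-null value per column)
icd10 = icd10_full\
        .drop_duplicates(subset='icd_code', keep='first')\
        .reset_index(drop=True)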
# icd10.head(2)
icd10_full.icd_code.value_counts().value_counts()

count
1    66120
2     6521
3     1737
4      564
5       43
6        2
Name: count, dtype: int64
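The cell above keeps one CCSR category per ICD-10-CM code by sorting on an ordered categorical (`Y` before `N` before `X`) and taking the first row per code. A minimal sketch of the same idea on toy rows mirroring `O23512` (shown in the next cell), using `drop_duplicates`, which keeps whole rows; by contrast, `groupby(...).first()` takes the first non-null value of each column separately, which can combine values from different rows when some fields are missing.

```python
import pandas as pd

# Toy rows: one ICD-10-CM code belonging to two CCSR categories
rows = pd.DataFrame({
    'icd_code': ['O23512', 'O23512'],
    'ccsr_code': ['GEN018', 'PRG028'],
    'default_ccrs_inpatient': ['N', 'Y'],
})

# Ordered categorical so that 'Y' (default inpatient category) sorts first
rows['default_ccrs_inpatient'] = pd.Categorical(
    rows['default_ccrs_inpatient'], categories=['Y', 'N', 'X'], ordered=True)

primary = (rows.sort_values(['icd_code', 'default_ccrs_inpatient'])
               .drop_duplicates(subset='icd_code', keep='first'))
print(primary)
```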

In [31]:
#Example of different CCSR codes
#Primary, secondary
icd_code = 'O23512'
#icd_code = 'A5039'
#icd_code = 'O99825'
icd10_full[icd10_full.icd_code==icd_code]

Unnamed: 0,icd_code,icd_desc,ccsr_code,ccsr_category,default_ccrs_inpatient,default_ccrs_outpatient,icd_version,inpatient_outpatient_match
24578,O23512,"Infections of cervix in pregnancy, second trim...",PRG028,Other specified complications in pregnancy,Y,Y,10,1
24577,O23512,"Infections of cervix in pregnancy, second trim...",GEN018,Inflammatory diseases of female pelvic organs,N,N,10,1


In [32]:
#Extract info for a specific ICD-10-CM code
# (note: no periods in code)
icd10[icd10.icd_code=='E8589']

Unnamed: 0,icd_code,icd_desc,ccsr_code,ccsr_category,default_ccrs_inpatient,default_ccrs_outpatient,icd_version,inpatient_outpatient_match
4033,E8589,Other amyloidosis,END016,Other specified and unspecified nutritional an...,Y,Y,10,1


#### Create ICD-[9/10]-CM dataset

In [33]:
#Create description dataframes for ICD 9/10 codes, with version-specific column names
# (e.g. icd9cm, icd_desc9, ccsr_category9) so they can be merged onto the mappings.
#rename() is used instead of assigning into columns.values, which mutates an Index in place.
icd_desc_columns = ['icd_code','icd_desc', 'ccsr_category'] #'ccsr_code',
icd_desc_dict = dict()
for icd_version, icd_df in [(9, icd9), (10, icd10)]:
    icd_desc_dict[icd_version] = icd_df[icd_desc_columns].rename(columns={
        'icd_code': f'icd{icd_version}cm',
        'icd_desc': f'icd_desc{icd_version}',
        'ccsr_category': f'ccsr_category{icd_version}'})

# Construct Single ICD dataframe
icd = pd.concat([icd9\
                 .rename(columns={'ccsr_code':'ccsr_code9',
                    'ccsr_category':'ccsr_category9'}), 
                icd10.rename(columns={'ccsr_code':'ccsr_code10',
                    'ccsr_category':'ccsr_category10'}) ], axis=0)\
                .sort_values(by='icd_code')


temp = icd.icd_code.value_counts()
duplicate_icd_codes = list(temp[temp>1].index)
icd['icd_duplicate'] = np.where(icd.icd_code.isin(duplicate_icd_codes),1,0)         
file_out = os.path.join(PATH_DATA, 'icd_9_10.csv')
icd.to_csv(file_out, index=False)   

#### Duplicate ICD Values

In [34]:
icd9[icd9.icd_code=='E8589']

Unnamed: 0,icd_code,ccsr_code,icd_desc,ccsr_category,icd_version
14422,E8589,2613,ACC POISONING-DRUG NOS,E Codes: Poisoning,9


In [35]:
icd10[icd10.icd_code=='E8589']

Unnamed: 0,icd_code,icd_desc,ccsr_code,ccsr_category,default_ccrs_inpatient,default_ccrs_outpatient,icd_version,inpatient_outpatient_match
4033,E8589,Other amyloidosis,END016,Other specified and unspecified nutritional an...,Y,Y,10,1


We identify all of the duplicate codes listed by HHS:
[2015 duplicate ICD-10-CM codes](https://www.hhs.gov/guidance/sites/default/files/hhs-guidance-documents/2015-duplicate_ICD10CM_codes.pdf).

We also identify several additional duplicates, because we remove all periods from codes (e.g. ICD-9-CM E858.9 and ICD-10-CM E85.89 both become E8589).
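Duplicates across versions can be found with a set intersection of the two code lists, which is what the `icd_duplicate` flag above records. When both versions are pooled, a version-qualified key is one way to keep such codes distinct. A minimal sketch on toy code lists (`icd_key` is a hypothetical column name, not used elsewhere in this notebook):

```python
import pandas as pd

# Toy code lists; E8589 appears in both versions once periods are removed
icd9_codes = pd.Series(['E8589', 'E8720', '85400'])
icd10_codes = pd.Series(['E8589', 'P369', 'R6521'])

shared = sorted(set(icd9_codes) & set(icd10_codes))
print(shared)

# A version-qualified key keeps the two E8589 codes apart when pooled
pooled = pd.concat([
    pd.DataFrame({'icd_code': icd9_codes, 'icd_version': 9}),
    pd.DataFrame({'icd_code': icd10_codes, 'icd_version': 10}),
], ignore_index=True)
pooled['icd_key'] = pooled['icd_version'].astype(str) + ':' + pooled['icd_code']
print(pooled['icd_key'].is_unique)
```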

In [36]:
#Note that this is NOT a 1-1 mapping

#Filename, fixed width column specs
icd_cm_mapping_dict = {(9, 10): ("2018_I9gem.txt",
                                 [(0, 6), (6, 14), (14, 19)]),
                       (10, 9): ("2018_I10gem.txt",
                                 [(0, 8), (8, 14), (14, 19)])}
flag_columns = ['flag_approximate',
                'flag_nomap',
                'flag_combination',
                'scenario',
                'choice']
icd_mapping_dict = dict()
for (icd_from, icd_to), (file_name, file_colspec) in icd_cm_mapping_dict.items():
    from_var = f'icd{icd_from}cm'
    to_var = f'icd{icd_to}cm'

    path_file = os.path.join(PATH_DATA_RAW, file_name)
    #Read every column as a string, so codes and flags keep their leading zeros
    icd_mapping = pd.read_fwf(path_file,
                              colspecs=file_colspec,
                              header=None,
                              names=[from_var, to_var, 'flag_values'],
                              dtype=str)

    #Split each five-character flag string (e.g. '10112') into integer columns
    flags = pd.DataFrame(icd_mapping['flag_values'].map(list).tolist(),
                         index=icd_mapping.index,
                         columns=flag_columns).astype(int)
    icd_mapping = pd.concat([icd_mapping, flags], axis=1)

    #Merge in descriptions of the source and target codes
    icd_mapping = icd_mapping\
        .merge(icd_desc_dict[icd_from], how='left')\
        .merge(icd_desc_dict[icd_to], how='left')\
        .sort_values(by=from_var)

    out_file_mapping_desc = os.path.join(PATH_DATA, f'icd{icd_from}to{icd_to}.csv')
    icd_mapping.to_csv(out_file_mapping_desc, index=False)

    #Save mapping for later usage, keyed by the source ICD version
    icd_mapping_dict[icd_from] = icd_mapping

icd_mapping.head(2)

Unnamed: 0,icd10cm,icd9cm,flag_values,flag_approximate,flag_nomap,flag_combination,scenario,choice,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9
0,A000,0010,00000,0,0,0,0,0,"Cholera due to Vibrio cholerae 01, biovar chol...",Intestinal infection,CHOLERA D/T VIB CHOLERAE,Intestinal infection
1,A001,0011,00000,0,0,0,0,0,"Cholera due to Vibrio cholerae 01, biovar eltor",Intestinal infection,CHOLERA D/T VIB EL TOR,Intestinal infection
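The GEM files are fixed-width text, which is why the mapping cell uses `pd.read_fwf` with explicit `colspecs`. A minimal sketch of that parsing on a few illustrative lines laid out according to the ICD-9-CM to ICD-10-CM column specs (`[(0, 6), (6, 14), (14, 19)]`); the sample lines are assumptions, not copied from `2018_I9gem.txt`. Passing `dtype=str` keeps leading zeros in both codes and flags:

```python
import io
import pandas as pd

# Illustrative lines: ICD-9-CM code in cols 0-5, ICD-10-CM code in cols 6-13,
# five flag digits in cols 14-18 (padding spaces are stripped by read_fwf)
gem_text = (
    "0010  A000    00000\n"
    "7703  P261    10000\n"
    "7703  P268    10000\n"
)
gem = pd.read_fwf(io.StringIO(gem_text),
                  colspecs=[(0, 6), (6, 14), (14, 19)],
                  header=None,
                  names=['icd9cm', 'icd10cm', 'flag_values'],
                  dtype=str)  # keep leading zeros in codes and flags
print(gem)
```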


In [37]:
#Print all flags for inspection
def other_icd_version(icd_version):
    #Map 9->10, 10->9
    return(-icd_version+19)
for icd_version in [9, 10]:
    mapping_use = icd_mapping_dict[icd_version]
    print(icd_version, '->', other_icd_version(icd_version))
    print(mapping_use.flag_values.value_counts())
    print()

9 -> 10
flag_values
10000    18578
00000     3522
10112     1151
10111      853
11000      422
10122      115
10121       71
10132       48
10131       34
10142       26
10141       20
10161        6
10151        6
10152        2
10162        2
10123        2
10133        2
Name: count, dtype: int64

10 -> 9
flag_values
10000    68307
10111     4480
10112     4254
00000     3522
11000      731
10113      159
10122       53
10121       53
10114       26
10132        2
10131        2
10152        1
10142        1
10141        1
10151        1
Name: count, dtype: int64
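Each GEM row carries a five-digit flag string. Per the CMS GEMs documentation (see the User's guide linked above), the digits are, in order: approximate (1 = not an exact equivalent), no map (1 = no plausible translation), combination (1 = the target is one part of a multi-code combination), scenario, and choice list. For example, `10112` is an approximate entry belonging to combination scenario 1, choice list 2, and `11000` marks a code with no mapping. A minimal sketch of decoding and filtering these flags (`decode_gem_flags` is a hypothetical helper):

```python
import pandas as pd

FLAG_NAMES = ['flag_approximate', 'flag_nomap', 'flag_combination',
              'scenario', 'choice']

def decode_gem_flags(flag_values: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: split five-character GEM flag strings into integer columns."""
    digits = flag_values.map(list).tolist()
    return pd.DataFrame(digits, index=flag_values.index,
                        columns=FLAG_NAMES).astype(int)

flags = decode_gem_flags(pd.Series(['00000', '10000', '11000', '10112']))
print(flags)

# Exact one-to-one equivalents, and rows with no valid translation
exact = flags[(flags.flag_approximate == 0) & (flags.flag_nomap == 0)]
no_map = flags[flags.flag_nomap == 1]
```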



#### User-defined categories

In [38]:
icd_categories = pd.read_csv(
    os.path.join(PATH_DATA_RAW,
        'icd_categories_initial.csv'))

icd_categories = icd_categories\
    .merge(icd[['icd_code',
                'icd_version']],
         how = 'left')
icd_categories.head(2)

Unnamed: 0,icd_category,icd_code,icd_version
0,cmal_severe,7582,9.0
1,cmal_severe,75672,9.0
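Because this is a left merge, category codes missing from the reference table get a missing `icd_version`, which is also why the versions display as floats (`9.0`). In the `pvl` category output, 17 of 20 codes receive a version; the other three (`854`, `8540`, `8541`) are less specific than the five-digit codes and presumably do not appear in the HCUP files. A minimal sketch of listing such codes with `indicator=True`, on toy tables:

```python
import pandas as pd

# Toy reference table and category list
icd_ref = pd.DataFrame({'icd_code': ['85400', '85401', '7797'],
                        'icd_version': [9, 9, 9]})
categories = pd.DataFrame({'icd_category': ['pvl', 'pvl', 'pvl'],
                           'icd_code': ['7797', '85400', '854']})

# indicator=True adds a '_merge' column: 'both' or 'left_only'
merged = categories.merge(icd_ref, on='icd_code', how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'icd_code'].tolist()
print(unmatched)
```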


In [39]:
print('Number of ICD codes per category:' ,
      icd_categories.icd_category.value_counts())

Number of ICD codes per category: icd_category
cmal_severe    24
pvl            20
rop345         15
nec             9
sepsis          6
cld             5
ivh34           4
laparotomy      2
Name: count, dtype: int64


In [40]:
#This is a manual, inspection-based step.
for icd_category in list(set(icd_categories.icd_category)):
    print(icd_category)
    temp = icd_categories[\
        icd_categories.icd_category==icd_category]\
        .sort_values(by=['icd_version','icd_code'])\
        .merge(icd[['icd_code','icd_desc']], how='left')

    print('Number of codes from each version:',temp.icd_version.value_counts())
    print('ICD codes:',temp.icd_code.values)
    print()

    #Show the translations 9->10, 10->9
    for icd_version in [9, 10]:
        temp_codes = temp[temp.icd_version==icd_version].icd_code
        #print(temp_codes)
        icd_mapping_temp = icd_mapping_dict[icd_version]
        icd_var = f'icd{icd_version}cm'
        to_version = other_icd_version(icd_version)
        to_var = f'icd{to_version}cm'

        out_print = icd_mapping_temp[icd_mapping_temp[icd_var]\
                                    .isin(temp_codes)].iloc[:,:3]\
        .merge(icd_desc_dict[icd_version], how='left')\
            .merge(icd_desc_dict[to_version], how='left')\
            .sort_values(by=icd_var)
        file_out = f'{icd_category}_{icd_version}to{to_version}.csv'
        out_print['in_codes'] = np.where(out_print[to_var].isin(temp.icd_code.values),1,0)
        out_print.to_csv(os.path.join(PATH_DATA,
                                      'icd_mapping',
                                    file_out),
                index=False)
            
        print('ICD:',icd_version)
        print('Number of unique values: ', out_print[icd_var].value_counts().value_counts())
        print(out_print.flag_values.value_counts())
        display(out_print)
        print()

sepsis
Number of codes from each version: icd_version
10.0    4
9.0     2
Name: count, dtype: int64
ICD codes: ['77181' '99592' 'P364' 'P368' 'P369' 'R6521']

ICD: 9
Number of unique values:  count
1    2
Name: count, dtype: int64
flag_values
10000    2
Name: count, dtype: int64


Unnamed: 0,icd9cm,icd10cm,flag_values,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10,in_codes
0,77181,P369,10000,SEPTICEMIA [SEPSIS] OF NEWBORN (Begin 2002),Septicemia (except in labor),"Bacterial sepsis of newborn, unspecified",Perinatal infections,1
1,99592,R6520,10000,SYS INFLAM / INFECTI W ORGAN DYSFUNCTI (Begin ...,Septicemia (except in labor),Severe sepsis without septic shock,Septicemia,0



ICD: 10
Number of unique values:  count
1    3
2    1
Name: count, dtype: int64
flag_values
10000    3
10111    1
10112    1
Name: count, dtype: int64


Unnamed: 0,icd10cm,icd9cm,flag_values,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9,in_codes
0,P364,77181,10000,Sepsis of newborn due to Escherichia coli,Perinatal infections,SEPTICEMIA [SEPSIS] OF NEWBORN (Begin 2002),Septicemia (except in labor),1
1,P368,77181,10000,Other bacterial sepsis of newborn,Perinatal infections,SEPTICEMIA [SEPSIS] OF NEWBORN (Begin 2002),Septicemia (except in labor),1
2,P369,77181,10000,"Bacterial sepsis of newborn, unspecified",Perinatal infections,SEPTICEMIA [SEPSIS] OF NEWBORN (Begin 2002),Septicemia (except in labor),1
3,R6521,78552,10111,Severe sepsis with septic shock,Septicemia,SEPTIC SHOCK (Begin 2003),Shock,0
4,R6521,99592,10112,Severe sepsis with septic shock,Septicemia,SYS INFLAM / INFECTI W ORGAN DYSFUNCTI (Begin ...,Septicemia (except in labor),1



laparotomy
Number of codes from each version: icd_version
9.0     1
10.0    1
Name: count, dtype: int64
ICD codes: ['V6441' 'Z5331']

ICD: 9
Number of unique values:  count
1    1
Name: count, dtype: int64
flag_values
00000    1
Name: count, dtype: int64


Unnamed: 0,icd9cm,icd10cm,flag_values,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10,in_codes
0,V6441,Z5331,00000,LAPAROSCOPIC SURGICAL PROCEDURE CONVERTED TO (...,Residual codes; unclassified,Laparoscopic surgical procedure converted to o...,Other specified status,1



ICD: 10
Number of unique values:  count
1    1
Name: count, dtype: int64
flag_values
00000    1
Name: count, dtype: int64


Unnamed: 0,icd10cm,icd9cm,flag_values,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9,in_codes
0,Z5331,V6441,00000,Laparoscopic surgical procedure converted to o...,Other specified status,LAPAROSCOPIC SURGICAL PROCEDURE CONVERTED TO (...,Residual codes; unclassified,1



pvl
Number of codes from each version: icd_version
9.0    17
Name: count, dtype: int64
ICD codes: ['7797' '85400' '85401' '85402' '85403' '85404' '85405' '85406' '85409'
 '85410' '85411' '85412' '85413' '85414' '85415' '85416' '85419' '854'
 '8540' '8541']

ICD: 9
Number of unique values:  count
2    7
4    4
3    4
1    2
Name: count, dtype: int64
flag_values
10000    21
10111    14
10112     9
Name: count, dtype: int64


Unnamed: 0,icd9cm,icd10cm,flag_values,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10,in_codes
0,7797,P912,10000,PREVENTRICULAR LEUKOMALACIA (Begin 2001),Other perinatal conditions,Neonatal cerebral leukomalacia,Neonatal cerebral disorders,0
1,85400,S06890A,10000,BRAIN INJURY NEC,Intracranial injury,Other specified intracranial injury without lo...,"Traumatic brain injury (TBI); concussion, init...",0
2,85401,S069X0A,10000,BRAIN INJURY NEC-NO COMA,Intracranial injury,Unspecified intracranial injury without loss o...,"Traumatic brain injury (TBI); concussion, init...",0
3,85401,S061X0A,10000,BRAIN INJURY NEC-NO COMA,Intracranial injury,Traumatic cerebral edema without loss of consc...,"Traumatic brain injury (TBI); concussion, init...",0
4,85401,S06890A,10000,BRAIN INJURY NEC-NO COMA,Intracranial injury,Other specified intracranial injury without lo...,"Traumatic brain injury (TBI); concussion, init...",0
5,85402,S069X2A,10000,BRAIN INJ NEC-BRIEF COMA,Intracranial injury,Unspecified intracranial injury with loss of c...,"Traumatic brain injury (TBI); concussion, init...",0
6,85402,S069X1A,10000,BRAIN INJ NEC-BRIEF COMA,Intracranial injury,Unspecified intracranial injury with loss of c...,"Traumatic brain injury (TBI); concussion, init...",0
7,85402,S061X1A,10000,BRAIN INJ NEC-BRIEF COMA,Intracranial injury,Traumatic cerebral edema with loss of consciou...,"Traumatic brain injury (TBI); concussion, init...",0
8,85402,S061X2A,10000,BRAIN INJ NEC-BRIEF COMA,Intracranial injury,Traumatic cerebral edema with loss of consciou...,"Traumatic brain injury (TBI); concussion, init...",0
12,85403,S069X4A,10000,BRAIN INJ NEC-MOD COMA,Intracranial injury,Unspecified intracranial injury with loss of c...,"Traumatic brain injury (TBI); concussion, init...",0



ICD: 10
Number of unique values:  Series([], Name: count, dtype: int64)
Series([], Name: count, dtype: int64)


Unnamed: 0,icd10cm,icd9cm,flag_values,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9,in_codes



cld
Number of codes from each version: icd_version
10.0    4
9.0     1
Name: count, dtype: int64
ICD codes: ['7707' 'P270' 'P271' 'P278' 'P279']

ICD: 9
Number of unique values:  count
3    1
Name: count, dtype: int64
flag_values
10000    3
Name: count, dtype: int64


Unnamed: 0,icd9cm,icd10cm,flag_values,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10,in_codes
0,7707,P278,10000,PERINATAL CHR RESP DIS,Other perinatal conditions,Other chronic respiratory diseases originating...,Respiratory perinatal condition,1
1,7707,P270,10000,PERINATAL CHR RESP DIS,Other perinatal conditions,Wilson-Mikity syndrome,Respiratory perinatal condition,1
2,7707,P271,10000,PERINATAL CHR RESP DIS,Other perinatal conditions,Bronchopulmonary dysplasia originating in the ...,Respiratory perinatal condition,1



ICD: 10
Number of unique values:  count
1    4
Name: count, dtype: int64
flag_values
10000    4
Name: count, dtype: int64


Unnamed: 0,icd10cm,icd9cm,flag_values,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9,in_codes
0,P270,7707,10000,Wilson-Mikity syndrome,Respiratory perinatal condition,PERINATAL CHR RESP DIS,Other perinatal conditions,1
1,P271,7707,10000,Bronchopulmonary dysplasia originating in the ...,Respiratory perinatal condition,PERINATAL CHR RESP DIS,Other perinatal conditions,1
2,P278,7707,10000,Other chronic respiratory diseases originating...,Respiratory perinatal condition,PERINATAL CHR RESP DIS,Other perinatal conditions,1
3,P279,7707,10000,Unspecified chronic respiratory disease origin...,Respiratory perinatal condition,PERINATAL CHR RESP DIS,Other perinatal conditions,1



cmal_severe
Number of codes from each version: icd_version
9.0     13
10.0    11
Name: count, dtype: int64
ICD codes: ['7420' '74300' '74510' '74511' '7453' '7467' '74741' '75310' '7566'
 '75672' '7581' '7582' '7594' 'Q012' 'Q201' 'Q203' 'Q204' 'Q208' 'Q234'
 'Q606' 'Q651' 'Q8901' 'Q913' 'Q917']

ICD: 9
Number of unique values:  count
1    11
2     2
Name: count, dtype: int64
flag_values
10000    8
00000    7
Name: count, dtype: int64


Unnamed: 0,icd9cm,icd10cm,flag_values,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10,in_codes
0,7420,Q019,10000,ENCEPHALOCELE,Nervous system congenital anomalies,"Encephalocele, unspecified",Nervous system congenital anomalies,0
1,74300,Q111,00000,CLINIC ANOPHTHALMOS NOS,Other congenital anomalies,Other anophthalmos,"Congenital malformations of eye, ear, face, neck",0
2,74510,Q203,10000,COMPL TRANSPOS GREAT VES,Cardiac and circulatory congenital anomalies,Discordant ventriculoarterial connection,Cardiac and circulatory congenital anomalies,1
3,74511,Q201,00000,DOUBLE OUTLET RT VENTRIC,Cardiac and circulatory congenital anomalies,Double outlet right ventricle,Cardiac and circulatory congenital anomalies,1
4,7453,Q204,00000,COMMON VENTRICLE,Cardiac and circulatory congenital anomalies,Double inlet ventricle,Cardiac and circulatory congenital anomalies,1
5,7467,Q234,00000,HYPOPLAS LEFT HEART SYND,Cardiac and circulatory congenital anomalies,Hypoplastic left heart syndrome,Cardiac and circulatory congenital anomalies,1
6,74741,Q262,00000,TOT ANOM PULM VEN CONNEC,Cardiac and circulatory congenital anomalies,Total anomalous pulmonary venous connection,Cardiac and circulatory congenital anomalies,0
7,75310,Q6100,10000,CYSTIC KIDNEY DISEAS NOS (Begin 1990),Genitourinary congenital anomalies,"Congenital renal cyst, unspecified",Genitourinary congenital anomalies,0
8,75310,Q619,10000,CYSTIC KIDNEY DISEAS NOS (Begin 1990),Genitourinary congenital anomalies,"Cystic kidney disease, unspecified",Genitourinary congenital anomalies,0
9,7566,Q791,10000,ANOMALIES OF DIAPHRAGM,Other congenital anomalies,Other congenital malformations of diaphragm,Musculoskeletal congenital conditions,0



ICD: 10
Number of unique values:  count
1    9
3    1
2    1
Name: count, dtype: int64
flag_values
10000    10
00000     4
Name: count, dtype: int64


Unnamed: 0,icd10cm,icd9cm,flag_values,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9,in_codes
0,Q012,7420,10000,Occipital encephalocele,Nervous system congenital anomalies,ENCEPHALOCELE,Nervous system congenital anomalies,1
1,Q201,74511,00000,Double outlet right ventricle,Cardiac and circulatory congenital anomalies,DOUBLE OUTLET RT VENTRIC,Cardiac and circulatory congenital anomalies,1
2,Q203,74510,10000,Discordant ventriculoarterial connection,Cardiac and circulatory congenital anomalies,COMPL TRANSPOS GREAT VES,Cardiac and circulatory congenital anomalies,1
3,Q203,74519,10000,Discordant ventriculoarterial connection,Cardiac and circulatory congenital anomalies,TRANSPOS GREAT VESS NEC,Cardiac and circulatory congenital anomalies,0
4,Q204,7453,00000,Double inlet ventricle,Cardiac and circulatory congenital anomalies,COMMON VENTRICLE,Cardiac and circulatory congenital anomalies,1
5,Q208,74519,10000,Other congenital malformations of cardiac cham...,Cardiac and circulatory congenital anomalies,TRANSPOS GREAT VESS NEC,Cardiac and circulatory congenital anomalies,0
6,Q208,7457,10000,Other congenital malformations of cardiac cham...,Cardiac and circulatory congenital anomalies,COR BILOCULARE,Cardiac and circulatory congenital anomalies,0
7,Q208,7458,10000,Other congenital malformations of cardiac cham...,Cardiac and circulatory congenital anomalies,SEPTAL CLOSURE ANOM NEC,Cardiac and circulatory congenital anomalies,0
8,Q234,7467,00000,Hypoplastic left heart syndrome,Cardiac and circulatory congenital anomalies,HYPOPLAS LEFT HEART SYND,Cardiac and circulatory congenital anomalies,1
9,Q606,7530,10000,Potters syndrome,Genitourinary congenital anomalies,RENAL AGENESIS,Genitourinary congenital anomalies,0



nec
Number of codes from each version: icd_version
9.0     5
10.0    4
Name: count, dtype: int64
ICD codes: ['7775' '77750' '77751' '77752' '77753' 'P771' 'P772' 'P773' 'P779']

ICD: 9
Number of unique values:  count
1    4
Name: count, dtype: int64
flag_values
00000    4
Name: count, dtype: int64


Unnamed: 0,icd9cm,icd10cm,flag_values,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10,in_codes
0,77750,P779,00000,NEC ENTEROCOLTIS NB NOS (Begin 2008),Other perinatal conditions,"Necrotizing enterocolitis in newborn, unspecified",Neonatal digestive and feeding disorders,1
1,77751,P771,00000,STG I NEC ENTEROCOL NB (Begin 2008),Other perinatal conditions,Stage 1 necrotizing enterocolitis in newborn,Neonatal digestive and feeding disorders,1
2,77752,P772,00000,STG II NEC ENTEROCOL NB (Begin 2008),Other perinatal conditions,Stage 2 necrotizing enterocolitis in newborn,Neonatal digestive and feeding disorders,1
3,77753,P773,00000,STG III NEC ENTEROCOL NB (Begin 2008),Other perinatal conditions,Stage 3 necrotizing enterocolitis in newborn,Neonatal digestive and feeding disorders,1



ICD: 10
Number of unique values:  count
1    4
Name: count, dtype: int64
flag_values
00000    4
Name: count, dtype: int64


Unnamed: 0,icd10cm,icd9cm,flag_values,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9,in_codes
0,P771,77751,00000,Stage 1 necrotizing enterocolitis in newborn,Neonatal digestive and feeding disorders,STG I NEC ENTEROCOL NB (Begin 2008),Other perinatal conditions,1
1,P772,77752,00000,Stage 2 necrotizing enterocolitis in newborn,Neonatal digestive and feeding disorders,STG II NEC ENTEROCOL NB (Begin 2008),Other perinatal conditions,1
2,P773,77753,00000,Stage 3 necrotizing enterocolitis in newborn,Neonatal digestive and feeding disorders,STG III NEC ENTEROCOL NB (Begin 2008),Other perinatal conditions,1
3,P779,77750,00000,"Necrotizing enterocolitis in newborn, unspecified",Neonatal digestive and feeding disorders,NEC ENTEROCOLTIS NB NOS (Begin 2008),Other perinatal conditions,1



ivh34
Number of codes from each version: icd_version
9.0     2
10.0    2
Name: count, dtype: int64
ICD codes: ['77213' '77214' 'P5221' 'P5222']

ICD: 9
Number of unique values:  count
1    2
Name: count, dtype: int64
flag_values
00000    2
Name: count, dtype: int64


Unnamed: 0,icd9cm,icd10cm,flag_values,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10,in_codes
0,77213,P5221,00000,INTRAVENT HEMORRHAGE GRADE III (Begin 2001),Other perinatal conditions,"Intraventricular (nontraumatic) hemorrhage, gr...",Hemorrhagic and hematologic disorders of newborn,1
1,77214,P5222,00000,INTRAVENT HEMORRHAGE GRADE IV (Begin 2001),Other perinatal conditions,"Intraventricular (nontraumatic) hemorrhage, gr...",Hemorrhagic and hematologic disorders of newborn,1



ICD: 10
Number of unique values:  count
1    2
Name: count, dtype: int64
flag_values
00000    2
Name: count, dtype: int64


Unnamed: 0,icd10cm,icd9cm,flag_values,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9,in_codes
0,P5221,77213,00000,"Intraventricular (nontraumatic) hemorrhage, gr...",Hemorrhagic and hematologic disorders of newborn,INTRAVENT HEMORRHAGE GRADE III (Begin 2001),Other perinatal conditions,1
1,P5222,77214,00000,"Intraventricular (nontraumatic) hemorrhage, gr...",Hemorrhagic and hematologic disorders of newborn,INTRAVENT HEMORRHAGE GRADE IV (Begin 2001),Other perinatal conditions,1



rop345
Number of codes from each version: icd_version
10.0    12
9.0      3
Name: count, dtype: int64
ICD codes: ['36225' '36226' '36227' 'H35141' 'H35142' 'H35143' 'H35149' 'H35151'
 'H35152' 'H35153' 'H35159' 'H35161' 'H35162' 'H35163' 'H35169']

ICD: 9
Number of unique values:  count
1    3
Name: count, dtype: int64
flag_values
10000    3
Name: count, dtype: int64


Unnamed: 0,icd9cm,icd10cm,flag_values,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10,in_codes
0,36225,H35149,10000,RETINOPH PREMATRSTAGE 3 (Begin 2008),Retinal detachments; defects; vascular occlusi...,"Retinopathy of prematurity, stage 3, unspecifi...",Retinal and vitreous conditions,1
1,36226,H35159,10000,RETINOPH PREMATR.STAGE 4 (Begin 2008),Retinal detachments; defects; vascular occlusi...,"Retinopathy of prematurity, stage 4, unspecifi...",Retinal and vitreous conditions,1
2,36227,H35169,10000,RETINOPH PREMATRSTAGE 5 (Begin 2008),Retinal detachments; defects; vascular occlusi...,"Retinopathy of prematurity, stage 5, unspecifi...",Retinal and vitreous conditions,1



ICD: 10
Number of unique values:  count
1    12
Name: count, dtype: int64
flag_values
10000    12
Name: count, dtype: int64


Unnamed: 0,icd10cm,icd9cm,flag_values,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9,in_codes
0,H35141,36225,10000,"Retinopathy of prematurity, stage 3, right eye",Retinal and vitreous conditions,RETINOPH PREMATRSTAGE 3 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
1,H35142,36225,10000,"Retinopathy of prematurity, stage 3, left eye",Retinal and vitreous conditions,RETINOPH PREMATRSTAGE 3 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
2,H35143,36225,10000,"Retinopathy of prematurity, stage 3, bilateral",Retinal and vitreous conditions,RETINOPH PREMATRSTAGE 3 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
3,H35149,36225,10000,"Retinopathy of prematurity, stage 3, unspecifi...",Retinal and vitreous conditions,RETINOPH PREMATRSTAGE 3 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
4,H35151,36226,10000,"Retinopathy of prematurity, stage 4, right eye",Retinal and vitreous conditions,RETINOPH PREMATR.STAGE 4 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
5,H35152,36226,10000,"Retinopathy of prematurity, stage 4, left eye",Retinal and vitreous conditions,RETINOPH PREMATR.STAGE 4 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
6,H35153,36226,10000,"Retinopathy of prematurity, stage 4, bilateral",Retinal and vitreous conditions,RETINOPH PREMATR.STAGE 4 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
7,H35159,36226,10000,"Retinopathy of prematurity, stage 4, unspecifi...",Retinal and vitreous conditions,RETINOPH PREMATR.STAGE 4 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
8,H35161,36227,10000,"Retinopathy of prematurity, stage 5, right eye",Retinal and vitreous conditions,RETINOPH PREMATRSTAGE 5 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
9,H35162,36227,10000,"Retinopathy of prematurity, stage 5, left eye",Retinal and vitreous conditions,RETINOPH PREMATRSTAGE 5 (Begin 2008),Retinal detachments; defects; vascular occlusi...,1
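The `in_codes` column in these outputs flags whether each translated code already belongs to the category; rows with `in_codes == 0` are candidates for manual review (e.g. `R6521` -> `78552`, septic shock, in the sepsis output). A minimal sketch of the same check, using the sepsis codes and ICD-10-CM to ICD-9-CM rows shown above:

```python
import pandas as pd

# Sepsis category codes and ICD-10-CM -> ICD-9-CM rows from the output above
category_codes = {'77181', '99592', 'P364', 'P368', 'P369', 'R6521'}
mapping_10to9 = pd.DataFrame({
    'icd10cm': ['P364', 'P368', 'P369', 'R6521', 'R6521'],
    'icd9cm':  ['77181', '77181', '77181', '78552', '99592'],
})

# Source code is in the category, but its translation is not
in_category = mapping_10to9['icd10cm'].isin(category_codes)
candidates = mapping_10to9[in_category & ~mapping_10to9['icd9cm'].isin(category_codes)]
print(candidates)
```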






#### Spot-check mappings using [icd-mappings](https://pypi.org/project/icd-mappings/)

This tool has several shortcomings:

* It returns a single (1-1) mapping per code, a simplification that NBER warns against ([FAQ #8](https://data.nber.org/gem/GEMs-CrosswalksBasicFAQ.pdf)).
* It appears to use an older version of the data.

In [41]:
from icdmappings import Mapper
mapper = Mapper()
#icd9code = 'E8614'
icd9code = '7703' #Hemorrhage: the result doesn't show the multiple options
#icd9code = '85401'
#9->10
print(icd9code, '->', 
      mapper.map(icd9code, source='icd9', target='icd10'))

7703 -> P268


In [42]:
temp = icd_mapping_dict[9]
temp[temp.icd9cm=='7703']

Unnamed: 0,icd9cm,icd10cm,flag_values,flag_approximate,flag_nomap,flag_combination,scenario,choice,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10
12413,7703,P261,10000,1,0,0,0,0,NB PULMONARY HEMORRHAGE,Other perinatal conditions,Massive pulmonary hemorrhage originating in th...,Respiratory perinatal condition
12414,7703,P268,10000,1,0,0,0,0,NB PULMONARY HEMORRHAGE,Other perinatal conditions,Other pulmonary hemorrhages originating in the...,Respiratory perinatal condition


[Here is the line of code that produces this behavior](https://github.com/snovaisg/ICD-Mappings/blob/main/icdmappings/mappers/icd9_to_icd10.py) (line 58):

```
mapping[icd9] = icd10
```

That is, the mapping is stored in a dictionary holding a single value per ICD-9 code, so each new match overwrites the previous one and only the last match is kept.
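The overwrite behavior can be reproduced with a plain dictionary. A minimal sketch using the two forward GEM rows for `7703` shown above; accumulating targets in a `collections.defaultdict(list)` is one way to keep every match instead (this illustrates the pattern only and is not the icd-mappings source):

```python
from collections import defaultdict

# (ICD-9-CM, ICD-10-CM) pairs for 7703 from the NBER GEM output above
pairs = [('7703', 'P261'), ('7703', 'P268')]

# Plain dict assignment: each new target overwrites the previous one
single = {}
for icd9, icd10 in pairs:
    single[icd9] = icd10

# Accumulating a list keeps every target
multi = defaultdict(list)
for icd9, icd10 in pairs:
    multi[icd9].append(icd10)

print(single['7703'])  # P268
print(multi['7703'])   # ['P261', 'P268']
```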

In [43]:
#10->9
#icd10code = 'R402234'
#icd10code = 'P269'
icd10code = 'A0472'
#This package appears to use an older version of the data:
# https://github.com/snovaisg/ICD-Mappings/tree/main/icdmappings/data_files
#A0472 was added in 2018: https://www.icd10data.com/ICD10CM/Codes/A00-B99/A00-A09/A04-/A04.72
print(icd10code, '->',
       mapper.map(icd10code, source='icd10', target='icd9') )

A0472 -> None


In [44]:
temp = icd_mapping_dict[10]
temp[temp.icd10cm=='A0472']

Unnamed: 0,icd10cm,icd9cm,flag_values,flag_approximate,flag_nomap,flag_combination,scenario,choice,icd_desc10,ccsr_category10,icd_desc9,ccsr_category9
41,A0472,00845,10000,1,0,0,0,0,"Enterocolitis due to Clostridium difficile, no...",Intestinal infection,CLOSTRIDIUM DIF (Begin 1992),Intestinal infection


In [45]:
temp = icd_mapping_dict[9]
temp[temp.icd9cm=='V235']

Unnamed: 0,icd9cm,icd10cm,flag_values,flag_approximate,flag_nomap,flag_combination,scenario,choice,icd_desc9,ccsr_category9,icd_desc10,ccsr_category10
22162,V235,O09291,10000,1,0,0,0,0,PREG W POOR REPRODUCT HX,Other complications of birth; puerperium affec...,Supervision of pregnancy with other poor repro...,Supervision of high-risk pregnancy


In [46]:
icd9[icd9.icd_code=='V235']

Unnamed: 0,icd_code,ccsr_code,icd_desc,ccsr_category,icd_version
7810,V235,195,PREG W POOR REPRODUCT HX,Other complications of birth; puerperium affec...,9


In [47]:
icd10[icd10.icd_code=='O09291']

Unnamed: 0,icd_code,icd_desc,ccsr_code,ccsr_category,default_ccrs_inpatient,default_ccrs_outpatient,icd_version,inpatient_outpatient_match
20169,O09291,Supervision of pregnancy with other poor repro...,PRG008,Supervision of high-risk pregnancy,X,Y,10,0


In [48]:
#Read in the manually augmented category file
icd_categories_out = pd.read_csv(os.path.join(PATH_DATA_RAW, 
                             'icd_categories_manual_edit.csv'))\
    .merge(icd[['icd_code','icd_version','icd_desc','icd_duplicate']], 
           how='left')\
    .sort_values(by=['icd_category','icd_version'])
file_out = os.path.join(
    PATH_DATA, 'icd_categories_desc.csv'
)
icd_categories_out.to_csv(file_out, index = False)
icd_categories_out.head(5)

Unnamed: 0,icd_category,icd_code,icd_version,icd_desc,icd_duplicate
98,cld,7707,9.0,PERINATAL CHR RESP DIS,0.0
99,cld,P270,10.0,Wilson-Mikity syndrome,0.0
100,cld,P271,10.0,Bronchopulmonary dysplasia originating in the ...,0.0
101,cld,P278,10.0,Other chronic respiratory diseases originating...,0.0
102,cld,P279,10.0,Unspecified chronic respiratory disease origin...,0.0


In [49]:
icd_categories_out.icd_category.value_counts()

icd_category
cmal_severe               38
pvl                       37
rop345                    15
nec                        9
sepsis                     8
cld                        5
ivh34                      4
laparotomy                 2
pregnancy_history_risk     2
Name: count, dtype: int64

In [50]:
icd_categories_out.icd_duplicate.value_counts()

icd_duplicate
0.0    119
Name: count, dtype: int64