In [4]:
require(tidyverse)
require(feather)
require(stringdist)

data_folder_path <- "C:\\Users\\javier\\WorkSpace"


# Load ICD9fi and ICD9CM

The table with the ICD9fi codes was shared with me privately, so this document do not show this table 

In [2]:
# load ICD10fi
thl_icd9fi <- read_feather(file.path(data_folder_path,
                                      "ICD9fi",
                                      "FINNGEN_ICD9fi.feather") )

In [32]:
# load concept table and get the ICD10who
concept <- read_feather(file.path(data_folder_path,
                                      "OMOP_vocabulary_v5",
                                      "CONCEPT.feather") )
concept_icd9 <- concept  %>% filter(vocabulary_id == "ICD9CM")%>% 
    select(concept_code, concept_name)  %>% arrange(concept_code) 

# Study codes

**Summary ICD9fi coding**

HUOMIO!! this is a personal interpretation, I am not sure this is true !!!

At first ICD9fi seems to mostly match ICD9CM as 
- Match up to the 4th digit level
- The 5th digit is a Letter in ICD9fi. Although many codes the leter convert to number (A->1, B->2, ...) works, for others this does not match. e.i. "^008"

**Summary ICD9CM coding**

https://www.ohdsi.org/web/wiki/doku.php?id=documentation:vocabulary:icd9cm

In [24]:
pattern = "^008"

thl_icd9fi  %>% filter(grepl(pattern,ICD9))

ICD9,ICD9LYH,ICD9TXT
<chr>,<chr>,<chr>
008,ENTERIT P.ORGANIS AL,ENTERITIS PER ORGANISMATA ALIA SPECIFICATA
0080A,ENTER E.COLI TOXINUM,ENTERITIS E. COLI PATHOGENICA PER TOXINUM
0080B,ENTER E.COLI INVASIV,ENTERITIS E. COLI PATHOGENICA INVASIVA
0081A,ENTERITIS ARIZONA,ENTERITIS ARIZONA
0082A,ENTER AEROBAC AEROGE,ENTERITIS AEROBACTER AEROGENES
0083A,ENTERIT PROTEU MIRAB,ENTERITIS PROTEUS MIRABILIS
0084A,ENTERITIS STAPHYLOCC,ENTERITIS STAPHYLOCOCCICA
0084B,ENTERITIS PSEUDOMON,ENTERITIS PSEUDOMONAS
0084C,ENTER YERS ENTEROCOL,ENTERITIS YERSINIA ENTEROCOLITICA
0084D,ENTER YERS PSEUDOTUB,ENTERITIS YERSINIA PSEUDOTUBERCULOSA


In [33]:
concept_icd9 %>% filter(grepl(pattern,concept_code))

concept_code,concept_name
<chr>,<chr>
8.0,Intestinal infections due to other organisms
8.0,Intestinal infection due to escherichia coli [E. coli]
8.0,"Intestinal infection due to E. coli, unspecified"
8.01,Intestinal infection due to enteropathogenic E. coli
8.02,Intestinal infection due to enterotoxigenic E. coli
8.03,Intestinal infection due to enteroinvasive E. coli
8.04,Intestinal infection due to enterohemorrhagic E. coli
8.09,Intestinal infection due to other intestinal E. coli infections
8.1,Intestinal infection due to arizona group of paracolon bacilli
8.2,Intestinal infection due to aerobacter aerogenes


In [26]:
concept_icd9  %>% nrow

In [27]:
thl_icd9fi  %>% nrow

### How many are a direct match at the 4th level ??

In [66]:
thl_icd9fi <- thl_icd9fi %>% 
    #round it to 4th digit and add dot (5th digit if it is an E code )
    mutate(ICD9CM = if_else(str_sub(ICD9, 0, 0)=="E",
                                   paste0(str_sub(ICD9, 0, 4), ".", str_sub(ICD9, 5, 5)),
                                   paste0(str_sub(ICD9, 0, 3), ".", str_sub(ICD9, 4, 4))
                                   )
    ) %>% 
    # remove dot if at the end
    mutate(ICD9CM = str_replace(ICD9CM, "\\.$", "")) 


In [73]:
# match at the 4th level
thl_icd9fi_4 <- inner_join( thl_icd9fi, 
                concept_icd9  %>% rename(ICD9CM = concept_code),
                by = "ICD9CM")

In [86]:
#the ones that dont match 
thl_icd9fi_4no <- thl_icd9fi  %>% filter( !(ICD9 %in% thl_icd9fi_4$ICD9)) 
#thl_icd9fi_4no  

ICD9,ICD9LYH,ICD9TXT,ICD9_procesed
<chr>,<chr>,<chr>,<chr>
0240A,MALLEUS,MALLEUS,24.0
0250A,MELIDIOOSI,MELIDIOOSI,25.0
0350A,ERYSIPELAS STREPTOCO,ERYSIPELAS STREPTOCOCCICA,35.0
0350B,ERYSIPELAS STAPHYLOC,ERYSIPELAS STAPHYLOCOCCICA,35.0
0350X,ERYSIPELAS ALIA/NUD,ERYSIPELAS ALIA DEFINITA SEU NUD,35.0
0370A,TETANUS,TETANUS,37.0


In [78]:
# match at the 3th level
thl_icd9fi_4no <- thl_icd9fi_4no  %>% 
    #round it to 4th digit and add dot (5th digit if it is an E code )
    mutate(ICD9CM = if_else(str_sub(ICD9, 0, 0)=="E",
                                   paste0(str_sub(ICD9, 0, 4)),
                                   paste0(str_sub(ICD9, 0, 3))
                                   )
    ) 

thl_icd9fi_3 <- inner_join( thl_icd9fi_4no, 
                            ICD9CM  %>% rename(ICD9CM = concept_code),
                            by = "ICD9CM")


In [88]:
#the ones that dont match 
thl_icd9fi_3no <- thl_icd9fi  %>% filter( !(ICD9 %in% c(thl_icd9fi_4$ICD9, thl_icd9fi_3$ICD9))) 
#thl_icd9fi_3no 

These that dont match at the 3th level are the "codes for external cause" which we dont care for diagnose 

# Proposed matching  

At the moment :
1. Match ICD9fi to ICD9CM at the 4th level
2. Round up to the 3th levles these that dont match at the 4th level
3. Ignore the no matching E codes

In [82]:
# Join all 
thl_icd9fi_matched <- bind_rows(
    # 4th level
    thl_icd9fi_4 %>% 
    mutate(ICD9CD_match_level = "4_digit_level" ), 
    # 3th level
    thl_icd9fi_3 %>% 
    mutate( ICD9CD_match_level = "3_digit_level" ), 
    # no match 
    thl_icd9fi_3no %>% 
    mutate( ICD9CD_match_level = "no_match" ), 
)%>% arrange(ICD9)  %>% 
mutate(ICD9CD_match_level = factor(ICD9CD_match_level, 
                                     levels = c("4_digit_level", 
                                                "3_digit_level", 
                                                "no_match")
                                    )
      )

In [83]:
thl_icd9fi_matched  %>% count(ICD9CD_match_level)

ICD9CD_match_level,n
<fct>,<int>
4_digit_level,8592
3_digit_level,162
no_match,358


In [85]:
# load ICD10fi
write_feather(thl_icd9fi_matched, file.path(data_folder_path,
                                      "ICD9fi",
                                      "THL_ICD9fi_matched_ICD9CM.feather") )