error when call icdDxToxxx #36

Open
ningxuca opened this issue Jul 15, 2022 · 18 comments

Comments

@ningxuca

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in 5441218 rows; more than 5423694 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
My code is
css <- icdDxToCCS(dxDataFile = diag2,idColName = PTID,icdColName = DIAGNOSIS_CD, icdVerColName = type, dateColName = DIAG_DATE)

Dataset diag2 is
          PTID             ENCID  DIAG_DATE DIAGNOSIS_CD type
1  PT608273746 E0000025104028467 2018-12-29        Z0000   10
2  PT229616319 E0000005436124507 2011-01-10         7295    9
3  PT608345956 E0000025163571021 2018-02-28        Z0000   10
4  PT608361660 E0000025142225399 2018-10-22        Z0000   10
5  PT235121286 E0000005387281824 2011-11-28         7295    9
6  PT240024801 E0000005371647449 2011-12-05         7295    9
7  PT240663058 E0000026942225637 2011-06-22         7295    9
8  PT154968782 E0000002253904854 2011-08-04        42833    9
9  PT154968782 E0000002253904855 2011-08-06        42833    9
10 PT154968782 E0000002253904856 2011-08-07        42833    9
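
For readers who hit the same message: the error text says the join produced more rows than nrow(x)+nrow(i), which happens when the same key is repeated on both sides of a join. Below is a minimal base-R sketch of that row blow-up. The two toy tables are made up for illustration and are not dxpr's internal lookup tables, and whether duplicate keys were the actual cause in this report is not established here.

```r
# Toy data (hypothetical): the same key appears 4 times on each side.
x <- data.frame(ICD = rep("Z0000", 4), row_x = 1:4, stringsAsFactors = FALSE)
i <- data.frame(ICD = rep("Z0000", 4), row_i = 1:4, stringsAsFactors = FALSE)

# A join pairs every matching row of x with every matching row of i.
joined <- merge(x, i, by = "ICD")
n_joined <- nrow(joined)           # 4 * 4 matched pairs
limit <- nrow(x) + nrow(i)         # the threshold quoted in the error message
exceeds_limit <- n_joined > limit

# Keys that occur more than once on the lookup side.
dup_keys <- unique(i$ICD[duplicated(i$ICD)])
```

Checking a lookup table for repeated keys in this way is usually a cheaper first step than rerunning with allow.cartesian=TRUE.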

@yijutseng
Contributor

Hello,

I copied and pasted diag2 into an Excel file and tested the code, but I cannot reproduce the error.

library(dxpr)
library(readxl)
diag2 <- read_excel("dxpr_example.xlsx",
                    col_types = c("text", "text", "date", "text", "numeric"))
css <- icdDxToCCS(dxDataFile = diag2,
                  idColName = PTID,
                  icdColName = DIAGNOSIS_CD,
                  icdVerColName = type,
                  dateColName = DIAG_DATE)

dxpr_example.xlsx
(I duplicated some rows to test the function.)

Could you share the sessionInfo() output from the environment where you get the error message?

My test environment:

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readxl_1.3.1 dxpr_0.9.0  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.2      rstudioapi_0.13   magrittr_2.0.3    tidyselect_1.1.2 
 [5] munsell_0.5.0     colorspace_2.0-3  R6_2.5.1          rlang_1.0.2      
 [9] fansi_1.0.3       dplyr_1.0.8       tools_4.1.3       grid_4.1.3       
[13] data.table_1.14.2 gtable_0.3.0      utf8_1.2.2        cli_3.3.0        
[17] DBI_1.1.2         ellipsis_0.3.2    assertthat_0.2.1  tibble_3.1.7     
[21] lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4       ggplot2_3.3.6    
[25] vctrs_0.4.1       glue_1.6.2        cellranger_1.1.0  compiler_4.1.3   
[29] pillar_1.7.0      generics_0.1.2    scales_1.2.0      pkgconfig_2.0.3  

@ningxuca
Author

ningxuca commented Jul 16, 2022 via email

@yijutseng
Contributor

It seems that the versions of data.table and dxpr are the same in both of our environments. The only difference is the version of R. I tested the code on Windows with R 4.2.0 but still cannot reproduce your error.

The versions of the code sets are listed in the documentation.

For ICD-10, CCS can only be used with code versions from before 2019.
AHRQ overhauled the CCS coding system and developed a new system called CCSR.
If you want to use CCS because your data include ICD-9, be sure to check the newly added ICD-10 codes, especially those for COVID-19.

The main issue with the "E780" code is that it is not a billable code, which is why you get the warning message. Please check your EHR data and see whether replacing "E780" with "E7800" is reasonable. The other codes shown in the warning message can be handled in the same way.
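
A minimal base-R sketch of that kind of replacement, assuming a hand-made mapping vector. The name `recode_map` and the sample codes are hypothetical; only the "E780" to "E7800" pair comes from this thread, and any mapping should be reviewed against the EHR data first.

```r
# Hypothetical mapping from non-billable codes to the billable code chosen
# after review; only "E780" -> "E7800" is taken from this thread.
recode_map <- c(E780 = "E7800")

codes <- c("Z0000", "E780", "7295", "E780")

# Replace only the codes that appear in the mapping; leave the rest untouched.
hit <- codes %in% names(recode_map)
recoded <- codes
recoded[hit] <- unname(recode_map[codes[hit]])
```

Keeping the mapping in one named vector makes it easy to extend to the other codes listed in the warning message.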

However, based on your code, I think the "error" (not the warning) might be caused by something else.
I added "E780" to my sample file and still got the output (with only a warning message).
After googling your error message (Column 6 ['Short'] is a data.frame or data.table; malformed data.table.), I found that it might be caused by multiple columns with the same name, among other reasons.

Does your input data have any other column named "Short"?
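
A quick way to check for such a clash is sketched below in base R. The toy data frame is hypothetical, and the `reserved` vector is an assumption based on the column names visible in the grouped output printed later in this thread, not a documented list.

```r
# Toy input (hypothetical) that already carries a column called "Short".
dx <- data.frame(PTID = c("a", "b"),
                 DIAGNOSIS_CD = c("Z0000", "7295"),
                 Short = c("x", "y"),
                 stringsAsFactors = FALSE)

# Column names that the grouped output appears to use (an assumption).
reserved <- c("Short", "ID", "ICD", "Date", "Version")

clashes <- intersect(names(dx), reserved)      # names shared with the output
repeated <- names(dx)[duplicated(names(dx))]   # names repeated within the input

# One way out: rename clashing columns before calling the grouping function.
names(dx)[names(dx) %in% clashes] <- paste0("my_", clashes)
```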

@ningxuca
Author

ningxuca commented Jul 16, 2022 via email

@yijutseng
Contributor

Will you combine the CCS or CCSR groupings from ICD-9 and ICD-10 codes in the final analysis?
If so, note that CCS and CCSR are not the same and cannot be analyzed together.
When I need to pool ICD-9 and ICD-10 data, I usually use CCS directly, check how many "new ICD-10 codes" the dataset contains, and then code them manually (the new codes are not commonly used).
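
Counting the "new" codes can be sketched in base R as follows. The reference vector `ccs_icd10` is a hypothetical stand-in for whatever list of ICD-10 codes the CCS mapping covers; U071 is the short form of U07.1 (COVID-19).

```r
# Hypothetical reference list of ICD-10 codes covered by the CCS mapping.
ccs_icd10 <- c("Z0000", "I639", "R413")

codes <- c("Z0000", "U071", "I639", "U071", "R413")

# Codes in the data that the reference list does not cover, with counts.
new_codes <- table(codes[!codes %in% ccs_icd10])
```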

For all the "non-billing" codes, you can use "icdDecimalToShort()" to check whether there are any suggested edits. Most of the time we can just append one digit, 0 or 9, to the original code. The output contains the suggestions, and you can then edit your "non-billing" codes based on them.

decimal$Error
#>        ICD count IcdVersionInFile     WrongType Suggestion
#>  1:  A0.11    20           ICD 10  Wrong format           
#>  2:  V27.0    18           ICD 10 Wrong version           
#>  3:   E114     8           ICD 10  Wrong format           
#>  4: A01.05     8            ICD 9 Wrong version           
#>  5:  42761     7           ICD 10 Wrong version           
#>  6:  Z9.90     6           ICD 10  Wrong format           
#>  7:    F42     6           ICD 10  Wrong format           
#>  8:  V24.1     6           ICD 10 Wrong version           
#>  9:  A0105     5            ICD 9 Wrong version           
#> 10:    001     5            ICD 9  Wrong format       0019
#> 11:  75.52     4            ICD 9  Wrong format           
#> 12:  E03.0     4            ICD 9 Wrong version           
#> 13:    650     4           ICD 10 Wrong version           
#> 14: 123.45     3           ICD 10  Wrong format           
#> 15:  755.2     3            ICD 9  Wrong format     755.29
#> 16:   7552     2            ICD 9  Wrong format      75529
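
The "append 0 or 9" idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not dxpr's implementation: `valid_icd9` is a hypothetical list of billable ICD-9 short codes, and `suggest()` is a helper invented for this example.

```r
# Hypothetical list of valid (billable) ICD-9 short codes.
valid_icd9 <- c("0019", "75529", "7295")

# Append "0" and "9" to a code and keep only candidates found in the list.
suggest <- function(code, valid) {
  candidates <- paste0(code, c("0", "9"))
  candidates[candidates %in% valid]
}

s1 <- suggest("001", valid_icd9)    # has a valid candidate
s2 <- suggest("7552", valid_icd9)   # has a valid candidate
s3 <- suggest("E114", valid_icd9)   # no candidate in this toy list
```

An empty result corresponds to a blank Suggestion cell in the table above.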

As for the error message you got, it would be great if you could share a slice of data that reproduces the error. We will investigate the cause on our side, too.

Thank you,

@ningxuca
Author

ningxuca commented Jul 16, 2022 via email

@ningxuca
Author

ningxuca commented Jul 16, 2022 via email

@yijutseng
Contributor

Thanks for the test!
Would it be possible to share the "mycode[200:300,]" slice from your code?
You can replace all the patient IDs with integer sequences.
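
One simple way to do that replacement in base R is sketched below. The sample IDs are taken from the data printed earlier in this thread, and the variable names are illustrative.

```r
ptid <- c("PT608273746", "PT229616319", "PT608273746", "PT154968782")

# Map each distinct ID to its order of first appearance, so repeated
# patients keep the same integer.
anon_id <- match(ptid, unique(ptid))
```

Because the mapping depends only on the order of first appearance, records belonging to the same patient stay linked after anonymization.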

We have tested the dxpr package with 953,294 unique patients and 7,948,418 distinct diagnosis records (real-world data), so the number of records may not be the only factor causing your error.

@ningxuca
Author

ningxuca commented Jul 17, 2022 via email

@yijutseng
Contributor

It turns out I cannot see the shared attachment.

Screenshot 2022-07-17 9.57.26 AM

Maybe sharing a Google Drive link would work?

@ningxuca
Author

mycode.xlsx
mycode200_300.xlsx
I have attached the two files here. One has all the codes, one has row 200 to 300.

@yijutseng
Contributor

Thank you for sharing the data. We've committed a new version (8fcf78e) of the dxpr package.

I tested this version of the package on your data and it works fine.
The only issue is that your data contain ~200 non-billable codes, such as A047.
Our suggestion function only works for ICD-9 because the ICD-10 coding system doesn't follow the "0 or 9" logic.
For example, A047 should be changed to one of the following codes:

  • A04.71 …… recurrent
  • A04.72 …… not specified as recurrent

If you are grouping them into CCS or CCSR, A04.7, A04.71, and A04.72 all fall into the same CCS or CCSR groups. You could try imputing the codes reported as having the wrong format by appending 1 to them.
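
A minimal base-R sketch of that imputation, assuming a toy stand-in for the error table (only its ICD and WrongType columns are used here). The appended code should still be checked against the ICD-10-CM list, since not every code has a valid child ending in 1.

```r
# Toy stand-in (hypothetical) for the error table returned by the function.
err <- data.frame(ICD = "A047", WrongType = "Wrong format",
                  stringsAsFactors = FALSE)

codes <- c("A047", "D899", "A047")

# Append "1" only to the codes flagged as having the wrong format.
flagged <- err$ICD[err$WrongType == "Wrong format"]
imputed <- ifelse(codes %in% flagged, paste0(codes, "1"), codes)
```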

Please let me know if you have any questions.

YiJu

@ningxuca
Author

Thank you, YiJu!
When do you plan to release the new version? For our project, only about 20 diseases are of interest. We compiled a list of ICD-9/ICD-10 codes for each individual disease, but that is time-consuming and not necessarily better than the CCS categorization. So I started searching for an R package that converts ICD codes to CCS. Thanks for sharing the application and debugging with me.
Emily

@yijutseng
Contributor

It was just released as version 0.9.1. Feel free to reinstall it from GitHub.

# install.packages("remotes")
remotes::install_github("DHLab-TSENG/dxpr")

Here is the sessionInfo() after I reinstalled the package from GitHub.

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] dxpr_0.9.1

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13   magrittr_2.0.3    tidyselect_1.1.2 
 [4] munsell_0.5.0     colorspace_2.0-3  R6_2.5.1         
 [7] rlang_1.0.4       fansi_1.0.3       dplyr_1.0.8      
[10] tools_4.1.3       grid_4.1.3        data.table_1.14.2
[13] gtable_0.3.0      utf8_1.2.2        cli_3.3.0        
[16] DBI_1.1.2         ellipsis_0.3.2    assertthat_0.2.1 
[19] tibble_3.1.7      lifecycle_1.0.1   crayon_1.5.1     
[22] purrr_0.3.4       ggplot2_3.3.6     vctrs_0.4.1      
[25] glue_1.6.2        compiler_4.1.3    pillar_1.7.0     
[28] generics_0.1.2    scales_1.2.0      pkgconfig_2.0.3  

Thank you for reporting the issues.

YiJu

@ningxuca
Author

Hi YiJu,

The following codes are showing as 'wrong code'. Is the reason that they are not billable?
Screenshot 2022-07-17 150726

@ningxuca
Author

Hi YiJu,
It seems that the CCSR function didn't work, while the CCS function ran without error on the same input file.
I just wanted to let you know; there's no rush.
Screenshot 2022-07-17 164132

@yijutseng
Contributor

Hello,

Thank you for reporting the issue.
I've tested the code with the file "mycode.xlsx" you shared in the previous reply.

library(dxpr)
library(readxl)
mycode <- read_excel("mycode.xlsx", 
                     col_types = c("text", "numeric", "date", "text"))
icdDxToCCSR(dxDataFile = mycode,idColName = PTID,
            icdColName = code,dateColName = DIAG_DATE,
            icdVerColName = type)

I get the output without any errors:

Wrong ICD format: total 178 ICD codes (the number of occurrences is in brackets)
c("A047 (1)", "C4312 (1)", "C44102 (1)", "C44112 (1)", "C44119 (1)", "C44122 (1)", "C44129 (1)", "C962 (1)", "D0312 (1)", "D0411 (1)")
	
$groupedDT
         Short     ID     ICD       Date Version                                                CCSR_CATEGORY_DESCRIPTION
    1:    D899 R00001    D899 2016-01-01      10                                                       Immunity disorders
    2:    R413 R00001    R413 2016-01-01      10                                        Nervous system signs and symptoms
    3:    I639 R00001    I639 2016-01-01      10                                                      Cerebral infarction
    4:   R9431 R00001   R9431 2016-01-01      10                                      Abnormal findings without diagnosis
    5:   R0602 R00001   R0602 2016-01-01      10                                           Respiratory signs and symptoms
   ---                                                                                                                   
18838: W171XXA R00001 W171XXA 2016-01-01      10         External cause codes: intent of injury, accidental/unintentional
18839: T84611A R00001 T84611A 2016-01-01      10 Complication of internal orthopedic device or implant, initial encounter
18840: T84629A R00001 T84629A 2016-01-01      10 Complication of internal orthopedic device or implant, initial encounter
18841:   K5229 R00001   K5229 2016-01-01      10                                            Noninfectious gastroenteritis
18842:   K5229 R00001   K5229 2016-01-01      10                                                       Allergic reactions

$summarised_groupedDT
         ID                CCSR_CATEGORY_DESCRIPTION firstCaseDate endCaseDate count period
  1: R00001                       Immunity disorders    2016-01-01  2016-01-01    45 0 days
  2: R00001        Nervous system signs and symptoms    2016-01-01  2016-01-01   122 0 days
  3: R00001                      Cerebral infarction    2016-01-01  2016-01-01   114 0 days
  4: R00001      Abnormal findings without diagnosis    2016-01-01  2016-01-01   143 0 days
  5: R00001           Respiratory signs and symptoms    2016-01-01  2016-01-01    32 0 days
 ---                                                                                       
493: R00001              Neonatal cerebral disorders    2016-01-01  2016-01-01     2 0 days
494: R00001 Neonatal digestive and feeding disorders    2016-01-01  2016-01-01     3 0 days
495: R00001            Neonatal acidemia and hypoxia    2016-01-01  2016-01-01     1 0 days
496: R00001          Maternal intrauterine infection    2016-01-01  2016-01-01     2 0 days
497: R00001               Autoinflammatory syndromes    2016-01-01  2016-01-01     1 0 days

$Error
         ICD count IcdVersionInFile    WrongType Suggestion
  1:    A047     1           ICD 10 Wrong format           
  2:   C4312     1           ICD 10 Wrong format           
  3:  C44102     1           ICD 10 Wrong format           
  4:  C44112     1           ICD 10 Wrong format           
  5:  C44119     1           ICD 10 Wrong format           
 ---                                                       
174: T8585XS     1           ICD 10 Wrong format           
175: T8586XA     1           ICD 10 Wrong format           
176: T8589XA     1           ICD 10 Wrong format           
177: V4752XA     1           ICD 10 Wrong format           
178: W452XXA     1           ICD 10 Wrong format           

Warning messages:
1: The ICD mentioned above matches to "NA" due to the format or other issues. 
2: "Wrong ICD format" means the ICD has wrong format 
3: "Wrong ICD version" means the ICD classify to wrong ICD version (cause the "icd10usingDate" or other issues) 

Maybe you can try updating the dxpr package, reloading it, and running the code again?

# install.packages("remotes")
remotes::install_github("DHLab-TSENG/dxpr")

The reason you get the warning message for "T8586XA" is that we use the ICD-10-CM codes released by CMS. If your codes are not on that list, we report them in a warning message.

Based on the ICD-10 coding structure, I think the dxpr package could provide some rules to handle differences in the 6th or 7th character, because they usually do not affect the grouping result.
We need some time to develop the rules and test them on our sample file.
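
One possible shape for such a rule is sketched below in base R. It is purely illustrative and not how dxpr does or will handle it: the `lookup` vector, its placeholder group labels, and the `match_with_fallback()` helper are all hypothetical.

```r
# Hypothetical lookup from ICD-10-CM short codes to placeholder group labels.
lookup <- c(T8586 = "GROUP_A", K5229 = "GROUP_B")

# Try the full code first, then drop trailing characters one at a time,
# stopping once the prefix would be shorter than min_chars.
match_with_fallback <- function(code, lookup, min_chars = 5) {
  lens <- nchar(code):min(nchar(code), min_chars)
  for (n in lens) {
    key <- substr(code, 1, n)
    if (key %in% names(lookup)) return(unname(lookup[key]))
  }
  NA_character_
}

g1 <- match_with_fallback("T8586XA", lookup)  # matched after dropping 2 characters
g2 <- match_with_fallback("K5229", lookup)    # exact match
g3 <- match_with_fallback("Z99", lookup)      # no match
```

Limiting the fallback to a minimum prefix length keeps the rule from collapsing unrelated codes into the same group.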

@ningxuca
Author

I see, you used ICD-10-CM. I used ICD-10-DX. The file mycode I sent is the ICD9 portion in my data. Please see attached for the ICD10 portion which raised the error in CCSR function call.
ICD10.xlsx
