error when call icdDxToxxx #36
Comments
Hello, I copied and pasted diag2 into an Excel file and tested the code, but I cannot reproduce the error.
dxpr_example.xlsx
May I know the sessionInfo() of your environment when you got the error message? My test environment:
|
Hi Yi-Ju,
Here is from sessioninfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dxpr_0.9.0 icd.data_1.0 icd_4.0.9
stringi_1.7.6 huxtable_5.5.0 labelled_2.9.1 gtsummary_1.6.0
[8] gt_0.6.0 tinytex_0.39 knitr_1.39
openxlsx_4.2.5 scales_1.2.0 ggThemeAssist_0.1.5
sjlabelled_1.2.0
[15] devtools_2.4.3 usethis_2.1.6 sjmisc_2.8.9 cli_3.3.0
janitor_2.1.0 haven_2.5.0 readxl_1.4.0
[22] stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4
readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6
[29] tidyverse_1.3.1 lubridate_1.8.0 vtable_1.3.3
kableExtra_1.3.4 data.table_1.14.2 exact2x2_1.6.6 exactci_1.4-2
[36] testthat_3.1.4 ssanv_1.1 Rcpp_1.0.8.3
forcats_0.5.1
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ellipsis_0.3.2 rprojroot_2.0.3
snakecase_0.11.0 fs_1.5.2 rstudioapi_0.13 remotes_2.4.2
[8] fansi_1.0.3 xml2_1.3.3 cachem_1.0.6
pkgload_1.2.4 jsonlite_1.8.0 broom_0.8.0 dbplyr_2.2.0
[15] shiny_1.7.1 compiler_4.2.0 httr_1.4.3
backports_1.4.1 assertthat_0.2.1 fastmap_1.1.0 later_1.3.0
[22] formatR_1.12 htmltools_0.5.2 prettyunits_1.1.1
tools_4.2.0 gtable_0.3.0 glue_1.6.2
cellranger_1.1.0
[29] vctrs_0.4.1 svglite_2.1.0 broom.helpers_1.7.0
insight_0.17.1 xfun_0.31 ps_1.7.0 brio_1.1.3
[36] rvest_1.0.2 mime_0.12 miniUI_0.1.1.1
lifecycle_1.0.1 hms_1.1.1 promises_1.2.0.1 memoise_2.0.1
[43] highr_0.9 desc_1.4.1 pkgbuild_1.3.1 zip_2.2.0
rlang_1.0.2 pkgconfig_2.0.3 systemfonts_1.0.4
[50] evaluate_0.15 processx_3.6.0 tidyselect_1.1.2
magrittr_2.0.3 R6_2.5.1 generics_0.1.2 DBI_1.1.2
[57] pillar_1.7.0 withr_2.5.0 modelr_0.1.8
crayon_1.5.1 utf8_1.2.2 tzdb_0.3.0 rmarkdown_2.14
[64] grid_4.2.0 callr_3.7.0 reprex_2.0.1
digest_0.6.29 webshot_0.5.3 xtable_1.8-4 httpuv_1.6.5
[71] munsell_0.5.0 viridisLite_0.4.0 sessioninfo_1.2.2
My input file is very large because it is real EMR data. I suspect that
unusual ICD codes caused the issue. For example, the ICD-9 codes in my input
file have no leading 0s. So I merged my input file with a standard ICD-9
code Excel file to get the codes with leading 0s. That worked for the ICD-9
portion. For the ICD-10 portion of my input file, I merged with a file I
downloaded online, Section111ValidICD10-Jan2022.xlsx, which is the most
recent ICD-10 code file. I got the following error message for this ICD-10
portion. I suspect your package uses a different ICD-10 version. Which
version did you use? Because of the error, no output was produced.
Wrong ICD format: total 178 ICD codes (the number of occurrences is in brackets)
c("E780 (5553)", "E784 (5476)", "I272 (5021)", "M791 (3356)", "M4806 (2907)", "R938 (2332)", "R972 (1235)", "R8299 (1169)", "A047 (987)", "H578 (960)")
Error in `[.data.table`(dxDataFile, Version == 9, ) :
Column 6 ['Short'] is a data.frame or data.table; malformed data.table.
In addition: Warning messages:
1: The ICD mentioned above matches to "NA" due to the format or other issues.
2: "Wrong ICD format" means the ICD has wrong format
3: "Wrong ICD version" means the ICD classify to wrong ICD version (cause the "icd10usingDate" or other issues)
Thank you for getting back to me,
Emily
…On Fri, Jul 15, 2022 at 6:39 PM Yi-Ju Tseng ***@***.***> wrote:
Hello,
I copied and pasted diag2 into an Excel file and tested the code, but I
cannot reproduce the error.
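library(dxpr)
library(readxl)
# Read the five columns (PTID, ENCID, DIAG_DATE, DIAGNOSIS_CD, type) with explicit types
diag2 <- read_excel("dxpr_example.xlsx",
                    col_types = c("text", "text", "date", "text", "numeric"))
# Group the diagnosis codes into CCS categories
css <- icdDxToCCS(dxDataFile = diag2,
                  idColName = PTID,
                  icdColName = DIAGNOSIS_CD,
                  icdVerColName = type,
                  dateColName = DIAG_DATE)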
dxpr_example.xlsx
<https://github.com/DHLab-TSENG/dxpr/files/9124895/dxpr_example.xlsx>
(I made some duplication to test the function)
May I know the sessionInfo() of your environment when you get the error
message?
My test environment:
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] readxl_1.3.1 dxpr_0.9.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.2 rstudioapi_0.13 magrittr_2.0.3 tidyselect_1.1.2
[5] munsell_0.5.0 colorspace_2.0-3 R6_2.5.1 rlang_1.0.2
[9] fansi_1.0.3 dplyr_1.0.8 tools_4.1.3 grid_4.1.3
[13] data.table_1.14.2 gtable_0.3.0 utf8_1.2.2 cli_3.3.0
[17] DBI_1.1.2 ellipsis_0.3.2 assertthat_0.2.1 tibble_3.1.7
[21] lifecycle_1.0.1 crayon_1.5.1 purrr_0.3.4 ggplot2_3.3.6
[25] vctrs_0.4.1 glue_1.6.2 cellranger_1.1.0 compiler_4.1.3
[29] pillar_1.7.0 generics_0.1.2 scales_1.2.0 pkgconfig_2.0.3
|
It seems that the versions of data.table and dxpr are the same in our environments. The versions of the codes are listed in the document. For ICD-10, CCS can only be used with code versions before 2019. The main issue with the "E780" code is that it is not a billable code, so you get the warning message. Please check your EHR data and see if replacing "E780" with "E7800" is reasonable. The other codes shown in the warning message can be treated in the same way. However, based on your code, I think the "error" (not the warning) might be caused by other issues. Does your input data have any other column named "Short"? |
Hi Yi-Ju,
Thank you for getting back to me so quickly. And thanks for the tips. I am
going to separate the ICD10 portion and use CCSR to map.
The problem with my dataset is that it contains lots of errors, and it is
not possible for me to check the validity of each code. I have millions and
millions of rows in the data.
The strange thing is that I don't have a [Short] column in my dataset at all. I
thought it was an intermediate table produced by your package.
I am going to try the new mapping for ICD10 and let you know. ICD9 portion
is good now.
Thanks again,
Emily
…On Fri, Jul 15, 2022 at 8:15 PM Yi-Ju Tseng ***@***.***> wrote:
It seems that the versions of data.table and dxpr are the same in our
environments. The only difference is the version of R. I tested the code on
Windows + R 4.2.0 but still cannot reproduce your error.
The versions of the codes were listed in the document
<https://dhlab-tseng.github.io/dxpr/articles/Eng_Diagnosis.html>.
For the ICD-10, CCS can only be used on the version before 2019.
The AHRQ updated the whole CCS coding system and developed a new system
called CCSR.
If you want to use the CCS because of the ICD-9, be sure that you check
the newly added ICD-10 code, especially for COVID-19.
The main issue of the "E780" code is that this is not a billable code
<https://www.icd10data.com/ICD10CM/Codes/E00-E89/E70-E88/E78-> so you get
the warning message. Please check your EHR data and see if replacing "E780"
with "E7800" is reasonable. The other codes shown in the warning message
can be treated in the same way.
However, based on your code, I think the "error" (not the warning) might be
caused by other issues.
I added "E780" to my sample file and I can still get the output (with
a warning message only).
After googling your error message (Column 6 ['Short'] is a
data.frame or data.table; malformed data.table.), I found that it might be
caused by multiple columns with the same name, among other reasons.
In your input data, does it have any other column with the name "Short"?
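One quick way to answer that question is to inspect the column names directly. The following is only a minimal base R sketch: the helper name check_column_names and the toy data frame are hypothetical, and treating "Short" as a name to avoid is an assumption taken from the error message, not a documented dxpr rule.

```r
# Hypothetical helper: report duplicated column names and any names that
# collide with a reserved name (here "Short", taken from the error message).
check_column_names <- function(df, reserved = "Short") {
  nms <- names(df)
  list(
    duplicated = unique(nms[duplicated(nms)]),  # names that occur more than once
    reserved   = intersect(nms, reserved)       # names that clash with `reserved`
  )
}

# Toy input that accidentally carries two "Short" columns.
diag_example <- data.frame(PTID = c("A", "B"),
                           Short = c("E780", "7295"),
                           Short = c("x", "y"),
                           check.names = FALSE)
res <- check_column_names(diag_example)
print(res)
```

If both elements of the result are empty, duplicated or clashing column names in the input are unlikely to be the cause.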
|
Will you combine the CCS or CCSR grouping from ICD-9 and ICD-10 codes in the final analysis? For all the "non-billing" codes, you can use "icdDecimalToShort()" to check whether there are any suggested edits. Most of the time we can just add one digit, 0 or 9, after the original code. You will get suggestions in the output, and then you can edit your "non-billing" codes based on the suggestions.
For the error message you have, it would be great if you could share a slice of data that reproduces the error. We will investigate the cause on our side, too. Thank you, |
I see, the file I have doesn't have a dot in the ICD code. For example
E030, instead of E03.0
Would the function icdDecimalToShort() work?
…On Fri, Jul 15, 2022 at 8:49 PM Yi-Ju Tseng ***@***.***> wrote:
Will you combine the CCS or CCSR grouping from ICD-9 and ICD-10 codes in
the final analysis?
If so, note that CCS and CCSR are not the same and cannot be analyzed
together.
If I need to pool data from both ICD-9 and 10 together, I usually use CCS
directly and check how many "new ICD-10 codes" I have in the dataset then
try to code it manually (the new codes are not commonly used).
For all the "non-billing" codes, you can use "icdDecimalToShort()
<https://dhlab-tseng.github.io/dxpr/articles/Eng_Diagnosis.html#a-2--uniform-short-format>"
to check whether there are any suggested edits. Most of the time we can
just add one digit, 0 or 9, after the original code. You will get suggestions
in the output, and then you can edit your "non-billing" codes based on the
suggestions.
decimal$Error
#>        ICD count IcdVersionInFile     WrongType Suggestion
#>  1:  A0.11    20           ICD 10  Wrong format
#>  2:  V27.0    18           ICD 10 Wrong version
#>  3:   E114     8           ICD 10  Wrong format
#>  4: A01.05     8            ICD 9 Wrong version
#>  5:  42761     7           ICD 10 Wrong version
#>  6:  Z9.90     6           ICD 10  Wrong format
#>  7:    F42     6           ICD 10  Wrong format
#>  8:  V24.1     6           ICD 10 Wrong version
#>  9:  A0105     5            ICD 9 Wrong version
#> 10:    001     5            ICD 9  Wrong format       0019
#> 11:  75.52     4            ICD 9  Wrong format
#> 12:  E03.0     4            ICD 9 Wrong version
#> 13:    650     4           ICD 10 Wrong version
#> 14: 123.45     3           ICD 10  Wrong format
#> 15:  755.2     3            ICD 9  Wrong format     755.29
#> 16:   7552     2            ICD 9  Wrong format      75529
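The "add one digit, 0 or 9" idea can be sketched in base R as follows. This is only an illustration under stated assumptions: suggest_by_padding and valid_codes are hypothetical names, the valid list here is a three-code stand-in rather than a real ICD reference, and this is not how dxpr computes its Suggestion column.

```r
# Hypothetical helper: for each code, try appending "0" and then "9" and
# return the first candidate found in `valid_codes`, or NA if none matches.
suggest_by_padding <- function(codes, valid_codes, digits = c("0", "9")) {
  vapply(codes, function(code) {
    candidates <- paste0(code, digits)
    hit <- candidates[candidates %in% valid_codes]
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Stand-in list of valid short-format codes (assumption for illustration).
valid_codes <- c("E7800", "0019", "75529")
suggestions <- suggest_by_padding(c("E780", "001", "7552", "F42"), valid_codes)
print(suggestions)
```

Codes that come back as NA have no single-digit completion in the list and would still need manual review.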
For the error message you have, it would be great if you could share a
slice of data that reproduces the error. We will investigate the cause
on our side, too.
Thank you,
|
Hi Yi-Ju,
I tried icdDecimalToShort, but it didn't help because no suggestion was
given in my case.
I also tried the CCSR mapping and got the same error message:
Error in `[.data.table`(dxDataFile, Version == 10, ) :
Column 6 ['Short'] is a data.frame or data.table; malformed data.table.
I extracted all the unique ICD-10 codes in my project. The other columns in
the file are dummy values. Please see the attached file, which I ran with
this call:
css10 <- icdDxToCCSR(dxDataFile = u1, idColName = PTID, icdColName = code,
icdVerColName = type, dateColName = DIAG_DATE)
The strange thing is what happened when I split my file into sections to
pinpoint the issue.
u1 = mycode[200:300,] gave an error message, but when I ran the function
twice, once with u1 = mycode[200:250,] and once with u1 = mycode[250:300,],
there was no error message.
So it seems to me that the issue may occur when there are too many records.
That is just my guess.
Thank you,
Emily
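The splitting approach described above can be automated with a small base R loop. This is a rough sketch only: try_in_chunks, toy_f, and the toy data are hypothetical, and toy_f merely stands in for whichever call fails (for example the icdDxToCCSR call), so the sketch runs without dxpr.

```r
# Run `f` on consecutive row chunks of `data` and record which chunks error.
try_in_chunks <- function(data, f, chunk_size) {
  stopifnot(nrow(data) > 0, chunk_size >= 1)
  starts <- seq(1, nrow(data), by = chunk_size)
  results <- lapply(starts, function(s) {
    e <- min(s + chunk_size - 1, nrow(data))
    # NA means the chunk ran cleanly; otherwise keep the error text.
    msg <- tryCatch({
      f(data[s:e, , drop = FALSE])
      NA_character_
    }, error = function(err) conditionMessage(err))
    data.frame(start = s, end = e, ok = is.na(msg), message = msg,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

# Toy stand-in that fails whenever a chunk contains the code "BAD".
toy_f <- function(d) if (any(d$code == "BAD")) stop("malformed input") else nrow(d)
toy <- data.frame(code = c(rep("E7800", 7), "BAD", rep("Z0000", 4)),
                  stringsAsFactors = FALSE)
report <- try_in_chunks(toy, toy_f, chunk_size = 5)
print(report)
```

If every chunk succeeds while the combined range still fails, as with mycode[200:300,] here, the trigger is more likely an interaction between rows than a single bad record, so rerunning with a larger chunk_size can help narrow it down.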
|
Thanks for the test! We have tested the dxpr package with 953,294 unique patients and 7,948,418 distinct diagnosis records (real-world data), so maybe the number of records is not the only factor to cause the error you have. |
Here you go
Also in my last email, I have attached all the ICD10 codes in mycode.xlsx
…On Sat, Jul 16, 2022 at 6:40 PM Yi-Ju Tseng ***@***.***> wrote:
Thanks for the test!
I was wondering if it is possible to share the "mycode[200:300,]" in your
code?
You can replace all the patient IDs with integer sequences.
We have tested the dxpr package with 953,294 unique patients and 7,948,418
distinct diagnosis records (real-world data), so maybe the number of
records is not the only factor to cause the error you have.
|
mycode.xlsx |
Thank you for sharing the data. We've committed a new version (8fcf78e) of the dxpr package. I tested this version of the package on your data and it works fine. If you are grouping the codes into CCS or CCSR, A04.7, A04.71, and A04.72 are basically all assigned to the same CCS or CCSR group. Maybe you can try to impute or append 1 after the codes that are reported as being in the wrong format. Please let me know if you have any questions. YiJu |
Thank you, YiJu! |
It was just released as version 0.9.1. Feel free to reinstall it from GitHub.
Here is the sessionInfo() after I reinstalled the package from GitHub.
Thank you for reporting the issues. YiJu |
Hello, Thank you for reporting the issue.
I get outputs without error
Maybe you can try to update the dxpr package, reload it and try again?
The reason why you get the warning message for "T8586XA" is that we use the ICD-10-CM codes released by CMS. If your codes are not on the list, we will provide a list with a warning message. Based on the ICD-10 coding structure, I think the dxpr package can provide some rules to deal with the differences in digits 6 or 7 because they usually do not affect the result of grouping. |
I see, you used ICD-10-CM. I used ICD-10-DX. The file mycode I sent is the ICD9 portion in my data. Please see attached for the ICD10 portion which raised the error in CCSR function call. |
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in 5441218 rows; more than 5423694 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
My code is
css <- icdDxToCCS(dxDataFile = diag2,idColName = PTID,icdColName = DIAGNOSIS_CD, icdVerColName = type, dateColName = DIAG_DATE)
Dataset diag2 is
PTID ENCID DIAG_DATE DIAGNOSIS_CD type
1 PT608273746 E0000025104028467 2018-12-29 Z0000 10
2 PT229616319 E0000005436124507 2011-01-10 7295 9
3 PT608345956 E0000025163571021 2018-02-28 Z0000 10
4 PT608361660 E0000025142225399 2018-10-22 Z0000 10
5 PT235121286 E0000005387281824 2011-11-28 7295 9
6 PT240024801 E0000005371647449 2011-12-05 7295 9
7 PT240663058 E0000026942225637 2011-06-22 7295 9
8 PT154968782 E0000002253904854 2011-08-04 42833 9
9 PT154968782 E0000002253904855 2011-08-06 42833 9
10 PT154968782 E0000002253904856 2011-08-07 42833 9