add_p fails with very large datasets #341

Closed
slobaugh opened this issue Jan 10, 2020 · 0 comments
@slobaugh (Contributor)

The error is introduced when add_p() decides whether to use the chi-squared test or Fisher's exact test. To decide, it checks whether any cell in the contingency table has an expected count below 5. With the data you passed, the counts are so large that, the way the code is written, multiplying a row total by a column total exceeds the largest value an R integer can hold (2,147,483,647). R then returns NA for that cell's expected count instead of a number.
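A minimal sketch of the selection rule described above, assuming it is "Fisher's exact test if any expected count is below 5, otherwise chi-squared". `choose_test()` is a hypothetical helper written for illustration; it is not gtsummary's internal code.

```r
# Hypothetical helper (not gtsummary's code): pick a test for a two-way
# contingency table using the "any expected count < 5" rule.
choose_test <- function(tab) {
  n <- sum(tab)
  # rowSums()/colSums() return doubles, so this outer product is computed
  # in floating point rather than integer arithmetic.
  expected <- outer(rowSums(tab), colSums(tab)) / n
  if (any(expected < 5)) "fisher.test" else "chisq.test"
}

# A tiny table: every expected count is below 5.
small <- table(c("a", "a", "b"), c("x", "y", "y"))
choose_test(small)

# A table whose expected counts are all comfortably above 5.
choose_test(as.table(matrix(c(50, 60, 40, 70), nrow = 2)))
```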

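Rather than multiplying the large counts by each other, we'll first convert each count to a probability (its proportion of the grand total), multiply the probabilities, and then scale the result back up by the grand total. This gives the same expected count without ever forming the large integer product.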
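A sketch of that proportion-based calculation in base R, using the margin totals from the output in this issue. The variable names are illustrative assumptions; the actual fix lives in the commit referenced below and may be written differently.

```r
# Margin counts taken from the output in this issue, stored as integers
# (which is what table() returns).
var_counts <- c(77968L, 6620L)
by_counts  <- c(27135L, 27939L, 29514L)
n <- sum(var_counts)  # grand total; equals sum(by_counts)

# Convert each margin to a proportion first. Integer / integer yields a
# double in R, so nothing below is integer multiplication.
grid <- expand.grid(p_var = var_counts / n, p_by = by_counts / n)

# Expected count = N * P(row) * P(column), which equals
# row_total * col_total / N without forming the large product.
grid$exp <- n * grid$p_var * grid$p_by
grid$exp
min(grid$exp)
```

Rows 3 and 5, which were NA in the original output, now get finite expected counts, and the minimum can be compared against 5 as before.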

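```r
library(dplyr)

# Expected count for each cell: (row total * column total) / grand total.
# table() returns integer counts, so Var1 * Var2 is integer multiplication
# and overflows once the product exceeds .Machine$integer.max.
min_exp <-
  expand.grid(table(data[[var]]), table(data[[by_var]])) %>%
  mutate(exp = .data$Var1 * .data$Var2 /
    sum(table(data[[var]], data[[by_var]])))
```

```
   Var1  Var2       exp
1 77968 27135 25011.369
2  6620 27135  2123.631
3 77968 27939        NA
4  6620 27939  2186.553
5 77968 29514        NA
6  6620 29514  2309.816
Warning message:
In .data$Var1 * .data$Var2 : NAs produced by integer overflow
```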
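The NA rows line up exactly with the products that exceed R's integer limit. A quick check at the console using the counts from rows 1 and 3 of the output:

```r
# The largest value an R integer can hold (2147483647).
.Machine$integer.max

# Row 1: 77968 * 27135 is below the limit, so the product is returned.
row1 <- 77968L * 27135L

# Row 3: 77968 * 27939 exceeds the limit, so R returns NA and warns
# "NAs produced by integer overflow".
row3 <- 77968L * 27939L

# The same product computed in double precision is exact at this scale.
row3_dbl <- 77968 * 27939
```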
slobaugh pushed a commit to slobaugh/gtsummary that referenced this issue Jan 10, 2020
slobaugh mentioned this issue Jan 10, 2020
ddsjoberg added a commit that referenced this issue Jan 11, 2020
* #341 add_p bug fix

* Update NEWS.md

* bump version number

Co-authored-by: Daniel Sjoberg <danield.sjoberg@gmail.com>
ddsjoberg added this to the v1.2.5 milestone Jan 11, 2020