Error: Stratification variable(s) should not contain missing values. #97

Open
Beduiz opened this issue Feb 11, 2023 · 7 comments
@Beduiz

Beduiz commented Feb 11, 2023

Dear Benjamin,

Since I last used table1 in the autumn, code that used to work no longer does. It now fails with the following error:

Error in table1.formula(~variable1 + variable2 + variable3 + : 
  Stratification variable(s) should not contain missing values.

The stratification variable I use has 3 levels (for example 1 car, 2 cars and 3 cars). Since I can't get p-values with more than 2 levels, I've created custom stratification variables (stratification_12 with only 1 car and 2 cars, stratification_13 with only 1 car and 3 cars, and stratification_23 with only 2 cars and 3 cars), in which the level not included has been replaced with NA. This used to work, but now doesn't.

Is there a workaround?
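
For illustration, a pairwise variable of this kind might be built roughly as follows. This is only a sketch in base R: `cars` is a hypothetical stand-in for the real three-level variable, and the name `stratification_12` follows the description above. Judging by the error message, the resulting NA values are what the check objects to.

```r
# Hypothetical stand-in for the real three-level stratification variable
cars <- factor(c("1 car", "2 cars", "3 cars", "1 car", "3 cars", "2 cars"))

# Keep two levels and replace the excluded level with NA
stratification_12 <- factor(ifelse(cars == "3 cars", NA, as.character(cars)))

# Two levels remain, plus the NA entries for the excluded level
table(stratification_12, useNA = "ifany")
anyNA(stratification_12)
```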

Kind regards

@benjaminrich
Owner

I'm sorry for introducing this change that broke your code. It was done in response to feedback from another user (#80). But I obviously hadn't considered your use case and maybe I acted too hastily. In order to help you, I would need a concrete example that I can reproduce. Would it be possible to provide such an example using some simulated data?

@Beduiz
Author

Beduiz commented Feb 18, 2023

Dear Benjamin,

Thank you for your help, I'm grateful. Here is a reproducible example:

```r
library(table1)

# Variable-name lists controlling testing and display (all empty in this example)
s1_vars.NA <- c()           # variables that should not be tested
s1_vars.normal.test <- c()  # continuous variables tested with a t-test
s1_vars.normal.view <- c()  # continuous variables displayed as mean (SD)

# Render function: mean (SD) for listed variables, median (Q1-Q3) otherwise
s1_rndr <- function(x, name, ...) {
  cont <- ifelse(name %in% s1_vars.normal.view, "Mean (SD)", "Median (Q1-Q3)")
  y <- render.default(x, name, render.continuous=cont, ..., digits=1, digits.pct=1,
                      round.integers=TRUE, drop0trailing=FALSE, rounding.fn=round_pad)
  if (is.logical(x)) {
    y[2]  # Keep only the second element for logical variables
  } else {
    y     # Don't exclude any level
  }
}

# P-value column: the test is chosen by variable type
s1_pvalue <- function(x, name, ...) {
  x <- x[names(x) != "overall"]  # Don't count the "overall" column
  y <- unlist(x)
  g <- ordered(rep(1:length(x), times=sapply(x, length)))
  if (name %in% s1_vars.NA) {
    p <- NA  # Variables that should not be tested
  } else if (is.numeric(y) && (name %in% s1_vars.normal.test)) {
    # Two-sample t-test for normal continuous variables
    p <- t.test(y ~ g, paired=FALSE, alternative="two.sided")$p.value
  } else if (is.numeric(y)) {
    # Mann-Whitney U test for skewed continuous variables
    p <- wilcox.test(y ~ g, paired=FALSE, alternative="two.sided")$p.value
  } else {
    # Chi-square test for categorical variables
    p <- chisq.test(table(y, g), correct=FALSE)$p.value
  }
  # Format the p-value, escaping "<" for the HTML output
  c(sub("<", "&lt;", format.pval(p, digits=3, nsmall=3, eps=0.001)))
}

# Label column describing which statistics are shown
s1_stats <- function(x, name, ...) {
  y <- unlist(x)
  if (is.numeric(y) && (name %in% s1_vars.normal.view)) {
    ", mean (SD)"
  } else if (is.numeric(y)) {
    ", median (Q1-Q3)"
  } else {
    ", n (%)"
  }
}

table1(~ Sepal.Length + Petal.Length | Species, data=iris, render=s1_rndr,
       topclass="Rtable1-zebra", render.missing=NULL,
       extra.col=list(` `=s1_stats, `P-value`=s1_pvalue),
       extra.col.pos=1, overall=FALSE)
```

Kind regards

@benjaminrich
Owner

I'm looking at your example. I see that there are 3 strata (3 species), but I don't understand how you want the output to look. Are you trying to create 3 different p-values for each pairwise comparison?

@Beduiz
Author

Beduiz commented Mar 1, 2023

Hi Benjamin.

My aim is to make a table with all 3 strata, a total column, and a column with all the p-values (strata 1 vs 2, 1 vs 3 and 2 vs 3).

What I've done so far to achieve this is create 4 different tables that I later merge manually. In the first 3, I compare one stratum against another (i.e. strata 1 vs 2, 1 vs 3 and 2 vs 3) to get the p-values. In the 4th table, I only request the descriptives and no p-values.

However, it would definitely be great if it were possible to get all of this in one table. If not, it would be great if I could again do as detailed above. I was able to find a workaround in which I create a new database for each pair of strata, i.e. one strata_1_vs_2 database for that table, one strata_1_vs_3 database for that table, and so forth. But it would be better if it worked as it originally did :-)
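
As a rough sketch of what those three pairwise p-values could look like for a single variable, base R's `pairwise.wilcox.test()` returns them in one call. The use of `iris`, the choice of no multiplicity adjustment, and `exact = FALSE` (to avoid tie warnings) are assumptions for illustration only. This runs outside table1 and is not part of the tables described here.

```r
# Unadjusted pairwise Mann-Whitney p-values for one variable across three strata
pw <- pairwise.wilcox.test(iris$Sepal.Length, iris$Species,
                           p.adjust.method = "none", exact = FALSE)

# Lower-triangular matrix: rows are the 2nd and 3rd strata,
# columns are the 1st and 2nd strata; the unused cell is NA
pw$p.value
```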
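
The per-pair workaround described above could be sketched like this, using `iris` as in the reproducible example. The names `pairs`, `subsets` and `tables` are made up for illustration, and the `table1()` call is guarded so the snippet still runs if the package is not installed.

```r
# One data subset per pair of strata, so no stratum is ever NA
pairs <- combn(levels(iris$Species), 2, simplify = FALSE)

subsets <- lapply(pairs, function(p) {
  d <- iris[iris$Species %in% p, ]
  d$Species <- droplevels(d$Species)  # drop the unused third level
  d
})
names(subsets) <- vapply(pairs, paste, character(1), collapse = "_vs_")

# Each subset has exactly two strata and no missing values
sapply(subsets, function(d) nlevels(d$Species))

# Build one table per pair, if table1 is available
if (requireNamespace("table1", quietly = TRUE)) {
  tables <- lapply(subsets, function(d) {
    table1::table1(~ Sepal.Length + Petal.Length | Species,
                   data = d, overall = FALSE)
  })
}
```

The custom `render` and `extra.col` functions from the example could then be passed to each call, since every subset contains only two groups.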

Kind regards

@vonhyden

Dear Benjamin,

I'm having the same issue as Beduiz.
Including the missing values in the descriptive analysis was actually useful for generating a quick overview of the data.

It would be great if you could do something about it. :)
Thank you!

best regards
Nicolas

@ILHaeu

ILHaeu commented Apr 25, 2024

Hi, I'm also having this problem. Is there a suggested alternative or workaround when your stratification variable has some missing values?

@Streep

Streep commented Apr 27, 2024

Hi all, I have the same problem. I was using renv to keep my packages the same, but now I decided to do an update and this issue broke all my tables.

Most other crosstable packages can deal with NAs. Maybe make it an option to set missing as a separate category?
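
In the meantime, something like the following base R sketch might serve as a workaround. It recodes NA as an explicit level before the variable is passed to table1. The vector `strat` and the label "Missing" are made up for illustration, and I am assuming the check only looks for actual NA values in the stratification variable.

```r
# Hypothetical stratification variable with some missing values
strat <- factor(c("1 car", "2 cars", NA, "3 cars", NA))

# Turn NA into a regular level, then give that level a readable label
strat_explicit <- addNA(strat)
levels(strat_explicit)[is.na(levels(strat_explicit))] <- "Missing"

# No NA values remain; "Missing" is now an ordinary category
table(strat_explicit)
anyNA(strat_explicit)
```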


5 participants