Fix values_fill, docs
#645
I revised the function to fill missing values.
etiennebacher
left a comment
- fixes values_fill when values_from > 1
This looks good to me.
- allows values_fill to be a list of mixed types
I have some questions about the implementation.
- refactors the missing-fill code into a separate function
Thanks, looks good.
- adds docs about different behaviour of data_to_wide() and pivot_wider().
I think it's missing docs about which NA are filled, which is different from the tidyr implementation. Ideally we would have the same behavior, but for now this difference needs to be documented. Maybe something like:
In tidyr::pivot_wider(), values_fill doesn't apply to all missing values but only to those that were created by the reshaping process because the combinations of ID and names_from didn't exist. Pre-existing explicit missing values are not modified. By contrast, data_to_wide() fills all missing values.
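To make the explicit vs. implicit distinction concrete, here is a minimal sketch that uses only tidyr::pivot_wider(). The toy data and the column names (id, time, score, label) are made up for illustration and are not taken from the PR. It also passes values_fill as a named list of mixed types, which tidyr accepts; whether data_to_wide() takes exactly the same form is not shown here.

```r
library(tidyr)

# Hypothetical toy data: id 1 has an explicit NA score at time 2,
# id 2 has no row for time 2 at all (an implicit missing value).
long <- data.frame(
  id = c(1, 1, 2),
  time = c(1, 2, 1),
  score = c(10, NA, 15),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

wide <- pivot_wider(
  long,
  id_cols = "id",
  names_from = "time",
  values_from = c("score", "label"),
  # one fill value per value column, with different types
  values_fill = list(score = 99, label = "none")
)

# id 1: the explicit NA in score_2 is left untouched
# id 2: score_2 and label_2 did not exist and receive the fill values
wide
```

With this input, only the cells for id 2 at time 2 are filled; the pre-existing NA for id 1 stays NA.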
Ok, I think I'm almost done. This now works; only the original sorting needs to be restored.
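One possible way to restore the order of first appearance, sketched in base R. The helper name restore_first_appearance_order() is hypothetical and is not part of datawizard; the sketch assumes the wide result has exactly one row per combination of the id columns.

```r
# Hypothetical helper for illustration; not part of datawizard.
restore_first_appearance_order <- function(wide, long, id_cols) {
  # Build one string key per row from the id columns
  make_key <- function(d) {
    do.call(paste, c(unname(as.list(d[id_cols])), sep = "\r"))
  }
  # Unique keys in the order they first appear in the long data
  first_seen <- unique(make_key(long))
  out <- wide[match(first_seen, make_key(wide)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

long <- data.frame(
  subject_id = c(1, 1, 2, 2, 3, 5, 4, 4),
  time = rep(c(1, 2), 4)
)
# Stand-in for a wide result whose rows were sorted by id
wide_sorted <- data.frame(
  subject_id = c(1, 2, 3, 4, 5),
  score_1 = c(10, 15, 18, NA, NA)
)

restored <- restore_first_appearance_order(wide_sorted, long, "subject_id")
restored$subject_id
#> [1] 1 2 3 5 4
```

Because the key is built from all id columns, the same call works with several id columns, e.g. id_cols = c("subject_id", "id2").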
Maybe we could also allow select helpers for
Ok, the current implementation works with multiple That one is tricky, especially for multiple
library(datawizard)
long_df <- data.frame(
subject_id = c(1, 1, 2, 2, 3, 5, 4, 4),
time = rep(c(1, 2), 4),
score = c(10, NA, 15, 12, 18, 11, NA, 14),
anxiety = c(5, 7, 6, NA, 8, 4, 5, NA),
test = rep(NA_real_, 8)
)
data_to_wide(
long_df,
id_cols = "subject_id",
names_from = "time",
values_from = c("score", "anxiety", "test")
)
#> subject_id score_1 score_2 anxiety_1 anxiety_2 test_1 test_2
#> 1 1 10 NA 5 7 NA NA
#> 2 2 15 12 6 NA NA NA
#> 3 3 18 NA 8 NA NA NA
#> 4 4 NA 14 5 NA NA NA
#> 5 5 NA 11 NA 4 NA NA
tidyr::pivot_wider(
long_df,
id_cols = "subject_id",
names_from = "time",
values_from = c("score", "anxiety", "test")
)
#> # A tibble: 5 × 7
#> subject_id score_1 score_2 anxiety_1 anxiety_2 test_1 test_2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 10 NA 5 7 NA NA
#> 2 2 15 12 6 NA NA NA
#> 3 3 18 NA 8 NA NA NA
#> 4 5 NA 11 NA 4 NA NA
#> 5 4 NA 14 5 NA NA NA
data_to_wide(
long_df,
id_cols = "subject_id",
names_from = "time",
values_fill = 99,
values_from = c("score", "anxiety", "test")
)
#> subject_id score_1 score_2 anxiety_1 anxiety_2 test_1 test_2
#> 1 1 10 NA 5 7 NA NA
#> 2 2 15 12 6 NA NA NA
#> 3 3 18 99 8 99 NA 99
#> 4 4 NA 14 5 NA NA NA
#> 5 5 99 11 99 4 99 NA
tidyr::pivot_wider(
long_df,
id_cols = "subject_id",
names_from = "time",
values_fill = 99,
values_from = c("score", "anxiety", "test")
)
#> # A tibble: 5 × 7
#> subject_id score_1 score_2 anxiety_1 anxiety_2 test_1 test_2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 10 NA 5 7 NA NA
#> 2 2 15 12 6 NA NA NA
#> 3 3 18 99 8 99 NA 99
#> 4 5 99 11 99 4 99 NA
#> 5 4 NA 14 5 NA NA NA
long_df2 <- data.frame(
subject_id = c(1, 1, 2, 2, 3, 5, 4, 4),
id2 = c(1, 3, 2, 3, 1, 6, 7, 6),
time = rep(c(1, 2), 4),
score = c(10, NA, 15, 12, 18, 11, NA, 14),
anxiety = c(5, 7, 6, NA, 8, 4, 5, NA),
test = rep(NA_real_, 8)
)
data_to_wide(
long_df2,
id_cols = c("subject_id", "id2"),
names_from = "time",
values_from = c("score", "anxiety", "test")
)
#> subject_id id2 score_1 score_2 anxiety_1 anxiety_2 test_1 test_2
#> 1 1 1 10 NA 5 NA NA NA
#> 2 1 3 NA NA NA 7 NA NA
#> 3 2 2 15 NA 6 NA NA NA
#> 4 2 3 NA 12 NA NA NA NA
#> 5 3 1 18 NA 8 NA NA NA
#> 6 4 6 NA 14 NA NA NA NA
#> 7 4 7 NA NA 5 NA NA NA
#> 8 5 6 NA 11 NA 4 NA NA
tidyr::pivot_wider(
long_df2,
id_cols = c("subject_id", "id2"),
names_from = "time",
values_from = c("score", "anxiety", "test")
)
#> # A tibble: 8 × 8
#> subject_id id2 score_1 score_2 anxiety_1 anxiety_2 test_1 test_2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 10 NA 5 NA NA NA
#> 2 1 3 NA NA NA 7 NA NA
#> 3 2 2 15 NA 6 NA NA NA
#> 4 2 3 NA 12 NA NA NA NA
#> 5 3 1 18 NA 8 NA NA NA
#> 6 5 6 NA 11 NA 4 NA NA
#> 7 4 7 NA NA 5 NA NA NA
#> 8 4 6 NA 14 NA NA NA NA
data_to_wide(
long_df2,
id_cols = c("subject_id", "id2"),
names_from = "time",
values_fill = 99,
values_from = c("score", "anxiety", "test")
)
#> subject_id id2 score_1 score_2 anxiety_1 anxiety_2 test_1 test_2
#> 1 1 1 10 99 5 99 NA 99
#> 2 1 3 99 NA 99 7 99 NA
#> 3 2 2 15 99 6 99 NA 99
#> 4 2 3 99 12 99 NA 99 NA
#> 5 3 1 18 99 8 99 NA 99
#> 6 4 6 99 14 99 NA 99 NA
#> 7 4 7 NA 99 5 99 NA 99
#> 8 5 6 99 11 99 4 99 NA
tidyr::pivot_wider(
long_df2,
id_cols = c("subject_id", "id2"),
names_from = "time",
values_fill = 99,
values_from = c("score", "anxiety", "test")
)
#> # A tibble: 8 × 8
#> subject_id id2 score_1 score_2 anxiety_1 anxiety_2 test_1 test_2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 10 99 5 99 NA 99
#> 2 1 3 99 NA 99 7 99 NA
#> 3 2 2 15 99 6 99 NA 99
#> 4 2 3 99 12 99 NA 99 NA
#> 5 3 1 18 99 8 99 NA 99
#> 6 5 6 99 11 99 4 99 NA
#> 7 4 7 NA 99 5 99 NA 99
#> 8 4 6 99 14 99 NA 99 NA
Created on 2025-09-05 with reprex v2.1.1
@etiennebacher I think we can close this PR. While the implementation with I suggest we close this PR, and from the current main, we remove the
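This PR:
- fixes values_fill when values_from > 1
- allows values_fill to be a list of mixed types
- refactors the missing-fill code into a separate function
- adds docs about different behaviour of data_to_wide() and pivot_wider().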