Fix data_to_wide() on unbalanced panel with multiple variables in values_from
#644
|
@etiennebacher When IDs in a data frame are not "balanced", This is because in this code block:

# create missing combinations
if (not_all_cols_are_selected && incomplete_groups) {
  ...
}

we have

# must be rearranged as "B" "B" "A" "A" and not "A" "A" "B" "B"
lookup <- data.frame(
  temporary_id = unique(
    new_data[!is.na(new_data[[values_from]]), "temporary_id"]
  )
)

i.e.
|
One downside currently is that |
|
I'd like to review it more properly tonight, can you just install this version with Also, have you compared this to the output of |
|
Yes, not urgent to be merged. Yes, I have compared to pivot_wider, looks good. Will do a more detailed check later, and also address check failures |
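For reference, a comparison against `tidyr::pivot_wider()` could look like the sketch below. It assumes tidyr is installed and reuses the example data posted further down in this thread; it is only an illustration, not the test code from this PR.

```r
# Example data from this thread: subjects 3 and 5 have only one time point.
long_df <- data.frame(
  subject_id = c(1, 1, 2, 2, 3, 5, 4, 4),
  time = rep(c(1, 2), 4),
  score = c(10, NA, 15, 12, 18, 11, NA, 14),
  anxiety = c(5, 7, 6, NA, 8, 4, 5, NA)
)

# Reference result; stays NULL if tidyr is not available.
wide_ref <- NULL
if (requireNamespace("tidyr", quietly = TRUE)) {
  wide_ref <- tidyr::pivot_wider(
    long_df,
    id_cols = "subject_id",
    names_from = "time",
    values_from = c("score", "anxiety")
  )
  print(wide_ref)
}
```

With the default naming, the reference has one row per subject and one `<variable>_<time>` column for each combination, which is the layout `data_to_wide()` would be checked against.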
|
Let me convert this into a draft, I think new column names are not yet fixed. |
|
Ok, let's "restart" this PR. I think when we change

lookup <- data.frame(
  temporary_id = unique(
    new_data[!is.na(new_data[[values_from]]), "temporary_id"]
  )
)

into

lookup <- data.frame(
  temporary_id = unique(
    new_data[!is.na(new_data[values_from]), "temporary_id"]
  )
)

i.e. It's just that IDs 3 and 5, which only occur once, should be inserted with an

Here's an example to check the code.

long_df <- data.frame(
  subject_id = c(1, 1, 2, 2, 3, 5, 4, 4),
  time = rep(c(1, 2), 4),
  score = c(10, NA, 15, 12, 18, 11, NA, 14),
  anxiety = c(5, 7, 6, NA, 8, 4, 5, NA)
)
data_to_wide(long_df,
  id_cols = "subject_id",
  names_from = "time",
  values_from = c("score", "anxiety")
)
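To illustrate why the switch from `[[` to `[` matters once `values_from` holds more than one name, here is a minimal base-R sketch using the example data from the comment above. It only shows the indexing behaviour and is not the package code.

```r
long_df <- data.frame(
  subject_id = c(1, 1, 2, 2, 3, 5, 4, 4),
  time = rep(c(1, 2), 4),
  score = c(10, NA, 15, 12, 18, 11, NA, 14),
  anxiety = c(5, 7, 6, NA, 8, 4, 5, NA)
)
values_from <- c("score", "anxiety")

# `[[` with a character vector of length 2 indexes recursively
# (long_df[["score"]][["anxiety"]]), which fails for these columns.
double_bracket <- tryCatch(
  long_df[[values_from]],
  error = function(e) e
)
inherits(double_bracket, "error")

# `[` keeps both columns, so is.na() returns a logical matrix with
# one column per variable in values_from.
na_matrix <- is.na(long_df[values_from])
dim(na_matrix)
colSums(na_matrix)
```

With a single name in `values_from`, `[[` returns a plain vector and `is.na()` a logical vector, which is presumably what the original code relied on.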
|
I can't 100% follow your logic, so you may want to take a look at the code that handles

It should be this code block:

lookup <- data.frame(
  temporary_id = unique(
    new_data[!is.na(new_data[values_from]), "temporary_id"]
  )
)
lookup$temporary_id_2 <- seq_len(nrow(lookup))
new_data <- data_merge(
  new_data, lookup,
  by = "temporary_id", join = "left"
)
# creation of missing combinations was done with a temporary id, so need
# to fill columns that are not selected in names_from or values_from
new_data[, id_cols] <- lapply(id_cols, function(x) {
  data <- data_arrange(new_data, c("temporary_id_2", x))
  ind <- which(!is.na(data[[x]]))
  rep_times <- diff(c(ind, length(data[[x]]) + 1))
  rep(data[[x]][ind], times = rep_times)
})
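The fill step inside `lapply()` can be traced on a small vector. The sketch below uses a made-up id column `x` (hypothetical values, not taken from the package) in which the added rows carry `NA` and follow the row holding the real id. It assumes the first element is not missing.

```r
# Hypothetical id column after missing combinations were added.
x <- c(1, NA, 2, 2, 3, NA, 5, NA)

ind <- which(!is.na(x))                   # positions of known ids: 1 3 4 5 7
rep_times <- diff(c(ind, length(x) + 1))  # run lengths up to the next known id
filled <- rep(x[ind], times = rep_times)  # repeat each known id over its run
filled
```

Each known id is carried forward over the `NA` rows that follow it, so the result depends on the rows being arranged so that every inserted row sits directly after a row of the same subject.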
|
@strengejacke I took the liberty of tweaking the code to fix the column order in the output and to compare to I must say I don't remember much of my implementation, but I know I spent quite some time adding tests to compare to |
This example currently fails, which is fixed by this PR.