Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch: get select working with grouping #390

Merged
merged 10 commits into from
Jan 16, 2024
Merged

patch: get select working with grouping #390

merged 10 commits into from
Jan 16, 2024

Conversation

dsweber2
Copy link
Contributor

@dsweber2 dsweber2 commented Jan 9, 2024

Straightforward patch of select.epi_df so that the grouped version doesn't drop the epi_df type when selecting. closes #388.

@dshemetov
Copy link
Contributor

@brookslogan does this overlap #310 work?

@brookslogan
Copy link
Contributor

brookslogan commented Jan 9, 2024

Yes, this overlaps with #310, but while #310 would lend itself to something more general/proper in some cases, it's probably good to just do the mass-S3-implementation approach for now.

Something for select in particular: we can't just restore the old metadata as we may have dropped keys. We need to preserve the sort of stuff that goes on in [.epi_df and names<-.epi_df [degrading if we drop required columns, adjusting other_keys if we drop or rename them]. I realize now it may not be as simple as some of the other ones with just a dplyr_reconstruct or reclass whatever.

@brookslogan
Copy link
Contributor

@dsweber2 I think the R/ folder changes still need to be pushed?

@dsweber2
Copy link
Contributor Author

dsweber2 commented Jan 9, 2024

lol, whoops. I forgot to push the new file. I was trying to separate out the patch-style methods from the proper/more permanent ones.

Also, there's a ton of warnings which I don't think I caused, and the package has no styler, which means my commits may come with random formatting. Sorry about that.

[degrading if we drop required columns, adjusting other_keys if we drop or rename them]

Good point, I'll add some tests for this and adjust. Though the times I've actually seen select used in our code, it was for dropping a data-column.

Copy link
Contributor

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one test request. Are you planning to do more in this PR?

tests/testthat/test-epi_df.R Outdated Show resolved Hide resolved
@brookslogan
Copy link
Contributor

Also, here or in a separate PR, it'd be nice to have a complete that doesn't drop epi_dfness?

@dsweber2 dsweber2 mentioned this pull request Jan 10, 2024
19 tasks
@dsweber2
Copy link
Contributor Author

I'd rather this not blossom into a weeks long project. I made an issue #391 for patching more generally; if any feel particularly important feel free to rearrange/add some.

@dshemetov
Copy link
Contributor

Probably remove the list from the OP so it doesn't seem like this PR is aiming to check all those boxes.

@dsweber2
Copy link
Contributor Author

So I added some tests for the cases you were talking about @lcbrooks, and it seems like whoever implemented dplyr_reconstruct covered all these bases, since it passed the tests without me changing the core code.

@dsweber2 dsweber2 self-assigned this Jan 10, 2024
@brookslogan
Copy link
Contributor

brookslogan commented Jan 13, 2024

An edge case to fix or file off:

tibble::tibble(geo_value = 1, time_value = 1, age = 1, value = 1) %>%
  as_epi_df(additional_metadata = list(other_keys = "age")) %>%
  select(geo_value, time_value, age_group = age, value) %>%
  attr("metadata")
#> $geo_type
#> [1] "hhs"
#> 
#> $time_type
#> [1] "custom"
#> 
#> $as_of
#> [1] "2024-01-12 16:48:48 PST"
#> 
#> $other_keys
#> character(0)

@brookslogan
Copy link
Contributor

Might be a regression from dplyr_reconstructing in names<-.epi_df. Taking a look.

@brookslogan
Copy link
Contributor

brookslogan commented Jan 13, 2024

Okay, I think I fixed names<-.epi-df. The problem is that it doesn't propagate through correctly via select. Now, based on ?dplyr_extending, you'd expect we could do something like:

select.epi_df <- function(.data, ...) {
  selected <- NextMethod(.data)
  might_decay <- reclass(selected, attr(selected, "metadata"))
  return(dplyr_reconstruct(might_decay, might_decay))
}

but that doesn't seem sufficient. It seems that select.data.frame's out <- set_names(out, names(loc)) just destroys our attributes when we are grouped, preventing us from seeing what was renamed. So we may need to do some eval_selecting ourselves.

[Hmmm, maybe the above isn't accurate. Looks like maybe I didn't account for grouped df's in names<- and that might be the problem. And implementation above does appear to work. Though a reimplementation without all the redirection and checks of dplyr is quite simple:

 select.epi_df <- function(.data, ...) {
   selection <- tidyselect::eval_select(expr(c(...)), .data)
   result <- .data[selection]
   names(result) <- names(selection)
   return (result)
 }

but I've opted to try to use dplyr's checks and any hidden behaviors.]

Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I kind of tricked you with some pre-existing bugs and missing tests + buggy wrappers along these lines. We can't always reconstruct using the input's metadata; dplyr_reconstruct.epi_df tries to account for things like dropped other key columns, but can't detect a rename. I think I've fixed select.epi_df up in the rename case, along with some of the pre-existing things masking the problem, and some other stuff to make tests pass (inconsistencies between NULL vs. character(0L) for representing no other keys).

Please sanity-check these updates & merge.

@dsweber2
Copy link
Contributor Author

Just confirming, other_keys is now by default character(0) if I read this right? Otherwise seems good. Added some trivial annotations, and will then merge.

@dsweber2 dsweber2 merged commit 71e11f7 into dev Jan 16, 2024
2 checks passed
@dajmcdon
Copy link
Contributor

See also @brookslogan's comment on #186 about the character(0) default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

select causes type conversion
4 participants