Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addressed the sources of the conflicts between epipredict and epiprocess (see Issue 104 in epipredict) #185

Conversation

rachlobay
Copy link
Collaborator

Updated [.epi_df code to simplify the code and to fix Issue 104 in epipredict as discussed with Maggie and Daniel.

Copy link
Contributor

@dajmcdon dajmcdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much simpler. I wonder why the other version was so complicated?

@rachlobay rachlobay merged commit 4703684 into main Aug 3, 2022
@rachlobay
Copy link
Collaborator Author

rachlobay commented Aug 3, 2022

Well... A couple things... 1) We decided to remove the column subsetting condition, so a couple other lines could be removed that were mainly there for and 2) If we look at [.tsb_ts and [.tbl_df both had a common code structure that I initially thought we should keep for consistency, but after doing some testing I am not seeing how some of those lines were absolutely necessary to have (so probably the code for [.tsb_ts code could be simplified more similar to what we did here).

@brookslogan
Copy link
Contributor

brookslogan commented Aug 5, 2022

Couple of questions while it's fresh in your mind:

  • What was the column subsetting condition?
  • I think some of those lines weren't necessary for [.epi_df because its NextMethod should / is expected to forward to [.tbl_df which does some of these checks. (And now, we don't even need the j arg stuff.)
  • Are the later parts of the not_epi_df condition supposed to catch when rows have been duplicated, or when NA rows have been introduced via row indices that are too high? (Because it'd violate reasonable constraints on the key col values?) [But these checks aren't comprehensive! Duplicates of a single row could be requested a number of times less than the original number of columns and pass the checks but still have duplicates.]

And potential issues:

  • Can the user select away some of the [other] key columns? Then the result's metadata might be wrong.
  • [Can the user duplicate the key columns? Then we might have unexpected behavior and might want to decay to a tibble. Although this is a bit far-fetched and likely indicates that the user is doing something weird/buggy.]

@rachlobay
Copy link
Collaborator Author

@brookslogan answers to each of your qs are below:

  1. As in [.tbl_ts, if there is column subsetting only, then return tibble… This caused problems with some ex. in this package and we ultimately decided to drop that code rather than keep it.
  2. Yeah. I agree. Almost all of the removed lines in [.epi_df are from [.tbl_ts (which also I think calls [.tsb_df using NextMethod()), which is why I am now saying that [.tsb_ts could probably be simplified. That confusing reassigning i to j and making i null, etc. part that was removed here looks to have come into play in the column subsetting condition in [.tsb_ts, so that is one reason why we could remove that part here.
  3. Yeah, as far as I remember from testing, vec_size() here was for the # of rows (so that should catch whatever could cause the number of rows being greater than it was initially). And any i > nr, would be for when someone is trying to index beyond i.
  • So far, we have really only enforced the basic criteria that both a geo and time value must be included to be an epi_df… Do you mean, for ex., like if we paste the code for ex1 here into R and look at attr(ex1[,-3], "metadata"), we still have “county_code” under $other_keys there? If yes, I agree this should be dealt with & an issue could be posted for it.

  • Do you mean, considering the ex. from the previous point, something like attr(ex1[,c(1,1,2:4)], "metadata”)? There it looks like the corresponding metadata doesn’t duplicate along with the duplicated column, which is good. I am not sure what is best to do here, but since duplicating such a key is confusing, we could go down to a tibble if you think it is the safest bet.

@brookslogan
Copy link
Contributor

Thanks for the answers. I couldn't tell what the purpose of some of that old i&j reassignment magic + other logic was.

  1. I've been reading this all backwards; now I see the old column-subsetting-only --> tibble code that's been removed. Current behavior makes sense.
  2. Exactly. Will post an issue.
  3. Going to tibble does sound safer. Not sure if there would be something sensible to do with other_keys. Which brings up another detail:
  4. In many/all cases where we end up with a tibble, some/all of the metadata doesn't make sense anymore. (But this is probably low-priority cleanup because the only metadata attribute that might be preserved through an epi_df -> tibble -> epi_df chain via [ and {new,as}_epi_df looks like as_of, which does continue to make sense.

@rachlobay
Copy link
Collaborator Author

rachlobay commented Aug 8, 2022

I've been looking into the issue about duplicating key cols... If we take ex1[,c(1,1,2:4)] and try to coerce that to a tibble using as_tibble(ex1[,c(1,1,2:4)]), the result is an error:

! Column name `pol` must not be duplicated.
Use .name_repair to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.

Similarly if we try to duplicate other cols in that df. So, it may be best to abort for col. duplication... I'll re-post this under the relevant issue.

@rachlobay rachlobay deleted the 104-find-the-sources-of-the-conflicts-with-most-recent-version-of-epiprocess branch August 18, 2022 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Find the sources of the conflicts with most recent version of epiprocess
3 participants