Addressed the sources of the conflicts between epipredict and epiprocess (see Issue 104 in epipredict) #185

rachlobay · 2022-08-03T22:13:25Z

Updated [.epi_df code to simplify the code and to fix Issue 104 in epipredict as discussed with Maggie and Daniel.

dajmcdon

This is much simpler. I wonder why the other version was so complicated?

rachlobay · 2022-08-03T23:02:45Z

Well... A couple things... 1) We decided to remove the column subsetting condition, so a couple other lines could be removed that were mainly there for and 2) If we look at [.tsb_ts and [.tbl_df both had a common code structure that I initially thought we should keep for consistency, but after doing some testing I am not seeing how some of those lines were absolutely necessary to have (so probably the code for [.tsb_ts code could be simplified more similar to what we did here).

brookslogan · 2022-08-05T17:03:07Z

Couple of questions while it's fresh in your mind:

What was the column subsetting condition?
I think some of those lines weren't necessary for [.epi_df because its NextMethod should / is expected to forward to [.tbl_df which does some of these checks. (And now, we don't even need the j arg stuff.)
Are the later parts of the not_epi_df condition supposed to catch when rows have been duplicated, or when NA rows have been introduced via row indices that are too high? (Because it'd violate reasonable constraints on the key col values?) [But these checks aren't comprehensive! Duplicates of a single row could be requested a number of times less than the original number of columns and pass the checks but still have duplicates.]

And potential issues:

Can the user select away some of the [other] key columns? Then the result's metadata might be wrong.
[Can the user duplicate the key columns? Then we might have unexpected behavior and might want to decay to a tibble. Although this is a bit far-fetched and likely indicates that the user is doing something weird/buggy.]

rachlobay · 2022-08-05T22:26:36Z

@brookslogan answers to each of your qs are below:

As in [.tbl_ts, if there is column subsetting only, then return tibble… This caused problems with some ex. in this package and we ultimately decided to drop that code rather than keep it.
Yeah. I agree. Almost all of the removed lines in [.epi_df are from [.tbl_ts (which also I think calls [.tsb_df using NextMethod()), which is why I am now saying that [.tsb_ts could probably be simplified. That confusing reassigning i to j and making i null, etc. part that was removed here looks to have come into play in the column subsetting condition in [.tsb_ts, so that is one reason why we could remove that part here.
Yeah, as far as I remember from testing, vec_size() here was for the # of rows (so that should catch whatever could cause the number of rows being greater than it was initially). And any i > nr, would be for when someone is trying to index beyond i.

So far, we have really only enforced the basic criteria that both a geo and time value must be included to be an epi_df… Do you mean, for ex., like if we paste the code for ex1 here into R and look at attr(ex1[,-3], "metadata"), we still have “county_code” under $other_keys there? If yes, I agree this should be dealt with & an issue could be posted for it.
Do you mean, considering the ex. from the previous point, something like attr(ex1[,c(1,1,2:4)], "metadata”)? There it looks like the corresponding metadata doesn’t duplicate along with the duplicated column, which is good. I am not sure what is best to do here, but since duplicating such a key is confusing, we could go down to a tibble if you think it is the safest bet.

brookslogan · 2022-08-08T15:25:53Z

Thanks for the answers. I couldn't tell what the purpose of some of that old i&j reassignment magic + other logic was.

I've been reading this all backwards; now I see the old column-subsetting-only --> tibble code that's been removed. Current behavior makes sense.
Exactly. Will post an issue.
Going to tibble does sound safer. Not sure if there would be something sensible to do with other_keys. Which brings up another detail:
In many/all cases where we end up with a tibble, some/all of the metadata doesn't make sense anymore. (But this is probably low-priority cleanup because the only metadata attribute that might be preserved through an epi_df -> tibble -> epi_df chain via [ and {new,as}_epi_df looks like as_of, which does continue to make sense.

rachlobay · 2022-08-08T19:22:46Z

I've been looking into the issue about duplicating key cols... If we take ex1[,c(1,1,2:4)] and try to coerce that to a tibble using as_tibble(ex1[,c(1,1,2:4)]), the result is an error:

! Column name `pol` must not be duplicated.
Use .name_repair to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.

Similarly if we try to duplicate other cols in that df. So, it may be best to abort for col. duplication... I'll re-post this under the relevant issue.

admin added 2 commits August 3, 2022 14:51

Addressed issue #104 from epipredict & simplified code

551efef

Removed extra line

a508f68

rachlobay requested review from dajmcdon and brookslogan as code owners August 3, 2022 22:13

devtools::document()

d1b23ce

dajmcdon approved these changes Aug 3, 2022

View reviewed changes

rachlobay merged commit 4703684 into main Aug 3, 2022

rachlobay deleted the 104-find-the-sources-of-the-conflicts-with-most-recent-version-of-epiprocess branch August 18, 2022 01:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Addressed the sources of the conflicts between epipredict and epiprocess (see Issue 104 in epipredict) #185

Addressed the sources of the conflicts between epipredict and epiprocess (see Issue 104 in epipredict) #185

rachlobay commented Aug 3, 2022

dajmcdon left a comment

rachlobay commented Aug 3, 2022 •

edited

Loading

brookslogan commented Aug 5, 2022 •

edited

Loading

rachlobay commented Aug 5, 2022

brookslogan commented Aug 8, 2022

rachlobay commented Aug 8, 2022 •

edited

Loading

Addressed the sources of the conflicts between epipredict and epiprocess (see Issue 104 in epipredict) #185

Addressed the sources of the conflicts between epipredict and epiprocess (see Issue 104 in epipredict) #185

Conversation

rachlobay commented Aug 3, 2022

dajmcdon left a comment

Choose a reason for hiding this comment

rachlobay commented Aug 3, 2022 • edited Loading

brookslogan commented Aug 5, 2022 • edited Loading

rachlobay commented Aug 5, 2022

brookslogan commented Aug 8, 2022

rachlobay commented Aug 8, 2022 • edited Loading

rachlobay commented Aug 3, 2022 •

edited

Loading

brookslogan commented Aug 5, 2022 •

edited

Loading

rachlobay commented Aug 8, 2022 •

edited

Loading