initial layer adjustments #334

Merged
merged 21 commits into from
Jun 17, 2024

Conversation

Contributor

@dsweber2 dsweber2 commented May 15, 2024

This is to keep the discussion on the layer additions a little bit separate from the other PR, because that is becoming unwieldy in length. The documentation is still a WIP.

@dsweber2 dsweber2 requested a review from dajmcdon as a code owner May 15, 2024 23:17
@dsweber2 dsweber2 requested review from dajmcdon and removed request for dajmcdon May 15, 2024 23:18
@dsweber2
Contributor Author

OK, I also added some logic to deal with inter-geo latency, based on @lcbrooks's discussion on the other PR. The option is called epi_keys_checked; it defaults to just "geo_value", but can group by any of the epi_keys.

R/step_adjust_latency.R (outdated)
Comment on lines 41 to 50
#' amount to offset the ahead or lag by. If a single integer, this is used for
#' all columns; if a labeled vector, the labels must correspond to the base
#' column names (before lags/aheads). If `NULL`, the latency is the distance
#' between the `epi_df`'s `max_time_value` and either the
#' `fixed_forecast_date` or the `epi_df`'s `as_of` field (the default for
#' `forecast_date`).
#' @param fixed_forecast_date either a date of the same kind used in the
#' `epi_df`, or `NULL`. Exclusive with `fixed_latency`. If a date, it gives
#' the date from which the forecast is actually occurring. If `NULL`, the
#' `forecast_date` is determined either via the `fixed_latency`, or is set to
Contributor

This (and the previous parameters) all seem like a lot of complexity/interaction with each other. I'm especially concerned with the "exclusive with" components. But in any case, it's quite the tree (hard to test?).

Maybe this is the cleanest it gets, but it might be worth brainstorming.

Contributor Author

fixed_* are mostly about giving the user easy options, depending on which feature of latency they actually know, rather than having to compute it by hand. My explanation may not be particularly good though, and if you have suggestions for a different interface I'd welcome that.

A lot of it shakes out to simple checking for NULL before actually setting a value, since I have to set both latency and forecast_date anyways. The option just replaces the calculated versions with a fixed one.

After thinking through the logic again, epi_keys_checked is always used, either as part of computing the forecast_date given a fixed_latency or as part of computing the latency given a fixed_forecast_date, so it doesn't actually get caught up in this.
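
For concreteness, the NULL-checking described above might look roughly like the following base-R sketch. `resolve_latency()` and its argument layout are hypothetical names used for illustration, not the code in this PR; the real step also handles per-column latencies and `epi_keys_checked`, and the assumption that a fixed latency is simply added to the latest `time_value` is mine.

```r
# Hypothetical helper: resolve `latency` and `forecast_date` from whichever
# one the user fixed. A sketch only, not the PR's implementation.
resolve_latency <- function(max_time_value, as_of,
                            fixed_latency = NULL,
                            fixed_forecast_date = NULL) {
  # The two fixed_* options are mutually exclusive.
  if (!is.null(fixed_latency) && !is.null(fixed_forecast_date)) {
    stop("Only one of `fixed_latency` and `fixed_forecast_date` may be set.")
  }
  if (!is.null(fixed_latency)) {
    # Latency is known, so the forecast date follows from the data (assumed).
    latency <- fixed_latency
    forecast_date <- max_time_value + fixed_latency
  } else {
    # Forecast date is either fixed by the user or defaults to `as_of`.
    forecast_date <- if (is.null(fixed_forecast_date)) as_of else fixed_forecast_date
    latency <- as.integer(forecast_date - max_time_value)
  }
  list(latency = latency, forecast_date = forecast_date)
}

res <- resolve_latency(
  max_time_value = as.Date("2024-05-10"),
  as_of = as.Date("2024-05-15")
)
res_fixed <- resolve_latency(
  max_time_value = as.Date("2024-05-10"),
  as_of = as.Date("2024-05-15"),
  fixed_latency = 3
)
```

Both values are always set in one place, so the exclusivity check is the only extra branch that needs its own test.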

Comment on lines 144 to 149
# null and "" don't work in `group_by`
if (!is.null(epi_keys_checked) && epi_keys_checked != "") {
  group_by(., get(epi_keys_checked))
} else {
  .
}
Contributor

This pattern (and same below) is pretty hard to parse. Just for the sake of clarity, maybe break the pipe sequence into a few lines?

Contributor Author

sure, I was just following the convention in the function previously of "in-piping" the logic.

Contributor Author

And by "in the function previously", I actually mean in arx_forecaster:

r <- r %>%
  step_epi_ahead(!!outcome, ahead = args_list$ahead) %>%
  step_epi_naomit() %>%
  step_training_window(n_recent = args_list$n_training) %>%
  {
    if (!is.null(args_list$check_enough_data_n)) {
      check_enough_train_data(
        .,
        all_predictors(),
        !!outcome,
        n = args_list$check_enough_data_n,
        epi_keys = args_list$check_enough_data_epi_keys,
        drop_na = FALSE
      )
    } else {
      .
    }
  }

Contributor

I guess I missed that when reviewing that PR...

Contributor Author

gotcha, I'll switch them both then

Contributor

You were going to switch these right?

Contributor Author

forgot there were 3 instances instead of just 2
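
One way to write the conditional grouping without braces inside the pipe is to branch in an ordinary `if` statement before doing any work. The sketch below uses base R only so that it is self-contained; `max_time_by_key()` is a hypothetical helper, and the PR itself works with `group_by()` rather than `split()`.

```r
# Hypothetical helper: latest time_value overall, or per group when a key
# column is supplied. NULL and "" both mean "do not group".
max_time_by_key <- function(df, epi_keys_checked = NULL) {
  ungrouped <- is.null(epi_keys_checked) || identical(epi_keys_checked, "")
  if (ungrouped) {
    return(max(df$time_value))
  }
  # One maximum per level of the key column, kept as Dates.
  groups <- split(df$time_value, df[[epi_keys_checked]])
  lapply(groups, max)
}

df <- data.frame(
  geo_value = c("ca", "ca", "ny"),
  time_value = as.Date(c("2024-05-01", "2024-05-03", "2024-05-02"))
)
overall <- max_time_by_key(df)
per_geo <- max_time_by_key(df, "geo_value")
```

Pulling the branch out of the pipe keeps each path readable on its own, and the "no grouping" case becomes an early return instead of a bare `.`.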

Comment on lines 219 to 224
if (inherits(this_recipe$steps[[3]], "step_adjust_latency")) x$as_of
}
) %>% Filter(Negate(is.null), .)
if (length(handpicked_as_of) > 0) {
max_time_value <- handpicked_as_of[[1]]
} else {
Contributor

What's the significance of the list element by position here ([[3]] and [[1]])? This can be potentially dangerous.

Contributor Author

the this_recipe$steps[[3]] was left over from development, surprised it didn't cause any errors yet.

handpicked_as_of should have only one value (otherwise there are multiple step_adjust_latency steps, which shouldn't work). I suppose I should add a check during step creation that there's only one step_adjust_latency.
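
A minimal sketch of looking the step up by class instead of by position, with the uniqueness check folded in. `find_latency_step()` is a hypothetical helper, and the mock steps below only imitate the class structure of real recipe steps.

```r
# Hypothetical helper: locate the single step_adjust_latency in a list of
# steps by class rather than by position.
find_latency_step <- function(steps) {
  is_latency <- vapply(steps, inherits, logical(1), what = "step_adjust_latency")
  if (sum(is_latency) > 1) {
    stop("Only one `step_adjust_latency` is allowed per recipe.")
  }
  if (!any(is_latency)) {
    return(NULL)
  }
  steps[[which(is_latency)]]
}

# Mock steps standing in for a real recipe's `steps` list.
mock_steps <- list(
  structure(list(id = "a"), class = c("step_epi_lag", "step")),
  structure(
    list(id = "b", forecast_date = as.Date("2024-05-15")),
    class = c("step_adjust_latency", "step")
  )
)
found <- find_latency_step(mock_steps)
```

With a lookup like this, neither `[[3]]` nor `[[1]]` is needed, and a recipe with two latency steps fails loudly instead of silently using the first.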

- drop multiline pipes
- better docs
- check exclusive parameters aren't used simultaneously
- inherit typo
- additional placeholders for future tests
@dajmcdon
Contributor

@dsweber2 Is this the PR I'm blocking? Is there something in particular I should focus on in review?

@dsweber2
Contributor Author

This one and its parent, adjustAhead, are both more or less waiting on your review. I couldn't remember when you were back; I probably should've been more explicit and given you a summary to work with when I thought it was ready.

As for where to focus: I would start with the arguments to layer_add_forecast_date, layer_add_target_date, and the arx_* functions, and then the tests, though that's pretty generic advice.

Rough summary of this PR is:

  • layer_add_forecast_date and layer_add_target_date now default to using the date specified by step_adjust_latency
  • added adjust_latency to arx_args_list, with a default of NULL to preserve previous behavior
  • forecast_date and target_date both inherit values from step_adjust_latency's logic if adjust_latency is present (just realized I should move this logic to arx_args_list, since it's currently in arx_forecaster but not arx_classifier; will do when no longer traveling)
  • added get_forecast_date_in_layer to get the right forecast date in the layers rather than the steps
  • changed as_of to forecast_date for step_adjust_latency
  • swapped out some awkward piping that used braces as a quasi-function

In hindsight, there are some changes I made here that I probably should've made in the other PR and rebased onto, sorry about all the as_of -> forecast_date's in the diffs.

Contributor

@dajmcdon dajmcdon left a comment

I left a number of comments, but I suspect there are some missing tests. Or else I'm doing something wrong locally. Checks fail.

[I see, remote checks are only running styler]

R/arx_forecaster.R (outdated)
R/step_adjust_latency.R
fixed_latency = NULL,
fixed_asof = NULL,
fixed_forecast_date = NULL,
default = NA,
skip = FALSE,
columns = NULL,
Contributor

What does this do? Is it just populated by the tidy selection? You can inherit these from other step_* functions. (Same applies to skip and id)

Contributor

The default for id should be rand_id("adjust_latency")

Contributor Author

I was unsure which of these were necessary boilerplate for all steps, and which were args only the other step would need. Would gladly drop most of the generic steps if possible

Contributor

I've been basing it off the instructions here (though they may have changed since I last made one):
https://www.tidymodels.org/learn/develop/recipes/

Contributor Author

oh you mean inherit the documentation probably?

Contributor

yes, I was wondering specifically if columns is necessary. The documentation makes it look like something set by the user, but it's actually ... not used at all?

Contributor Author

Hmm, yeah, columns is definitely a cargo-cult argument; I'll drop it. The ... arguments are what actually restrict the terms to specific columns (though I don't have a test for that; I'm going to add one to make sure it's working properly).

Contributor

That's ok. In many example step_*() functions, columns is an argument. I'm honestly not sure why. Take a look at

?recipes::step_lag
View(recipes::step_lag)
View(recipes:::step_lag_new)
View(recipes:::prep.step_lag)

The columns argument gets populated by terms at prep time. I see why they use it in step_lag_new, but I don't understand why it is kept as an argument to step_lag.

Long story short, they say to use it. But it needs to inherit the documentation. You can use

#' @inheritParams recipes::step_lag
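
Put together, the suggestions in this thread would give a signature along these lines. This is only a sketch: `step_adjust_latency_stub()` is a hypothetical stand-in that just returns its arguments, the `@param` wording is mine, and the default `id` is evaluated lazily, so `recipes` is only needed if `id` is left unset.

```r
#' Hypothetical stub showing inherited parameter docs
#'
#' @inheritParams recipes::step_lag
#' @param fixed_latency,fixed_forecast_date at most one may be non-`NULL`.
step_adjust_latency_stub <- function(recipe, ...,
                                     fixed_latency = NULL,
                                     fixed_forecast_date = NULL,
                                     default = NA,
                                     skip = FALSE,
                                     columns = NULL,
                                     id = recipes::rand_id("adjust_latency")) {
  # `columns` is left NULL by the user; real steps fill it in at prep time.
  list(
    # Capture the unevaluated selectors passed through `...` as strings.
    terms = as.character(substitute(list(...)))[-1],
    fixed_latency = fixed_latency,
    fixed_forecast_date = fixed_forecast_date,
    default = default,
    skip = skip,
    columns = columns,
    id = id
  )
}

stub <- step_adjust_latency_stub(NULL, case_rate, fixed_latency = 2, id = "demo")
```

Here `columns`, `skip`, `default`, and `id` pick up their documentation from `recipes::step_lag`, so only the latency-specific arguments need their own `@param` entries.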

R/step_adjust_latency.R
@@ -267,9 +299,9 @@ print.step_adjust_latency <-
   } else {
     terms <- x$terms
   }
-  if (!is.null(x$as_of)) {
+  if (!is.null(x$forecast_date)) {
Contributor

Printing looks very strange. This is the example recipe above:
[Screenshot of the printed recipe, 2024-06-14 10:08:19]

Contributor Author

oh I didn't have any tests for this and changed the format. Surprised printing wasn't throwing errors. I'll let you know when I think this is fixed, leaving open

R/utils-latency.R (outdated)
R/utils-latency.R
R/utils-latency.R (outdated)
R/utils-latency.R
@dsweber2
Contributor Author

> Checks fail.

The tests at least were running locally for me at the last PR. I'm generally not in a habit of running the full checks locally since I expect the remote to handle that (and they really slow down the feedback loop); I guess b/c this is a PR on a PR it isn't running the full checks.

Running them locally, it looks like it's mostly things not being in the namespace. The check now passes locally with 4 notes (mostly some local files it should ignore, the ubiquitous global * definition, and some ::: quasi-imports).

I've marked as resolved the things I thought were straightforward in being addressed, and left open things I'm still confused on or working through.

Contributor

I suspect these were all OK because there's a library(dplyr) at the top.

@dsweber2 dsweber2 merged commit f36f6fa into adjustAhead Jun 17, 2024
1 check passed
@dshemetov dshemetov deleted the adjustAheadLayerAdditions branch June 18, 2024 02:27