initial layer adjustments #334
Conversation
5fa3b5f to 5eed24d (force-pushed)
2146526 to 1a84212 (force-pushed)
ok, also added in some stuff to deal with inter-geo latency, based on @lcbrooks discussion on the other PR. The option is called
R/step_adjust_latency.R
Outdated
#' amount to offset the ahead or lag by. If a single integer, this is used for
#' all columns; if a labeled vector, the labels must correspond to the base
#' column names (before lags/aheads). If `NULL`, the latency is the distance
#' between the `epi_df`'s `max_time_value` and either the
#' `fixed_forecast_date` or the `epi_df`'s `as_of` field (the default for
#' `forecast_date`).
#' @param fixed_forecast_date either a date of the same kind used in the
#' `epi_df`, or `NULL`. Exclusive with `fixed_latency`. If a date, it gives
#' the date from which the forecast is actually occurring. If `NULL`, the
#' `forecast_date` is determined either via the `fixed_latency`, or is set to
This (and the previous parameters) all seem like a lot of complexity/interaction with each other. I'm especially concerned with the "exclusive with" components. But in any case, it's quite the tree (hard to test?).

Maybe this is the cleanest it gets, but it might be worth brainstorming.
The `fixed_*` arguments are mostly about giving the user easy options, depending on which feature of latency they actually know, rather than having to compute it by hand. My explanation may not be particularly good though, and if you have suggestions for a different interface I'd welcome that.

A lot of it shakes out to simple checking for `NULL` before actually setting a value, since I have to set both `latency` and `forecast_date` anyway. The option just replaces the calculated versions with a fixed one.
After thinking through the logic again, `epi_keys_checked` is always used, either as part of computing the `forecast_date` given a `fixed_latency`, or as part of computing the `latency` given a `fixed_forecast_date`, so it doesn't actually get caught up in this.
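For illustration, here is a minimal sketch of the NULL-fall-through logic described above. The function name `resolve_latency` and its exact signature are assumptions for the example, not the PR's actual code:

```r
`%||%` <- function(x, y) if (is.null(x)) y else x

# Sketch: fixed_latency and fixed_forecast_date are mutually exclusive;
# whichever is NULL is derived from the other (or from the epi_df's as_of).
resolve_latency <- function(max_time_value, as_of,
                            fixed_latency = NULL,
                            fixed_forecast_date = NULL) {
  if (!is.null(fixed_latency) && !is.null(fixed_forecast_date)) {
    stop("`fixed_latency` and `fixed_forecast_date` are mutually exclusive.")
  }
  forecast_date <- fixed_forecast_date %||%
    (if (!is.null(fixed_latency)) max_time_value + fixed_latency else as_of)
  latency <- fixed_latency %||% as.integer(forecast_date - max_time_value)
  list(forecast_date = forecast_date, latency = latency)
}
```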
R/utils-latency.R
Outdated
# null and "" don't work in `group_by`
if (!is.null(epi_keys_checked) && epi_keys_checked != "") {
  group_by(., get(epi_keys_checked))
} else {
  .
}
This pattern (and same below) is pretty hard to parse. Just for the sake of clarity, maybe break the pipe sequence into a few lines?
sure, I was just following the function's existing convention of "in-piping" the logic.
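For what it's worth, one way to break the conditional grouping out of the pipe, as suggested. This is a sketch against an assumed data frame `dat`, not the PR's actual code:

```r
# Sketch: pull the in-pipe `{ if ... }` block into an explicit if/else,
# so each branch reads on its own line.
grouped <- if (!is.null(epi_keys_checked) && !identical(epi_keys_checked, "")) {
  dplyr::group_by(dat, .data[[epi_keys_checked]])
} else {
  dat
}
```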
and by "in the function previously", I actually mean in `arx_forecaster`, lines 129 to 146 in de6e1db:
r <- r %>%
  step_epi_ahead(!!outcome, ahead = args_list$ahead) %>%
  step_epi_naomit() %>%
  step_training_window(n_recent = args_list$n_training) %>%
  {
    if (!is.null(args_list$check_enough_data_n)) {
      check_enough_train_data(
        .,
        all_predictors(),
        !!outcome,
        n = args_list$check_enough_data_n,
        epi_keys = args_list$check_enough_data_epi_keys,
        drop_na = FALSE
      )
    } else {
      .
    }
  }
I guess I missed that when reviewing that PR...
gotcha, I'll switch them both then
You were going to switch these, right?
forgot there were 3 instances instead of just 2
R/utils-latency.R
Outdated
    if (inherits(this_recipe$steps[[3]], "step_adjust_latency")) x$as_of
  }
) %>% Filter(Negate(is.null), .)
if (length(handpicked_as_of) > 0) {
  max_time_value <- handpicked_as_of[[1]]
} else {
What's the significance of the list element by position here (`[[3]]` and `[[1]]`)? This can be potentially dangerous.
the `this_recipe$steps[[3]]` was left over from development; surprised it didn't cause any errors yet. `handpicked_as_of` should have only one value (otherwise they have multiple `step_adjust_latency`s, which shouldn't work). I suppose I should add a check that there's only one `step_adjust_latency` during step creation.
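A sketch of what that creation-time check could look like (function name and error message are assumed for illustration):

```r
# Hypothetical sketch: refuse to add a second step_adjust_latency,
# scanning all steps instead of indexing by position.
check_single_latency_step <- function(recipe_steps) {
  already_present <- any(vapply(
    recipe_steps,
    function(s) inherits(s, "step_adjust_latency"),
    logical(1)
  ))
  if (already_present) {
    stop("Only one `step_adjust_latency` is allowed per recipe.")
  }
  invisible(TRUE)
}
```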
- drop multiline pipes
- better docs
- check exclusive parameters aren't used simultaneously
- inherit typo
- additional placeholders for future tests
@dsweber2 Is this the PR I'm blocking? Is there something in particular I should focus on in review?
This one and its parent.

As far as where to focus, probably the arguments to

Rough summary of this PR is:
In hindsight, there are some changes I made here that I probably should've made in the other PR and rebased onto, sorry about all the
I left a number of comments, but I suspect there are some missing tests. Or else I'm doing something wrong locally. Checks fail.

[I see, remote checks are only running `styler`]
fixed_latency = NULL,
fixed_asof = NULL,
fixed_forecast_date = NULL,
default = NA,
skip = FALSE,
columns = NULL,
What does this do? Is it just populated by the tidy selection? You can inherit these from other `step_*` functions. (Same applies to `skip` and `id`.)
The default for `id` should be `rand_id("adjust_latency")`.
I was unsure which of these were necessary boilerplate for all steps, and which were args only the other step would need. Would gladly drop most of the generic steps if possible
I've been basing it off the instructions here (though they may have changed since I last made one):
https://www.tidymodels.org/learn/develop/recipes/
oh you mean inherit the documentation probably?
yes, I was wondering specifically if `columns` is necessary. The documentation makes it look like something set by the user, but it's actually ... not used at all?
hmm, yeah, `columns` is definitely a cargo-cult argument, I'll drop it. `...` is used to actually restrict the terms used to specific columns (though I don't have a test for that; going to add one to make sure it's working properly).
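A sketch of the kind of test mentioned, in testthat style. The recipe-construction helper, column names, and data object here are placeholders for illustration, not real package objects:

```r
# Hypothetical testthat sketch: check that the selection passed through
# `...` restricts which columns the latency adjustment touches.
test_that("terms in `...` restrict the adjusted columns", {
  baked <- make_test_recipe() %>%        # assumed helper returning a recipe
    step_adjust_latency(case_rate) %>%   # select only case_rate via `...`
    prep(test_data) %>%                  # `test_data` is a placeholder epi_df
    bake(new_data = NULL)
  # the unselected column should pass through unchanged
  expect_equal(baked$death_rate, test_data$death_rate)
})
```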
That's ok. In many example `step_*()`, `columns` is an argument. I'm honestly not sure why. Take a look at:

?recipes::step_lag
View(recipes::step_lag)
View(recipes:::step_lag_new)
View(recipes:::prep.step_lag)

The `columns` argument gets populated by `terms` at prep time. I see why they use it in `step_lag_new`, but I don't understand why it is kept as an argument to `step_lag`.

Long story short, they say to use it. But it needs to inherit the documentation. You can use

#' @inheritParams recipes::step_lag
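Pulling these review threads together, here is a skeleton of how the user-facing constructor might look once `columns` is dropped, boilerplate docs are inherited, and `id` defaults to `rand_id()`. This is a sketch of the shape under discussion, not the merged code:

```r
#' @inheritParams recipes::step_lag
#' @export
step_adjust_latency <- function(recipe,
                                ...,
                                fixed_latency = NULL,
                                fixed_forecast_date = NULL,
                                role = NA,
                                trained = FALSE,
                                skip = FALSE,
                                id = rand_id("adjust_latency")) {
  # `...` carries the tidyselect terms; the resolved column names are
  # filled in at prep() time rather than exposed as a `columns` argument.
  # (body elided -- construction via add_step() as in other steps)
}
```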
@@ -267,9 +299,9 @@ print.step_adjust_latency <-
   } else {
     terms <- x$terms
   }
-  if (!is.null(x$as_of)) {
+  if (!is.null(x$forecast_date)) {
oh I didn't have any tests for this and changed the format. Surprised printing wasn't throwing errors. I'll let you know when I think this is fixed, leaving open
R/utils-latency.R
Outdated
# null and "" don't work in `group_by`
if (!is.null(epi_keys_checked) && epi_keys_checked != "") {
  group_by(., get(epi_keys_checked))
} else {
  .
}
You were going to switch these, right?
The tests at least were running locally for me at the last PR. I'm generally not in the habit of running the full checks locally, since I expect the remote to handle that (and they really slow down the feedback loop); I guess because this is a PR on a PR, it isn't running the full checks. Doing so, it looks like it's mostly things not being in the namespace. Check now passes locally, with 4 notes (mostly some local files it should ignore, the ubiquitous global

I've marked as resolved the things I thought were straightforward in being addressed, and left open things I'm still confused on or working through.
fixed_latency = NULL,
fixed_asof = NULL,
fixed_forecast_date = NULL,
default = NA,
skip = FALSE,
columns = NULL,
yes, I was wondering specifically if `columns` is necessary. The documentation makes it look like something set by the user, but it's actually ... not used at all?
I suspect these were all OK because there's a `library(dplyr)` at the top.
This is to keep the discussion on the layer additions a little bit separate from the other PR, because that is becoming unwieldy in length. The documentation is still a WIP.