-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added test case for a bug where recoding a non-derived variable using a derived variable as a start variable does not work #37
base: main
Are you sure you want to change the base?
Conversation
derived variable as a start variable does not work
variable,dummyVariable,typeEnd,databaseStart,variableStart,typeStart,recEnd,numValidCat,catLabel,catLabelLong,units,recStart,catStartLabel,variableStartShortLabel,variableStartLabel,notes | ||
var_one,,cat,db_one,[var_one_start],cat,copy,2,,,,else,,,, | ||
var_two,,cat,db_one,DerivedVar::[var_one],cat,Func::var_two_func,2,,,,N/A,,,, | ||
var_three,,cat,db_one,[var_two],cat,copy,2,,,,else,,,, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yulric Out of curiousity, are users expected to fill databaseStart
as db_one
for the last two rows? Neither var_two
nor var_three
is derived from a variable that exists in db_one
(unless you recursively go back). I'm wondering what the "proper" convention is here since this scenario wasn't discussed in the vignettes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the code currently works, users are required to specify the databaseStart
column for derived variables. I think it makes sense to do that to be very clear about what databases is valid for. Saying that, the code can probably figure out the databseStart value for a derived variable from the start variable for it.
file.path(getwd(), "../../assets/tests/integration/rec-with-table/variable-details-sheet.csv"), | ||
fileEncoding = "UTF-8-BOM") | ||
|
||
variable_details_sheet$variableStart[3] <- "DerivedVar::[var_two]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yulric It looks like this issue can be fixed by changing how we populate the details sheet.
The test was failing earlier because the var_three
column was missing from the expected output. The problem was linked to the fact that the variable required to create var_three
(i.e., var_two
) was only present in the re-coded data, not the original.
You get the expected output when you change the value in variableStart
from "var_two" to "DerivedVar::[var_two]". That is to say, a variable that relies on a derived variable should also be treated as a derived variable, regardless of whether it requires a function for re-coding.
The data types of columns in the expected data were changed to factors to match the output, and expect_equivalent()
was used to ignore attributes in the output data. Let me know if this checks out for you as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't think that should be the expected behavior. I think that the DerivedVar
prefix should be used to specify that a variable is a derived variable i.e. a variable that requires a function for its recoding.
For this issue, rather than changing the variable details sheet maybe we should change the way the code works so that "nested variables" also get recoded? So recodeflow would see that var_three depends on var_two which is not in the database, so it would pluck it out of the recoded data.
On another note, if this bug can be resolved by careful organization of the details sheet, we must make this explicitly clear in the vignette. |
@reikookamoto This would be a good bug for you to fix.