Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added test case for a bug where recoding a non-derived variable using a derived variable as a start variable does not work #37

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

yulric
Copy link
Collaborator

@yulric yulric commented Nov 21, 2023

@reikookamoto This would be a good bug for you to fix.

derived variable as a start variable does not work
variable,dummyVariable,typeEnd,databaseStart,variableStart,typeStart,recEnd,numValidCat,catLabel,catLabelLong,units,recStart,catStartLabel,variableStartShortLabel,variableStartLabel,notes
var_one,,cat,db_one,[var_one_start],cat,copy,2,,,,else,,,,
var_two,,cat,db_one,DerivedVar::[var_one],cat,Func::var_two_func,2,,,,N/A,,,,
var_three,,cat,db_one,[var_two],cat,copy,2,,,,else,,,,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yulric Out of curiousity, are users expected to fill databaseStart as db_one for the last two rows? Neither var_two nor var_three is derived from a variable that exists in db_one (unless you recursively go back). I'm wondering what the "proper" convention is here since this scenario wasn't discussed in the vignettes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the code currently works, users are required to specify the databaseStart column for derived variables. I think it makes sense to do that to be very clear about what databases is valid for. Saying that, the code can probably figure out the databseStart value for a derived variable from the start variable for it.

file.path(getwd(), "../../assets/tests/integration/rec-with-table/variable-details-sheet.csv"),
fileEncoding = "UTF-8-BOM")

variable_details_sheet$variableStart[3] <- "DerivedVar::[var_two]"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yulric It looks like this issue can be fixed by changing how we populate the details sheet.

The test was failing earlier because the var_three column was missing from the expected output. The problem was linked to the fact that the variable required to create var_three (i.e., var_two) was only present in the re-coded data, not the original.

You get the expected output when you change the value in variableStart from "var_two" to "DerivedVar::[var_two]". That is to say, a variable that relies on a derived variable should also be treated as a derived variable, regardless of whether it requires a function for re-coding.

The data types of columns in the expected data were changed to factors to match the output, and expect_equivalent() was used to ignore attributes in the output data. Let me know if this checks out for you as well...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think that should be the expected behavior. I think that the DerivedVar prefix should be used to specify that a variable is a derived variable i.e. a variable that requires a function for its recoding.

For this issue, rather than changing the variable details sheet maybe we should change the way the code works so that "nested variables" also get recoded? So recodeflow would see that var_three depends on var_two which is not in the database, so it would pluck it out of the recoded data.

@reikookamoto
Copy link
Collaborator

On another note, if this bug can be resolved by careful organization of the details sheet, we must make this explicitly clear in the vignette.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants