-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fabricate rewrite PR #41
Conversation
…e, will implement similar strategy in the refactor in next few commits. This will generate a warning or error on the build.
…s build should generate a warning.
…egan process of deprecating old files.
…nit tests to reflect new names, bugfixes to correct unit tests
…t fixes. Also cleaning up documentation
…o imported data single-level
…r into fabricate_rewrite
…avis but not on local builds, related to building the PDF manual.
…umentation for ICC functions, and fixed bug with normal ICC data.
… unique variables by level.
…ather than level calls.
…warning due to a documentation issue.
…by my side, a little bit of optimization is all I need, a little bit of CI bugs is what I see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks excellent and pretty well tested. The stuff with building a working environment could use a second pass (not using lists inside the env could simplifiy quite a bit of the book keeping) but if it's working we can always change it later. 👍 overall.
NAMESPACE
Outdated
export(fabricate) | ||
export(join) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close to overloading dplyr funs, but barely doesnt ;)
R/cross_classify_helpers.R
Outdated
if(is.null(sigma)) { | ||
if(is.atomic(rho) & length(rho)==1) { | ||
if(ndim>2 & rho<0) { | ||
stop("The correlation matrix must be positive semi-definite. In specific, ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Specifically"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omg grammar correction in a code review
} | ||
|
||
# Can we use the fast package or are we stuck with the slow one? | ||
use_f = use_f && requireNamespace("mvnfast", quietly = TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
R/cross_classify_helpers.R
Outdated
|
||
} else { | ||
# Using mvnfast | ||
correlated_sn = mvnfast::rmvn(N, ncores = 2, mu, sigma) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getOption("mc.cores")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, nuts, this is clearly an artifact from putzing around on my machine. Will fix.
R/cross_classify_helpers.R
Outdated
# with right_chol to make it correlated. | ||
correlated_sn <- matrix(rnorm(N * ndim), | ||
nrow = N, | ||
byrow = TRUE) %*% right_chol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for byrow
|
||
# User is adding a new level, but already has a working data frame. | ||
# Shelf the working data frame and move on | ||
if("data_frame_output_" %in% names(working_environment_)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is a bit hard to follow, going to meditate on it.
# Any obs that matches the level matching this i will be a duplicate of this i. | ||
index_maps[ | ||
working_environment_$data_frame_output_[[ID_label]] == unique_values_of_level[i] | ||
] = i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yikes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm really open to a better implementation of this, I'm sure there is one.
# Move the current working data frame into a package | ||
package_df = list(data_frame_output_ = working_environment_$data_frame_output_, | ||
level_ids_ = working_environment_$level_ids_, | ||
variable_names_ = names(working_environment_$data_frame_output_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this snippet pops up a bit, might want to make a helper / constructor function.
R/fabricate.R
Outdated
# Loop over the variable name | ||
|
||
variable_names = by$variable_names | ||
data_frame_indices = numeric(length(variable_names)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
length always returns an int I believe.
# What would the indices of the quantiles be if our data was ordered -- | ||
# if the answer is below 0, set it to 1. round will ensure the tie- | ||
# breaking behaviour is random with respect to outcomes | ||
ordered_indices = pmax(1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be factored out of the lapply to a seperate sweep statement -
|
||
test_that("Deliberate failures in cross_level", { | ||
expect_error( | ||
test_next = fabricate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should not have a name (test_next) on this expect_error (test-crossclassified line 96) -
I believe I've implemented all of @nfultz's suggestions that were trivial to do, as well as bumped up the test coverage to hit all the code. The remainder of the code review suggestions I've issued. Provided everyone else is fine, this can be squashed and merged now. Phew. Feels good to have over a month of hard work finally be merged into master. |
This is the PR that will merge in all of the new level syntax, including the cross_level stuff.
Travis:
Travis is currently failing on a specific test even though R CMD Check on my local system is working. @nfultz figures it's a weird Travis quirk, but it's POSSIBLE that there is an actual build issue (see builds 280, 281, 283). I'll be back this evening if I'm needed to address this stuff.
Notes about backwards compatibility:
The one thing that I am not 100% certain about in terms of replicating the existing behaviour of the old level function is the exact conditions under which a new ID label column is added to data. This is not breaking when it comes to any substantive stuff, it's just that a data frame might have an extra ID column relative to some past workflows. I began a discussion with @graemeblair about this in issue #33 -- I just haven't fully doubled back and verified exactly what's going on in the code.
Not done:
Vignettes related to cross-classifying data; draw_discrete changes
Code review / refactoring notes:
Some of the error handling and setup for the level functions is slightly duplicated and could possibly be abstracted into helper functions; the fabricate.r file should be split up into multiple files.