-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fabricate rewrite PR #41
Merged
Merged
Changes from all commits
Commits
Show all changes
51 commits
Select commit
Hold shift + click to select a range
b6219fd
Complete documentation of fabricate.R so it makes sense to me.
aaronrudkin 42855e2
Detailed documentation of existing level function.
aaronrudkin ee91b2b
Beginning of fabricate rewrite (this will break a build, expect build…
aaronrudkin 40b1c85
nest_level and stub modify_level functionality. This build will gener…
aaronrudkin 073935c
Major speed improvement on modify level calls in the original codebas…
aaronrudkin 35d465d
Implemented modify_level_new and improved speed of several steps. Thi…
aaronrudkin c9e4c52
Switched the working environment to an environment for speed gains. B…
aaronrudkin 715df58
Renamed files to old- prefix and added to rbuildignore
aaronrudkin 4e097d2
Renamed new files to final names.
aaronrudkin 089adca
Renamed functions to take over namespace.
aaronrudkin b49a6b1
Cutoff to new version of fabricate and the level functions, updated u…
aaronrudkin a72ddea
Update to use add_level instead of nest_level, documentation, and tes…
aaronrudkin 41e4090
README.Rmd, update with Getting started example on main Github page.
aaronrudkin 2416330
Bugfixes for row names when resampling, bugfix for adding variables t…
aaronrudkin 5e6e95b
Complete rewrite and expansion of vignette.
aaronrudkin 5e156f8
Added README.Rmd to .Rbuildignore
aaronrudkin e9277af
Fixes #32 and provisionally implements suggestion 1 for #33
aaronrudkin f894d77
Documentation update Nov 20, 2017
aaronrudkin bf27e10
Merge branch 'master' into fabricate_rewrite
aaronrudkin 8b2cde9
Added draw_binary_icc and did line length trimming on variable_creati…
aaronrudkin 95de0b0
Merge branch 'fabricate_rewrite' of github.com:DeclareDesign/fabricat…
aaronrudkin 7ebb9e2
Fixed #35 and sped up ordered data by swapping cut for findInterval
aaronrudkin 31797f6
Added a likert unit test to draw ordered data.
aaronrudkin b5f5113
Fixed issues with last set of tests, added draw_normal_icc
aaronrudkin a61e41d
Changed documentation to remove math which was causing an error on Tr…
aaronrudkin 3b66ee0
Merge commit of doc changes from master into fabricate_rewrite
aaronrudkin 4349550
Fixed documentation to reflect add_level, modify_level, and added doc…
aaronrudkin 354fd67
Documentation push and fixed a bug in draw_normal_icc vignette.
aaronrudkin e473942
Test additions to ICC data and fixed tests for draw_normal_icc
aaronrudkin 472c508
Renamed cluster_ids to clusters and patched up a few more tests
aaronrudkin ce912ca
Fixed bug in error handling in handle_n and added more tests.
aaronrudkin 1521ba7
More test coverage.
aaronrudkin 49782c1
Remaining test coverage for draw_normal_icc
aaronrudkin 2883619
Test coverage for helper functions including symbol lookahead and get…
aaronrudkin 284980d
Test coverage for main fabricate and level methods.
aaronrudkin 3858838
Moved data frame sanity check for imported data into fabricate call r…
aaronrudkin a91e053
Removed an error handler code could never reach and added a few minor…
aaronrudkin bd28cc2
Forgot to commit one character typo fix, broke build.
aaronrudkin 351fa3b
Removed superfluous data checking code in nest, modify, and add
aaronrudkin 593dedc
cross_classify implementation first pass. This build will generate a …
aaronrudkin 99b6344
Changes to cross_level syntax and documentation for cross-classified …
aaronrudkin bd36789
First test pass at cross-classified data, fixed a bug in specifying s…
aaronrudkin ff97198
Fix a bug in specifying rho, added tests for all the cross-classifyin…
aaronrudkin 6e44118
Fixed a bug that made all the tests I just wrote not work.
aaronrudkin 7990a47
Added additional testing for outer wrapper of cross_level
aaronrudkin 8fc19f8
A little bit of test coverage in my life, a little bit of bug fixing …
aaronrudkin 7c6c307
Version bump due to breaking syntax change.
aaronrudkin 289693b
Fixes for test apparatus to work with testthat 2.0.0
aaronrudkin f33c26d
A few more tests and fixed a bug in adding variables after importing …
aaronrudkin ce4080a
Fixes from nfultz's code review.
aaronrudkin ad6b54c
Merge branch 'master' into fabricate_rewrite
aaronrudkin File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,135 @@ | ||
join_dfs = function(dfs, variables, N, sigma=NULL, rho=0) { | ||
# Error handling | ||
if(is.data.frame(dfs)) { | ||
stop("You need at least two data frames.") | ||
} | ||
if(length(dfs) != length(variables)) { | ||
stop("You must define which variables to join on.") | ||
} | ||
if(length(variables) < 2) { | ||
stop("You must define at least two variables to join on.") | ||
} | ||
|
||
# Create the data list -- the subset from the dfs of the variables we're | ||
# joining on -- for each df in dfs, map it to a variable. Subset the df to | ||
# that variable. Unlist and unname, creating a vector. Plonk that in a | ||
# data_list | ||
data_list = Map(function(x, y) { | ||
unname(unlist(x[y])) | ||
}, dfs, variables) | ||
|
||
# Do the joint draw | ||
result = joint_draw_ecdf(data_list=data_list, | ||
N=N, | ||
sigma=sigma, | ||
rho=rho) | ||
|
||
# result now contains a matrix of indices. Each column of this matrix is | ||
# the indices for each df of dfs. Subset by row the df. This will return | ||
# a list of new dfs. We need to cbind these dfs to make the merged data. | ||
merged_data = do.call("cbind", | ||
Map(function(df, indices) { df[indices, ] }, | ||
dfs, | ||
result)) | ||
|
||
# Cleanup: remove row names | ||
rownames(merged_data) = NULL | ||
# Re-write the column names to be the original column names from the original | ||
# dfs. | ||
colnames(merged_data) = unname(unlist(lapply(dfs, colnames))) | ||
|
||
merged_data | ||
} | ||
|
||
joint_draw_ecdf = function (data_list, N, ndim=length(data_list), | ||
sigma=NULL, rho=0, use_f = TRUE) { | ||
|
||
# We don't modify data_list, but this is useful to ensure the | ||
# argument is evaluated anyway | ||
force(ndim) | ||
|
||
# Error handling for N | ||
if(is.null(N) || is.na(N) || !is.atomic(N) || length(N) > 1 || N <= 0) { | ||
stop("N must be a single integer that is positive.") | ||
} | ||
|
||
# Error handling for rho, if specified | ||
if(is.null(sigma)) { | ||
if(is.atomic(rho) & length(rho)==1) { | ||
if(ndim>2 & rho<0) { | ||
stop("The correlation matrix must be positive semi-definite. Specifically, ", | ||
"if the number of variables being drawn from jointly is 3 or more, ", | ||
"then the correlation coefficient rho must be non-negative.") | ||
} | ||
|
||
if(rho == 0) { | ||
# Uncorrelated draw would be way faster; just sample each column | ||
return(lapply(seq_along(data_list), | ||
function(vn) { | ||
sample.int(length(data_list[[vn]]), N, replace=TRUE) | ||
})) | ||
} | ||
sigma = matrix(rho, ncol=ndim, nrow=ndim) | ||
diag(sigma) = 1 | ||
} else { | ||
stop("If specified, rho should be a single number") | ||
} | ||
} | ||
|
||
# Error handling for sigma | ||
if(ncol(sigma) != ndim | nrow(sigma) != ndim | any(diag(sigma) != 1)) { | ||
stop("The correlation matrix must be square, with the number of dimensions ", | ||
"equal to the number of dimensions you are drawing from. In addition, ", | ||
"the diagonal of the matrix must be equal to 1.") | ||
} | ||
|
||
# Can we use the fast package or are we stuck with the slow one? | ||
use_f = use_f && requireNamespace("mvnfast", quietly = TRUE) | ||
|
||
# Standard normal = all dimensions mean 0. | ||
mu = rep(0, ndim) | ||
|
||
# Possible options for the joint normal draw | ||
if(!use_f) { | ||
# Below code is a reimplementation of the operative parts of rmvnorm from | ||
# the mvtnorm package so that we don't induce a dependency | ||
|
||
# Right cholesky decomposition (i.e. LR = sigma s.t. L is lower triang, R | ||
# is upper triang.) | ||
right_chol = chol(sigma, pivot=TRUE) | ||
# Order columns by the pivot attribute -- induces numerical stability? | ||
right_chol = right_chol[, order(attr(right_chol, "pivot"))] | ||
# Generate standard normal data and right-multiply by decomposed matrix | ||
# with right_chol to make it correlated. | ||
correlated_sn <- matrix(rnorm(N * ndim), | ||
nrow = N) %*% right_chol | ||
|
||
} else { | ||
# Using mvnfast | ||
correlated_sn = mvnfast::rmvn(N, ncores = getOption("mc.cores", 2L), mu, sigma) | ||
} | ||
|
||
# Z-scores to quantiles | ||
quantiles = pnorm(correlated_sn) | ||
colnames(quantiles) = names(data_list) | ||
|
||
# Quantiles to inverse eCDF. | ||
result = lapply( | ||
seq_along(data_list), | ||
function(vn) { | ||
# What would the indices of the quantiles be if our data was ordered -- | ||
# if the answer is below 0, set it to 1. round will ensure the tie- | ||
# breaking behaviour is random with respect to outcomes | ||
ordered_indices = pmax(1, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this can be factored out of the lapply to a seperate sweep statement - |
||
round(quantiles[, vn] * length(data_list[[vn]])) | ||
) | ||
|
||
# Now get the order permutation vector and map that to the ordered indices | ||
# to get the indices in the original space | ||
indices = order(data_list[[vn]])[ordered_indices] | ||
}) | ||
|
||
|
||
# Set up response | ||
result | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍