Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resampling speed improvements; continued coverage improvements #24

Merged
merged 8 commits into from
Oct 31, 2017

Conversation

aaronrudkin
Copy link
Contributor

This PR encapsulates the last week or so of work, with an emphasis on the speed improvements for the multi-level bootstrap. I will leave this open for a brief code review -- if you want to tag suggestions or ideas below, I will merge the PR if everyone is happy and then add the suggestions as the first commits on the next feature branch I open.

Copy link
Contributor

@nfultz nfultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -35,12 +35,13 @@ resample_data = function(data, N, ID_labels=NULL) {
.resample_data_internal(data, N, ID_labels)
}

.resample_data_internal = function(data, N, ID_labels=NULL, outer_level=1, use_dt = 0) {
.resample_data_internal = function(data, N, ID_labels=NULL, outer_level=1, use_dt = NA) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the dot prefix, it isn't exported in NAMESPACE anyway.

rmarkdown
rmarkdown,
data.table
FasterWith: data.table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop the FasterWith

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recognize this one is not necessary; but I thought it would be handy to signal the nature of the suggestion (testthat, rmarkdown, etc. seem obviously about dev/building the package by definition) -- upon reading the information about DESCRIPTION files it doesn't sound like it's a problem to add arbitrary fields. What does everyone think about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably a trade-off between "hey, the package devs are trying to tell me something" and "hey, the package devs have created a non-standard field that I need to think about"

"At the top level, ",
ifelse(!is.null(ID_label),
paste0(ID_label, ", "),
""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check null in this case, eg

> foo <- NULL
> stop("Foo is ", foo)
Error: Foo is 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was mostly about making clean, legible text. As-is the error was "At the top level, , you must provide..." and I thought the , , section was ugly. By ifelsing here, we avoid the grammar problem.

"If you provide more than one ID_labels to resample data for multilevel data, please provide a vector for N of the same length representing the number to resample at each level."
)
}
.resample_data_internal = function(data, N, ID_labels=NULL, outer_level=1, use_dt = NA) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use TRUE/FALSE for use_dt since it's an internal variable anyway.

use_dt = 1
} else {
use_dt = 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use_dt <- use_dt || requireNamespace("data.table", quietly=T)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't seem like this would work if we want a unit test to be able to explicitly override the value, no? If the unit test sets use_dt = FALSE in the argument, then the or operator is going to evaluate to FALSE || requireNamespace("data.table", quietly=T) which should evaluate to true.

Now, I could do this:
use_dt = ifelse(is.na(use_dt), requireNamespace("data.table", quietly=T), use_dt)

But it's not clear if speed or legibility benefit from this?

# of N units by row.
if (missing(N) & is.null(ID_labels)) {
return(bootstrap_single_level(data, dim(data)[1], ID_label=NULL))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this to the outer function resample_data() above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the error checking can be on the outside function, and the inner function just does the real work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you don't really need the outer argument here

# OK, if not, we need to recurse

# Split indices of data frame by the thing we're strapping on
split_data_on_boot_id = split(seq_len(dim(data)[1]), data[,ID_labels[1]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data[[ID_labels[1]]]) will be slightly stricter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nfultz
Copy link
Contributor

nfultz commented Oct 31, 2017 via email

@graemeblair
Copy link
Member

This looks great; merging now. Thanks!

@graemeblair graemeblair merged commit ca85333 into master Oct 31, 2017
@graemeblair graemeblair deleted the profile_and_coverage branch October 31, 2017 20:22
@coveralls
Copy link

Coverage Status

Coverage increased (+1.5%) to 95.778% when pulling 910d51b on profile_and_coverage into 38a2f02 on master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage increased (+1.5%) to 95.778% when pulling 910d51b on profile_and_coverage into 38a2f02 on master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants