Sugar function sample with unit tests #610
Current coverage is 62.42% (diff: 100%)

@@            master    #610   diff  @@
==========================================
  Files            69      69
  Lines          4798    4798
  Methods           0       0
  Messages          0       0
  Branches          0       0
==========================================
+ Hits           2994    2995     +1
+ Misses         1804    1803     -1
  Partials          0       0
|
|
What you are doing to opening What will this mean for Christian's |
|
Skimming through |
|
Agree mostly -- the fact that @helmingstay (i.e. Christian) tucked his away carefully in an additional opt-in header file is good, as it will not clutter by default. Users are more likely to find your proposed What I am currently (hey, barely into first coffee of the day) unsure about is whether we should mimic the results of R's |
|
To clarify, I am generally in favor of replicating R's behavior as much as possible, the current situation included. Most importantly, I think it is what users typically expect; and, of somewhat lesser significance, it makes writing unit tests a bit easier, as the objectives are better defined. My remark about slight algorithmic differences between what is used in RcppArmadillo (and similarly in For example, R's version doesn't actually perform sampling on different

base::sample
# function (x, size, replace = FALSE, prob = NULL)
# ...
#     else {
#         if (missing(size))
#             size <- length(x)
#         x[sample.int(length(x), size, replace, prob)]
#         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ...

Similarly, this approach is used in RcppArmadillo:

// ...
// copy the results into the return vector
for (ii = 0; ii < size; ii++) {
    jj = index(ii);   // arma
    ret[ii] = x[jj];  // templated
}
// ...

On the other hand, for each of the index-generating subroutines in

// ...
// `ref` is a Vector<RTYPE>
// `ians` is a Vector<RTYPE>::iterator
for ( ; ians != eans; ++ians) {
    int j = static_cast<int>(n * unif_rand());
    *ians = ref[x[j]];
    // ^^^^^^^^^^^^^^^^^^
    x[j] = x[--n];
}
// ...

This of course required code duplication, but it seemed wasteful to calculate the indices and then make a separate pass to actually look up the result values. Regardless of the approach taken, I think the ultimate goal should be for the result of

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;

// RcppArmadillo's existing sample(), for comparison
// [[Rcpp::export]]
CharacterVector arma_sample(CharacterVector x, int sz, bool replace, NumericVector p) {
    return RcppArmadillo::sample(x, sz, replace, p);
}

// the proposed Rcpp sugar sample() (found via `using namespace Rcpp`)
// [[Rcpp::export]]
CharacterVector rcpp_sample(CharacterVector x, int sz, bool replace, NumericVector p) {
    return sample(x, sz, replace, p);
}

# small sample, no replacement
set.seed(123); s1 <- arma_sample(letters, 10, FALSE, rep(1, 26))
set.seed(123); s2 <- rcpp_sample(letters, 10, FALSE, rep(1, 26))
set.seed(123); s3 <- sample(letters, 10, FALSE, rep(1, 26))
all.equal(s1, s3)
# [1] "10 string mismatches"
all.equal(s2, s3)
# [1] TRUE
# small sample, with replacement
set.seed(123); s1 <- arma_sample(letters, 10, TRUE, rep(1, 26))
set.seed(123); s2 <- rcpp_sample(letters, 10, TRUE, rep(1, 26))
set.seed(123); s3 <- sample(letters, 10, TRUE, rep(1, 26))
all.equal(s1, s3)
# [1] "10 string mismatches"
all.equal(s2, s3)
# [1] TRUE
# should use Walker's Alias method
x <- rep(letters, length.out = 1e4)
px <- rep(1.0, length(x))
set.seed(123); s1 <- arma_sample(x, 10, TRUE, px)
set.seed(123); s2 <- rcpp_sample(x, 10, TRUE, px)
set.seed(123); s3 <- sample(x, 10, TRUE, px)
all.equal(s1, s3)
# [1] "3 string mismatches"
all.equal(s2, s3)
# [1] TRUE

I haven't investigated the full scope of the problem, but the RcppArmadillo version seems to have some trouble with (some of) the variants that perform probability-weighted sampling. |
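The two strategies discussed above (R and RcppArmadillo drawing an index vector and then subsetting, versus the sugar loop looking each value up as soon as its index is drawn) can be contrasted in a minimal standalone C++ sketch. It assumes `std::mt19937` in place of R's `unif_rand()`, so it only illustrates the control flow and will not reproduce R's draws; `draw_position`, `sample_index`, `sample_via_index` and `sample_direct` are hypothetical names, not Rcpp API.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw one position in [0, n) from a uniform double, mirroring the
// `static_cast<int>(n * unif_rand())` idiom quoted above. std::mt19937 is an
// assumed stand-in for R's RNG, so results will NOT match R's random stream.
inline std::size_t draw_position(std::size_t n, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::size_t j = static_cast<std::size_t>(n * unif(rng));
    return j < n ? j : n - 1;  // guard against n * u rounding up to n
}

// Approach 1 (R / RcppArmadillo style): build an index vector first ...
inline std::vector<std::size_t> sample_index(std::size_t n, std::size_t size,
                                             std::mt19937& rng) {
    assert(size <= n);
    std::vector<std::size_t> x(n), idx;
    for (std::size_t i = 0; i < n; ++i) x[i] = i;  // candidates not yet drawn
    idx.reserve(size);
    for (std::size_t k = 0; k < size; ++k) {
        std::size_t j = draw_position(n, rng);
        idx.push_back(x[j]);
        x[j] = x[--n];  // overwrite the used slot with the last remaining candidate
    }
    return idx;
}

// ... then make a second pass to look up the values.
template <typename T>
std::vector<T> sample_via_index(const std::vector<T>& ref, std::size_t size,
                                std::mt19937& rng) {
    std::vector<T> out;
    out.reserve(size);
    for (std::size_t i : sample_index(ref.size(), size, rng)) out.push_back(ref[i]);
    return out;
}

// Approach 2 (the sugar loop quoted above): look up each value as soon as its
// index is drawn, skipping the intermediate index vector at the cost of
// duplicating the swap-and-shrink loop.
template <typename T>
std::vector<T> sample_direct(const std::vector<T>& ref, std::size_t size,
                             std::mt19937& rng) {
    std::size_t n = ref.size();
    assert(size <= n);
    std::vector<std::size_t> x(n);
    for (std::size_t i = 0; i < n; ++i) x[i] = i;
    std::vector<T> out;
    out.reserve(size);
    for (std::size_t k = 0; k < size; ++k) {
        std::size_t j = draw_position(n, rng);
        out.push_back(ref[x[j]]);
        x[j] = x[--n];
    }
    return out;
}
```

Given the same generator state, both functions consume the random stream identically and return the same values. That is why the direct version can still match the index-based result element for element while avoiding the separate lookup pass.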
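The comment in the last test refers to Walker's alias method, which R uses for weighted sampling with replacement when many categories carry non-negligible probability. The following is a minimal, self-contained C++ sketch of the technique. It builds the table with the small/large worklist construction commonly attributed to Vose rather than R's exact C code, and it uses `std::mt19937` instead of R's RNG, so it shows how the method works, not how to reproduce R's draws; `AliasTable` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sketch of Walker's alias method for weighted sampling WITH replacement.
// Table construction follows the small/large worklist scheme (Vose's variant);
// R's own implementation differs in detail and draw order.
class AliasTable {
public:
    explicit AliasTable(const std::vector<double>& weights)
        : prob_(weights.size()), alias_(weights.size(), 0) {
        const std::size_t n = weights.size();
        assert(n > 0);
        double total = 0.0;
        for (double w : weights) total += w;
        assert(total > 0.0);

        // Scale weights so that the average bucket height is exactly 1.
        std::vector<double> scaled(n);
        std::vector<std::size_t> small, large;
        for (std::size_t i = 0; i < n; ++i) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }
        // Fill each under-full bucket with mass donated by an over-full one.
        while (!small.empty() && !large.empty()) {
            std::size_t s = small.back();
            small.pop_back();
            std::size_t l = large.back();
            prob_[s] = scaled[s];
            alias_[s] = l;
            scaled[l] -= 1.0 - scaled[s];
            if (scaled[l] < 1.0) {
                large.pop_back();
                small.push_back(l);
            }
        }
        // Remaining buckets are full (up to rounding error).
        for (std::size_t i : large) prob_[i] = 1.0;
        for (std::size_t i : small) prob_[i] = 1.0;
    }

    // O(1) per draw: pick a bucket uniformly, then keep it or take its alias.
    std::size_t draw(std::mt19937& rng) const {
        std::uniform_int_distribution<std::size_t> bucket(0, prob_.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        std::size_t i = bucket(rng);
        return coin(rng) < prob_[i] ? i : alias_[i];
    }

private:
    std::vector<double> prob_;         // probability of keeping bucket i
    std::vector<std::size_t> alias_;   // fallback category for bucket i
};
```

Building the table costs O(n) once and every draw afterwards is O(1), which is why the alias method pays off for large inputs such as the 1e4-element `x` above. An implementation that aims to match `sample()` exactly would also have to follow R's choice of method and its order of uniform draws.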
|
@nathan-russell I think I am seeing a regression. Conrad sent me a test release, which I am currently checking against reverse dependencies, using Rcpp from master as well. Package markovchain now goes pear-shaped with
If you have a moment, can you take a peek? Tests are running on Ubuntu 16.04 with g++-5.4.0. |
|
If this is one of these "intersections of |
|
@eddelbuettel Yes, I think this is the offending file in markovchain. Do you mean doing something like this in the sugar

#if !defined(Rcpp__sugar__sample_h) && !defined(COMPILING_RCPPARMADILLO)
#define Rcpp__sugar__sample_h
// rest of sugar/functions/sample.h as before

Edit: Scratch that suggestion; I still get the error. |
|
Yes, the
is what we were afraid of. The question is how to move forward. Adding a very simple, very brute
at the top (and a matching Better ideas? |
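The opt-out guard idea can be illustrated with a small standalone C++ sketch. It assumes the guard is an `#ifndef` on a user-defined macro wrapped around the sugar definition; `HYPOTHETICAL_NO_SUGAR_SAMPLE`, namespace `lib` and `check_guard` are stand-ins, not names from Rcpp. It also shows why a flattened namespace (`using namespace ...`) makes the clash bite: without the guard, the unqualified call in `check_guard` would be ambiguous and fail to compile.

```cpp
#include <cassert>  // for the usage check at the bottom

// Hypothetical opt-out switch, defined by downstream code *before* the
// library header is included. The real macro name in Rcpp may differ.
#define HYPOTHETICAL_NO_SUGAR_SAMPLE

// ---- stand-in for the library header ----
namespace lib {
#ifndef HYPOTHETICAL_NO_SUGAR_SAMPLE
// Library-provided sample(); compiled out entirely when the switch is defined.
inline int sample(int n) { return n; }
#endif
// Everything outside the guard is unaffected by the switch.
inline int other(int n) { return n + 1; }
}  // namespace lib

// ---- stand-in for a downstream package ----
// Flattening the namespace makes lib's names visible to unqualified lookup.
using namespace lib;

// The package's own sample(). If lib::sample were also declared, the
// unqualified call sample(3) below would be ambiguous.
inline int sample(int n) { return -n; }

// Usage check: the unqualified call resolves to the package's own sample().
inline void check_guard() {
    assert(sample(3) == -3);
    assert(other(3) == 4);
}
```

The design trade-off is that the guard is opt-out: code that never defines the macro sees no change, while a package that already ships its own `sample` can switch the sugar version off with a single `#define`.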
|
I think that's a great idea, as it will only skip over |
|
I already tested it. I can commit straight -- less work for you. Let me know. But if you're in a playful mood we could add a |
|
Yes, if you're ready to commit it, then please do. While I do enjoy yelling at people, I think the |
|
Done in 07e045e. |
|
Awesome; thanks for catching this! |
|
Side benefit of the (roughly monthly) RcppArmadillo update. Sadly, that one threw up its own issue in code by Martin Vincent. But we'll get there. |
|
Another issue related to |
|
Martin has code in sglOptim he uses in two other packages; now it is I'm sure it's simple too; we'll get it fixed. |
|
@nathan-russell Another one, though I may be reading the tea leaves wrong. Context: the aforementioned tests of RcppArmadillo (as in master right now) fail with three interdependent packages by @vincent-dk. The first is sglOptim, which (on GitHub) is currently at 1.3.5. We need that updated (1 of 3) to check his other two, lsgl and msgl (2 and 3 of 3). Failure:
I think this also points to your
redefining |
|
Oh boy. I actually just found another couple of issues (on Windows only) -- one related to the inclusion of |
|
Awesome. I cannot make sense of the MinGW failures, so the probing is much appreciated too. |
|
Still trying to narrow things down here, but one of the build errors in

// I must keep this because now some depedent packages rely on Rcpp namespace to be available
#include <Rcpp.h>
using namespace Rcpp;

While I'm not precisely sure how this relates to
which is defined in |
|
Good work. I noticed that RcppProgress.h was pulled in, but didn't open it. What its author does there is almost criminal. Well, or comical. Depends on your mood. |
|
Now reported here at RcppProgress |
|
Getting back to how the |
|
And I guess it is in conjunction with the (forcefully) flattened namespace; otherwise the R symbols would not bite. |
This PR adds a sugar function sample, with overloads for
base::sample.int)
base::sample)