New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quantile function is not available in sugar #967
Comments
Let's do it, maybe even for subsets of R functionality. Because in R there are nine (!!) ways to do it, see The R source is here: https://github.com/wch/r-source/blob/trunk/src/library/stats/R/quantile.R I guess we'd need to figure want parts you a) really need and b) which ones are actually worth redoing in C(++). |
Ok, I have type=7 coded and it is trivial (several hours of debugging still counts as trivial, right ;o) Three other functions are needed which are potential sugar additions: floor, ceiling and sort (unless I missed them somehow). However, my template foo is insufficient to sweeten those so I have simple workarounds builtin. Here is a minimal working example... library(Rcpp)
sourceCpp(code='
#include <Rcpp.h>
/*
// what am I doing wrong here?
template <typename T>
class Qfloor : public std::unary_function<T,T> {
public:
T operator()( T t) const { return std::floor(t) ; }
} ;
template <typename T>
class Qceiling : public std::unary_function<T,T> {
public:
T operator()( T t) const { return std::ceil(t) ; }
} ;
*/
Rcpp::IntegerVector Qfloor(Rcpp::NumericVector x) {
size_t N=x.size();
Rcpp::IntegerVector y(N);
for(size_t i=0; i<N; ++i) y[i]=std::floor(x[i]);
return y;
}
Rcpp::IntegerVector Qceiling(Rcpp::NumericVector x) {
size_t N=x.size();
Rcpp::IntegerVector y(N);
for(size_t i=0; i<N; ++i) y[i]=std::ceil(x[i]);
return y;
}
Rcpp::NumericVector Qsort(Rcpp::NumericVector x) {
Rcpp::NumericVector y = Rcpp::clone(x);
std::sort(y.begin(), y.end());
return y;
}
// [[Rcpp::export]]
Rcpp::NumericVector Quantile(Rcpp::NumericVector x,
Rcpp::NumericVector probs) {
// implementation of type 7
const int n=x.size(), np=probs.size();
if(n==0) return x;
if(np==0) return probs;
/*
int _np=_probs.size(), np;
np = _np==0 ? 5 : _np;
Rcpp::NumericVector probs(np);
if(_np==0) {
probs[0]=0.00; probs[1]=0.25; probs[2]=0.50; probs[3]=0.75; probs[5]=1.00;
}
Rcpp::IntegerVector lo(Rcpp::sapply(index, Qfloor<double>() )),
hi(Rcpp::sapply(index, Qceiling<double>() ));
*/
Rcpp::NumericVector index=(n-1.)*probs, y=Qsort(x), x_hi(np), qs(np);
Rcpp::IntegerVector lo(Qfloor(index)), hi(Qceiling(index));
for(size_t i=0; i<np; ++i) {
qs[i]=y[lo[i]];
x_hi[i]=y[hi[i]];
if((index[i]>lo[i]) && (x_hi[i] != qs[i])) {
double h;
h=index[i]-lo[i];
qs[i]=(1.-h)*qs[i]+h*x_hi[i];
}
}
return qs;
//return Rcpp::wrap(qs);
}
')
.quantile <-
function(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE,
type = 7, ...)
{
if(is.factor(x)) {
if(is.ordered(x)) {
if(!any(type == c(1L, 3L)))
stop("'type' must be 1 or 3 for ordered factors")
} else
stop("factors are not allowed")
lx <- levels(x)
} else lx <- NULL
if (na.rm)
x <- x[!is.na(x)]
else if (anyNA(x))
stop("missing values and NaN's not allowed if 'na.rm' is FALSE")
eps <- 100*.Machine$double.eps
if (any((p.ok <- !is.na(probs)) & (probs < -eps | probs > 1+eps)))
stop("'probs' outside [0,1]")
n <- length(x)
if(na.p <- any(!p.ok)) { # set aside NA & NaN
o.pr <- probs
probs <- probs[p.ok]
probs <- pmax(0, pmin(1, probs)) # allow for slight overshoot
}
np <- length(probs)
if (n > 0 && np > 0) {
if(type == 7) { # be completely back-compatible
index <- 1 + (n - 1) * probs
lo <- floor(index)
hi <- ceiling(index)
x <- sort(x, partial = unique(c(lo, hi)))
qs <- x[lo]
i <- which(index > lo & x[hi] != qs) # '!=' for '>' working w/ complex
h <- (index - lo)[i] # > 0 by construction
## print(c(h=h))
## qs[i] <- qs[i] + .minus(x[hi[i]], x[lo[i]]) * (index[i] - lo[i])
## qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
qs[i] <- (1 - h) * qs[i] + h * x[hi[i]]
} else {
if (type <= 3) {
## Types 1, 2 and 3 are discontinuous sample qs.
nppm <- if (type == 3) n * probs - .5 # n * probs + m; m = -0.5
else n * probs # m = 0
j <- floor(nppm)
h <- switch(type,
(nppm > j), # type 1
((nppm > j) + 1)/2, # type 2
(nppm != j) | ((j %% 2L) == 1L)) # type 3
} else {
## Types 4 through 9 are continuous sample qs.
switch(type - 3,
{a <- 0; b <- 1}, # type 4
a <- b <- 0.5, # type 5
a <- b <- 0, # type 6
a <- b <- 1, # type 7 (unused here)
a <- b <- 1 / 3, # type 8
a <- b <- 3 / 8) # type 9
## need to watch for rounding errors here
fuzz <- 4 * .Machine$double.eps
nppm <- a + probs * (n + 1 - a - b) # n*probs + m
j <- floor(nppm + fuzz) # m = a + probs*(1 - a - b)
h <- nppm - j
if(any(sml <- abs(h) < fuzz)) h[sml] <- 0
}
x <- sort(x, partial =
unique(c(1, j[j>0L & j<=n], (j+1)[j>0L & j<n], n))
)
x <- c(x[1L], x[1L], x, x[n], x[n])
## h can be zero or one (types 1 to 3), and infinities matter
#### qs <- (1 - h) * x[j + 2] + h * x[j + 3]
## also h*x might be invalid ... e.g. Dates and ordered factors
qs <- x[j+2L]
qs[h == 1] <- x[j+3L][h == 1]
other <- (0 < h) & (h < 1) & (x[j+2L] != x[j+3L]) # '!=' for '<' in complex case
if(any(other)) qs[other] <- ((1-h)*x[j+2L] + h*x[j+3L])[other]
}
} else {
qs <- rep(NA_real_, np)
}
if(is.character(lx))
qs <- factor(qs, levels = seq_along(lx), labels = lx, ordered = TRUE)
if(names && np > 0L) {
##names(qs) <- format_perc(probs)
}
if(na.p) { # do this more elegantly (?!)
o.pr[p.ok] <- qs
names(o.pr) <- rep("", length(o.pr)) # suppress <NA> names
names(o.pr)[p.ok] <- names(qs)
o.pr
} else qs
}
N=100
P=10
set.seed(21)
x=c(matrix(runif(N/P), nrow=N/P, ncol=P))
A=quantile(x, (19:21)/100) ## R
B=.quantile(x, (19:21)/100) ## quantile.default
C=Quantile(x, (19:21)/100) ## Rcpp
all(A==B)
all(A==C)
all(B==C)
cbind(A, B, C) |
Use:
|
BTW R> Rcpp::cppFunction("List foo(NumericVector x) { return List::create(floor(x), ceil(x), x.sort()); }")
R> foo(c(1.23, 4.56, 2.34))
[[1]]
[1] 1 2 4
[[2]]
[1] 2 3 5
[[3]]
[1] 1.23 2.34 4.56
R> |
Or maybe not. Seems for focussed on running (moving) quantiles. |
@rsparapa here is a shorter version of what you did, taking advantage of the sugar functions: #include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector Quantile(Rcpp::NumericVector x, Rcpp::NumericVector probs) {
// implementation of type 7
const size_t n=x.size(), np=probs.size();
if (n==0) return x;
if (np==0) return probs;
Rcpp::NumericVector index = (n-1.)*probs, y=x.sort(), x_hi(np), qs(np);
Rcpp::NumericVector lo = Rcpp::floor(index), hi = Rcpp::ceiling(index);
for (size_t i=0; i<np; ++i) {
qs[i] = y[lo[i]];
x_hi[i] = y[hi[i]];
if ((index[i]>lo[i]) && (x_hi[i] != qs[i])) {
double h;
h = index[i]-lo[i];
qs[i] = (1.-h)*qs[i] + h*x_hi[i];
}
}
return qs;
}
/*** R
N <- 100
P <- 10
set.seed(21)
x <- matrix(runif(N/P), nrow=N/P, ncol=P)
A <- quantile(x, (19:21)/100) ## R
B <- Quantile(x, (19:21)/100) ## Rcpp
all(A==B)
*/ Which, thanks to cute R> Rcpp::sourceCpp("/tmp/quantile.cpp")
R> N <- 100
R> P <- 10
R> set.seed(21)
R> x <- matrix(runif(N/P), nrow=N/P, ncol=P)
R> A <- quantile(x, (19:21)/100) ## R
R> B <- Quantile(x, (19:21)/100) ## Rcpp
R> all(A==B)
[1] TRUE
R> |
Stills needs a char vector with names though to be fully comparable to the R version. |
Ok, I've almost got it now. But, how can we add the % to the end of the names?
Which produces...
|
Off the cuff, I'd say work with C++ Else of course C char vectors, proper sizing, |
Equally quick and dirty: qs_names[i] = std::to_string(static_cast<int>(probs[i]*100)) + std::string("%"); This truncates. Do you need / want |
I forgot: R> Rcpp::sourceCpp("/tmp/rodney.cpp")
R> N <- 100
R> P <- 10
R> set.seed(21)
R> x <- matrix(runif(N/P), nrow=N/P, ncol=P)
R> A <- quantile(x, (19:21)/100) ## R
R> B <- Quantile(x, (19:21)/100) ## Rcpp
R> all(A==B)
[1] TRUE
R> names(A)
[1] "19%" "20%" "21%"
R> names(B)
[1] "19%" "20%" "21%"
R> |
C++11 is the lowest I go since vector is so useful. Admittedly, round() I had not considered. But, I see that R doesn't seem to have an obviously logical shortening...
|
As for the digits, I (personally) would not obsess over matching R here. When I used |
I am sure that you right about C++11 and std::vector being around for awhile. I meant support for std::vector in R packages which requires C++11 support, right? |
No. We used (We interfaced it in a brutish way into R, but all that is history ... and gave to rise to Rcpp. So all good). |
Oh, I was misinformed ;o) So, getting back to rounding. I'm not sure how to do this. Is the round function available with sugar? I don't see it, but I couldn't find floor or ceiling so I'm obviously not looking in the right places. If it is in sugar, then I think this is easy. If not, then should we just add round to sugar too? |
But of course it is :) See below: R> Rcpp::cppFunction("NumericVector myround(NumericVector x, int d) { return(round(x, d)); }")
R> myround(1.23456, 2)
[1] 1.23
R> Searching for functions is still ... suboptimal. The Doxygen interface is searchable, grep'ing in the tests works etc pp. |
Ok, I've got it now...
Yielding...
|
In hindsight, the name attribute would be easier to create if the R function format() existed in Rcpp. Does it? I don't see it in the unitTests. |
I have not had time to look closely at what you do there would |
I never thought you would have recommended C ;o) I'm not familiar with sprintf, but I'm trying to wrap my head around the man page now... |
Before trying sprintf(), I just ran what I had above. That was working, but now I'm seeing this error...
|
|
Ah, got it! Thanks Kevin. Fixed above: why it worked before is a mystery. But, I just don't grok sprintf. I have never read a more confusing man page. They could not have come up with more inscrutable formats if they tried. And what the hell is "char *restrict s"? From the following lines, I get a crash...
|
You need to allocate the memory to > cppFunction('std::string f() { char v[32]; sprintf(v, "foo%-8f", 1.234); return std::string(v); }')
> f()
[1] "foo1.234000"
> |
Aarghh!!! I don't think sprintf works here. What you need to have is a format like this "%1$*2.*3$f" or this "%1$*2$.$*3$f". However, neither appear to be supported.
which yields...
|
I would try to simplify things a little. Remember that The main thing appears to be solved: you have your quantiles calculation. So the final step of finishing off names is just a nice to have detail, no? |
Sorry, I was just ranting about C. Thank god we have C++! Got it now...
... abracadabra ...
|
This issue is stale (365 days without activity) and will be closed in 31 days unless new activity is seen. Please feel free to re-open it is still a concern, possibly with additional data. |
Issue implemented as shown above |
First of all, Rcpp is amazing! After 10 years of working with it, I'm finally capable enough to make most of my wishes come true. One sugar function that is an obvious omission is quantile. I realize that we can't have every single function provided by R within Rcpp. But, this seems like an obvious low-hanging fruit. If someone could give me some pointers on what the best way to implement it is, then I would be happy to contribute a version to Rcpp. Thanks
The text was updated successfully, but these errors were encountered: