Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantile function is not available in sugar #967

Closed
rsparapa opened this issue Jun 4, 2019 · 30 comments
Closed

quantile function is not available in sugar #967

rsparapa opened this issue Jun 4, 2019 · 30 comments

Comments

@rsparapa
Copy link

rsparapa commented Jun 4, 2019

First of all, Rcpp is amazing! After 10 years of working with it, I'm finally capable enough to make most of my wishes come true. One sugar function that is an obvious omission is quantile. I realize that we can't have every single function provided by R within Rcpp. But, this seems like an obvious low-hanging fruit. If someone could give me some pointers on what the best way to implement it is, then I would be happy to contribute a version to Rcpp. Thanks

@eddelbuettel
Copy link
Member

Let's do it, maybe even for subsets of R functionality. Because in R there are nine (!!) ways to do it, see help(quantile) and look for type.

The R source is here: https://github.com/wch/r-source/blob/trunk/src/library/stats/R/quantile.R I guess we'd need to figure want parts you a) really need and b) which ones are actually worth redoing in C(++).

@rsparapa
Copy link
Author

rsparapa commented Jun 5, 2019

Ok, I have type=7 coded and it is trivial (several hours of debugging still counts as trivial, right ;o) Three other functions are needed which are potential sugar additions: floor, ceiling and sort (unless I missed them somehow). However, my template foo is insufficient to sweeten those so I have simple workarounds builtin. Here is a minimal working example...

library(Rcpp)

sourceCpp(code='
#include <Rcpp.h>

/*
// what am I doing wrong here?
template <typename T>
class Qfloor : public std::unary_function<T,T> {
public:
T operator()( T t) const { return std::floor(t) ; }
} ;

template <typename T>
class Qceiling : public std::unary_function<T,T> {
public:
T operator()( T t) const { return std::ceil(t) ; }
} ;
*/

Rcpp::IntegerVector Qfloor(Rcpp::NumericVector x) {
  size_t N=x.size();
  Rcpp::IntegerVector y(N);
  for(size_t i=0; i<N; ++i) y[i]=std::floor(x[i]);
  return y;
}

Rcpp::IntegerVector Qceiling(Rcpp::NumericVector x) {
  size_t N=x.size();
  Rcpp::IntegerVector y(N);
  for(size_t i=0; i<N; ++i) y[i]=std::ceil(x[i]);
  return y;
}

Rcpp::NumericVector Qsort(Rcpp::NumericVector x) {
  Rcpp::NumericVector y = Rcpp::clone(x);
  std::sort(y.begin(), y.end());
  return y;
}

// [[Rcpp::export]]
Rcpp::NumericVector Quantile(Rcpp::NumericVector x,
                             Rcpp::NumericVector probs) {
  // implementation of type 7
  const int n=x.size(), np=probs.size();
  if(n==0) return x;
  if(np==0) return probs;
  /*
    int _np=_probs.size(), np;
    np = _np==0 ? 5 : _np;
    Rcpp::NumericVector probs(np);
    if(_np==0) {
    probs[0]=0.00; probs[1]=0.25; probs[2]=0.50; probs[3]=0.75; probs[5]=1.00;
    }
    Rcpp::IntegerVector lo(Rcpp::sapply(index, Qfloor<double>() )), 
    hi(Rcpp::sapply(index, Qceiling<double>() )); 
  */
  Rcpp::NumericVector index=(n-1.)*probs, y=Qsort(x), x_hi(np), qs(np);
  Rcpp::IntegerVector lo(Qfloor(index)), hi(Qceiling(index));

  for(size_t i=0; i<np; ++i) {
    qs[i]=y[lo[i]];
    x_hi[i]=y[hi[i]];
    if((index[i]>lo[i]) && (x_hi[i] != qs[i])) {
      double h;
      h=index[i]-lo[i];
      qs[i]=(1.-h)*qs[i]+h*x_hi[i];
    }
  }

  return qs;
  //return Rcpp::wrap(qs);
}
')

.quantile <-
    function(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE,
             type = 7, ...)
{
    if(is.factor(x)) {
	if(is.ordered(x)) {
	   if(!any(type == c(1L, 3L)))
	       stop("'type' must be 1 or 3 for ordered factors")
	} else
            stop("factors are not allowed")
        lx <- levels(x)
    } else lx <- NULL
    if (na.rm)
	x <- x[!is.na(x)]
    else if (anyNA(x))
	stop("missing values and NaN's not allowed if 'na.rm' is FALSE")
    eps <- 100*.Machine$double.eps
    if (any((p.ok <- !is.na(probs)) & (probs < -eps | probs > 1+eps)))
	stop("'probs' outside [0,1]")
    n <- length(x)
    if(na.p <- any(!p.ok)) { # set aside NA & NaN
        o.pr <- probs
        probs <- probs[p.ok]
        probs <- pmax(0, pmin(1, probs)) # allow for slight overshoot
    }
    np <- length(probs)
    if (n > 0 && np > 0) {
        if(type == 7) { # be completely back-compatible
            index <- 1 + (n - 1) * probs
            lo <- floor(index)
            hi <- ceiling(index)
            x <- sort(x, partial = unique(c(lo, hi)))
            qs <- x[lo]
	    i <- which(index > lo & x[hi] != qs) # '!=' for '>' working w/ complex
	    h <- (index - lo)[i] # > 0	by construction
            ## print(c(h=h))
##	    qs[i] <- qs[i] + .minus(x[hi[i]], x[lo[i]]) * (index[i] - lo[i])
##	    qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
	    qs[i] <- (1 - h) * qs[i] + h * x[hi[i]]
        } else {
            if (type <= 3) {
                ## Types 1, 2 and 3 are discontinuous sample qs.
                nppm <- if (type == 3) n * probs - .5 # n * probs + m; m = -0.5
                else n * probs          # m = 0
                j <- floor(nppm)
		h <- switch(type,
			    (nppm > j),		# type 1
			    ((nppm > j) + 1)/2, # type 2
			    (nppm != j) | ((j %% 2L) == 1L)) # type 3
            } else {
                ## Types 4 through 9 are continuous sample qs.
                switch(type - 3,
                   {a <- 0; b <- 1},    # type 4
                       a <- b <- 0.5,   # type 5
                       a <- b <- 0,     # type 6
                       a <- b <- 1,     # type 7 (unused here)
                       a <- b <- 1 / 3, # type 8
                       a <- b <- 3 / 8) # type 9
                ## need to watch for rounding errors here
                fuzz <- 4 * .Machine$double.eps
                nppm <- a + probs * (n + 1 - a - b) # n*probs + m
                j <- floor(nppm + fuzz) # m = a + probs*(1 - a - b)
                h <- nppm - j
                if(any(sml <- abs(h) < fuzz)) h[sml] <- 0
            }
            x <- sort(x, partial =
                      unique(c(1, j[j>0L & j<=n], (j+1)[j>0L & j<n], n))
                      )
            x <- c(x[1L], x[1L], x, x[n], x[n])
            ## h can be zero or one (types 1 to 3), and infinities matter
####        qs <- (1 - h) * x[j + 2] + h * x[j + 3]
            ## also h*x might be invalid ... e.g. Dates and ordered factors
            qs <- x[j+2L]
            qs[h == 1] <- x[j+3L][h == 1]
	    other <- (0 < h) & (h < 1) & (x[j+2L] != x[j+3L]) # '!=' for '<' in complex case
            if(any(other)) qs[other] <- ((1-h)*x[j+2L] + h*x[j+3L])[other]
        }
    } else {
	qs <- rep(NA_real_, np)
    }
    if(is.character(lx))
        qs <- factor(qs, levels = seq_along(lx), labels = lx, ordered = TRUE)
    if(names && np > 0L) {
	##names(qs) <- format_perc(probs)
    }
    if(na.p) { # do this more elegantly (?!)
        o.pr[p.ok] <- qs
        names(o.pr) <- rep("", length(o.pr)) # suppress <NA> names
        names(o.pr)[p.ok] <- names(qs)
        o.pr
    } else qs
}

N=100
P=10
set.seed(21)
x=c(matrix(runif(N/P), nrow=N/P, ncol=P))
A=quantile(x, (19:21)/100)  ## R
B=.quantile(x, (19:21)/100) ## quantile.default
C=Quantile(x, (19:21)/100)  ## Rcpp
all(A==B)
all(A==C)
all(B==C)
cbind(A, B, C)

@coatless
Copy link
Contributor

coatless commented Jun 5, 2019

Use:


```cpp
// your c++ code here

```

@eddelbuettel
Copy link
Member

BTW floor and ceil exist in Rcpp sugar, sort() is (just to be different) a member function:

R> Rcpp::cppFunction("List foo(NumericVector x) { return List::create(floor(x), ceil(x), x.sort()); }")
R> foo(c(1.23, 4.56, 2.34))
[[1]]
[1] 1 2 4

[[2]]
[1] 2 3 5

[[3]]
[1] 1.23 2.34 4.56

R> 

@eddelbuettel
Copy link
Member

eddelbuettel commented Jun 5, 2019

And a little googling got me this too: http://systematicinvestor.github.io/Run-Quantile-Rcpp

Or maybe not. Seems for focussed on running (moving) quantiles.

@eddelbuettel
Copy link
Member

@rsparapa here is a shorter version of what you did, taking advantage of the sugar functions:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector Quantile(Rcpp::NumericVector x, Rcpp::NumericVector probs) {
  // implementation of type 7
  const size_t n=x.size(), np=probs.size();
  if (n==0) return x;
  if (np==0) return probs;
  Rcpp::NumericVector index = (n-1.)*probs, y=x.sort(), x_hi(np), qs(np);
  Rcpp::NumericVector lo = Rcpp::floor(index), hi = Rcpp::ceiling(index);

  for (size_t i=0; i<np; ++i) {
    qs[i] = y[lo[i]];
    x_hi[i] = y[hi[i]];
    if ((index[i]>lo[i]) && (x_hi[i] != qs[i])) {
      double h;
      h = index[i]-lo[i];
      qs[i] = (1.-h)*qs[i] + h*x_hi[i];
    }
  }
  return qs;
}

/*** R
N <- 100
P <- 10
set.seed(21)
x <- matrix(runif(N/P), nrow=N/P, ncol=P)
A <- quantile(x, (19:21)/100)  ## R
B <- Quantile(x, (19:21)/100)  ## Rcpp
all(A==B)
*/

Which, thanks to cute /**** R regexp at the end, even runs the demo when built:

R> Rcpp::sourceCpp("/tmp/quantile.cpp")

R> N <- 100

R> P <- 10

R> set.seed(21)

R> x <- matrix(runif(N/P), nrow=N/P, ncol=P)

R> A <- quantile(x, (19:21)/100)  ## R

R> B <- Quantile(x, (19:21)/100)  ## Rcpp

R> all(A==B)
[1] TRUE
R>

@eddelbuettel
Copy link
Member

Stills needs a char vector with names though to be fully comparable to the R version.

@rsparapa
Copy link
Author

rsparapa commented Jun 22, 2019

Ok, I've almost got it now. But, how can we add the % to the end of the names?

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector Quantile(Rcpp::NumericVector x, Rcpp::NumericVector probs) {
  // implementation of type 7
  const size_t n=x.size(), np=probs.size();
  if (n==0) return x;
  if (np==0) return probs;
  Rcpp::NumericVector index = (n-1.)*probs, y=x.sort(), x_hi(np), qs(np);
  Rcpp::NumericVector lo = Rcpp::floor(index), hi = Rcpp::ceiling(index);
  Rcpp::CharacterVector qs_names(np);

  for (size_t i=0; i<np; ++i) {
    size_t j;
    qs_names[i] = probs[i]*100.;
    j=qs_names[i].size();
    //j++;
    //qs_names[i].reserve(j);
    //qs_names[i][j]='%'; 
    qs[i] = y[lo[i]];
    x_hi[i] = y[hi[i]];
    if ((index[i]>lo[i]) && (x_hi[i] != qs[i])) {
      double h;
      h = index[i]-lo[i];
      qs[i] = (1.-h)*qs[i] + h*x_hi[i];
    }
  }

  qs.names()=qs_names;
  return qs;
}

/*** R
N <- 100
P <- 10
set.seed(21)
x <- matrix(runif(N/P), nrow=N/P, ncol=P)
A <- quantile(x, (19:21)/100)  ## R
B <- Quantile(x, (19:21)/100)  ## Rcpp
all(A==B)
names(A)
names(B)
*/

Which produces...

> all(A==B)
[1] TRUE

> names(A)
[1] "19%" "20%" "21%"

> names(B)
[1] "19" "20" "21"

@eddelbuettel
Copy link
Member

eddelbuettel commented Jun 22, 2019

Off the cuff, I'd say work with C++ std::string which has a nice + operator.

Else of course C char vectors, proper sizing, concat or sprintf etc pp.

@eddelbuettel
Copy link
Member

Equally quick and dirty:

    qs_names[i] = std::to_string(static_cast<int>(probs[i]*100)) + std::string("%");

This truncates. Do you need / want round() instead? Also, this may imply C++11 which you may or may not want as a minimal level. There are C++98 alternatives to to_string ....

@eddelbuettel
Copy link
Member

I forgot:

R> Rcpp::sourceCpp("/tmp/rodney.cpp")

R> N <- 100

R> P <- 10

R> set.seed(21)

R> x <- matrix(runif(N/P), nrow=N/P, ncol=P)

R> A <- quantile(x, (19:21)/100)  ## R

R> B <- Quantile(x, (19:21)/100)  ## Rcpp

R> all(A==B)
[1] TRUE

R> names(A)
[1] "19%" "20%" "21%"

R> names(B)
[1] "19%" "20%" "21%"
R> 

@rsparapa
Copy link
Author

rsparapa commented Jun 22, 2019

C++11 is the lowest I go since vector is so useful. Admittedly, round() I had not considered. But, I see that R doesn't seem to have an obviously logical shortening...

> names(quantile(pi, probs=pi/100))
[1] "3.141593%"

@eddelbuettel
Copy link
Member

std::vector is a more than a decade older than C++11 :) But yes, I personally have issues with requiring C++11 or newer now that CRAN supports it. Some packages now require C++14 ...

As for the digits, I (personally) would not obsess over matching R here. When I used to_string on a double it have too many digits but you always pick a substring or do something else.

@rsparapa
Copy link
Author

I am sure that you right about C++11 and std::vector being around for awhile. I meant support for std::vector in R packages which requires C++11 support, right?

@eddelbuettel
Copy link
Member

eddelbuettel commented Jun 22, 2019

No. We used std::vector<double> in RQuantLib in 2002. On CRAN. I think I first saw it (via my advisor) about a decade earlier when the STL was still an SGI project (long before the HP acquisition). It did become available in g++ and visual c++ during the 1990s.

(We interfaced it in a brutish way into R, but all that is history ... and gave to rise to Rcpp. So all good).

@rsparapa
Copy link
Author

Oh, I was misinformed ;o) So, getting back to rounding. I'm not sure how to do this. Is the round function available with sugar? I don't see it, but I couldn't find floor or ceiling so I'm obviously not looking in the right places. If it is in sugar, then I think this is easy. If not, then should we just add round to sugar too?

@eddelbuettel
Copy link
Member

eddelbuettel commented Jun 22, 2019

But of course it is :) See below:

R> Rcpp::cppFunction("NumericVector myround(NumericVector x, int d) { return(round(x, d)); }")
R> myround(1.23456, 2)
[1] 1.23
R> 

Searching for functions is still ... suboptimal. The Doxygen interface is searchable, grep'ing in the tests works etc pp.

@rsparapa
Copy link
Author

rsparapa commented Jun 22, 2019

Ok, I've got it now...

#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::NumericVector Quantile(Rcpp::NumericVector x, Rcpp::NumericVector probs) {
  // implementation of type 7
  const size_t n=x.size(), np=probs.size();
  if (n==0) return x;
  if (np==0) return probs;
  Rcpp::NumericVector index = (n-1.)*probs, y=x.sort(), x_hi(np), qs(np);
  Rcpp::NumericVector lo = Rcpp::floor(index), hi = Rcpp::ceiling(index);
  Rcpp::NumericVector probs_names=Rcpp::round(probs*100., 6);
  Rcpp::CharacterVector qs_names(np);
  std::string _name, zero=".000000";

  for (size_t i=0; i<np; ++i) {
    _name = std::to_string(static_cast<double>(probs_names[i]));
    if(_name=="100.000000") _name=std::string("100");
    else if(_name.substr(1)==zero) _name=_name.substr(0, 1);
    else if(_name.substr(2)==zero) _name=_name.substr(0, 2);
    else if(_name.substr(3)==zero.substr(2)) _name=_name.substr(0, 3);
    else if(_name.substr(4)==zero.substr(3)) _name=_name.substr(0, 4);
    else if(_name.substr(5)==zero.substr(4)) _name=_name.substr(0, 5);
    else if(_name.substr(6)==zero.substr(5)) _name=_name.substr(0, 6);
    else if(_name.substr(7)==zero.substr(6)) _name=_name.substr(0, 7);
    else if(_name.substr(4)==zero.substr(2)) _name=_name.substr(0, 4);
    else if(_name.substr(5)==zero.substr(3)) _name=_name.substr(0, 5);
    else if(_name.substr(6)==zero.substr(4)) _name=_name.substr(0, 6);
    else if(_name.substr(7)==zero.substr(5)) _name=_name.substr(0, 7);
    else if(_name.substr(8)==zero.substr(6)) _name=_name.substr(0, 8);
/*  // why does this not work?
    else {
      for(size_t j=3; j<8; ++j) {
	if(_name.substr(j)==zero.substr(j-1)) _name=_name.substr(0, j);
	else if(_name.substr(j+1)==zero.substr(j-1)) _name=_name.substr(0, j+1);
      }
    }
*/
    qs_names[i] = _name + std::string("%");
    qs[i] = y[lo[i]];
    x_hi[i] = y[hi[i]];
    if ((index[i]>lo[i]) && (x_hi[i] != qs[i])) {
      double h;
      h = index[i]-lo[i];
      qs[i] = (1.-h)*qs[i] + h*x_hi[i];
    }
  }

  qs.names()=qs_names;
  return qs;
}

/*** R
N <- 100
P <- 10
set.seed(21)
x <- matrix(runif(N/P), nrow=N/P, ncol=P)
A <- quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## R
B <- Quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## Rcpp
all(A==B)
all(names(A)==names(B))
names(A)
names(B)
*/

Yielding...

> A <- quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## R

> B <- Quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## Rcpp

> all(A==B)
[1] TRUE

> all(names(A)==names(B))
[1] TRUE

> names(A)
 [1] "0%"        "1.9%"      "1.99999%"  "3.141593%" "18.9%"     "18.99999%"
 [7] "19%"       "20%"       "21%"       "100%"     

> names(B)
 [1] "0%"        "1.9%"      "1.99999%"  "3.141593%" "18.9%"     "18.99999%"
 [7] "19%"       "20%"       "21%"       "100%"     

@rsparapa
Copy link
Author

rsparapa commented Jul 9, 2019

In hindsight, the name attribute would be easier to create if the R function format() existed in Rcpp. Does it? I don't see it in the unitTests.

@eddelbuettel
Copy link
Member

eddelbuettel commented Jul 9, 2019

I have not had time to look closely at what you do there would sprintf() not be a candidate?

@rsparapa
Copy link
Author

rsparapa commented Jul 9, 2019

I never thought you would have recommended C ;o) I'm not familiar with sprintf, but I'm trying to wrap my head around the man page now...

@rsparapa
Copy link
Author

Before trying sprintf(), I just ran what I had above. That was working, but now I'm seeing this error...

> Rcpp::sourceCpp("~/quantile.cpp")
quantile.cpp: In function ‘Rcpp::NumericVector Quantile(Rcpp::NumericVector, Rcpp::NumericVector)’:
quantile.cpp:17:13: error: ‘to_string’ is not a member of ‘std’
     _name = std::to_string(static_cast<double>(probs_names[i]));
             ^
make: *** [quantile.o] Error 1
g++  -I"/usr/local/R/3.5.0/lib64/R/include" -DNDEBUG   -I"/usr/local/R/3.5.0/lib64/R/library/Rcpp/include" -I"/home/rsparapa" -I/usr/local/include   -fpic  -g -O2 -mtune=native  -c quantile.cpp -o quantile.o
Error in Rcpp::sourceCpp("~/quantile.cpp") : 
  Error 1 occurred building shared library.

@kevinushey
Copy link
Contributor

std::to_string() is only available with C++11 and above, so you'll have to instruct the compiler to use that standard.

@rsparapa
Copy link
Author

Ah, got it! Thanks Kevin. Fixed above: why it worked before is a mystery. But, I just don't grok sprintf. I have never read a more confusing man page. They could not have come up with more inscrutable formats if they tried. And what the hell is "char *restrict s"? From the following lines, I get a crash...

char *s;
sprintf(s, "%-10f", static_cast<double>(probs_names[0]));
Program terminated with signal 11, Segmentation fault.
#0  0x00007f38f1de381c in __GI__IO_str_overflow (fp=0x7ffe22601cd0, c=48) at strops.c:146
146	    *fp->_IO_write_ptr++ = (unsigned char) c;

@eddelbuettel
Copy link
Member

You need to allocate the memory to sprintf into first:

> cppFunction('std::string f() { char v[32]; sprintf(v, "foo%-8f", 1.234); return std::string(v); }')
> f()
[1] "foo1.234000"
> 

@rsparapa
Copy link
Author

rsparapa commented Jul 11, 2019

Aarghh!!! I don't think sprintf works here. What you need to have is a format like this "%1$*2.*3$f" or this "%1$*2$.$*3$f". However, neither appear to be supported.

#include <Rcpp.h>
#include <stdio.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::NumericVector Quantile(Rcpp::NumericVector x, Rcpp::NumericVector probs) {
  // implementation of type 7
  const size_t n=x.size(), np=probs.size();
  if (n==0) return x;
  if (np==0) return probs;
  Rcpp::NumericVector index = (n-1.)*probs, y=x.sort(), x_hi(np), qs(np);
  Rcpp::NumericVector lo = Rcpp::floor(index), hi = Rcpp::ceiling(index);
  Rcpp::NumericVector probs_names=Rcpp::round(probs*100., 6);
  Rcpp::CharacterVector qs_names(np);
  std::string _name, zero=".000000";
  char string[11];
  double probs_full, probs_trunc;

  for (size_t i=0; i<np; ++i) {
    probs_full=static_cast<double>(probs_names[i]);

    if(probs_full==100.) _name=std::string("100");
    else if(probs_full==0.) _name=std::string("0");
    else {
      int k=std::ceil(log10(probs_full));
      if(probs_full==std::floor(probs_full)) 
	sprintf(string, "%1$*2$.0f", probs_full, k);
      else for(size_t j=0; j<7; ++j) {
	  probs_trunc==(std::floor(probs_full*pow(10, j))/pow(10, j)); // the equal sign
	  if(probs_trunc=probs_full)                                   // and its evils
	    sprintf(string, "%1$*2.*3$f", probs_trunc, j+k+1, j);
	}
      _name=string;
    }

    qs_names[i] = _name + std::string("%");
    qs[i] = y[lo[i]];
    x_hi[i] = y[hi[i]];
    if ((index[i]>lo[i]) && (x_hi[i] != qs[i])) {
      double h;
      h = index[i]-lo[i];
      qs[i] = (1.-h)*qs[i] + h*x_hi[i];
    }
  }

  qs.names()=qs_names;
  return qs;
}

/*** R
N <- 100
P <- 10
set.seed(21)
x <- matrix(runif(N/P), nrow=N/P, ncol=P)
A <- quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## R
B <- Quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## Rcpp
all(A==B)
all(names(A)==names(B))
names(A)
names(B)
*/

which yields...

> all(names(A)==names(B))
[1] FALSE

> names(A)
 [1] "0%"        "1.9%"      "1.99999%"  "3.141593%" "18.9%"     "18.99999%"
 [7] "19%"       "20%"       "21%"       "100%"     

> names(B)
 [1] "0%"        "%82.*3$f%" "%82.*3$f%" "%82.*3$f%" "%92.*3$f%" "%92.*3$f%"
 [7] "19%"       "20%"       "21%"       "100%"     

@eddelbuettel
Copy link
Member

I would try to simplify things a little. Remember that *printf patterns can be tried on the command-line (man printf -- there is a Unix command !!) or in R (help(Rprint)). And as I said earlier, nobody forces you to exactly replicate the R names vector. And/or you could use an Rcpp::Function() to have R create the names for you.

The main thing appears to be solved: you have your quantiles calculation. So the final step of finishing off names is just a nice to have detail, no?

@rsparapa
Copy link
Author

rsparapa commented Jul 12, 2019

Sorry, I was just ranting about C. Thank god we have C++! Got it now...

#include <Rcpp.h>
#include <stdio.h>
#include <string>

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::NumericVector Quantile(Rcpp::NumericVector x, Rcpp::NumericVector probs) {
  // implementation of type 7
  const size_t n=x.size(), np=probs.size();
  if (n==0) return x;
  if (np==0) return probs;
  Rcpp::NumericVector index = (n-1.)*probs, y=x.sort(), x_hi(np), qs(np);
  Rcpp::NumericVector lo = Rcpp::floor(index), hi = Rcpp::ceiling(index);
  Rcpp::NumericVector probs_names=Rcpp::round(probs*100., 6);
  Rcpp::CharacterVector qs_names(np);
  std::string _name, format;
  char string[11], width[2], digits[2];
  double probs_full, probs_trunc;

  for (size_t i=0; i<np; ++i) {
    probs_full=static_cast<double>(probs_names[i]);

    if(probs_full==100.) _name=std::string("100");
    else if(probs_full==0.) _name=std::string("0");
    else {
      int k=std::ceil(log10(probs_full));
      if(probs_full==std::floor(probs_full)) {
	sprintf(string, "%1$*2$.0f", probs_full, k);
	_name=string;
      }
      else for(size_t j=6; j>0; --j) {
	  probs_trunc=(std::floor(probs_full*pow(10, j))/pow(10, j)); 
	  if(probs_trunc==probs_full) {
	    sprintf(width, "%1d", j+k+1);
	    sprintf(digits, "%1d", j);
	    format = std::string("%") + width +
	      std::string(".") + digits + std::string("f");
	    sprintf(string, format.c_str(), probs_trunc);
	    _name=string;
	    //_name=format;
	  }
	}
    }

    qs_names[i] = _name + std::string("%");
    qs[i] = y[lo[i]];
    x_hi[i] = y[hi[i]];
    if ((index[i]>lo[i]) && (x_hi[i] != qs[i])) {
      double h;
      h = index[i]-lo[i];
      qs[i] = (1.-h)*qs[i] + h*x_hi[i];
    }
  }

  qs.names()=qs_names;
  return qs;
}

/*** R
N <- 100
P <- 10
set.seed(21)
x <- matrix(runif(N/P), nrow=N/P, ncol=P)
A <- quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## R
B <- Quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## Rcpp
all(A==B)
all(names(A)==names(B))
names(A)
names(B)
*/

... abracadabra ...

> N <- 100

> P <- 10

> set.seed(21)

> x <- matrix(runif(N/P), nrow=N/P, ncol=P)

> A <- quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## R

> B <- Quantile(x, c(0, 1.9, 1.99999, pi, 18.9, 18.99999, 19:21, 100)/100)  ## Rcpp

> all(A==B)
[1] TRUE

> all(names(A)==names(B))
[1] TRUE

> names(A)
 [1] "0%"        "1.9%"      "1.99999%"  "3.141593%" "18.9%"     "18.99999%"
 [7] "19%"       "20%"       "21%"       "100%"     

> names(B)
 [1] "0%"        "1.9%"      "1.99999%"  "3.141593%" "18.9%"     "18.99999%"
 [7] "19%"       "20%"       "21%"       "100%"     

@github-actions
Copy link

This issue is stale (365 days without activity) and will be closed in 31 days unless new activity is seen. Please feel free to re-open it is still a concern, possibly with additional data.

@rsparapa
Copy link
Author

Issue implemented as shown above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants