I believe this is distinct from the collation issue described in #251. Calling std::sort on an Rcpp::CharacterVector produces incorrect results on my machine (Ubuntu 14.04): several elements are overwritten by copies of other elements instead of being reordered.
#include <Rcpp.h>

// Sort with CharacterVector's own sort() member function.
// [[Rcpp::export]]
Rcpp::CharacterVector RcppSort(Rcpp::CharacterVector x) {
    Rcpp::CharacterVector y = Rcpp::clone(x);
    y.sort();
    return y;
}

// Apply std::sort directly to the CharacterVector's iterators.
// [[Rcpp::export]]
Rcpp::CharacterVector StdSort(Rcpp::CharacterVector x) {
    Rcpp::CharacterVector y = Rcpp::clone(x);
    std::sort(y.begin(), y.end());
    return y;
}

// Copy into a std::vector<std::string> first, then apply std::sort.
// [[Rcpp::export]]
std::vector<std::string> StdSort2(Rcpp::CharacterVector x) {
    std::vector<std::string> y = Rcpp::as<std::vector<std::string> >(x);
    std::sort(y.begin(), y.end());
    return y;
}

/*** R
set.seed(123)
(xx <- sample(c(LETTERS[1:5], letters[1:6]), 11))
#[1] "D" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"
RcppSort(xx)
#[1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"
StdSort(xx)
#[1] "f" "f" "f" "f" "f" "f" "D" "c" "f" "f" "f"
## ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StdSort2(xx)
#[1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"
*/
I consistently get the same incorrect output from StdSort(xx) whether it is compiled with clang (5.3) or gcc (4.9.3). Presumably this is the comparator used in StdSort:
bool operator<(const Rcpp::String& other) const {
    return strcmp(get_cstring(), other.get_cstring()) < 0;
}
which does not appear to do anything unusual. Unfortunately I'm not very familiar with the internals of Rcpp::String / Rcpp::string_proxy<>, so I can't tell what is causing this behavior, but it seemed worth reporting.
My session info:
#R version 3.2.3 (2015-12-10)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: Ubuntu 14.04.3 LTS
#
#locale:
#[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
#[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
#[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#loaded via a namespace (and not attached):
#[1] tools_3.2.3 Rcpp_0.12.3