New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behavior with CharacterVector and std::sort #419

Closed
nathan-russell opened this Issue Jan 15, 2016 · 21 comments

Comments

Projects
None yet
5 participants
@nathan-russell
Contributor

nathan-russell commented Jan 15, 2016

I believe this is distinct from the collation issue described in #251. Calling std::sort on an Rcpp::CharacterVector produces very unexpected results on my machine (Ubuntu 14.04):

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::CharacterVector RcppSort(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x);
  y.sort();
  return y;
}

// [[Rcpp::export]]
Rcpp::CharacterVector StdSort(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x); 
  std::sort(y.begin(), y.end());
  return y;
}

// [[Rcpp::export]]
std::vector<std::string> StdSort2(Rcpp::CharacterVector x) {
  std::vector<std::string> y = Rcpp::as<std::vector<std::string> >(x);
  std::sort(y.begin(), y.end());
  return y;
}

/*** R

set.seed(123)
(xx <- sample(c(LETTERS[1:5], letters[1:6]), 11))
#[1] "D" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"

RcppSort(xx)
#[1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

StdSort(xx)
#[1] "f" "f" "f" "f" "f" "f" "D" "c" "f" "f" "f"
##    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 

StdSort2(xx)
#[1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

*/

I'm consistently getting the same strange output from StdSort(xx) whether compiled with clang (5.3) or gcc (4.9.3). Presumably this is the comparator being used in StdSort

bool operator<(const Rcpp::String& other) const {
  return strcmp(get_cstring(), other.get_cstring()) < 0;
}

which does not seem to be doing anything unusual. Unfortunately I'm not terribly familiar with the internals of Rcpp::String / Rcpp::string_proxy<>, so I really can't imagine what could be causing this behavior, but it looked like something worth pointing out.

My session info:

#R version 3.2.3 (2015-12-10)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: Ubuntu 14.04.3 LTS
#
#locale:
#[1]  LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
#[4]  LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#[7]  LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
#[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  methods   base     
#
#loaded via a namespace (and not attached):
#[1] tools_3.2.3 Rcpp_0.12.3
@kevinushey

This comment has been minimized.

Show comment
Hide comment
@kevinushey

kevinushey Jan 15, 2016

Contributor

FWIW, this works correctly for me on OS X. Perhaps we're bumping into a weird libc++ vs. libstdc++ issue?

Contributor

kevinushey commented Jan 15, 2016

FWIW, this works correctly for me on OS X. Perhaps we're bumping into a weird libc++ vs. libstdc++ issue?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Jan 15, 2016

Member

It's borked on Ubuntu 15.04 (my server) with g++-4.9.

Member

eddelbuettel commented Jan 15, 2016

It's borked on Ubuntu 15.04 (my server) with g++-4.9.

@nathan-russell

This comment has been minimized.

Show comment
Hide comment
@nathan-russell

nathan-russell Jan 15, 2016

Contributor

Hmm it looks like OS X is the odd man out (or rather, odd man in); I get the same gibberish output ([1] "f" "f" "f" "f" "f" "f" "D" "c" "f" "f" "f") on CentOS 7 (gcc and clang), as well as Windows 7 (x86_64-w64-mingw32/x64 (64-bit)).

Interestingly, std::stable_sort produces a different incorrect output:

// [[Rcpp::export]]
Rcpp::CharacterVector StableSort(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x); 
  std::stable_sort(y.begin(), y.end());
  return y;
}

// "d" "D" "c" "d" "d" "d" "d" "d" "f" "f" "f"
Contributor

nathan-russell commented Jan 15, 2016

Hmm it looks like OS X is the odd man out (or rather, odd man in); I get the same gibberish output ([1] "f" "f" "f" "f" "f" "f" "D" "c" "f" "f" "f") on CentOS 7 (gcc and clang), as well as Windows 7 (x86_64-w64-mingw32/x64 (64-bit)).

Interestingly, std::stable_sort produces a different incorrect output:

// [[Rcpp::export]]
Rcpp::CharacterVector StableSort(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x); 
  std::stable_sort(y.begin(), y.end());
  return y;
}

// "d" "D" "c" "d" "d" "d" "d" "d" "f" "f" "f"
@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Jan 15, 2016

Contributor

I haven't looked at the code, but I would suspect that this has to do with iterator behavior in Rcpp::CharacterVector not being 100% compliant with stl standards. std::sort and std::stable_sort require 100% stl compliant containers and iterators.

Specifically the offset dereference operator comes to mind http://www.cplusplus.com/reference/iterator/

Contributor

dcdillon commented Jan 15, 2016

I haven't looked at the code, but I would suspect that this has to do with iterator behavior in Rcpp::CharacterVector not being 100% compliant with stl standards. std::sort and std::stable_sort require 100% stl compliant containers and iterators.

Specifically the offset dereference operator comes to mind http://www.cplusplus.com/reference/iterator/

@kevinushey

This comment has been minimized.

Show comment
Hide comment
@kevinushey

kevinushey Jan 15, 2016

Contributor

FWIW, it does work correctly with libc++ (tested on Ubuntu):

CXX=clang++-3.6
CXXFLAGS=-g -O3 -Wall -march=native -stdlib=libc++
LDFLAGS=-stdlib=libc++

kevin@KBOX:~/git/Rcpp [master]
$ R --vanilla --slave -e 'Rcpp::sourceCpp("ouch.cpp")'

> set.seed(123)

> (xx <- sample(c(LETTERS[1:5], letters[1:6]), 11))
 [1] "D" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"

> RcppSort(xx)
 [1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

> StdSort(xx)
 [1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

> StdSort2(xx)
 [1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

Given that IIUC libc++ and libstdc++ diverge a bit in their implementations of iterators, I think @dcdillon is probably right.

Contributor

kevinushey commented Jan 15, 2016

FWIW, it does work correctly with libc++ (tested on Ubuntu):

CXX=clang++-3.6
CXXFLAGS=-g -O3 -Wall -march=native -stdlib=libc++
LDFLAGS=-stdlib=libc++

kevin@KBOX:~/git/Rcpp [master]
$ R --vanilla --slave -e 'Rcpp::sourceCpp("ouch.cpp")'

> set.seed(123)

> (xx <- sample(c(LETTERS[1:5], letters[1:6]), 11))
 [1] "D" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"

> RcppSort(xx)
 [1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

> StdSort(xx)
 [1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

> StdSort2(xx)
 [1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

Given that IIUC libc++ and libstdc++ diverge a bit in their implementations of iterators, I think @dcdillon is probably right.

@nathan-russell

This comment has been minimized.

Show comment
Hide comment
@nathan-russell

nathan-russell Jan 16, 2016

Contributor

Is it possible that libstdc++ isn't using the offset dereference operator in their sort implementations, and libc++ is? Because after trying out several modifications to Proxy_Iterator<>::operator[] and not seeing any change from my above results, I recompiled Rcpp with that operator completely commented out, reran the example, and got the same output. So perhaps this is caused by some other aspect of the string_proxy or Proxy_Iterator classes?

Contributor

nathan-russell commented Jan 16, 2016

Is it possible that libstdc++ isn't using the offset dereference operator in their sort implementations, and libc++ is? Because after trying out several modifications to Proxy_Iterator<>::operator[] and not seeing any change from my above results, I recompiled Rcpp with that operator completely commented out, reran the example, and got the same output. So perhaps this is caused by some other aspect of the string_proxy or Proxy_Iterator classes?

@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Jan 16, 2016

Contributor

I can show what's happening (I believe)

// [[Rcpp::export]]
Rcpp::CharacterVector LessTest(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x);

  Rcpp::internal::string_proxy<16> tmp(*y.begin());
  *y.begin() = *(y.begin() + 1);
  *(y.begin() + 1) = tmp;

  return y;
}
> LessTest(xx)
 [1] "c" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"

I don't know why, but it seems that std::sort in gcc is not using std::swap (or possibly not locating the correct std::swap for Rcpp::string_proxy.

Running the same code with std::swap seems to work as evidenced below:

// [[Rcpp::export]]
Rcpp::CharacterVector LessTest(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x);

  //Rcpp::internal::string_proxy<16> tmp(*y.begin());
  //*y.begin() = *(y.begin() + 1);
  //*(y.begin() + 1) = tmp;

  std::swap(*y.begin(), *(y.begin() + 1));

  return y;
}
> LessTest(xx)
 [1] "c" "D" "f" "e" "b" "A" "C" "d" "B" "a" "E"

So the good news is I can explain what is happening. The bad news is I have no idea how to fix it.

Contributor

dcdillon commented Jan 16, 2016

I can show what's happening (I believe)

// [[Rcpp::export]]
Rcpp::CharacterVector LessTest(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x);

  Rcpp::internal::string_proxy<16> tmp(*y.begin());
  *y.begin() = *(y.begin() + 1);
  *(y.begin() + 1) = tmp;

  return y;
}
> LessTest(xx)
 [1] "c" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"

I don't know why, but it seems that std::sort in gcc is not using std::swap (or possibly not locating the correct std::swap for Rcpp::string_proxy.

Running the same code with std::swap seems to work as evidenced below:

// [[Rcpp::export]]
Rcpp::CharacterVector LessTest(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x);

  //Rcpp::internal::string_proxy<16> tmp(*y.begin());
  //*y.begin() = *(y.begin() + 1);
  //*(y.begin() + 1) = tmp;

  std::swap(*y.begin(), *(y.begin() + 1));

  return y;
}
> LessTest(xx)
 [1] "c" "D" "f" "e" "b" "A" "C" "d" "B" "a" "E"

So the good news is I can explain what is happening. The bad news is I have no idea how to fix it.

@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Jan 16, 2016

Contributor

And a brief (and probably incomplete) look through the gcc code would indicate that it's probably using std::iter_swap instead of std::swap which doesn't have a proper overload.

Contributor

dcdillon commented Jan 16, 2016

And a brief (and probably incomplete) look through the gcc code would indicate that it's probably using std::iter_swap instead of std::swap which doesn't have a proper overload.

@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Jan 16, 2016

Contributor

OK. After further research the problem is this. Rcpp::string_proxy needs template specializations for every sequence modifying STL "primitive" (e.g. std::swap, std::move, std::copy) and all their variants.

The reason is that between different versions of libstdc++ and even differences in range sizes, different optimizations are implemented in terms of the "primitives". Taking for instance the example that we had above. On a sequence of length 11 as in the example, std::sort uses an optimization that uses std::copy_backward and std::copy. If we bump that size to 50, it uses std::iter_swap instead. If we switch to C++11, it uses std::move_backward and std::move. The problem just continues and continues.

I am willing to play with this a bit and try to find something that will work, however, without a full blown specialization of every STL sequence modifying "primitive", this will not work across all versions and all compilers.

Let me know what you think.

Contributor

dcdillon commented Jan 16, 2016

OK. After further research the problem is this. Rcpp::string_proxy needs template specializations for every sequence modifying STL "primitive" (e.g. std::swap, std::move, std::copy) and all their variants.

The reason is that between different versions of libstdc++ and even differences in range sizes, different optimizations are implemented in terms of the "primitives". Taking for instance the example that we had above. On a sequence of length 11 as in the example, std::sort uses an optimization that uses std::copy_backward and std::copy. If we bump that size to 50, it uses std::iter_swap instead. If we switch to C++11, it uses std::move_backward and std::move. The problem just continues and continues.

I am willing to play with this a bit and try to find something that will work, however, without a full blown specialization of every STL sequence modifying "primitive", this will not work across all versions and all compilers.

Let me know what you think.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Jan 16, 2016

Member

Thanks so much for digging in. And this seems super annoying. Is there another string / container / iterator library we could borrow an idea from?

Member

eddelbuettel commented Jan 16, 2016

Thanks so much for digging in. And this seems super annoying. Is there another string / container / iterator library we could borrow an idea from?

@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Jan 16, 2016

Contributor

We could also simply specialize sort for Rcpp::CharacterVector.

That's actually probably the easiest.

Contributor

dcdillon commented Jan 16, 2016

We could also simply specialize sort for Rcpp::CharacterVector.

That's actually probably the easiest.

@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Jan 16, 2016

Contributor

Actually it might be the only possible way. Another problem is...behind the scenes...the STL implementation assumes that using the copy constructor of a Rcpp::string_proxy creates a copy that is not affected by assigning to the original value. This is not the case given the implementation of Rcpp::string_proxy so even properly specializing ALL the STL stuff won't save it.

I give up.

Contributor

dcdillon commented Jan 16, 2016

Actually it might be the only possible way. Another problem is...behind the scenes...the STL implementation assumes that using the copy constructor of a Rcpp::string_proxy creates a copy that is not affected by assigning to the original value. This is not the case given the implementation of Rcpp::string_proxy so even properly specializing ALL the STL stuff won't save it.

I give up.

@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Jan 17, 2016

Contributor

And because I cannot leave well enough alone (from the libstdc++ code). Please note lines 1849 - 1851. This is unsafe with Rcpp::internals::string_proxy regardless of whether we implement all the sequence modifying STL stuff appropriately.

 1836   /// This is a helper function for the sort routine.
 1837   template<typename _RandomAccessIterator, typename _Compare>
 1838     void
 1839     __insertion_sort(_RandomAccessIterator __first,
 1840              _RandomAccessIterator __last, _Compare __comp)
 1841     {
 1842       if (__first == __last) return;
 1843 
 1844       for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
 1845     {
 1846       if (__comp(__i, __first))
 1847         {
 1848           typename iterator_traits<_RandomAccessIterator>::value_type
 1849         __val = _GLIBCXX_MOVE(*__i);
 1850           _GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);
 1851           *__first = _GLIBCXX_MOVE(__val);
 1852         }
 1853       else
 1854         std::__unguarded_linear_insert(__i,
 1855                 __gnu_cxx::__ops::__val_comp_iter(__comp));
 1856     }
 1857     }
Contributor

dcdillon commented Jan 17, 2016

And because I cannot leave well enough alone (from the libstdc++ code). Please note lines 1849 - 1851. This is unsafe with Rcpp::internals::string_proxy regardless of whether we implement all the sequence modifying STL stuff appropriately.

 1836   /// This is a helper function for the sort routine.
 1837   template<typename _RandomAccessIterator, typename _Compare>
 1838     void
 1839     __insertion_sort(_RandomAccessIterator __first,
 1840              _RandomAccessIterator __last, _Compare __comp)
 1841     {
 1842       if (__first == __last) return;
 1843 
 1844       for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
 1845     {
 1846       if (__comp(__i, __first))
 1847         {
 1848           typename iterator_traits<_RandomAccessIterator>::value_type
 1849         __val = _GLIBCXX_MOVE(*__i);
 1850           _GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);
 1851           *__first = _GLIBCXX_MOVE(__val);
 1852         }
 1853       else
 1854         std::__unguarded_linear_insert(__i,
 1855                 __gnu_cxx::__ops::__val_comp_iter(__comp));
 1856     }
 1857     }
@nathan-russell

This comment has been minimized.

Show comment
Hide comment
@nathan-russell

nathan-russell Jan 17, 2016

Contributor

@dcdillon Thank you for taking the time to dig into this. To clarify, what is the motivation for having the copy constructor of string_proxy behave as such? Is it somehow necessitated by R's internal use of a global string cache, or is it simply a performance feature?

Contributor

nathan-russell commented Jan 17, 2016

@dcdillon Thank you for taking the time to dig into this. To clarify, what is the motivation for having the copy constructor of string_proxy behave as such? Is it somehow necessitated by R's internal use of a global string cache, or is it simply a performance feature?

@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Jan 17, 2016

Contributor

The formal reason why this fails is that std::sort according to the C++ standard is that Rcpp::internal::string_proxy is NOT MoveAssignable (as defined on page 466 of the standard in Table 22). This is because of the way operator=(const string_proxy &rhs) works. The left hand side of that expression is NOT "equivalent" to the right hand side before the operation.

We should mark this as "impossible" for now (without a huge change to string_proxy) and add documentation indicating that using STL algorithms on Rcpp containers is not supported.

Contributor

dcdillon commented Jan 17, 2016

The formal reason why this fails is that std::sort according to the C++ standard is that Rcpp::internal::string_proxy is NOT MoveAssignable (as defined on page 466 of the standard in Table 22). This is because of the way operator=(const string_proxy &rhs) works. The left hand side of that expression is NOT "equivalent" to the right hand side before the operation.

We should mark this as "impossible" for now (without a huge change to string_proxy) and add documentation indicating that using STL algorithms on Rcpp containers is not supported.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Jan 17, 2016

Member

I am fine with documenting it. There are some things that remain simply hard because of internal implementation details on either the R (and its C internals) and C++ library side.

Question is where to document this (best) and how. Probably couple of places...

Member

eddelbuettel commented Jan 17, 2016

I am fine with documenting it. There are some things that remain simply hard because of internal implementation details on either the R (and its C internals) and C++ library side.

Question is where to document this (best) and how. Probably couple of places...

@dcdillon

This comment has been minimized.

Show comment
Hide comment
@dcdillon

dcdillon Mar 21, 2016

Contributor

Should we mark this as documentation or close it as currently nothing short of writing a specialization of std::sort can fix it?

Contributor

dcdillon commented Mar 21, 2016

Should we mark this as documentation or close it as currently nothing short of writing a specialization of std::sort can fix it?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Mar 21, 2016

Member

May make sense to document, ie in the Rcpp FAQ. Then at least we can lean back and say you had been warned ...

Member

eddelbuettel commented Mar 21, 2016

May make sense to document, ie in the Rcpp FAQ. Then at least we can lean back and say you had been warned ...

@coatless

This comment has been minimized.

Show comment
Hide comment
@coatless

coatless Jul 15, 2016

Contributor

To Do: Needs documentation flag and add a section under Examples in FAQ.

Contributor

coatless commented Jul 15, 2016

To Do: Needs documentation flag and add a section under Examples in FAQ.

@coatless

This comment has been minimized.

Show comment
Hide comment
@coatless

coatless Mar 24, 2017

Contributor

Third entry for Section: Known Issues

Title: Sorting with STL on a \code{CharacterVector} produces problematic results

Text:

The Standard Template Library (STL) \code{std::sort} algorithm performs adequately for the majority of \pkg{Rcpp} data types. The notable exception that makes what would otherwise be a universal quantifier into an existential quantifier is the \code{CharacterVector} data type. Chiefly, the issue with sorting strings is related to how the \code{CharacterVector} relies upon the use of \code{Rcpp::internal::string_proxy}. In particular, \code{Rcpp::internal::string_proxy} is \textit{not} MoveAssignable since the left hand side of \code{operator=(const string_proxy &rhs)} is \textit{not} viewed as equivalent to the right hand side before the operation \citep[][p. 466, Table 22]{Cpp11}. This further complicates matters when using \code{CharacterVector} with \code{std::swap}, \code{std::move}, \code{std::copy} and their variants.

To avoid unwarranted pain with sorting, the preferred approach is to use the \code{.sort()} member function of \pkg{Rcpp} objects. The member function correctly applies the sorting procedure to \pkg{Rcpp} objects regardless of type. The following code example illustrates the preferred approach alongside the problematic approach:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::CharacterVector member_sort(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x);
  y.sort();
  return y;
}

// [[Rcpp::export]]
Rcpp::CharacterVector stl_sort(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x); 
  std::sort(y.begin(), y.end());
  return y;
}

/*** R
set.seed(123)
(X <- sample(c(LETTERS[1:5], letters[1:6]), 11))
#[1] "D" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"

member_sort(X)
#[1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

stl_sort(X)
#[1] "f" "f" "f" "f" "f" "f" "D" "c" "f" "f" "f"
*/

In closing, the results of using the STL approach do change depending on whether \code{libc++} or \code{libstdc++} standard library is used to compile the code. When debugging, this does make the issue particularly complex to sort out. Principally, compilation with \code{libc++} and STL has been shown to yield the correct results. However, it is not wise to rely upon this library as a majority of code is compiled against \code{libstdc++} as it more complete.

Contributor

coatless commented Mar 24, 2017

Third entry for Section: Known Issues

Title: Sorting with STL on a \code{CharacterVector} produces problematic results

Text:

The Standard Template Library (STL) \code{std::sort} algorithm performs adequately for the majority of \pkg{Rcpp} data types. The notable exception that makes what would otherwise be a universal quantifier into an existential quantifier is the \code{CharacterVector} data type. Chiefly, the issue with sorting strings is related to how the \code{CharacterVector} relies upon the use of \code{Rcpp::internal::string_proxy}. In particular, \code{Rcpp::internal::string_proxy} is \textit{not} MoveAssignable since the left hand side of \code{operator=(const string_proxy &rhs)} is \textit{not} viewed as equivalent to the right hand side before the operation \citep[][p. 466, Table 22]{Cpp11}. This further complicates matters when using \code{CharacterVector} with \code{std::swap}, \code{std::move}, \code{std::copy} and their variants.

To avoid unwarranted pain with sorting, the preferred approach is to use the \code{.sort()} member function of \pkg{Rcpp} objects. The member function correctly applies the sorting procedure to \pkg{Rcpp} objects regardless of type. The following code example illustrates the preferred approach alongside the problematic approach:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::CharacterVector member_sort(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x);
  y.sort();
  return y;
}

// [[Rcpp::export]]
Rcpp::CharacterVector stl_sort(Rcpp::CharacterVector x) {
  Rcpp::CharacterVector y = Rcpp::clone(x); 
  std::sort(y.begin(), y.end());
  return y;
}

/*** R
set.seed(123)
(X <- sample(c(LETTERS[1:5], letters[1:6]), 11))
#[1] "D" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"

member_sort(X)
#[1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"

stl_sort(X)
#[1] "f" "f" "f" "f" "f" "f" "D" "c" "f" "f" "f"
*/

In closing, the results of using the STL approach do change depending on whether \code{libc++} or \code{libstdc++} standard library is used to compile the code. When debugging, this does make the issue particularly complex to sort out. Principally, compilation with \code{libc++} and STL has been shown to yield the correct results. However, it is not wise to rely upon this library as a majority of code is compiled against \code{libstdc++} as it more complete.

@coatless

This comment has been minimized.

Show comment
Hide comment
@coatless

coatless Mar 31, 2017

Contributor

@eddelbuettel: This issue can now be closed.

Contributor

coatless commented Mar 31, 2017

@eddelbuettel: This issue can now be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment