sorting in CharacterVector is not equal to sorting in R #251
Comments
|
The character comparison in Rcpp is made by the In
In
In
|
|
This one is quite thorny, as R does not export |
|
Last entry before I send a PR later tonight. Sixth entry in

Section: Known Issues

Title: Lexicographic Order of String Sorting Differs Due to Capitalization

Text: Comparing strings within \R hinges on the ability to process the locale or native-language environment of the string. In \R, there is a function called \code{Scollate} that performs the comparison on locale. Unfortunately, this function has not been made publicly available and, thus, Rcpp does \textit{not} have access to it within its implementation of \code{StrCmp}. As a result, strings that are sorted under the \code{.sort()} member function are ordered improperly. Specifically, if capitalization is present, then capitalized words are sorted together followed by the sorting of lowercase words instead of a mixture of capitalized and lowercase words.

The issue is illustrated by the following code example:

#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::CharacterVector sortcpp(Rcpp::CharacterVector X) {
X.sort();
return X;
}
/*** R
x <- c("B", "b", "c", "A", "a")
# Using R's sort
sort(x)
## "a" "A" "b" "B" "c"
# Using Rcpp's sort
sortcpp(x)
## "A" "B" "a" "b" "c"
*/ |
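To make the difference concrete outside of Rcpp, here is a minimal standalone C++ sketch. It is an illustration only, not how Rcpp or R implement sorting. `sort_bytewise` reproduces the byte-order result shown for `sortcpp(x)` by comparing with `std::strcmp`. `sort_case_mixed` and its helper `to_lower_ascii` are hypothetical names: they approximate the R output above with an ASCII-only, case-insensitive comparison that places lowercase first on ties. That approximation is an assumption chosen to match this one example; real locale collation, as performed by `Scollate`, follows more rules than this.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>
#include <string>
#include <vector>

// Byte-wise ordering, as strcmp does: in ASCII every uppercase letter
// has a smaller code than every lowercase letter, so "B" sorts before "a".
std::vector<std::string> sort_bytewise(std::vector<std::string> x) {
    std::sort(x.begin(), x.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcmp(a.c_str(), b.c_str()) < 0;
              });
    return x;
}

// Hypothetical helper: lowercase copy of a string (ASCII letters only).
static std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Assumed approximation of the mixed-case order from the R example:
// compare case-insensitively first; if the strings differ only by case,
// put the one with lowercase letters first.
std::vector<std::string> sort_case_mixed(std::vector<std::string> x) {
    std::sort(x.begin(), x.end(),
              [](const std::string& a, const std::string& b) {
                  const std::string la = to_lower_ascii(a);
                  const std::string lb = to_lower_ascii(b);
                  if (la != lb) return la < lb;
                  return a > b;  // lowercase bytes are larger, so they go first
              });
    return x;
}
```

Calling both functions on `{"B", "b", "c", "A", "a"}` gives the uppercase-first order from `sort_bytewise` and the interleaved order from `sort_case_mixed`, mirroring the two results in the example above.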
|
@eddelbuettel: This issue can now be closed. |
Lexicographic order in CharacterVector differs from the result obtained in R. See example.
Is this result the expected behaviour?
Thank you!