sorting in CharacterVector is not equal to sorting in R #251
Comments
The character comparison in Rcpp is made by the In
In
In
|
This one is quite thorny, as R does not export |
Last entry before I send a PR later tonight.

Sixth entry in Section: Known Issues
Title: Lexicographic Order of String Sorting Differs Due to Capitalization
Text: Comparing strings within \R hinges on the ability to process the locale or native-language environment of the string. In \R, a function called \code{Scollate} performs the locale-aware comparison. Unfortunately, this function has not been made publicly available and, thus, Rcpp does \textit{not} have access to it within its implementation of \code{StrCmp}. As a result, strings sorted with the \code{.sort()} member function are ordered differently than by \R's \code{sort()}. Specifically, if capitalization is present, capitalized words are sorted together, followed by lowercase words, instead of a mixture of capitalized and lowercase words. The issue is illustrated by the following code example:

#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::CharacterVector sortcpp(Rcpp::CharacterVector X) {
X.sort();
return X;
}
/*** R
x <- c("B", "b", "c", "A", "a")
# Using R's sort
sort(x)
## "a" "A" "b" "B" "c"
# Using Rcpp's sort
sortcpp(x)
## "A" "B" "a" "b" "c"
*/
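To make the difference concrete outside of R, here is a minimal standalone C++ sketch (plain C++, not Rcpp). The function names `sort_bytewise`, `to_lower_ascii`, and `sort_case_interleaved` are hypothetical and exist only for this illustration. The first function orders strings byte by byte, which gives the same result as the Rcpp output above for this input. The second is only an ASCII approximation of a case-interleaved ordering; it is not R's \code{Scollate} and makes no attempt to honor the locale.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>
#include <string>
#include <vector>

// Byte-wise ordering via strcmp: every uppercase ASCII letter
// sorts before every lowercase one.
std::vector<std::string> sort_bytewise(std::vector<std::string> x) {
    std::sort(x.begin(), x.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcmp(a.c_str(), b.c_str()) < 0;
              });
    return x;
}

// Hypothetical helper: lowercase copy of an ASCII string.
static std::string to_lower_ascii(const std::string& s) {
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return out;
}

// Hypothetical approximation of a case-interleaved order (ASCII only):
// compare the case-folded strings first, then break ties so that the
// lowercase variant comes before the uppercase one.
std::vector<std::string> sort_case_interleaved(std::vector<std::string> x) {
    std::sort(x.begin(), x.end(),
              [](const std::string& a, const std::string& b) {
                  const std::string la = to_lower_ascii(a);
                  const std::string lb = to_lower_ascii(b);
                  if (la != lb) return la < lb;
                  // Tie: lowercase bytes are larger in ASCII, so the
                  // byte-wise greater string is placed first.
                  return a > b;
              });
    return x;
}
```

Calling both functions on the vector `{"B", "b", "c", "A", "a"}` from the example shows the two orderings side by side. A user who needs R's exact collation would likely be better served by calling R's own `sort()` than by a hand-written comparator like this one.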
@eddelbuettel: This issue can now be closed.
Lexicographic order in CharacterVector differs from the result obtained in R. See example.
Is this result the expected behaviour?
Thank you!