Problems with text indexing #2114

Closed
JBreidaks opened this issue Apr 13, 2017 · 1 comment · Fixed by #3778

JBreidaks commented Apr 13, 2017

I have a character vector:
y <- c('B','Ā','Č','D','Ē','E')

When I sort the vector using order():
y2 <- y[order(y)]
y2

[1] "Ā" "B" "Č" "D" "E" "Ē"
y2 <- data.table(y = y2)

If I convert the vector to a data.table and sort it with setkeyv, I cannot get the same order as in y2.

test2 <- data.table(y)
setkeyv(test2, "y")
test2

identical(y2, test2)

Is there a way to get the same ordering in this situation?

MichaelChirico (Member) commented Aug 19, 2019

I believe this is due to C ordering:

sapply(c('B','Ā','Č','D','Ē','E'), utf8ToInt)
#   B   Ā   Č   D   Ē   E 
#  66 256 268  68 274  69 

y[order(y)] sorts according to the locale's collation rules, whereas C sorting follows the code points returned by utf8ToInt, if I'm not mistaken.

Will add a note to the setkey documentation about this.
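
The difference between the two orderings can be reproduced in base R alone. The sketch below is only an illustration and does not call data.table itself: it assumes that sorting by utf8ToInt code points, and base R's order(method = "radix") (documented as sorting character vectors in the C locale), are reasonable stand-ins for the ordering setkeyv produces. The names cp, by_codepoint, by_radix and by_locale are made up for this example.

```r
# The vector from the issue, written with \u escapes so the source stays ASCII:
# B, A-macron, C-caron, D, E-macron, E
y <- c("B", "\u0100", "\u010c", "D", "\u0112", "E")

# Code point of each single-character string
cp <- vapply(y, utf8ToInt, integer(1))

# Ordering by code point: ASCII letters come first, accented letters after
by_codepoint <- y[order(cp)]

# Base R's radix method sorts character vectors in the C locale
by_radix <- y[order(y, method = "radix")]

# Default order() on character vectors uses the current locale's collation,
# so this result depends on Sys.getlocale("LC_COLLATE")
by_locale <- y[order(y)]

print(by_codepoint)
print(by_radix)
print(by_locale)
```

In a locale with language-aware collation, by_locale interleaves the accented letters with their base letters, as in the output shown in the original report, while the other two place all accented letters after "E".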
