is.sorted is slow for determining whether jval is sorted by key #4498

Open
MichaelChirico opened this issue May 26, 2020 · 2 comments · May be fixed by #4501
Labels
breaking-change (issues whose solution would require breaking existing behavior), High

@MichaelChirico MichaelChirico commented May 26, 2020

From SO:

https://stackoverflow.com/questions/62019120/why-does-data-table-notation-for-column-retrieval-affect-speed/62028864#62028864

library(data.table)
library(tictoc)  # provides tic() and toc()

x <- as.data.table(as.character(rnorm(20000000,1,0.5)))
setkey(x,V1)

tic(); x[,.(V1)]; toc()
# 25.08 sec elapsed

(timing is even worse on my machine)

The bottleneck appears to be this line:

if (haskey(x) && all(key(x) %chin% names(jval)) && suppressWarnings(is.sorted(jval, by=key(x)))) # TO DO: perhaps this usage of is.sorted should be allowed internally then (tidy up and make efficient)

If I'm not mistaken, we can tell the output is sorted because V1 is the key and it appears as a name in jval, so there is no need to compute the sort order all over again.
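A smaller reproduction may make the comparison easier to run. This is only a sketch under stated assumptions: it uses base R's `system.time` instead of `tictoc`, the row count `n` is a hypothetical reduced size (the report used 20 million rows), and extracting the column as a plain vector with `x[["V1"]]` is my choice of baseline rather than something taken from the report. Timings depend on the machine and on the data.table version.

```r
library(data.table)

# Hypothetical reduced size so the example finishes quickly;
# the original report used 20000000 rows.
n <- 1e6
set.seed(1)
x <- as.data.table(as.character(rnorm(n, 1, 0.5)))
setkey(x, V1)

# Assumed baseline: pull the key column out as a plain character vector.
t_vec <- system.time(v <- x[["V1"]])

# The form from the report: j returns a data.table (jval) whose names
# include the key column, which is the case the quoted condition tests.
t_dt <- system.time(d <- x[, .(V1)])

# Both forms hold the same values; only the elapsed time should differ.
stopifnot(identical(v, d$V1))
print(c(vector = t_vec[["elapsed"]], data.table = t_dt[["elapsed"]]))
```

If the gap between the two timings grows with `n`, that would be consistent with the sortedness check dominating the cost, though this sketch does not isolate that line by itself.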

@jangorecki jangorecki commented May 26, 2020

Could you check whether recomputing the key in this case will be resolved by #4386?
It won't, because jval no longer has a key or indices.

@jangorecki jangorecki commented Jun 15, 2020

I think we can move this issue to the next release. is.sorted is now optimized, so the issue is at least less painful.

@jangorecki jangorecki modified the milestones: 1.12.9, 1.12.11 Jun 15, 2020
@jangorecki jangorecki added the High and breaking-change (issues whose solution would require breaking existing behavior) labels Jun 20, 2020
@mattdowle mattdowle modified the milestones: 1.13.1, 1.13.3 Oct 17, 2020
@jangorecki jangorecki modified the milestones: 1.14.3, 1.14.5 Jul 19, 2022