Faster i #4585

Open: wants to merge 7 commits into master
ColeMiller1 (Contributor) commented Jul 1, 2020

Towards #3735 (maybe closes...?)
Closes this comment in the code:

TODO: Incorporate which_ here on DT[!i] where i is logical. Should avoid i = !i (above) - inefficient.

dt[i, ] is around twice as fast as before.

library(data.table)
## base R data.frame
allIterations <- data.frame(v1 = runif(1e5), v2 = runif(1e5))
DoSomething <- function(row) someCalculation <- row[["v1"]] + 1
system.time({for (r in 1:nrow(allIterations)) {DoSomething(allIterations[r, ])}})

##   user  system elapsed 
##   5.67    0.02    5.91 

setDT(allIterations)
system.time({for (r in 1:nrow(allIterations)) {DoSomething(allIterations[r, ])}})

## data.table, before patch
##   user  system elapsed 
##  17.53    0.58   18.46

## data.table, after patch
##   user  system elapsed 
##   9.53    0.00    9.67 

For dt[!lgl], memory use drops substantially and speed improves somewhat:

library(data.table)
set.seed(123L)
n = 1e8L
dt = data.table(rep.int(1L, n))
inds = sample(c(FALSE, TRUE), n, TRUE)
bench::mark(dt[!inds])

## Before Patch
##  expression   min median `itr/sec` mem_alloc
##  <bch:expr> <bch> <bch:>     <dbl> <bch:byt>
##1 dt[!inds]  1150ms  1150ms     0.873    1.12GB

## After Patch
##  expression   min median `itr/sec` mem_alloc
##  <bch:expr> <bch> <bch:>     <dbl> <bch:byt>
##1 dt[!inds]  925ms  925ms      1.08     763MB
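Presumably much of the memory saving comes from no longer materialising !inds as a second length-n logical vector before taking which() of it (the i = !i step the TODO calls inefficient). A minimal C sketch of that idea, assuming logicals are stored as int with NA as INT_MIN (R's convention) and that NA rows are dropped, just as !NA would drop them. which_false and NA_LGL are hypothetical names; this is not data.table's which_.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for R's NA_LOGICAL, which R stores as INT_MIN. */
#define NA_LGL INT_MIN

/* Return the 1-based positions where x[i] is FALSE, i.e. the rows that
 * DT[!x] keeps. NA stays NA under `!` and is not selected, so only explicit
 * FALSE (0) qualifies. No negated copy of x is allocated: one pass counts,
 * a second pass fills an exactly sized result. Caller frees the result. */
int *which_false(const int *x, int n, int *out_len) {
  assert(n >= 0);
  int count = 0;
  for (int i = 0; i < n; i++)
    if (x[i] == 0) count++;

  int *idx = malloc((size_t)(count > 0 ? count : 1) * sizeof *idx);
  if (idx == NULL) { *out_len = 0; return NULL; }

  int k = 0;
  for (int i = 0; i < n; i++)
    if (x[i] == 0) idx[k++] = i + 1;  /* R indices are 1-based */

  *out_len = count;
  return idx;
}
```

For x = {1, 0, NA_LGL, 0, 1} it returns the positions 2 and 4 and sets *out_len to 2; the NA in position 3 is skipped.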

CconvertNegAndZeroIdx is also faster: when the number of threads is 1, its checking loop now breaks early at the first index that needs converting. Skipping OpenMP entirely when threads are set to 1 also improves performance, at least on Windows.
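A minimal sketch of a single-threaded check with an early break, modelled on the condition in the C snippet reviewed further down (elem < 1 and not NA, or elem > max). The function name and the NA_INT macro (standing in for R's NA_INTEGER, which R stores as INT_MIN) are hypothetical; this is an illustration, not data.table's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for R's NA_INTEGER (INT_MIN in R). */
#define NA_INT INT_MIN

/* Does idx contain anything that needs converting (zero, negative, or
 * greater than max)? NA is left alone. The loop returns at the first
 * offending element, so an index vector starting with 0 is flagged after
 * a single comparison instead of a full scan. */
bool needs_conversion_serial(const int *idx, int n, int max) {
  assert(n >= 0);
  for (int i = 0; i < n; i++) {
    int elem = idx[i];
    if ((elem < 1 && elem != NA_INT) || elem > max)
      return true;  /* early break */
  }
  return false;  /* every index is already valid: return inds as-is */
}
```

An OpenMP loop cannot `break` like this, which is one reason a dedicated serial path for one thread can pay off.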

Note: there could probably be follow-up PRs related to the default number of threads (for me, with only 2 threads, the break-even point is somewhere between 1e5 and 1e6 indices). Secondly, with this early break, c(0, seq_len(1025L)) is somehow faster than seq_len(1025L) inside the function. It seems surprising that removing a zero is faster than returning the indices as-is.
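To make the c(0, seq_len(1025L)) observation concrete: a leading 0 ends the check after one element, but the input then still has to go through a conversion step that copies the non-zero indices into a new vector. Assuming that step is a simple count-then-copy pass, it might look like the sketch below (drop_zero_indices is a hypothetical name, and negative indices are out of scope here). It does a full pass plus an allocation, which is why it is surprising that it beats a plain validity scan.

```c
#include <assert.h>
#include <stdlib.h>

/* Copy the non-zero entries of idx into a freshly allocated array, keeping
 * their order, and report the new length through *out_len. A zero index
 * selects nothing in R, so dropping it does not change the subset.
 * Caller frees the result. */
int *drop_zero_indices(const int *idx, int n, int *out_len) {
  assert(n >= 0);
  int keep = 0;
  for (int i = 0; i < n; i++)
    if (idx[i] != 0) keep++;

  int *res = malloc((size_t)(keep > 0 ? keep : 1) * sizeof *res);
  if (res == NULL) { *out_len = 0; return NULL; }

  int k = 0;
  for (int i = 0; i < n; i++)
    if (idx[i] != 0) res[k++] = idx[i];

  *out_len = keep;
  return res;
}
```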

library(data.table)

## small scenario, just over the 1024-row threshold for using 2 threads:
inds = seq_len(1025L)
system.time(for (i in 1:100000) .Call(data.table:::CconvertNegAndZeroIdx, inds, 2000L, TRUE))

setDTthreads(1L)
##   user  system elapsed 
##   1.05    0.00    1.22 

setDTthreads(2L)
##   user  system elapsed 
##   2.90    1.52    4.61  

## early-break scenario, which is the best case
inds = c(0L, inds)

system.time(for (i in 1:100000) .Call(data.table:::CconvertNegAndZeroIdx, inds, 2000L, TRUE))

setDTthreads(1L)
##   user  system elapsed 
##   0.62    0.00    0.63 

setDTthreads(2L)
##   user  system elapsed 
##   3.75    1.54    5.73 

## a normal scenario: 1 million rows
inds = seq_len(1e6L)
system.time(for (i in 1:5000) .Call(data.table:::CconvertNegAndZeroIdx, inds, 2000L, TRUE))

setDTthreads(1L)
##   user  system elapsed 
##   4.98    0.00    5.09 

setDTthreads(2L)
##   user  system elapsed 
##   7.27    0.12    4.19 

codecov bot commented Jul 1, 2020

Codecov Report

Merging #4585 into master will decrease coverage by 0.02%.
The diff coverage is 95.60%.


@@            Coverage Diff             @@
##           master    #4585      +/-   ##
==========================================
- Coverage   99.61%   99.58%   -0.03%     
==========================================
  Files          73       73              
  Lines       14119    14120       +1     
==========================================
- Hits        14064    14061       -3     
- Misses         55       59       +4     
Impacted Files     Coverage Δ
src/subset.c       98.00% <60.00%> (-2.00%) ⬇️
R/data.table.R     100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ba32f3c...f5d5f13.

tt_isub = substitute(i)
tt_jsub = substitute(j)
if (!is.null(names(sys.call())) && # not relying on nargs() as it considers DT[,] to have 3 arguments, #3163
tryCatch(!is.symbol(tt_isub), error=function(e)TRUE) && # a symbol that inherits missingness from caller isn't missing for our purpose; test 1974
Member:

what errors are being caught here?

Contributor Author:

Here's the test. In this case, I believe cols is missing.

# no error when j is supplied but inherits missingness from caller
DT = data.table(a=1:3, b=4:6)
f = function(cols) DT[,cols]
test(1974.1, f(), output="a.*b.*3:.*6")

Edit: I did try removing this branch, but that produced errors. It's a real head-scratcher, so I kept it; it has only been moved.

R/data.table.R (outdated, resolved)
# #932 related so that !(v1 == 1) becomes v1 == 1 instead of (v1 == 1) after removing "!"
if (isub %iscall% "(" && !is.name(isub[[2L]]))
if (isub %iscall% "eval") { # TO DO: or ..()
isub = eval(.massagei(isub[[2L]]), list(.N = nrow(x)), parent.frame())
Member:

can we add .SD=x to envir arg here & get .SD to work in i just like that?

Contributor Author:

I just compiled with .SD added, and it works!

Note: previously, .N was assigned into parent.frame() and then restored if necessary. Because of that, all 4 eval calls related to processing i were largely the same.

Skipping that approach is faster, but now each of the 4 eval calls has to be given .N (or whichever special variables we want available), so there's a little more bookkeeping. In theory, we could also have used the previous approach to assign .SD into the parent.frame.

Member:

ha, came across this comment again 🙃

if ((elem < 1 && elem != NA_INTEGER) || elem > max) stop = true;
}
} else {
#pragma omp parallel for num_threads(nth)
Contributor Author:

The OpenMP loop is what's missing from coverage, and I'm not sure why. I foolishly included Rprintf("OpenMP_Loop") inside the loop, and during at least one of the tests my console filled up with "OpenMP_Loop" statements, so the loop does run locally. That would suggest the code coverage bot only has 1 thread, but then I would have expected similar issues in #4558, since I incorporated the approach from that PR.
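When more than one thread is available, the check runs through an OpenMP loop instead. An OpenMP for-loop can't break, so one plausible shape for it is a flag combined with a reduction, as in the sketch here. This is a hedged illustration, not the PR's code: needs_conversion_parallel and NA_INT are hypothetical, and the diff excerpt doesn't show how the real loop combines its flag. Because such a function is only reached when nth > 1, a coverage run pinned to one thread would never execute it, which is consistent with the coverage gap described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for R's NA_INTEGER (INT_MIN in R). */
#define NA_INT INT_MIN

/* Parallel version of the "needs converting?" test, intended for nth > 1.
 * Each thread ORs its findings into a private copy of `stop`, and the
 * reduction combines them at the end; every element is visited because
 * the loop cannot exit early. Compiled without -fopenmp, the pragma is
 * ignored and the same loop runs serially. */
bool needs_conversion_parallel(const int *idx, int n, int max, int nth) {
  assert(n >= 0 && nth >= 1);
  int stop = 0;
  #pragma omp parallel for num_threads(nth) reduction(||:stop)
  for (int i = 0; i < n; i++) {
    int elem = idx[i];
    if ((elem < 1 && elem != NA_INT) || elem > max)
      stop = 1;
  }
  return stop != 0;
}
```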
