test and confirm new parallel subset performance #3175

Closed
jangorecki opened this issue Dec 1, 2018 · 5 comments · Fixed by #4484

Comments

@jangorecki jangorecki commented Dec 1, 2018

Matt commented:

data.table/src/subset.c

Lines 27 to 30 in 1847500

// For small n such as 2,3,4 etc we hope OpenMP will be sensible inside it and not create a team with each thread doing just one item. Otherwise,
// call overhead would be too high for highly iterated calls on very small subests. TODO: test and confirm
// Futher, we desire (currently at least) to stress-test the threaded code (especially in latest R-devel) on small data to reduce chance that bugs
// arise only over a threshold of n.

@jangorecki jangorecki commented Jan 24, 2019

The following script tests subsetting by integer row ids. It also measures the timing of the !anyNA branch. That should be enough for testing OpenMP overhead.

vim dt-parallel-subset.R
args = as.integer(commandArgs(TRUE))
th = args[1L]
N = args[2L]
K = 100L

get_i = function(n.out, n.in) {
  n.out = as.integer(n.out)
  n.in = as.integer(n.in)
  set.seed(n.out)
  sample(n.in, n.out)
}

library(data.table)
cat(sprintf("# datagen %s rows\n", N))
set.seed(108)
DT = data.table(
  id1 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
  id2 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
  id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
  id4 = sample(K, N, TRUE),                          # large groups (int)
  id5 = sample(K, N, TRUE),                          # large groups (int)
  id6 = sample(N/K, N, TRUE),                        # small groups (int)
  v1 =  sample(5, N, TRUE),                          # int in range [1,5]
  v2 =  sample(5, N, TRUE),                          # int in range [1,5]
  v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 23.5749
)

cat(sprintf("# setDTthreads(%s)\n", th))
setDTthreads(th)

cat("# 0 row (first `[`` call overhead):\n")
system.time(ans<-DT[0L])

cat("# 1 row:\n")
i = get_i(1L, nrow(DT))
system.time(ans<-DT[i])

cat("# 2 rows:\n")
i = get_i(2L, nrow(DT))
system.time(ans<-DT[i])

cat("# 5 rows:\n")
i = get_i(5L, nrow(DT))
system.time(ans<-DT[i])

cat("# 10% of rows:\n")
i = get_i(nrow(DT)*0.1, nrow(DT))
system.time(ans<-DT[i])

q("no")
Rscript dt-parallel-subset.R 1 1e6

timings coming soon

@jangorecki jangorecki commented Jan 24, 2019

1 thread, 1e7 rows

> Rscript dt-parallel-subset.R 1 1e7
# datagen 10000000 rows
# setDTthreads(1)
# 0 row (first `[`` call overhead):
   user  system elapsed 
  0.005   0.000   0.005 
# 1 row:
   user  system elapsed 
      0       0       0 
# 2 rows:
   user  system elapsed 
  0.000   0.000   0.001 
# 5 rows:
   user  system elapsed 
  0.000   0.000   0.001 
# 10% of rows:
   user  system elapsed 
  0.153   0.012   0.165 

20 threads, 1e7 rows

> Rscript dt-parallel-subset.R 20 1e7
# datagen 10000000 rows
# setDTthreads(20)
# 0 row (first `[`` call overhead):
   user  system elapsed 
  0.033   0.000   0.007 
# 1 row:
   user  system elapsed 
      0       0       0 
# 2 rows:
   user  system elapsed 
  0.001   0.000   0.000 
# 5 rows:
   user  system elapsed 
      0       0       0 
# 10% of rows:
   user  system elapsed 
  0.440   0.039   0.103 

1 thread, 1e8 rows

> Rscript dt-parallel-subset.R 1 1e8
# datagen 100000000 rows
# setDTthreads(1)
# 0 row (first `[`` call overhead):
   user  system elapsed 
  0.006   0.000   0.005 
# 1 row:
   user  system elapsed 
  0.001   0.000   0.000 
# 2 rows:
   user  system elapsed 
      0       0       0 
# 5 rows:
   user  system elapsed 
  0.001   0.000   0.000 
# 10% of rows:
   user  system elapsed 
  2.393   0.132   2.524 

20 threads, 1e8 rows

> Rscript dt-parallel-subset.R 20 1e8
# datagen 100000000 rows
# setDTthreads(20)
# 0 row (first `[`` call overhead):
   user  system elapsed 
  0.054   0.004   0.010 
# 1 row:
   user  system elapsed 
  0.000   0.000   0.001 
# 2 rows:
   user  system elapsed 
  0.001   0.000   0.000 
# 5 rows:
   user  system elapsed 
  0.000   0.000   0.001 
# 10% of rows:
   user  system elapsed 
  4.218   0.284   1.265 

1 thread, 1e9 rows

> Rscript dt-parallel-subset.R 1 1e9
# datagen 1000000000 rows
# setDTthreads(1)
# 0 row (first `[`` call overhead):
   user  system elapsed
  0.005   0.000   0.006
# 1 row:
   user  system elapsed
  0.001   0.000   0.000
# 2 rows:
   user  system elapsed 
  0.000   0.000   0.001 
# 5 rows:
   user  system elapsed 
      0       0       0 
# 10% of rows:
   user  system elapsed 
 33.478   1.460  34.938 

20 threads, 1e9 rows

> Rscript dt-parallel-subset.R 20 1e9
# datagen 1000000000 rows
# setDTthreads(20)
# 0 row (first `[`` call overhead):
   user  system elapsed
  0.057   0.000   0.009
# 1 row:
   user  system elapsed
  0.001   0.000   0.001
# 2 rows:
   user  system elapsed
      0       0       0
# 5 rows:
   user  system elapsed
      0       0       0
# 10% of rows:
   user  system elapsed 
 58.295   2.454  20.285 

@jangorecki jangorecki commented Jan 24, 2019

During the timings above I observed that a team of threads was started even for subsets of 1, 2 and 5 rows. Still, it did not result in noticeable overhead: all subsets of 1, 2 and 5 rows took 0.000-0.001 seconds elapsed.
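
For comparison, the "10% of rows" lines from the same runs can be turned into speedup ratios. The snippet below is only arithmetic on the elapsed times reported above (1 thread vs 20 threads); the variable names are made up for illustration and nothing here is re-measured.

```r
# Elapsed seconds of the "10% of rows" subset, copied from the runs above.
elapsed_1th  = c("1e7" = 0.165, "1e8" = 2.524, "1e9" = 34.938)
elapsed_20th = c("1e7" = 0.103, "1e8" = 1.265, "1e9" = 20.285)

# Speedup of 20 threads over 1 thread, per data size.
speedup = round(elapsed_1th / elapsed_20th, 2)
print(speedup)
```

On these numbers the 20-thread runs are roughly 1.6x to 2x faster in elapsed time on the large subset, while the 1, 2 and 5 row subsets are too fast for system.time to separate.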

@jangorecki jangorecki commented Jan 29, 2019

The checks above each used a single subset operation. I do see a noticeable difference when I loop over the subset operation.

library(data.table)
m = matrix(1L, nrow=1e8, ncol=10)
DT = as.data.table(m)
setDTthreads(20)
system.time(for (i in 1:1000) DT[i,])
#   user  system elapsed 
#  4.210   0.000   0.229 
setDTthreads(1)
system.time(for (i in 1:1000) DT[i,])
#   user  system elapsed 
#  0.107   0.007   0.114 

@mattdowle does it qualify for reopening?
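
A scaled-down version of the loop above, wrapped in a helper so the thread setting is restored afterwards, could look like the sketch below. The helper name time_looped_subset and the smaller table (1e5 rows instead of 1e8) are assumptions made to keep it quick to run; timings will differ by machine and data.table version, so no expected numbers are given.

```r
library(data.table)

# Hypothetical helper: time n_iter single-row subsets under a given thread setting.
time_looped_subset = function(DT, threads, n_iter = 1000L) {
  old = getDTthreads()               # remember the current setting
  on.exit(setDTthreads(old))         # restore it when the function exits
  setDTthreads(threads)
  system.time(for (i in seq_len(n_iter)) DT[i, ])[["elapsed"]]
}

# Smaller table than in the comment above, same shape (10 integer columns).
DT = as.data.table(matrix(1L, nrow = 1e5, ncol = 10))

res = c(
  threads_1   = time_looped_subset(DT, 1L),
  threads_all = time_looped_subset(DT, 0L)   # 0 means all logical CPUs
)
print(res)
```

Comparing the two elapsed values, and repeating the run a few times, gives a rough idea of whether starting a thread team per call costs anything for single-row subsets.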

@mattdowle mattdowle commented Jun 18, 2020

PR #4484 closes this one.

v1.12.8 to confirm Jan's result:

> m = matrix(1L, nrow=1e8, ncol=10)
> DT = as.data.table(m)
> setDTthreads(0)
> system.time(for (i in 1:1000) DT[i,])
   user  system elapsed 
  1.512   0.000   0.143
> setDTthreads(1)
> system.time(for (i in 1:1000) DT[i,])
   user  system elapsed 
  0.083   0.000   0.083 

With #4484:

> setDTthreads(0)
> system.time(for (i in 1:1000) DT[i,])
   user  system elapsed 
  0.071   0.000   0.071 
> setDTthreads(1)
> system.time(for (i in 1:1000) DT[i,])
   user  system elapsed 
  0.072   0.000   0.072 
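
To put the loop timings above on a per-call basis, the elapsed times can be divided by the 1000 iterations. This is only arithmetic on the figures reported in this comment; the labels and variable names are chosen here for illustration.

```r
# Elapsed seconds for 1000 iterations of DT[i,], copied from the results above.
elapsed = c(
  "v1.12.8 threads=0" = 0.143,
  "v1.12.8 threads=1" = 0.083,
  "#4484 threads=0"   = 0.071,
  "#4484 threads=1"   = 0.072
)
n_iter = 1000

# Approximate cost of one single-row subset, in microseconds.
per_call_us = round(elapsed / n_iter * 1e6)
print(per_call_us)
```

That works out to about 143 vs 83 microseconds per call on v1.12.8, and about 71 vs 72 microseconds with #4484, so the gap between the all-threads and single-thread settings is gone in these measurements.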
