fread devel slower in parallel on file with lots of char columns #2091

Closed
arunsrinivasan opened this issue Mar 30, 2017 · 7 comments

@arunsrinivasan arunsrinivasan commented Mar 30, 2017

require(data.table) # commit 2768
# Loading required package: data.table
# data.table 1.10.5 IN DEVELOPMENT built 2017-03-30 10:20:36 UTC; travis
#   The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
#   Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
#   Release notes, videos and slides: http://r-datatable.com
set.seed(1L)
N <- 1e6L
R <- 5e5L
C <- 30L
str <- apply(matrix(sample(letters, R*C, TRUE), ncol=C), 1, paste, collapse="")
dt <- data.table(A = sample(str, N, TRUE))
dt[, `:=`(B = substr(A, 1L, 20L),
          C = substr(A, 5L, 15L),
          D = substr(A, 2L, 3L),
          E = substr(A, 20L, 20L),
          F = substr(A, 7L, 10L),
          G = substr(A, 2L, 22L),
          H = substr(A, 3L, 28L))]
print(object.size(dt), units="Mb")
# 215.9 Mb

fname <- "~/Downloads/tmp.csv"
system.time(fwrite(dt, fname, sep=",")) # 0.4s on 4 threads, works fine

#### TESTING FREAD
setDTthreads(4L)
system.time(ans1 <- fread(fname))
# Read 1000000 rows x 8 columns from 0.115GB file in 00:07.919 wall clock time (can be slowed down by any other open apps even if seemingly idle)
#    user  system elapsed 
#  24.359   0.579   7.925 
setDTthreads(2L)
system.time(ans2 <- fread(fname))
# Read 1000000 rows x 8 columns from 0.115GB file in 00:04.049 wall clock time (can be slowed down by any other open apps even if seemingly idle)
#    user  system elapsed 
#   7.581   0.164   4.065 
setDTthreads(1L)
system.time(ans3 <- fread(fname))
#    user  system elapsed 
#   3.003   0.126   3.195 

identical(ans1, ans2) # [1] TRUE
identical(ans1, ans3) # [1] TRUE

UPDATE: Rerunning with 4, 2 and 1 thread(s), each in a new session, the timings are 14s, 9.2s and 7.4s respectively.

@mattdowle mattdowle commented Apr 1, 2017

Great test. SET_STRING_ELT was being called inside a critical section in the field processor, so the overhead of the critical section was coming into play. I've raised it up now. So it's back to no benefit from parallelization in this case, where all the columns are character and contain mostly unique values. R's global character cache is single-threaded.
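
To illustrate the cost being described, here is a minimal sketch of the general pattern, not data.table's actual code. It assumes a hypothetical `intern()` function standing in for a non-thread-safe global string cache, and compares taking the lock once per field with buffering a chunk of fields and taking the lock once per chunk. The names `intern`, `intern_per_field`, `intern_per_chunk` and `CHUNK` are all invented for this example. Without `-fopenmp` the pragmas are ignored and the code runs serially with the same results.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a global, non-thread-safe string cache.
   Every call must be serialized by the caller. */
static size_t cache_calls = 0;  /* strings interned so far */
static size_t cache_bytes = 0;  /* total bytes interned so far */

static void intern(const char *s, size_t len) {
    assert(s != NULL);
    cache_calls += 1;
    cache_bytes += len;
}

enum { CHUNK = 1024 };  /* fields handled per lock acquisition */

/* Variant A: one critical section per field. Returns locks taken. */
size_t intern_per_field(const char *const *fields, long n) {
    size_t locks = 0;
    #pragma omp parallel for reduction(+:locks)
    for (long i = 0; i < n; i++) {
        size_t len = strlen(fields[i]);      /* "parsing", done in parallel */
        #pragma omp critical(string_cache)
        intern(fields[i], len);              /* serialized, once per field */
        locks += 1;
    }
    return locks;
}

/* Variant B: parse a whole chunk without the lock, then take the
   lock once to intern every field of that chunk. Returns locks taken. */
size_t intern_per_chunk(const char *const *fields, long n) {
    size_t locks = 0;
    long nchunks = (n + CHUNK - 1) / CHUNK;
    #pragma omp parallel for reduction(+:locks)
    for (long c = 0; c < nchunks; c++) {
        long start = c * CHUNK;
        long end = (start + CHUNK < n) ? start + CHUNK : n;
        size_t lens[CHUNK];                  /* private to this iteration */
        for (long i = start; i < end; i++)
            lens[i - start] = strlen(fields[i]);
        #pragma omp critical(string_cache)
        {
            for (long i = start; i < end; i++)
                intern(fields[i], lens[i - start]);
        }
        locks += 1;
    }
    return locks;
}
```

In both variants the interning itself stays serialized, which is consistent with the observation that an all-character file with mostly unique values sees no benefit from extra threads; the second variant only reduces how often the lock is taken.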

The data looks like this:

> ans1
                                      A                    B           C  D E    F                     G                          H
      1: jptokakysooopwtmlkeimzbgpeinhy jptokakysooopwtmlkei kakysooopwt pt i kyso ptokakysooopwtmlkeimz tokakysooopwtmlkeimzbgpein
      2: bchguwmynjhecsxpxldyzlemavmwvz bchguwmynjhecsxpxldy uwmynjhecsx ch y mynj chguwmynjhecsxpxldyzl hguwmynjhecsxpxldyzlemavmw
      3: qbbudwlbdbzclzrbeimpkqttmexkzl qbbudwlbdbzclzrbeimp dwlbdbzclzr bb p lbdb bbudwlbdbzclzrbeimpkq budwlbdbzclzrbeimpkqttmexk
      4: zjcigiqbtlzuyletkrxwxfnbztfvzb zjcigiqbtlzuyletkrxw giqbtlzuyle jc w qbtl jcigiqbtlzuyletkrxwxf cigiqbtlzuyletkrxwxfnbztfv
      5: vsgohkcriagjumqrgdqhjkmidvaker vsgohkcriagjumqrgdqh hkcriagjumq sg h cria sgohkcriagjumqrgdqhjk gohkcriagjumqrgdqhjkmidvak
     ---                                                                                                                           
 999996: befdnqijswvcuvjrsphhmpebsqdwrx befdnqijswvcuvjrsphh nqijswvcuvj ef h ijsw efdnqijswvcuvjrsphhmp fdnqijswvcuvjrsphhmpebsqdw
 999997: oojlsntpsqpnciagractbmueyiscem oojlsntpsqpnciagract sntpsqpncia oj t tpsq ojlsntpsqpnciagractbm jlsntpsqpnciagractbmueyisc
 999998: evmvltzyiawiogbiqdtywldifogdbt evmvltzyiawiogbiqdty ltzyiawiogb vm y zyia vmvltzyiawiogbiqdtywl mvltzyiawiogbiqdtywldifogd
 999999: fqkjvuugtolthhagcyrxwzybtsixuc fqkjvuugtolthhagcyrx vuugtolthha qk x ugto qkjvuugtolthhagcyrxwz kjvuugtolthhagcyrxwzybtsix
1000000: uxehrqlmsrveremxfxiyyozzbiqlgy uxehrqlmsrveremxfxiy rqlmsrverem xe y lmsr xehrqlmsrveremxfxiyyo ehrqlmsrveremxfxiyyozzbiql

@arunsrinivasan arunsrinivasan commented Apr 1, 2017

Brilliant! Works fine now on Ubuntu 16.04, gcc v5.4.

But since my upgrade to MacOS 10.12.4, my brew installation of llvm results in package installation issues.

I get this warning on MacOS 10.12.4 (only with llvm clang) from fread.c:

fread.c:1391:31: warning: format specifies type 'void *' but the argument has type 'const char *' [-Wformat-pedantic]
                jump-1, jump, prevThreadEnd, STRLIM(prevThreadEnd,50), prevThreadEnd,
                              ^~~~~~~~~~~~~
/usr/include/secure/_stdio.h:57:62: note: expanded from macro 'snprintf'
  __builtin___snprintf_chk (str, len, 0, __darwin_obsz(str), __VA_ARGS__)
                                                             ^~~~~~~~~~~
1 warning generated.
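
For context, this kind of warning comes from passing a `const char *` where the `%p` conversion expects a void pointer. A minimal sketch of the usual remedy is an explicit cast at the call site. The function `describe_position` below is hypothetical and is not the code in fread.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes the address held in pos, followed by up to 10 characters of
   the text it points to, into buf. Returns what snprintf returns. */
int describe_position(char *buf, size_t buflen, const char *pos) {
    assert(buf != NULL && pos != NULL);
    /* %p expects a pointer to void, so cast explicitly; passing the
       const char * directly is what -Wformat-pedantic complains about. */
    return snprintf(buf, buflen, "%p '%.10s'", (const void *)pos, pos);
}
```

The printed form of `%p` is implementation-defined, so only the quoted preview is predictable across platforms.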

and this error:

Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/data.table/libs/datatable.so':
  dlopen(/Library/Frameworks/R.framework/Versions/3.3/Resources/library/data.table/libs/datatable.so, 6): Symbol not found: ___kmpc_barrier
  Referenced from: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/data.table/libs/datatable.so
  Expected in: flat namespace
 in /Library/Frameworks/R.framework/Versions/3.3/Resources/library/data.table/libs/datatable.so
Error: loading failed

I get a similar barrier error with gcc-6 as well.

I'll try again once I've managed to fix that (and updated the installation page on the wiki) and report back.

@arunsrinivasan arunsrinivasan commented Apr 1, 2017

Fixed the error with installation and updated wiki. Here are the new timings:

setDTthreads(4L)
system.time(ans1 <- fread(fname))
#    user  system elapsed 
#   8.574   0.209   2.412 
setDTthreads(2L)
system.time(ans2 <- fread(fname))
#    user  system elapsed 
#   4.339   0.100   2.287 
setDTthreads(1L)
system.time(ans3 <- fread(fname))
#    user  system elapsed 
#   2.809   0.119   2.935 

identical(ans1, ans2) # [1] TRUE
identical(ans1, ans3) # [1] TRUE

The warning still exists with llvm-clang.

@arunsrinivasan arunsrinivasan commented Apr 1, 2017

Aside: it's interesting that the user time is different with gcc when run with multiple threads, even though the task should've run under a single thread. Also tested on MacOS 10.12.4.

setDTthreads(4L)
system.time(ans1 <- fread(fname))
#    user  system elapsed 
#   3.026   0.108   2.795 
setDTthreads(2L)
system.time(ans2 <- fread(fname))
#    user  system elapsed 
#   2.988   0.098   2.750 
setDTthreads(1L)
system.time(ans3 <- fread(fname))
#    user  system elapsed 
#   2.875   0.128   3.044 

identical(ans1, ans2) # [1] TRUE
identical(ans1, ans3) # [1] TRUE

@dselivanov dselivanov commented Nov 7, 2017

EDIT: apologies for the disturbance; I think the issue is in my setup.

I just googled the Symbol not found: ___kmpc_barrier error and found this thread (maybe it's worth opening a separate issue).

@arunsrinivasan, @mattdowle: the issue is that, according to the wiki, the OpenMP flag -fopenmp needs to be set in the compiler definition (the CC line). This looks very strange. If the flag appears only on the CFLAGS line (where it is supposed to be), the build fails with the Symbol not found: ___kmpc_barrier error. I personally believe there is something wrong with a setup where -fopenmp is added to the compiler definition line instead of the standard CFLAGS.

@st-pasha st-pasha commented Nov 7, 2017

@dselivanov the -fopenmp flag has to be passed both to the compiler and to the linker. So the "right" way is to add -fopenmp to the CFLAGS and LDFLAGS variables (and perhaps to CXXFLAGS and CPPFLAGS as well).
However, since R uses CC both as the compiler and as the linker, adding the flag there works just as well (although it may be more confusing).
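
As a rough way to see the two roles of the flag, here is a minimal sketch with hypothetical function names. At compile time `-fopenmp` defines the `_OPENMP` macro and makes the compiler emit calls into the OpenMP runtime; at link time it pulls in the runtime library that provides those symbols. `__kmpc_barrier` is one such symbol in LLVM's runtime, which is why compiling with the flag but linking without it would be expected to end in a "symbol not found" error like the one above.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* only available/meaningful when compiled with OpenMP */
#endif

/* Returns the maximum thread count reported by the OpenMP runtime, or
   0 if this file was compiled without OpenMP support. */
int openmp_max_threads(void) {
#ifdef _OPENMP
    int n = omp_get_max_threads();  /* needs the runtime at link time */
    assert(n >= 1);
    return n;
#else
    return 0;
#endif
}

/* Prints a one-line summary of how this file was built. */
void print_openmp_status(void) {
    int n = openmp_max_threads();
    if (n > 0)
        printf("OpenMP enabled, up to %d thread(s)\n", n);
    else
        printf("compiled without OpenMP\n");
}
```

Built without the flag at all, the sketch falls back to reporting 0, so it compiles and links either way.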

@dselivanov dselivanov commented Nov 7, 2017

@st-pasha indeed adding -fopenmp to LDFLAGS helped. Thanks!
