
[8.100] Thread-safety of the read access to elements of sparse matrix #179

Closed
dselivanov opened this issue Oct 12, 2017 · 20 comments

@dselivanov
I have this chunk of code where I read elements of an arma::sp_mat sparse matrix from many threads. With the Armadillo 7.* series it worked fine; with the latest 8.100 it crashes with a weird backtrace. Any thoughts?

*** longjmp causes uninitialized stack frame ***: /usr/lib64/R/bin/exec/R terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fb6a6844d87]
/lib64/libc.so.6(+0x10fcad)[0x7fb6a6844cad]
/lib64/libc.so.6(__longjmp_chk+0x29)[0x7fb6a6844c09]
/usr/lib64/R/lib/libR.so(+0x2111c6)[0x7fb6a961b1c6]
/lib64/libc.so.6(+0x35270)[0x7fb6a676a270]
/lib64/libpthread.so.0(+0xe42b)[0x7fb6a6b0642b]
/lib64/libpthread.so.0(+0x9dcb)[0x7fb6a6b01dcb]
/lib64/libpthread.so.0(pthread_mutex_lock+0x68)[0x7fb6a6b01c98]
/usr/lib64/libjemalloc.so.1(+0x267c2)[0x7fb6a9a587c2]
/usr/lib64/libjemalloc.so.1(free+0x350)[0x7fb6a9a36640]
/home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(_ZNSt8_Rb_treeIySt4pairIKydESt10_Select1stIS2_ESt4lessIyESaIS2_EE8_M_eraseEPSt13_Rb_tree_nodeIS2_E+0x13a)[0x7fb68b7242aa]
/home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(_ZN4arma6MapMatIdEaSERKNS_5SpMatIdEE+0x10f)[0x7fb68b73afdf]
/home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(+0x306fb)[0x7fb68b7396fb]
/home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(_Z13dotprod_top_kRKN4arma3MatIdEES3_jjRN4Rcpp8NullableIKNS_5SpMatIdEEEE+0x4f3)[0x7fb68b73a5d3]
/home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(_reco_dotprod_top_k+0x2b5)[0x7fb68b71e095]
/usr/lib64/R/lib/libR.so(+0xdcedd)[0x7fb6a94e6edd]
/usr/lib64/R/lib/libR.so(+0x1106bd)[0x7fb6a951a6bd]
/usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fb6a952a218]
/usr/lib64/R/lib/libR.so(+0x12228f)[0x7fb6a952c28f]
/usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fb6a952a3d4]
/usr/lib64/R/lib/libR.so(+0x12409e)[0x7fb6a952e09e]
/usr/lib64/R/lib/libR.so(Rf_eval+0x585)[0x7fb6a952a605]
/usr/lib64/R/lib/libR.so(Rf_ReplIteration+0x232)[0x7fb6a9553e42]
/usr/lib64/R/lib/libR.so(+0x14a221)[0x7fb6a9554221]
/usr/lib64/R/lib/libR.so(run_Rmainloop+0x4f)[0x7fb6a95542df]
/usr/lib64/R/bin/exec/R(main+0x1b)[0x40080b]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb6a6756c05]

@dselivanov
Author

Thanks, tested, problem gone.

@eddelbuettel
Member

Good report, and a helpful and very timely answer.

Any idea how we could better document the need for const on our side? Time to start a FAQ vignette for RcppArmadillo, just as we do for Rcpp?
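For a future FAQ entry, a minimal sketch of the const point (an editorial illustration, not code from this thread): in Armadillo 8.x, non-const element access on sp_mat can update internal cache state -- the MapMat assignment visible in the backtrace above -- which is not safe to do from several threads, whereas access through a const reference selects a read-only overload.

#include <armadillo>

// const overload: intended as a pure read, avoids the writable proxy
// (names here are illustrative)
double read_elem(const arma::sp_mat& x, arma::uword i, arma::uword j) {
  return x(i, j);
}

// non-const overload: may mutate internal bookkeeping, so concurrent
// calls can crash with a backtrace like the one reported above
double risky_read(arma::sp_mat& x, arma::uword i, arma::uword j) {
  return x(i, j);
}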

@coatless
Contributor

'Tis always a good time to start an FAQ ;-)

@eddelbuettel
Member

Point 1: Don't buy a mac.

Point 2: Seriously, just don't.

Point 3: Reference to

@dselivanov
Author

Unfortunately I'm going to reopen this ticket because of two problems with random read access to sparse matrix elements (not sure what happened since last time, but I remember everything worked fine!):

  1. Significant single-thread performance degradation (0.7.960 -> 0.8.100), 10x and more
  2. Crash of the R session when multiple threads perform concurrent reads (0.8.100)

Minimal reproducible example (on a gist, and reproduced here):

#include <RcppArmadillo.h>
#include <queue>
#include <iostream>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

#define GRAIN_SIZE 10

using namespace Rcpp;
using namespace RcppArmadillo;
using namespace arma;

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
double test_spmat(const arma::sp_mat &x, IntegerVector I, IntegerVector J, int n_threads) {
  int *i_ptr = I.begin();
  int *j_ptr = J.begin();

  double sum = 0;
  #ifdef _OPENMP
  #pragma omp parallel for num_threads(n_threads) schedule(dynamic, GRAIN_SIZE) reduction(+:sum)
  #endif
  for(int k = 0; k < J.size(); k++) {
    // adjust to 0-based indexes
    int i = i_ptr[k] - 1;
    int j = j_ptr[k] - 1;
    sum += x(i, j);
  }
  return sum;
}
And the R script driving it:

library(Rcpp)
library(Matrix)

n = 100000
m = 10000
nnz = 0.001 * n * m
set.seed(1)

x = sparseMatrix(i = sample(n, nnz, T), j = sample(m, nnz, T), x = 1, dims = c(n, m))
i = sample(n, nnz * 10, T)
j = sample(m, nnz * 10, T) 

install.packages("~/Downloads/RcppArmadillo_0.7.960.1.2.tar.gz", repos = NULL, type = "source")
sourceCpp("~/Downloads/tst-arma.cpp", rebuild = T)

system.time(temp <- test_spmat(x, i, j, 1))
# user  system elapsed 
# 0.568   0.003   0.572
temp
# 9830
system.time(temp <- test_spmat(x, i, j, 4))
# user  system elapsed 
# 0.636   0.004   0.164 
temp
# 9830

install.packages("~/Downloads/RcppArmadillo_0.8.100.1.0.tar.gz", repos = NULL, type = "source")
sourceCpp("~/Downloads/tst-arma.cpp", rebuild = T)

system.time(temp <- test_spmat(x, i, j, 1))
# user  system elapsed 
# 6.199   0.037   6.253 
temp
# 9830

# this one crashes the R session
system.time(temp <- test_spmat(x, i, j, 4))

@dselivanov dselivanov reopened this Nov 8, 2017
@eddelbuettel
Member

Can you force a deep copy via clone() or similar into a new, non-R-allocated variable, and then try again?

That would essentially be the lesson from RcppParallel -- any R-owned memory object may get gc()-ed.
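A hedged sketch of the suggested experiment (function name is illustrative): copy-construct the matrix so the threads work on Armadillo-owned storage rather than memory that R's gc() could reclaim.

#include <RcppArmadillo.h>

// [[Rcpp::export]]
double read_from_copy(const arma::sp_mat& x, arma::uword i, arma::uword j) {
  const arma::sp_mat x2(x);   // deep copy into non-R-allocated storage
  return x2(i, j);            // subsequent reads touch only the copy
}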

@dselivanov
Author

dselivanov commented Nov 8, 2017

Same crash with const arma::sp_mat x2 = arma::sp_mat(x). And this doesn't explain the slow-down... so my guess is that this is not related to R's gc(). I have a feeling that something goes wrong inside Armadillo/RcppArmadillo.

PS I realize that in general element-by-element access to a sparse matrix should be avoided, but in my case, according to benchmarks, it wasn't the bottleneck (initially I planned to convert it to a hash map of triplets; see the sketch below).
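A hedged sketch of that triplet idea (hypothetical helper names, not code from this thread): extract the nonzeros once into a hash map keyed on (row, col) packed into a 64-bit integer, so random reads never go through sp_mat's element accessor.

#include <armadillo>
#include <unordered_map>
#include <cstdint>

// build the map once, single-threaded
std::unordered_map<std::uint64_t, double> to_triplet_map(const arma::sp_mat& x) {
  std::unordered_map<std::uint64_t, double> m;
  m.reserve(x.n_nonzero);
  for (arma::sp_mat::const_iterator it = x.begin(); it != x.end(); ++it) {
    // pack (row, col) into one key; fine while both indices fit in 32 bits
    std::uint64_t key = (std::uint64_t(it.row()) << 32) | std::uint64_t(it.col());
    m[key] = *it;
  }
  return m;
}

// read-only lookups are then safe from many threads; absent entries read as zero
inline double get(const std::unordered_map<std::uint64_t, double>& m,
                  arma::uword i, arma::uword j) {
  auto it = m.find((std::uint64_t(i) << 32) | std::uint64_t(j));
  return it == m.end() ? 0.0 : it->second;
}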

@eddelbuettel
Member

I can't help you here. There is nothing, as far as I can see, that the package does to get in the way.

"If it doesn't work, it doesn't work." Use an older (Rcpp)Armadillo or do something else. Multithreading and R require a lot of care.

I suggest we close this, and I would propose you work out whether a plain C++ example (no R) also crashes. In that case you need to talk to Conrad.

@dselivanov
Author

dselivanov commented Nov 8, 2017 via email

@eddelbuettel
Member

I was also thinking that... maybe the fact that you use threading, and that Conrad switched to more OpenMP use, means the two can get in each other's way?
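One way to test that hypothesis, sketched under the assumption that the two OpenMP layers interfere: Armadillo provides the config macro ARMA_DONT_USE_OPENMP to switch off its internal OpenMP paths, leaving only the user-level parallel loop.

// define before the first include so Armadillo's own OpenMP is disabled;
// if the crash then disappears, the two OpenMP layers were likely clashing
#define ARMA_DONT_USE_OPENMP
#include <RcppArmadillo.h>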

@dselivanov
Author

dselivanov commented Nov 8, 2017 via email

@kevinushey
Contributor

kevinushey commented Nov 8, 2017

FWIW it all runs fine on my macOS machine with clang-5.0 + sanitizers. I do have OpenMP disabled, though, and I'm not using or linking to the R-LLVM toolchain.

> system.time(temp <- test_spmat(x, i, j, 4))
   user  system elapsed 
  2.043   0.017   2.072 
> sessionInfo()
R Under development (unstable) (2017-10-12 r73548)
Platform: x86_64-apple-darwin17.0.0 (64-bit)
Running under: macOS High Sierra 10.13.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/lapack/3.7.1/lib/liblapack.3.7.1.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Matrix_1.2-11 Rcpp_0.12.13.2 testthat_1.0.2.9000 rmarkdown_1.6 knitr_1.17 roxygen2_6.0.1 devtools_1.13.3

loaded via a namespace (and not attached):
[1] xml2_1.1.1 magrittr_1.5 lattice_0.20-35 R6_2.2.2 RcppArmadillo_0.8.200.1.1
[6] rlang_0.1.2 stringr_1.2.0 tools_3.5.0 grid_3.5.0 withr_2.0.0
[11] htmltools_0.3.6 commonmark_1.4 yaml_2.1.14 rprojroot_1.2 digest_0.6.12
[16] crayon_1.3.4 memoise_1.1.0 evaluate_0.10.1 stringi_1.1.5 compiler_3.5.0
[21] backports_1.1.1

@dselivanov
Author

I think the problem is in Armadillo 0.8.100, because the code works fine with the latest 0.8.200.
Here is pure C++ code which works fine with Armadillo 0.8.200 and produces a segmentation fault with 0.8.100:

#include "include/armadillo"
#include <stdio.h>
#define GRAIN_SIZE 10

int main() {
  int n_threads = 4;
  int n = 1000000;
  int m = 10000;
  double nnz_prop = 0.001;
  int nnz = n * m * nnz_prop;
  const arma::sp_mat x = arma::sprandu(n, m, nnz_prop);
  arma::ivec i = arma::randi<arma::ivec>(nnz, arma::distr_param(0, n - 1));
  arma::ivec j = arma::randi<arma::ivec>(nnz, arma::distr_param(0, m - 1));

  double sum = 0;
  #pragma omp parallel for num_threads(n_threads) schedule(dynamic, GRAIN_SIZE) reduction(+:sum)
  for(int k = 0; k < nnz; k++) {
    sum += x.at(i[k], j[k]);
  }
  printf("%f\n", sum);
  return(0);
}

But it is still roughly 5-10x slower than 0.7.960.
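For completeness, a hedged workaround sketch (editorial, not from this thread): read the public CSC arrays col_ptrs, row_indices and values directly, which touches only immutable storage and bypasses whatever per-access bookkeeping changed between 7.x and 8.x. Assumes no thread modifies the matrix while others read.

#include <armadillo>
#include <algorithm>

inline double csc_at(const arma::sp_mat& x, arma::uword i, arma::uword j) {
  const arma::uword* begin = x.row_indices + x.col_ptrs[j];
  const arma::uword* end   = x.row_indices + x.col_ptrs[j + 1];
  // row indices within each column are stored sorted, so binary search works
  const arma::uword* it = std::lower_bound(begin, end, i);
  return (it != end && *it == i) ? x.values[it - x.row_indices] : 0.0;
}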

FYI @conradsnicta

@eddelbuettel
Member

That can happen. I'll get to 0.8.200.* when I have a moment.

@eddelbuettel
Member

@dselivanov I just pushed a new branch with 0.8.200.2.0 -- untested as of now -- but with small changes. I may have time to put it through the test harness tomorrow and then merge to master.

Feel free to experiment in the interim.

@dselivanov
Author

@eddelbuettel thank you, I can confirm that the code works fine with the 0.8.200.2.0 branch, so I think the part of this issue related to RcppArmadillo can be closed.

@conradsnicta I checked two different code chunks:

From the last pure C++ example:

On my system (OS X), single thread:
g++-7:

# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
real	0m7.398s
user	0m7.276s
sys	0m0.114s

# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
real	0m7.206s
user	0m7.087s
sys	0m0.114s

clang 4:

# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
real	0m9.685s
user	0m9.383s
sys	0m0.295s

# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
real	0m5.773s
user	0m5.647s
sys	0m0.118s

From my initial example:

clang 4:

# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
1 CORE
user  system elapsed 
5.838   0.050   5.892 

4 CORE
user  system elapsed 
6.223   0.052   5.573

# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
1 CORE
user  system elapsed 
0.677   0.004   0.682 

4 CORE
user  system elapsed 
1.842   0.006   0.466 

@conradsnicta I'm not sure why the difference is so huge in the second case. I can provide the sparse matrix in Matrix Market triplet format, along with the subsetting indices, if that would help. How can I help with the investigation?

@conradsnicta
Contributor

@dselivanov - I don't know what would be causing this. It works properly under gcc, so I suspect it's an issue with the OpenMP implementation in clang and/or macOS. Apple has been a bit iffy about providing OpenMP as part of the standard compiler on macOS, which also suggests that OpenMP in clang, or its interaction with macOS, is problematic.

@dselivanov
Author

dselivanov commented Nov 9, 2017 via email

@eddelbuettel
Member

The 0.8.200.2.0 release candidate looks good otherwise; I will merge it into master later, and probably prepare a drat release too.

@eddelbuettel
Member

FWIW the 0.8.200.2.0 tarball is now in the drat repo of the RcppCore organization so you can install it via your preferred R package tool just like prior pre-releases.
