[8.100] Thread-safety of the read access to elements of sparse matrix #179
Thanks, tested, problem gone. |
Good report, and helpful and very timely answer. Any idea how we could document the need for |
'Tis always a good time to start an FAQ ;-) |
Unfortunately I'm going to reopen this ticket because of 2 problems for random access to sparse matrix elements (not sure what happened since last time, but I remember everything worked fine!):
Minimal reproducible example on gist and here:
#include <RcppArmadillo.h>
#include <queue>
#include <iostream>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif
#define GRAIN_SIZE 10
using namespace Rcpp;
using namespace RcppArmadillo;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
double test_spmat(const arma::sp_mat &x, IntegerVector I, IntegerVector J, int n_threads) {
int *i_ptr = I.begin();
int *j_ptr = J.begin();
double sum = 0;
#ifdef _OPENMP
#pragma omp parallel for num_threads(n_threads) schedule(dynamic, GRAIN_SIZE) reduction(+:sum)
#endif
for(int k = 0; k < J.size(); k++) {
//adjust to 0-based indexes
int i = i_ptr[k] - 1;
int j = j_ptr[k] - 1;
sum += x(i, j);
}
return(sum);
}
library(Rcpp)
library(Matrix)
n = 100000
m = 10000
nnz = 0.001 * n * m
set.seed(1)
x = sparseMatrix(i = sample(n, nnz, T), j = sample(m, nnz, T), x = 1, dims = c(n, m))
i = sample(n, nnz * 10, T)
j = sample(m, nnz * 10, T)
install.packages("~/Downloads/RcppArmadillo_0.7.960.1.2.tar.gz", repos = NULL, type = "source")
sourceCpp("~/Downloads/tst-arma.cpp", rebuild = T)
system.time(temp <- test_spmat(x, i, j, 1))
# user system elapsed
# 0.568 0.003 0.572
temp
# 9830
system.time(temp <- test_spmat(x, i, j, 4))
# user system elapsed
# 0.636 0.004 0.164
temp
# 9830
install.packages("~/Downloads/RcppArmadillo_0.8.100.1.0.tar.gz", repos = NULL, type = "source")
sourceCpp("~/Downloads/tst-arma.cpp", rebuild = T)
system.time(temp <- test_spmat(x, i, j, 1))
# user system elapsed
# 6.199 0.037 6.253
temp
# 9830
# this one crashes the R session
system.time(temp <- test_spmat(x, i, j, 4)) |
Can you force a deep copy via That would essentially be the lesson from |
Same crash with PS: I realize that, in general, element-by-element access to a sparse matrix should be avoided, but in my case, according to benchmarks, it wasn't the bottleneck (initially I'd planned to convert it to a hash map of triplets). |
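The hash map of triplets mentioned above could look roughly like the following. This is a minimal sketch, not code from the thread: `Triplet`, `make_key`, `build_map` and `lookup` are hypothetical names, indices are assumed to fit in 32 bits, and duplicate (row, col) entries are summed as an arbitrary choice. Once built, the map is only read through `find`, a const member function that the C++ standard library allows to be called concurrently from several threads.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// One non-zero entry of the sparse matrix (hypothetical helper type).
struct Triplet {
    std::uint32_t row;
    std::uint32_t col;
    double value;
};

using TripletMap = std::unordered_map<std::uint64_t, double>;

// Pack a (row, col) pair into a single 64-bit key.
inline std::uint64_t make_key(std::uint32_t row, std::uint32_t col) {
    return (static_cast<std::uint64_t>(row) << 32) | col;
}

// Build the map once, single-threaded; duplicate coordinates are summed.
inline TripletMap build_map(const std::vector<Triplet>& triplets) {
    TripletMap map;
    map.reserve(triplets.size());
    for (const Triplet& t : triplets) {
        map[make_key(t.row, t.col)] += t.value;
    }
    return map;
}

// Read-only lookup; entries that are not stored read as zero.
inline double lookup(const TripletMap& map, std::uint32_t row, std::uint32_t col) {
    const auto it = map.find(make_key(row, col));
    return it == map.end() ? 0.0 : it->second;
}
```

The trade-off is memory: a hash map stores a key and bucket overhead per non-zero, which is typically more than a compressed column layout needs.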
I can't help you here. There is nothing as far as I can see that the package does to get in the way. "If it doesn't work, it doesn't work." Use an older (Rcpp)Armadillo or do something. Multithreading and R require a lot of care. I suggest we close this, and I would propose you work out if a plain C++ example (no R) also crashes. In which case you need to talk to Conrad. |
I will try to narrow down the problem and create a pure C++ example.
|
I was also thinking that ... maybe the fact that you use threading, and that Conrad switched to more OpenMP use, can get in each other's way? |
In a package I have PKG_CXXFLAGS = -DARMA_DONT_USE_OPENMP
|
FWIW it all runs fine on my macOS machine, with clang-5.0 + sanitizers. I do have OpenMP disabled though, and I'm not using or linking to the R-LLVM toolchain.
> sessionInfo()
R Under development (unstable) (2017-10-12 r73548)
Platform: x86_64-apple-darwin17.0.0 (64-bit)
Running under: macOS High Sierra 10.13.1
Matrix products: default locale: attached base packages: other attached packages: loaded via a namespace (and not attached): |
I think the problem is in Armadillo 0.8.100, because the code works fine with the latest 0.8.200.
#include "include/armadillo"
#include <stdio.h>
#define GRAIN_SIZE 10
int main() {
int n_threads = 4;
int n = 1000000;
int m = 10000;
double nnz_prop = 0.001;
int nnz = n * m * nnz_prop;
const arma::sp_mat x = arma::sprandu(n, m, nnz_prop);
arma::ivec i = arma::randi<arma::ivec>(nnz, arma::distr_param(0, n - 1));
arma::ivec j = arma::randi<arma::ivec>(nnz, arma::distr_param(0, m - 1));
double sum = 0;
#pragma omp parallel for num_threads(n_threads) schedule(dynamic, GRAIN_SIZE) reduction(+:sum)
for(int k = 0; k < nnz; k++) {
sum += x.at(i[k], j[k]);
}
printf("%f\n", sum);
return(0);
}
But it is still ~5-10x slower than 0.7.960. FYI @conradsnicta |
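A possible workaround for anyone pinned to an affected release is to bypass element access through `x(i, j)` and read the compressed sparse column (CSC) arrays directly. The sketch below is self-contained and uses a hypothetical `CscMatrix` struct rather than Armadillo itself; its `col_ptrs`, `row_indices` and `values` fields are named after the read-only members that Armadillo documents for `sp_mat`, and it assumes row indices are sorted within each column. Whether this avoids the crash or the slowdown seen in this thread is untested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only CSC container; illustrative only, not Armadillo's API.
struct CscMatrix {
    std::size_t n_rows;
    std::size_t n_cols;
    std::vector<std::size_t> col_ptrs;     // size n_cols + 1
    std::vector<std::size_t> row_indices;  // sorted within each column
    std::vector<double> values;            // same length as row_indices
};

// Return element (i, j) using a binary search inside column j.
// Only reads from the matrix, so no shared state is modified.
inline double csc_at(const CscMatrix& m, std::size_t i, std::size_t j) {
    assert(i < m.n_rows && j < m.n_cols);
    const auto first = m.row_indices.begin() + static_cast<std::ptrdiff_t>(m.col_ptrs[j]);
    const auto last = m.row_indices.begin() + static_cast<std::ptrdiff_t>(m.col_ptrs[j + 1]);
    const auto it = std::lower_bound(first, last, i);
    if (it == last || *it != i) {
        return 0.0;  // entry is not stored, so it is an implicit zero
    }
    return m.values[static_cast<std::size_t>(it - m.row_indices.begin())];
}

// Sum the elements at positions (I[k], J[k]), using 0-based indices.
// The loop is parallelised only when the compiler enables OpenMP.
inline double sum_elements(const CscMatrix& m,
                           const std::vector<std::size_t>& I,
                           const std::vector<std::size_t>& J) {
    assert(I.size() == J.size());
    double sum = 0.0;
    const long n = static_cast<long>(I.size());
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (long k = 0; k < n; ++k) {
        sum += csc_at(m, I[static_cast<std::size_t>(k)], J[static_cast<std::size_t>(k)]);
    }
    return sum;
}
```

Each lookup costs a binary search over the non-zeros of one column, so the per-element cost grows with the logarithm of the column's fill rather than with the size of the matrix.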
That can happen. I'll get to 0.8.200.* when I have a moment. |
@dselivanov I just pushed a new branch with 0.8.200.2.0 -- untested as of now -- but with small changes. I may have time to put it through the test harness tomorrow and then merge to master. Feel free to experiment in the interim. |
@eddelbuettel thank you, I can confirm that the code works fine with the 0.8.200.2.0 branch. @conradsnicta I checked two different code chunks:
From the last pure C++ example, on my system (OS X), single thread:
clang 4:
From my initial example:
clang 4:
@conradsnicta I'm not sure why the difference is so huge in the second case. I can provide the sparse matrix in Matrix Market triplet format, along with the subsetting indices, if that helps. How can I help with the investigation? |
@dselivanov - I don't know what would be causing this. It works properly under gcc, so I suspect it's an issue with the openmp implementation in clang and/or macOS. Apple has been a bit iffy about providing openmp as part of the standard compiler on macOS, which would also suggest that either openmp in clang and/or its interaction with macOS is problematic. |
But in certain cases single-thread performance also suffers. I will prepare a small self-contained example and test it on my Ubuntu machine.
|
The 0.8.200.2.0 release candidate looks good otherwise and I will merge that into master later, and probably prepare a drat release too. |
FWIW the 0.8.200.2.0 tarball is now in the |
I have this chunk of code where I read elements of an arma::sp_mat sparse matrix from many threads. With the Armadillo 7.* series it worked fine; with the latest 8.100 it crashes with some weird traceback. Any thoughts?