During my last dive into the sparse matrix code, I came across multiple uses of .push_back(...) when building vectors, e.g. around RcppArmadilloAs.h:178. It is used notably in the code converting dgt matrices to the default dgc type (and may be elsewhere too, I didn't check). Building vectors this way can require a large number of memory operations, which are known to be relatively slow. Yet there is already native Armadillo code for batch-building dgc matrices from ijv triplets. My tests show that the native method is much faster on big matrices: e.g. on a 3000x4000 randomly sparse matrix with about 8000 non-zero entries, the current code converts the matrix in 4.5 ms while native batch construction takes only 0.9 ms on my Intel Xeon E5-2609 v2 @ 2.50GHz.
I think we could rely on native batch construction not only for the speed but also simply to avoid code duplication.
Here is my test code:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

using namespace arma;

// [[Rcpp::export]]
arma::sp_mat asSpMat(SEXP S) {
  return Rcpp::as<arma::sp_mat>(S);
}

// [[Rcpp::export]]
arma::sp_mat dgt2spmatNative(Rcpp::S4 mat) {
  if (Rcpp::as<std::string>(mat.slot("class")) != "dgTMatrix")
    Rcpp::stop("matrix must be of dgTMatrix type");
  urowvec ti = mat.slot("i");
  urowvec tj = mat.slot("j");
  vec tx = mat.slot("x");
  Rcpp::IntegerVector dims = mat.slot("Dim");
  int nrow = dims[0];
  int ncol = dims[1];
  umat loc = join_cols(ti, tj);
  // sp_mat(add_values, locations, values, n_rows, n_cols, sort_locations = true, check_for_zeros = true)
  return sp_mat(true, loc, tx, nrow, ncol, true, false);
}
/*** R
library(Matrix)
library(Rcpp)
library(microbenchmark)
nrow = 3000
ncol = 4000
nnz = 2*ncol
i = sample(1:nrow, nnz, replace = TRUE)
j = sample(1:ncol, nnz, replace = TRUE)
set.seed(7)
x = rnorm(nnz)
x[1] = 0. # see if it's kept in conversions
dgt = sparseMatrix(i = i, j = j, x = x, dims = c(nrow, ncol), giveCsparse = FALSE)
(test = microbenchmark(
  dgc <- as(dgt, "dgCMatrix"),
  adgc <- asSpMat(dgt),
  ndgc <- dgt2spmatNative(dgt)
))
stopifnot(range(dgc - adgc) == c(0, 0))
stopifnot(range(dgc - ndgc) == c(0, 0))
*/
If you decide to go in this direction, I can prepare a patch request.
Thanks for pointing that out. When I wrote the dgt2dgc conversion, the push_backs were used to handle some corner cases, for example duplicate i and j indices. My bad that I was not aware that Armadillo can solve the problem itself. I tested your solution with several corner cases and it works for them, so a PR is welcome. I'll check other types like dtt or dst soon; it seems they can also be converted by native Armadillo code.