Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the problem in slot "p" in "dgCMatrix". close #149 #150

Merged
merged 1 commit into from
Aug 2, 2017

Conversation

thirdwing
Copy link
Member

Please don't merge this. I need to double check the definition of slot "p" and "col_ptrs".

Before this PR:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
//
using namespace Rcpp;
using namespace arma;

//[[Rcpp::export]]
sp_mat test() {
  return speye(4, 5);
}

/*
> Rcpp::sourceCpp("test.cpp")
> library(Matrix)
> test()
4 x 5 sparse Matrix of class "dgCMatrix"
Error in validObject(x) : 
  invalid class “dgCMatrix” object: slot p must be non-decreasing
*/

Now:

> Rcpp::sourceCpp("test.cpp")
> library(Matrix)
> test()
4 x 5 sparse Matrix of class "dgCMatrix"
              
[1,] 1 . . . .
[2,] . 1 . . .
[3,] . . 1 . .
[4,] . . . 1 .

@eddelbuettel
Copy link
Member

So ... from the example and tests it looks like you nailed this on the head? Read to merge now?

@dmbates
Copy link
Collaborator

dmbates commented Jul 24, 2017

I think it is peculiar that Armadillo allows for the column pointers to be decreasing in this case. You may want to check with the author if that is intentional.

It is quite common to have code that checks the last element of the column pointers (position n + 1 in R and index n in C++) to determine the number of non-zeros in the sparse matrix.

@thirdwing thirdwing force-pushed the iss149 branch 2 times, most recently from d8b4169 to 3a4b69a Compare July 24, 2017 18:08
@thirdwing
Copy link
Member Author

thirdwing commented Jul 24, 2017

There is some slight difference in column pointers between Armadillo and Matrix pkg in R.

In a matrix below:

3 x 5 sparse Matrix of class "dgCMatrix"
              
[1,] 1 . . . .
[2,] . 1 . . .
[3,] . . 1 . .

The p slot is 0 1 2 3 3 3 and the col_ptrs in Armadillo is 0 1 2 3 0 0.

I think this can be intentional since we only have three nonzero elements.

I will double check with the author.

@eddelbuettel
Copy link
Member

Any news on checking with Conrad or other experts on sparse matrices?

@thirdwing
Copy link
Member Author

The sparse matrix in armadillo is by Ryan (our MLPACK hero) and he is on travel.

@eddelbuettel
Copy link
Member

@thirdwing Where are we on this and the off-list reply from Ryan? What's next to move this forward?

@thirdwing
Copy link
Member Author

I am afraid we didn't get any reply from Ryan.

@eddelbuettel
Copy link
Member

So what is next? Is it conceivable that Armadillo and Matrix have different ideas about corner cases of the DSC (or other) formats? Shall we document that / create test cases?

@dmbates
Copy link
Collaborator

dmbates commented Aug 1, 2017

Every description of the compressed sparse row/column format that I have read states that the non-zeros for row/column j are in positions p[j] to p[j+1]-1 of the pointer array. See Tim Davis's book or Wikipedia

Another way of saying this is that, in 0-based indices the pointer array is recursively defined as

p[0] = 0
p[j] = p[j-1] + number of non-zeros in column j-1

It is common to determine the number of non-zeros in each column as the successive differences in the column pointers, which is why p is usually defined to be non-decreasing. As I mentioned, this leads to the last element of the column pointer vector being the number of non-zeros in the matrix.

To me the solution is to convince Conrad to modify the Armadillo representation so that it is in keeping with other sparse matrix representations.

@eddelbuettel
Copy link
Member

eddelbuettel commented Aug 1, 2017

Thanks for chiming in @dmbates. I am trying to find a private email thread where Ryan described the scheme to (to KK, who forwarded to me). I think it said the same.

@rcurtin Do you mind if I paste your definition in here?

@rcurtin
Copy link

rcurtin commented Aug 2, 2017

Sure, no problem from my end.

@eddelbuettel
Copy link
Member

From an earlier email by @rcurtin :

An answer to your two questions:

For a matrix generated by "arma::speye(3, 5)",

(1) Is the length of the col_ptrs always n_cols + 1? Even there are
only three nonzero elements?

Yes, this is the 'compressed sparse column' format. Technically in this
implementation, the length is always n_cols + 2.

The column pointers hold the index of the first element in each column.

So, col_ptrs[0] is always 0 (note below), col_ptrs[1] is the index of
the first nonzero element in column 1, and so forth. col_ptrs[n_cols]
is always n_nonzero.

col_ptrs[n_cols + 1] is a "sentinel" value used to terminate loops, and
should always be numeric_limits::max().
> Note: since col_ptrs[0] is always 0 and col_ptrs[n_cols] is always
n_nonzero, there's some very minor optimization that could be done. But
I've never considered it to be worthwhile.

(2) If the length is n_cols + 1, the content of col_ptrs is 0 1 2 3 0
0 . The column pointers are padded by zeros since three are only three
nonzero elements.

I'm not sure I understand what you mean, are you saying that you run
speye() and the resulting matrix has column pointers [0 1 2 3 0 0]?
I'm traveling right now, so any response to this will unfortunately take
a couple of days---but I will get back to it. :)

and by now we already have some new code to test thanks to @conradsnicta.

@thirdwing
Copy link
Member Author

I think this has been fixed by conradsnicta/armadillo-code@763ed1a

@eddelbuettel
Copy link
Member

That would be awesome. I haven't had a chance to look into the diff but I may then apply this to the pending RcppArmadillo release based on 7.960.0 (as well as all of the changes by @binxiangni).

@thirdwing
Copy link
Member Author

I have applied that patch locally. @eddelbuettel

@eddelbuettel
Copy link
Member

You mean you added it to this (pending) PR? That's very helpful.

And I presume the unit test fails without it, but now passes?

@thirdwing
Copy link
Member Author

I add that patch in this PR and it just fixed the problem and passed new tests.

@eddelbuettel eddelbuettel merged commit 9307804 into RcppCore:master Aug 2, 2017
@thirdwing thirdwing deleted the iss149 branch August 2, 2017 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants