New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the problem in slot "p" in "dgCMatrix". close #149 #150

Merged
merged 1 commit into from Aug 2, 2017

Conversation

Projects
None yet
4 participants
@thirdwing
Member

thirdwing commented Jul 23, 2017

Please don't merge this. I need to double check the definition of slot "p" and "col_ptrs".

Before this PR:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
//
using namespace Rcpp;
using namespace arma;

//[[Rcpp::export]]
sp_mat test() {
  return speye(4, 5);
}

/*
> Rcpp::sourceCpp("test.cpp")
> library(Matrix)
> test()
4 x 5 sparse Matrix of class "dgCMatrix"
Error in validObject(x) : 
  invalid class “dgCMatrix” object: slot p must be non-decreasing
*/

Now:

> Rcpp::sourceCpp("test.cpp")
> library(Matrix)
> test()
4 x 5 sparse Matrix of class "dgCMatrix"
              
[1,] 1 . . . .
[2,] . 1 . . .
[3,] . . 1 . .
[4,] . . . 1 .
@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Jul 24, 2017

Member

So ... from the example and tests it looks like you nailed this on the head? Read to merge now?

Member

eddelbuettel commented Jul 24, 2017

So ... from the example and tests it looks like you nailed this on the head? Read to merge now?

@dmbates

This comment has been minimized.

Show comment
Hide comment
@dmbates

dmbates Jul 24, 2017

Collaborator

I think it is peculiar that Armadillo allows for the column pointers to be decreasing in this case. You may want to check with the author if that is intentional.

It is quite common to have code that checks the last element of the column pointers (position n + 1 in R and index n in C++) to determine the number of non-zeros in the sparse matrix.

Collaborator

dmbates commented Jul 24, 2017

I think it is peculiar that Armadillo allows for the column pointers to be decreasing in this case. You may want to check with the author if that is intentional.

It is quite common to have code that checks the last element of the column pointers (position n + 1 in R and index n in C++) to determine the number of non-zeros in the sparse matrix.

@thirdwing

This comment has been minimized.

Show comment
Hide comment
@thirdwing

thirdwing Jul 24, 2017

Member

There is some slight difference in column pointers between Armadillo and Matrix pkg in R.

In a matrix below:

3 x 5 sparse Matrix of class "dgCMatrix"
              
[1,] 1 . . . .
[2,] . 1 . . .
[3,] . . 1 . .

The p slot is 0 1 2 3 3 3 and the col_ptrs in Armadillo is 0 1 2 3 0 0.

I think this can be intentional since we only have three nonzero elements.

I will double check with the author.

Member

thirdwing commented Jul 24, 2017

There is some slight difference in column pointers between Armadillo and Matrix pkg in R.

In a matrix below:

3 x 5 sparse Matrix of class "dgCMatrix"
              
[1,] 1 . . . .
[2,] . 1 . . .
[3,] . . 1 . .

The p slot is 0 1 2 3 3 3 and the col_ptrs in Armadillo is 0 1 2 3 0 0.

I think this can be intentional since we only have three nonzero elements.

I will double check with the author.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Jul 29, 2017

Member

Any news on checking with Conrad or other experts on sparse matrices?

Member

eddelbuettel commented Jul 29, 2017

Any news on checking with Conrad or other experts on sparse matrices?

@thirdwing

This comment has been minimized.

Show comment
Hide comment
@thirdwing

thirdwing Jul 29, 2017

Member

The sparse matrix in armadillo is by Ryan (our MLPACK hero) and he is on travel.

Member

thirdwing commented Jul 29, 2017

The sparse matrix in armadillo is by Ryan (our MLPACK hero) and he is on travel.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 1, 2017

Member

@thirdwing Where are we on this and the off-list reply from Ryan? What's next to move this forward?

Member

eddelbuettel commented Aug 1, 2017

@thirdwing Where are we on this and the off-list reply from Ryan? What's next to move this forward?

@thirdwing

This comment has been minimized.

Show comment
Hide comment
@thirdwing

thirdwing Aug 1, 2017

Member

I am afraid we didn't get any reply from Ryan.

Member

thirdwing commented Aug 1, 2017

I am afraid we didn't get any reply from Ryan.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 1, 2017

Member

So what is next? Is it conceivable that Armadillo and Matrix have different ideas about corner cases of the DSC (or other) formats? Shall we document that / create test cases?

Member

eddelbuettel commented Aug 1, 2017

So what is next? Is it conceivable that Armadillo and Matrix have different ideas about corner cases of the DSC (or other) formats? Shall we document that / create test cases?

@dmbates

This comment has been minimized.

Show comment
Hide comment
@dmbates

dmbates Aug 1, 2017

Collaborator

Every description of the compressed sparse row/column format that I have read states that the non-zeros for row/column j are in positions p[j] to p[j+1]-1 of the pointer array. See Tim Davis's book or Wikipedia

Another way of saying this is that, in 0-based indices the pointer array is recursively defined as

p[0] = 0
p[j] = p[j-1] + number of non-zeros in column j-1

It is common to determine the number of non-zeros in each column as the successive differences in the column pointers, which is why p is usually defined to be non-decreasing. As I mentioned, this leads to the last element of the column pointer vector being the number of non-zeros in the matrix.

To me the solution is to convince Conrad to modify the Armadillo representation so that it is in keeping with other sparse matrix representations.

Collaborator

dmbates commented Aug 1, 2017

Every description of the compressed sparse row/column format that I have read states that the non-zeros for row/column j are in positions p[j] to p[j+1]-1 of the pointer array. See Tim Davis's book or Wikipedia

Another way of saying this is that, in 0-based indices the pointer array is recursively defined as

p[0] = 0
p[j] = p[j-1] + number of non-zeros in column j-1

It is common to determine the number of non-zeros in each column as the successive differences in the column pointers, which is why p is usually defined to be non-decreasing. As I mentioned, this leads to the last element of the column pointer vector being the number of non-zeros in the matrix.

To me the solution is to convince Conrad to modify the Armadillo representation so that it is in keeping with other sparse matrix representations.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 1, 2017

Member

Thanks for chiming in @dmbates. I am trying to find a private email thread where Ryan described the scheme to (to KK, who forwarded to me). I think it said the same.

@rcurtin Do you mind if I paste your definition in here?

Member

eddelbuettel commented Aug 1, 2017

Thanks for chiming in @dmbates. I am trying to find a private email thread where Ryan described the scheme to (to KK, who forwarded to me). I think it said the same.

@rcurtin Do you mind if I paste your definition in here?

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Aug 2, 2017

Sure, no problem from my end.

rcurtin commented Aug 2, 2017

Sure, no problem from my end.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 2, 2017

Member

From an earlier email by @rcurtin :

An answer to your two questions:

For a matrix generated by "arma::speye(3, 5)",

(1) Is the length of the col_ptrs always n_cols + 1? Even there are
only three nonzero elements?

Yes, this is the 'compressed sparse column' format. Technically in this
implementation, the length is always n_cols + 2.

The column pointers hold the index of the first element in each column.

So, col_ptrs[0] is always 0 (note below), col_ptrs[1] is the index of
the first nonzero element in column 1, and so forth. col_ptrs[n_cols]
is always n_nonzero.

col_ptrs[n_cols + 1] is a "sentinel" value used to terminate loops, and
should always be numeric_limits::max().
> Note: since col_ptrs[0] is always 0 and col_ptrs[n_cols] is always
n_nonzero, there's some very minor optimization that could be done. But
I've never considered it to be worthwhile.

(2) If the length is n_cols + 1, the content of col_ptrs is 0 1 2 3 0
0 . The column pointers are padded by zeros since three are only three
nonzero elements.

I'm not sure I understand what you mean, are you saying that you run
speye() and the resulting matrix has column pointers [0 1 2 3 0 0]?
I'm traveling right now, so any response to this will unfortunately take
a couple of days---but I will get back to it. :)

and by now we already have some new code to test thanks to @conradsnicta.

Member

eddelbuettel commented Aug 2, 2017

From an earlier email by @rcurtin :

An answer to your two questions:

For a matrix generated by "arma::speye(3, 5)",

(1) Is the length of the col_ptrs always n_cols + 1? Even there are
only three nonzero elements?

Yes, this is the 'compressed sparse column' format. Technically in this
implementation, the length is always n_cols + 2.

The column pointers hold the index of the first element in each column.

So, col_ptrs[0] is always 0 (note below), col_ptrs[1] is the index of
the first nonzero element in column 1, and so forth. col_ptrs[n_cols]
is always n_nonzero.

col_ptrs[n_cols + 1] is a "sentinel" value used to terminate loops, and
should always be numeric_limits::max().
> Note: since col_ptrs[0] is always 0 and col_ptrs[n_cols] is always
n_nonzero, there's some very minor optimization that could be done. But
I've never considered it to be worthwhile.

(2) If the length is n_cols + 1, the content of col_ptrs is 0 1 2 3 0
0 . The column pointers are padded by zeros since three are only three
nonzero elements.

I'm not sure I understand what you mean, are you saying that you run
speye() and the resulting matrix has column pointers [0 1 2 3 0 0]?
I'm traveling right now, so any response to this will unfortunately take
a couple of days---but I will get back to it. :)

and by now we already have some new code to test thanks to @conradsnicta.

@thirdwing

This comment has been minimized.

Show comment
Hide comment
@thirdwing

thirdwing Aug 2, 2017

Member

I think this has been fixed by conradsnicta/armadillo-code@763ed1a

Member

thirdwing commented Aug 2, 2017

I think this has been fixed by conradsnicta/armadillo-code@763ed1a

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 2, 2017

Member

That would be awesome. I haven't had a chance to look into the diff but I may then apply this to the pending RcppArmadillo release based on 7.960.0 (as well as all of the changes by @binxiangni).

Member

eddelbuettel commented Aug 2, 2017

That would be awesome. I haven't had a chance to look into the diff but I may then apply this to the pending RcppArmadillo release based on 7.960.0 (as well as all of the changes by @binxiangni).

@thirdwing

This comment has been minimized.

Show comment
Hide comment
@thirdwing

thirdwing Aug 2, 2017

Member

I have applied that patch locally. @eddelbuettel

Member

thirdwing commented Aug 2, 2017

I have applied that patch locally. @eddelbuettel

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 2, 2017

Member

You mean you added it to this (pending) PR? That's very helpful.

And I presume the unit test fails without it, but now passes?

Member

eddelbuettel commented Aug 2, 2017

You mean you added it to this (pending) PR? That's very helpful.

And I presume the unit test fails without it, but now passes?

@thirdwing

This comment has been minimized.

Show comment
Hide comment
@thirdwing

thirdwing Aug 2, 2017

Member

I add that patch in this PR and it just fixed the problem and passed new tests.

Member

thirdwing commented Aug 2, 2017

I add that patch in this PR and it just fixed the problem and passed new tests.

@eddelbuettel eddelbuettel merged commit 9307804 into RcppCore:master Aug 2, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@thirdwing thirdwing deleted the thirdwing:iss149 branch Aug 2, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment