/ LAGraph Public

# Drop identity values problem #28

Closed
opened this issue Oct 15, 2019 · 25 comments

### simpletonDL commented Oct 15, 2019

Hello, I don't understand how to make GraphBLAS not write implicit zeroes (identity values). I found the following in the documentation:

"The entries in the pattern of A can take on any value, including the implicit value, whatever it happens to be. This differs slightly from MATLAB, which always drops all explicit zeros from its sparse matrices. This is a minor difference but it cannot be done in GraphBLAS."

What should I do if I want to always drop identity values after some operations? Below is a simple example of matrix multiplication that generates an identity (zero) value.

```
GrB_Matrix a, b;
GrB_Matrix_new(&a, GrB_INT64, 2, 2);
GrB_Matrix_new(&b, GrB_INT64, 2, 2);

// a(0,0) = 2, a(0,1) = -2
GrB_Matrix_setElement(a, 2, 0, 0);
GrB_Matrix_setElement(a, -2, 0, 1);

// b(0,0) = 1, b(1,0) = 1
GrB_Matrix_setElement(b, 1, 0, 0);
GrB_Matrix_setElement(b, 1, 1, 0);

// plus-times semiring over int64, with 0 as the additive identity
GrB_Monoid monoid;
GrB_Semiring semiring;
GrB_Monoid_new_INT64(&monoid, GrB_PLUS_INT64, (int64_t) 0);
GrB_Semiring_new(&semiring, monoid, GrB_TIMES_INT64);

GrB_Matrix matrix_new;
GrB_Matrix_new(&matrix_new, GrB_INT64, 2, 2);
GrB_mxm(matrix_new, GrB_NULL, GrB_NULL, semiring, a, b, GrB_NULL);
GxB_print(matrix_new, GxB_SHORT);
```

The output matrix contains one entry that is equal to zero:

```
...
row: 0 : 1 entries [0:0]
column 0: int64 0
```

In the real task, I need to use custom types and custom operations, but first I want to solve this small problem. Can you help me, please?
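For reference, the single stored entry follows from the arithmetic of the example, worked out from the `setElement` calls above (assuming the argument order is value, row, column, as in the GraphBLAS C API):

```latex
C(0,0) = a(0,0)\,b(0,0) + a(0,1)\,b(1,0) = (2)(1) + (-2)(1) = 0
```

No other position of the result receives a product, so the output holds exactly one entry, and its value happens to equal the additive identity of the monoid.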

### DrTimothyAldenDavis commented Oct 15, 2019 • edited

A very good question. It points out a feature of GraphBLAS: "zero" can differ depending on the semiring (in a path distance problem, for example, an edge of weight zero is very different from no edge at all). So zeros cannot be dropped automatically inside GraphBLAS. But there are cases when you do want to delete entries, like all explicit zeros. It takes a second step to delete entries from a matrix.

If you are using SuiteSparse:GraphBLAS, then you can use the following to drop explicit zeros from the GrB_Matrix A. This works for any matrix, including any user-defined type.

`GxB_select (A, NULL, NULL, GxB_NONZERO, A, NULL, NULL) ;`

GxB_select can also be used to drop any other particular value (or range of values, using, say, GxB_GT_ZERO, which keeps only those entries greater than zero, dropping values that are zero or less). GxB_GT_ZERO only works for the 11 built-in types, while GxB_NONZERO works for any type, including user-defined types. For user-defined types, it checks whether the bit pattern is all zero, and keeps the entries that have at least one 1 bit in them. So if your typedef is a struct with "holes" in it, this might not always work as expected.

If you are using another GraphBLAS library, you need to use the matrix as its own mask (assuming A has a built-in type, not a user-defined type).

`GrB_assign (A, A, NULL, A, GrB_ALL, nrows, GrB_ALL, ncols, Replace) ;`

where Replace is a descriptor with the replace option turned on.

If A has a user-defined type, you first have to create a boolean matrix M, where M(i,j) = 0 if A(i,j) is zero, or M(i,j) = 1 otherwise. That can be done with a user-defined typecast function, via GrB_apply:

```
void my_typecast_func (void *z, const void *x)
{
    // pseudocode: the test depends on what "zero" means for My_type
    bool result = 0 if x is zero, 1 if x is nonzero
    *((bool *) z) = result ;
}

GrB_UnaryOp_new (&My_typecast_function, my_typecast_func, GrB_BOOL, My_type) ;
GrB_Matrix_new (&M, GrB_BOOL, nrows, ncols) ;
GrB_apply (M, NULL, NULL, My_typecast_function, A, NULL) ;
GrB_assign (A, M, NULL, A, GrB_ALL, nrows, GrB_ALL, ncols, Replace) ;
```

(Technically speaking, all the "NULL"s above should be GrB_NULL ... but NULL works the same as GrB_NULL in SuiteSparse:GraphBLAS.)
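As an illustration of the typecast step, here is a minimal sketch for one possible user-defined type. The struct `my_type` and its fields are assumptions made up for this example, not anything from the thread; only the `void (void *, const void *)` function shape comes from the code above. The sketch compares fields rather than raw bytes, which sidesteps the padding "holes" caveat mentioned for the bit-pattern test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical user-defined type (an assumption for this sketch). An int8_t
// followed by a double usually leaves padding bytes between the two fields.
typedef struct
{
    int8_t flag ;
    double weight ;
} my_type ;

// Unary "typecast" to bool: z = false if x is zero, true otherwise.
// Fields are compared one by one, so padding bytes are never inspected.
void my_typecast_func (void *z, const void *x)
{
    assert (z != NULL && x != NULL) ;
    const my_type *a = (const my_type *) x ;
    bool result = (a->flag != 0) || (a->weight != 0.0) ;
    *((bool *) z) = result ;
}
```

A function of this shape could then be registered with `GrB_UnaryOp_new` as in the snippet above. What counts as "zero" for a real type is the author's call; the field-wise test here is only one choice.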

### tgmattso commented Oct 15, 2019 via email

You know, that GxB_select() is a darn useful function. We should add it to the next GraphBLAS release.

--tim

### DrTimothyAldenDavis commented Oct 15, 2019 via email

Yes, GxB_select is very useful. I used it for both MIT GraphChallenge solutions, and for some parts of LAGraph. The triangle count needs the same as L=tril(A) in MATLAB (extract the lower triangular part). That is tricky to do in pure GraphBLAS. You can't do it with a mask. The only way to do it is with GrB_extractTuples, and then delete the tuples you don't want. Tedious... I also needed it for the ReLU, to drop values that were less than or equal to zero. So it seems to be an important function. GxB_select acts kind of like a functional mask, which GraphBLAS doesn't have.

### gsvgit commented Oct 16, 2019

Hello. I have the same question. I clearly understand why it is not a good idea to remove zero values. But what if I explicitly specify zero as the identity in the monoid? In the path distance problem the identity is not zero, so we should not delete zeroes, but I think we could remove minus infinity, which is the identity there. So the question is about identities: is it possible to drop identities automatically during operations over sparse matrices?

### DrTimothyAldenDavis commented Oct 16, 2019 via email

A matrix doesn't remain in a single semiring in an algorithm. It can be used in multiple semirings, and there are several examples of this. So the value that isn't there is suddenly different: it changes with the semiring. As a result, it's impossible to automatically drop any values.
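A small sketch may make this concrete. It is plain C rather than GraphBLAS, and the names `entry` and `lookup` are assumptions invented for the illustration. The same stored data is read once with an implied value of 0 (as under plus-times) and once with an implied value of +infinity (as under min-plus).

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

// One stored entry of a sparse vector: its index and its value.
typedef struct { size_t index ; double value ; } entry ;

// Return the value at position i. Positions with no stored entry take the
// implied value, which is chosen by the caller (that is, by the semiring).
double lookup (const entry *v, size_t nvals, size_t i, double implied)
{
    assert (v != NULL || nvals == 0) ;
    for (size_t k = 0 ; k < nvals ; k++)
    {
        if (v [k].index == i) return v [k].value ;
    }
    return implied ;
}
```

With a single stored entry of value 0 at index 0, `lookup` returns 0 for both index 0 and index 1 when the implied value is 0, but returns 0 and `INFINITY` respectively when the implied value is `INFINITY`. The stored zero and the missing entry coincide in one reading and differ in the other, and nothing in the stored data says which reading applies.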

### ScottKolo commented Oct 16, 2019

 I agree that this is not something that should be done automatically, but it would be convenient to have a utility method or canonical way of doing it. The GxB_select approach seems to be that, so I also agree with the calls to get that into the standard (this application alone justifies it in my opinion). I think the usual argument here is that not dropping identity values in some cases could result in a lot of fill-in down the road, leading to performance issues. Maybe an LAGraph utility function would be a nice middle ground?

### DrTimothyAldenDavis commented Oct 16, 2019 via email

Yes, adding it to LAGraph would be a good idea. It would use an #ifdef so that GxB_select can be used if SuiteSparse:GraphBLAS is in use, and would use GrB* functions otherwise. I have a function in my MATLAB interface to do this as well, as A = GrB.prune (A). By default, it prunes zeros. To prune other values equal to the identity id, use A = GrB.prune (A, id).

### aydinbuluc commented Oct 16, 2019 via email

I am actually surprised that we managed to not include an easy way to do this in GraphBLAS. Prune always existed in CombBLAS (by now, for a decade).

### gsvgit commented Oct 16, 2019

Ah... I see. @DrTimothyAldenDavis thank you for the explanation! And even for an operation like `GrB_mxm`, where we have to specify the semiring, do we still not have enough information to drop the identities of the given semiring automatically? Suppose the following case.

1. I have sparse matrices `A` and `B` without explicit zero values.
2. I perform matrix multiplication `A * B` over a semiring where zero is the identity. The result is a matrix `C` in which the value of some cells is an explicit zero (because we do not drop it), and the value of some cells is an implicit zero. The first (and principal for me) question here is why the behavior of the operation does not agree with the specified semiring. The second is the one mentioned by @ScottKolo: such behavior can lead to poor performance.
3. Now I want to use `C` in an operation over a semiring in which zero is not the identity. And now I'm confused, because in terms of the result type all implicit and explicit zeros are equal, but in terms of the argument type (`C` is an argument of an operation over another semiring) implicit values and explicit zeros are different.

So, I guess that

1. The behavior of an operation with a specified semiring should agree with that semiring.
2. If I want to switch from one semiring to another, I should do it explicitly, by using the `select` function, for example.

### DrTimothyAldenDavis commented Oct 16, 2019 via email

Automatic dropping of zeros (say in MATLAB) is an awful thing to do. But it's perfect to add as a non-default option, where the user is able to prune things easily at any time. But it can't be done automatically, for many reasons:

First of all, it breaks the semirings in GraphBLAS. Switching between semirings causes all implicit values to change but not explicit values, so the explicit zero is never the same thing as an implicit entry that is not present in the pattern. The matrix has no tag that tells what semiring it's in, nor a tag to say what the implicit value is, so there's no select function to change a matrix from one semiring to another.

Second, it destroys all the graph theoretic structure in the resulting matrices. There are things I could do inside MATLAB, but I can't because it drops zeros all the time (MATLAB uses my solvers for x=A\b, and I also do C=A*B when A and/or B are sparse, inside MATLAB). In GraphBLAS, in the future, I could speed up GrB_mxm on a sequence of matrices with the same pattern, so the pattern of the result never changes. That way, I could cache the symbolic analysis and reuse it. Zoom ... but if you make me drop things, this breaks and I can't do it.

Third, it's slow. If a few zeros are in the matrix, it's faster to leave them there, and prune as needed. Changing the pattern of a matrix can cause a huge slowdown. Zombies are better for this (that's a long story... http://aldenmath.com/my-friendly-zombie/ ). (I should probably turn our discussion into a blog post there because this is a very important question.)

Fourth, there are times in GraphBLAS where you want to keep all zeros. GraphBLAS does not have a different object for dense or sparse matrices, as MATLAB does. There are times when dense is faster ... say a vector of size n that gives the depth of each node in a breadth-first search. That vector starts out sparse (empty, actually) and slowly accumulates entries until it becomes dense. But each time new entries get added, I have to redo the whole data structure (in my implementation). So it's far faster to start it dense, with explicit zeros (or whatever identity values it needs). In this case, any kind of automatic dropping is bad.

Fifth, it's unpredictable. Say the result is floating point epsilon, because of roundoff. So it is kept. But on another machine the result is zero. So you get a different combinatoric result depending on what your roundoff is, what your compiler -O flag is, what your compiler is, if you're in parallel or not, on the GPU or not ... ack. Now try to explore a bug where your pattern differs from what you expect. Turn on -g, and your bug goes away. Heisenbug. Nasty.

Having said all this, it is essential that some algorithms be able to drop entries that match some specific criterion, like "drop all zeros", "drop all nans", "drop all entries <= 0", and even "drop all entries that satisfy some condition determined by a function f (aij, i, j, m, n, thunk) where aij is the value, i and j are the indices, thunk is some user-defined 'scalar', etc". That can be used for all sorts of things, like L=tril(A) in MATLAB, which cannot be done easily in pure GraphBLAS.

So I absolutely agree that it needs to be simple to drop things. It just can never be done automatically.

-- Tim
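The roundoff point can be seen in a few lines of plain C. This is only a sketch with made-up helper names (`would_be_dropped`, `cancelling_sum`), and it assumes ordinary IEEE 754 double arithmetic. The expression is zero in exact arithmetic, yet the computed value is a tiny nonzero number, so an automatic "drop exact zeros" rule would keep the entry.

```c
#include <assert.h>
#include <stdbool.h>

// The test an automatic pruning rule would have to apply to each result.
bool would_be_dropped (double x)
{
    return x == 0.0 ;
}

// (0.1 + 0.2) - 0.3 is zero in exact arithmetic, but not in binary floating
// point. volatile discourages the compiler from folding the expression.
double cancelling_sum (void)
{
    volatile double a = 0.1, b = 0.2, c = 0.3 ;
    return (a + b) - c ;
}
```

Whether an entry like this survives then depends on evaluation order and precision, which is the kind of machine-dependent pattern described above.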

### mcmillan03 commented Oct 16, 2019 via email

(In reply to Aydin Buluc's comment above.) IIRC, it was because "it could be done with a combination of existing operations." GxB_select is in the list of issues to consider for inclusion in the next update to the spec. By the way, using something called GrB_select() to "remove" unwanted values from a matrix/vector is a bit counter-intuitive. "Prune" implies a heuristic which might be useful (especially supporting binaryops and scalar constants as one input). Soliciting ideas for names.

### mcmillan03 commented Oct 16, 2019 via email

(In reply to @gsvgit's comment above.) Mathematicians please chime in... what I am about to say is a secondhand explanation that was given to me years ago. Note that you are using a semiring, which does not define an additive inverse (e.g. "minus"). The production of a "zero" is happenstance (because an additive inverse operation occurred somewhere, either by adding a negated value or by subtraction, neither of which is part of the semiring). I would defer to the more mathematically inclined to correct my understanding.

### DrTimothyAldenDavis commented Oct 16, 2019 via email

GxB_select was named that way because it doesn't prune. It selects entries for the output. So for example, for the sparse deep neural network, to select only positive entries, I do GxB_select (A, ... , GxB_GT_ZERO ...), and to delete explicit zeros I do GxB_select (A, ... , GxB_NONZERO ...), which selects all nonzeros. GxB_select keeps all entries A(i,j) for which the selectop f (aij,i,j,m,n,thunk) is true, just as the mask M(i,j)=true selects the entry (i,j) to be written to the result. I'm open to other naming alternatives, though. I considered something with the word "mask" in it, but it acts differently than the mask so I avoided that name as potentially confusing.
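The "keep what the operator accepts" semantics described in this comment can be sketched in plain C over a tuple list. Nothing below is the GraphBLAS API: `coo_matrix`, `select_fn`, `coo_select`, `keep_nonzero` and `keep_tril` are hypothetical names, and the operator signature is simplified to (aij, i, j, thunk), leaving out the dimensions m and n.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// A sparse matrix stored as parallel (row, column, value) tuples.
typedef struct
{
    size_t *row ;
    size_t *col ;
    int64_t *val ;
    size_t nvals ;
} coo_matrix ;

// Select operator: returns true if the entry A(i,j) = aij is to be kept.
typedef bool (*select_fn) (int64_t aij, size_t i, size_t j, int64_t thunk) ;

// Keep entries whose value is not zero.
bool keep_nonzero (int64_t aij, size_t i, size_t j, int64_t thunk)
{
    (void) i ; (void) j ; (void) thunk ;
    return aij != 0 ;
}

// Keep entries on or below the thunk-th diagonal, like tril (A, thunk).
bool keep_tril (int64_t aij, size_t i, size_t j, int64_t thunk)
{
    (void) aij ;
    return (int64_t) j - (int64_t) i <= thunk ;
}

// Compact A in place, keeping only the entries accepted by f.
// Returns the number of entries that remain.
size_t coo_select (coo_matrix *A, select_fn f, int64_t thunk)
{
    assert (A != NULL && f != NULL) ;
    size_t kept = 0 ;
    for (size_t k = 0 ; k < A->nvals ; k++)
    {
        if (f (A->val [k], A->row [k], A->col [k], thunk))
        {
            A->row [kept] = A->row [k] ;
            A->col [kept] = A->col [k] ;
            A->val [kept] = A->val [k] ;
            kept++ ;
        }
    }
    A->nvals = kept ;
    return kept ;
}
```

Calling `coo_select (&A, keep_nonzero, 0)` plays the role of dropping explicit zeros, and `coo_select (&A, keep_tril, 0)` the role of L=tril(A). In both cases the operator names what stays, which matches the "select" reading of the name.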

### szarnyasg commented Oct 16, 2019 • edited

 Maybe `GxB_keep` would be a better name? The documentation says: "Each entry A(i,j) is evaluated with the operator, which returns true if the entry is to be kept in the output, or false if it is not to appear in the output." For me, this name works well for simple cases such as keeping the lower triangular part of a matrix. I'm not sure about the more complex cases (i.e. ones with a mask), though.
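
The "keep" semantics quoted above can be illustrated without GraphBLAS at all. The following is only a sketch in plain C over a coordinate-format entry list; `entry_t`, `keep_fn`, `select_entries`, `keep_lower` and `keep_nonzero` are hypothetical names made up for this illustration and are not part of any GraphBLAS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One stored entry of a sparse matrix in coordinate (COO) form. */
typedef struct {
    size_t  row;
    size_t  col;
    int64_t val;
} entry_t;

/* Predicate: return true to keep the entry, false to drop it. */
typedef bool (*keep_fn)(const entry_t *e);

/* Keep entries on or below the diagonal (lower triangular part). */
static bool keep_lower(const entry_t *e) { return e->col <= e->row; }

/* Keep entries whose stored value is not zero. */
static bool keep_nonzero(const entry_t *e) { return e->val != 0; }

/* Filter the entries in place, preserving order.
   Returns the number of entries that were kept. */
static size_t select_entries(entry_t *entries, size_t n, keep_fn keep)
{
    assert(entries != NULL || n == 0);
    assert(keep != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (keep(&entries[i])) {
            entries[kept++] = entries[i];
        }
    }
    return kept;
}
```

Calling `select_entries(e, n, keep_nonzero)` plays the role that dropping explicit zeros plays in the thread, and `keep_lower` corresponds to the lower-triangular case mentioned above. The predicate decides what stays, which is why both "select" and "keep" read naturally.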

### tgmattso commented Oct 16, 2019 via email

 GrB_keep() is a nice name, but I still like GrB_select() better. From an SQL point of view, I'm used to using SELECT to choose the items I want to pull into a table. So the name is quite intuitive to people with exposure to SQL.

 -Tim

### simpletonDL commented Oct 17, 2019 • edited

 Thank you very much for such a detailed explanation of this issue, it's great! I understand why automatically dropping zeros isn't allowed in GraphBLAS and why it would break the framework. So I'm not asking for this to be the default behaviour, but I would be very grateful if you could add a way to change an operation's behaviour (an entry in the operation descriptor, or something similar).

 Let me give a more realistic example where this would be very useful and where the select operation alone isn't enough. The problem appears when we change the graph dynamically, e.g. by adding edges step by step.

 Suppose we want to find all paths in a directed graph that satisfy some conditions. For simplicity, assume the conditions are predicates A, B, C, so each entry of the matrix G that represents the graph is a subset of {A, B, C}. A predicate P belongs to G[i][j] iff there is a path from node i to node j that satisfies P. We also have rules for joining two paths where the first ends at the vertex where the second starts. E.g. if some path from i to j satisfies predicate B and some path from j to k satisfies predicate C, then the path from i to k satisfies predicate A. These rules define a semiring whose elements are subsets of {A, B, C}; the multiplication applies all the rules to a pair of subsets. In the example above, {B} multiplied by {C} is {A}, but {C} multiplied by {C} is the empty set, because there is no such rule. The addition, you guessed it, is simply the union of subsets.

 At the start of the algorithm we initialize the matrix with some subsets (which ones doesn't matter), so the base case is the answer for paths of length 1. Then we multiply the matrix by itself and get the answer for paths of length 2. Then we take the union of the matrices for paths of length 1 and 2 and multiply again to get the answers for paths of length 3. And so on. I believe these iterations will converge :D

 In this algorithm I need to write my own function to implement the set multiplication. There are cases where the result is the empty set (when no rule applies). In the current version I have to store this empty set as an explicit value, although in the semantics of the algorithm it is equivalent to an implicit value that does not occur in the pattern: it means "no path between these vertices satisfies any of the predicates". This leads to serious trouble. The worst can happen right after the first matrix multiplication, because a huge number of explicit "zero" values appears. Even if we clear all unnecessary values after each operation using select, the zeros come back after every multiplication, and before the select runs they can, in the worst case, fill memory, spill into swap and get the process killed. This is a real-life case. The other problem is performance, because of the many unnecessary operations: first we have to add an unnecessary explicit value, and then we have to delete it. That seems a little strange.

 So it would be very nice to be able to change the behaviour of operations (make them drop unnecessary values), or to return a special value from a user-defined operator function, or something similar. This would make the library more flexible.

 In conclusion, I want to thank everyone! I am very pleased to take part in this conversation.
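
To make the subset semiring concrete, here is a minimal sketch in plain C. It assumes each subset of {A, B, C} is encoded as a bitmask and uses a dense 3-node matrix for brevity; the names `pset_t`, `rule_t`, `RULES`, `pset_add`, `pset_mul` and `closure_step` are hypothetical and only the single rule "B then C gives A" from the example is included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A subset of the predicates {A, B, C}, one bit per predicate. */
typedef uint8_t pset_t;
enum { P_EMPTY = 0, P_A = 1 << 0, P_B = 1 << 1, P_C = 1 << 2 };

/* A rule "left followed by right yields result". */
typedef struct { pset_t left, right, result; } rule_t;

/* The single rule from the example: B then C gives A. */
static const rule_t RULES[] = { { P_B, P_C, P_A } };
enum { NUM_RULES = sizeof RULES / sizeof RULES[0] };

/* "Addition": union of two subsets; its identity is P_EMPTY. */
static pset_t pset_add(pset_t x, pset_t y) { return (pset_t)(x | y); }

/* "Multiplication": apply every rule whose two sides are present. */
static pset_t pset_mul(pset_t x, pset_t y)
{
    pset_t out = P_EMPTY;
    for (int r = 0; r < NUM_RULES; r++) {
        if ((x & RULES[r].left) && (y & RULES[r].right)) {
            out = (pset_t)(out | RULES[r].result);
        }
    }
    return out;
}

enum { NUM_NODES = 3 };

/* One iteration G = G union (G * G). Returns 1 if G changed, else 0. */
static int closure_step(pset_t G[NUM_NODES][NUM_NODES])
{
    assert(G != NULL);
    pset_t next[NUM_NODES][NUM_NODES];
    int changed = 0;
    for (int i = 0; i < NUM_NODES; i++) {
        for (int j = 0; j < NUM_NODES; j++) {
            pset_t acc = G[i][j];
            for (int k = 0; k < NUM_NODES; k++) {
                acc = pset_add(acc, pset_mul(G[i][k], G[k][j]));
            }
            next[i][j] = acc;
            changed |= (acc != G[i][j]);
        }
    }
    for (int i = 0; i < NUM_NODES; i++) {
        for (int j = 0; j < NUM_NODES; j++) {
            G[i][j] = next[i][j];
        }
    }
    return changed;
}
```

In this encoding `pset_mul(P_C, P_C)` returns `P_EMPTY`, which is exactly the value that ends up stored explicitly when the same operator is plugged into a sparse matrix product. Repeating `closure_step` until it returns 0 is one way to express the iteration described above.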

### simpletonDL commented Oct 17, 2019

 The problem occurs exactly after GrB_mxm and before GxB_select, because the zeros reappear between these two steps. GxB_select can reduce the problem, but it can't solve it. If dropping identity values during the matrix multiplication itself (only for user-defined operations) is impossible for algorithmic reasons, I will understand.
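
The window between the multiplication and the select step can be seen on the 2-by-2 integer example from the start of this issue. The sketch below is plain C, not GraphBLAS: a dense array with a presence flag per position stands in for the sparse pattern, and `mat_t`, `multiply`, `count_entries` and `drop_zeros` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { DIM = 2 };

/* Dense values plus a presence flag per position model a sparse pattern. */
typedef struct {
    bool    present[DIM][DIM];
    int64_t val[DIM][DIM];
} mat_t;

/* c = a * b over (+, *). An entry of c exists wherever at least one
   product term exists, whatever value the sum ends up with. */
static void multiply(mat_t *c, const mat_t *a, const mat_t *b)
{
    assert(c != NULL && a != NULL && b != NULL);
    assert(c != a && c != b);
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            c->present[i][j] = false;
            c->val[i][j] = 0;
            for (int k = 0; k < DIM; k++) {
                if (a->present[i][k] && b->present[k][j]) {
                    c->present[i][j] = true;
                    c->val[i][j] += a->val[i][k] * b->val[k][j];
                }
            }
        }
    }
}

/* Number of entries in the pattern, explicit zeros included. */
static int count_entries(const mat_t *m)
{
    int n = 0;
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            n += m->present[i][j] ? 1 : 0;
    return n;
}

/* Second pass: remove entries whose value is zero. Returns how many. */
static int drop_zeros(mat_t *m)
{
    int dropped = 0;
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            if (m->present[i][j] && m->val[i][j] == 0) {
                m->present[i][j] = false;
                dropped++;
            }
        }
    }
    return dropped;
}
```

With a(0,0) = 2, a(0,1) = -2, b(0,0) = 1 and b(1,0) = 1, `multiply` produces one entry at (0,0) whose value is 2*1 + (-2)*1 = 0. That entry occupies storage until `drop_zeros` runs, which is the interval this comment is concerned about.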

### simpletonDL commented Oct 17, 2019

 In this case, I think, the issue can be closed. Thank you very much for your answer!

### simpletonDL commented Oct 17, 2019

 If the first phase of matrix multiplication computes the pattern of the result, then memory will be allocated for the identity values in any case. So I understand that there is no way to reduce memory use in this case (maybe it could work only for some special built-in types).

### DrTimothyAldenDavis commented Oct 17, 2019 via email

 I think there is a better solution, one that uses the predicate B as a mask. There's no need to drop zeros, just don't compute them in the first place (that is the purpose of the mask). In your earlier email, you wrote:

> The problem will appear if we want to change the graph dynamically, e.g. adding edges step by step. Suppose we want to find all paths in the directed graph which satisfy some conditions. For simplicity, let's assume that conditions are predicates A, B, C so the entry of the matrix G that corresponds to graph is a subset of {A, B, C}. So predicate P belongs to G[i][j] iff there is a path from node i to j, which satisfies the predicate P. Also, we have some rules, that allow merging two paths, whose final vertices coincide. E.g. if some path from i to j satisfies the predicate B and some path from j to k satisfy predicate C, then the path from i to k satisfies the predicate A. Thus these rules constitute the semiring, whose elements are subsets of {A, B, C} and binary multiplication operation corresponds to applying all rules to subsets. In the example above, {B} multiplied by {C} is {A}, but {C} multiplied by {C} is empty set, because there isn't such kind of rule. The addition operation, you guessed it, is a simple union of subsets.

 This sounds like the GraphBLAS operation G=A*C'. Let me ask the following. Is the following computation being done? I will write it as if it considers all i and j, but don't fear, this is not what I do. It is just the mathematical specification.

 Let A, B, C be square boolean matrices of dimension n. The matrix G will be n by n and is currently empty.

    for all i = 1 to n
        for all j = 1 to n
            if (B (i,j)) is true then
                for all k = 1 to n
                    G (i,j) = G (i,j) OR (A (i,k) AND C (j,k))

 If that is what you want to compute, then it is a very fast GrB_mxm computation, G = A*C'. I do not take O(n^3) to do the above computation. The above is just a simple mathematical specification of what is computed by the following:

    GrB_Descriptor_new (&desc) ;
    GrB_Descriptor_set (desc, GrB_IN1, GrB_TRAN) ;
    GrB_mxm (G, B, NULL, GxB_LOR_LAND_BOOL, A, C, desc) ;

 GxB_LOR_LAND_BOOL is the boolean semiring (LOR is its monoid, LAND its multiply operator). If instead you want A and C to be integer, and want to compute the following:

    for all i = 1 to n
        for all j = 1 to n
            if (B (i,j)) is true then
                for all k = 1 to n
                    G (i,j) = G (i,j) + (A (i,k) * C (k,j))

 then the computation is also fast, just with a different semiring.

 If A and C are stored in their default format, which is by row (CSR), then this will use my masked dot product internally. That function is very fast, very parallel, and very memory efficient. It will use no more than O (nne(B)) memory, where nne(B) is the number of explicit entries present in B. I do not need to transpose the matrix C to compute G=A*C'.

 If G is changing dynamically, then you might also consider using an accumulator operator, like G += A*C'.

 The matrix B(i,j) does not have to be boolean. It can be any built-in type, all of which are typecastable to bool. So in that case, the above pseudocode would read "if B(i,j) is nonzero then...".

 You may still want to drop zeros after the fact, if G(i,j) is computed yet becomes explicitly zero. That could happen if A or C have negative entries in them, or if there is an accum operator and G(i,j) starts out negative and then becomes zero after the accumulation occurs.

 Is this what you want to compute?
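
The mathematical specification above can be written out directly as a small reference routine. This is only a sketch in plain C on dense 3-by-3 boolean arrays, so it conflates "no entry" with "false" and says nothing about how a GraphBLAS library implements the masked product; `masked_lor_land` and `MDIM` are hypothetical names. It returns the number of dot products it evaluated, to show that the work is driven by the entries of the mask B.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MDIM = 3 };

/* Reference version of G<B> = A * C' over (OR, AND).
   Only positions where the mask B is true are ever computed.
   Returns the number of dot products that were evaluated. */
static int masked_lor_land(bool G[MDIM][MDIM], bool B[MDIM][MDIM],
                           bool A[MDIM][MDIM], bool C[MDIM][MDIM])
{
    assert(G != NULL && B != NULL && A != NULL && C != NULL);
    int dots = 0;
    for (int i = 0; i < MDIM; i++) {
        for (int j = 0; j < MDIM; j++) {
            if (!B[i][j]) {
                continue;               /* masked out: no work, no entry */
            }
            dots++;
            for (int k = 0; k < MDIM; k++) {
                if (A[i][k] && C[j][k]) {
                    G[i][j] = true;     /* OR is saturated, stop early */
                    break;
                }
            }
        }
    }
    return dots;
}
```

Positions outside the mask are never touched, so a product that would be nonzero there simply does not appear in G. That is the sense in which the mask avoids creating unwanted entries instead of deleting them afterwards.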

### simpletonDL commented Oct 18, 2019

 Oh, that's interesting. I'll think about it, try it out, and report back on whether it worked.

### johnrgilbert commented Oct 19, 2019

 This is a great discussion about an issue that's been interesting and important (and sometimes controversial) going back to KDT, CombBLAS, and even sparse Matlab in the early 1990s. I just planted a link to it on GraphBLAS.org :-)
