nnz and nonzeros #6769

eldadHaber · 2014-05-06T18:17:11Z

Someone has decided to change nnz and to remove the nonzeros command.

I find this very unacceptable!
It is a bad practice to not allow for backward compatibility.
Many of our codes use these commands and we have to go and replace them in all our previous versions.

We are investing a considerable amount of time, trying to develop codes in julia for geophysical inversion and truly believe that it is the way to go.
If new julia versions are not going to be backward compatible we will need to rethink
our development.

Please think about this whenever you want to retire a command

JeffBezanson · 2014-05-06T18:36:25Z

Sorry about this. You can add the following to your code:

const nnz = countnz
Base.nonzeros(s::SparseMatrixCSC) = nonzeros(s.nzval)

and then you won't need to change every use. Wrapping this in if !method_exists(nonzeros,(SparseMatrixCSC,)) will let the code work with both 0.2 and 0.3.

Personally I dislike the decision to allow explicit zeros in sparse matrices, but I think we should bring back the definition of nonzeros for sparse.

ViralBShah · 2014-05-06T18:56:06Z

I find myself wanting to use nnz every once in a while. While we can retain stored zeros, it would be nice to have nnz and nonzeros.

eldadHaber · 2014-05-06T19:01:30Z

My problem is not that changes are done (I'm also not a fan of this change but this is not a big deal).
My problem is backward compatibility.
We now have a relatively large investment in a julia library for geophysical inverse problems. If today a decision is made to retire a command tomorrow it may be a different one.
We cannot commit on building code when basic commands are changed.

My request for the future, please

Try to not retire a command
If you have a new command to do the same thing better, link the two (maybe with a recommendation to use the new one)
Documentation! - There is no documentation about nonzeros not in use. The first time
I got this is when I try to run a code that ran in the past

Thanks

tknopp · 2014-05-06T19:31:54Z

Unless Julia has version 1.0 I don't see the issue when the API is evolving. Yes this can be painful but trying to be backward compatible in all the 0.x releases will not lead to a consistent 1.0 release.

We all know that the documentation can be improved but this is a community project. There is nobody you can blame because of missing / wrong documentation. The project lives from participation.

Independently of this specific issues I think there will be more breaking changes on the way to 1.0. If this is not ok for you (which is absolutely understandable) you might better try a programming environment that is more stable (e.g. Matlab).

JeffBezanson · 2014-05-06T20:03:37Z

Please continue to file issues like this for specific changes that cause problems. There is always a chance something got removed by mistake, or that we will change our minds.

nonzeros seems to have been botched a bit --- in all other cases (including nnz) we add deprecation warnings that let code continue to work. Also adding your own aliases and work-arounds like those I posted above has no overhead, so in some cases that will be the right approach.

simonster · 2014-05-06T20:10:28Z

I think we just changed the method signature for nonzeros so that it only applies to StridedArray in c330b1d, hence the lack of a deprecation warning. The existing method was inefficient for sparse matrices, but it could be implemented efficiently by adding the definition above.

JeffBezanson · 2014-05-06T20:19:21Z

Yes, but there still should have been a deprecation warning or error for the removed functionality. It was an oversight.

tkelman · 2014-05-06T22:15:49Z

And a highlight of not-very-good test coverage for sparse matrices. When this goes back in it should have a test so accidental removal without a deprecation gets caught in the future.

JeffBezanson · 2014-05-06T22:29:33Z

Well, the removal wasn't accidental. The tests were updated, but not the deprecations. Better test coverage would of course help in general with this kind of thing though.

ViralBShah · 2014-05-07T03:40:07Z

Nnz does have a deprecation. That is how I keep getting reminded often that I like the name. Perhaps nonzeros may have been missed. Anyways, I will revisit this would matter soon and iron out the issues.

ViralBShah · 2014-05-07T09:23:45Z

@tkelman You are certainly right about the lack of tests and coverage for sparse matrices. Hoping to significantly improve sparse matrix support over the next few weeks.

ViralBShah · 2014-05-07T09:24:00Z

Cc: @mlubin @IainNZ

mlubin · 2014-05-07T12:28:09Z

I don't think we should restore nnz, since it's so tempting to use yet would be the wrong choice for the majority of code that works with sparse matrices. As @tkelman mentioned in #6769, the representation details of sparse matrices are very important for writing efficient code. Users should have to explicitly choose whether they want the storage size of the matrix or the exact number of nonzero elements, which requires looping through the matrix.

Sorry @eldadHaber for the churn. The deprecation of nnz resulted from a change in the structure of SparseMatrixCSC to allow explicit nonzero entries in the matrix, which is useful in a number of applications. This means that the exact number of nonzeros cannot be computed in constant time anymore. We believed that an explicit warning would be better than silently changing an O(1) operation into a O(nonzeros) operation in users code.

mlubin · 2014-05-07T15:01:06Z

Rereading the comments in #6769, I'll defer to @JeffBezanson if the consensus is that it's better to silently change the performance of nnz.

JeffBezanson · 2014-05-07T15:22:32Z

My view is based on looking at the other nnz (now countnz) methods, which clearly count elements such that x!=0. Of course representation details are important for performance, but nfilled is not a drop-in replacement for nnz --- it doesn't mean the same thing.

mlubin · 2014-05-07T15:33:02Z

Right, it's not a drop in replacement, so the deprecation calls countnz. My point is that the name nnz is misleading for sparse matrices, because almost anyone reading the code who hasn't read the Julia documentation would assume that this is a constant-time operation on sparse matrices. The countnz name is much more explicit and properly describes what the function is actually doing.

mlubin · 2014-05-07T15:34:07Z

It's an application of the principle of least surprise.

JeffBezanson · 2014-05-07T15:49:22Z

I consider the meaning of the operation primary, not its performance.
I guess I (admittedly not being a sparse expert) just don't see the value of trying to make people realize that nnz is O(n). If they need that function, there's nothing they can do about it, so what's the point?

Is it the case that knowing the actual number of !=0 elements is not useful, and 90% of the time people actually want the storage size? I suspect not, but if so then I might change my mind.

mlubin · 2014-05-07T15:55:53Z

In my experience when writing linear algebra operations with sparse matrices, nfilled is the quantity of interest, not nnz. Compare the number of times nfilled is used in base/sparse/*, which is 28 by my count, with number of times countnz is used, twice.

JeffBezanson · 2014-05-07T16:05:22Z

Maybe @eldadHaber can comment on how he uses nnz?

eldadHaber · 2014-05-07T17:42:30Z

I have a somewhat different perspective about nnz.
First, for real and complex matrices, it does not make sense to me to ask the question
nonzeos(A) != 0

As we teach in any numerical analysis course, zero is not really 0. For real matrices
we should be reporting
abs(nonzeros(A)) <= tol

So I fail to see what is the fuss about this command.
When I use nnz(A) I want to know what is the storage that is roughly needed for some variable.
The number of "true" nonzeros should be asked with a tol

mlubin · 2014-05-07T17:49:02Z

It seems that this perspective falls under the nfilled use case where you were using nnz to access the number of elements stored in the matrix, nonzero or otherwise, as opposed to counting the occurrence of exact nonzeros.

eldadHaber · 2014-05-07T17:54:11Z

I try to say that counting nonzeros (for real and complex matrices) does not make sense anyway. Sure we can have this functionality but it is misleading. If A has real values, most around 1 and 10% of the values are smaller than 1e-16 then you still report these values as nonzeros.

tkelman · 2014-05-07T17:58:58Z

@eldadHaber tolerancing and small numerical values are a separate question, that applies to any operation involving floating point numbers. Sparse algorithms are written in such a way that explicitly stored zero values don't change the output result (assuming you don't run into any overflow or nan).

I want the storage size when I'm writing a sparse algorithm - when explicit zeros need to be there, it's almost always for matrix structural reasons. For example Pardiso requires storage to already exist for elements on the diagonal - ideally these would be stored zeros when the language allows for them, but the Matlab interface to Pardiso requires hacky inaccurate workarounds like adding eps * speye(n,n). Prior to 2009 Petsc did something similar (see https://bitbucket.org/petsc/petsc/src/cb6910d74d53/bin/matlab/PetscBinaryWrite.m#cl-31) when saving sparse matrices from Matlab into its binary format, though slightly more reliably by using a small magic number and setting any element matching that magic number back to an explicit 0 before saving the binary file.

The Matlab interface to Ipopt is another case of this. The linear solver used within Ipopt, like the majority of sparse direct algorithms, separates the structure-based symbolic factorization allocation step from the value-based numerical factorization step. You need to evaluate the sparse Jacobian and Hessian matrices at every iteration, with the same sparsity structure but varying nonzero values. What happens if one of the "structurally nonzero" elements happens to have a value of exactly zero at some intermediate iteration? If your language automatically removes zeros from sparse matrices, you have to account for this by making a copy of the entire sparse matrix data structure at every iteration to put the nonzeros in the proper place. Bad times.

JeffBezanson · 2014-05-07T18:03:09Z

Thanks, that perspective is very helpful. In that case maybe we should use the general count function instead of either nnz or countnz. Or sum(abs(A) .>= tol) and the like.

mlubin · 2014-05-07T18:03:39Z

There was a lot of discussion on this in #5538, and I agree and have been arguing that countnz is a minority use case. That said, I don't think it's appropriate to have nnz return the value of nfilled, because, as @JeffBezanson mentioned, it's not the semantic meaning of the operation, which itself is general and is defined on other data structures. countnz could certainly be extended to use a numerical tolerance, but that's a separate issue from the function names.

eldadHaber · 2014-05-07T18:12:11Z

I think that the tolerance should be a used choice. I'm trying to construct an example to demonstrate some of these issue

JeffBezanson · 2014-05-07T18:22:57Z

It would make sense to add a tol argument to countnz.

What about nonzeros? It seems one might want it to just return the nzval array, and not filter out stored zeros. In that case it could use a rename as well.

timholy · 2014-05-07T18:23:05Z

@eldadHaber, just so you're aware: the "fuss" over this command arose because we, unlike Matlab, now allow you to store zeros in a sparse matrix. Check out the implications (following code is Matlab):

>> A = sprand(10^4, 10^4, 0.1);
>> [i,j] = ind2sub(size(A), find(A, 1));
>> tic; A(i,j) = 0.1; toc
Elapsed time is 0.000443 seconds.
>> tic; A(i,j) = 0.0; toc
Elapsed time is 0.037815 seconds.

Notice the second timing is one hundred fold longer than the first: if you set a non-zero entry to zero, Matlab actually copies the entire data for the matrix, deleting that one from memory. That's really expensive. Matlab does this because it doesn't allow you to have any stored value of a sparse matrix be identically zero.

In contrast, we allow stored zeros. That will* make this kind of thing infinitely more efficient. But that means there's a difference between the amount of storage required and how many entries are not identically zero. Given that matlab only has the nnz command, we realized we had better differentiate these two meanings.

But breaking old code is a no-no, and usually we are reasonably good about avoiding that because of the deprecation framework. But it seems to have failed in this case.

*Currently it looks like sparsematrix/setindex! needs updating, so it stops caring about whether the supplied value is 0.

StefanKarpinski · 2014-05-21T15:14:11Z

You seem to be stuck in a posting loop, @jiahao.

lindahua · 2014-05-21T15:14:55Z

I am ok with either option 1 or 2. Option 3 is going to cause semantic inconsistencies, and a definitely no-go.

jiahao · 2014-05-21T15:15:19Z

Sorry, there appears to be a problem with my connection.

jiahao · 2014-05-21T15:34:49Z

Those conversations about why nonzeros won't give you the non-zeros of an array are going to be fun.

What if we hedge this by renaming nonzeros to nzs and define nz to be a structural nonzero element of a sparse array, which may have numerical value == 0?

mlubin · 2014-05-21T15:40:02Z

I'd vote for nzval, since that's the name of the member of SparseMatrixCSC.

ViralBShah · 2014-05-21T17:12:05Z

Option 1 is the right one I feel. For dense arrays, those methods can give suggestions to the user.

tkelman · 2014-05-21T19:25:18Z

Giving nonzeros a more specific name for its particular purpose just for sparse arrays seems fine. Besides, there are already other names for finding the numerically nonzero values of a general array: find and findnz.

tknopp · 2014-05-22T08:46:17Z

This is now officially the most important (commented) open Julia issue :-)

It seems that people agree that nnz is a specfic name for structural non-zeros and although nfilled is somwhat clearer, in my opinion nnz is ok as it is the "common" name and it also does not conflict with a generic name. For me the name nonzeros is too generic that it should be claimed by the sparse matrix semantics. So I am with @mlubin to name this nzval or nzvals (or nzentries?) Or simply let this issue open and use the direct field accessor to nzval for the moment.

But in any case, I don't see this issue a blocker for 0.3. I see it more as a proof that we have to shrink base and move these naming discussions into a standard library that potentially could be even decoupled from core/base Julia (not necessary technically but at least conceptually)

carlobaldassi · 2014-05-22T09:06:01Z

Ah so I'll add some noise too. As a casual sparse matrix user, I think using nz in the names for the internal structure is a little confusing with findnz, where the abbreviation really stands for non-zeros, and which is arguably a not-uncommon operation to perform on sparse matrices.

ViralBShah · 2014-05-23T04:46:33Z

We went with findnz because we decided long back to not have our find return the values along with the indices. It could certainly use a better name for cases not related to sparse matrices.

mlubin · 2014-05-23T04:49:17Z

Maybe findnz should be renamed to sparsetriplet or something along those lines?

Deprecate nfilled Deprecate nonzeros for StridedArray and BitArray find()/findnz() over sparse matrices no longer filters out stored zeros

Deprecate nfilled Deprecate nonzeros for StridedArray and BitArray find()/findnz() over sparse matrices no longer filters out stored zeros Update documentation for structural nonzeros.

Deprecate nfilled for SparseMatrixCSC. Deprecate nonzeros for StridedArray and BitArray. Update documentation for structural nonzeros.

Deprecate nfilled for SparseMatrixCSC. Deprecate nonzeros for StridedArray and BitArray. Deprecate nnz for StridedArray. Update documentation for structural nonzeros. Update NEWS

ViralBShah · 2014-05-27T04:32:13Z

Fixed by 8879b88

ViralBShah mentioned this issue May 7, 2014

nnz and nonzeros change #6761

Closed

ViralBShah pushed a commit that referenced this issue May 25, 2014

Reintroduce nnz and nonzeros as discussed in #6769

4fa5457

Deprecate nfilled Deprecate nonzeros for StridedArray and BitArray find()/findnz() over sparse matrices no longer filters out stored zeros

ViralBShah mentioned this issue May 25, 2014

RFC: Reintroduce nnz and nonzeros #6963

Merged

ViralBShah pushed a commit that referenced this issue May 26, 2014

Reintroduce nnz and nonzeros as discussed in #6769

90d0f0b

Deprecate nfilled for SparseMatrixCSC. Deprecate nonzeros for StridedArray and BitArray. Update documentation for structural nonzeros.

ViralBShah pushed a commit that referenced this issue May 26, 2014

Reintroduce nnz and nonzeros as discussed in #6769

4ac5e2b

Deprecate nfilled for SparseMatrixCSC. Deprecate nonzeros for StridedArray and BitArray. Update documentation for structural nonzeros.

ViralBShah pushed a commit that referenced this issue May 26, 2014

Reintroduce nnz and nonzeros as discussed in #6769

944f5ef

Deprecate nfilled for SparseMatrixCSC. Deprecate nonzeros for StridedArray and BitArray. Update documentation for structural nonzeros.

ViralBShah closed this as completed May 27, 2014

mlubin mentioned this issue Oct 26, 2014

SparseMatrix Interface #8720

Merged

stevengj mentioned this issue Feb 13, 2015

sparse-constructor does not always purge zeros #9928

Closed

jiahao mentioned this issue Mar 5, 2015

sparse setindex and stored zeros #9906

Closed

timholy mentioned this issue Apr 19, 2015

Add more documentation on array iteration #10902

Merged

Sacha0 mentioned this issue Jan 26, 2016

MIT-licensed sparse() parent method and expert driver #14798

Closed

Sacha0 mentioned this issue Jul 13, 2016

Make setindex! for sparse matrices and vectors not purge stored entries on zero assignment #17404

Merged

nnz and nonzeros #6769

nnz and nonzeros #6769

Comments

eldadHaber commented May 6, 2014

JeffBezanson commented May 6, 2014

ViralBShah commented May 6, 2014

eldadHaber commented May 6, 2014

tknopp commented May 6, 2014

JeffBezanson commented May 6, 2014

simonster commented May 6, 2014

JeffBezanson commented May 6, 2014

tkelman commented May 6, 2014

JeffBezanson commented May 6, 2014

ViralBShah commented May 7, 2014

ViralBShah commented May 7, 2014

ViralBShah commented May 7, 2014

mlubin commented May 7, 2014

mlubin commented May 7, 2014

JeffBezanson commented May 7, 2014

mlubin commented May 7, 2014

mlubin commented May 7, 2014

JeffBezanson commented May 7, 2014

mlubin commented May 7, 2014

JeffBezanson commented May 7, 2014

eldadHaber commented May 7, 2014

mlubin commented May 7, 2014

eldadHaber commented May 7, 2014

tkelman commented May 7, 2014

JeffBezanson commented May 7, 2014

mlubin commented May 7, 2014

eldadHaber commented May 7, 2014

JeffBezanson commented May 7, 2014

timholy commented May 7, 2014

StefanKarpinski commented May 21, 2014

lindahua commented May 21, 2014

jiahao commented May 21, 2014

jiahao commented May 21, 2014

mlubin commented May 21, 2014

ViralBShah commented May 21, 2014

tkelman commented May 21, 2014

tknopp commented May 22, 2014

carlobaldassi commented May 22, 2014

ViralBShah commented May 23, 2014

mlubin commented May 23, 2014

ViralBShah commented May 27, 2014