Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nnz and nonzeros #6769

Closed
eldadHaber opened this issue May 6, 2014 · 110 comments
Closed

nnz and nonzeros #6769

eldadHaber opened this issue May 6, 2014 · 110 comments
Assignees
Labels
domain:arrays:sparse Sparse arrays needs decision A decision on this change is needed
Milestone

Comments

@eldadHaber
Copy link

Someone has decided to change nnz and to remove the nonzeros command.

I find this very unacceptable!
It is a bad practice to not allow for backward compatibility.
Many of our codes use these commands and we have to go and replace them in all our previous versions.

We are investing a considerable amount of time, trying to develop codes in julia for geophysical inversion and truly believe that it is the way to go.
If new julia versions are not going to be backward compatible we will need to rethink
our development.

Please think about this whenever you want to retire a command

@JeffBezanson
Copy link
Sponsor Member

Sorry about this. You can add the following to your code:

const nnz = countnz
Base.nonzeros(s::SparseMatrixCSC) = nonzeros(s.nzval)

and then you won't need to change every use. Wrapping this in if !method_exists(nonzeros,(SparseMatrixCSC,)) will let the code work with both 0.2 and 0.3.

Personally I dislike the decision to allow explicit zeros in sparse matrices, but I think we should bring back the definition of nonzeros for sparse.

@ViralBShah
Copy link
Member

I find myself wanting to use nnz every once in a while. While we can retain stored zeros, it would be nice to have nnz and nonzeros.

@eldadHaber
Copy link
Author

My problem is not that changes are done (I'm also not a fan of this change but this is not a big deal).
My problem is backward compatibility.
We now have a relatively large investment in a julia library for geophysical inverse problems. If today a decision is made to retire a command tomorrow it may be a different one.
We cannot commit on building code when basic commands are changed.

My request for the future, please

  1. Try to not retire a command
  2. If you have a new command to do the same thing better, link the two (maybe with a recommendation to use the new one)
  3. Documentation! - There is no documentation about nonzeros not in use. The first time
    I got this is when I try to run a code that ran in the past

Thanks

@tknopp
Copy link
Contributor

tknopp commented May 6, 2014

Unless Julia has version 1.0 I don't see the issue when the API is evolving. Yes this can be painful but trying to be backward compatible in all the 0.x releases will not lead to a consistent 1.0 release.

We all know that the documentation can be improved but this is a community project. There is nobody you can blame because of missing / wrong documentation. The project lives from participation.

Independently of this specific issues I think there will be more breaking changes on the way to 1.0. If this is not ok for you (which is absolutely understandable) you might better try a programming environment that is more stable (e.g. Matlab).

@JeffBezanson
Copy link
Sponsor Member

Please continue to file issues like this for specific changes that cause problems. There is always a chance something got removed by mistake, or that we will change our minds.

nonzeros seems to have been botched a bit --- in all other cases (including nnz) we add deprecation warnings that let code continue to work. Also adding your own aliases and work-arounds like those I posted above has no overhead, so in some cases that will be the right approach.

@simonster
Copy link
Member

I think we just changed the method signature for nonzeros so that it only applies to StridedArray in c330b1d, hence the lack of a deprecation warning. The existing method was inefficient for sparse matrices, but it could be implemented efficiently by adding the definition above.

@JeffBezanson
Copy link
Sponsor Member

Yes, but there still should have been a deprecation warning or error for the removed functionality. It was an oversight.

@tkelman
Copy link
Contributor

tkelman commented May 6, 2014

And a highlight of not-very-good test coverage for sparse matrices. When this goes back in it should have a test so accidental removal without a deprecation gets caught in the future.

@JeffBezanson
Copy link
Sponsor Member

Well, the removal wasn't accidental. The tests were updated, but not the deprecations. Better test coverage would of course help in general with this kind of thing though.

@ViralBShah
Copy link
Member

Nnz does have a deprecation. That is how I keep getting reminded often that I like the name. Perhaps nonzeros may have been missed. Anyways, I will revisit this would matter soon and iron out the issues.

@ViralBShah
Copy link
Member

@tkelman You are certainly right about the lack of tests and coverage for sparse matrices. Hoping to significantly improve sparse matrix support over the next few weeks.

@ViralBShah
Copy link
Member

Cc: @mlubin @IainNZ

@mlubin
Copy link
Member

mlubin commented May 7, 2014

I don't think we should restore nnz, since it's so tempting to use yet would be the wrong choice for the majority of code that works with sparse matrices. As @tkelman mentioned in #6769, the representation details of sparse matrices are very important for writing efficient code. Users should have to explicitly choose whether they want the storage size of the matrix or the exact number of nonzero elements, which requires looping through the matrix.

Sorry @eldadHaber for the churn. The deprecation of nnz resulted from a change in the structure of SparseMatrixCSC to allow explicit nonzero entries in the matrix, which is useful in a number of applications. This means that the exact number of nonzeros cannot be computed in constant time anymore. We believed that an explicit warning would be better than silently changing an O(1) operation into a O(nonzeros) operation in users code.

@mlubin
Copy link
Member

mlubin commented May 7, 2014

Rereading the comments in #6769, I'll defer to @JeffBezanson if the consensus is that it's better to silently change the performance of nnz.

@JeffBezanson
Copy link
Sponsor Member

My view is based on looking at the other nnz (now countnz) methods, which clearly count elements such that x!=0. Of course representation details are important for performance, but nfilled is not a drop-in replacement for nnz --- it doesn't mean the same thing.

@mlubin
Copy link
Member

mlubin commented May 7, 2014

Right, it's not a drop in replacement, so the deprecation calls countnz. My point is that the name nnz is misleading for sparse matrices, because almost anyone reading the code who hasn't read the Julia documentation would assume that this is a constant-time operation on sparse matrices. The countnz name is much more explicit and properly describes what the function is actually doing.

@mlubin
Copy link
Member

mlubin commented May 7, 2014

It's an application of the principle of least surprise.

@JeffBezanson
Copy link
Sponsor Member

I consider the meaning of the operation primary, not its performance.
I guess I (admittedly not being a sparse expert) just don't see the value of trying to make people realize that nnz is O(n). If they need that function, there's nothing they can do about it, so what's the point?

Is it the case that knowing the actual number of !=0 elements is not useful, and 90% of the time people actually want the storage size? I suspect not, but if so then I might change my mind.

@mlubin
Copy link
Member

mlubin commented May 7, 2014

In my experience when writing linear algebra operations with sparse matrices, nfilled is the quantity of interest, not nnz. Compare the number of times nfilled is used in base/sparse/*, which is 28 by my count, with number of times countnz is used, twice.

@JeffBezanson
Copy link
Sponsor Member

Maybe @eldadHaber can comment on how he uses nnz?

@eldadHaber
Copy link
Author

I have a somewhat different perspective about nnz.
First, for real and complex matrices, it does not make sense to me to ask the question
nonzeos(A) != 0

As we teach in any numerical analysis course, zero is not really 0. For real matrices
we should be reporting
abs(nonzeros(A)) <= tol

So I fail to see what is the fuss about this command.
When I use nnz(A) I want to know what is the storage that is roughly needed for some variable.
The number of "true" nonzeros should be asked with a tol

@mlubin
Copy link
Member

mlubin commented May 7, 2014

It seems that this perspective falls under the nfilled use case where you were using nnz to access the number of elements stored in the matrix, nonzero or otherwise, as opposed to counting the occurrence of exact nonzeros.

@eldadHaber
Copy link
Author

I try to say that counting nonzeros (for real and complex matrices) does not make sense anyway. Sure we can have this functionality but it is misleading. If A has real values, most around 1 and 10% of the values are smaller than 1e-16 then you still report these values as nonzeros.

@tkelman
Copy link
Contributor

tkelman commented May 7, 2014

@eldadHaber tolerancing and small numerical values are a separate question, that applies to any operation involving floating point numbers. Sparse algorithms are written in such a way that explicitly stored zero values don't change the output result (assuming you don't run into any overflow or nan).

I want the storage size when I'm writing a sparse algorithm - when explicit zeros need to be there, it's almost always for matrix structural reasons. For example Pardiso requires storage to already exist for elements on the diagonal - ideally these would be stored zeros when the language allows for them, but the Matlab interface to Pardiso requires hacky inaccurate workarounds like adding eps * speye(n,n). Prior to 2009 Petsc did something similar (see https://bitbucket.org/petsc/petsc/src/cb6910d74d53/bin/matlab/PetscBinaryWrite.m#cl-31) when saving sparse matrices from Matlab into its binary format, though slightly more reliably by using a small magic number and setting any element matching that magic number back to an explicit 0 before saving the binary file.

The Matlab interface to Ipopt is another case of this. The linear solver used within Ipopt, like the majority of sparse direct algorithms, separates the structure-based symbolic factorization allocation step from the value-based numerical factorization step. You need to evaluate the sparse Jacobian and Hessian matrices at every iteration, with the same sparsity structure but varying nonzero values. What happens if one of the "structurally nonzero" elements happens to have a value of exactly zero at some intermediate iteration? If your language automatically removes zeros from sparse matrices, you have to account for this by making a copy of the entire sparse matrix data structure at every iteration to put the nonzeros in the proper place. Bad times.

@JeffBezanson
Copy link
Sponsor Member

Thanks, that perspective is very helpful. In that case maybe we should use the general count function instead of either nnz or countnz. Or sum(abs(A) .>= tol) and the like.

@mlubin
Copy link
Member

mlubin commented May 7, 2014

There was a lot of discussion on this in #5538, and I agree and have been arguing that countnz is a minority use case. That said, I don't think it's appropriate to have nnz return the value of nfilled, because, as @JeffBezanson mentioned, it's not the semantic meaning of the operation, which itself is general and is defined on other data structures. countnz could certainly be extended to use a numerical tolerance, but that's a separate issue from the function names.

@eldadHaber
Copy link
Author

I think that the tolerance should be a used choice. I'm trying to construct an example to demonstrate some of these issue

@JeffBezanson
Copy link
Sponsor Member

It would make sense to add a tol argument to countnz.

What about nonzeros? It seems one might want it to just return the nzval array, and not filter out stored zeros. In that case it could use a rename as well.

@timholy
Copy link
Sponsor Member

timholy commented May 7, 2014

@eldadHaber, just so you're aware: the "fuss" over this command arose because we, unlike Matlab, now allow you to store zeros in a sparse matrix. Check out the implications (following code is Matlab):

>> A = sprand(10^4, 10^4, 0.1);
>> [i,j] = ind2sub(size(A), find(A, 1));
>> tic; A(i,j) = 0.1; toc
Elapsed time is 0.000443 seconds.
>> tic; A(i,j) = 0.0; toc
Elapsed time is 0.037815 seconds.

Notice the second timing is one hundred fold longer than the first: if you set a non-zero entry to zero, Matlab actually copies the entire data for the matrix, deleting that one from memory. That's really expensive. Matlab does this because it doesn't allow you to have any stored value of a sparse matrix be identically zero.

In contrast, we allow stored zeros. That will* make this kind of thing infinitely more efficient. But that means there's a difference between the amount of storage required and how many entries are not identically zero. Given that matlab only has the nnz command, we realized we had better differentiate these two meanings.

But breaking old code is a no-no, and usually we are reasonably good about avoiding that because of the deprecation framework. But it seems to have failed in this case.

*Currently it looks like sparsematrix/setindex! needs updating, so it stops caring about whether the supplied value is 0.

@StefanKarpinski
Copy link
Sponsor Member

You seem to be stuck in a posting loop, @jiahao.

@lindahua
Copy link
Contributor

I am ok with either option 1 or 2. Option 3 is going to cause semantic inconsistencies, and a definitely no-go.

@jiahao
Copy link
Member

jiahao commented May 21, 2014

Sorry, there appears to be a problem with my connection.

@jiahao
Copy link
Member

jiahao commented May 21, 2014

Those conversations about why nonzeros won't give you the non-zeros of an array are going to be fun.

What if we hedge this by renaming nonzeros to nzs and define nz to be a structural nonzero element of a sparse array, which may have numerical value == 0?

@mlubin
Copy link
Member

mlubin commented May 21, 2014

I'd vote for nzval, since that's the name of the member of SparseMatrixCSC.

@ViralBShah
Copy link
Member

Option 1 is the right one I feel. For dense arrays, those methods can give suggestions to the user.

@tkelman
Copy link
Contributor

tkelman commented May 21, 2014

Giving nonzeros a more specific name for its particular purpose just for sparse arrays seems fine. Besides, there are already other names for finding the numerically nonzero values of a general array: find and findnz.

@tknopp
Copy link
Contributor

tknopp commented May 22, 2014

This is now officially the most important (commented) open Julia issue :-)

It seems that people agree that nnz is a specfic name for structural non-zeros and although nfilled is somwhat clearer, in my opinion nnz is ok as it is the "common" name and it also does not conflict with a generic name. For me the name nonzeros is too generic that it should be claimed by the sparse matrix semantics. So I am with @mlubin to name this nzval or nzvals (or nzentries?) Or simply let this issue open and use the direct field accessor to nzval for the moment.

But in any case, I don't see this issue a blocker for 0.3. I see it more as a proof that we have to shrink base and move these naming discussions into a standard library that potentially could be even decoupled from core/base Julia (not necessary technically but at least conceptually)

@carlobaldassi
Copy link
Member

Ah so I'll add some noise too. As a casual sparse matrix user, I think using nz in the names for the internal structure is a little confusing with findnz, where the abbreviation really stands for non-zeros, and which is arguably a not-uncommon operation to perform on sparse matrices.

@ViralBShah
Copy link
Member

We went with findnz because we decided long back to not have our find return the values along with the indices. It could certainly use a better name for cases not related to sparse matrices.

@mlubin
Copy link
Member

mlubin commented May 23, 2014

Maybe findnz should be renamed to sparsetriplet or something along those lines?

ViralBShah pushed a commit that referenced this issue May 25, 2014
Deprecate nfilled
Deprecate nonzeros for StridedArray and BitArray
find()/findnz() over sparse matrices no longer filters out stored zeros
ViralBShah pushed a commit that referenced this issue May 25, 2014
Deprecate nfilled
Deprecate nonzeros for StridedArray and BitArray
find()/findnz() over sparse matrices no longer filters out stored zeros
Update documentation for structural nonzeros.
ViralBShah pushed a commit that referenced this issue May 26, 2014
Deprecate nfilled
Deprecate nonzeros for StridedArray and BitArray
find()/findnz() over sparse matrices no longer filters out stored zeros
Update documentation for structural nonzeros.
ViralBShah pushed a commit that referenced this issue May 26, 2014
Deprecate nfilled for SparseMatrixCSC.
Deprecate nonzeros for StridedArray and BitArray.
Update documentation for structural nonzeros.
ViralBShah pushed a commit that referenced this issue May 26, 2014
Deprecate nfilled for SparseMatrixCSC.
Deprecate nonzeros for StridedArray and BitArray.
Update documentation for structural nonzeros.
ViralBShah pushed a commit that referenced this issue May 26, 2014
Deprecate nfilled for SparseMatrixCSC.
Deprecate nonzeros for StridedArray and BitArray.
Update documentation for structural nonzeros.
ViralBShah pushed a commit that referenced this issue May 26, 2014
Deprecate nfilled for SparseMatrixCSC.
Deprecate nonzeros for StridedArray and BitArray.
Deprecate nnz for StridedArray.
Update documentation for structural nonzeros.
Update NEWS
ViralBShah pushed a commit that referenced this issue May 26, 2014
Deprecate nfilled for SparseMatrixCSC.
Deprecate nonzeros for StridedArray and BitArray.
Deprecate nnz for StridedArray.
Update documentation for structural nonzeros.
Update NEWS
@ViralBShah
Copy link
Member

Fixed by 8879b88

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:arrays:sparse Sparse arrays needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests