Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster and simpler sprand for SparseMatrixCSC #30494

Merged
merged 5 commits into from Jan 5, 2019

Conversation

@abraunst
Copy link
Contributor

commented Dec 22, 2018

IMPORTANT: If you need to reproduce matrices prior to this change, your best bet is to use the old version of the sprand function that you can find here.

The code for sprand for SparseMatrixCSC picks nonempty columns first, and then fill them (with the conditional probability of being nonempty). This seems to be done for faster construction for very sparse matrices (so that empty columns are to be expected). I think however that this does not give a real advantage, because one still needs to fill the colptr array, even for empty columns. I implemented a simpler version, and seems to be twice as fast or more in most cases. Am I overlooking something? Some benchmarks:

for n=(10,100,1000), p=(1.0,1e-1,1e-2,1e-3,1e-4)
    @show(n,p)
    @show @benchmark sprand($n,$n,$p)
end

MASTER:


n\p 1.0 0.1 0.01 0.001 0.0001
n=10 2.458 μs 2.135 μs 830.200 ns 704.667 ns 688.295 ns
n=100 82.119 μs 56.133 μs 13.024 μs 2.480 μs 895.000 ns
n=1000 9.556 ms 3.858 ms 555.889 μs 152.445 μs 17.662 μs

PR:


n\p 1.0 0.1 0.01 0.001 0.0001
n=10 751.574 ns 835.169 ns 419.209 ns 355.906 ns 348.586 ns
n=100 45.407 μs 32.657 μs 4.680 μs 970.100 ns 586.613 ns
n=1000 4.732 ms 3.362 ms 340.277 μs 45.028 μs 7.101 μs

Raw data

EDIT: Some simple innest-loop optimization improves speed further:


n\p 1.0 0.1 0.01 0.001 0.0001
n=10 361.784 ns 414.073 ns 210.166 ns 170.565 ns 168.729 ns
n=100 24.113 μs 17.407 μs 2.821 μs 700.679 ns 405.546 ns
n=1000 3.373 ms 1.223 ms 134.972 μs 24.485 μs 5.988 μs

@ViralBShah ViralBShah added the sparse label Dec 22, 2018

@ararslan ararslan requested a review from andreasnoack Dec 23, 2018

@haampie

This comment has been minimized.

Copy link
Contributor

commented Dec 23, 2018

Would this be considered a breaking change as the same random seed produces different sparse matrices?

@ViralBShah

This comment has been minimized.

Copy link
Member

commented Dec 23, 2018

I don't understand (have to look closely) what impact this has on the produced distribution.

It is a breaking change, but it will be in stdlib.

@stevengj

This comment has been minimized.

Copy link
Member

commented Dec 24, 2018

At first glance, it seems like this will produce the same distribution and is indeed a lot simpler.

(I don't think we should be guaranteeing specific results from given random seeds, as otherwise we won't be able to change any of the random algorithms.)

@stevengj

This comment has been minimized.

Copy link
Member

commented Dec 24, 2018

(The original algorithm is from #6726; see also the discussion in #6708. My fault for not noticing that one could simply use randsubseq(1:m*n); I was too focused on the CSC format as a collection of columns.)

@abraunst

This comment has been minimized.

Copy link
Contributor Author

commented Dec 24, 2018

I don't understand (have to look closely) what impact this has on the produced distribution.

It is a breaking change, but it will be in stdlib.

Should be the same distribution (each of the n*m entries is nonzero independently with probability p). It outputs a different random matrix because it uses fewer random draws. In addition, it allocates only the output arrays (it needs no temp arrays).

@abraunst

This comment has been minimized.

Copy link
Contributor Author

commented Dec 24, 2018

At first glance, it seems like this will produce the same distribution and is indeed a lot simpler.

(I don't think we should be guaranteeing specific results from given random seeds, as otherwise we won't be able to change any of the random algorithms.)

I agree. Just for the record, the following passes (I can add this or a similar test if you think it's useful)

p=0.1;m=1000;n=2000; 
for s in 1:100
    Random.seed!(s); v=randsubseq(1:m*n,p); x=zeros(m,n); x[v].=rand(length(v)); 
    Random.seed!(s); a=sprand(m,n,p); 
    @test x == a
end
@tkf

This comment has been minimized.

Copy link
Contributor

commented Dec 24, 2018

FYI, here is Numpy's policy on RNG https://www.numpy.org/neps/nep-0019-rng-policy.html (which was recently weakened). It'd be nice if Julia also has a written policy for the RNG explaining exactly what kind of compatibility guarantee should be expected.

@abraunst

This comment has been minimized.

Copy link
Contributor Author

commented Dec 24, 2018

The following would fix #30502. Should I add it or better keep things separated?

--- a/stdlib/SparseArrays/src/sparsematrix.jl
+++ b/stdlib/SparseArrays/src/sparsematrix.jl
@@ -1383,7 +1383,7 @@ function _sparse_findprevnz(m::SparseMatrixCSC, i::Integer)
 end
 
 
-function _sprand(r::AbstractRNG, m::Integer, n::Integer, density::AbstractFloat, rfn)
+function sprand(r::AbstractRNG, m::Integer, n::Integer, density::AbstractFloat, rfn)
     (m < 0 || n < 0) && throw(ArgumentError("invalid Array dimensions"))
     0 <= density <= 1 || throw(ArgumentError("$density not in [0,1]"))
     j, colm  = 1, 0
@@ -1423,21 +1423,10 @@ julia> sprand(Float64, 3, 0.75)
   [3]  =  0.298614
    ```
 """
-function sprand(r::AbstractRNG, m::Integer, n::Integer, density::AbstractFloat,
-                rfn::Function, ::Type{T}=eltype(rfn(r,1))) where T
-    N = m*n
-    N == 0 && return spzeros(T,m,n)
-    N == 1 && return rand(r) <= density ? sparse([1], [1], rfn(r,1)) : spzeros(T,1,1)
-    _sprand(r,m,n,density,i->rfn(r,i))
-end
 
-function sprand(m::Integer, n::Integer, density::AbstractFloat,
-                rfn::Function, ::Type{T}=eltype(rfn(1))) where T
-    N = m*n
-    N == 0 && return spzeros(T,m,n)
-    N == 1 && return rand() <= density ? sparse([1], [1], rfn(1)) : spzeros(T,1,1)
-    _sprand(GLOBAL_RNG,m,n,density,rfn)
-end
+sprand(r::AbstractRNG, m::Integer, n::Integer, density::AbstractFloat, rfn::Function, ::Type{T}=eltype(rfn(r,1))) where T = sprand(r, Int(m), Int(n), density, i->rfn(r, i))
+
+sprand(m::Integer, n::Integer, density::AbstractFloat, rfn::Function, ::Type{T}=eltype(rfn(1))) where T = sprand(GLOBAL_RNG, m, n, density, rfn, T)
 
 truebools(r::AbstractRNG, n::Integer) = fill(true, n)

Additionally it would be nice to remove the T parameter from the versions accepting rfn, as you can just pass rfn returning the type you want (T would remain on the versions without rfn, of course).

@ViralBShah

This comment has been minimized.

Copy link
Member

commented Dec 25, 2018

A separate PR for #30502 would be better.

@abraunst

This comment has been minimized.

Copy link
Contributor Author

commented Dec 25, 2018

FYI, here is Numpy's policy on RNG https://www.numpy.org/neps/nep-0019-rng-policy.html (which was recently weakened). It'd be nice if Julia also has a written policy for the RNG explaining exactly what kind of compatibility guarantee should be expected.

An interesting read, thanks. Can't say that the policy pictured is completely clear-cut though (e.g. breaking changes are "allowed with caution"). In any case it seems to restrict the required stability to "core" RNGs rather than more high level such as this one. Would you agree?

@abraunst

This comment has been minimized.

Copy link
Contributor Author

commented Dec 25, 2018

A separate PR for #30502 would be better.

Ok, thanks. I'll wait to see the fate for this one first then.

@tkf

This comment has been minimized.

Copy link
Contributor

commented Dec 25, 2018

What I agree is that the performance improvement is awesome and we should have it (thanks for the PR!). What I'm not sure is to allow "non-core" random functions to change with less consideration. If Random.jl come up with some compatibility policy on versioning, I think it should be uniformly applied to any post-1.0 Julia packages defining random functions. In any case, I think an ideal situation would be that such libraries are completely decoupled from Julia's release cycle (though I know it's hard for Random.jl) so that "freezing" the random number algorithms per project can easily be done by a Project.toml. We can't do such decoupling at the moment so I guess I'll trust core devs decisions.

@StefanKarpinski

This comment has been minimized.

Copy link
Member

commented Jan 2, 2019

@abraunst

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

I've added some tests making output distribution of sprand crystal clear.

@ViralBShah

This comment has been minimized.

Copy link
Member

commented Jan 3, 2019

We should rebase after merging #30576 and make sure the sparse tests pass and then merge.

@ViralBShah

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

Can you rebase on master?

@abraunst abraunst force-pushed the abraunst:sprand branch from 9605324 to f6517ca Jan 4, 2019

@rfourquet rfourquet added the RNG label Jan 4, 2019

@rfourquet
Copy link
Contributor

left a comment

I don't really understand the former version and don't have time now to dive into it, but the proposed version here is straightforward (so much so that it doesn't requires any comment to understand the logic), which is a very good point in itself. As it's also generally faster, it seems like a no-brainer to merge this change... the question is when?! (wrt to non-breaking RNG policy)

@rfourquet

This comment has been minimized.

Copy link
Contributor

commented Jan 4, 2019

I'm of the impression that in the last year it was mentioned (more or less formally) more than once that streams of randomly generated numbers would be reproducible after Julia 1.0 is out, at least until stdlibs are versioned. I feel that merging this now would require a clear (official?) note on what is considered breaking wrt random numbers generation.

@ViralBShah

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

I don't think we ever made a promise around the structure of sparse random matrices. I think it is fine to merge.

@ViralBShah

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

Failure is unrelated.

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 9, 2019

From discussion with @fredrikekre in slack: I want to point out (which is suggested above and already discussed in oxfordcontrol/COSMO.jl#83) that this PR leads to different matrices, which causes some problems for us.

julia> versioninfo()
Julia Version 1.2.0-rc2.0
julia> using Random, SparseArrays
julia> Random.seed!(0); A=sprandn(3,3,0.2)
3×3 SparseMatrixCSC{Float64,Int64} with 4 stored entries:
  [3, 1]  =  1.57433
  [1, 2]  =  -0.688907
  [1, 3]  =  -0.762804
  [2, 3]  =  0.397482
​julia> versioninfo()
Julia Version 1.1.0
julia> using Random, SparseArrays
julia> Random.seed!(0); A=sprandn(3,3,0.2)
3×3 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 3]  =  0.586617

The proposed solution in COSMO.jl-discussion to hard-code the matrices is not optimal for us since we want to have a dynamic size. Our use-case is that we want to present randomly generated sparse matrices as examples in research papers, so that others easily can use the same examples. If the output of sprandn and other random generators are not considered stable, this becomes more complicated, i.e., we have to host a large number of files somewhere, the files need to loaded requiring MAT.jl or similar. Compare this to the sprandn-oneliner.

@jarlebring jarlebring referenced this pull request Jul 9, 2019
0 of 2 tasks complete
@stevengj

This comment has been minimized.

Copy link
Member

commented Jul 9, 2019

If the output of sprandn and other random generators are not considered stable, this becomes more complicated, i.e., we have to host a large number of files somewhere, the files need to loaded requiring MAT.jl or similar. Compare this to the sprandn-oneliner.

You could provide your own random-number generator. For your application, it sounds like you don't care too much about the statistical quality of the pseudorandom numbers, so you could supply e.g. a 32-bit linear congruential generator with a fixed seed in a few lines of code:

using Random
mutable struct LCG <: AbstractRNG
    state::UInt32
end
Base.rand(r::LCG, ::Type{UInt32}) =  r.state = 0x0019660d * r.state + 0x3c6ef35f
Base.rand(r::LCG, ::Random.SamplerTrivial{Random.UInt52{UInt64},UInt64}) = (rand(r, UInt32) % UInt64) + ((rand(r, UInt32) % UInt64) << 20)

at which point you can do e.g. sprand(LCG(0), 10^6, 10^6, 1e-5)

It might be good to have a package with a simple LCG random-number generator for applications such as this.

@stevengj

This comment has been minimized.

Copy link
Member

commented Jul 9, 2019

No, I take that back, even if the output of the RNG is stable, I'm don't think we want to guarantee the exact way in which the RNG is used (i.e. the exact sequence of calls) in a function like sprand (that was the change in this PR), so providing your own RNG doesn't help in your case.

It seems like you really need to just host the files (or your own algorithm to generate matrices). Depending on the precise algorithm used by sprand seems very problematic for reproducibility of a research paper.

(You could define your own file format with e.g. m, n, nnz, I, J, V stored in sequence, and provide a function to read it, pretty easily if you don't want to depend on JLD or similar.)

@StefanKarpinski

This comment has been minimized.

Copy link
Member

commented Jul 9, 2019

One could use an RNG to generate I, J, V vectors and define a sparse matrix in terms of that instead of using sprand and you could even wrap that up as your own mysprand and use it:

using Random, SparseArrays

function mysprand(rng::AbstractRNG, m::Int, n::Int, p::Real)
    N = round(Int, p*m*n)
    sparse(rand(rng, 1:m, N), rand(rng, 1:n, N), rand(rng, N), m, n, (a, b) -> a)
end

mutable struct LCG <: AbstractRNG
    state::UInt32
end
Base.rand(r::LCG, ::Type{UInt32}) =  r.state = 0x0019660d * r.state + 0x3c6ef35f
Base.rand(r::LCG, ::Random.SamplerTrivial{Random.UInt52{UInt64},UInt64}) = (rand(r, UInt32) % UInt64) + ((rand(r, UInt32) % UInt64) << 20)

S = mysprand(LCG(123), 10^5, 10^4, 0.0001)
@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 9, 2019

In our use-case: The mysprand code you provide would not be suitable in an applied math research paper of the type I work with. It's too long and complicated. I want to encourage readers to verify my simulations and maybe use it in their own research. The 0x3c6ef35f is scary even for experienced julia users.

I agree that you can generate the sparse matrices from rand or randn instead of sprandn in a less scary way. Still, I would slightly prefer to be able to say "the matrices were generated with sprandn(n,n,0.5)" rather than "the matrices were generated by sparse(rand(1:m, N), rand(1:n, N), rand(N), m, n, (a, b) -> a)" since it is also more technical, requires more julia knowledge and some understanding of the CSC-format.

@stevengj

This comment has been minimized.

Copy link
Member

commented Jul 9, 2019

The mysprand code you provide would not be suitable in an applied math research paper of the type I work with. It's too long and complicated.

Many journals these days allow you to post code as supplementary information attached to the paper. 11 lines of code is not long and complicated in such a context.

Still, I would slightly prefer to be able to say "the matrices were generated with sprandn(n,n,0.5)"

Yes, definitely you want to be able to clearly state the distribution of the test matrices that were used, and (our) sprand has a simple well-defined distribution that is a stable part of the API (unlike Matlab's). (However, many linear-algebra papers would probably want a different distribution of test matrices, e.g. with some guarantees about the condition number. Or just use test matrices from the NIST Matrix Market.)

But I really think you need to provide either code generating the matrix or post the matrix as a file if you want readers to be confident that they will be able reproduce exactly the same matrices and not just the same distributions, especially readers from many years in the future. Requiring us to forever produce exactly the same matrix for a given RNG is really too constraining on the implementation, and is not something you would want to rely on in a paper anyway.

@StefanKarpinski

This comment has been minimized.

Copy link
Member

commented Jul 9, 2019

The mysprand code you provide would not be suitable in an applied math research paper of the type I work with.

The body is literally only two lines. You could golf it down to one pretty easily, e.g.:

mysprand(rng::AbstractRNG, m::Int, n::Int, p::Real, N::Int=round(Int, p*m*n)) =
    sparse(rand(rng, 1:m, N), rand(rng, 1:n, N), rand(rng, N), m, n, (a, b) -> a)

The 0x3c6ef35f is scary even for experienced julia users.

You don't need the LCG if all you want is predictable sparse matrix construction, you can just use the default RNG, which lets you pare the mysprand definition down to this:

mysprand(m::Int, n::Int, p::Real, N::Int=round(Int, p*m*n)) =
    sparse(rand(1:m, N), rand(1:n, N), rand(N), m, n, (a, b) -> a)

If you want to simplify further, there's no need for type annotations:

mysprand(m, n, p, N=round(Int, p*m*n)) =
    sparse(rand(1:m, N), rand(1:n, N), rand(N), m, n, (a, b) -> a)

I have a hard time believing that this is too scary for anyone.

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

I have a hard time believing that this is too scary for anyone.

Thanks for the code. It was already clear to me that one can make it easier. In my case this is still too much (julia-specific) code to put in a numerical linear algebra paper. As a reader I would see it as a distraction and puts a disproportionate amount of attention on the generation of the example matrices. Even those two lines of code in a manuscript would be too much for my taste. Putting it as a separate attached file would also likely be frowned upon, considering it is just a random example.

Yes, definitely you want to be able to clearly state the distribution of the test matrices that were used, and (our) sprand has a simple well-defined distribution that is a stable part of the API (unlike Matlab's).

In my case, it is more important that the function is stable over julia versions. It is not so important what distribution is used. A stable distribution does not help me much if the way the sampling is done changes, since I want reprodicability.

I think we will not get anywhere further in this discussion and I don't think this is my decision. I can live with whatever the outcome. Whoever makes the decision should be aware that it might annoy some people who assumed sprand is stable (and hence used it in papers where it now cannot be modified) and that some unit tests in packages may fail (and need updates to pass on both 1.0 and 1.2). I doubt NonlinearEigenproblems.jl and COSMO.jl would be the only cases.

@abraunst

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2019

I have a hard time believing that this is too scary for anyone.

Thanks for the code. It was already clear to me that one can make it easier. In my case this is still too much (julia-specific) code to put in a numerical linear algebra paper. As a reader I would see it as a distraction and puts a disproportionate amount of attention on the generation of the example matrices. Even those two lines of code in a manuscript would be too much for my taste. Putting it as a separate attached file would also likely be frowned upon, considering it is just a random example.

Yes, definitely you want to be able to clearly state the distribution of the test matrices that were used, and (our) sprand has a simple well-defined distribution that is a stable part of the API (unlike Matlab's).

In my case, it is more important that the function is stable over julia versions. It is not so important what distribution is used. A stable distribution does not help me much if the way the sampling is done changes, since I want reprodicability.

I think we will not get anywhere further in this discussion and I don't think this is my decision. I can live with whatever the outcome. Whoever makes the decision should be aware that it might annoy some people who assumed sprand is stable (and hence used it in papers where it now cannot be modified) and that some unit tests in packages may fail (and need updates to pass on both 1.0 and 1.2). I doubt NonlinearEigenproblems.jl and COSMO.jl would be the only cases.

What about putting the matrix-generating code in your own small package and refer to it in the paper so people needing it for testing can just clone it and use it?

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

Requiring us to forever produce exactly the same matrix for a given RNG is really too constraining on the implementation, and is not something you would want to rely on in a paper anyway.

You are maybe right it's bad to rely on it. It is however not an unusual practice in my field where people have been relying on MATLAB. The generated matrices have not changed there for a very long time. This has been quite convenient. I understand if someone really needs an efficient sprandn, it outweighs this reason.

Unless: Do you mean that you would also be open to modify the randn function for the same reason in the future?

@KristofferC

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

Don't you say what julia version the code is run on?

Anyway, see this for a discussion on pretty much exactly this: https://www.numpy.org/neps/nep-0019-rng-policy.html

Especially:

The standard practice now for bit-for-bit reproducible research is to pin all of the versions of code of your software stack, possibly down to the OS itself. The landscape for accomplishing this is much easier today than it was in 2008. We now have pip. We now have virtual machines. Those who need to reproduce simulations exactly now can (and ought to) do so by using the exact same version of numpy. We do not need to maintain stream-compatibility across numpy versions to help them.

Our stream-compatibility guarantee has hindered our ability to make improvements to numpy.random. Several first-time contributors have submitted PRs to improve the distributions, usually by implementing a faster, or more accurate algorithm than the one that is currently there. Unfortunately, most of them would have required breaking the stream to do so. Blocked by our policy, and our inability to work around that policy, many of those contributors simply walked away

If you want reproducibility the way to do it is to give a manifest and a julia version.

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

Don't you say what julia version the code is run on?

Yes. I understand this means it is reproducible. If someone wants to try a different algorithm on my matrices, they would have to run an old version of julia. Doable, but this obstacle does not encourage others use my example. In NonlinearEigenproblems.jl we have benchmark examples which are randomly generated, for the only reason that one should be able to easily compare algorithms on identical examples. They were designed under the (apparently incorrect) assumption that the random number generators are always ǵenerating the same sequence indep of julia version .

Anyway, see this for a discussion on pretty much exactly this: https://www.numpy.org/neps/nep-0019-rng-policy.html

Nice. A corresponding documentation for julia random numbers would be nice.

In particular, somewhere it would be good to point out that unit tests should never use any random number generators since the sampling is not stable. This will cause problems for eg ArnoldiMethod.jl (the starting vectors are random).

@stevengj

This comment has been minimized.

Copy link
Member

commented Jul 10, 2019

unit tests should never use any random number generators since the sampling is not stable

Certainly you shouldn’t use tests that depend sensitively on the specific random sequence. It should be fine to use tests that are robust to different samples of the same distribution, but this is a good property to have for your algorithms anyway!

(You probably still want to fix the random seed, since if the tests are run millions of times you may get a very unlikely sample, such as a poorly conditioned matrix, that breaks your tests. But this caveat shouldn’t apply to changes in the random streams that occur a handful of times over the lifetime of Julia.)

@stevengj

This comment has been minimized.

Copy link
Member

commented Jul 10, 2019

(Similarly, if your paper shows test results from an sprand matrix, and I don’t get very similar results from most matrices drawn from the same distribution, then I question the validity of your result. In cases where the test matrix is supposed to be special, you need to post the matrix—it’s crazy to me to have a linear algebra paper where reproducibility depends on a particular language/library, much less a particular software version.)

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

Certainly you shouldn’t use tests that depend sensitively on the specific random sequence

You make this sound like this should have been obvious. I think these things can be very hidden and developers have not taken this into account. A policy document would have helped.

A github search gives approx 50 hits on randn in the stdlib tests. I doubt all of these tests would pass if you completely changed the sequence randn generates, and that's just in stdlib.

I could even argue in this PR: the jldoctest is wrong since it depends on the specific random sequence. See line 1487. Depends in a sensitive way on randn sequence. Consider a later PR which changes the randn-sequence because of efficiency. They will make the sprandn test fail:

```jldoctest; setup = :(using Random; Random.seed!(0))
julia> sprandn(2, 2, 0.75)
2×2 SparseMatrixCSC{Float64,Int64} with 2 stored entries:
  [1, 1]  =  0.586617
  [1, 2]  =  0.297336

Julia already got some bad reputation before 1.0 that things were changing all the time (mostly unjustified according to me). Changing the random sequence will break the code for some people. At the moment this change is not even in the NEWS-file.

@KristofferC

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

I could even argue in this PR: the jldoctest is wrong since it depends on the specific random sequence. See line 1487. Depends in a sensitive way on randn sequence. Consider a later PR which changes the randn-sequence because of efficiency. They will make the sprandn test fail

That's the intention of that test.

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

That's the intention of that test.

I don't think a sprandn unit test should test for valid changes in randn. If someone changes the sequence in randn (in package Base) you will obtain a failed test in sprandn (in package SparseArrays) although there is actually nothing wrong in either package, following the policy that the randn-sequence is allowed to change.

@stevengj

This comment has been minimized.

Copy link
Member

commented Jul 10, 2019

Documentation tests are different. If the sprand documentation shows a specific example, the example needs to be updated if the random stream changes. The doctest ensures that this is done.

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2019

Documentation tests are different. If the sprand documentation shows a specific example, the example needs to be updated if the random stream changes. The doctest ensures that this is done.

I think you are dismissing this too lightly. If SparseArrays was in a different repo and i.e. separate versioning cycles, how is that developer expected to make it pass on two julia versions that generate different randn-sequences? Note that sprandn calls randn. Maintaining different package versions for different julia version? If-statements on version numbers? Maybe acceptable considering its a rare situation. It's an example of a hidden complication.

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2019

I feel you are anyway missing my main point. To quote myself

Changing the random sequence will break the code for some people. At the moment this change is not even in the NEWS-file.

Independent of the jldoctest-example, I'm a bit surprised that nobody in this discussion has acknowledged that this can cause trouble for some developers, and that it may deserve to be mentioned in the julia 1.2 NEWS-file, at least. Just the fact that you detected that COSMO.jl's unit tests started fail, would have been enough to qualify as a mention in the NEWS-file in my opinion. A policy document would also reduce the friction, or just a reference to numpys policy document if that is accurate also for julia.

@jarlebring

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2019

(Personally, I don't even feel that strongly about this, although it will make this impression due to this long thread. It's not a big deal for us since I know how to fix the problems in our case and we will drop 1.0-support before our main release. I was more concerned that julia again can be criticized to make changes that break developers code. That criticism should be gone after 1.0.)

@KristofferC

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2019

Just the fact that you detected that COSMO.jl's unit tests started fail, would have been enough to qualify as a mention in the NEWS-file in my opinion.

I was more concerned that julia again can be criticized to make changes that break developers code. That criticism should be gone after 1.0.)

Any change at all has the possibility to "break" code (https://xkcd.com/1172/). The term "breaking code" is used when a change in julia makes code that is * confirming to the parts of Julia that the developers of Julia voluntarily determined should work in the same way in future releases *.

Even so, a NEWS entry for this as well as a policy for rng would/is likely a good idea.

@StefanKarpinski

This comment has been minimized.

Copy link
Member

commented Jul 11, 2019

I'm a bit surprised that nobody in this discussion has acknowledged that this can cause trouble for some developers, and that it may deserve to be mentioned in the julia 1.2 NEWS-file, at least.

It definitely needs to be in the NEWS file. If that's all you're asking for then certainly no one disagrees with that. The people here are acutely aware problems this may cause: we run the test suites of all registered Julia packages for new releases and make sure that they are fixed to pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.