Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make rand work with AbstractArray instead of only with Range #8309

Merged
merged 9 commits into from
Oct 1, 2014

Conversation

rfourquet
Copy link
Member

This is essentially two changes:

  1. the semi-public API to get a RandIntGen object generating random integers in 1:n is via a call to randintgen(n). This is to be able in particular to extend the mechanism to BigInt for which the existing type RandIntGen doesn't fit well (cf. fix infinite loop in rand(::Range(BigInt)) #8255). In the process, RandIntGen was simplified by making the first element of the range equal to 1: this was only used with this value AFAICS, and this was duplicating indexing logic as noted by @ivarne (who found that this change should not break packages in METADATA.jl). It seems this was first implemented in commit ab97911 by @lindahua: was there a use for the first element field .a that I overlooked?
  2. allow rand(r) to work for any indexable r, not only with ranges (e.g. rand(["head", "tail"]). It is really the continuation of commit 48f27bc, and overlaps a bit with rand vs. sample #6003. This change is so small that I think it is harmless, whatever is done wrt rand vs. sample #6003.

ps: I think the random.jl tests are not very safe as the seed is set the same at each run. Could the content of __init__ function from base/random.jl be made a public function named e.g. srand(), which could be used in the tests?

@@ -160,24 +160,24 @@ maxmultiple(k::Uint128) = div(typemax(Uint128), k + (k == 0))*k - 1
maxmultiplemix(k::Uint64) = convert(Uint64, div((k >> 32 != 0)*0x0000000000000000FFFFFFFF00000000 + 0x0000000100000000, k + (k == 0))*k - 1)

immutable RandIntGen{T<:Integer, U<:Unsigned}
a::T # first element of the range
k::U # range length or zero for full range
k::U # range length (or zero for full range)
u::U # rejection threshold
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still get any benefit from having both T and U as type parameters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The random generation logic is implemented only with 3 types (parameter U): Uint32, Uint64, Uint128, and the generation for all types (param. T) is implemented in term of those. An instance r of RandIntGen needs to know what type of number to produce, in a call like rand(r), this is given by T. It's probably possible to have only one type parameter, but I felt that was beyond the scope of this PR.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That explains it. Thanks!

@ivarne ivarne mentioned this pull request Sep 11, 2014
@ivarne
Copy link
Sponsor Member

ivarne commented Sep 11, 2014

This change looks good to me. Any objections?

@nolta
Copy link
Member

nolta commented Sep 14, 2014

Why are you removing the a field from RandIntGen? It seems harmless.

@rfourquet
Copy link
Member Author

The a field was harmless, but also appears useless to me, at least now: the only purpose of RandIntGen is to provide a random indice in 1:n to index an arbitrary range (and now any array-like collection, with the third commit of this PR). However I asked in the opening comment if there was a use for this field that I didn't catch.

@rfourquet rfourquet changed the title Rand range Make rand work with AbstractArray instead of only with Range Sep 14, 2014
@rfourquet
Copy link
Member Author

I updated the title of this PR as it had been set automatically according to the non very descriptive branch name.

@ViralBShah
Copy link
Member

Cc: @andreasnoack

@rfourquet
Copy link
Member Author

Actually, I also wanted to restrict the possibility for the field k to be zeroed to mean "full range". This feature was not used in RandIntGen constructors accepting a range (renamed randintgen here). I will do that in a new commit in this PR, which I could remove independently if not wanted.

@ivarne
Copy link
Sponsor Member

ivarne commented Sep 14, 2014

The a field increases complexity and likely decreases performance. It is not used in Packages, and all uses in base had a=1 has a prettier fix now.

I think that our rand(Type{Int}) familiy of functions removes the need for 0 = "full range" functinality.

@rfourquet
Copy link
Member Author

Updated PR. Yes rand{Integer}(T) is the way to get "full range" numbers. So the maxmultiple functions are slightly simplified by not handling k==0. I also rewrote maxmultiplemix in a less obscure way (I can revert that) and adjusted the condition to match that of rand{T}(g::RandIntGen{T,Uint64}) (that was not a bug however).

@simonster
Copy link
Member

I would like to switch the argument order for rand! so that the output comes first (#8246). Right now we can deprecate the old argument order and not break anything, but if we allow an AbstractArray as the first argument to rand! then switching the order later would break code.

@ivarne
Copy link
Sponsor Member

ivarne commented Sep 14, 2014

+1 for @simonster's suggestion. That will mean that we can't extend the signature for rand! to AbstractArray until 0.6, because of the deprecation period in 0.4 and a error period in 0.5 in order to ensure that everyone changes the order of arguments before we can give new meaning to the old signature.

@simonster
Copy link
Member

I don't think it's likely to cause problems if we simultaneously deprecate rand!(::Range, ::AbstractArray) and extend rand! to AbstractArray with the mutated argument first. It's not possible to mutate a Range, so whether you call rand!(::Range, ::AbstractArray) or rand!(::AbstractArray, ::Range) it's clear what you mean.

@rfourquet
Copy link
Member Author

@simonster I prefer your solution to do it now! however it's then more a breaking change forcing users to change their code all at once instead of having a depreciation warning.

So should I revert this changed rand! signature (delete, comment out, or maybe put it in a submodule to be able to use it), with a TODO note saying to apply the change after #8246 is resolved? I prefer that instead of having this PR depend on that issue.

@simonster
Copy link
Member

My point is that you can still have the deprecation warning. There is no valid signature for rand! with which it would conflict.

@rfourquet
Copy link
Member Author

OK, I missed that there would both be rand!(r::Range, A::AbstractArray) (mutates A, deprecated) and rand!(A::AbstractArray, r::AbstractArray) (mutates A).

@rfourquet
Copy link
Member Author

I removed rand!(A::AbstractArray, r::AbstractArray) (renamed to rand!(A::AbstractArray, r::AbstractArray, ::()) so it can still be used by rand. If this is an accepted temporary solution, I can rebase this branch to avoid introducing this method in the first place.

@rfourquet rfourquet force-pushed the rand_range branch 2 times, most recently from 78a1f14 to 91b58f7 Compare September 17, 2014 04:41
@rfourquet
Copy link
Member Author

Rebased.

RandIntGen{T<:Unsigned}(r::UnitRange{T}) = isempty(r) ? error("range must be non-empty") : RandIntGen(first(r), convert(T,last(r) - first(r) + 1))
# generator API
# randintgen(k) returns an object generating random integers in the range 1:k
randintgen{T<:Unsigned}(k::T) = k<1 ? error("range must be non-empty") : RandIntGen(T, k)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message should be more descriptive to the current situation where the range is gone.

Should the invariant (k>0) be checked in a inner constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivarne: you are right that the message is not ideal, but I didn't know the best way to change it. The invariant in an inner constructor seems like a good idea. From the user POV, the range is gone not because there is no an a field anymore, but because she can use other things (array-like) than ranges. The message could be "the collection passed to rand/rand! must not be empty" ("collection" is used in the docs).

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I did not suggest what the message should be. RandIntGen doesn't know there is a collection, so that feels wrong too. How about?

error("No integers in the range [1, $k]. Did you try to pick an element from a empty collection?")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this solution.

@ivarne
Copy link
Sponsor Member

ivarne commented Sep 28, 2014

Bump! Any objections to me merging this in a few days?

@ViralBShah
Copy link
Member

I think this looks good to merge. Perhaps give a day for @lindahua @andreasnoack to take a look?

@rfourquet
Copy link
Member Author

This is not ready for merging.

I rebased to accomodate with recent @JeffBezanson changes, so please review if I didn't mess up with convert/itrunc/reinterpret.

The tests don't pass now: the problem is that length(typemin(T):typemax(T))==0 for some values of T (e.g. T=Int8). I could re-enable special handling of "full-range" to make tests pass again, but I feel this problem must be addressed more generally, hence #8531. I'll wait for input to go ahead.

@rfourquet
Copy link
Member Author

Once the problem of length is fixed for "small" integers, my preferred solution would be to simply remove failing tests: for example the range r=typemin(Int):typemax(Int) is not fully indexable (there is no i such that r[i] returns the last element typemax(Int), and I don't think ranges should be handled in a special way in random.jl. Later on it would be possible to add a function length_wide which can always hold the length of the object (and can always index into this object if it's indexable), which would return e.g. an Int128 for Int ranges, and BigInt for Int128 ranges.

This is based on @ivarne idea
(cf. JuliaLang#8255 (comment)),
and continues commit 48f27bc in "avoiding duplicating the getindex logic".

The API to get a RandIntGen object is now to call randintgen(n).
This allows any Integer type to implement this function (e.g. BigInt).
Previously, a call like rand(big(1:10)) caused a stack overflow, it is now
a "no method matching" error, until randintgen(::BigInt) is implemented,
possibly using a new type similar to RandIntGen.
@mschauer
Copy link
Contributor

mschauer commented Oct 1, 2014

Sorry I missed the discussion, but I think

yes rand{Integer}(T) is the way to get "full range" numbers

is not the right perspective. The full range isn't "not really a range", in fact a lot of work was done to make full ranges working as other ranges. Also #5550.

@rfourquet
Copy link
Member Author

I don't think I said that a full range is "not really a range", and I want rand(typemin(Int):typemax(Int) to work again (and I plan to do so). In spirit, this PR was about going away from range-centric random generation, and the more general functionality relies now on length. So I think the way to solve the problem is to settle on a new function which can express the length of every AbstractArray, including ranges. I see 3 possibilities at the moment: length_wide mentioned above, a length_minus_1 function which can express the length of every non-empty ranges (so it would probably be of an unsigned type), or an index function which would return a UnitRange (at least for "dense indexables") giving the first and last valid index of a collection (this would allow to not exclude indexables with indexes starting at 0 or at a negative integer). The problem of "full range" must be solved generally, not only for rand.

@mschauer
Copy link
Contributor

mschauer commented Oct 1, 2014

I was taking about the sentiment "full ranges are not really a range", you are quoted with ">". Anyway, thanks for planning to fix this, it took me some work. The old version - you saw it - used a representation corresponding to a function length_nonempty(r) = isfull(r) ? length(r) : 0 which works together with the modulo_knuth function and was quite fast, without escaping to Int128 or even BigInt.

@JeffBezanson
Copy link
Sponsor Member

So if I understand, this commit makes some calls to rand on ranges that used to work, no longer work? What exactly is the benefit of that?

I'm all for generality, but removing functionality in the mean time doesn't help. This also caused a performance problem (#8563). It really looks like this should be reverted.

ivarne added a commit that referenced this pull request Oct 3, 2014
cc @rfourquet

This reverts commit 7e87797.
This reverts commit c754a60.
This reverts commit 38bef62.
This reverts commit d9814ff.
This reverts commit 91aa94f.
This reverts commit 57bd2b7.
This reverts commit 9d0282e.
This reverts commit 6d329ce.
This reverts commit e2b14a1.
@ivarne
Copy link
Sponsor Member

ivarne commented Oct 3, 2014

OK, I reverted this in 6117328
@rfourquet I'm really sorry how this turned out, but I agree with Jeff that it is bad to break functionality temporary. To get your work back from this revertion, just do git revert 6117328, and submitt a new PR when you are ready.

@JeffBezanson
Copy link
Sponsor Member

The other aspects of this change are fine, if they can be done in a way
that keeps all functionality. Nothing wrong with having special methods for
ranges; we are all about stuff like that.

@rfourquet
Copy link
Member Author

@ivarne no I'm sorry, it's all on me!
In this PR, some calls to rand on ranges no longer worked indeed, but only in those cases where length(range) didn't work (which calls for a solution in a broader context than random generation). The advantage was new functionalities, which I thougth to be worth enough to loose rand(typemin(Int):typemax(Int)).

@mschauer
Copy link
Contributor

mschauer commented Oct 5, 2014

Am I missing something or can't we have just both?

@rfourquet
Copy link
Member Author

@mschauer yes of course! I'm now working on a proposition, which could also handle more ranges, like rand(typemin(Int):1:typemax(Int)) etc.

@StefanKarpinski
Copy link
Sponsor Member

Unfortunately, we can't just weigh functionality gained against functionality lost – they are asymmetrical since people may be depending on existing functionality. Thus we can't break things that currently work without serious consideration.

rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014
This reverts commit 2abb59b.
This reverts commit 6117328.
rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014
This reverts commit 2abb59b.
This reverts commit 6117328.
rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014
This reverts commit 2abb59b.
This reverts commit 6117328.
rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014
This reverts commit 2abb59b.
This reverts commit 6117328.
rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014
This reverts commit 2abb59b.
This reverts commit 6117328.
@rfourquet rfourquet mentioned this pull request Oct 10, 2014
rfourquet added a commit that referenced this pull request Nov 18, 2014
This is a rewrite of 6d329ce, 9d0282e and d9814ff (from #8309,
which was reverted).
@ViralBShah ViralBShah added system:32-bit Affects only 32-bit systems domain:randomness Random number generation and the Random stdlib labels Nov 22, 2014
rfourquet added a commit that referenced this pull request Nov 22, 2014
This is a rewrite of part of #8309 and #8649.
The goal is to be able to implement the BigInt case more efficiently.
waTeim pushed a commit to waTeim/julia that referenced this pull request Nov 23, 2014
This is a rewrite of 6d329ce, 9d0282e and d9814ff (from JuliaLang#8309,
which was reverted).
rfourquet added a commit that referenced this pull request Dec 11, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange are handled
differently so that those with overflowing length still work.
In particular two rand! method are merged into one.
rfourquet added a commit that referenced this pull request Dec 11, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange are handled
differently so that those with overflowing length still work.
In particular two rand! method are merged into one.
rfourquet added a commit that referenced this pull request Dec 11, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange ranges are handled
differently so that those with overflowing length still work).
In particular two rand! method are merged into one.
rfourquet added a commit that referenced this pull request Dec 12, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange ranges are handled
differently so that those with overflowing length still work).
In particular two rand! method are merged into one.
rfourquet added a commit that referenced this pull request Dec 24, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange ranges are handled
differently so that those with overflowing length still work).
In particular two rand! method are merged into one.

Previously, RangeGenerator objects could create
(scalar of array of) random values in a range a:b, taking care of
creating first a random value in 0:b-a and then adding a. Choosing a
random value in an AbstractArray A was then using A[rand(1:length(A))].
Here, RangeGenerator is changed to only handle the creation of random
values in a range 0:k, and determining the right value of k
(length(A)-1 or b-a) and picking the right element using the random
value v (A[1+v] or a+v) is left to separate (and minimal) methods.
Hence Range and AbstractArray are handled as uniformly as possible,
given that we still want to support ranges like
typemin(Int):typemax(Int) for which length overflows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:randomness Random number generation and the Random stdlib system:32-bit Affects only 32-bit systems
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants