Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

randbool() API #9105

Closed
ViralBShah opened this issue Nov 22, 2014 · 63 comments
Closed

randbool() API #9105

ViralBShah opened this issue Nov 22, 2014 · 63 comments
Labels
domain:randomness Random number generation and the Random stdlib needs decision A decision on this change is needed

Comments

@ViralBShah
Copy link
Member

Given that we have rand(Bool), which is fully general, we can deprecate randbool(). Any reasons to keep it? Performance wise, there doesn't seem to be any difference either.

Cc: @rfourquet

@ViralBShah ViralBShah added needs decision A decision on this change is needed domain:randomness Random number generation and the Random stdlib labels Nov 22, 2014
@rfourquet
Copy link
Member

Do you mean deprecating the function or only the method with no argument? In the latter case, this would imply that after deprecation period, randbool() would produce (unless we disable it) a 0-d array (via randbool(dims...)); I proposed this but @JeffBezanson wasn't favorable (#8377 (comment)).
I have no opinion on deprecating the array version randbool(dims) function in favor of rand!(BitArray(dims)).

@ViralBShah
Copy link
Member Author

I meant getting rid of it altogether. This is a valid point that randbool is quite handy for constructing random BitArrays.

julia> randbool(5,5)
5x5 BitArray{2}:
  true  true  false   true   true
  true  true  false  false   true
  true  true   true  false   true
 false  true   true  false  false
 false  true   true  false   true

julia> rand(Bool,5,5)
5x5 Array{Bool,2}:
  true   true   true  false   true
 false  false  false  false   true
 false   true  false   true  false
  true  false   true  false  false
  true  false  false  false  false

@johnmyleswhite
Copy link
Member

I'd like to see randbool go away.

Couldn't we just define rand(Bool,5,5) as producing a BitArray?

@rfourquet
Copy link
Member

I wouldn't like rand(Bool,5,5) producing a BitArray. This reminds me too much of the "std::vector<bool> mistake" of C++. After all, we sometimes need arrays of Bool and not BitArray (cf. recent thread "NullableArray Perf" on julia-dev).

@tkelman
Copy link
Contributor

tkelman commented Nov 22, 2014

We have types, so I don't think we need as many function names for closely related functionality. Re: #5986 (comment) I'd also be fine with getting rid of sprandbool assuming sprand could be given similar functionality to rand of asking for a specific type for the nonzero values.

@kmsquire
Copy link
Member

For a BitArray, how about rand(BitArray, 5, 5)?

(Now that I look at it, it doesn't seem very obvious/nice, but maybe it sparks other ideas.)

@rfourquet
Copy link
Member

With only two more characters, rand!(BitArray(5, 5)) works already...

@rfourquet
Copy link
Member

The docs of randbool([rng][, dims...]) are:

Generate a random boolean value. Optionally, generate a "BitArray" of random boolean values.

It feels like the emphasize is more on generating a scalar value. If randbool stays, I would prefer:

Generate a "BitArray" of random boolean values. If dims is not provided, generate a scalar random boolean value.

@rfourquet
Copy link
Member

Just thinking out loud: the current API rand([rng][, S][, dims...]) could be changed to rand([rng][, S][, Container][, dims...]), where Container could be Sparse(p) or Bit (default: Array), and extending S to accept Normal. Then:

rand(Bit, 2, 3) == randbool(2, 3)
rand(Normal, 2, 3) == randn(2, 3) 
rand(Sparse(p), 2, 3) == sprand(2, 3, p)
rand(Normal, Sparse(p), 2, 3) == sprandn(m, n, p)
rand([true], Sparse(p), 2, 3) == sprandbool(2, 3, p)

Of course this is more verbose.

@ViralBShah ViralBShah changed the title Deprecate randbool()? RNG API update: deprecate randbool(), rename srand() to srand!() Nov 23, 2014
@ViralBShah
Copy link
Member Author

I have another suggestion for the RNG api that came up in the argument ordering thread. srand should perhaps be called srand!, which makes a lot more sense and is consistent with other stuff.

Note that I updated the title of this issue.

@rfourquet
Copy link
Member

Is srand used a lot in other languages, besides C? If not, it could be renamed to seed!, which I find clearer.

@ViralBShah
Copy link
Member Author

srand is a name that is common to matlab users.

@andreasnoack
Copy link
Member

I'm wondering if the renaming of srand is really necessary. I don't think that modifying the state of the RNG should require a warning of the user, which is what ! is. I think that the ! versions are mainly useful when there exists a two version with the same name where one of them works in place.

@johnmyleswhite
Copy link
Member

FWIW, I don't like the idea that ! is only there for disambiguation. I think of ! as our super informal attempt to let users know where functions live in Rust's ownership model. If anything, I'd prefer that we make ownership more explicit in the future so that it's easier to write code involving threads.

@ghost
Copy link

ghost commented Nov 25, 2014

srand is a name that is common to matlab users.

While this is certainly true, it is really not that informative. Good things, it caters to Matlab refugees, it is short, it preserves some sort of relation to the rand function. Personally I would rather have seed or seedrand though since they would be far less perplexing.

@StefanKarpinski
Copy link
Sponsor Member

srand is kind of traditional and not bad. I don't really see much need to change it.

@ViralBShah
Copy link
Member Author

Ok - what about randbool?

@ivarne
Copy link
Sponsor Member

ivarne commented Nov 25, 2014

I don't think there are good reasons to have both rand(Bool) and randbool(). The BitArray vs Array{Bool} distinction is too subtle to be helpful when writing clear code. It is also not obvious which function returns which type.

@johnmyleswhite
Copy link
Member

To me, the relevant question is whether people should generally be using Array{Bool}. Given that methods like falses and trues produce BitArray objects, I'd gather that the answer is generally no.

@garborg
Copy link
Contributor

garborg commented Nov 25, 2014

It seems like it's very situation dependent (BitArrays are smaller, allocated way faster, slower to iterate through, way slower for getindex or setindex!, etc.), and the API should move toward making Array{Bool} as convenient to use as BitArray given it's a valid choice (even if was uncommon to index into 1/4 of the elements of the array or update 1/7th of the elements over its lifetime, the current API would still feel a little paternalistic). All for making it clearer which type each bool-related rand* method returns.

@StefanKarpinski
Copy link
Sponsor Member

I think it may be sufficient to allow filling an Array{Bool} object with rand!.

@garborg
Copy link
Contributor

garborg commented Nov 25, 2014

Sorry -- I was thinking of Julia 0.3, I definitely agree that the situation on 0.4 is good (other than randbool being misleading in name and description).

@ViralBShah
Copy link
Member Author

It seems like we can go with:

Deprecating randbool().
Having rand(Bool, dims...) return a BitArray.
Use rand!(Array{Bool}) if you really want an array of Bools. This can be documented.

@garborg
Copy link
Contributor

garborg commented Nov 30, 2014

I must be in the minority here, but rand(Bool, ...) producing BitArrays is the kind of change that makes a language harder for me, personally, to use. Final elaboration:

The definition of rand loses consistency, and the symmetry between between
functions rand and rand! disappears. Methods trues and falses don't produce BitArrays because they're more desirable, rather they exist because BitArray is a special-purpose (though commonly used) container fill can't target in the same way rand (as naturally defined) couldn't -- keeping randbool's current functionality somewhere that doesn't usurp rand(Bool, ...) is consistent with the existence of trues and falses and embraces/proclaims the tradeoffs of the special-purpose container. Making the change won't affect how much BitArrays are used in Base or in critical packages.

Am I wrong in thinking that the non-memory benefit of allocating a BitArray is wiped out once the 1/7th of the elements have been updated (for example)? It really so uncommon that users would be served by faster iteration, indexing, and updating, that Base should introduce these asymmetries and complicate definitions of basic functions to force users away from using Arrays of Bool when they ask for them (we're already more thoroughly exposed to BitArrays than to many pockets of the language we should be encouraged to use)?

@nalimilan
Copy link
Member

Agreed, special-casing a type is a very bad idea, especially in a language where you can often write very abstract functions parametrized on types.

@StefanKarpinski
Copy link
Sponsor Member

I'm also not sold on this change.

@ViralBShah
Copy link
Member Author

I can see the argument for not making rand behave differently. What I do find a bit weird is that randbool produces a BitArray, from a newcomer's perspective. I guess the justification can be the same as trues and falses, but the name bool suggests that one would get Array{Bool} and not a BitArray. It used to be called randbit earlier, which I think was a better name than randbool.

@ViralBShah
Copy link
Member Author

I like that too. Basically this now boils down to deprecating trues, falses and randbool, and doing nothing further. Will do so in the next few days, unless someone beats me to it.

@StefanKarpinski @JeffBezanson Thoughts?

@carlobaldassi
Copy link
Member

I vote against removing the functions and forcing the usage of more general but verbose methods. I'm likely biased since I use these methods all the time, but I really don't see what's to be gained by removing them, at the cost of reduced code readability. I'd much prefer renaming them instead. The original bitrand, bitones and bitzeros which @ViralBShah suggested seem fine to me, even though the latter two used to make more sense when BitArrays were a parametric type. bittrues and bitfalses would also be ok. To be honest, the current trues and falses are also more or less ok in my opinion, but prefixing everything with bit- seems more consistent and explicit.

I'm not convinced by the argument for plain removal by @garborg:

That sounds like maybe the nicest option for right now -- my hesitation is that as soon as there's one more container type that wants these functionalities, the export new names route will be a worse, less consistent option than just using rand!(Container(dims)), fill!(Container(dims), false), etc., consistently for non-Array containers.

My objection is that BitArrays are, in the current state of affairs, one of the core types in Julia, since they are returned as the result of all element-wise comparisons, and thus are not like any other specialized container. So it doesn't seem unjustified to have specialized methods for constructing some common special cases. One more objection is that I don't expect to see many new array-like container kinds added to Base, while most new containers of such kind would added in packages, where it is perfectly reasonable to add specialized constructors for common cases (for example, if BitArrays were not in Base and I wanted to write a package to add them, I'd surely add those constructors - the fact of not having those methods because the container is in Base seems silly to me). Consider also that spones and spzeros also exist.

Finally, there are currently the functions bitbroadcast, bitpack and bitunpack. I don't see what's the problem in having 3 more bit- functions.

@ViralBShah
Copy link
Member Author

I am ok with having many bit* methods. I just don't like the inconsistency of true being a Bool, but trues producing BitArrays, and randbool producing a BitArray instead of an array of Bools.

I agree that it is unlikely we will add more containers to base.

@nalimilan
Copy link
Member

I guess it would be OK to have both bit* methods (for the common cases), and the more general rand!(Container(dims)) and fill!(Container(dims), value).

@johnmyleswhite
Copy link
Member

I think it's not a great idea to keep things like bitones and spones around when there's clearly a more consistent abstraction to be used instead. As one obvious example, we eventually need to support more kinds of sparse arrays in Base. By using spones to generate CSC format, we bless a certain format and thereby disincentivize using the other formats.

@carlobaldassi
Copy link
Member

Probably spones is not the best parallel to make, and it also works somehow differently.

I think my argument boils down to this: bitones is useful for exactly the same reason for which ones is useful. I don't see any advantage in removing either, only (mild) annoyances.

@ViralBShah
Copy link
Member Author

I don't see why spones can only generate the CSC format. It can easily take an optional argument to create other formats.

@garborg
Copy link
Contributor

garborg commented Jan 1, 2015

But at that point, wouldn't it be better to just make ones take that optional argument, and for BitArray as well?

@tkelman
Copy link
Contributor

tkelman commented Jan 2, 2015

@garborg as was mentioned, spones is actually a bad example (and one of those annoyingly inconsistent Matlab-heritage names) in that it doesn't really do the same thing as ones - making a full matrix in sparse format is a pretty terrible idea and almost never a thing you want to do. spones(A) is more like B=copy(A); fill!(B.nzval, 1.0); return B.

@garborg
Copy link
Contributor

garborg commented Jan 2, 2015

Oops, thanks.

On Thu, Jan 1, 2015 at 6:04 PM, Tony Kelman notifications@github.com
wrote:

@garborg https://github.com/garborg as was mentioned, spones is
actually a bad example (and one of those annoyingly inconsistent
Matlab-heritage names) in that it doesn't really do the same thing as ones

  • making a full matrix in sparse format is a pretty terrible idea and
    almost never a thing you want to do. spones(A) is more like B=copy(A);
    fill!(B.nzval, 1.0); return B.


Reply to this email directly or view it on GitHub
#9105 (comment).

@ViralBShah
Copy link
Member Author

Spones is also used to make a sparse matrix of the same structure as another matrix in Matlab. I would be open to getting rid of it, of it can be merged into ones() and have a compact invocation.

@tkelman
Copy link
Contributor

tkelman commented Jan 2, 2015

spones is getting off-topic for this issue, but I would avoid trying to cram it in to an optional API of ones() since it is not the same meaning. I do use spones somewhat often when I'm doing structure-only operations, spones(spones(A)*spones(B)) is the most concise way I've found of getting the nonzero structure of a sparse matrix product without having to worry about possible cancellation. What would make more sense to me for structure-only operations is a sparse matrix type that is inherently structure-only, a graph of where the nonzero elements are but without any values associated with them (implicitly boolean).

@ViralBShah
Copy link
Member Author

We do have similar, which is a much better name than spones. I agree that it is off-topic here.

@carlobaldassi What about @rfourquet 's suggestion of having a type named Bit, that can be used as ones(Bit, dims...). I would actually love that, compared to bitones, and it would be just as compact.

@rfourquet
Copy link
Member

I wanted today to implement the Bit solution to have a better feel of it, so your message encouraged me to do so, cf https://github.com/JuliaLang/julia/compare/rf/Bit. I find it elegant that by only defining Array(::Type{Bit}, dims) and zero(Bit)/one(Bit), you automatically get rand(Bit, dims), zeros(Bit, dims) and ones(Bit, dims). I think that this is my prefered solution. Instead of having those 3 specific methods (randbool, trues, falses), the specificity is factored out in one special type, and this sound better when writing generic code.

Moreover, I personally think that it's actually a good thing that the special treatment of BitArray is removed (I see no reason to favor BitArray over Array{Bool}, on the contrary). It is a little bit more verbose, but the user has to do a conscious choice (ones(Bool, n) or ones(Bit, n)?), instead of using the more concise trues(n) (the existence of trues gives the impression that this is the "prefered" way of getting a collection of trues). I guess this will lead to using Array{Bool} by default, as Bool is a more familiar type than Bit.

The Bit type should encapsulate a Bool if we wanted something like fill(Bit(true), n) to work, which may be useful in generic code, but seemed to me rare enough that more verbose alternatives (fill!(Array(BitOrBool, n), true)) are OK for when this occurs.

@carlobaldassi
Copy link
Member

I'm sorry to be that guy, but I still don't understand what's the issue here:

  1. Everybody agrees to remove trues and falses, for clarity. So that's a non-issue at this point; nobody's actually arguing for favoring BitArray over Array{Bool} (even though it is the de facto situation right now, since when element-wise comparisons are performed the result is a BitArray).
  2. The Bit type is very weird to me, as it represents neither the element type nor the container type. It may be an elegant implementation, but I think it would be very confusing to explain. Basically, it's like adding a flag container=BitArray in the call to ones and similar functions, except that it's done in this strange way to work around type inference issues, conflating the element type and the container type. But the very possibility of conflating the two types is a peculiarity of the BitArray container, I think, so this does not seem like an extensible approach to me at all. It's also not really needed for writing generic code, since the fill! and rand! methods are available for that purpose; on the other hand, ones and similar functions are only relevant for convenience purposes.
  3. I still don't get what's wrong with adding (or rather, keeping) three specialized convenience methods for a very specialized container, provided that their names make this fact clear.

Having said all this, it seems like I'm in the minority (probably since I'm a heavy BitArray user). So to conclude, I still favour renaming, followed by removing the functions (in which case I'll just write my own package with 3 bit- functions, and possibly blackjack and hookers...), and I'm very much against the introduction of the Bit type (unless convinced otherwise).

@StefanKarpinski
Copy link
Sponsor Member

I agree with @carlobaldassi. I'm not terribly bothered by the current API – BitArray is a special type and having special constructors for it seems reasonable to me.

@garborg
Copy link
Contributor

garborg commented Jan 2, 2015

It's more likely I'm in the minority and spoke too loudly, especially given the added exports tradeoff seems ambiguous to me with the current set of containers, so I won't push any more discussion. (Other than for what seems to be in agreement: no name randbool(...)::BitArray, rand(Bool, ...) returns Array{Bool}, no bitrand()::Bool).)

@ViralBShah
Copy link
Member Author

I am ok with the special constructors for BitArray as well, but with more consistency that have a clear distinction from the Bool types.

ViralBShah pushed a commit that referenced this issue Jan 3, 2015
For a random Bool, instead of randbool(), use rand(Bool)
Instead of randbool(dims), use bitrand(dims)
ViralBShah pushed a commit that referenced this issue Jan 3, 2015
For a random Bool, instead of randbool(), use rand(Bool)
Instead of randbool(dims), use bitrand(dims)
@rfourquet
Copy link
Member

The Bit type represents the type that BitArray's elements would be if a one-bit size was possible. It can maybe feel a bit weird at first, but in usage it it's no more difficult to write ones(Bit, n) and rand(Bit, n) than bitones(n) and bitrand(n). It's a "special" type, but which is handled essentialy in Array(Bit, n) producing a BitArray(n), which I find acceptable.

That it could be "elegant" is not important, but the useful thing in it is that it's less methods/implementations to maintain.
On the genericity, I meant for example that if I'm not sure what to choose between BitArray or Array{Bool}, I can have a method taking Bit or Bool as a parameter, and then all the underlying implementation is switched very easily. If I have to use fill!/rand! instead of ones/rand to generically test both alternatives, it's more work.
But I don't have a strong opinion on this, so won't try to push the Bit type more than that.

@ViralBShah
Copy link
Member Author

I momentarily felt the temptation for the Bit type, but being neither the element type or container type as @carlobaldassi pointed out will create trouble later on.

For now, I am restricting my attention to my original goal - fixing the randbool API, which I have a PR for now.

ViralBShah pushed a commit that referenced this issue Jan 3, 2015
For a random Bool, instead of randbool(), use rand(Bool)
Instead of randbool([rng], dims), use bitrand([rng], dims)
Add randbool deprecation to NEWS
ViralBShah pushed a commit that referenced this issue Jan 3, 2015
For a random Bool, instead of randbool(), use rand(Bool)
Instead of randbool([rng], dims), use bitrand([rng], dims)
Add randbool deprecation to NEWS
ViralBShah pushed a commit that referenced this issue Jan 3, 2015
For a random Bool, instead of randbool(), use rand(Bool)
Instead of randbool([rng], dims), use bitrand([rng], dims)
Add randbool deprecation to NEWS
@ViralBShah
Copy link
Member Author

I am closing this with the randbool deprecation, which was the original topic of this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:randomness Random number generation and the Random stdlib needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests