Random API #131

ilya-g · 2018-07-09T13:15:55Z

Discussions about the Random API proposal will be held here.

Pull request: #132

voddan · 2018-07-22T10:51:01Z

Recently I came across a twitter thread on randomness by @colmmacc and wanted to share it here.

A short summary:

Implementations like list[rand() % list.size] provide biased distributions and should be treated as bugs.
- Example: say rand() returns an int uniformly from 0 to MAX_INT, and list.size is approximately 2/3 * MAX_INT. Then the code above will return elements from the first half of the list twice as often as from the other half.
Similar solutions with floating point numbers list[ (randDouble() * list.size).toInt() ] are also flawed because double numbers from 0.0 to 1.0 are not uniform.

The correct solution for getting a random element of a list is:

  fun random(size: Int): Int {
      val limit = RAND_MAX - RAND_MAX % size
      while(1) {
           val r = rand():

           if (r < limit) {
                return r % size;
           }
     }
  }

  val e = list[random(list.size)]

some more tweets on shuffling arrays...

I hope this will help us design Kotlin random APIs in a way that prevents such common misuses.

The original thread:
https://twitter.com/colmmacc/status/1012719876706840578

Wasabi375 · 2018-07-22T13:18:26Z

On the kotlin forum there was just a post about adding shuffle for arrays as well. There is also an issue about this.
https://discuss.kotlinlang.org/t/adding-shuffle-to-array-t-and-bytearray-intarray-et-al/8687
https://youtrack.jetbrains.com/issue/KT-25651

I think it might be worth to consider to add those functions to this proposal.

ilya-g · 2018-07-23T00:54:49Z

@voddan Thanks for the link.

We have taken care of the implementation and API, so that our users could avoid these and some other pitfalls of random number generation. For example, there are overloads of nextInt that generate an unbiased number in a given range, so one doesn't have to worry about rand() % n in their code.

If you like, you can review the implementation here.

voddan · 2018-07-23T11:45:33Z

I would like to pitch for some random sampling methods such as

fun <E> List<E>.sampleRandom(random: Random = DefaultRandom): E
fun <E> List<E>.sampleRandomList(size: Int, random: Random = DefaultRandom): List<E>
fun <E> List<E>.sampleRandomSet(size: Int, random: Random = DefaultRandom): Set<E>
fun <E> List<E>.sampleRandomSet(size: Int, weights: List<Int> = LazyList(this.size) {i -> 1}): Set<E>

The benefit of having such functions in the stdlib is preventing developers from implementing naive-but-incorrect implementations themselves. For example, for sampleRandomList it could be tempting to do something like the following, which is incorrect because the result has a different distribution than initial Random.nextInt()

fun <E> List<E>.sampleRandomList(sampleSize: Int): List<E> {
    val selectedIndices = mutableSetOf<Int>()

    repeat(sampleSize) {
        do {
            val i: Int = Random.nextInt(indices)
            val wasNew = selectedIndices.add(i)
        } while(!wasNew)
    }

    return this.slice(selectedIndices)
}

Another inefficient approach could be the implementation below, which is very easy to implement with the current proposal, and thus even more probable to be a source of problems.

fun <E> List<E>.sampleRandomList(sampleSize: Int): List<E> = shuffled().take(sampleSize)

As far as I understand, one of the correct and efficient algorithms for sampling is "Vose's Alias" algorithm, of which I am frankly totally ignorant, but you can read about it here http://www.keithschwarz.com/darts-dice-coins/

ilya-g · 2018-07-23T12:49:13Z

@voddan I think it's better to start another KEEP for this family of functions.

voddan · 2018-07-23T13:10:08Z

@ilya-g IMHO sampling and shuffling functions should go in the same KEEP. Partly because how similar they are, partly because how easy it is to (inefficiently) implement one through the other. That said, maybe it is better to move shuffling out of the scope of this proposal, and combine it with the sampling APIs.

ilya-g · 2018-07-23T14:25:42Z

@voddan shuffling is already supported in the standard library, just an overload that takes a particular RNG is missing in common. On the other hand, sampling is a completely new API with its own design questions.

chrismiller · 2018-07-30T08:33:27Z

There's an interesting article about the performance and correctness of selecting random numbers from a range here: http://www.pcg-random.org/posts/bounded-rands.html See the conclusion for what is probably the best overall algorithm (in C, might be different for the JVM).

ilya-g · 2018-07-31T18:31:32Z

@chrismiller Thanks for the article, it is very enlightening.

Currently we use the "Debiased Mod (x1)" algorithm to select numbers from a range. "Debiased Int Multiplication" looks enticing, however we don't have the efficient 32-bit int multiplication that returns the upper part of the product in Kotlin/JS, not speaking of 64-bit ints for which there is no such multiplication even in Kotlin/JVM.

We need to decide which one to use before releasing repeatable RNG, because different algorithms can produce different number sequences from the same source of random bits.

… pending

CreamyCookie · 2018-11-02T12:32:33Z

I don't know if this can be used here, but it seems like there is a better algorithm than the currently used xorwow (which is part of the Xorshift family): Xoroshiro128+ and related

xoroshiro128+ (named after its operations: XOR, rotate, shift, rotate) is a pseudorandom number generator intended as a successor to xorshift+. Instead of perpetuating Marsaglia's tradition of xorshift as a basic operation, xoroshiro128+ uses a shift/rotate-based linear transformation designed by Sebastiano Vigna in collaboration with David Blackman. The result is a significant improvement in speed (well below a nanosecond per integer) and a significant improvement in statistical quality.

Apparently xorwow also fails a few tests in the BigCrush test suite.

This is the authors site where they recommend xoshiro256** (or xoroshiro128** if only half the space needed) and present test results and (public domain) implementations.

JLLeitschuh · 2019-04-04T19:51:42Z

Given everyone who participated in this discussion, I'm guessing that you'd all be interested in this proposal as well.

Add a CSPRNG (eg. SecureRandom) to the Kotlin StdLib #184

I'd love your feedback.

hXtreme · 2020-03-01T06:43:44Z

Might I suggest including a sampleWithReplacement in the sampling API

quickstep24 · 2021-01-20T08:58:32Z

I would like to pitch for some random sampling methods such as
fun <E> List<E>.sampleRandom(random: Random = DefaultRandom): E
fun <E> List<E>.sampleRandomList(size: Int, random: Random = DefaultRandom): List<E>
fun <E> List<E>.sampleRandomSet(size: Int, random: Random = DefaultRandom): Set<E>
fun <E> List<E>.sampleRandomSet(size: Int, weights: List<Int> = LazyList(this.size) {i -> 1}): Set<E>
The benefit of having such functions in the stdlib is preventing developers from implementing naive-but-incorrect implementations themselves. For example, for sampleRandomList it could be tempting to do something like the following, which is incorrect because the result has a different distribution than initial Random.nextInt()
fun <E> List<E>.sampleRandomList(sampleSize: Int): List<E> {
    val selectedIndices = mutableSetOf<Int>()

    repeat(sampleSize) {
        do {
            val i: Int = Random.nextInt(indices)
            val wasNew = selectedIndices.add(i)
        } while(!wasNew)
    }

    return this.slice(selectedIndices)
}
Another inefficient approach could be the implementation below, which is very easy to implement with the current proposal, and thus even more probable to be a source of problems.
fun <E> List<E>.sampleRandomList(sampleSize: Int): List<E> = shuffled().take(sampleSize)
As far as I understand, one of the correct and efficient algorithms for sampling is "Vose's Alias" algorithm, of which I am frankly totally ignorant, but you can read about it here http://www.keithschwarz.com/darts-dice-coins/

Can you please explain, why the two methods you present for taking random samples are "naive-but-incorrect"? "Vose's Alias" which you cited is an algorithm for getting random numbers with a biased (non-uniform) distribution. I cannot see how this fits into your example.

hXtreme · 2021-01-23T16:10:02Z

The first one is incorrect because its slow and possibly non-terminating.

When sample size is close to list length (n) I believe the program has an expected(in the probability sense) time complexity of O(n log(n))

The second one is incorrect as it is very slow when list length is large and sample size is small.

quickstep24 · 2021-01-24T17:39:12Z

The first one is incorrect because its slow and possibly non-terminating.

When sample size is close to list length (n) I believe the program has an expected(in the probability sense) time complexity of O(n log(n))

The second one is incorrect as it is very slow when list length is large and sample size is small.

Of course, considering performance, the first is only recommendable for small subsets and the second one for large subsets. But they are both statistically sound. You indicated that they have a "different distribution than Random.nextInt" and I just wanted to point out that their results is just as good (or bad) as the built in RNG/nextInt.
Oh, and complexity near ´n´ is actually ´O(n²)´.

hXtreme · 2021-01-24T22:07:07Z

@quickstep24, @voddan was the one to suggest that it's a different distribution. I have no idea on that matter.

stavares · 2023-02-25T23:55:01Z

Just looking for an update to see if this "bug" is gonna be fixed anytime soon. I am resorting to using java.util.Random or java.security.SecureRandom

just wondering.

elizarov · 2023-02-26T20:10:34Z

stavares Just looking for an update to see if this "bug" is gonna be fixed anytime soon.

Can you clarify what exactly you are interested in?

Check out kotlin.random.Random. It's been there since Kotlin 1.3. There is no built-in support for secure random, though, but on JVM you can always do SecureRandom().asKotlinRandom() and use rich API of Kotlin's Random class in your code.

stavares · 2023-03-05T22:49:30Z

stavares Just looking for an update to see if this "bug" is gonna be fixed anytime soon.

Can you clarify what exactly you are interested in?

Check out kotlin.random.Random. It's been there since Kotlin 1.3. There is no built-in support for secure random, though, but on JVM you can always do SecureRandom().asKotlinRandom() and use rich API of Kotlin's Random class in your code.

I just wanted to know if it was ever going to be crypto secure but you answered my question. Thx.

ilya-g self-assigned this Jul 9, 2018

ilya-g added the stdlib label Jul 9, 2018

ilya-g mentioned this issue Jul 9, 2018

Random API proposal #132

Merged

ilya-g added a commit that referenced this issue Sep 12, 2018

Random (#131): add resolutions for two questions, update what remains…

5420263

… pending

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Random API #131

Random API #131

ilya-g commented Jul 9, 2018 •

edited

Loading

voddan commented Jul 22, 2018 •

edited

Loading

Wasabi375 commented Jul 22, 2018

ilya-g commented Jul 23, 2018

voddan commented Jul 23, 2018 •

edited

Loading

ilya-g commented Jul 23, 2018

voddan commented Jul 23, 2018

ilya-g commented Jul 23, 2018

chrismiller commented Jul 30, 2018

ilya-g commented Jul 31, 2018

CreamyCookie commented Nov 2, 2018 •

edited

Loading

JLLeitschuh commented Apr 4, 2019

hXtreme commented Mar 1, 2020

quickstep24 commented Jan 20, 2021

hXtreme commented Jan 23, 2021 •

edited

Loading

quickstep24 commented Jan 24, 2021

hXtreme commented Jan 24, 2021

stavares commented Feb 25, 2023

elizarov commented Feb 26, 2023

stavares commented Mar 5, 2023

Random API #131

Random API #131

Comments

ilya-g commented Jul 9, 2018 • edited Loading

voddan commented Jul 22, 2018 • edited Loading

Wasabi375 commented Jul 22, 2018

ilya-g commented Jul 23, 2018

voddan commented Jul 23, 2018 • edited Loading

ilya-g commented Jul 23, 2018

voddan commented Jul 23, 2018

ilya-g commented Jul 23, 2018

chrismiller commented Jul 30, 2018

ilya-g commented Jul 31, 2018

CreamyCookie commented Nov 2, 2018 • edited Loading

JLLeitschuh commented Apr 4, 2019

hXtreme commented Mar 1, 2020

quickstep24 commented Jan 20, 2021

hXtreme commented Jan 23, 2021 • edited Loading

quickstep24 commented Jan 24, 2021

hXtreme commented Jan 24, 2021

stavares commented Feb 25, 2023

elizarov commented Feb 26, 2023

stavares commented Mar 5, 2023

ilya-g commented Jul 9, 2018 •

edited

Loading

voddan commented Jul 22, 2018 •

edited

Loading

voddan commented Jul 23, 2018 •

edited

Loading

CreamyCookie commented Nov 2, 2018 •

edited

Loading

hXtreme commented Jan 23, 2021 •

edited

Loading