Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random API #131

Open
ilya-g opened this issue Jul 9, 2018 · 19 comments
Open

Random API #131

ilya-g opened this issue Jul 9, 2018 · 19 comments
Assignees
Labels

Comments

@ilya-g
Copy link
Member

ilya-g commented Jul 9, 2018

Discussions about the Random API proposal will be held here.

Pull request: #132

@ilya-g ilya-g self-assigned this Jul 9, 2018
@ilya-g ilya-g added the stdlib label Jul 9, 2018
@voddan
Copy link
Contributor

voddan commented Jul 22, 2018

Recently I came across a twitter thread on randomness by @colmmacc and wanted to share it here.

A short summary:

  • Implementations like list[rand() % list.size] provide biased distributions and should be treated as bugs.

    • Example: say rand() returns an int uniformly from 0 to MAX_INT, and list.size is approximately 2/3 * MAX_INT. Then the code above will return elements from the first half of the list twice as often as from the other half.
  • Similar solutions with floating point numbers list[ (randDouble() * list.size).toInt() ] are also flawed because double numbers from 0.0 to 1.0 are not uniform.

  • The correct solution for getting a random element of a list is:

      fun random(size: Int): Int {
          val limit = RAND_MAX - RAND_MAX % size
          while(1) {
               val r = rand():
    
               if (r < limit) {
                    return r % size;
               }
         }
      }
    
      val e = list[random(list.size)]
    
  • some more tweets on shuffling arrays...

I hope this will help us design Kotlin random APIs in a way that prevents such common misuses.

The original thread:
https://twitter.com/colmmacc/status/1012719876706840578

@Wasabi375
Copy link
Contributor

On the kotlin forum there was just a post about adding shuffle for arrays as well. There is also an issue about this.
https://discuss.kotlinlang.org/t/adding-shuffle-to-array-t-and-bytearray-intarray-et-al/8687
https://youtrack.jetbrains.com/issue/KT-25651

I think it might be worth to consider to add those functions to this proposal.

@ilya-g
Copy link
Member Author

ilya-g commented Jul 23, 2018

@voddan Thanks for the link.

We have taken care of the implementation and API, so that our users could avoid these and some other pitfalls of random number generation. For example, there are overloads of nextInt that generate an unbiased number in a given range, so one doesn't have to worry about rand() % n in their code.

If you like, you can review the implementation here.

@voddan
Copy link
Contributor

voddan commented Jul 23, 2018

I would like to pitch for some random sampling methods such as

fun <E> List<E>.sampleRandom(random: Random = DefaultRandom): E
fun <E> List<E>.sampleRandomList(size: Int, random: Random = DefaultRandom): List<E>
fun <E> List<E>.sampleRandomSet(size: Int, random: Random = DefaultRandom): Set<E>
fun <E> List<E>.sampleRandomSet(size: Int, weights: List<Int> = LazyList(this.size) {i -> 1}): Set<E>

The benefit of having such functions in the stdlib is preventing developers from implementing naive-but-incorrect implementations themselves. For example, for sampleRandomList it could be tempting to do something like the following, which is incorrect because the result has a different distribution than initial Random.nextInt()

fun <E> List<E>.sampleRandomList(sampleSize: Int): List<E> {
    val selectedIndices = mutableSetOf<Int>()

    repeat(sampleSize) {
        do {
            val i: Int = Random.nextInt(indices)
            val wasNew = selectedIndices.add(i)
        } while(!wasNew)
    }

    return this.slice(selectedIndices)
}

Another inefficient approach could be the implementation below, which is very easy to implement with the current proposal, and thus even more probable to be a source of problems.

fun <E> List<E>.sampleRandomList(sampleSize: Int): List<E> = shuffled().take(sampleSize)

As far as I understand, one of the correct and efficient algorithms for sampling is "Vose's Alias" algorithm, of which I am frankly totally ignorant, but you can read about it here http://www.keithschwarz.com/darts-dice-coins/

@ilya-g
Copy link
Member Author

ilya-g commented Jul 23, 2018

@voddan I think it's better to start another KEEP for this family of functions.

@voddan
Copy link
Contributor

voddan commented Jul 23, 2018

@ilya-g IMHO sampling and shuffling functions should go in the same KEEP. Partly because how similar they are, partly because how easy it is to (inefficiently) implement one through the other. That said, maybe it is better to move shuffling out of the scope of this proposal, and combine it with the sampling APIs.

@ilya-g
Copy link
Member Author

ilya-g commented Jul 23, 2018

@voddan shuffling is already supported in the standard library, just an overload that takes a particular RNG is missing in common. On the other hand, sampling is a completely new API with its own design questions.

@chrismiller
Copy link

There's an interesting article about the performance and correctness of selecting random numbers from a range here: http://www.pcg-random.org/posts/bounded-rands.html See the conclusion for what is probably the best overall algorithm (in C, might be different for the JVM).

@ilya-g
Copy link
Member Author

ilya-g commented Jul 31, 2018

@chrismiller Thanks for the article, it is very enlightening.

Currently we use the "Debiased Mod (x1)" algorithm to select numbers from a range. "Debiased Int Multiplication" looks enticing, however we don't have the efficient 32-bit int multiplication that returns the upper part of the product in Kotlin/JS, not speaking of 64-bit ints for which there is no such multiplication even in Kotlin/JVM.

We need to decide which one to use before releasing repeatable RNG, because different algorithms can produce different number sequences from the same source of random bits.

@CreamyCookie
Copy link

CreamyCookie commented Nov 2, 2018

I don't know if this can be used here, but it seems like there is a better algorithm than the currently used xorwow (which is part of the Xorshift family): Xoroshiro128+ and related

xoroshiro128+ (named after its operations: XOR, rotate, shift, rotate) is a pseudorandom number generator intended as a successor to xorshift+. Instead of perpetuating Marsaglia's tradition of xorshift as a basic operation, xoroshiro128+ uses a shift/rotate-based linear transformation designed by Sebastiano Vigna in collaboration with David Blackman. The result is a significant improvement in speed (well below a nanosecond per integer) and a significant improvement in statistical quality.

Apparently xorwow also fails a few tests in the BigCrush test suite.

This is the authors site where they recommend xoshiro256** (or xoroshiro128** if only half the space needed) and present test results and (public domain) implementations.

@JLLeitschuh
Copy link

Given everyone who participated in this discussion, I'm guessing that you'd all be interested in this proposal as well.

Add a CSPRNG (eg. SecureRandom) to the Kotlin StdLib #184

I'd love your feedback.

@hXtreme
Copy link

hXtreme commented Mar 1, 2020

Might I suggest including a sampleWithReplacement in the sampling API

@quickstep24
Copy link

I would like to pitch for some random sampling methods such as

fun <E> List<E>.sampleRandom(random: Random = DefaultRandom): E
fun <E> List<E>.sampleRandomList(size: Int, random: Random = DefaultRandom): List<E>
fun <E> List<E>.sampleRandomSet(size: Int, random: Random = DefaultRandom): Set<E>
fun <E> List<E>.sampleRandomSet(size: Int, weights: List<Int> = LazyList(this.size) {i -> 1}): Set<E>

The benefit of having such functions in the stdlib is preventing developers from implementing naive-but-incorrect implementations themselves. For example, for sampleRandomList it could be tempting to do something like the following, which is incorrect because the result has a different distribution than initial Random.nextInt()

fun <E> List<E>.sampleRandomList(sampleSize: Int): List<E> {
    val selectedIndices = mutableSetOf<Int>()

    repeat(sampleSize) {
        do {
            val i: Int = Random.nextInt(indices)
            val wasNew = selectedIndices.add(i)
        } while(!wasNew)
    }

    return this.slice(selectedIndices)
}

Another inefficient approach could be the implementation below, which is very easy to implement with the current proposal, and thus even more probable to be a source of problems.

fun <E> List<E>.sampleRandomList(sampleSize: Int): List<E> = shuffled().take(sampleSize)

As far as I understand, one of the correct and efficient algorithms for sampling is "Vose's Alias" algorithm, of which I am frankly totally ignorant, but you can read about it here http://www.keithschwarz.com/darts-dice-coins/

Can you please explain, why the two methods you present for taking random samples are "naive-but-incorrect"? "Vose's Alias" which you cited is an algorithm for getting random numbers with a biased (non-uniform) distribution. I cannot see how this fits into your example.

@hXtreme
Copy link

hXtreme commented Jan 23, 2021

The first one is incorrect because its slow and possibly non-terminating.

When sample size is close to list length (n) I believe the program has an expected(in the probability sense) time complexity of O(n log(n))

The second one is incorrect as it is very slow when list length is large and sample size is small.

@quickstep24
Copy link

The first one is incorrect because its slow and possibly non-terminating.

When sample size is close to list length (n) I believe the program has an expected(in the probability sense) time complexity of O(n log(n))

The second one is incorrect as it is very slow when list length is large and sample size is small.

Of course, considering performance, the first is only recommendable for small subsets and the second one for large subsets. But they are both statistically sound. You indicated that they have a "different distribution than Random.nextInt" and I just wanted to point out that their results is just as good (or bad) as the built in RNG/nextInt.
Oh, and complexity near ´n´ is actually ´O(n²)´.

@hXtreme
Copy link

hXtreme commented Jan 24, 2021

@quickstep24, @voddan was the one to suggest that it's a different distribution. I have no idea on that matter.

@stavares
Copy link

Just looking for an update to see if this "bug" is gonna be fixed anytime soon. I am resorting to using java.util.Random or java.security.SecureRandom

just wondering.

@elizarov
Copy link
Contributor

stavares Just looking for an update to see if this "bug" is gonna be fixed anytime soon.

Can you clarify what exactly you are interested in?

Check out kotlin.random.Random. It's been there since Kotlin 1.3. There is no built-in support for secure random, though, but on JVM you can always do SecureRandom().asKotlinRandom() and use rich API of Kotlin's Random class in your code.

@stavares
Copy link

stavares commented Mar 5, 2023

stavares Just looking for an update to see if this "bug" is gonna be fixed anytime soon.

Can you clarify what exactly you are interested in?

Check out kotlin.random.Random. It's been there since Kotlin 1.3. There is no built-in support for secure random, though, but on JVM you can always do SecureRandom().asKotlinRandom() and use rich API of Kotlin's Random class in your code.

I just wanted to know if it was ever going to be crypto secure but you answered my question. Thx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

10 participants