Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default sort algorithm affects global RNG #48230

Closed
gaurav-arya opened this issue Jan 11, 2023 · 0 comments · Fixed by #48241
Closed

Default sort algorithm affects global RNG #48230

gaurav-arya opened this issue Jan 11, 2023 · 0 comments · Fixed by #48241
Labels
domain:randomness Random number generation and the Random stdlib domain:sorting Put things in order

Comments

@gaurav-arya
Copy link
Contributor

gaurav-arya commented Jan 11, 2023

In #45222, the default sort algorithm was changed to an algorithm which uses randomness (at least, when Random is loaded), and therefore can modify the global RNG state.

This can lead to surprising bugs in cases where sort! is used with the expectation that it will not modify the global RNG state. For example: FluxML/Zygote.jl#1351, where Zygote's metaprogramming (IRTools) calls sort when running the program for the first time. This means that a primal program gives a different output when a function is differentiated for the first time, as compared to subsequent runs, even when the global RNG is initialized to the same thing in both cases. This is incorrect behaviour because Zygote should not affect the primal's random draws.

Arguably, this is a bug in IRTools: IRTools should either switch to a sort! algorithm that has a semantic guarantee of determinism, or somehow perform the default sort but with a different RNG that doesn't affect the global one (note that the latter is difficult to do because Julia does not have native splittable RNGs, and extra global RNGs hidden in packages sounds like a bad idea, because a computation's result would ideally be deterministic given a) the global RNG's initial state, and b) any user-provided RNGs).

Also, to clarify, for a sorting algorithm that needs randomness to make calls to rand certainly seems like the right thing to do, so the discussion in #45222 (comment) makes sense to me. (Although, perhaps there should be a facility for user-provided RNGs.).

However, in practice, the question is whether breaking the implicit assumption that the default sort is deterministic could lead to bugs. For example, the REPL calls sort! (logging mine, this is on 1.9.0-beta2):
image

If sort! does not have a semantic guarantee of not affecting the global RNG, would it be correct to say that Random.seed!(123); rand(); could have different outputs in the REPL v.s. a script?

This does not actually happen in the above example,, because the polyalg uses InsertionSort for small inputs, which does not make any calls to rand. (This is why the Zygote bug linked above was a bit difficult to discover; presumably the AST had to be big enough). However, relying on this fact seems dangerous: the question is really about getting the semantic guarantees right, and whether modifying any existing (implicit) guarantees on a function such as sort! that is used in internals could have unforeseen effects.

Please let me know if I'm missing anything big in the story here, it's very possible:) But since this change is slated for Julia 1.9, I thought it would be good to raise this concern asap in case there's something to it.

Edit: see discussion in #45222 (comment) which is highly relevant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:randomness Random number generation and the Random stdlib domain:sorting Put things in order
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants