Pseudorandom number generators in Clojure.
This project is in pre-release development and subject to change.
A pseudorandom number generator (PRNG) is a deterministic algorithm that generates a sequence of numbers that appear to be random.
A PRNG has the following components:
- State space, S: a finite set of possible states.
- Output space, U: a finite set of possible outputs (pseudorandom numbers) that can be generated.
- Transition function, f: a pure function that takes a state as its argument and returns a (usually different) state.
- Output function, g: a pure function that maps a state to an element in the output space. Different output functions may be used to generate different numeric types (e.g. longs or doubles) from the same underlying sequence of states.
The state of a PRNG evolves according to the recurrence relationship (LaTeX math markup):
s_{i} = f(s_{i - 1})
u_{i} = g(s{i})
where
f: S \rightarrow S
is the state transition function and
g: S \rightarrow U
is the output function.
The initial state of a PRNG is typically constructed by provided a "seed" (a source of bits) to a factory function. When initialized with the same seed, a PRNG is guaranteed to subsequently generate exactly the same sequence of numbers.
Most widely used PRNG libraries represent PRNGs as stateful objects. A
typical example is the java.util.Random
class in Java, which
encapsulates a hidden state. Calling a method to generate the next
pseudorandom number in the sequence has the side effect of mutating this
hidden state:
(def r (java.util.Random. 1234567))
(.nextDouble r)
;=> 0.24283348968807417
; nextDouble is impure! the second call returns a different result!
(.nextDouble r)
;=> 0.7520183689538884
Such statefulness makes it difficult to reason about concurrency or reproducibility. Also, as shown in this example, stateful PRNG libraries typically conflate the transition function and the output function in a single method.
The prng
library takes a different approach to PRNGs that is more
functional and Clojure-like. Instead of implicitly mutating a stateful
object, the state of the PRNG is directly evolved by calling a
transition function that returns the next state, and a pseudorandom
number is directly generated by calling an output function on a state.
States are values that can be passed to functions that produce other
values. The transition function and the output function are separated
concerns. This functional design is intended to make it easy to reason
about concurrency and reproducibility, especially when using PRNGs in
large distributed systems in which coordination is costly.
This is a quick demo that you can follow in a Clojure REPL to get
acquainted with the prng
library.
The namespace com.climate.prng.core
defines a PRNGState
protocol
whose methods can be invoked on all implementing types.
(require '[com.climate.prng.core :as prng])
Specific PRNG implementations are found in namespaces under the prefix
com.climate.prng.generators
. Currently, two implementations are
available:
- A 64-bit version of the well-known MT19937 Mersenne Twister PRNG
- A counter-based PRNG that uses the Threefish block cipher as the source of pseudorandomness
Try using the Mersenne Twister:
(require '[com.climate.prng.generators.mersenne-twister :as mt])
To instantiate a Mersenne Twister state, call a factory function with a
long[]
seed argument.
(def seed (long-array [0 1 2 3]))
(def initial-state (mt/seed-state seed)))
The seed-state
factory function returns a state value that satisfies
the PRNGState
protocol.
(satisfies? prng/PRNGState initial-state)
;=> true
For a given seed argument, the same return value ("same" in the sense of
equals()
) is returned:
(= initial-state (mt/seed-state seed))
;=> true
The PRNGState
protocol methods are pure functions that take a state
value argument and return other values, without mutating the input state
value.
To sample a long from this PRNG, simply call the ->long
protocol
method on the state. ->long
is a pure function; for a given state, it
always returns the same value, and it does not mutate its state argument.
(prng/->long initial-state)
;=> -3722652993627521228
; Multiple calls return the same value.
(prng/->long initial-state)
;=> -3722652993627521228
Similarly, the ->double
protocol method returns a double:
(prng/->double initial-state)
;=> 0.4049867569919572
The next-state
protocol method takes a state argument and returns a
new state that is one step ahead in the state sequence of the PRNG. The
input state is not mutated.
(def new-state (prng/next-state initial-state))
(= new-state initial-state)
;=> false
; Get a long from this new state
(prng/->long new-state)
;=> -909579606142348662
; Get a long from the old state (compare to previous calls)
(prng/->long initial-state)
;=> -3722652993627521228
The jump-state
protocol method takes two arguments, a state and a jump
size, and jumps ahead by the provided jump size. jump-state
returns
the same state value as iterative composition of next-state
, but it is
much faster.
(def jump-size 1000000)
(def state-after-jump (prng/jump-state initial-state jump-size))
(= state-after-jump ((apply comp (repeat jump-size prng/next-state)) initial-state))
;=> true
lein test
runs unit tests.
lein test :benchmark
runs tests with the :benchmark
test selector,
which are a few performance benchmarks. They may take a while to
complete.
The Java classes MersenneTwisterState
and ThreefishState
have public
methods that are untested for thread safety and are probably unsafe.
Although the PRNGState
protocol is designed for concurrency, it
possible to bypass the protocol and directly invoke the unsafe methods
of the implementation classes. To ensure thread safety, these
implementation classes ought to be re-implemented with Clojure-style
persistent data structures.
For the MersenneTwisterState
implementation class, the jump-state
protocol method uses a naive algorithm to compute jump-ahead from the
remainder polynomial. Maromoto and colleauges demonstrated two
alternative algorithms
to compute jump-ahead more quickly. The jump-state
method ought to be
reimplemented with one of these faster algorithms.
The implementation of the jump-state
method for ThreefishState
uses
kludgy BigInteger
and hexadecimal string manipulations to perform
binary arithmetic. Performance can almost certainly be improved by
thoughtful bit twiddling.
The Mersenne Twister is a well-known, high-quality PRNG that has been thoroughly tested for pseudorandomness. The implementation in this library is a bit-exact match to the reference implementation in C.
The counter-based Threefish PRNG has not been as well tested, and its pseudorandomness ought to be teststed with a suite of statistical tests.
Copyright © 2014 The Climate Corporation
Distributed under the Apache License, Version 2.0