Random Number Implementation and Validation

Compilable code is build with:

cmake -S . -B cmake-build
cmake --build cmake-build

Some analysis is performed in the [jupyter] folder. This is done by automatically wrapping the fortran shared library with the fantastic gfort2py.

TODOs

Sorting

Migrate wrapped GSL C calling example here
Add a pytoml, such that one can straightforwardly install the python dependencies

Mapping a random number to an interval

Test Lemire's algorithm in the integer mapping module
- Note, there could be issues arising from transcribing from C to fortran

Sampling without replacement

Test:

My choice of random seed precision (uint)
Hidde shuffle
- Contains bug/s
Time all algorithms tested in the notebook

Implement

Weighted version of reservoir sampling
- Ideally where the indices and weights are evaluated on-the-fly
- Also see the paper Weighted random sampling with a reservoir

PRNGs

Implemented XOR, which is straightforward and has a large period of over a million. However, one has to be careful with the handling of signed vs unsigned integers when transcribing from C.

An unsigned int in C can hold numbers $[0, 2^{32} - 1]$, however fortran does not support this data type. Instead, signed int32 has the range $[-2^{31}, 2^{31} - 1]$. I use a bit mask to remap negative values:

! A mask that has all the bits set to 1 except the most significant bit 
! i.e. the sign bit in a 32-bit signed integer
iand(x, Z'7FFFFFFF')

This leaves $[0, 2^{31}-1]$ unchanged. However, for negative values, $x$ is represented using two's complement notation. When you apply iand(x, Z'7FFFFFFF'), you are effectively masking out the sign bit. This operation converts a negative number into its unsigned equivalent by removing the sign bit and keeping only the lower 31 bits.

integer(int32) :: x
x = -12345678_int32  ! x = -12345678 (in two's complement)
x = iand(x, Z'7FFFFFFF')  ! Masking the sign bit
! x becomes 2015137970 (the unsigned equivalent of -12345678)

There are smarter things one can do. See this Github reference by Jonas Finker, or MR 2528 for Octopus, however the above is currently sufficient for my needs.

Mapping integers to a smaller range

Also showed that mapping $[0, P)$ to $[a, b)$ is fine when the values are real but mapping to a smaller range of integers will inevitably result in duplication of numbers, even when uniformly sampling.

See:

Lemire's mapping
Mapping in the XOR module

Random Sampling a Population with no Replacements

For my use cases, one requires random sampling with no replacement. Algorithms that randomly sample a population with no replacements include:

Reservoir sampling
- A couple of versions are shown on wikipedia
- My implementations
Skip and Gap Sampling (Vitter's Algorithm)
- Can be more efficient than standard Reservoir Sampling, especially for large streams
  - Original paper with algorithms A - D, and followed up here
  - Reservoir Algorithms: Random Sampling with a Reservoir. This link is quite thorough and covers Algorithms A, D, R, X, Z, L
  - Some more details on Knuth's Algorithm L here
- Blog post on gap sampling
  - Quite short
  - Touches on Poisson distribution, which is also utilised by hidden shuffle - worth a read, but the code is Java
Hidden Shuffle This gives a python implementation, and claims it's more efficient than the above methods
- My python implementation, transcribed from the paper
- My fortran implementation
Hash-Based Sampling
Simple Random Sampling with Sorting
- Efficient when the range (m) is small w.r.t. N (i.e. $2^{32}$)
- Guarantees uniqueness of selected items.

Some overviews on the problem, and related algorithms:

Looks like a good, recent paper "Simple, Optimal Algorithms for Random Sampling Without Replacement" giving an overview of the methods listed here
For way more detail and code examples, see this gist

Fortran implementation references:

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.idea		.idea
jupyter		jupyter
src		src
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Random Number Implementation and Validation

TODOs

PRNGs

Mapping integers to a smaller range

Random Sampling a Population with no Replacements

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Random Number Implementation and Validation

TODOs

PRNGs

Mapping integers to a smaller range

Random Sampling a Population with no Replacements

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages