Compilable code is build with:
cmake -S . -B cmake-build
cmake --build cmake-buildSome analysis is performed in the [jupyter] folder. This is done by automatically wrapping the fortran shared library with the fantastic gfort2py.
Sorting
- Migrate wrapped GSL C calling example here
- Add a pytoml, such that one can straightforwardly install the python dependencies
Mapping a random number to an interval
- Test Lemire's algorithm in the integer mapping module
- Note, there could be issues arising from transcribing from C to fortran
Sampling without replacement
Test:
- My choice of random seed precision (
uint) - Hidde shuffle
- Contains bug/s
- Time all algorithms tested in the notebook
Implement
- Weighted version of reservoir sampling
- Ideally where the indices and weights are evaluated on-the-fly
- Also see the paper Weighted random sampling with a reservoir
Implemented XOR, which is straightforward and has a large period of over a million. However, one has to be careful with the handling of signed vs unsigned integers when transcribing from C.
An unsigned int in C can hold numbers int32 has the range
! A mask that has all the bits set to 1 except the most significant bit
! i.e. the sign bit in a 32-bit signed integer
iand(x, Z'7FFFFFFF')This leaves iand(x, Z'7FFFFFFF'), you are effectively masking out the sign bit. This operation converts a negative
number into its unsigned equivalent by removing the sign bit and keeping only the lower 31 bits.
integer(int32) :: x
x = -12345678_int32 ! x = -12345678 (in two's complement)
x = iand(x, Z'7FFFFFFF') ! Masking the sign bit
! x becomes 2015137970 (the unsigned equivalent of -12345678)There are smarter things one can do. See this Github reference by Jonas Finker, or MR 2528 for Octopus, however the above is currently sufficient for my needs.
Also showed that mapping
See:
- Lemire's mapping
- Mapping in the XOR module
For my use cases, one requires random sampling with no replacement. Algorithms that randomly sample a population with no replacements include:
-
Reservoir sampling
- A couple of versions are shown on wikipedia
- My implementations
-
Skip and Gap Sampling (Vitter's Algorithm)
- Can be more efficient than standard Reservoir Sampling, especially for large streams
- Original paper with algorithms A - D, and followed up here
- Reservoir Algorithms: Random Sampling with a Reservoir. This link is quite thorough and covers Algorithms A, D, R, X, Z, L
- Some more details on Knuth's Algorithm L here
-
Blog post on gap sampling
- Quite short
- Touches on Poisson distribution, which is also utilised by hidden shuffle - worth a read, but the code is Java
- Can be more efficient than standard Reservoir Sampling, especially for large streams
-
Hidden Shuffle This gives a python implementation, and claims it's more efficient than the above methods
- My python implementation, transcribed from the paper
- My fortran implementation
-
Hash-Based Sampling
-
Simple Random Sampling with Sorting
- Efficient when the range (m) is small w.r.t. N (i.e.
$2^{32}$ ) - Guarantees uniqueness of selected items.
- Efficient when the range (m) is small w.r.t. N (i.e.
Some overviews on the problem, and related algorithms:
- Looks like a good, recent paper "Simple, Optimal Algorithms for Random Sampling Without Replacement" giving an overview of the methods listed here
- For way more detail and code examples, see this gist
Fortran implementation references: