BioJulia/WaveletMatrices.jl

The Wavelet Matrix
Switch branches/tags
Nothing to show
Latest commit 36d6df3 Jun 26, 2018
 Failed to load latest commit information. .github May 3, 2018 src Jun 26, 2018 test Oct 25, 2015 .gitignore May 30, 2015 .travis.yml Oct 6, 2017 CODE_OF_CONDUCT.md May 3, 2018 CONTRIBUTING.md May 3, 2018 LICENSE.md May 30, 2015 README.md Sep 17, 2015 REQUIRE Oct 6, 2017

WaveletMatrices

An implementation of "The Wavelet Matrix" (Claude and Navarro) http://www.dcc.uchile.cl/~gnavarro/ps/spire12.4.pdf.

The wavelet matrix is a data structure to represent an immutable sequence of unsigned integers that supports some queries efficiently.

The `WaveletMatrix` type is defined as follows:

`immutable WaveletMatrix{w,T<:Unsigned,B<:AbstractBitVector} <: AbstractVector{T}`

where

• `w`: the number of bits required to encode the unsigned integers (elements)
• `T`: the type of elements
• `B`: the type of bit vectors used to store elements.

To efficiently pack a sequence of unsigned integers, `w` should be as small as possible but enough to encode those integers. For example, if you want to store integers between 0x00 and 0x03 (i.e. four distinct integers), setting `w = 2 (= ceil(log2(4)))` is the best choice.

The basic operations available on the wavelet matrix are:

• `getindex(wm::WaveletMatrix, i::Integer)`: Return `i`-th element of `wm`.
• `rank(a::Unsigned, wm::WaveletMatrix, i::Integer)`: Count the number of `a`'s occurrences in `wm` between `1:i`.
• `select(a::Unsigned, wm::WaveletMatrix, j::Integer)`: Return the position of the `j`-th occurrence of `a` in `wm`.

Example

```data = rand(0x00:0x03, 100_000)
wm = WaveletMatrix{2}(data)

wm[129]
rank(0x02, wm, 5555)
select(0x01, wm, 9876)```