---
layout: post  
---

In this post, I want to explore how to efficiently count kmers in Julia.

Two out-of-the-box solutions for counting the number of occurrences of arbitrary data points are:
- [StatsBase.countmap](https://juliastats.org/StatsBase.jl/stable/counts/#StatsBase.countmap)
- [DataStructures.counter](https://juliacollections.github.io/DataStructures.jl/latest/accumulators/#Constructors-1)

They are both convenience functions that build an in-memory dictionary mapping each unique value to its count.
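
As a rough illustration of what such a convenience function does, here is a minimal Base-only sketch that builds the same kind of dictionary by hand. The name `count_occurrences` is hypothetical and is not part of either package.

```julia
# Hypothetical helper (not part of StatsBase or DataStructures):
# map each unique item to the number of times it appears.
function count_occurrences(items)
    counts = Dict{eltype(items), Int}()
    for item in items
        counts[item] = get(counts, item, 0) + 1
    end
    return counts
end

counts = count_occurrences(["ACG", "CGT", "ACG"])
```

The `get(counts, item, 0) + 1` idiom is the same one used in the hand-written counting loops later in this post.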

After the kmers have been counted, I'd like to store the kmers and counts in sorted, disk-backed vectors[^1]

[^1]: [Benchmarking kmer storage implementations]({{ site.baseurl }}{% post_url 2020-12-25-ordered-dictionary-vs-sorted-vectors-for-storing-kmers %})

To facilitate this, I'd like to find an efficient way of counting kmers in memory and then dumping them and their counts to sorted, memory-mapped arrays.
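
To make the end goal concrete, here is a minimal sketch of dumping counts into sorted, memory-mapped vectors with the `Mmap` standard library. It assumes, purely for illustration, that kmers are represented by `UInt64` values; the helper `write_mmap_vector` and the file names are hypothetical.

```julia
import Mmap

# Assumption for illustration: kmers are represented by UInt64 encodings.
counts = Dict{UInt64, Int}(0x07 => 2, 0x01 => 5, 0x03 => 1)

# Sort once, after all counting is done.
sorted_kmers = sort!(collect(keys(counts)))
sorted_counts = [counts[kmer] for kmer in sorted_kmers]

# Hypothetical helper: copy a vector into a memory-mapped file.
function write_mmap_vector(path, values::Vector{T}) where T
    open(path, "w+") do io
        mapped = Mmap.mmap(io, Vector{T}, (length(values),))
        mapped .= values
        Mmap.sync!(mapped)  # flush the mapped memory to disk
    end
    return path
end

dir = mktempdir()
kmers_path = write_mmap_vector(joinpath(dir, "kmers.bin"), sorted_kmers)
counts_path = write_mmap_vector(joinpath(dir, "counts.bin"), sorted_counts)

# Re-open the kmers read-only and locate one with a binary search.
kmers_on_disk = open(kmers_path) do io
    Mmap.mmap(io, Vector{UInt64}, (length(sorted_kmers),))
end
index = searchsortedfirst(kmers_on_disk, 0x03)
```

Because the kmers are stored in sorted order, a lookup is a binary search over the mapped vector, and the count at the same index in the second file belongs to that kmer.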

In [1]:
import Pkg
pkgs = [
    "BioSequences",
    "Random",
    "BenchmarkTools",
    "Primes",
    "Mmap",
    "DataStructures",
    "StatsBase",
]

Pkg.add(pkgs)
for pkg in pkgs
    # Import each package by name without parsing a string of code
    @eval import $(Symbol(pkg))
end

   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Resolving package versions...
No Changes to `~/.julia/environments/v1.5/Project.toml`
No Changes to `~/.julia/environments/v1.5/Manifest.toml`


In [31]:
# This macro intercepts the output that @code_warntype would write to the screen
# Because this file is originally generated in a Jupyter notebook,
# we have a rich multimedia display with text coloration options
# @code_warntype uses color to emphasize areas of concern in the functions it analyzes
# that color emphasis renders in the Jupyter notebooks,
# but not the jekyll html pages generated by the notebooks.
# By capturing the output, we also strip the output of its color,
# allowing it to render more cleanly on the jekyll pages
macro capture_code_warntype(x)
    quote
        eval(:(sprint($((@macroexpand $x).args...))))
    end
end

@capture_code_warntype (macro with 1 method)
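
For comparison, here is a sketch of an alternative that avoids a custom macro by calling the function form `InteractiveUtils.code_warntype(io, f, types)` and capturing its output with `sprint`. The function `add_one` is a hypothetical stand-in for the function being inspected.

```julia
import InteractiveUtils

add_one(x) = x + 1  # hypothetical function to inspect

# sprint passes a plain IOBuffer, so no ANSI color codes are written to it
report = sprint(io -> InteractiveUtils.code_warntype(io, add_one, Tuple{Int}))
println(report)
```

The trade-off is that the argument types have to be spelled out by hand, whereas the macro form derives them from the call expression.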

In [2]:
K = 3

3

In [3]:
KMER_TYPE = BioSequences.DNAMer{K}

BioSequences.Mer{BioSequences.DNAAlphabet{2},3}

In [26]:
sequence = BioSequences.randdnaseq(Random.seed!(1), 10^2)

100nt DNA Sequence:
GGACTGATCCGAGAAATTTACGCTCTCATAAATGCACGG…TCCTCTACTTGCGGCCGGAGGTCCTGACAACAGCCGGTT

In [5]:
kmer = first(BioSequences.each(KMER_TYPE, sequence))

Mer iteration result:
Position: 1
Forward: TCG
Backward: CGA
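
The forward and backward strands reported above are reverse complements of each other, and the counting functions below key on the canonical form of each kmer. The following string-based sketch illustrates the idea under the assumption that the canonical kmer is the lexicographically smaller of the two strands. All helper names here are hypothetical; BioSequences works on a compact bit encoding rather than strings.

```julia
# Hypothetical string-based helpers, for illustration only.
const COMPLEMENT = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')

reverse_complement(kmer::AbstractString) =
    join(COMPLEMENT[base] for base in reverse(kmer))

# Assumption: the canonical kmer is the lexicographically smaller strand.
canonical(kmer::AbstractString) = min(kmer, reverse_complement(kmer))

# Every window of length k, sliding one base at a time.
each_kmer(sequence::AbstractString, k::Int) =
    (sequence[i:i+k-1] for i in 1:length(sequence)-k+1)

forward = "TCG"
backward = reverse_complement(forward)  # "CGA", matching the output above
canonical_kmers = [canonical(kmer) for kmer in each_kmer("TCGA", 3)]
```

Counting canonical kmers means that a kmer and its reverse complement are tallied under a single key, which is why both windows of `"TCGA"` collapse to `"CGA"` here.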


Below, we'll define three functions that count kmers using a dictionary and then return a sorted dictionary of kmer counts.

The first function uses an `OrderedDict`, which preserves the insertion order of its keys but can be sorted _in-place_ to give us our desired ordering.

In [6]:
function get_kmer_counts_ordered(::Type{KMER_TYPE}, sequence) where KMER_TYPE
    canonical_kmer_counts = DataStructures.OrderedDict{KMER_TYPE, Int}()
    canonical_kmer_iterator = (BioSequences.canonical(kmer.fw) for kmer in BioSequences.each(KMER_TYPE, sequence))
    for canonical_kmer in canonical_kmer_iterator
        canonical_kmer_counts[canonical_kmer] = get(canonical_kmer_counts, canonical_kmer, 0) + 1
    end
    return sort!(canonical_kmer_counts)
end

get_kmer_counts_ordered (generic function with 1 method)

This function utilizes a `SortedDict` that enforces sort order throughout the entire life-cycle of the instance

In [7]:
function get_kmer_counts_sorted(::Type{KMER_TYPE}, sequence) where KMER_TYPE
    canonical_kmer_counts = DataStructures.SortedDict{KMER_TYPE, Int}()
    canonical_kmer_iterator = (BioSequences.canonical(kmer.fw) for kmer in BioSequences.each(KMER_TYPE, sequence))
    for canonical_kmer in canonical_kmer_iterator
        canonical_kmer_counts[canonical_kmer] = get(canonical_kmer_counts, canonical_kmer, 0) + 1
    end
    return canonical_kmer_counts
end

get_kmer_counts_sorted (generic function with 1 method)
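
To see the difference between keeping data sorted at all times and sorting once at the end, here is a Base-only sketch of both strategies. It is only a stand-in: the sorted-on-insert version uses a sorted vector with binary search, which is not how `SortedDict` is implemented, and both function names are hypothetical.

```julia
# Strategy 1: keep the keys sorted after every insertion.
function counts_sorted_on_insert(items)
    keys_sorted = eltype(items)[]
    counts = Int[]
    for item in items
        i = searchsortedfirst(keys_sorted, item)
        if i <= length(keys_sorted) && keys_sorted[i] == item
            counts[i] += 1
        else
            insert!(keys_sorted, i, item)
            insert!(counts, i, 1)
        end
    end
    return keys_sorted, counts
end

# Strategy 2: count with a hash table, then sort the unique keys once.
function counts_sorted_at_end(items)
    table = Dict{eltype(items), Int}()
    for item in items
        table[item] = get(table, item, 0) + 1
    end
    keys_sorted = sort!(collect(keys(table)))
    return keys_sorted, [table[key] for key in keys_sorted]
end

items = ["CGA", "AAT", "CGA", "ACG"]
on_insert = counts_sorted_on_insert(items)
at_end = counts_sorted_at_end(items)
```

Both strategies produce the same sorted result; they differ in whether the ordering cost is paid on every update or once after counting.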

This final option uses `StatsBase.addcounts_dict!`, an internal function used by `StatsBase.countmap`, built around the default `Dict` type, which does not enforce any level of ordering or sorting.

The sorting is not in-place: the data within the `Dict` must be copied over to an `OrderedDict` before being returned.

In [41]:
function get_kmer_counts(::Type{KMER_TYPE}, sequence) where KMER_TYPE
    canonical_kmer_counts = Dict{KMER_TYPE, Int}()
    canonical_kmer_iterator = (BioSequences.canonical(kmer.fw) for kmer in BioSequences.each(KMER_TYPE, sequence))
    StatsBase.addcounts_dict!(canonical_kmer_counts, canonical_kmer_iterator)
    return sort(canonical_kmer_counts)
end

get_kmer_counts (generic function with 1 method)
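
The copy involved in sorting a plain `Dict` can be seen in a small Base-only sketch. Here the pairs are copied into a vector rather than an `OrderedDict`, which is a simplification made for the sake of a dependency-free example.

```julia
counts = Dict("CGA" => 2, "AAT" => 1, "ACG" => 1)

# A Dict has no defined order, so its pairs are first copied into a new container...
pairs_copy = collect(counts)

# ...and the copy is what gets sorted by key.
sorted_pairs = sort!(pairs_copy; by = first)
```

The original `counts` dictionary is left untouched, so for a moment both the unsorted and the sorted data are held in memory.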

The three functions all return the same output, making them equivalently correct for achieving our goal

In [42]:
get_kmer_counts(KMER_TYPE, sequence) ==
get_kmer_counts_sorted(KMER_TYPE, sequence) ==
get_kmer_counts_ordered(KMER_TYPE, sequence)

true

And in the following three code blocks, we can assert that they are all type-stable in their return types

In [44]:
# @code_warntype get_kmer_counts(KMER_TYPE, sequence)
println(@capture_code_warntype @code_warntype get_kmer_counts(KMER_TYPE, sequence))

Variables
  #self#::Core.Compiler.Const(get_kmer_counts, false)
  #unused#::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},5}, false)
  sequence::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}
  #132::var"#132#133"
  canonical_kmer_counts::Dict{BioSequences.Mer{BioSequences.DNAAlphabet{2},5},Int64}
  canonical_kmer_iterator::Base.Generator{BioSequences.EveryMerIterator{BioSequences.Mer{BioSequences.DNAAlphabet{2},5},BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}},var"#132#133"}

Body::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},5},Int64}
1 ─ %1  = Core.apply_type(Main.Dict, $(Expr(:static_parameter, 1)), Main.Int)::Core.Compiler.Const(Dict{BioSequences.Mer{BioSequences.DNAAlphabet{2},5},Int64}, false)
│         (canonical_kmer_counts = (%1)())
│         (#132 = %new(Main.:(var"#132#133")))
│   %4  = #132::Core.Compiler.Const(var"#132#133"(), false)
│   %5  = BioSequences.each::Core.Compiler.Const(BioSequences.each, false

In [33]:
# @code_warntype get_kmer_counts_sorted(KMER_TYPE, sequence)
println(@capture_code_warntype @code_warntype get_kmer_counts_sorted(KMER_TYPE, sequence))

Variables
  #self#::Core.Compiler.Const(get_kmer_counts_sorted, false)
  #unused#::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},31}, false)
  sequence::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}
  #3::var"#3#4"
  canonical_kmer_counts::DataStructures.SortedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64,Base.Order.ForwardOrdering}
  canonical_kmer_iterator::Base.Generator{BioSequences.EveryMerIterator{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}},var"#3#4"}
  @_7::UNION{NOTHING, TUPLE{BIOSEQUENCES.MER{BIOSEQUENCES.DNAALPHABET{2},31},TUPLE{INT64,INT64,UINT64,UINT64}}}
  canonical_kmer::BioSequences.Mer{BioSequences.DNAAlphabet{2},31}

Body::DataStructures.SortedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64,Base.Order.ForwardOrdering}
1 ─ %1  = DataStructures.SortedDict::Core.Compiler.Const(DataStructures.SortedDict, false)
│   %2  = $(Expr(:static_parameter, 1))::Core.Com

In [34]:
# @code_warntype get_kmer_counts_ordered(KMER_TYPE, sequence)
println(@capture_code_warntype @code_warntype get_kmer_counts_ordered(KMER_TYPE, sequence))

Variables
  #self#::Core.Compiler.Const(get_kmer_counts_ordered, false)
  #unused#::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},31}, false)
  sequence::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}
  #1::var"#1#2"
  canonical_kmer_counts::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
  canonical_kmer_iterator::Base.Generator{BioSequences.EveryMerIterator{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}},var"#1#2"}
  @_7::UNION{NOTHING, TUPLE{BIOSEQUENCES.MER{BIOSEQUENCES.DNAALPHABET{2},31},TUPLE{INT64,INT64,UINT64,UINT64}}}
  canonical_kmer::BioSequences.Mer{BioSequences.DNAAlphabet{2},31}

Body::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
1 ─ %1  = DataStructures.OrderedDict::Core.Compiler.Const(OrderedCollections.OrderedDict, false)
│   %2  = $(Expr(:static_parameter, 1))::Core.Compiler.Const(BioSequences.Mer{BioSeque
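
Reading `@code_warntype` output by eye works, but the same check can also be made programmatically. The sketch below uses `Test.@inferred` and `Base.return_types` on `count_items`, a hypothetical stand-in for the kmer counting functions above.

```julia
import Test

# Hypothetical stand-in for the kmer counting functions above.
function count_items(items::Vector{T}) where T
    counts = Dict{T, Int}()
    for item in items
        counts[item] = get(counts, item, 0) + 1
    end
    return counts
end

# @inferred throws an error if the inferred return type
# does not match the type of the value actually returned.
result = Test.@inferred count_items(["ACG", "CGT", "ACG"])

# Base.return_types lists the inferred return type for a given signature.
inferred = Base.return_types(count_items, (Vector{String},))
```

If the return type were not concretely inferred, the `@inferred` line would raise an error instead of returning the dictionary.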

In the following series of benchmark results, we can see that the implementation using the optimized `Dict`-based methods in `StatsBase` is both faster and more memory efficient than the `SortedDict` implementation. However, it uses roughly 1.6x the memory of the `OrderedDict` implementation (61.00 KiB vs. 37.03 KiB) and is a little slower.

In [45]:
BenchmarkTools.@benchmark get_kmer_counts($KMER_TYPE, $sequence)

BenchmarkTools.Trial: 
  memory estimate:  61.00 KiB
  allocs estimate:  42
  --------------
  minimum time:     45.043 μs (0.00% GC)
  median time:      48.998 μs (0.00% GC)
  mean time:        61.067 μs (15.47% GC)
  maximum time:     10.853 ms (99.29% GC)
  --------------
  samples:          10000
  evals/sample:     1

Using a sorted dictionary via `get_kmer_counts_sorted` is the slowest and most memory intensive option, but it makes the fewest total allocations. I imagine the extra memory overhead is due to the tree-based data structure and the extra runtime is due to the $$O(\log N)$$ runtime mentioned in the [documentation](https://juliacollections.github.io/DataStructures.jl/stable/sorted_containers/#Sorted-Containers-1).

In [46]:
BenchmarkTools.@benchmark get_kmer_counts_sorted($KMER_TYPE, $sequence)

BenchmarkTools.Trial: 
  memory estimate:  73.47 KiB
  allocs estimate:  28
  --------------
  minimum time:     113.296 μs (0.00% GC)
  median time:      117.337 μs (0.00% GC)
  mean time:        129.898 μs (7.72% GC)
  maximum time:     7.498 ms (97.84% GC)
  --------------
  samples:          10000
  evals/sample:     1

Creating an `OrderedDict` and sorting it in-place after all of the data has been added appears to be the fastest and least memory intensive solution.

In [47]:
BenchmarkTools.@benchmark get_kmer_counts_ordered($KMER_TYPE, $sequence)

BenchmarkTools.Trial: 
  memory estimate:  37.03 KiB
  allocs estimate:  29
  --------------
  minimum time:     41.055 μs (0.00% GC)
  median time:      43.212 μs (0.00% GC)
  mean time:        52.685 μs (15.51% GC)
  maximum time:     11.520 ms (99.06% GC)
  --------------
  samples:          10000
  evals/sample:     1

Let's try a series of benchmarks to make sure that these patterns hold on larger datasets

In [75]:
function my_display(results, indent)
    for line in split(sprint(show, "text/plain", results), '\n')
        println(repeat("\t", indent) * "$line")
    end
end

for k in [17, 31]
    for s_length in [10^i for i in 3:5]
        println("k = $k")
        println("\tsequence_length = $s_length")
        KMER_TYPE = BioSequences.DNAMer{k}
        sequence = BioSequences.randdnaseq(s_length)
        println("\t\tdefault")
        default_results = BenchmarkTools.@benchmark get_kmer_counts($KMER_TYPE, $sequence)
        my_display(default_results, 3)
        println("\t\tsorted")
        sorted_results = BenchmarkTools.@benchmark get_kmer_counts_sorted($KMER_TYPE, $sequence)
        my_display(sorted_results, 3)
        println("\t\tordered")
        ordered_results = BenchmarkTools.@benchmark get_kmer_counts_ordered($KMER_TYPE, $sequence)
        my_display(ordered_results, 3)
    end
end

k = 17
	sequence_length = 1000
		default
			BenchmarkTools.Trial: 
			  memory estimate:  186.72 KiB
			  allocs estimate:  50
			  --------------
			  minimum time:     120.571 μs (0.00% GC)
			  median time:      147.005 μs (0.00% GC)
			  mean time:        202.911 μs (16.70% GC)
			  maximum time:     18.398 ms (99.12% GC)
			  --------------
			  samples:          10000
			  evals/sample:     1
		sorted
			BenchmarkTools.Trial: 
			  memory estimate:  145.73 KiB
			  allocs estimate:  31
			  --------------
			  minimum time:     151.289 μs (0.00% GC)
			  median time:      191.273 μs (0.00% GC)
			  mean time:        228.298 μs (7.83% GC)
			  maximum time:     11.964 ms (97.72% GC)
			  --------------
			  samples:          10000
			  evals/sample:     1
		ordered
			BenchmarkTools.Trial: 
			  memory estimate:  94.41 KiB
			  allocs estimate:  32
			  --------------
			  minimum time:     76.456 μs (0.00% GC)
			  median time:      90.848 μs (0.00% GC)
			  mean time:        111

The `SortedDict` implementation remained the slowest implementation regardless of the data set. The cost of copying the default `Dict` continued to grow until it required substantially more memory than the `SortedDict` implementation and was no longer competitive in total runtime with the `OrderedDict` implementation.

We'll evaluate one last scenario. Rather than assessing the sorted kmer counts for one sequence, we'd like to get the sorted kmer counts for a collection of sequences

In that case, there is no need to sort until the very end, potentially saving us the additional data copy penalties of the `StatsBase` optimized counting methods

Let's create two new kmer counting methods that don't sort, then wrap them with functions that integrate the counts for each sequence assessed. The integrated count dictionary will then be sorted and returned to give us our desired output.

In [51]:
function get_kmer_counts_ordered_no_sort(::Type{KMER_TYPE}, sequence) where KMER_TYPE
    canonical_kmer_counts = DataStructures.OrderedDict{KMER_TYPE, Int}()
    canonical_kmer_iterator = (BioSequences.canonical(kmer.fw) for kmer in BioSequences.each(KMER_TYPE, sequence))
    for canonical_kmer in canonical_kmer_iterator
        canonical_kmer_counts[canonical_kmer] = get(canonical_kmer_counts, canonical_kmer, 0) + 1
    end
    return canonical_kmer_counts
end

get_kmer_counts_ordered_no_sort (generic function with 1 method)

In [52]:
function get_kmer_counts_no_sort(::Type{KMER_TYPE}, sequence) where KMER_TYPE
    canonical_kmer_counts = Dict{KMER_TYPE, Int}()
    canonical_kmer_iterator = (BioSequences.canonical(kmer.fw) for kmer in BioSequences.each(KMER_TYPE, sequence))
    StatsBase.addcounts_dict!(canonical_kmer_counts, canonical_kmer_iterator)
    return canonical_kmer_counts
end

get_kmer_counts_no_sort (generic function with 1 method)

In [53]:
function default_count_merge(::Type{KMER_TYPE}, sequences) where KMER_TYPE
    joint_kmer_counts = DataStructures.OrderedDict{KMER_TYPE, Int}()
    for sequence in sequences
        sequence_kmer_counts = get_kmer_counts_no_sort(KMER_TYPE, sequence)
        merge!(+, joint_kmer_counts, sequence_kmer_counts)
    end
    sort!(joint_kmer_counts)
end

default_count_merge (generic function with 1 method)

In [54]:
function ordered_count_merge(::Type{KMER_TYPE}, sequences) where KMER_TYPE
    joint_kmer_counts = DataStructures.OrderedDict{KMER_TYPE, Int}()
    for sequence in sequences
        sequence_kmer_counts = get_kmer_counts_ordered_no_sort(KMER_TYPE, sequence)
        merge!(+, joint_kmer_counts, sequence_kmer_counts)
    end
    sort!(joint_kmer_counts)
end

ordered_count_merge (generic function with 1 method)
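
The merge-then-sort pattern used by both wrappers can be shown in isolation with a Base-only sketch. It works on plain strings without canonicalization, uses `mergewith!` (available in Julia 1.5 and later) in place of `merge!(+, ...)`, and returns a sorted vector of pairs instead of an `OrderedDict`. The names `count_kmers` and `merged_sorted_counts` are hypothetical.

```julia
# Hypothetical per-sequence counter over string kmers (no canonicalization).
function count_kmers(sequence::AbstractString, k::Int)
    counts = Dict{String, Int}()
    for i in 1:length(sequence)-k+1
        kmer = sequence[i:i+k-1]
        counts[kmer] = get(counts, kmer, 0) + 1
    end
    return counts
end

function merged_sorted_counts(sequences, k::Int)
    joint = Dict{String, Int}()
    for sequence in sequences
        # Add counts for kmers seen before; insert kmers that are new.
        mergewith!(+, joint, count_kmers(sequence, k))
    end
    # Sort once, after every sequence has been merged.
    return sort!(collect(joint); by = first)
end

result = merged_sorted_counts(["ACGT", "CGTA"], 3)
```

The kmer `"CGT"` appears in both sequences, so its two per-sequence counts are summed during the merge, and the sort happens only once at the very end.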

Here we just sanity check that the functions are all type stable

In [68]:
# @code_warntype get_kmer_counts_ordered_no_sort(KMER_TYPE, sequence)
println(@capture_code_warntype @code_warntype get_kmer_counts_ordered_no_sort(KMER_TYPE, sequence))

Variables
  #self#::Core.Compiler.Const(get_kmer_counts_ordered_no_sort, false)
  #unused#::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},31}, false)
  sequence::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}
  #263::var"#263#264"
  canonical_kmer_counts::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
  canonical_kmer_iterator::Base.Generator{BioSequences.EveryMerIterator{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}},var"#263#264"}
  @_7::UNION{NOTHING, TUPLE{BIOSEQUENCES.MER{BIOSEQUENCES.DNAALPHABET{2},31},TUPLE{INT64,INT64,UINT64,UINT64}}}
  canonical_kmer::BioSequences.Mer{BioSequences.DNAAlphabet{2},31}

Body::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
1 ─ %1  = DataStructures.OrderedDict::Core.Compiler.Const(OrderedCollections.OrderedDict, false)
│   %2  = $(Expr(:static_parameter, 1))::Core.Compiler.Const(BioSequ

In [69]:
# @code_warntype get_kmer_counts_no_sort(KMER_TYPE, sequence)
println(@capture_code_warntype @code_warntype get_kmer_counts_no_sort(KMER_TYPE, sequence))

Variables
  #self#::Core.Compiler.Const(get_kmer_counts_no_sort, false)
  #unused#::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},31}, false)
  sequence::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}
  #265::var"#265#266"
  canonical_kmer_counts::Dict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
  canonical_kmer_iterator::Base.Generator{BioSequences.EveryMerIterator{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}},var"#265#266"}

Body::Dict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
1 ─ %1  = Core.apply_type(Main.Dict, $(Expr(:static_parameter, 1)), Main.Int)::Core.Compiler.Const(Dict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}, false)
│         (canonical_kmer_counts = (%1)())
│         (#265 = %new(Main.:(var"#265#266")))
│   %4  = #265::Core.Compiler.Const(var"#265#266"(), false)
│   %5  = BioSequences.each::Core.Compiler.Const(BioSequences.each, false)
│   %6  = $

In [70]:
# @code_warntype default_count_merge(KMER_TYPE, [sequence])
println(@capture_code_warntype @code_warntype default_count_merge(KMER_TYPE, [sequence]))

Variables
  #self#::Core.Compiler.Const(default_count_merge, false)
  #unused#::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},31}, false)
  sequences::Array{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}},1}
  joint_kmer_counts::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
  @_5::UNION{NOTHING, TUPLE{BIOSEQUENCES.LONGSEQUENCE{BIOSEQUENCES.DNAALPHABET{4}},INT64}}
  sequence::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}
  sequence_kmer_counts::Dict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}

Body::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
1 ─ %1  = DataStructures.OrderedDict::Core.Compiler.Const(OrderedCollections.OrderedDict, false)
│   %2  = $(Expr(:static_parameter, 1))::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},31}, false)
│   %3  = Core.apply_type(%1, %2, Main.Int)::Core.Compiler.Const(OrderedCollections.OrderedDict{BioSequenc

In [71]:
# @code_warntype ordered_count_merge(KMER_TYPE, [sequence])
println(@capture_code_warntype @code_warntype ordered_count_merge(KMER_TYPE, [sequence]))

Variables
  #self#::Core.Compiler.Const(ordered_count_merge, false)
  #unused#::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},31}, false)
  sequences::Array{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}},1}
  joint_kmer_counts::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
  @_5::UNION{NOTHING, TUPLE{BIOSEQUENCES.LONGSEQUENCE{BIOSEQUENCES.DNAALPHABET{4}},INT64}}
  sequence::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}
  sequence_kmer_counts::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}

Body::OrderedCollections.OrderedDict{BioSequences.Mer{BioSequences.DNAAlphabet{2},31},Int64}
1 ─ %1  = DataStructures.OrderedDict::Core.Compiler.Const(OrderedCollections.OrderedDict, false)
│   %2  = $(Expr(:static_parameter, 1))::Core.Compiler.Const(BioSequences.Mer{BioSequences.DNAAlphabet{2},31}, false)
│   %3  = Core.apply_type(%1, %2, Main.Int)::Core.Compiler.Const(OrderedCollecti

Confirm that they return the same output

In [62]:
default_count_merge(KMER_TYPE, [sequence]) == ordered_count_merge(KMER_TYPE, [sequence])

true

In the following benchmarks, the two options appear to be rather similar.

In [74]:
for k in [17, 31]
    println("k = $k")
    for sequence_length in [10^i for i in 2:3]
        println("\tsequence_length = $sequence_length")
        for n_sequences in [10^i for i in 2:3]
            println("\t\t# sequences = $n_sequences")
            KMER_TYPE = BioSequences.DNAMer{k}
            sequences = collect(BioSequences.randdnaseq(sequence_length) for i in 1:n_sequences)
            println("\t\t\tdefault")
            default_results = BenchmarkTools.@benchmark default_count_merge($KMER_TYPE, $sequences)
            my_display(default_results, 4)
            println("\t\t\tordered")
            ordered_results = BenchmarkTools.@benchmark ordered_count_merge($KMER_TYPE, $sequences)
            my_display(ordered_results, 4)
        end
    end
end

k = 17
	sequence_length = 100
		# sequences = 100
			default
				BenchmarkTools.Trial: 
				  memory estimate:  1.48 MiB
				  allocs estimate:  1046
				  --------------
				  minimum time:     1.360 ms (0.00% GC)
				  median time:      1.651 ms (0.00% GC)
				  mean time:        1.897 ms (9.87% GC)
				  maximum time:     12.382 ms (84.79% GC)
				  --------------
				  samples:          2634
				  evals/sample:     1
			ordered
				BenchmarkTools.Trial: 
				  memory estimate:  1.45 MiB
				  allocs estimate:  1946
				  --------------
				  minimum time:     1.450 ms (0.00% GC)
				  median time:      1.729 ms (0.00% GC)
				  mean time:        1.926 ms (8.62% GC)
				  maximum time:     12.761 ms (71.43% GC)
				  --------------
				  samples:          2595
				  evals/sample:     1
		# sequences = 1000
			default
				BenchmarkTools.Trial: 
				  memory estimate:  14.68 MiB
				  allocs estimate:  10056
				  --------------
				  minimum time:     17.040 ms (0.00% GC)
				  media

The `OrderedDict` implementation uses less overall memory, but does so with more individual allocations.

I think I'll go ahead with the `OrderedDict` version of the two in the interest of memory conservation, although I don't think we could go wrong selecting either one!