onehotbatch performance #1844

@ChristianMichelsen

After a Discourse thread, I was advised to open a GitHub issue about the performance of onehotbatch. Consider the following MWE:

using Flux
using BenchmarkTools

const bases_dna = ['A', 'C', 'G', 'T']

function ohe_custom(sequence)
    return collect(sequence) .== permutedims(bases_dna)
end

function ohe_flux(sequence)
    return Flux.onehotbatch(collect(sequence), bases_dna)
end

sequence = "CCGAGGGCTATGGTTTGGAAGTTAGAACCCTGGGGCTTCTCGCGGA"

The two functions currently return matrices with different orientations (one is the transpose of the other), but applying permutedims to either result fixes that; otherwise they return the same one-hot encoded matrix. So far, so good. However, benchmarking them gives the following:

@btime ohe_custom(sequence);
# output: 550.514 ns (5 allocations: 464 bytes)

and

@btime ohe_flux(sequence);
# output: 69.274 μs (374 allocations: 17.30 KiB)

As we can see, ohe_flux is more than 100 times slower than ohe_custom (about 125 times: 69.3 μs vs. 0.55 μs) and makes about 75 times as many allocations (374 vs. 5).
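For reference, here is a minimal plain-Julia sketch of a loop-based encoder that produces the labels × positions orientation that onehotbatch uses. It is only an illustration of the comparison above, not how Flux implements it: the names `BASES` and `ohe_lookup` are hypothetical, it returns a dense BitMatrix rather than Flux's one-hot array type, and it has not been benchmarked here.

```julia
# Same alphabet as bases_dna in the MWE (hypothetical name for this sketch).
const BASES = ['A', 'C', 'G', 'T']

# Hypothetical helper: one row per label, one column per sequence position.
function ohe_lookup(sequence::AbstractString, labels=BASES)
    chars = collect(sequence)
    out = falses(length(labels), length(chars))  # BitMatrix, all false
    for (j, c) in enumerate(chars)
        i = findfirst(==(c), labels)             # row index of this character
        i === nothing && throw(ArgumentError("character $c is not in labels"))
        out[i, j] = true
    end
    return out
end

sequence = "CCGAGGGCTATGGTTTGGAAGTTAGAACCCTGGGGCTTCTCGCGGA"
encoded = ohe_lookup(sequence)
```

Under these assumptions, `encoded` should equal `permutedims(ohe_custom(sequence))`, so it can serve as a second baseline when timing ohe_flux.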

Another minor detail is the memory footprint of the different outputs, in bytes:

Base.summarysize(sequence)
# output: 54
Base.summarysize(ohe_custom(sequence))
# output: 96
Base.summarysize(ohe_flux(sequence))
# output: 232
