Closed
Description
Following a Discourse thread, I was advised to open a GitHub issue about the performance of `onehotbatch`. Consider the following MWE:
```julia
using Flux
using BenchmarkTools

# The four DNA bases used as one-hot labels
const bases_dna = ['A', 'C', 'G', 'T']

# Broadcast comparison: returns a length(sequence) × 4 BitMatrix
function ohe_custom(sequence)
    return collect(sequence) .== permutedims(bases_dna)
end

# Flux: returns a 4 × length(sequence) one-hot matrix
function ohe_flux(sequence)
    return Flux.onehotbatch(collect(sequence), bases_dna)
end

sequence = "CCGAGGGCTATGGTTTGGAAGTTAGAACCCTGGGGCTTCTCGCGGA"
```

Right now the two functions return transposed results: `ohe_custom` gives a 46×4 matrix and `ohe_flux` a 4×46 one. Applying `permutedims` to either result fixes that, and otherwise they return the same one-hot encoded matrix. So far, so good. However, benchmarking them gives the following:
```julia
@btime ohe_custom(sequence);
# output: 550.514 ns (5 allocations: 464 bytes)
```

and

```julia
@btime ohe_flux(sequence);
# output: 69.274 μs (374 allocations: 17.30 KiB)
```

As we can see, `ohe_flux` is more than 100 times slower than `ohe_custom` (69.274 μs vs. 550.514 ns) and makes about 75 times as many allocations (374 vs. 5).
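For comparison, here is a minimal sketch in plain Julia (no Flux) of an encoder that fills a single preallocated `BitMatrix` in the same labels × sequence orientation that `onehotbatch` returns, so its result should equal `permutedims` of the broadcast version. `ohe_prealloc` is a hypothetical helper for illustration only; it is not Flux's implementation, and its timings are not measured here.

```julia
# Sketch: fill a preallocated BitMatrix with one column per character,
# matching the labels × length(sequence) orientation of onehotbatch.
# `ohe_prealloc` is a hypothetical helper, not part of Flux.
function ohe_prealloc(seq::AbstractString, labels::AbstractVector{Char})
    out = falses(length(labels), length(seq))  # result matrix, all false
    for (j, c) in enumerate(seq)                # j = column (position), c = character
        i = findfirst(==(c), labels)            # row of the matching label
        i === nothing && throw(ArgumentError("character '$c' is not in labels"))
        out[i, j] = true
    end
    return out
end

labels = ['A', 'C', 'G', 'T']
sequence = "CCGAGGGCTATGGTTTGGAAGTTAGAACCCTGGGGCTTCTCGCGGA"

A = ohe_prealloc(sequence, labels)
# Same matrix as the broadcast version once the orientation is swapped
A == permutedims(collect(sequence) .== permutedims(labels))  # true
```

Wrapping the calls in `@btime` (as above) would be the way to compare this against `ohe_custom` and `ohe_flux` on a given machine.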
Another, more minor, point is the memory footprint of the outputs (in bytes, as reported by `Base.summarysize`):

```julia
Base.summarysize(sequence)
# output: 54
Base.summarysize(ohe_custom(sequence))
# output: 96
Base.summarysize(ohe_flux(sequence))
# output: 232
```
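Since each column of a one-hot matrix carries only one label index, one way to look at the size question is to store that index directly. The sketch below, in plain Julia, keeps one `UInt8` per base and materializes the dense matrix only on demand. `encode_indices` and `expand_indices` are hypothetical helpers for illustration, not Flux API, and the `UInt8` choice assumes at most 255 labels.

```julia
# Sketch: store one UInt8 label index per character instead of a dense matrix.
# `encode_indices` and `expand_indices` are hypothetical helpers, not Flux API.
function encode_indices(seq::AbstractString, labels::AbstractVector{Char})
    idx = Vector{UInt8}(undef, length(seq))
    for (j, c) in enumerate(seq)
        i = findfirst(==(c), labels)
        i === nothing && throw(ArgumentError("character '$c' is not in labels"))
        idx[j] = UInt8(i)  # assumes at most 255 labels
    end
    return idx
end

# Materialize the dense labels × length(idx) one-hot BitMatrix only when needed
function expand_indices(idx::AbstractVector{<:Integer}, nlabels::Integer)
    out = falses(nlabels, length(idx))
    for (j, i) in enumerate(idx)
        out[i, j] = true
    end
    return out
end

labels = ['A', 'C', 'G', 'T']
sequence = "CCGAGGGCTATGGTTTGGAAGTTAGAACCCTGGGGCTTCTCGCGGA"

idx = encode_indices(sequence, labels)
Base.summarysize(idx)  # compare with the sizes reported above
expand_indices(idx, length(labels)) == permutedims(collect(sequence) .== permutedims(labels))  # true
```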