This repository has been archived by the owner on Oct 28, 2023. It is now read-only.

Improve performance by about 50% #14

Merged 2 commits on Sep 11, 2019

Commits on Sep 10, 2019

  1. Improve performance by about 50%

    I ran cargo flamegraph, and a large share of the runtime turned out to be spent in find_match and find_better_match, which form a very hot loop.
    
    Almost all of the work done in the inner loop (find_better_match) is a function of two u8 values, so it can be precomputed.
    
    The alpha masks can also be baked into these precomputed cost functions, so the loop no longer needs to do any alpha computations.
    
    I ran all the examples and couldn't find any visible artifacts.
    austinjones authored and Austin Jones committed Sep 10, 2019
    Commit 680deee
  2. Fix bugs: exp() numerical precision, and loop bounds bug

    I found a few small bugs while looking at @zicklag's comments on EmbarkStudios#14
    
    First, there was a numerical precision bug in the calculation of the distance Gaussians: the original code used f64::exp(), but I was using f32::exp().
    
    Second, entries were missing from the precomputed function table. Range upper bounds are exclusive, so a range such as 0..255u8 stops at 254, and 256 does not fit in a u8. The loops therefore need the inclusive range 0..=255u8, which exists for exactly this situation.
    Austin Jones committed Sep 10, 2019
    Commit b555fad
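
The idea behind both commits can be sketched in a few lines of Rust. This is a minimal illustration and not the code from the pull request: `cost`, `CostTable`, and `gaussian_weight` are hypothetical names, and the squared-difference metric and the single `alpha` weight are assumptions chosen only to make the example concrete. It shows a cost that depends on two `u8` values being rendered once into a 256 x 256 table (with the alpha weight baked in), the inclusive range needed to fill every entry, and a Gaussian evaluated with `f64::exp()` before narrowing to `f32`.

```rust
/// Hypothetical per-channel cost: squared difference of two `u8` values,
/// scaled by an alpha weight in [0, 1]. An assumption for illustration,
/// not the metric used by the project.
fn cost(a: u8, b: u8, alpha: f32) -> f32 {
    let d = f32::from(a) - f32::from(b);
    alpha * d * d
}

/// Lookup table holding `cost(a, b, alpha)` for every pair of `u8` inputs.
struct CostTable {
    /// 256 * 256 entries, indexed by `a * 256 + b`.
    data: Vec<f32>,
}

impl CostTable {
    /// Evaluates the cost once per input pair, with `alpha` baked in.
    fn new(alpha: f32) -> Self {
        let mut data = vec![0.0f32; 256 * 256];
        // `0..=255u8` is inclusive and visits all 256 values.
        // `0..255u8` would stop at 254 and leave row and column 255 unfilled.
        for a in 0..=255u8 {
            for b in 0..=255u8 {
                data[Self::index(a, b)] = cost(a, b, alpha);
            }
        }
        CostTable { data }
    }

    fn index(a: u8, b: u8) -> usize {
        usize::from(a) * 256 + usize::from(b)
    }

    /// The hot loop only needs this array read; no arithmetic on alpha.
    fn get(&self, a: u8, b: u8) -> f32 {
        self.data[Self::index(a, b)]
    }
}

/// Gaussian weight for a distance, evaluated with `f64::exp()` and
/// narrowed to `f32` only once, at the end.
fn gaussian_weight(distance: f64, sigma: f64) -> f32 {
    (-(distance * distance) / (2.0 * sigma * sigma)).exp() as f32
}
```

With this sketch, `CostTable::new(0.5).get(255, 0)` reads a single precomputed entry instead of recomputing the weighted difference, and `(0..255u8).count()` versus `(0..=255u8).count()` (255 versus 256) shows why the exclusive range leaves entries missing.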