Reduce unnecessary allocations in Rule creation #204

Merged 1 commit on Apr 6, 2017

Commits on May 1, 2015

  1. Reduce unnecessary allocations in Rule creation

    The pattern of doing `[var].flatten` to coerce values to Arrays makes
    three Array allocations every time. The first is for the array enclosing
    var, the other two are in the C implementation of `flatten` to perform a
    depth-first traversal over any nested arrays found within the outer-most
    array[1]. Rules are allocated a *lot* during the typical execution of an
    application using CanCanCan for authorization. Since the parameters to
    Ability#can are expected to not be deeply nested, we can cut out the
    extra allocations by using Kernel#Array. This benchmark:
    
    [Array#flatten] Mem diff: 18.40625 MB
    [Kernel#Array] Mem diff: 1.01953125 MB
    
    Shows a savings of 17MB of allocations when instantiating 10,000
    Rules[2].
    This translates to a 2x speed improvement in Rule instantiation time:
    
    Calculating -------------------------------------
           Array#flatten    16.147k i/100ms
            Kernel#Array    34.565k i/100ms
    -------------------------------------------------
           Array#flatten    218.381k (± 9.7%) i/s -      1.098M
            Kernel#Array    460.892k (±30.7%) i/s -      2.108M
    
    Comparison:
            Kernel#Array:   460892.2 i/s
           Array#flatten:   218380.6 i/s - 2.11x slower
    
    [1] The first array used by flatten is a stack to push sub-arrays onto
    while traversing, the second is for the result array.
    
    [2] Memory figures are RSS, which doesn't consider shared memory, but we
    have none for this simple benchmark. GC was disabled during the
    benchmark to eliminate the effects of unpredictable GC activity. Most of
    the intermediary arrays are immediately garbage, but the act of
    allocating them increases the work the garbage collector has to do.
    Benchmark/ips was used with GC enabled in the second benchmark.
    Benchmark script used is available here:
    https://gist.github.com/timraymond/8d7014e0c7804f0fe508
    timraymond committed May 1, 2015
    c81cf44
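
The allocation difference described in the commit message can be observed with a small script. This is a minimal sketch, not the benchmark from the linked gist: the `allocations` helper, the `ITERATIONS` constant, and the `:read` action are hypothetical names chosen for illustration, and it counts allocated objects through `GC.stat` rather than measuring RSS. The exact counts depend on the Ruby version, so none are shown here.

```ruby
# Count how many objects Ruby allocates while the given block runs.
# (Hypothetical helper; total_allocated_objects only ever increases.)
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

ITERATIONS = 1_000
action = :read # stand-in for an argument passed to Ability#can

# Both forms coerce a bare value, or a flat array of values, to an Array.
same_for_scalar = [action].flatten == Array(action)
same_for_array  = [[:read, :update]].flatten == Array([:read, :update])

flatten_allocs = allocations { ITERATIONS.times { [action].flatten } }
array_allocs   = allocations { ITERATIONS.times { Array(action) } }

puts "results match:  #{same_for_scalar && same_for_array}"
puts "[var].flatten:  #{flatten_allocs} objects"
puts "Array(var):     #{array_allocs} objects"
```

The two forms are interchangeable only for the flat inputs the commit message assumes. They differ elsewhere: `Array(nil)` returns `[]` while `[nil].flatten` returns `[nil]`, and `Array` converts a Hash into an array of key/value pairs.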