[DNM] apd: embed small coefficient values in Decimal struct #101

nvanbenschoten
Member

Replaces cockroachdb/cockroach#74369.

This commit introduces a performance optimization that embeds small coefficient values directly in their `Decimal` struct instead of storing these values in a separate heap allocation.

Each `Decimal` maintains (through `big.Int`) an internal reference to a variable-length coefficient, which is represented as a `[]big.Word`. The `coeffInline` field and the `lazyInit` method together allow us to inline this coefficient within the `Decimal` struct when its value is sufficiently small. In `lazyInit`, we point the `Coeff` field's backing slice at the `coeffInline` array. `big.Int` will not re-allocate this array until it is given a value that exceeds the initial capacity. We size this initial capacity to hold any coefficient that fits in a 64-bit unsigned integer (i.e. values below 2^64).

This is an alternative to an optimization found in many other arbitrary-precision decimal libraries, where small coefficient values are stored as numeric fields in the data type's struct. Only when the coefficient grows large enough do these libraries fall back to a variable-length coefficient with internal indirection. We can see this optimization in practice in the `ericlagergren/decimal` library, where each struct contains a `compact uint64` and an `unscaled big.Int`. The concern previously raised by the authors of `cockroachdb/apd` about this form of compact representation was that it bought performance at the cost of complexity: the optimization fractured control flow, leaking out across the library and leading to more complex, error-prone code.

The approach taken in this commit does not have the same issue. All arithmetic on the decimal's coefficient is still delegated to `big.Int`. In fact, the entire optimization is best-effort: a bug that leads to a missed call to `lazyInit` is merely a missed opportunity to avoid a heap allocation, and nothing more serious. The internal reference within `Decimal` is vulnerable to memory aliasing bugs when a `Decimal` is copied by value (rather than through the `Decimal.Set` method), but this is not a new concern, since a shallow copy of a `big.Int` already shares its backing array.

Impact on benchmarks:

name                   old time/op    new time/op    delta
GDA/compare-10           61.5µs ±20%    54.5µs ±18%  -11.34%  (p=0.035 n=10+10)
GDA/reduce-10            18.8µs ± 7%    17.5µs ± 6%   -7.16%  (p=0.001 n=10+10)
GDA/remainder-10         67.0µs ±11%    62.2µs ± 6%   -7.09%  (p=0.023 n=10+10)
Exp/P5/S-4/D-2-10        2.56µs ± 4%    2.51µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-4/D2-10         2.65µs ± 4%    2.60µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-1/D-2-10        6.44µs ± 5%    6.31µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-1/D2-10         6.27µs ± 4%    6.15µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P5/S2/D-2-10         19.3µs ± 4%    19.1µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P5/S2/D2-10          19.1µs ± 4%    18.9µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P10/S-4/D-10-10      10.5µs ± 4%    10.4µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P10/S-4/D-2-10       5.59µs ± 4%    5.49µs ± 5%     ~     (p=0.138 n=10+10)
Exp/P10/S-4/D2-10        4.68µs ± 4%    4.62µs ± 5%     ~     (p=0.138 n=10+10)
Exp/P10/S-4/D10-10       10.5µs ± 4%    10.4µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D-10-10      22.2µs ± 4%    22.1µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D-2-10       12.2µs ± 4%    12.1µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D2-10        12.2µs ± 4%    12.0µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D10-10       21.8µs ± 4%    21.7µs ± 4%     ~     (p=0.247 n=10+10)
Exp/P10/S2/D-10-10       44.0µs ± 4%    44.0µs ± 4%     ~     (p=0.165 n=10+10)
Exp/P10/S2/D-2-10        31.8µs ± 4%    31.7µs ± 4%     ~     (p=0.190 n=10+10)
Exp/P10/S2/D2-10         28.7µs ± 4%    28.6µs ± 4%     ~     (p=0.139 n=10+10)
Exp/P10/S2/D10-10        44.0µs ± 4%    44.0µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P100/S-4/D-100-10     561µs ± 4%     565µs ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S-4/D-10-10      297µs ± 4%     299µs ± 4%     ~     (p=0.912 n=10+10)
Exp/P100/S-4/D-2-10       286µs ± 4%     289µs ± 4%     ~     (p=0.315 n=10+10)
Exp/P100/S-4/D2-10        287µs ± 4%     289µs ± 4%     ~     (p=0.684 n=10+10)
Exp/P100/S-4/D10-10       292µs ± 4%     294µs ± 4%     ~     (p=0.529 n=10+10)
Exp/P100/S-4/D100-10      550µs ± 4%     554µs ± 4%     ~     (p=0.739 n=10+10)
Exp/P100/S-1/D-100-10    1.05ms ± 4%    1.05ms ± 4%     ~     (p=0.631 n=10+10)
Exp/P100/S-1/D-10-10      637µs ± 4%     641µs ± 4%     ~     (p=0.739 n=10+10)
Exp/P100/S-1/D-2-10       636µs ± 4%     641µs ± 4%     ~     (p=0.436 n=10+10)
Exp/P100/S-1/D2-10        612µs ± 4%     616µs ± 4%     ~     (p=0.631 n=10+10)
Exp/P100/S-1/D10-10       621µs ± 4%     625µs ± 4%     ~     (p=0.912 n=10+10)
Exp/P100/S-1/D100-10     1.07ms ± 3%    1.08ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D-100-10     1.54ms ± 3%    1.55ms ± 4%     ~     (p=0.684 n=10+10)
Exp/P100/S2/D-10-10      1.06ms ± 4%    1.06ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D-2-10        918µs ± 4%     924µs ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D2-10        1.01ms ± 4%    1.02ms ± 4%     ~     (p=0.853 n=10+10)
Exp/P100/S2/D10-10       1.05ms ± 4%    1.06ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D100-10      1.58ms ± 4%    1.59ms ± 4%     ~     (p=0.853 n=10+10)
Ln/P2/S-100/D2-10        25.1µs ± 4%    25.0µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S-10/D2-10         22.6µs ± 4%    22.6µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S-2/D2-10          24.9µs ± 4%    24.8µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S2/D2-10           28.1µs ± 4%    27.9µs ± 4%     ~     (p=0.139 n=10+10)
Ln/P2/S10/D2-10          27.0µs ± 4%    26.9µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S100/D2-10         28.7µs ± 4%    28.5µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S-100/D2-10       78.1µs ± 4%    78.1µs ± 4%     ~     (p=0.218 n=10+10)
Ln/P10/S-100/D10-10      74.3µs ± 4%    74.3µs ± 4%     ~     (p=0.190 n=10+10)
Ln/P10/S-10/D2-10        83.4µs ± 4%    83.4µs ± 4%     ~     (p=0.315 n=10+10)
Ln/P10/S-10/D10-10       80.9µs ± 4%    80.9µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S-2/D2-10         82.1µs ± 4%    82.2µs ± 5%     ~     (p=0.280 n=10+10)
Ln/P10/S-2/D10-10        90.6µs ± 4%    90.6µs ± 5%     ~     (p=0.190 n=10+10)
Ln/P10/S2/D2-10          80.6µs ± 4%    80.6µs ± 4%     ~     (p=0.184 n=10+10)
Ln/P10/S2/D10-10         79.7µs ± 4%    79.5µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S10/D2-10         85.2µs ± 4%    85.2µs ± 5%     ~     (p=0.218 n=10+10)
Ln/P10/S10/D10-10        74.8µs ± 4%    74.7µs ± 4%     ~     (p=0.247 n=10+10)
Ln/P10/S100/D2-10        83.8µs ± 4%    83.6µs ± 4%     ~     (p=0.165 n=10+10)
Ln/P10/S100/D10-10       66.1µs ± 4%    65.9µs ± 5%     ~     (p=0.143 n=10+10)
Ln/P100/S-100/D2-10      3.87ms ± 4%    3.90ms ± 5%     ~     (p=0.853 n=10+10)
Ln/P100/S-100/D10-10     3.34ms ± 4%    3.35ms ± 5%     ~     (p=0.529 n=10+10)
Ln/P100/S-100/D100-10    3.46ms ± 4%    3.48ms ± 5%     ~     (p=0.971 n=10+10)
Ln/P100/S-10/D2-10       3.63ms ± 4%    3.65ms ± 4%     ~     (p=0.739 n=10+10)
Ln/P100/S-10/D10-10      3.43ms ± 4%    3.44ms ± 4%     ~     (p=0.796 n=10+10)
Ln/P100/S-10/D100-10     3.42ms ± 4%    3.45ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S-2/D2-10        3.51ms ± 4%    3.53ms ± 4%     ~     (p=0.853 n=10+10)
Ln/P100/S-2/D10-10       3.21ms ± 4%    3.24ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S-2/D100-10      3.47ms ± 4%    3.50ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S2/D2-10         3.56ms ± 4%    3.59ms ± 4%     ~     (p=0.796 n=10+10)
Ln/P100/S2/D10-10        3.82ms ± 4%    3.84ms ± 4%     ~     (p=1.000 n=10+10)
Ln/P100/S2/D100-10       3.72ms ± 4%    3.75ms ± 5%     ~     (p=0.353 n=10+10)
Ln/P100/S10/D2-10        3.78ms ± 4%    3.81ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S10/D10-10       3.50ms ± 4%    3.52ms ± 4%     ~     (p=0.529 n=10+10)
Ln/P100/S10/D100-10      3.49ms ± 4%    3.52ms ± 4%     ~     (p=0.684 n=10+10)
Ln/P100/S100/D2-10       3.82ms ± 4%    3.85ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D10-10      3.72ms ± 4%    3.75ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D100-10     3.65ms ± 4%    3.68ms ± 4%     ~     (p=0.315 n=10+10)
GDA/abs-10               10.4µs ± 6%    10.3µs ± 6%     ~     (p=0.564 n=10+10)
GDA/base-10               110µs ± 4%     111µs ± 5%     ~     (p=0.869 n=10+10)
GDA/comparetotal-10      30.2µs ± 5%    30.3µs ± 5%     ~     (p=0.631 n=10+10)
GDA/divide-10             391µs ± 7%     390µs ± 3%     ~     (p=0.853 n=10+10)
GDA/exp-10                127ms ± 4%     127ms ± 4%     ~     (p=0.796 n=10+10)
GDA/ln-10                83.6ms ± 3%    83.6ms ± 4%     ~     (p=0.218 n=10+10)
GDA/log10-10              106ms ± 4%     105ms ± 4%     ~     (p=0.218 n=10+10)
GDA/minus-10             11.7µs ± 4%    11.7µs ± 7%     ~     (p=0.579 n=10+10)
GDA/multiply-10          77.3µs ± 9%    80.8µs ± 9%     ~     (p=0.089 n=10+10)
GDA/plus-10              42.9µs ± 5%    44.2µs ± 4%     ~     (p=0.052 n=10+10)
GDA/power-10              218ms ± 3%     218ms ± 3%     ~     (p=0.631 n=10+10)
GDA/powersqrt-10          450ms ± 2%     448ms ± 2%     ~     (p=0.280 n=10+10)
GDA/quantize-10           143µs ±12%     128µs ± 8%     ~     (p=0.052 n=10+10)
GDA/randoms-10           3.05ms ± 5%    3.02ms ± 5%     ~     (p=0.143 n=10+10)
GDA/rounding-10           659µs ± 4%     658µs ± 4%     ~     (p=0.579 n=10+10)
GDA/squareroot-10        33.1ms ± 3%    32.7ms ± 3%     ~     (p=0.143 n=10+10)
GDA/subtract-10           172µs ±11%     170µs ± 8%     ~     (p=0.912 n=10+10)
GDA/tointegral-10        32.9µs ± 6%    32.2µs ± 5%     ~     (p=0.190 n=10+10)
GDA/tointegralx-10       33.6µs ± 6%    32.7µs ± 4%     ~     (p=0.190 n=10+10)
GDA/cuberoot-apd-10      2.08ms ± 3%    2.08ms ± 4%     ~     (p=0.165 n=10+10)
GDA/add-10                904µs ± 5%     919µs ± 5%   +1.65%  (p=0.043 n=10+10)
GDA/divideint-10         28.7µs ± 4%    29.7µs ± 1%   +3.51%  (p=0.026 n=9+6)

name                   old alloc/op   new alloc/op   delta
GDA/compare-10           10.1kB ± 0%     7.0kB ± 0%  -30.72%  (p=0.000 n=10+10)
GDA/quantize-10          42.3kB ± 0%    36.5kB ± 0%  -13.63%  (p=0.000 n=9+10)
GDA/divideint-10         6.51kB ± 0%    5.70kB ± 0%  -12.41%  (p=0.000 n=10+10)
GDA/remainder-10         22.4kB ± 0%    20.0kB ± 0%  -10.60%  (p=0.000 n=10+10)
GDA/tointegralx-10       19.2kB ± 0%    17.4kB ± 0%   -9.28%  (p=0.000 n=10+10)
GDA/tointegral-10        19.1kB ± 0%    17.4kB ± 0%   -8.83%  (p=0.000 n=10+10)
GDA/reduce-10            2.22kB ± 0%    2.02kB ± 0%   -8.66%  (p=0.000 n=10+10)
GDA/divide-10            74.3kB ± 0%    70.1kB ± 0%   -5.62%  (p=0.000 n=10+10)
GDA/squareroot-10        8.30MB ± 0%    7.96MB ± 0%   -4.08%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10        1.23kB ± 0%    1.18kB ± 0%   -3.92%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10       4.50kB ± 0%    4.33kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10        2.83kB ± 0%    2.72kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10         1.29kB ± 0%    1.24kB ± 0%   -3.72%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10        2.02kB ± 0%    1.94kB ± 0%   -3.71%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10        9.59kB ± 0%    9.24kB ± 0%   -3.66%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10         7.53kB ± 0%    7.26kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10          7.76kB ± 0%    7.48kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10         9.75kB ± 0%    9.40kB ± 0%   -3.60%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10       2.32kB ± 0%    2.24kB ± 0%   -3.58%  (p=0.000 n=10+10)
GDA/rounding-10           246kB ± 0%     237kB ± 0%   -3.56%  (p=0.000 n=10+10)
GDA/randoms-10           1.16MB ± 0%    1.12MB ± 0%   -3.55%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10       9.79kB ± 0%    9.44kB ± 0%   -3.54%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10        10.7kB ± 0%    10.3kB ± 0%   -3.53%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10         2.87kB ± 0%    2.77kB ± 0%   -3.52%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10      4.95kB ± 0%    4.78kB ± 0%   -3.47%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10        4.64kB ± 0%    4.49kB ± 0%   -3.40%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       261kB ± 0%     252kB ± 0%   -3.36%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10       5.17kB ± 0%    5.00kB ± 0%   -3.33%  (p=0.000 n=10+10)
GDA/powersqrt-10         77.6MB ± 0%    75.1MB ± 0%   -3.29%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10         17.1kB ± 0%    16.5kB ± 0%   -3.26%  (p=0.000 n=9+10)
Ln/P10/S-2/D10-10        18.6kB ± 0%    18.0kB ± 0%   -3.24%  (p=0.000 n=9+8)
Exp/P10/S-4/D-10-10      2.56kB ± 0%    2.48kB ± 0%   -3.24%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10          17.0kB ± 0%    16.5kB ± 0%   -3.23%  (p=0.000 n=7+10)
Ln/P10/S10/D10-10        15.8kB ± 0%    15.3kB ± 0%   -3.23%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10         18.0kB ± 0%    17.4kB ± 0%   -3.22%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10        17.0kB ± 0%    16.4kB ± 0%   -3.22%  (p=0.000 n=8+10)
Ln/P10/S2/D10-10         16.5kB ± 0%    16.0kB ± 0%   -3.21%  (p=0.000 n=10+10)
Ln/P10/S-10/D10-10       17.1kB ± 0%    16.5kB ± 0%   -3.18%  (p=0.000 n=10+7)
Exp/P10/S-4/D10-10       2.61kB ± 0%    2.52kB ± 0%   -3.18%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10          10.3kB ± 0%     9.9kB ± 0%   -3.15%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10       17.1kB ± 0%    16.6kB ± 0%   -3.11%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10        17.8kB ± 0%    17.2kB ± 0%   -3.11%  (p=0.000 n=8+10)
Ln/P2/S2/D2-10           10.3kB ± 0%     9.9kB ± 0%   -3.10%  (p=0.000 n=10+9)
Ln/P2/S-2/D2-10          9.62kB ± 0%    9.32kB ± 0%   -3.09%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10       15.0kB ± 0%    14.5kB ± 0%   -3.09%  (p=0.000 n=9+10)
Ln/P10/S-100/D10-10      16.3kB ± 0%    15.8kB ± 0%   -3.08%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10         8.91kB ± 0%    8.64kB ± 0%   -3.07%  (p=0.000 n=10+9)
GDA/subtract-10           100kB ± 0%      97kB ± 0%   -2.89%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10         10.9kB ± 0%    10.6kB ± 0%   -2.86%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10        9.81kB ± 0%    9.53kB ± 0%   -2.83%  (p=0.000 n=10+10)
GDA/ln-10                10.2MB ± 0%    10.0MB ± 0%   -2.09%  (p=0.000 n=10+10)
GDA/log10-10             12.5MB ± 0%    12.3MB ± 0%   -1.87%  (p=0.000 n=10+10)
GDA/power-10             27.0MB ± 0%    26.5MB ± 0%   -1.78%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10      31.9kB ± 0%    31.4kB ± 0%   -1.46%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10     33.5kB ± 0%    33.0kB ± 0%   -1.41%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10       31.8kB ± 0%    31.4kB ± 0%   -1.40%  (p=0.000 n=10+9)
Exp/P100/S-4/D-2-10      34.6kB ± 0%    34.1kB ± 0%   -1.36%  (p=0.000 n=10+9)
Exp/P100/S-1/D10-10      65.0kB ± 0%    64.2kB ± 0%   -1.27%  (p=0.000 n=8+10)
Exp/P100/S-1/D2-10       64.9kB ± 0%    64.1kB ± 0%   -1.24%  (p=0.000 n=10+10)
Exp/P100/S-1/D-10-10     70.1kB ± 0%    69.3kB ± 0%   -1.19%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10      72.6kB ± 0%    71.8kB ± 0%   -1.14%  (p=0.000 n=9+9)
Exp/P100/S2/D10-10        115kB ± 0%     114kB ± 0%   -1.05%  (p=0.000 n=10+10)
Exp/P100/S2/D2-10         113kB ± 0%     112kB ± 0%   -1.01%  (p=0.000 n=10+10)
Exp/P100/S2/D-2-10        107kB ± 0%     106kB ± 0%   -1.00%  (p=0.000 n=10+10)
Exp/P100/S2/D-10-10       117kB ± 0%     116kB ± 0%   -0.96%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       318kB ± 0%     315kB ± 0%   -0.83%  (p=0.000 n=10+8)
Ln/P100/S-100/D100-10     328kB ± 0%     326kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S-10/D2-10        336kB ± 0%     334kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         326kB ± 0%     324kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S100/D2-10        353kB ± 0%     350kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S10/D10-10        326kB ± 0%     324kB ± 0%   -0.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      340kB ± 0%     338kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         351kB ± 0%     348kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S-10/D100-10      323kB ± 0%     320kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        344kB ± 0%     341kB ± 0%   -0.77%  (p=0.000 n=9+6)
Ln/P100/S-100/D10-10      311kB ± 0%     309kB ± 0%   -0.76%  (p=0.000 n=10+7)
Ln/P100/S10/D100-10       323kB ± 0%     321kB ± 0%   -0.76%  (p=0.000 n=8+9)
Ln/P100/S-100/D2-10       362kB ± 0%     360kB ± 0%   -0.75%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       345kB ± 0%     342kB ± 0%   -0.74%  (p=0.000 n=9+8)
Exp/P100/S-1/D100-10     94.0kB ± 0%    93.3kB ± 0%   -0.74%  (p=0.000 n=8+10)
Exp/P100/S-4/D100-10     47.9kB ± 0%    47.5kB ± 0%   -0.74%  (p=0.000 n=6+9)
Ln/P100/S2/D2-10          333kB ± 0%     330kB ± 0%   -0.74%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10    97.5kB ± 0%    96.8kB ± 0%   -0.70%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10    51.5kB ± 0%    51.2kB ± 0%   -0.70%  (p=0.000 n=10+10)
Ln/P100/S10/D2-10         346kB ± 0%     343kB ± 0%   -0.69%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        306kB ± 0%     304kB ± 0%   -0.68%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       329kB ± 0%     326kB ± 0%   -0.67%  (p=0.000 n=10+8)
Exp/P100/S2/D100-10       150kB ± 0%     149kB ± 0%   -0.67%  (p=0.000 n=10+10)
Exp/P100/S2/D-100-10      148kB ± 0%     147kB ± 0%   -0.62%  (p=0.000 n=9+10)
GDA/exp-10               61.3MB ± 0%    61.1MB ± 0%   -0.18%  (p=0.000 n=10+10)
GDA/base-10              24.4kB ± 0%    24.4kB ± 0%     ~     (all equal)
GDA/comparetotal-10      7.11kB ± 0%    7.11kB ± 0%     ~     (all equal)
GDA/add-10                712kB ± 0%     720kB ± 0%   +1.18%  (p=0.000 n=10+8)
GDA/plus-10              46.6kB ± 0%    47.5kB ± 0%   +2.04%  (p=0.000 n=10+9)
GDA/multiply-10          35.5kB ± 0%    39.2kB ± 0%  +10.32%  (p=0.000 n=10+10)
GDA/minus-10             2.43kB ± 0%    2.82kB ± 0%  +16.12%  (p=0.000 n=10+10)
GDA/abs-10               2.33kB ± 0%    2.82kB ± 0%  +21.31%  (p=0.000 n=10+10)

name                   old allocs/op  new allocs/op  delta
GDA/reduce-10               187 ± 0%        43 ± 0%  -77.01%  (p=0.000 n=10+10)
GDA/compare-10              638 ± 0%       250 ± 0%  -60.82%  (p=0.000 n=10+10)
GDA/minus-10                164 ± 0%        73 ± 0%  -55.49%  (p=0.000 n=10+10)
GDA/abs-10                  151 ± 0%        73 ± 0%  -51.66%  (p=0.000 n=10+10)
GDA/quantize-10           1.70k ± 0%     0.98k ± 0%  -42.30%  (p=0.000 n=10+10)
GDA/tointegralx-10          677 ± 0%       454 ± 0%  -32.94%  (p=0.000 n=10+10)
GDA/remainder-10            923 ± 0%       626 ± 0%  -32.18%  (p=0.000 n=10+10)
GDA/tointegral-10           665 ± 0%       454 ± 0%  -31.73%  (p=0.000 n=10+10)
GDA/divideint-10            357 ± 0%       256 ± 0%  -28.29%  (p=0.000 n=10+10)
GDA/subtract-10           2.85k ± 0%     2.14k ± 0%  -24.89%  (p=0.000 n=10+10)
GDA/divide-10             2.51k ± 0%     1.91k ± 0%  -24.01%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10          47.0 ± 0%      37.0 ± 0%  -21.28%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10           50.0 ± 0%      40.0 ± 0%  -20.00%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10          76.0 ± 0%      62.0 ± 0%  -18.42%  (p=0.000 n=10+10)
GDA/rounding-10           9.20k ± 0%     7.59k ± 0%  -17.52%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10         87.0 ± 0%      72.0 ± 0%  -17.24%  (p=0.000 n=10+10)
Exp/P10/S-4/D10-10         95.0 ± 0%      79.0 ± 0%  -16.84%  (p=0.000 n=10+10)
GDA/randoms-10            40.4k ± 0%     33.6k ± 0%  -16.82%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10           104 ± 0%        87 ± 0%  -16.35%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10            105 ± 0%        88 ± 0%  -16.19%  (p=0.000 n=10+10)
Exp/P10/S-4/D-10-10        93.0 ± 0%      78.0 ± 0%  -16.13%  (p=0.000 n=10+10)
GDA/squareroot-10          275k ± 0%      230k ± 0%  -16.08%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10         174 ± 0%       147 ± 0%  -15.52%  (p=0.000 n=10+10)
GDA/plus-10                 587 ± 0%       496 ± 0%  -15.50%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10          163 ± 0%       138 ± 0%  -15.34%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10           166 ± 0%       141 ± 0%  -15.06%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10          182 ± 0%       155 ± 0%  -14.84%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10          335 ± 0%       286 ± 0%  -14.63%  (p=0.000 n=10+10)
Ln/P2/S-2/D2-10             342 ± 0%       292 ± 0%  -14.62%  (p=0.000 n=10+10)
Ln/P2/S2/D2-10              363 ± 0%       310 ± 0%  -14.60%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10             366 ± 0%       313 ± 0%  -14.48%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10           335 ± 0%       287 ± 0%  -14.33%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       7.67k ± 0%     6.58k ± 0%  -14.27%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10            582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10             582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S10/D10-10           541 ± 0%       464 ± 0%  -14.23%  (p=0.000 n=10+10)
Ln/P10/S2/D10-10            563 ± 0%       483 ± 0%  -14.21%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10            317 ± 0%       272 ± 0%  -14.20%  (p=0.000 n=10+10)
GDA/powersqrt-10          2.63M ± 0%     2.26M ± 0%  -14.17%  (p=0.000 n=10+10)
Ln/P10/S-2/D10-10           637 ± 0%       547 ± 0%  -14.13%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10           581 ± 0%       499 ± 0%  -14.11%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10           369 ± 0%       317 ± 0%  -14.09%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10            342 ± 0%       294 ± 0%  -14.04%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10            613 ± 0%       527 ± 0%  -14.03%  (p=0.000 n=9+10)
Ln/P10/S-10/D10-10          581 ± 0%       500 ± 0%  -13.99%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10            272 ± 0%       234 ± 0%  -13.97%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10          575 ± 0%       495 ± 0%  -13.91%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10           597 ± 0%       514 ± 0%  -13.90%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10          504 ± 0%       434 ± 0%  -13.89%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10             281 ± 0%       242 ± 0%  -13.88%  (p=0.000 n=10+10)
Ln/P10/S-100/D10-10         549 ± 0%       473 ± 0%  -13.84%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10           336 ± 0%       290 ± 0%  -13.69%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10            376 ± 0%       325 ± 0%  -13.56%  (p=0.000 n=10+10)
GDA/ln-10                  269k ± 0%      239k ± 0%  -10.88%  (p=0.000 n=10+10)
GDA/log10-10               326k ± 0%      292k ± 0%  -10.44%  (p=0.000 n=10+10)
GDA/power-10               699k ± 0%      627k ± 0%  -10.30%  (p=0.000 n=10+10)
GDA/add-10                13.2k ± 0%     11.9k ± 0%   -9.68%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10         701 ± 0%       638 ± 0%   -8.99%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10        715 ± 0%       651 ± 0%   -8.95%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10          686 ± 0%       626 ± 0%   -8.75%  (p=0.000 n=10+10)
Exp/P100/S-4/D-2-10         723 ± 0%       661 ± 0%   -8.58%  (p=0.000 n=10+10)
Exp/P100/S-1/D10-10       1.36k ± 0%     1.26k ± 0%   -7.99%  (p=0.000 n=10+10)
Exp/P100/S-1/D2-10        1.35k ± 0%     1.24k ± 0%   -7.72%  (p=0.000 n=9+10)
Exp/P100/S-1/D-10-10      1.40k ± 0%     1.29k ± 0%   -7.70%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10       1.43k ± 0%     1.32k ± 0%   -7.48%  (p=0.000 n=9+10)
Exp/P100/S2/D-2-10        2.03k ± 0%     1.89k ± 0%   -6.82%  (p=0.000 n=9+10)
Exp/P100/S2/D10-10        2.28k ± 0%     2.13k ± 0%   -6.81%  (p=0.000 n=10+9)
Exp/P100/S2/D-10-10       2.19k ± 0%     2.05k ± 0%   -6.66%  (p=0.001 n=8+9)
Exp/P100/S2/D2-10         2.21k ± 0%     2.06k ± 0%   -6.61%  (p=0.000 n=9+10)
Ln/P100/S-10/D2-10        6.00k ± 0%     5.65k ± 0%   -5.87%  (p=0.000 n=10+9)
Ln/P100/S100/D2-10        6.25k ± 0%     5.89k ± 0%   -5.80%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         5.79k ± 0%     5.46k ± 0%   -5.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      6.04k ± 0%     5.69k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         6.21k ± 0%     5.86k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        6.09k ± 0%     5.74k ± 0%   -5.73%  (p=0.000 n=10+8)
Exp/P100/S-4/D100-10        855 ± 0%       806 ± 0%   -5.73%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       5.62k ± 0%     5.30k ± 0%   -5.72%  (p=0.000 n=10+8)
Ln/P100/S-100/D2-10       6.34k ± 0%     5.98k ± 0%   -5.68%  (p=0.000 n=10+10)
Ln/P100/S10/D100-10       5.73k ± 0%     5.41k ± 0%   -5.67%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       6.08k ± 0%     5.74k ± 0%   -5.66%  (p=0.000 n=9+8)
Ln/P100/S10/D2-10         6.13k ± 0%     5.78k ± 0%   -5.66%  (p=0.000 n=8+10)
Ln/P100/S-10/D100-10      5.69k ± 0%     5.37k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S10/D10-10        5.74k ± 0%     5.41k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S-100/D10-10      5.55k ± 0%     5.23k ± 0%   -5.63%  (p=0.000 n=10+7)
Ln/P100/S2/D2-10          5.86k ± 0%     5.53k ± 0%   -5.62%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10       893 ± 0%       843 ± 0%   -5.60%  (p=0.000 n=10+10)
Ln/P100/S-100/D100-10     5.73k ± 0%     5.41k ± 0%   -5.54%  (p=0.000 n=10+10)
Exp/P100/S-1/D100-10      1.68k ± 0%     1.58k ± 0%   -5.52%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        5.40k ± 0%     5.10k ± 0%   -5.49%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       5.77k ± 0%     5.45k ± 0%   -5.49%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10     1.68k ± 0%     1.59k ± 0%   -5.41%  (p=0.000 n=10+10)
Exp/P100/S2/D100-10       2.61k ± 0%     2.48k ± 0%   -5.03%  (p=0.000 n=8+10)
Exp/P100/S2/D-100-10      2.47k ± 0%     2.35k ± 0%   -4.90%  (p=0.000 n=10+10)
GDA/multiply-10             931 ± 0%       889 ± 0%   -4.51%  (p=0.000 n=10+10)
GDA/exp-10                 911k ± 0%      895k ± 0%   -1.75%  (p=0.000 n=10+10)
GDA/base-10               2.11k ± 0%     2.11k ± 0%     ~     (all equal)
GDA/comparetotal-10         254 ± 0%       254 ± 0%     ~     (all equal)

cc. @mjibson

They cause the benchmarks to run for a very long time.
See golang/go#27217.

Adjust the benchmarks to have an explicit setup phase
and run phase, separated by `b.ResetTimer`.

This reduces the size of the `Decimal` struct from 48 bytes to 40 bytes.
Ln/P100/S10/D100-10      3.49ms ± 4%    3.52ms ± 4%     ~     (p=0.684 n=10+10)
Ln/P100/S100/D2-10       3.82ms ± 4%    3.85ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D10-10      3.72ms ± 4%    3.75ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D100-10     3.65ms ± 4%    3.68ms ± 4%     ~     (p=0.315 n=10+10)
GDA/abs-10               10.4µs ± 6%    10.3µs ± 6%     ~     (p=0.564 n=10+10)
GDA/base-10               110µs ± 4%     111µs ± 5%     ~     (p=0.869 n=10+10)
GDA/comparetotal-10      30.2µs ± 5%    30.3µs ± 5%     ~     (p=0.631 n=10+10)
GDA/divide-10             391µs ± 7%     390µs ± 3%     ~     (p=0.853 n=10+10)
GDA/exp-10                127ms ± 4%     127ms ± 4%     ~     (p=0.796 n=10+10)
GDA/ln-10                83.6ms ± 3%    83.6ms ± 4%     ~     (p=0.218 n=10+10)
GDA/log10-10              106ms ± 4%     105ms ± 4%     ~     (p=0.218 n=10+10)
GDA/minus-10             11.7µs ± 4%    11.7µs ± 7%     ~     (p=0.579 n=10+10)
GDA/multiply-10          77.3µs ± 9%    80.8µs ± 9%     ~     (p=0.089 n=10+10)
GDA/plus-10              42.9µs ± 5%    44.2µs ± 4%     ~     (p=0.052 n=10+10)
GDA/power-10              218ms ± 3%     218ms ± 3%     ~     (p=0.631 n=10+10)
GDA/powersqrt-10          450ms ± 2%     448ms ± 2%     ~     (p=0.280 n=10+10)
GDA/quantize-10           143µs ±12%     128µs ± 8%     ~     (p=0.052 n=10+10)
GDA/randoms-10           3.05ms ± 5%    3.02ms ± 5%     ~     (p=0.143 n=10+10)
GDA/rounding-10           659µs ± 4%     658µs ± 4%     ~     (p=0.579 n=10+10)
GDA/squareroot-10        33.1ms ± 3%    32.7ms ± 3%     ~     (p=0.143 n=10+10)
GDA/subtract-10           172µs ±11%     170µs ± 8%     ~     (p=0.912 n=10+10)
GDA/tointegral-10        32.9µs ± 6%    32.2µs ± 5%     ~     (p=0.190 n=10+10)
GDA/tointegralx-10       33.6µs ± 6%    32.7µs ± 4%     ~     (p=0.190 n=10+10)
GDA/cuberoot-apd-10      2.08ms ± 3%    2.08ms ± 4%     ~     (p=0.165 n=10+10)
GDA/add-10                904µs ± 5%     919µs ± 5%   +1.65%  (p=0.043 n=10+10)
GDA/divideint-10         28.7µs ± 4%    29.7µs ± 1%   +3.51%  (p=0.026 n=9+6)

name                   old alloc/op   new alloc/op   delta
GDA/compare-10           10.1kB ± 0%     7.0kB ± 0%  -30.72%  (p=0.000 n=10+10)
GDA/quantize-10          42.3kB ± 0%    36.5kB ± 0%  -13.63%  (p=0.000 n=9+10)
GDA/divideint-10         6.51kB ± 0%    5.70kB ± 0%  -12.41%  (p=0.000 n=10+10)
GDA/remainder-10         22.4kB ± 0%    20.0kB ± 0%  -10.60%  (p=0.000 n=10+10)
GDA/tointegralx-10       19.2kB ± 0%    17.4kB ± 0%   -9.28%  (p=0.000 n=10+10)
GDA/tointegral-10        19.1kB ± 0%    17.4kB ± 0%   -8.83%  (p=0.000 n=10+10)
GDA/reduce-10            2.22kB ± 0%    2.02kB ± 0%   -8.66%  (p=0.000 n=10+10)
GDA/divide-10            74.3kB ± 0%    70.1kB ± 0%   -5.62%  (p=0.000 n=10+10)
GDA/squareroot-10        8.30MB ± 0%    7.96MB ± 0%   -4.08%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10        1.23kB ± 0%    1.18kB ± 0%   -3.92%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10       4.50kB ± 0%    4.33kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10        2.83kB ± 0%    2.72kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10         1.29kB ± 0%    1.24kB ± 0%   -3.72%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10        2.02kB ± 0%    1.94kB ± 0%   -3.71%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10        9.59kB ± 0%    9.24kB ± 0%   -3.66%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10         7.53kB ± 0%    7.26kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10          7.76kB ± 0%    7.48kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10         9.75kB ± 0%    9.40kB ± 0%   -3.60%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10       2.32kB ± 0%    2.24kB ± 0%   -3.58%  (p=0.000 n=10+10)
GDA/rounding-10           246kB ± 0%     237kB ± 0%   -3.56%  (p=0.000 n=10+10)
GDA/randoms-10           1.16MB ± 0%    1.12MB ± 0%   -3.55%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10       9.79kB ± 0%    9.44kB ± 0%   -3.54%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10        10.7kB ± 0%    10.3kB ± 0%   -3.53%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10         2.87kB ± 0%    2.77kB ± 0%   -3.52%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10      4.95kB ± 0%    4.78kB ± 0%   -3.47%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10        4.64kB ± 0%    4.49kB ± 0%   -3.40%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       261kB ± 0%     252kB ± 0%   -3.36%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10       5.17kB ± 0%    5.00kB ± 0%   -3.33%  (p=0.000 n=10+10)
GDA/powersqrt-10         77.6MB ± 0%    75.1MB ± 0%   -3.29%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10         17.1kB ± 0%    16.5kB ± 0%   -3.26%  (p=0.000 n=9+10)
Ln/P10/S-2/D10-10        18.6kB ± 0%    18.0kB ± 0%   -3.24%  (p=0.000 n=9+8)
Exp/P10/S-4/D-10-10      2.56kB ± 0%    2.48kB ± 0%   -3.24%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10          17.0kB ± 0%    16.5kB ± 0%   -3.23%  (p=0.000 n=7+10)
Ln/P10/S10/D10-10        15.8kB ± 0%    15.3kB ± 0%   -3.23%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10         18.0kB ± 0%    17.4kB ± 0%   -3.22%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10        17.0kB ± 0%    16.4kB ± 0%   -3.22%  (p=0.000 n=8+10)
Ln/P10/S2/D10-10         16.5kB ± 0%    16.0kB ± 0%   -3.21%  (p=0.000 n=10+10)
Ln/P10/S-10/D10-10       17.1kB ± 0%    16.5kB ± 0%   -3.18%  (p=0.000 n=10+7)
Exp/P10/S-4/D10-10       2.61kB ± 0%    2.52kB ± 0%   -3.18%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10          10.3kB ± 0%     9.9kB ± 0%   -3.15%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10       17.1kB ± 0%    16.6kB ± 0%   -3.11%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10        17.8kB ± 0%    17.2kB ± 0%   -3.11%  (p=0.000 n=8+10)
Ln/P2/S2/D2-10           10.3kB ± 0%     9.9kB ± 0%   -3.10%  (p=0.000 n=10+9)
Ln/P2/S-2/D2-10          9.62kB ± 0%    9.32kB ± 0%   -3.09%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10       15.0kB ± 0%    14.5kB ± 0%   -3.09%  (p=0.000 n=9+10)
Ln/P10/S-100/D10-10      16.3kB ± 0%    15.8kB ± 0%   -3.08%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10         8.91kB ± 0%    8.64kB ± 0%   -3.07%  (p=0.000 n=10+9)
GDA/subtract-10           100kB ± 0%      97kB ± 0%   -2.89%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10         10.9kB ± 0%    10.6kB ± 0%   -2.86%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10        9.81kB ± 0%    9.53kB ± 0%   -2.83%  (p=0.000 n=10+10)
GDA/ln-10                10.2MB ± 0%    10.0MB ± 0%   -2.09%  (p=0.000 n=10+10)
GDA/log10-10             12.5MB ± 0%    12.3MB ± 0%   -1.87%  (p=0.000 n=10+10)
GDA/power-10             27.0MB ± 0%    26.5MB ± 0%   -1.78%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10      31.9kB ± 0%    31.4kB ± 0%   -1.46%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10     33.5kB ± 0%    33.0kB ± 0%   -1.41%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10       31.8kB ± 0%    31.4kB ± 0%   -1.40%  (p=0.000 n=10+9)
Exp/P100/S-4/D-2-10      34.6kB ± 0%    34.1kB ± 0%   -1.36%  (p=0.000 n=10+9)
Exp/P100/S-1/D10-10      65.0kB ± 0%    64.2kB ± 0%   -1.27%  (p=0.000 n=8+10)
Exp/P100/S-1/D2-10       64.9kB ± 0%    64.1kB ± 0%   -1.24%  (p=0.000 n=10+10)
Exp/P100/S-1/D-10-10     70.1kB ± 0%    69.3kB ± 0%   -1.19%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10      72.6kB ± 0%    71.8kB ± 0%   -1.14%  (p=0.000 n=9+9)
Exp/P100/S2/D10-10        115kB ± 0%     114kB ± 0%   -1.05%  (p=0.000 n=10+10)
Exp/P100/S2/D2-10         113kB ± 0%     112kB ± 0%   -1.01%  (p=0.000 n=10+10)
Exp/P100/S2/D-2-10        107kB ± 0%     106kB ± 0%   -1.00%  (p=0.000 n=10+10)
Exp/P100/S2/D-10-10       117kB ± 0%     116kB ± 0%   -0.96%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       318kB ± 0%     315kB ± 0%   -0.83%  (p=0.000 n=10+8)
Ln/P100/S-100/D100-10     328kB ± 0%     326kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S-10/D2-10        336kB ± 0%     334kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         326kB ± 0%     324kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S100/D2-10        353kB ± 0%     350kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S10/D10-10        326kB ± 0%     324kB ± 0%   -0.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      340kB ± 0%     338kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         351kB ± 0%     348kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S-10/D100-10      323kB ± 0%     320kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        344kB ± 0%     341kB ± 0%   -0.77%  (p=0.000 n=9+6)
Ln/P100/S-100/D10-10      311kB ± 0%     309kB ± 0%   -0.76%  (p=0.000 n=10+7)
Ln/P100/S10/D100-10       323kB ± 0%     321kB ± 0%   -0.76%  (p=0.000 n=8+9)
Ln/P100/S-100/D2-10       362kB ± 0%     360kB ± 0%   -0.75%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       345kB ± 0%     342kB ± 0%   -0.74%  (p=0.000 n=9+8)
Exp/P100/S-1/D100-10     94.0kB ± 0%    93.3kB ± 0%   -0.74%  (p=0.000 n=8+10)
Exp/P100/S-4/D100-10     47.9kB ± 0%    47.5kB ± 0%   -0.74%  (p=0.000 n=6+9)
Ln/P100/S2/D2-10          333kB ± 0%     330kB ± 0%   -0.74%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10    97.5kB ± 0%    96.8kB ± 0%   -0.70%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10    51.5kB ± 0%    51.2kB ± 0%   -0.70%  (p=0.000 n=10+10)
Ln/P100/S10/D2-10         346kB ± 0%     343kB ± 0%   -0.69%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        306kB ± 0%     304kB ± 0%   -0.68%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       329kB ± 0%     326kB ± 0%   -0.67%  (p=0.000 n=10+8)
Exp/P100/S2/D100-10       150kB ± 0%     149kB ± 0%   -0.67%  (p=0.000 n=10+10)
Exp/P100/S2/D-100-10      148kB ± 0%     147kB ± 0%   -0.62%  (p=0.000 n=9+10)
GDA/exp-10               61.3MB ± 0%    61.1MB ± 0%   -0.18%  (p=0.000 n=10+10)
GDA/base-10              24.4kB ± 0%    24.4kB ± 0%     ~     (all equal)
GDA/comparetotal-10      7.11kB ± 0%    7.11kB ± 0%     ~     (all equal)
GDA/add-10                712kB ± 0%     720kB ± 0%   +1.18%  (p=0.000 n=10+8)
GDA/plus-10              46.6kB ± 0%    47.5kB ± 0%   +2.04%  (p=0.000 n=10+9)
GDA/multiply-10          35.5kB ± 0%    39.2kB ± 0%  +10.32%  (p=0.000 n=10+10)
GDA/minus-10             2.43kB ± 0%    2.82kB ± 0%  +16.12%  (p=0.000 n=10+10)
GDA/abs-10               2.33kB ± 0%    2.82kB ± 0%  +21.31%  (p=0.000 n=10+10)

name                   old allocs/op  new allocs/op  delta
GDA/reduce-10               187 ± 0%        43 ± 0%  -77.01%  (p=0.000 n=10+10)
GDA/compare-10              638 ± 0%       250 ± 0%  -60.82%  (p=0.000 n=10+10)
GDA/minus-10                164 ± 0%        73 ± 0%  -55.49%  (p=0.000 n=10+10)
GDA/abs-10                  151 ± 0%        73 ± 0%  -51.66%  (p=0.000 n=10+10)
GDA/quantize-10           1.70k ± 0%     0.98k ± 0%  -42.30%  (p=0.000 n=10+10)
GDA/tointegralx-10          677 ± 0%       454 ± 0%  -32.94%  (p=0.000 n=10+10)
GDA/remainder-10            923 ± 0%       626 ± 0%  -32.18%  (p=0.000 n=10+10)
GDA/tointegral-10           665 ± 0%       454 ± 0%  -31.73%  (p=0.000 n=10+10)
GDA/divideint-10            357 ± 0%       256 ± 0%  -28.29%  (p=0.000 n=10+10)
GDA/subtract-10           2.85k ± 0%     2.14k ± 0%  -24.89%  (p=0.000 n=10+10)
GDA/divide-10             2.51k ± 0%     1.91k ± 0%  -24.01%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10          47.0 ± 0%      37.0 ± 0%  -21.28%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10           50.0 ± 0%      40.0 ± 0%  -20.00%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10          76.0 ± 0%      62.0 ± 0%  -18.42%  (p=0.000 n=10+10)
GDA/rounding-10           9.20k ± 0%     7.59k ± 0%  -17.52%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10         87.0 ± 0%      72.0 ± 0%  -17.24%  (p=0.000 n=10+10)
Exp/P10/S-4/D10-10         95.0 ± 0%      79.0 ± 0%  -16.84%  (p=0.000 n=10+10)
GDA/randoms-10            40.4k ± 0%     33.6k ± 0%  -16.82%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10           104 ± 0%        87 ± 0%  -16.35%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10            105 ± 0%        88 ± 0%  -16.19%  (p=0.000 n=10+10)
Exp/P10/S-4/D-10-10        93.0 ± 0%      78.0 ± 0%  -16.13%  (p=0.000 n=10+10)
GDA/squareroot-10          275k ± 0%      230k ± 0%  -16.08%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10         174 ± 0%       147 ± 0%  -15.52%  (p=0.000 n=10+10)
GDA/plus-10                 587 ± 0%       496 ± 0%  -15.50%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10          163 ± 0%       138 ± 0%  -15.34%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10           166 ± 0%       141 ± 0%  -15.06%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10          182 ± 0%       155 ± 0%  -14.84%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10          335 ± 0%       286 ± 0%  -14.63%  (p=0.000 n=10+10)
Ln/P2/S-2/D2-10             342 ± 0%       292 ± 0%  -14.62%  (p=0.000 n=10+10)
Ln/P2/S2/D2-10              363 ± 0%       310 ± 0%  -14.60%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10             366 ± 0%       313 ± 0%  -14.48%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10           335 ± 0%       287 ± 0%  -14.33%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       7.67k ± 0%     6.58k ± 0%  -14.27%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10            582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10             582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S10/D10-10           541 ± 0%       464 ± 0%  -14.23%  (p=0.000 n=10+10)
Ln/P10/S2/D10-10            563 ± 0%       483 ± 0%  -14.21%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10            317 ± 0%       272 ± 0%  -14.20%  (p=0.000 n=10+10)
GDA/powersqrt-10          2.63M ± 0%     2.26M ± 0%  -14.17%  (p=0.000 n=10+10)
Ln/P10/S-2/D10-10           637 ± 0%       547 ± 0%  -14.13%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10           581 ± 0%       499 ± 0%  -14.11%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10           369 ± 0%       317 ± 0%  -14.09%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10            342 ± 0%       294 ± 0%  -14.04%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10            613 ± 0%       527 ± 0%  -14.03%  (p=0.000 n=9+10)
Ln/P10/S-10/D10-10          581 ± 0%       500 ± 0%  -13.99%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10            272 ± 0%       234 ± 0%  -13.97%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10          575 ± 0%       495 ± 0%  -13.91%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10           597 ± 0%       514 ± 0%  -13.90%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10          504 ± 0%       434 ± 0%  -13.89%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10             281 ± 0%       242 ± 0%  -13.88%  (p=0.000 n=10+10)
Ln/P10/S-100/D10-10         549 ± 0%       473 ± 0%  -13.84%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10           336 ± 0%       290 ± 0%  -13.69%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10            376 ± 0%       325 ± 0%  -13.56%  (p=0.000 n=10+10)
GDA/ln-10                  269k ± 0%      239k ± 0%  -10.88%  (p=0.000 n=10+10)
GDA/log10-10               326k ± 0%      292k ± 0%  -10.44%  (p=0.000 n=10+10)
GDA/power-10               699k ± 0%      627k ± 0%  -10.30%  (p=0.000 n=10+10)
GDA/add-10                13.2k ± 0%     11.9k ± 0%   -9.68%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10         701 ± 0%       638 ± 0%   -8.99%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10        715 ± 0%       651 ± 0%   -8.95%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10          686 ± 0%       626 ± 0%   -8.75%  (p=0.000 n=10+10)
Exp/P100/S-4/D-2-10         723 ± 0%       661 ± 0%   -8.58%  (p=0.000 n=10+10)
Exp/P100/S-1/D10-10       1.36k ± 0%     1.26k ± 0%   -7.99%  (p=0.000 n=10+10)
Exp/P100/S-1/D2-10        1.35k ± 0%     1.24k ± 0%   -7.72%  (p=0.000 n=9+10)
Exp/P100/S-1/D-10-10      1.40k ± 0%     1.29k ± 0%   -7.70%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10       1.43k ± 0%     1.32k ± 0%   -7.48%  (p=0.000 n=9+10)
Exp/P100/S2/D-2-10        2.03k ± 0%     1.89k ± 0%   -6.82%  (p=0.000 n=9+10)
Exp/P100/S2/D10-10        2.28k ± 0%     2.13k ± 0%   -6.81%  (p=0.000 n=10+9)
Exp/P100/S2/D-10-10       2.19k ± 0%     2.05k ± 0%   -6.66%  (p=0.001 n=8+9)
Exp/P100/S2/D2-10         2.21k ± 0%     2.06k ± 0%   -6.61%  (p=0.000 n=9+10)
Ln/P100/S-10/D2-10        6.00k ± 0%     5.65k ± 0%   -5.87%  (p=0.000 n=10+9)
Ln/P100/S100/D2-10        6.25k ± 0%     5.89k ± 0%   -5.80%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         5.79k ± 0%     5.46k ± 0%   -5.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      6.04k ± 0%     5.69k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         6.21k ± 0%     5.86k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        6.09k ± 0%     5.74k ± 0%   -5.73%  (p=0.000 n=10+8)
Exp/P100/S-4/D100-10        855 ± 0%       806 ± 0%   -5.73%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       5.62k ± 0%     5.30k ± 0%   -5.72%  (p=0.000 n=10+8)
Ln/P100/S-100/D2-10       6.34k ± 0%     5.98k ± 0%   -5.68%  (p=0.000 n=10+10)
Ln/P100/S10/D100-10       5.73k ± 0%     5.41k ± 0%   -5.67%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       6.08k ± 0%     5.74k ± 0%   -5.66%  (p=0.000 n=9+8)
Ln/P100/S10/D2-10         6.13k ± 0%     5.78k ± 0%   -5.66%  (p=0.000 n=8+10)
Ln/P100/S-10/D100-10      5.69k ± 0%     5.37k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S10/D10-10        5.74k ± 0%     5.41k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S-100/D10-10      5.55k ± 0%     5.23k ± 0%   -5.63%  (p=0.000 n=10+7)
Ln/P100/S2/D2-10          5.86k ± 0%     5.53k ± 0%   -5.62%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10       893 ± 0%       843 ± 0%   -5.60%  (p=0.000 n=10+10)
Ln/P100/S-100/D100-10     5.73k ± 0%     5.41k ± 0%   -5.54%  (p=0.000 n=10+10)
Exp/P100/S-1/D100-10      1.68k ± 0%     1.58k ± 0%   -5.52%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        5.40k ± 0%     5.10k ± 0%   -5.49%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       5.77k ± 0%     5.45k ± 0%   -5.49%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10     1.68k ± 0%     1.59k ± 0%   -5.41%  (p=0.000 n=10+10)
Exp/P100/S2/D100-10       2.61k ± 0%     2.48k ± 0%   -5.03%  (p=0.000 n=8+10)
Exp/P100/S2/D-100-10      2.47k ± 0%     2.35k ± 0%   -4.90%  (p=0.000 n=10+10)
GDA/multiply-10             931 ± 0%       889 ± 0%   -4.51%  (p=0.000 n=10+10)
GDA/exp-10                 911k ± 0%      895k ± 0%   -1.75%  (p=0.000 n=10+10)
GDA/base-10               2.11k ± 0%     2.11k ± 0%     ~     (all equal)
GDA/comparetotal-10         254 ± 0%       254 ± 0%     ~     (all equal)
```
@nvanbenschoten
Member Author

I think we can simplify this even further by creating a wrapper around big.Int and pushing the inline value optimization into that wrapper. Something along the lines of:

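```go
package apd

import "math/big"

const inlineWords = 1 // or maybe 2

// BigInt wraps big.Int and carries a small inline array that can serve as
// the initial backing storage for the wrapped value's words.
type BigInt struct {
	inner  big.Int
	inline [inlineWords]big.Word
}

// lazyInit points inner's empty word slice at the inline array the first
// time the value is written, so sufficiently small values can avoid a
// separate heap allocation.
func (b *BigInt) lazyInit() {
	if b.inner.Bits() == nil {
		b.inline = [inlineWords]big.Word{}
		b.inner.SetBits(b.inline[:0])
	}
}

// Add sets b to x+y and returns b.
func (b *BigInt) Add(x, y *BigInt) *BigInt {
	b.lazyInit()
	b.inner.Add(&x.inner, &y.inner)
	return b
}

// ...

type Decimal struct {
	Form     Form
	Negative bool
	Exponent int32
	Coeff    BigInt
}
```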

We could then re-implement the big.Int API on this wrapper and add calls to (*BigInt).lazyInit where appropriate. That would probably be more lines of code, but it would mostly be boilerplate and would keep the optimization confined to a small layer below Decimal and above big.Int.
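
A minimal sketch of how a few more `big.Int` methods could be forwarded through such a wrapper. The method set shown here (`SetInt64`, `Add`, `Mul`, `Cmp`, `String`) and the choice `inlineWords = 2` are assumptions for illustration, not an existing apd API. Methods that write to the receiver call `lazyInit` first; read-only methods delegate directly.

```go
package main

import "math/big"

// inlineWords = 2 is an assumption; the proposal above leaves it at 1 or 2.
const inlineWords = 2

// BigInt wraps big.Int with a small inline array for its words.
type BigInt struct {
	inner  big.Int
	inline [inlineWords]big.Word
}

// lazyInit installs the inline array as inner's initial (empty) word slice
// the first time the value is written.
func (b *BigInt) lazyInit() {
	if b.inner.Bits() == nil {
		b.inline = [inlineWords]big.Word{}
		b.inner.SetBits(b.inline[:0])
	}
}

// SetInt64 sets b to x and returns b.
func (b *BigInt) SetInt64(x int64) *BigInt {
	b.lazyInit()
	b.inner.SetInt64(x)
	return b
}

// Add sets b to x+y and returns b.
func (b *BigInt) Add(x, y *BigInt) *BigInt {
	b.lazyInit()
	b.inner.Add(&x.inner, &y.inner)
	return b
}

// Mul sets b to x*y and returns b. If the product needs more words than the
// current backing slice can hold, big.Int allocates a larger slice itself.
func (b *BigInt) Mul(x, y *BigInt) *BigInt {
	b.lazyInit()
	b.inner.Mul(&x.inner, &y.inner)
	return b
}

// Cmp and String only read their operands, so they skip lazyInit.
func (b *BigInt) Cmp(y *BigInt) int { return b.inner.Cmp(&y.inner) }
func (b *BigInt) String() string    { return b.inner.String() }
```

Most of the remaining `big.Int` API would follow one of these two patterns, which is the boilerplate anticipated above.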

This BigInt wrapper may then be useful in other contexts. For instance, it would provide the inline optimization to intermediate results in this library, which are currently allocation-heavy (see benchmark results above). It may also be useful in CRDB code.

nvanbenschoten changed the title from "apd: embed small coefficient values in Decimal struct" to "[DNM] apd: embed small coefficient values in Decimal struct" on Jan 3, 2022
@nvanbenschoten
Member Author

I'm closing this in favor of #102. It turns out that the self-referential pointer in this commit was causing every Decimal that would otherwise be stack-allocated to escape to the heap. In #102, we address this while also speeding up most intermediate computation in the library.
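
The self-referential pointer is the slice of `b.inline` that `lazyInit` stores inside `b.inner`. The sketch below reuses the wrapper from the earlier comment (with `inlineWords = 1`) to show one easily observable consequence of that self-reference under the current `math/big` implementation: a by-value copy keeps pointing at the original's inline array, so writing to the copy changes the original. The `math/big` documentation already warns that shallow copies of an `Int` are not supported. The heap escape itself is a compiler decision and is not asserted here; on a given Go version it can be inspected with `go build -gcflags=-m`. The helper `sharesInline` is hypothetical.

```go
package main

import "math/big"

const inlineWords = 1 // as in the sketch from the earlier comment

type BigInt struct {
	inner  big.Int
	inline [inlineWords]big.Word
}

// lazyInit stores a slice of b.inline inside b.inner, so b now holds a
// pointer into itself. Storing a pointer derived from b through b is the
// kind of pattern escape analysis tends to treat as escaping (assumption;
// verify with -gcflags=-m).
func (b *BigInt) lazyInit() {
	if b.inner.Bits() == nil {
		b.inline = [inlineWords]big.Word{}
		b.inner.SetBits(b.inline[:0])
	}
}

// SetInt64 sets b to x and returns b.
func (b *BigInt) SetInt64(x int64) *BigInt {
	b.lazyInit()
	b.inner.SetInt64(x)
	return b
}

// sharesInline reports whether b's first word lives in owner's inline array.
func sharesInline(b, owner *BigInt) bool {
	words := b.inner.Bits()
	return len(words) > 0 && &words[0] == &owner.inline[0]
}
```

Because the copy's `inner` already has non-nil bits, its own `lazyInit` does nothing, and its writes land in the original's inline array.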
