
Add support for unbounded integrals #20

Merged
merged 11 commits into from
Jul 5, 2024


wismill
Contributor

@wismill wismill commented Apr 21, 2024

Currently this package supports only finite (bounded) integral types, but Integer and Natural are also common.

Add support for unbounded integral types:

  • Decimal
  • Hexadecimal: only add a note, as the use case is not clear.

Also add related tests and benchmarks.
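For the tests, a deliberately naive reference conversion can serve as an oracle alongside `show`. This is a sketch, not the PR's implementation; `integerToDecimal` and `naturalToDecimal` are hypothetical names. It also shows why decimal rendering is naturally a prepend operation: `quotRem` by 10 yields the least significant digit first.

```haskell
import Numeric.Natural (Natural)

-- Hypothetical reference implementation, intended only as a test oracle.
-- Digits come out least significant first (quotRem by 10), so each new
-- digit is prepended to the accumulator.
integerToDecimal :: Integer -> String
integerToDecimal n
  | n < 0     = '-' : naturalToDecimal (fromInteger (negate n))
  | otherwise = naturalToDecimal (fromInteger n)

naturalToDecimal :: Natural -> String
naturalToDecimal 0 = "0"
naturalToDecimal m = go m ""
  where
    go :: Natural -> String -> String
    go 0 acc = acc
    go k acc =
      let (q, r) = k `quotRem` 10
      in go q (toEnum (fromEnum '0' + fromIntegral r) : acc)
```

Each `quotRem` by 10 costs time proportional to the size of the number, so this is quadratic in the digit count; it is only meant for checking results.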

@wismill wismill marked this pull request as draft April 21, 2024 09:25
@wismill
Contributor Author

wismill commented Apr 21, 2024

Currently a draft for early feedback. Although the benchmark results are already quite good, the current implementation allocates too much. I have some ideas for a better implementation.

Note that I wanted to try a newtype for Integer with a FiniteBits instance, but it is too tricky and does not benefit from optimizations.

Hexadecimal formatting pending.

@wismill
Contributor Author

wismill commented Apr 21, 2024

@Bodigrim Is it OK to depend on ghc-bignum?

@Bodigrim
Owner

@Bodigrim Is it OK to depend on ghc-bignum?

That's fine.

@wismill wismill force-pushed the integral/unbounded branch 2 times, most recently from 9154f9a to 7a4d92d April 21, 2024 18:19
@wismill
Contributor Author

wismill commented Apr 21, 2024

Fixed exactIntegerDecLen being slow.
Attempted a faster algorithm for unsafePrependUnboundedDec inspired by fast-digits, but it did not improve performance 😅
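For reference, one way to compute an exact decimal length without rendering is to estimate it from the bit length and then correct upwards. This is a sketch under assumptions: it uses `integerLog2` from ghc-bignum's `GHC.Num.Integer` (GHC 9.0 or later), and `decimalLength` is a hypothetical name, not the PR's `exactIntegerDecLen`.

```haskell
import GHC.Num.Integer (integerLog2)

-- Hypothetical helper: exact number of decimal digits (sign not counted).
decimalLength :: Integer -> Int
decimalLength n0
  | n < 10    = 1
  | otherwise = fixUp estimate
  where
    n = abs n0
    -- floor (log2 n), i.e. 2^l <= n < 2^(l+1)
    l = toInteger (integerLog2 n)
    -- 30102999566 / 10^11 is slightly below log10 2, so the estimate
    -- never exceeds the true digit count.
    estimate :: Int
    estimate = fromInteger (l * 30102999566 `div` 100000000000) + 1
    -- The digit count d satisfies 10^(d-1) <= n < 10^d.
    fixUp :: Int -> Int
    fixUp k
      | n >= 10 ^ k = fixUp (k + 1)
      | otherwise   = k
```

Because the estimate is a lower bound that is off by at most a couple of digits for realistic sizes, the correction loop only performs a few comparisons; a real implementation would avoid recomputing `10 ^ k` each time.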

@wismill
Contributor Author

wismill commented May 8, 2024

@Bodigrim Here is a new implementation. I am quite happy with the benchmark results:

Benchmark results (GHC 9.2.8, Linux, 8 × AMD Ryzen 5 2500U)
All
  Decimal
    Unbounded
      Small
        1
          Data.Text.Lazy.Builder:   OK
            263  ns ± 5.4 ns, 807 B  allocated,   0 B  copied,  15 MB peak memory
          Data.ByteString.Builder:  OK
            491  ns ±  18 ns, 4.9 KB allocated,   1 B  copied,  18 MB peak memory, 1.87x
          Text.Builder:             OK
            1.85 μs ±  93 ns, 6.8 KB allocated,   3 B  copied,  18 MB peak memory, 7.03x
          ByteString.StrictBuilder: OK
            653  ns ±  32 ns, 2.5 KB allocated,   1 B  copied,  18 MB peak memory, 2.48x
          Data.Text.Builder.Linear: OK
            160  ns ± 8.1 ns, 486 B  allocated,   0 B  copied,  18 MB peak memory, 0.61x
        10
          Data.Text.Lazy.Builder:   OK
            2.63 μs ±  83 ns, 5.9 KB allocated,   3 B  copied,  18 MB peak memory
          Data.ByteString.Builder:  OK
            2.15 μs ±  86 ns, 7.9 KB allocated,   1 B  copied,  19 MB peak memory, 0.82x
          Text.Builder:             OK
            18.9 μs ± 847 ns,  71 KB allocated, 103 B  copied,  19 MB peak memory, 7.19x
          ByteString.StrictBuilder: OK
            8.15 μs ± 437 ns,  24 KB allocated,  50 B  copied,  19 MB peak memory, 3.10x
          Data.Text.Builder.Linear: OK
            1.25 μs ±  61 ns, 3.2 KB allocated,   1 B  copied,  19 MB peak memory, 0.47x
        100
          Data.Text.Lazy.Builder:   OK
            227  μs ± 3.3 μs, 795 KB allocated, 1.9 KB copied,  19 MB peak memory
          Data.ByteString.Builder:  OK
            32.5 μs ± 1.6 μs,  60 KB allocated, 113 B  copied,  19 MB peak memory, 0.14x
          Text.Builder:             OK
            221  μs ±  13 μs, 752 KB allocated, 8.4 KB copied,  19 MB peak memory, 0.97x
          ByteString.StrictBuilder: OK
            95.5 μs ± 4.8 μs, 267 KB allocated, 4.6 KB copied,  19 MB peak memory, 0.42x
          Data.Text.Builder.Linear: OK
            24.4 μs ± 1.2 μs,  47 KB allocated,  25 B  copied,  19 MB peak memory, 0.11x
        1000
          Data.Text.Lazy.Builder:   OK
            4.48 ms ±  88 μs, 8.4 MB allocated, 474 KB copied,  19 MB peak memory
          Data.ByteString.Builder:  OK
            371  μs ±  11 μs, 704 KB allocated,  12 KB copied,  19 MB peak memory, 0.08x
          Text.Builder:             OK
            4.00 ms ± 143 μs, 7.8 MB allocated, 885 KB copied,  19 MB peak memory, 0.89x
          ByteString.StrictBuilder: OK
            1.49 ms ±  42 μs, 2.8 MB allocated, 518 KB copied,  19 MB peak memory, 0.33x
          Data.Text.Builder.Linear: OK
            256  μs ± 8.0 μs, 431 KB allocated,  92 B  copied,  19 MB peak memory, 0.06x
        10000
          Data.Text.Lazy.Builder:   OK
            76.1 ms ± 4.2 ms,  85 MB allocated,  11 MB copied,  21 MB peak memory
          Data.ByteString.Builder:  OK
            6.01 ms ± 206 μs, 6.6 MB allocated, 1.2 MB copied,  21 MB peak memory, 0.08x
          Text.Builder:             OK
            80.1 ms ± 4.0 ms,  82 MB allocated,  17 MB copied,  37 MB peak memory, 1.05x
          ByteString.StrictBuilder: OK
            93.7 ms ± 2.2 ms,  29 MB allocated,  29 MB copied,  48 MB peak memory, 1.23x
          Data.Text.Builder.Linear: OK
            2.84 ms ± 123 μs, 5.2 MB allocated, 877 B  copied,  48 MB peak memory, 0.04x
        100000
          Data.Text.Lazy.Builder:   OK
            725  ms ±  41 ms, 861 MB allocated, 119 MB copied,  57 MB peak memory
          Data.ByteString.Builder:  OK
            131  ms ± 2.0 ms,  67 MB allocated,  30 MB copied,  63 MB peak memory, 0.18x
          Text.Builder:             OK
            779  ms ± 6.2 ms, 870 MB allocated, 179 MB copied, 249 MB peak memory, 1.07x
          ByteString.StrictBuilder: OK
            1.083 s ±  18 ms, 323 MB allocated, 329 MB copied, 514 MB peak memory, 1.49x
          Data.Text.Builder.Linear: OK
            46.2 ms ± 757 μs,  50 MB allocated,  11 KB copied, 514 MB peak memory, 0.06x
      Big
        1
          Data.Text.Lazy.Builder:   OK
            23.1 μs ± 790 ns,  83 KB allocated,  51 B  copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            3.19 μs ±  68 ns, 8.4 KB allocated,   1 B  copied, 514 MB peak memory, 0.14x
          Text.Builder:             OK
            27.9 μs ± 955 ns,  90 KB allocated, 194 B  copied, 514 MB peak memory, 1.21x
          ByteString.StrictBuilder: OK
            15.8 μs ± 908 ns,  44 KB allocated,  81 B  copied, 514 MB peak memory, 0.68x
          Data.Text.Builder.Linear: OK
            1.86 μs ±  87 ns, 3.4 KB allocated,   2 B  copied, 514 MB peak memory, 0.08x
        10
          Data.Text.Lazy.Builder:   OK
            241  μs ± 8.5 μs, 824 KB allocated, 1.2 KB copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            27.6 μs ± 1.0 μs,  40 KB allocated,  37 B  copied, 514 MB peak memory, 0.11x
          Text.Builder:             OK
            287  μs ± 7.4 μs, 906 KB allocated, 5.5 KB copied, 514 MB peak memory, 1.19x
          ByteString.StrictBuilder: OK
            163  μs ± 6.7 μs, 438 KB allocated, 7.8 KB copied, 514 MB peak memory, 0.68x
          Data.Text.Builder.Linear: OK
            20.8 μs ± 765 ns,  34 KB allocated,  11 B  copied, 514 MB peak memory, 0.09x
        100
          Data.Text.Lazy.Builder:   OK
            3.69 ms ± 162 μs, 8.1 MB allocated, 310 KB copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            282  μs ± 9.9 μs, 467 KB allocated, 2.4 KB copied, 514 MB peak memory, 0.08x
          Text.Builder:             OK
            4.91 ms ± 223 μs, 8.9 MB allocated, 935 KB copied, 514 MB peak memory, 1.33x
          ByteString.StrictBuilder: OK
            2.34 ms ± 112 μs, 4.3 MB allocated, 794 KB copied, 514 MB peak memory, 0.63x
          Data.Text.Builder.Linear: OK
            200  μs ± 7.9 μs, 353 KB allocated,  90 B  copied, 514 MB peak memory, 0.05x
        1000
          Data.Text.Lazy.Builder:   OK
            59.3 ms ± 3.5 ms,  80 MB allocated, 8.5 MB copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            3.15 ms ± 132 μs, 4.2 MB allocated, 241 KB copied, 514 MB peak memory, 0.05x
          Text.Builder:             OK
            71.5 ms ± 3.0 ms,  89 MB allocated,  15 MB copied, 514 MB peak memory, 1.21x
          ByteString.StrictBuilder: OK
            109  ms ± 629 μs,  43 MB allocated,  30 MB copied, 514 MB peak memory, 1.84x
          Data.Text.Builder.Linear: OK
            2.00 ms ±  71 μs, 3.4 MB allocated, 596 B  copied, 514 MB peak memory, 0.03x
        10000
          Data.Text.Lazy.Builder:   OK
            542  ms ±  29 ms, 806 MB allocated,  92 MB copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            57.8 ms ± 2.4 ms,  43 MB allocated, 7.1 MB copied, 514 MB peak memory, 0.11x
          Text.Builder:             OK
            728  ms ± 1.5 ms, 902 MB allocated, 152 MB copied, 514 MB peak memory, 1.34x
          ByteString.StrictBuilder: OK
            1.193 s ±  24 ms, 436 MB allocated, 323 MB copied, 514 MB peak memory, 2.20x
          Data.Text.Builder.Linear: OK
            43.1 ms ± 2.3 ms,  42 MB allocated, 8.6 KB copied, 514 MB peak memory, 0.08x
      Huge
        1
          Data.Text.Lazy.Builder:   OK
            262  μs ±  15 μs, 873 KB allocated, 1.3 KB copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            27.4 μs ± 1.2 μs,  38 KB allocated,  40 B  copied, 514 MB peak memory, 0.10x
          Text.Builder:             OK
            630  μs ±  20 μs, 1.6 MB allocated,  24 KB copied, 514 MB peak memory, 2.40x
          ByteString.StrictBuilder: OK
            502  μs ±  21 μs, 1.1 MB allocated,  17 KB copied, 514 MB peak memory, 1.92x
          Data.Text.Builder.Linear: OK
            29.8 μs ± 1.3 μs,  40 KB allocated,  11 B  copied, 514 MB peak memory, 0.11x
        10
          Data.Text.Lazy.Builder:   OK
            3.79 ms ± 187 μs, 8.4 MB allocated, 316 KB copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            289  μs ± 7.7 μs, 450 KB allocated, 2.0 KB copied, 514 MB peak memory, 0.08x
          Text.Builder:             OK
            9.68 ms ± 570 μs,  16 MB allocated, 1.4 MB copied, 514 MB peak memory, 2.55x
          ByteString.StrictBuilder: OK
            9.82 ms ± 224 μs,  11 MB allocated, 2.0 MB copied, 514 MB peak memory, 2.59x
          Data.Text.Builder.Linear: OK
            304  μs ±  11 μs, 415 KB allocated, 150 B  copied, 514 MB peak memory, 0.08x
        100
          Data.Text.Lazy.Builder:   OK
            61.3 ms ± 933 μs,  85 MB allocated, 9.0 MB copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            3.12 ms ± 121 μs, 4.1 MB allocated, 182 KB copied, 514 MB peak memory, 0.05x
          Text.Builder:             OK
            105  ms ± 4.3 ms, 161 MB allocated,  17 MB copied, 514 MB peak memory, 1.71x
          ByteString.StrictBuilder: OK
            133  ms ± 6.3 ms, 110 MB allocated,  32 MB copied, 514 MB peak memory, 2.18x
          Data.Text.Builder.Linear: OK
            3.01 ms ±  43 μs, 4.2 MB allocated, 1.5 KB copied, 514 MB peak memory, 0.05x
        1000
          Data.Text.Lazy.Builder:   OK
            560  ms ± 9.3 ms, 851 MB allocated,  95 MB copied, 514 MB peak memory
          Data.ByteString.Builder:  OK
            55.2 ms ± 3.1 ms,  41 MB allocated, 5.3 MB copied, 514 MB peak memory, 0.10x
          Text.Builder:             OK
            1.057 s ±  44 ms, 1.6 GB allocated, 166 MB copied, 514 MB peak memory, 1.89x
          ByteString.StrictBuilder: OK
            1.518 s ±  49 ms, 1.1 GB allocated, 323 MB copied, 514 MB peak memory, 2.71x
          Data.Text.Builder.Linear: OK
            39.9 ms ± 1.5 ms,  40 MB allocated,  14 KB copied, 514 MB peak memory, 0.07x

This is not optimal, because it does not scale as well as the ByteString builder, but it does beat it in both time and space for numbers < 10^19265. For 10^900, it is 2 times faster. So I guess this is quite acceptable.

Of course these numbers are not crazily big, but I wonder if there are real use cases with bigger numbers.

@Bodigrim
Owner

Bodigrim commented May 8, 2024

This is not optimal, because it does not scale as good as the ByteString builder,

(I have not looked at the implementation yet) What's the reason it does not scale as good as bytestring?

@wismill
Contributor Author

wismill commented May 10, 2024

Fixed the detailed benchmarks: in the end, the results are not as good as I thought… I realized that the configuration of the simple benchmark differed from that of the detailed one. I needed to inline the benchmark builder to get consistent results between the two benchmarks. Benchmarking is so tricky! It seems this implementation still beats the ByteString builder for numbers < 10^1450 on my machine.

What's the reason it does not scale as good as bytestring?

@Bodigrim it's for the same reason as highlighted in the previous comment, although at a different scale.

The question is: does it matter? I am not satisfied with a suboptimal solution, but on the other hand, are there use cases to format really huge numbers and where the perf of the formatting matters? I mean, the current Text builder is even worse and is still widely used.

I will investigate other options, now that the benchmarks seem to display correct results.

I did try to implement the bytestring algorithm, but I was not convinced by its performance.

@raehik

raehik commented May 10, 2024

(Aside: My current workaround for printing Integers and Naturals is shoving in a Show constraint and doing fromString (show n), which feels kinda bad. I moved to text-builder-linear for performance & ergonomics, so I'd love to see this get merged.)

@Bodigrim
Owner

The question is: does it matter? I am not satisfied with a suboptimal solution, but on the other hand, are there use cases to format really huge numbers and where the perf of the formatting matters?

It is not a blocker indeed.

@wismill
Contributor Author

wismill commented May 16, 2024

So after reading the Core dump, it seems that in the specific benchmark for unbounded integers:

  • Data.ByteString.Builder: in B.integerDec i <> (acc <> B.integerDec i), B.integerDec i is shared.
  • Data.Text.Builder.Linear: in i $$<| (acc |>$$ i) work is not shared, even with a different implementation. Using Builder instead of Buffer does not fix that.

Example of benchmark results using the patterns `-p unbounded -p Huge1`:
All
  Decimal: detailed unbounded
    Append
      Huge1
        1
          Data.ByteString.Builder:  OK
            19.1 μs ± 406 ns,  32 KB allocated,  14 B  copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            10.1 μs ± 457 ns,  11 KB allocated,  10 B  copied,  20 MB peak memory, 0.53x
        10
          Data.ByteString.Builder:  OK
            191  μs ±  10 μs, 324 KB allocated, 187 B  copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            105  μs ± 1.9 μs, 115 KB allocated,  49 B  copied,  20 MB peak memory, 0.55x
        100
          Data.ByteString.Builder:  OK
            1.85 ms ±  28 μs, 2.9 MB allocated, 3.7 KB copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            1.09 ms ±  58 μs, 1.1 MB allocated, 331 B  copied,  20 MB peak memory, 0.59x
    Prepend
      Huge1
        1
          Data.ByteString.Builder:  OK
            19.4 μs ± 657 ns,  32 KB allocated,  17 B  copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            10.1 μs ± 179 ns,  11 KB allocated,  10 B  copied,  20 MB peak memory, 0.52x
        10
          Data.ByteString.Builder:  OK
            188  μs ± 4.1 μs, 325 KB allocated, 177 B  copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            104  μs ± 2.3 μs, 113 KB allocated,  63 B  copied,  20 MB peak memory, 0.55x
        100
          Data.ByteString.Builder:  OK
            1.86 ms ±  22 μs, 2.9 MB allocated, 3.5 KB copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            1.04 ms ±  58 μs, 1.1 MB allocated, 361 B  copied,  20 MB peak memory, 0.56x
    Both
      Huge1
        1
          Data.ByteString.Builder:  OK
            21.5 μs ± 768 ns,  30 KB allocated,  19 B  copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            20.8 μs ± 491 ns,  25 KB allocated,  19 B  copied,  20 MB peak memory, 0.97x
        10
          Data.ByteString.Builder:  OK
            221  μs ± 7.2 μs, 330 KB allocated, 1.2 KB copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            220  μs ±  12 μs, 258 KB allocated,  70 B  copied,  20 MB peak memory, 1.00x
        100
          Data.ByteString.Builder:  OK
            2.35 ms ±  68 μs, 3.2 MB allocated, 106 KB copied,  20 MB peak memory
          Data.Text.Builder.Linear: OK
            2.23 ms ±  47 μs, 2.7 MB allocated, 543 B  copied,  23 MB peak memory, 0.95x

So while I can find implementations that beat the ByteString builder for either prepending or appending, it seems that doing both cannot perform better than the ByteString builder once numbers are big enough.

Do you have some advice to improve the situation and enable sharing to work? Or is it just a corner case and we should use a fairer benchmark?

@wismill
Contributor Author

wismill commented May 16, 2024

Using the following function makes the benchmark a bit fairer:

benchUnboundedLinearBuilder ∷ Integer → Int → T.Text
benchUnboundedLinearBuilder k m = runBuffer (\b → go b m)
  where
    go ∷ Buffer ⊸ Int → Buffer
    go !acc 0 = acc
    go !acc n = case newEmptyBuffer acc of
      (# acc', e #) → case dupBuffer ((fromIntegral n * k) $$<| e) of
        (# l, r #) → go (l >< (acc' >< r)) (n - 1)

@Bodigrim
Owner

Do you have some advice to improve the situation and enable sharing to work? Or is it just a corner case and we should use a fairer benchmark?

Yeah, that's a corner case. I'd change the benchmark to i $$<| (acc |>$$ (i + 1)) or similar.

@wismill wismill force-pushed the integral/unbounded branch 2 times, most recently from 6787e44 to b962082 May 24, 2024 09:17
Contributor Author

@wismill wismill left a comment

New implementation: faster than ByteString, with far fewer allocations.

Contributor Author

This benchmark file is probably too detailed once we get the implementation right.

Comment on lines +30 to +31
-- textBenchName = "Data.Text.Lazy.Builder"
textBenchName = "Data.ByteString.Builder"
Contributor Author

I found it easier to compare against ByteString, but I can revert this (or just comment it out), because this package is primarily about Text.

Comment on lines +251 to +255
-- | Low-level routine to append data of unknown size to a 'Buffer', giving
-- the action the choice between two strategies.
--
-- See also: 'appendBounded'.
appendBounded'
Contributor Author

New internal helper. If it is deemed a useful addition to the API, we may need a better name.

Owner

Data.Text.Builder.Linear.Core is a public module, so it's not that internal. Maybe we can put the helper somewhere else to avoid bikeshedding over its name?

Comment on lines +345 to +348
-- | Low-level routine to prepend data of unknown size to a 'Buffer'.
--
-- Unlike 'prependBounded', this only uses a prepend action.
prependBounded'
Contributor Author

New internal helper. If it is deemed a useful addition to the API, we may need a better name.

Useful when the prepender is always faster than the appender.

src/Data/Text/Builder/Linear/Dec/Bounded.hs (resolved)
Comment on lines 104 to 118
-- TODO: Remove once we choose a strategy
data Thresholds = SmallOnly | BigOnly | HugeOnly | Optimum

-- Use the fastest writer depending on the BigNat size
unsafePrependBigNatDec ∷ ∀ s. A.MArray s → DigitsWriter s
unsafePrependBigNatDec marr !off0 !n0
  | BN.bigNatSize n0 < bigSizeThreshold = prependSmallNat marr off0 n0
  | BN.bigNatSize n0 < hugeSizeThreshold = prependBigNat marr off0 n0
  | otherwise = prependHugeNat marr off0 n0
Contributor Author

I implemented 3 strategies to get maximum performance.

These functions perform better (in both time and space) than the ByteString builder at least for:

  • prependSmallNat: n < 1e2000
  • prependBigNat: n < 1e20000
  • prependHugeNat: always??

While better implementations are possible, for huge numbers the bottleneck is bigNatQuotRem#.

But 3 strategies may be too much.

I see 2 main directions if we want to simplify the code:

  • Be fast only for integers of realistic size: keep prependBigNat and possibly prependSmallNat.
  • Be fast and be the only Text builder that scales as well as the ByteString one: keep prependHugeNat and possibly prependBigNat.
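For the "huge" direction, the textbook way to scale is divide and conquer: split the number by 10^(2^k) into two halves of similar size, render the high half, and render the low half zero-padded to exactly 2^k digits. The sketch below only illustrates that splitting pattern; we have not checked that `prependHugeNat` or the ByteString builder work exactly this way, and `naturalToDecimalDC` is a hypothetical name.

```haskell
import Numeric.Natural (Natural)

-- Hypothetical divide-and-conquer decimal conversion (sketch only).
naturalToDecimalDC :: Natural -> String
naturalToDecimalDC n
  | n < 10    = show n
  | otherwise = go n (length powers - 1)
  where
    -- powers !! k == 10^(2^k), keeping only those <= n
    powers :: [Natural]
    powers = takeWhile (<= n) (iterate (\p -> p * p) 10)

    -- Invariant: m < 10^(2^(k+1)); the result has no leading zeros.
    go :: Natural -> Int -> String
    go m k
      | k < 0     = show m                  -- here m < 10
      | m < p     = go m (k - 1)            -- high half is zero: recurse lower
      | otherwise = go hi (k - 1) ++ padTo (2 ^ k) (go lo (k - 1))
      where
        p        = powers !! k
        (hi, lo) = m `quotRem` p            -- both halves are < 10^(2^k)

    -- Left-pad with zeros to exactly w digits.
    padTo :: Int -> String -> String
    padTo w s = replicate (w - length s) '0' ++ s
```

With String concatenation and list indexing this sketch is not fast in itself; the benefit of the pattern comes from balanced divisions combined with subquadratic bignum division (as GMP provides) and writing directly into a buffer.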

Owner

I'm fine with using all three strategies. If future maintainers find it burdensome, they can always cut it down :)

SmallOnly → (maxBound, maxBound)
BigOnly → (minBound, maxBound)
HugeOnly → (minBound, minBound)
Optimum → (25, 400)
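To make the size-based dispatch concrete, here is a pure model of how a `Thresholds` value would select a strategy. It assumes each value maps to `(bigSizeThreshold, hugeSizeThreshold)` as in the snippet above and that sizes are BigNat# word counts of type `Word`. The `Strategy` type, `thresholdsOf` and `chooseStrategy` are hypothetical; only the guard order is taken from `unsafePrependBigNatDec`.

```haskell
-- Hypothetical model of the threshold-based strategy selection.
data Thresholds = SmallOnly | BigOnly | HugeOnly | Optimum
  deriving (Show, Eq)

data Strategy = SmallNat | BigNat | HugeNat
  deriving (Show, Eq)

-- (bigSizeThreshold, hugeSizeThreshold), measured in BigNat# words
thresholdsOf :: Thresholds -> (Word, Word)
thresholdsOf SmallOnly = (maxBound, maxBound)
thresholdsOf BigOnly   = (minBound, maxBound)
thresholdsOf HugeOnly  = (minBound, minBound)
thresholdsOf Optimum   = (25, 400)

-- Same guard order as unsafePrependBigNatDec.
chooseStrategy :: Thresholds -> Word -> Strategy
chooseStrategy t size
  | size < big  = SmallNat
  | size < huge = BigNat
  | otherwise   = HugeNat
  where
    (big, huge) = thresholdsOf t
```

Since `minBound :: Word` is 0, a `minBound` threshold means a tier is never entered from below, while `maxBound` effectively makes the preceding tier cover every realistic size.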
Contributor Author

The thresholds were found empirically, so they should be treated with caution. They are valid on my machine (Linux, 64-bit), but what about other architectures (32-bit) and operating systems?

Note that these thresholds apply to the BigNat# size in words.
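Assuming 64-bit words, a BigNat# of w words holds values below 2^{64w}, which bounds its decimal length. Applied to the `Optimum` thresholds (25, 400), this gives a rough idea of the digit counts at which the strategies switch:

```latex
d_{\max}(w) = \left\lfloor 64\,w \log_{10} 2 \right\rfloor + 1 \approx 19.27\,w
\qquad
d_{\max}(25) = \lfloor 481.65 \rfloor + 1 = 482,
\qquad
d_{\max}(400) = \lfloor 7706.37 \rfloor + 1 = 7707
```

On a 32-bit platform each word carries half as many bits, so the same word thresholds would correspond to roughly half as many digits.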

Owner

The aarch64 bounds are likely to be different. I have an M2 machine; is there a way for me to figure out the thresholds?

Contributor Author

I used the following “method”: change the line threshold = Optimum to the appropriate value and run the detailed benchmark with filters, e.g. -p "detailed unbounded" -p Both -p Big. You may also want to choose a specific count, e.g. -p '$5 == "10"'.

This is clearly not satisfying; a rigorous asymptotic analysis would be better. Guidance welcome!

Note that threshold :: Thresholds will be removed (or at least commented out) in the final implementation.

Owner

I imagine you could use fits to determine the asymptotics and constant factors of each method, then solve for the optimal points at which to switch between algorithms. If you happen to write such a program, I'd be happy to run it on aarch64.

(I'm on vacation, so too lazy to do much work myself, sorry :)
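If each strategy's running time were fitted to a power law t(n) = a·n^p (an assumption; real fits may include logarithmic factors), the switch point is where the two predictions meet, n* = (a₂/a₁)^(1/(p₁ − p₂)). A minimal sketch of that last step; `PowerFit`, `crossover` and `predict` are hypothetical names, not part of any existing package.

```haskell
-- Hypothetical power-law model of a fitted running time: t(n) = coeff * n ** expo
data PowerFit = PowerFit { coeff :: Double, expo :: Double }

-- Predicted running time for input size n.
predict :: PowerFit -> Double -> Double
predict (PowerFit a p) n = a * n ** p

-- Size at which two fitted models predict the same time, if there is one.
crossover :: PowerFit -> PowerFit -> Maybe Double
crossover (PowerFit a1 p1) (PowerFit a2 p2)
  | p1 == p2 || a1 <= 0 || a2 <= 0 = Nothing
  | otherwise = Just ((a2 / a1) ** (1 / (p1 - p2)))
```

For example, a quadratic fit `PowerFit 1 2` and a linear fit `PowerFit 4 1` cross at n = 4; below that size the quadratic method is predicted to be faster.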

Contributor Author

@wismill wismill May 27, 2024

Good use case for fits indeed!

Here is my take on checking the complexity with respect to the number of decimal digits:

let s = BigOnly; mi = 1000; ma=10000 :: Word; n = toInteger @Word (maxBound - 1) ^ ma in F.fits $ F.mkFitConfig (\p -> runBuffer (\b -> prependUnboundedDecimal s (rem n (10 ^ p)) b)) (mi, ma)

Comment on lines +320 to +342
-- NOTE: make sure not to inline the following numbers, in order to avoid allocations.

tenPower18 ∷ N.Natural
tenPower18 = 1e18
{-# NOINLINE tenPower18 #-}

tenPower38 ∷ N.Natural
tenPower38 = 1e38
{-# NOINLINE tenPower38 #-}
Contributor Author

@wismill wismill May 24, 2024

The aim is to avoid computing and allocating poweredBase² each time the function is used, but now that I review it, I am not sure this is a good idea.
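As an illustration of how such constants are used (not the PR's code), a chunked conversion splits a Natural into base-10^18 limbs; each remainder is below 10^18 and so fits in a 64-bit machine word, where it can be rendered cheaply. `tenPower18` mirrors the snippet above; `limbs` and `naturalToDecimalChunked` are hypothetical helpers.

```haskell
import Numeric.Natural (Natural)

-- Top-level constant, kept out of line so it is computed once
-- instead of being rebuilt at each use site.
tenPower18 :: Natural
tenPower18 = 10 ^ (18 :: Int)
{-# NOINLINE tenPower18 #-}

-- Hypothetical helper: base-10^18 limbs, most significant first.
limbs :: Natural -> [Natural]
limbs = go []
  where
    go acc 0 = acc
    go acc m = let (q, r) = m `quotRem` tenPower18 in go (r : acc) q

-- Leading limb without padding, every following limb padded to 18 digits.
naturalToDecimalChunked :: Natural -> String
naturalToDecimalChunked 0 = "0"
naturalToDecimalChunked m = case limbs m of
    []       -> "0"
    (l : ls) -> show l ++ concatMap pad18 ls
  where
    pad18 x = let s = show x in replicate (18 - length s) '0' ++ s
```

Each `quotRem` by `tenPower18` still walks the whole remaining number, so the limb loop is quadratic overall, which is one reason this kind of approach stops scaling for very large inputs.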

@wismill
Contributor Author

wismill commented May 24, 2024

New (fair) benchmark
All
  Decimal: detailed unbounded
    Both
      Small
        1
          Data.ByteString.Builder:  OK
            487  ns ± 7.8 ns, 4.9 KB allocated,   3 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            148  ns ± 1.4 ns, 375 B  allocated,   0 B  copied,  28 MB peak memory, 0.30x
        10
          Data.ByteString.Builder:  OK
            2.17 μs ±  25 ns, 8.5 KB allocated,   6 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            1.07 μs ±  20 ns, 2.2 KB allocated,   3 B  copied,  28 MB peak memory, 0.49x
        100
          Data.ByteString.Builder:  OK
            44.5 μs ± 441 ns,  84 KB allocated, 374 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            22.4 μs ± 123 ns,  37 KB allocated,  65 B  copied,  28 MB peak memory, 0.50x
      Big01
        1
          Data.ByteString.Builder:  OK
            1.49 μs ±  28 ns, 6.3 KB allocated,   8 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            369  ns ± 6.1 ns, 678 B  allocated,   1 B  copied,  28 MB peak memory, 0.25x
        10
          Data.ByteString.Builder:  OK
            12.1 μs ± 110 ns,  23 KB allocated,  42 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            4.01 μs ±  26 ns, 6.2 KB allocated,  16 B  copied,  28 MB peak memory, 0.33x
        100
          Data.ByteString.Builder:  OK
            116  μs ± 892 ns, 227 KB allocated, 745 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            43.1 μs ± 594 ns,  61 KB allocated,  89 B  copied,  28 MB peak memory, 0.37x
      Big02
        1
          Data.ByteString.Builder:  OK
            2.82 μs ±  40 ns, 8.1 KB allocated,   8 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            787  ns ±  13 ns, 1.4 KB allocated,   3 B  copied,  28 MB peak memory, 0.28x
        10
          Data.ByteString.Builder:  OK
            25.4 μs ± 233 ns,  41 KB allocated, 109 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            8.80 μs ±  89 ns,  14 KB allocated,  43 B  copied,  28 MB peak memory, 0.35x
        100
          Data.ByteString.Builder:  OK
            256  μs ± 3.4 μs, 421 KB allocated, 1.8 KB copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            86.5 μs ± 1.5 μs, 143 KB allocated, 203 B  copied,  28 MB peak memory, 0.34x
      Big03
        1
          Data.ByteString.Builder:  OK
            5.10 μs ±  90 ns,  12 KB allocated,  21 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            1.61 μs ±  20 ns, 3.0 KB allocated,   9 B  copied,  28 MB peak memory, 0.32x
        10
          Data.ByteString.Builder:  OK
            46.2 μs ± 383 ns,  74 KB allocated, 214 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            17.8 μs ± 300 ns,  31 KB allocated,  74 B  copied,  28 MB peak memory, 0.38x
        100
          Data.ByteString.Builder:  OK
            474  μs ± 3.2 μs, 805 KB allocated, 4.7 KB copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            174  μs ± 2.2 μs, 316 KB allocated, 746 B  copied,  28 MB peak memory, 0.37x
      Big04
        1
          Data.ByteString.Builder:  OK
            7.63 μs ±  94 ns,  16 KB allocated,  52 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            2.67 μs ±  50 ns, 4.9 KB allocated,  25 B  copied,  28 MB peak memory, 0.35x
        10
          Data.ByteString.Builder:  OK
            75.0 μs ± 674 ns, 155 KB allocated, 546 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            28.7 μs ± 211 ns,  51 KB allocated, 142 B  copied,  28 MB peak memory, 0.38x
        100
          Data.ByteString.Builder:  OK
            734  μs ±  11 μs, 1.2 MB allocated, 8.5 KB copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            280  μs ± 3.4 μs, 524 KB allocated, 1.4 KB copied,  28 MB peak memory, 0.38x
      Big05
        1
          Data.ByteString.Builder:  OK
            9.87 μs ± 189 ns,  19 KB allocated,  69 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            3.99 μs ±  15 ns, 7.3 KB allocated,  44 B  copied,  28 MB peak memory, 0.40x
        10
          Data.ByteString.Builder:  OK
            94.4 μs ± 1.5 μs, 184 KB allocated, 718 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            41.3 μs ± 418 ns,  75 KB allocated, 240 B  copied,  28 MB peak memory, 0.44x
        100
          Data.ByteString.Builder:  OK
            924  μs ±  13 μs, 1.5 MB allocated,  12 KB copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            404  μs ± 5.9 μs, 770 KB allocated, 2.1 KB copied,  28 MB peak memory, 0.44x
      Big06
        1
          Data.ByteString.Builder:  OK
            11.6 μs ± 182 ns,  22 KB allocated,  91 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            5.48 μs ±  69 ns, 7.7 KB allocated,  48 B  copied,  28 MB peak memory, 0.47x
        10
          Data.ByteString.Builder:  OK
            115  μs ± 726 ns, 220 KB allocated, 922 B  copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            55.4 μs ± 880 ns,  81 KB allocated, 248 B  copied,  28 MB peak memory, 0.48x
        100
          Data.ByteString.Builder:  OK
            1.12 ms ± 8.4 μs, 1.8 MB allocated,  15 KB copied,  28 MB peak memory
          Data.Text.Builder.Linear: OK
            537  μs ± 5.5 μs, 829 KB allocated, 2.3 KB copied,  29 MB peak memory, 0.48x
      Big07
        1
          Data.ByteString.Builder:  OK
            15.1 μs ± 197 ns,  27 KB allocated, 127 B  copied,  29 MB peak memory
          Data.Text.Builder.Linear: OK
            6.51 μs ± 113 ns, 9.2 KB allocated,  63 B  copied,  29 MB peak memory, 0.43x
        10
          Data.ByteString.Builder:  OK
            149  μs ± 1.9 μs, 274 KB allocated, 1.2 KB copied,  29 MB peak memory
          Data.Text.Builder.Linear: OK
            66.9 μs ± 1.3 μs,  96 KB allocated, 310 B  copied,  29 MB peak memory, 0.45x
        100
          Data.ByteString.Builder:  OK
            1.47 ms ±  22 μs, 2.4 MB allocated,  22 KB copied,  29 MB peak memory
          Data.Text.Builder.Linear: OK
            655  μs ± 3.6 μs, 991 KB allocated, 2.6 KB copied,  29 MB peak memory, 0.44x
      Big08
        1
          Data.ByteString.Builder:  OK
            17.3 μs ± 272 ns,  30 KB allocated, 127 B  copied,  29 MB peak memory
          Data.Text.Builder.Linear: OK
            8.11 μs ±  77 ns,  11 KB allocated,  43 B  copied,  29 MB peak memory, 0.47x
        10
          Data.ByteString.Builder:  OK
            173  μs ± 968 ns, 305 KB allocated, 1.5 KB copied,  29 MB peak memory
          Data.Text.Builder.Linear: OK
            81.2 μs ± 1.3 μs, 114 KB allocated, 344 B  copied,  29 MB peak memory, 0.47x
        100
          Data.ByteString.Builder:  OK
            1.71 ms ±  33 μs, 2.7 MB allocated,  26 KB copied,  29 MB peak memory
          Data.Text.Builder.Linear: OK
            800  μs ± 6.7 μs, 1.1 MB allocated, 3.2 KB copied,  30 MB peak memory, 0.47x
      Big09
        1
          Data.ByteString.Builder:  OK
            19.4 μs ± 251 ns,  33 KB allocated, 156 B  copied,  30 MB peak memory
          Data.Text.Builder.Linear: OK
            9.38 μs ± 126 ns,  13 KB allocated,  49 B  copied,  30 MB peak memory, 0.48x
        10
          Data.ByteString.Builder:  OK
            194  μs ± 2.2 μs, 340 KB allocated, 1.7 KB copied,  30 MB peak memory
          Data.Text.Builder.Linear: OK
            95.1 μs ± 744 ns, 133 KB allocated, 435 B  copied,  30 MB peak memory, 0.49x
        100
          Data.ByteString.Builder:  OK
            1.92 ms ±  30 μs, 3.0 MB allocated,  33 KB copied,  30 MB peak memory
          Data.Text.Builder.Linear: OK
            949  μs ±  17 μs, 1.3 MB allocated, 4.5 KB copied,  31 MB peak memory, 0.49x
      Big10
        1
          Data.ByteString.Builder:  OK
            22.1 μs ± 396 ns,  37 KB allocated, 194 B  copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            11.1 μs ± 218 ns,  14 KB allocated,  65 B  copied,  31 MB peak memory, 0.50x
        10
          Data.ByteString.Builder:  OK
            218  μs ± 1.7 μs, 379 KB allocated, 2.0 KB copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            110  μs ± 1.1 μs, 149 KB allocated, 528 B  copied,  31 MB peak memory, 0.51x
        100
          Data.ByteString.Builder:  OK
            2.16 ms ±  32 μs, 3.4 MB allocated,  37 KB copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            1.09 ms ± 9.6 μs, 1.5 MB allocated, 4.9 KB copied,  31 MB peak memory, 0.51x
      Big11
        1
          Data.ByteString.Builder:  OK
            24.2 μs ± 351 ns,  40 KB allocated, 231 B  copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            12.7 μs ± 137 ns,  16 KB allocated,  85 B  copied,  31 MB peak memory, 0.52x
        10
          Data.ByteString.Builder:  OK
            243  μs ± 2.2 μs, 409 KB allocated, 2.4 KB copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            127  μs ± 1.3 μs, 168 KB allocated, 638 B  copied,  31 MB peak memory, 0.52x
        100
          Data.ByteString.Builder:  OK
            2.40 ms ±  43 μs, 3.7 MB allocated,  44 KB copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            1.26 ms ±  25 μs, 1.7 MB allocated, 6.0 KB copied,  31 MB peak memory, 0.53x
      Huge01
        1
          Data.ByteString.Builder:  OK
            39.5 μs ± 187 ns,  57 KB allocated, 347 B  copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            21.8 μs ± 221 ns,  26 KB allocated, 143 B  copied,  31 MB peak memory, 0.55x
        10
          Data.ByteString.Builder:  OK
            399  μs ± 6.2 μs, 593 KB allocated, 3.8 KB copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            220  μs ± 4.2 μs, 268 KB allocated, 1.0 KB copied,  31 MB peak memory, 0.55x
        100
          Data.ByteString.Builder:  OK
            4.32 ms ±  52 μs, 5.8 MB allocated, 107 KB copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            2.22 ms ±  23 μs, 2.7 MB allocated,  11 KB copied,  31 MB peak memory, 0.51x
      Huge02
        1
          Data.ByteString.Builder:  OK
            51.2 μs ± 500 ns,  72 KB allocated, 440 B  copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            28.7 μs ± 404 ns,  37 KB allocated, 156 B  copied,  31 MB peak memory, 0.56x
        10
          Data.ByteString.Builder:  OK
            532  μs ± 6.4 μs, 796 KB allocated, 5.3 KB copied,  31 MB peak memory
          Data.Text.Builder.Linear: OK
            289  μs ± 4.4 μs, 389 KB allocated, 1.6 KB copied,  31 MB peak memory, 0.54x
        100
          Data.ByteString.Builder:  OK
            6.52 ms ±  65 μs, 7.5 MB allocated, 261 KB copied,  32 MB peak memory
          Data.Text.Builder.Linear: OK
            2.94 ms ±  47 μs, 3.9 MB allocated,  19 KB copied,  32 MB peak memory, 0.45x
      Huge03
        1
          Data.ByteString.Builder:  OK
            124  μs ± 1.8 μs, 187 KB allocated, 1.1 KB copied,  32 MB peak memory
          Data.Text.Builder.Linear: OK
            78.5 μs ± 1.4 μs,  84 KB allocated, 441 B  copied,  32 MB peak memory, 0.63x
        10
          Data.ByteString.Builder:  OK
            1.23 ms ±  12 μs, 1.6 MB allocated,  14 KB copied,  32 MB peak memory
          Data.Text.Builder.Linear: OK
            795  μs ± 9.6 μs, 879 KB allocated, 4.5 KB copied,  32 MB peak memory, 0.64x
        100
          Data.ByteString.Builder:  OK
            17.5 ms ± 217 μs,  15 MB allocated, 1.3 MB copied,  43 MB peak memory
          Data.Text.Builder.Linear: OK
            9.31 ms ±  64 μs, 8.8 MB allocated,  43 KB copied,  43 MB peak memory, 0.53x
      Huge04
        1
          Data.ByteString.Builder:  OK
            223  μs ± 3.4 μs, 271 KB allocated, 1.7 KB copied,  43 MB peak memory
          Data.Text.Builder.Linear: OK
            154  μs ± 2.5 μs, 155 KB allocated, 917 B  copied,  43 MB peak memory, 0.69x
        10
          Data.ByteString.Builder:  OK
            2.23 ms ±  17 μs, 2.4 MB allocated,  24 KB copied,  43 MB peak memory
          Data.Text.Builder.Linear: OK
            1.57 ms ±  22 μs, 1.6 MB allocated,  10 KB copied,  43 MB peak memory, 0.70x
        100
          Data.ByteString.Builder:  OK
            31.9 ms ± 367 μs,  24 MB allocated, 2.2 MB copied,  43 MB peak memory
          Data.Text.Builder.Linear: OK
            18.8 ms ± 161 μs,  16 MB allocated,  98 KB copied,  44 MB peak memory, 0.59x
      Huge05
        1
          Data.ByteString.Builder:  OK
            299  μs ± 2.9 μs, 344 KB allocated, 2.3 KB copied,  44 MB peak memory
          Data.Text.Builder.Linear: OK
            250  μs ± 1.8 μs, 167 KB allocated, 870 B  copied,  44 MB peak memory, 0.84x
        10
          Data.ByteString.Builder:  OK
            2.99 ms ±  25 μs, 3.1 MB allocated,  33 KB copied,  44 MB peak memory
          Data.Text.Builder.Linear: OK
            2.53 ms ±  37 μs, 1.7 MB allocated, 9.3 KB copied,  44 MB peak memory, 0.85x
        100
          Data.ByteString.Builder:  OK
            41.7 ms ± 515 μs,  31 MB allocated, 3.2 MB copied,  44 MB peak memory
          Data.Text.Builder.Linear: OK
            28.3 ms ± 382 μs,  17 MB allocated,  90 KB copied,  45 MB peak memory, 0.68x
      Huge06
        1
          Data.ByteString.Builder:  OK
            475  μs ± 4.7 μs, 443 KB allocated, 3.0 KB copied,  45 MB peak memory
          Data.Text.Builder.Linear: OK
            322  μs ± 3.6 μs, 206 KB allocated, 1.0 KB copied,  45 MB peak memory, 0.68x
        10
          Data.ByteString.Builder:  OK
            4.75 ms ±  82 μs, 4.0 MB allocated,  30 KB copied,  45 MB peak memory
          Data.Text.Builder.Linear: OK
            3.28 ms ±  47 μs, 2.1 MB allocated,  10 KB copied,  45 MB peak memory, 0.69x
        100
          Data.ByteString.Builder:  OK
            63.0 ms ±  97 μs,  40 MB allocated, 3.7 MB copied,  45 MB peak memory
          Data.Text.Builder.Linear: OK
            38.3 ms ± 459 μs,  21 MB allocated, 102 KB copied,  47 MB peak memory, 0.61x
      Huge07
        1
          Data.ByteString.Builder:  OK
            661  μs ±  11 μs, 568 KB allocated, 4.1 KB copied,  47 MB peak memory
          Data.Text.Builder.Linear: OK
            576  μs ± 9.6 μs, 304 KB allocated, 1.5 KB copied,  47 MB peak memory, 0.87x
        10
          Data.ByteString.Builder:  OK
            6.77 ms ±  34 μs, 5.5 MB allocated,  46 KB copied,  47 MB peak memory
          Data.Text.Builder.Linear: OK
            5.84 ms ± 100 μs, 3.1 MB allocated,  15 KB copied,  47 MB peak memory, 0.86x
        100
          Data.ByteString.Builder:  OK
            87.3 ms ± 1.3 ms,  55 MB allocated, 5.4 MB copied,  47 MB peak memory
          Data.Text.Builder.Linear: OK
            64.1 ms ± 755 μs,  31 MB allocated, 159 KB copied,  54 MB peak memory, 0.73x
      Huge08
        1
          Data.ByteString.Builder:  OK
            1.22 ms ±  11 μs, 864 KB allocated, 6.0 KB copied,  54 MB peak memory
          Data.Text.Builder.Linear: OK
            883  μs ± 4.0 μs, 427 KB allocated, 2.2 KB copied,  54 MB peak memory, 0.72x
        10
          Data.ByteString.Builder:  OK
            13.4 ms ± 129 μs, 8.2 MB allocated, 184 KB copied,  54 MB peak memory
          Data.Text.Builder.Linear: OK
            9.00 ms ± 142 μs, 4.3 MB allocated,  22 KB copied,  54 MB peak memory, 0.67x
        100
          Data.ByteString.Builder:  OK
            154  ms ± 1.4 ms,  82 MB allocated, 7.8 MB copied,  54 MB peak memory
          Data.Text.Builder.Linear: OK
            96.0 ms ± 962 μs,  44 MB allocated, 221 KB copied,  55 MB peak memory, 0.62x
      Huge09
        1
          Data.ByteString.Builder:  OK
            4.90 ms ±  68 μs, 2.4 MB allocated,  18 KB copied,  55 MB peak memory
          Data.Text.Builder.Linear: OK
            4.46 ms ±  46 μs, 1.3 MB allocated, 6.2 KB copied,  55 MB peak memory, 0.91x
        10
          Data.ByteString.Builder:  OK
            59.1 ms ± 519 μs,  24 MB allocated, 1.9 MB copied,  55 MB peak memory
          Data.Text.Builder.Linear: OK
            47.6 ms ± 874 μs,  14 MB allocated,  59 KB copied,  55 MB peak memory, 0.80x
        100
          Data.ByteString.Builder:  OK
            579  ms ± 5.4 ms, 246 MB allocated,  25 MB copied,  60 MB peak memory
          Data.Text.Builder.Linear: OK
            459  ms ± 7.9 ms, 143 MB allocated, 599 KB copied, 117 MB peak memory, 0.79x
      Huge10
        1
          Data.ByteString.Builder:  OK
            10.0 ms ± 161 μs, 4.1 MB allocated,  29 KB copied, 117 MB peak memory
          Data.Text.Builder.Linear: OK
            9.37 ms ± 180 μs, 2.1 MB allocated, 9.1 KB copied, 117 MB peak memory, 0.93x
        10
          Data.ByteString.Builder:  OK
            117  ms ± 1.4 ms,  42 MB allocated, 3.5 MB copied, 117 MB peak memory
          Data.Text.Builder.Linear: OK
            98.7 ms ± 1.3 ms,  24 MB allocated, 126 KB copied, 117 MB peak memory, 0.85x
        100
          Data.ByteString.Builder:  OK
            1.159 s ± 8.8 ms, 420 MB allocated,  43 MB copied, 117 MB peak memory
          Data.Text.Builder.Linear: OK
            956  ms ± 2.0 ms, 251 MB allocated, 1.1 MB copied, 133 MB peak memory, 0.82x
      Huge11
        1
          Data.ByteString.Builder:  OK
            25.3 ms ± 189 μs, 8.4 MB allocated,  62 KB copied, 133 MB peak memory
          Data.Text.Builder.Linear: OK
            23.9 ms ± 241 μs, 4.8 MB allocated,  23 KB copied, 133 MB peak memory, 0.94x
        10
          Data.ByteString.Builder:  OK
            281  ms ± 5.4 ms,  84 MB allocated, 7.8 MB copied, 133 MB peak memory
          Data.Text.Builder.Linear: OK
            244  ms ± 3.1 ms,  50 MB allocated, 246 KB copied, 133 MB peak memory, 0.87x
        100
          Data.ByteString.Builder:  OK
            2.772 s ±  35 ms, 855 MB allocated,  86 MB copied, 144 MB peak memory
          Data.Text.Builder.Linear: OK
            2.411 s ± 6.6 ms, 518 MB allocated, 2.2 MB copied, 233 MB peak memory, 0.87x
      Huge12
        1
          Data.ByteString.Builder:  OK
            476  ms ± 3.9 ms,  88 MB allocated, 4.7 MB copied, 233 MB peak memory
          Data.Text.Builder.Linear: OK
            450  ms ± 5.2 ms,  52 MB allocated, 256 KB copied, 233 MB peak memory, 0.94x
        10
          Data.ByteString.Builder:  OK
            4.828 s ±  22 ms, 894 MB allocated,  83 MB copied, 233 MB peak memory
          Data.Text.Builder.Linear: OK
            4.473 s ±  81 ms, 545 MB allocated, 2.3 MB copied, 233 MB peak memory, 0.93x
        100
          Data.ByteString.Builder:  This benchmark takes more than 100 seconds. Consider setting --timeout, if this is unexpected (or to silence this warning).
                                    OK
            48.086 s ± 417 ms, 8.7 GB allocated, 883 MB copied, 1.2 GB peak memory
          Data.Text.Builder.Linear: This benchmark takes more than 100 seconds. Consider setting --timeout, if this is unexpected (or to silence this warning).
                                    OK
            44.617 s ± 448 ms, 5.4 GB allocated,  22 MB copied, 1.9 GB peak memory, 0.93x
      1e20
        1
          Data.ByteString.Builder:  OK
            880  ns ±  11 ns, 5.3 KB allocated,  29 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            238  ns ± 714 ps, 575 B  allocated,   4 B  copied, 1.9 GB peak memory, 0.27x
        10
          Data.ByteString.Builder:  OK
            5.70 μs ± 100 ns,  13 KB allocated,  70 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            2.29 μs ±  28 ns, 4.0 KB allocated,  34 B  copied, 1.9 GB peak memory, 0.40x
        100
          Data.ByteString.Builder:  OK
            53.5 μs ± 531 ns, 126 KB allocated, 808 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            22.8 μs ± 423 ns,  37 KB allocated, 197 B  copied, 1.9 GB peak memory, 0.43x
      1e100
        1
          Data.ByteString.Builder:  OK
            2.36 μs ±  43 ns, 7.3 KB allocated,  29 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            620  ns ± 6.2 ns, 1.5 KB allocated,  11 B  copied, 1.9 GB peak memory, 0.26x
        10
          Data.ByteString.Builder:  OK
            20.1 μs ± 349 ns,  34 KB allocated, 223 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            6.73 μs ±  66 ns,  15 KB allocated, 102 B  copied, 1.9 GB peak memory, 0.33x
        100
          Data.ByteString.Builder:  OK
            197  μs ± 2.7 μs, 344 KB allocated, 2.6 KB copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            64.6 μs ± 856 ns, 148 KB allocated, 570 B  copied, 1.9 GB peak memory, 0.33x
      1e300
        1
          Data.ByteString.Builder:  OK
            4.09 μs ±  61 ns,  11 KB allocated,  57 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            2.21 μs ±  28 ns, 5.1 KB allocated,  44 B  copied, 1.9 GB peak memory, 0.54x
        10
          Data.ByteString.Builder:  OK
            38.7 μs ± 665 ns, 109 KB allocated, 535 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            23.3 μs ± 195 ns,  52 KB allocated, 248 B  copied, 1.9 GB peak memory, 0.60x
        100
          Data.ByteString.Builder:  OK
            360  μs ± 3.9 μs, 776 KB allocated, 7.0 KB copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            231  μs ± 2.0 μs, 535 KB allocated, 2.3 KB copied, 1.9 GB peak memory, 0.64x
      1e500
        1
          Data.ByteString.Builder:  OK
            7.01 μs ± 110 ns,  15 KB allocated,  87 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            2.85 μs ±  55 ns, 5.7 KB allocated,  70 B  copied, 1.9 GB peak memory, 0.41x
        10
          Data.ByteString.Builder:  OK
            67.4 μs ± 529 ns, 153 KB allocated, 843 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            28.7 μs ± 304 ns,  60 KB allocated, 266 B  copied, 1.9 GB peak memory, 0.43x
        100
          Data.ByteString.Builder:  OK
            635  μs ± 5.3 μs, 1.2 MB allocated, 8.8 KB copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            271  μs ± 3.2 μs, 629 KB allocated, 3.0 KB copied, 1.9 GB peak memory, 0.43x
      1e1000
        1
          Data.ByteString.Builder:  OK
            12.9 μs ± 192 ns,  25 KB allocated, 165 B  copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            7.70 μs ± 115 ns,  12 KB allocated,  72 B  copied, 1.9 GB peak memory, 0.60x
        10
          Data.ByteString.Builder:  OK
            126  μs ± 1.7 μs, 261 KB allocated, 1.6 KB copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            75.0 μs ± 1.4 μs, 129 KB allocated, 573 B  copied, 1.9 GB peak memory, 0.59x
        100
          Data.ByteString.Builder:  OK
            1.24 ms ±  15 μs, 2.2 MB allocated,  28 KB copied, 1.9 GB peak memory
          Data.Text.Builder.Linear: OK
            748  μs ±  12 μs, 1.3 MB allocated, 6.0 KB copied, 1.9 GB peak memory, 0.60x

1
  + fromIntegral @Word64
    ( (fromIntegral (BN.bigNatSize n#) * fromIntegral (finiteBitSize @Word 0) * 5)
        `shiftR` 4
Owner

Can we divide by 16 first and multiply by 5 second? That would avoid any overflow, I imagine. It might require something larger than 1 + ..., but if we are in the business of rendering large nats, a few extra bytes would not matter.
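To make the overflow concern concrete, here is a small sketch of both evaluation orders applied to a raw limb count (the value `BN.bigNatSize n#` would supply). `wordBits`, `boundMulFirst` and `boundDivFirst` are hypothetical names, and `Word` stands in for the `Word64` of the original expression.

```haskell
import Data.Bits (finiteBitSize, shiftR)

-- Bits per machine word (64 on x86_64, 32 on 32-bit targets).
wordBits :: Word
wordBits = fromIntegral (finiteBitSize (0 :: Word))

-- Original order: multiply by wordBits and 5, then divide by 16 (shiftR 4).
-- The intermediate product limbs * wordBits * 5 wraps around for huge inputs.
boundMulFirst :: Word -> Word
boundMulFirst limbs = 1 + (limbs * wordBits * 5) `shiftR` 4

-- Suggested order: divide wordBits by 16 first, then multiply by 5.
-- Same result whenever wordBits is a multiple of 16, with 16 times more
-- headroom before the multiplication overflows.
boundDivFirst :: Word -> Word
boundDivFirst limbs = 1 + limbs * ((wordBits `quot` 16) * 5)
```

For small limb counts the two agree. For `maxBound `quot` 100` limbs, the multiply-first version wraps around and returns a much smaller (wrong) bound, while the divide-first version does not.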

Contributor Author

Reworked: on common architectures (32 and 64 bits), finiteBitSize @Word 0 is a multiple of 16, and it is >= 29 by the Haskell report. So we can indeed simplify this, and it will compile to Core with just a multiplication (by 10 or 20 respectively). I kept the original implementation for other (unlikely) architectures.
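The constant 5/16 = 0.3125 works because it slightly over-approximates log₁₀ 2 ≈ 0.30103. A number with b bits is below 2^b, so it has at most 1 + ⌊5b/16⌋ decimal digits. Below is a sketch of that bound with the fallback described above. It works on `Natural` rather than `BigNat#`. `limbCount` and `decDigitsUpperBound` are hypothetical names, and `limbCount` only mimics `BN.bigNatSize`.

```haskell
import Data.Bits (finiteBitSize, shiftR)
import Numeric.Natural (Natural)

-- Bits per machine word: 64 on x86_64, 32 on 32-bit targets.
wordBits :: Word
wordBits = fromIntegral (finiteBitSize (0 :: Word))

-- Hypothetical stand-in for BN.bigNatSize: number of machine words in n.
limbCount :: Natural -> Word
limbCount = go 0
  where
    go acc 0 = acc
    go acc m = go (acc + 1) (m `shiftR` fromIntegral wordBits)

-- Upper bound on the number of decimal digits of n, using 5/16 >= log10 2.
decDigitsUpperBound :: Natural -> Word
decDigitsUpperBound n
  -- 32/64-bit targets: wordBits is a multiple of 16, so this is a single
  -- multiplication by a constant (10 or 20).
  | wordBits `rem` 16 == 0 = 1 + limbs * ((wordBits `quot` 16) * 5)
  -- Any other word size: keep the multiply-then-shift form.
  | otherwise = 1 + (limbs * wordBits * 5) `shiftR` 4
  where
    limbs = limbCount n
```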

@Bodigrim
Owner

FWIW I don't think there is demand for unbounded hexadecimal numbers, so maybe we can skip them.

@wismill
Contributor Author

wismill commented May 27, 2024

Rebased

@Bodigrim
Owner

It's not a blocker, but I raised Bodigrim/tasty-bench-fit#3. The Haskell ecosystem should have an automated method for finding the thresholds at which to switch between algorithms.

prependSmallNat ∷ ∀ s. A.MArray s → DigitsWriter s
prependSmallNat marr = go
  where
    !(# power, poweredBase, _poweredBase² #) = selectPower (# #)
Owner

Could you please add a comment explaining the (# #) trick?

Contributor Author

I borrowed it from ghc-bignum, which does not document it either. I added this to my other comment: maybe we should just drop it altogether. There is also bigNatFromAddrLE#, but it buys us little, as it allocates on each call and requires handling the word size.

Owner

I think I saw @BurningWitness using the (# #) trick somewhere, but I do not remember what for.

BurningWitness commented Jun 18, 2024

It is documented in a note: GHC.Num.BigNat#BigNat. The usage here does exactly the same thing (though I can't say whether it's worth it; I try not to import GHC.Exts in my projects).


My use case (definitely in radix-tree, at least) was a more general one: unboxed tuples are the only way to force the evaluation of a function without evaluating its result to WHNF.
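GHC rejects top-level bindings of unlifted type, and an unboxed tuple such as `(# Word, Word# #)` is unlifted. Adding a zero-width `(# #)` argument turns the binding into a function, which may live at top level. As far as we understand, this is the pattern the note above describes for `BigNat#` constants. Here is a minimal sketch; `billion` is a hypothetical stand-in for `selectPower`, and `exponentAndBase` is a hypothetical caller.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.Exts (Word (W#), Word#)

-- GHC would reject this top-level binding of unlifted type:
--   billion :: (# Word, Word# #)
--   billion = (# 9, 1000000000## #)
-- With a zero-width (# #) argument it becomes a function, which is allowed.
billion :: (# #) -> (# Word, Word# #)
billion _ = (# 9, 1000000000## #)

-- Call sites apply it to (# #) and take the components apart directly.
exponentAndBase :: (Word, Word)
exponentAndBase = case billion (# #) of
  (# k, b #) -> (k, W# b)
```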

Contributor Author

@BurningWitness thanks for the pointer, I missed the note. Hackage's source view does not have great contrast.

@Bodigrim I added a note for the (# #) use.

mkBigBase ∷ BN.BigNat# → (# Word#, BN.BigNat# #)
mkBigBase n# = go 2## poweredBase²
  where
    !targetLen = double2Int# (sqrtDouble# (int2Double# (BN.bigNatSize# n#)))
Owner

I don't quite follow this. Why do we take the square root instead of half?

Contributor Author

Borrowed from Rust bigint.

Owner

In that case pow10^k does not seem to be anywhere near sqrt n, as the comment says. Or am I deluded?

Contributor Author

I fixed the note to:

Find k such that bigNatSize (pow10^k) ≈ √(bigNatSize n)
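Here is a sketch of what the corrected note asks for, under stated assumptions. It picks k by repeatedly squaring 10, so k is always a power of two. That choice belongs to this sketch and is not necessarily what `mkBigBase` does. Squaring stops once the limb count of 10^k reaches ⌊√(limb count of n)⌋. `limbs` and `chooseBase` are hypothetical names working on `Natural` rather than `BigNat#`.

```haskell
import Data.Bits (finiteBitSize, shiftR)
import Numeric.Natural (Natural)

-- Hypothetical stand-in for bigNatSize: machine words needed to store n.
limbs :: Natural -> Int
limbs = go 0
  where
    w = finiteBitSize (0 :: Word)
    go acc 0 = acc
    go acc m = go (acc + 1) (m `shiftR` w)

-- Returns (k, 10^k) with limbs (10^k) ≈ sqrt (limbs n): the first power of
-- ten in the sequence 10, 10^2, 10^4, ... whose size reaches the target.
chooseBase :: Natural -> (Int, Natural)
chooseBase n = go 1 10
  where
    targetLen = floor (sqrt (fromIntegral (limbs n) :: Double)) :: Int
    go k b
      | limbs b >= targetLen = (k, b)
      | otherwise = go (2 * k) (b * b) -- b stays equal to 10^k
```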

@wismill
Contributor Author

wismill commented Jun 17, 2024

Late to the party: 8 × AMD Ryzen 5 2500U @ 2GHz

SmallOnly vs. BigOnly
(1,10000)
(1,5000)
(1,2500)
(1,2500)
(1,1250)
(1,1250)
(1,625)
(1,313)
(1,157)
(1,157)
(1,79)
(1,79)
(1,79)
(1,40)
(20,40)
(20,40)
(20,30)
(20,30)
(25,30)
(25,30)
(25,30)
(27,30)
(27,28)
BigOnly vs. HugeOnly
(1,10000)
(1,10000)
(1,10000)
(1,5000)
(1,5000)
(1,2500)
(1,2500)
(1,2500)
(1,1250)
(1,1250)
(625,1250)
(625,937)
(625,781)
(625,703)
(625,703)
(625,703)
(625,703)
(625,664)
(644,664)
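Ignoring repeated pairs, both traces above shrink (lo, hi) to either (lo, mid) or (mid, hi), with mid = (lo + hi) `div` 2. Here is a minimal sketch of that integer bisection. A pure predicate stands in for "benchmark both algorithms at this size and compare", and it is assumed to be monotone. `crossoverTrace` is a hypothetical name; this is neither tasty-bench-fit's implementation nor its API.

```haskell
-- Bisect for the size at which the second algorithm starts to win.
-- `secondFaster n` is a stand-in for running both benchmarks at size n.
crossoverTrace :: (Int -> Bool) -> (Int, Int) -> [(Int, Int)]
crossoverTrace secondFaster = go
  where
    go (lo, hi)
      | hi - lo <= 1 = [(lo, hi)]
      | secondFaster mid = (lo, hi) : go (lo, mid)
      | otherwise = (lo, hi) : go (mid, hi)
      where
        mid = (lo + hi) `div` 2
```

With the toy predicate `(>= 28)`, chosen only to match the result above, `crossoverTrace (>= 28) (1, 10000)` reproduces the deduplicated SmallOnly vs. BigOnly sequence, ending at `(27, 28)`.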

@wismill
Contributor Author

wismill commented Jun 18, 2024

Rebased, added/fixed the notes requested in review, and added the (optional) benchmark. The latter requires setting up cabal.project.local for tasty-bench-fit.

@Bodigrim
Owner

I pondered the algorithmic complexity of long multiplication/division and still cannot figure out why a range would exist (on x86_64) where prependBigNat is optimal. The only explanation I can come up with is that prependHugeNat somehow has an unexpectedly large constant factor. Could we try building a list of powers of 10 (instead of building recursive closures (Bool → DigitsWriter# s)) there?
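Here is a minimal sketch of the list-of-powers idea, using plain `String` and `Natural` rather than the library's builders. The squared powers of ten 10^1, 10^2, 10^4, … are computed once as a list. The number is then split recursively with `quotRem`, and each low half is padded with zeros to its fixed width. `showNatDC` is a hypothetical name; this is neither the PR's code nor bytestring's.

```haskell
import Numeric.Natural (Natural)

-- Divide-and-conquer decimal rendering driven by an explicit list of powers.
showNatDC :: Natural -> String
showNatDC n = go powers n
  where
    -- Largest-first list of (10^d, d) with d = 1, 2, 4, 8, ... and 10^d <= n.
    -- Invariant for go ((p, _) : _) m: m < p * p.
    powers = reverse (takeWhile ((<= n) . fst) (iterate sq (10, 1)))
    sq (p, d) = (p * p, 2 * d)

    go :: [(Natural, Int)] -> Natural -> String
    go [] m = show m -- here m < 10, a single digit
    go ((p, d) : ps) m
      | m < p = go ps m
      | otherwise =
          let (hi, lo) = m `quotRem` p
          in go ps hi ++ padTo d (go ps lo) -- low half has exactly d digits

    padTo d s = replicate (d - length s) '0' ++ s
```

The list is shared by every recursive call, so no closures are rebuilt per level. The `String` concatenation is only there to keep the sketch short.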

@wismill
Contributor Author

wismill commented Jun 19, 2024

Could we try building a list of powers of 10 (instead of building recursive closures (Bool → DigitsWriter# s)) there?

@Bodigrim I already tried using lists as bytestring does and could not come up with a good solution. Frankly, I have dedicated far too much time to this and may have lost my way (fun ride, though). Right now I would like to focus on other tasks. If there is no obvious solution, we can still consider switching to the bytestring builder above some threshold and commenting out or reverting the suspicious code. I would not like to keep code in the library whose performance is unclear, either.

The feature is currently missing from the library, so maybe we should take the safest path now and give it another try in a follow-up PR.

@Bodigrim
Owner

Fair enough; I really appreciate the insane amount of work you have already put into this. If you remove the FIXME commit, I'm happy to merge.

@wismill
Contributor Author

wismill commented Jul 5, 2024

@Bodigrim there’s something wrong with the CI

@Bodigrim
Owner

Bodigrim commented Jul 5, 2024

@wismill please rebase, hopefully it will be better now.

@wismill
Contributor Author

wismill commented Jul 5, 2024

@Bodigrim done, thanks for the quick fix. I think the last 6-7 commits could be squashed, but it’s up to you.

@wismill wismill marked this pull request as ready for review July 5, 2024 19:04
@Bodigrim
Owner

Bodigrim commented Jul 5, 2024

Thanks for the monumental work, @wismill. Let me merge as is; I'll check later whether I wish to change anything.
(FWIW I'm increasingly inclined to drop prependBigNat algorithm)

@Bodigrim Bodigrim merged commit 1233b66 into Bodigrim:master Jul 5, 2024
7 checks passed
@Bodigrim
Owner

This is now released as text-builder-linear-0.1.3, thanks again @wismill!

(Aside: My current workaround for printing Integers and Naturals is shoving in a Show constraint and doing fromString (show n), which feels kinda bad. I moved to linear-text-builder for performance & ergonomics, so I'd love to see this get merged.)

@raehik could you possibly update text-builder-linear and give it a try?

@raehik

raehik commented Jul 14, 2024

It works perfectly! Thank you @wismill and @Bodigrim for your work here :) (update commit: strongweak#d38c99f)
