Add support for unbounded integrals #20
Conversation
Currently a draft for early feedback. Although the benchmark results are already quite good, the current implementation allocates too much. I have some ideas for a better implementation. Note that I also wanted to try a newtype for `Hexadecimal` formatting; this is still pending.
@Bodigrim Is it OK to depend on …?
That's fine.
Force-pushed from 9154f9a to 7a4d92d.
Fixed.
Force-pushed from 7a4d92d to af03f2c.
@Bodigrim new implementation. I am quite happy with the benchmark.

Benchmark results (GHC 9.2.8, Linux, 8 × AMD Ryzen 5 2500U): …
This is not optimal, because it does not scale as well as the `ByteString` builder. Of course these numbers are not crazily big, but I wonder if there are real use cases with bigger numbers.
(I have not looked at the implementation yet.) What's the reason it does not scale as well as the `ByteString` builder?
Fixed detailed benchmarks: in the end they are not as good as I thought… I realized that the simple benchmark's configuration differed from the detailed one's. I needed to inline the benchmark builder to get consistency between the two benchmarks. Benchmarking is so tricky! It seems this implementation still beats the `ByteString` builder.
@Bodigrim it's because of the same reason the previous comment highlighted, although at a different scale. The question is: does it matter? I am not satisfied with a suboptimal solution, but, on the other hand, are there use cases where we format really huge numbers and the performance of the formatting matters? I mean, the current … I will investigate other options, now that the benchmarks seem to display correct results. I did try to implement the …
(Aside: my current workaround for printing …)
It is not a blocker indeed.
So after reading the Core dump, it seems that in the specific benchmark for unbounded integers: …

Example of benchmark result using the patterns `-p unbounded -p Huge1`: …

So while I can find implementations that beat …, do you have some advice to improve the situation and enable sharing to work? Or is it just a corner case and we should use a fairer benchmark?
Using the following function makes the benchmark a bit fairer:

```haskell
benchUnboundedLinearBuilder ∷ Integer → Int → T.Text
benchUnboundedLinearBuilder k m = runBuffer (\b → go b m)
  where
    go ∷ Buffer ⊸ Int → Buffer
    go !acc 0 = acc
    go !acc n = case newEmptyBuffer acc of
      (# acc', e #) → case dupBuffer ((fromIntegral n * k) $$<| e) of
        (# l, r #) → go (l >< (acc' >< r)) (n - 1)
```
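For illustration only, here is a minimal, self-contained sketch of the sharing pitfall using plain `show` instead of the library's builder (all names and numbers here are made up, not from the PR):

```haskell
import Data.List (foldl')

-- The big number is bound outside the loop (and GHC's full-laziness would
-- float it out even if written inline), so it is rendered exactly once and
-- every "iteration" merely reads the shared string.
sharedBench :: Int
sharedBench =
  let s = show (10 ^ (1000 :: Int) :: Integer)  -- rendered once
   in foldl' (\acc _ -> acc + length s) 0 [1 .. 1000 :: Int]

-- Making the rendered value depend on the loop index defeats the sharing,
-- so each iteration really pays for the integer-to-decimal conversion.
unsharedBench :: Int
unsharedBench =
  foldl'
    (\acc i -> acc + length (show (toInteger i * 10 ^ (1000 :: Int))))
    0
    [1 .. 1000 :: Int]

main :: IO ()
main = print (sharedBench, unsharedBench)
```

A benchmark shaped like `sharedBench` mostly measures `length`, not the conversion, which is why feeding the loop counter into the formatted value (as in the snippet above) gives fairer numbers.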
Yeah, that's a corner case. I'd change the benchmark to …
Force-pushed from 6787e44 to b962082.
New implementation. Faster than the `ByteString` builder, with also much fewer allocations.
`bench/BenchDecimalUnbounded.hs` (outdated):
This benchmark file is probably too detailed once we get the implementation right.
```haskell
-- textBenchName = "Data.Text.Lazy.Builder"
textBenchName = "Data.ByteString.Builder"
```
I found it easier to compare to `ByteString`, but I can revert it (or just comment it out) because this package is primarily about `Text`.
```haskell
-- | Low-level routine to append data of unknown size to a 'Buffer', giving
-- the action the choice between two strategies.
--
-- See also: 'appendBounded'.
appendBounded'
```
New internal helper. If deemed a useful addition to the API, maybe we need a better name.
`Data.Text.Builder.Linear.Core` is a public module, so it's not that internal. Maybe we can put the helper somewhere else to avoid nameshedding?
```haskell
-- | Low-level routine to prepend data of unknown size to a 'Buffer'.
--
-- Contrary to 'prependBounded', only use a prepend action.
prependBounded'
```
New internal helper. If deemed a useful addition to the API, maybe we need a better name. It is useful when the prepender is always faster than the appender.
```haskell
-- TODO: Remove once we choose a strategy
data Thresholds = SmallOnly | BigOnly | HugeOnly | Optimum

-- Use the fastest writer depending on the BigNat size
unsafePrependBigNatDec ∷ ∀ s. A.MArray s → DigitsWriter s
unsafePrependBigNatDec marr !off0 !n0
  | BN.bigNatSize n0 < bigSizeThreshold = prependSmallNat marr off0 n0
  | BN.bigNatSize n0 < hugeSizeThreshold = prependBigNat marr off0 n0
  | otherwise = prependHugeNat marr off0 n0
```
I implemented 3 strategies to get the maximum performance. These functions perform better (time, space) than the `ByteString` builder for at least:

- `prependSmallNat`: n < 1e2000
- `prependBigNat`: n < 1e20000
- `prependHugeNat`: always??

While better implementations are possible, for huge numbers the bottleneck is `bigNatQuotRem#`.

But 3 strategies may be too much. I see 2 main orientations if we want to simplify the code:

- Be fast only for integers of realistic size: keep `prependBigNat` and possibly `prependSmallNat`.
- Be fast and be the only `Text` builder that scales as well as the `ByteString` one: keep `prependHugeNat` and possibly `prependBigNat`.
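As a self-contained illustration of the divide-and-conquer idea behind the huge-number strategy, here is a sketch in plain `Integer` (all names are made up; the PR works on `BigNat#` instead): split the number at a power of 10 obtained by repeated squaring, so each recursion level roughly halves the decimal length and the expensive divisions run on balanced operands.

```haskell
import Data.Char (intToDigit)

-- Powers 10, 10^2, 10^4, 10^8, … up to the argument, by repeated squaring.
splitPowers :: Integer -> [Integer]
splitPowers n = takeWhile (<= n) (iterate (\p -> p * p) 10)

-- Render a non-negative Integer in decimal, divide-and-conquer style.
toDec :: Integer -> String
toDec n = go (reverse (splitPowers n)) n
  where
    -- Largest power first: each step splits into halves of similar length.
    go [] m = [intToDigit (fromInteger m)]  -- m < 10 here
    go (p : ps) m =
      let (hi, lo) = m `quotRem` p
       in if hi == 0
            then go ps lo               -- no leading zeros at the top level
            else go ps hi ++ pad ps lo
    -- Low halves must be zero-padded to the full width of their power.
    pad [] m = [intToDigit (fromInteger m)]
    pad (p : ps) m =
      let (hi, lo) = m `quotRem` p
       in pad ps hi ++ pad ps lo

main :: IO ()
main = print (toDec 1002045)
```

With a subquadratic `quotRem` (as GMP provides for large operands), this recursion scheme is what lets a decimal renderer scale beyond the naive quadratic digit-by-digit loop.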
I'm fine with using all three strategies. If future maintainers find it burdensome, they can always cut it down :)
```haskell
SmallOnly → (maxBound, maxBound)
BigOnly   → (minBound, maxBound)
HugeOnly  → (minBound, minBound)
Optimum   → (25, 400)
```
The thresholds were found empirically, so they should be treated with caution. They are valid on my machine (Linux, 64-bit), but what about other architectures (32-bit) and OSes?

Note that these thresholds are expressed in the `BigNat#` size in words.
It's likely that the aarch64 bounds are different. I have an M2 machine; is there a way for me to figure out the thresholds?
I used the following “method”: change the line `threshold = Optimum` to the appropriate value and run the detailed benchmark with filters, e.g. `-p "detailed unbounded" -p Both -p Big`. You may also want to choose a specific count, e.g. `-p '$5 == "10"'`.

This is clearly not satisfactory; a rigorous asymptotic analysis would be better. Guidance welcome!

Note that `threshold :: Thresholds` will be removed (or at least commented out) in the final implementation.
I imagine you could use `fits` to determine the asymptotics and constant factors for each method, then solve for the optimal points to switch between algorithms. If you happen to craft such a program, I'd be happy to run it on aarch64.

(I'm on vacation, so too lazy to do much work myself, sorry :)
Good use case for `fits` indeed!

Here is my take to check the complexity using the number of decimal digits:

```haskell
let s = BigOnly; mi = 1000; ma = 10000 :: Word; n = toInteger @Word (maxBound - 1) ^ ma
 in F.fits $ F.mkFitConfig (\p -> runBuffer (\b -> prependUnboundedDecimal s (rem n (10 ^ p)) b)) (mi, ma)
```
```haskell
-- NOTE: ensure to not inline the following numbers, in order to avoid allocations.

tenPower18 ∷ N.Natural
tenPower18 = 1e18
{-# NOINLINE tenPower18 #-}

tenPower38 ∷ N.Natural
tenPower38 = 1e38
{-# NOINLINE tenPower38 #-}
```
The aim is to avoid computing and allocating `poweredBase²` each time we use the function, but now that I review it I am not sure this is a good idea.
New (fair) benchmark: …
```haskell
1
  + fromIntegral @Word64
      ( (fromIntegral (BN.bigNatSize n#) * fromIntegral (finiteBitSize @Word 0) * 5)
          `shiftR` 4
      )
```
Can we divide by 16 first and multiply by 5 second? It would allow us to avoid any overflows, I imagine. It might require something larger than `1 + ...`, but if we are in the business of rendering large nats, a few extra bytes would not matter.
Reworked: on common architectures (32- and 64-bit), `finiteBitSize @Word 0` is a multiple of 16 and is ≥ 29 by the Haskell report. So we can indeed simplify this, and it will compile to Core with just a multiplication (by 10 or 20 respectively). I kept the original implementation for other (unlikely) architectures.
FWIW I don't think there is demand for unbounded hexadecimal numbers, so maybe we can skip them.
Force-pushed from 5228872 to 38b334b.
Rebased.
It's not a blocker, but I raised Bodigrim/tasty-bench-fit#3. The Haskell ecosystem should have an automated method to find thresholds for switching between algorithms.
```haskell
prependSmallNat ∷ ∀ s. A.MArray s → DigitsWriter s
prependSmallNat marr = go
  where
    !(# power, poweredBase, _poweredBase² #) = selectPower (# #)
```
Could you please add a comment explaining the `(# #)` trick?
I borrowed it from `ghc-bignum`, where it is not documented either. In addition to my other comment: maybe we should just drop it altogether. There is also `bigNatFromAddrLE#`, but it buys us little, as it will allocate each time and requires handling the word size.
I think I saw @BurningWitness using the `(# #)` trick somewhere, but I do not remember what for.
It is documented in a note: GHC.Num.BigNat#BigNat. The usage here does exactly the same thing (though I can't say whether it's worth it; I try not to import `GHC.Exts` in my projects).

My use case (definitely in `radix-tree`, at least) was a more general one: unboxed tuples are the only way to force the evaluation of a function without evaluating its result to WHNF.
@BurningWitness thanks for the pointer, I missed the note. Hackage's source display does not have great contrast.

@Bodigrim I added a note for the `(# #)` use.
```haskell
mkBigBase ∷ BN.BigNat# → (# Word#, BN.BigNat# #)
mkBigBase n# = go 2## poweredBase²
  where
    !targetLen = double2Int# (sqrtDouble# (int2Double# (BN.bigNatSize# n#)))
```
I don't quite follow this. Why do we take the square root instead of taking half?
Borrowed from Rust's `bigint`.
In such a case `pow10^k` does not seem to be anywhere near `sqrt n`, as the comment says. Or am I deluded?..
I fixed the note to:

> Find `k` such that `bigNatSize (pow10^k) ≈ √(bigNatSize n)`
Force-pushed from ee55a3c to 38b334b.
Late to the party: …
Force-pushed from 38b334b to dc6b8c1.
Rebased, added/fixed the notes required by review and the (optional) benchmark. The latter requires setting …
I pondered the algorithmic complexity of long multiplication / division and still cannot figure out a reason why a range would exist (on …).
@Bodigrim I already tried using lists as …

Right now the feature is missing in the library, so maybe take the safest path now and give it another try in a follow-up PR.
Fair enough, I appreciate a lot the insane amount of work you already put into this. If you remove …
Force-pushed from dc6b8c1 to 79307bd.
@Bodigrim there's something wrong with the CI.
@wismill please rebase, hopefully it will be better now.
Add note about overflow of maxBitNatDecLen
Force-pushed from c4cbfb8 to 9e0d401.
@Bodigrim done, thanks for the quick fix. I think the last 6-7 commits could be squashed, but it's up to you.
Thanks for the monumental work, @wismill. Let me merge as is; I'll check whether I wish to change anything later.
This is now released as …
@raehik could you possibly update …?
It works perfectly! Thank you @wismill and @Bodigrim for your work here :) (update commit: strongweak#d38c99f)
Currently this package supports only finite integrals, but `Integer` and `Natural` are also common. Add support for unbounded integrals: …

Also add related tests and benchmarks.