
Use Int256 to reduce BigInts in FD operations. #93

Merged · 5 commits · Aug 12, 2024

Conversation

@NHDaly (Member) commented Jun 12, 2024

This PR does not explicitly introduce support for FD{BitIntegers.Int256}, though that should work out of the box both before and after this change.

Rather, it uses a (U)Int256 under the hood to prevent the allocations caused by Int128 widening to BigInt in FD operations.
Unfortunately, rem and mod on BitIntegers.Int256 still fall back to BigInt (see the note here), so this doesn't completely eliminate the BigInt allocations, but it does reduce them.
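The mechanism is easiest to see one level down, with built-in types: Julia's `widemul` returns the product in a type twice as wide, so a potentially overflowing multiply lands in a plain bitstype instead of a heap-allocated BigInt. A self-contained analogy (Int64 → Int128; the commented helper name `_widemul256` is hypothetical, not the PR's actual code):

```julia
# `widemul` on Int64 returns an Int128: a stack-allocated bitstype, so the
# wide intermediate product costs no heap allocation.
a = typemax(Int64) ÷ 3
b = Int64(10)
p = widemul(a, b)          # Int128 result; a * b would overflow Int64

# The PR applies the same idea one level up: widening Int128 normally
# falls back to BigInt, but with BitIntegers.jl the product can be
# computed in Int256 instead (hypothetical sketch, not the PR's code):
# _widemul256(x::Int128, y::Int128) = BitIntegers.Int256(x) * BitIntegers.Int256(y)
```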


This is a pretty small PR, but it should have a big impact on users of FD{Int128}.

Before:

julia> @btime fd * fd setup = (fd = FixedDecimal{Int128,3}(1.234))
  392.413 ns (24 allocations: 464 bytes)
FixedDecimal{Int128,3}(1.523)

After:

julia> @btime fd * fd setup = (fd = FixedDecimal{Int128,3}(1.234))
  213.039 ns (12 allocations: 240 bytes)
FixedDecimal{Int128,3}(1.523)

@NHDaly NHDaly requested a review from Drvi June 12, 2024 19:27
@NHDaly (Member, Author) commented Jun 12, 2024

I just realized this reimplements RelationalAI-oss#7 from 6 years ago (😳), which @TotalVerb had already reviewed. @TotalVerb, you may want to do one more pass over this.

NHDaly added a commit that referenced this pull request Jun 13, 2024
Finally implements the fast-multiplication optimization from
#45, but this
time for 128-bit FixedDecimals! :)

This is a follow-up to
#93, which
introduces an Int256 type for widemul. However, the fldmod still
required 2 BigInt allocations.

Now, this PR uses a custom implementation of the LLVM div-by-const
optimization for (U)Int256, which briefly widens to Int512 (😅) to
perform the fldmod by the constant 10^f coefficient.

This brings 128-bit FD multiply to the same performance as 64-bit. :)
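The div-by-const trick the follow-up commit describes can be sketched with built-in types: replace `n ÷ d` for a constant `d` with a multiply by a precomputed fixed-point reciprocal and a shift, briefly widening to a larger integer type, just as the PR widens (U)Int256 to 512 bits for the `fldmod` by 10^f. A minimal sketch (Granlund–Montgomery-style magic number; `div1000` is an illustrative name, not the PR's code):

```julia
# Divide any UInt32 by the constant 1000 without a hardware divide:
# multiply by MAGIC ≈ 2^SHIFT / 1000 (rounded up), then shift right.
const D     = UInt32(1000)
const SHIFT = 32 + 10                     # 10 = ceil(log2(1000))
const MAGIC = UInt64(2)^SHIFT ÷ D + 1     # "round-up" fixed-point 1/1000

# Widening the product to UInt128 sidesteps overflow of n * MAGIC; real
# implementations use a high-half multiply instead, but over-widening is
# the simple route -- analogous to the PR's brief excursion into Int512.
div1000(n::UInt32) = UInt32((UInt128(n) * MAGIC) >> SHIFT)

div1000(UInt32(123_456_789))   # == 0x0001e240 == 123_456
```

The rounded-up MAGIC satisfies the correctness bound `MAGIC * D - 2^SHIFT ≤ 2^10`, so the result matches `n ÷ 1000` for every UInt32 input, which is why such a constant can be precomputed once per 10^f coefficient.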
@Drvi (Collaborator) left a comment

LGTM! Thanks

@NHDaly NHDaly merged commit e0c1932 into master Aug 12, 2024
12 checks passed
@NHDaly NHDaly deleted the nhd-overflow-Int128 branch August 12, 2024 17:49