make "dec" and ryu functions faster and simpler #51273

Merged: 1 commit into `master` from `jn/faster-dec-fun` on Sep 13, 2023

@vtjnash (Member) commented on Sep 11, 2023

We had some common code in `Ryu.append_c_digits` that can be combined with the Base logic that does the same thing. It turns out that all of this duplicated code in Ryu seems to make it run slightly slower in most cases. The old version checked many more branches, even though numbers are often small, so only the last check is meaningful. The assumption that it would be faster even when all of the branches were used also does not seem to hold up in practice. This is particularly true for a function like `append_nine_digits`, which unrolls completely: the more complicated version has slightly more data dependencies because of the way it is written.
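As a rough illustration of why a fully unrolled digit writer can be cheap, here is a minimal sketch in the spirit of `append_nine_digits`. The name `write_nine_digits!` and its signature are hypothetical, not the actual Base or Ryu API. It writes a value below 10^9 as exactly nine zero-padded ASCII digits using a loop with a constant trip count and no data-dependent branches, which is the shape a compiler can unroll completely.

```julia
# Hypothetical helper in the spirit of Ryu's `append_nine_digits` (not its
# real signature): write `x` (0 <= x < 10^9) as exactly nine zero-padded
# ASCII digits into buf[pos:pos+8] and return the next free position.
function write_nine_digits!(buf::Vector{UInt8}, pos::Int, x::UInt32)
    x < 0x3b9aca00 || throw(ArgumentError("x must be below 10^9"))  # 0x3b9aca00 == 10^9
    checkbounds(buf, pos:pos+8)
    # Constant trip count and no data-dependent branches, so the compiler is
    # free to unroll this loop completely.
    @inbounds for i in 8:-1:0
        x, d = divrem(x, UInt32(10))
        buf[pos + i] = UInt8('0') + UInt8(d)
    end
    return pos + 9
end

buf = zeros(UInt8, 9)
write_nine_digits!(buf, 1, UInt32(42))
String(buf)  # "000000042"
```

Because every call does the same nine steps regardless of the value, there is nothing for a branch predictor to mispredict; the cost is a fixed sequence of divisions by a constant.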

Similarly, we replace `unsafe_copy` with `@inbounds[]` indexing, since this is better for the optimizer: it no longer needs to treat the operation as an unknown escape of the reference.
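To make the `@inbounds` indexing pattern concrete, here is a hedged sketch that copies two ASCII digits out of a 200-byte lookup table with plain indexed loads and stores instead of an `unsafe_copyto!` call. The table `DIGIT_PAIRS` and the helper `write_two_digits!` are hypothetical illustrations of the pattern, not the code changed in this PR; the explicit `checkbounds` call is what keeps the `@inbounds` block safe.

```julia
# Hypothetical 200-byte table: the two ASCII digits of n (0..99) are stored at
# DIGIT_PAIRS[2n+1] and DIGIT_PAIRS[2n+2] ("00", "01", ..., "99").
const DIGIT_PAIRS = collect(codeunits(join(string(n, pad = 2) for n in 0:99)))

# Write the two digits of `n` into buf[pos] and buf[pos+1] using ordinary
# indexed loads/stores rather than an `unsafe_copyto!` call.
function write_two_digits!(buf::Vector{UInt8}, pos::Int, n::Integer)
    0 <= n < 100 || throw(ArgumentError("n must be in 0:99"))
    checkbounds(buf, pos:pos+1)  # makes the @inbounds block below safe
    i = 2n + 1
    @inbounds begin
        buf[pos]     = DIGIT_PAIRS[i]
        buf[pos + 1] = DIGIT_PAIRS[i + 1]
    end
    return pos + 2
end

buf = zeros(UInt8, 4)
pos = write_two_digits!(buf, 1, 7)
write_two_digits!(buf, pos, 42)
String(buf)  # "0742"
```

Written this way, the copy is just two array loads and two array stores that the compiler can see through, rather than a call that receives the arrays and has to be treated conservatively.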

Lastly, we use the `append_nine_digits` trick from Ryu to make printing of arbitrarily large integers much faster (roughly 2.7× faster for `string(typemax(Int128))` in the benchmark below).

```
julia> using BenchmarkTools

julia> @btime string(typemax(Int128))
  402.345 ns (2 allocations: 120 bytes) # before
  151.139 ns (2 allocations: 120 bytes) # after
```
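A minimal sketch of the chunking idea, assuming it works as the description suggests: split the number into base-10^9 chunks so that only one wide (for example `Int128` or `BigInt`) division is needed per nine digits, then print each inner chunk as exactly nine zero-padded digits. The function `chunked_decimal` is hypothetical, and it uses `string(..., pad = 9)` for the per-chunk formatting rather than a hand-unrolled digit writer.

```julia
# Hypothetical sketch: print a non-negative integer by peeling off base-10^9
# chunks (least significant first), then writing every chunk except the
# leading one as exactly nine zero-padded digits.
function chunked_decimal(x::Integer)
    x < 0 && throw(DomainError(x, "x must be non-negative"))
    chunks = UInt32[]
    while true
        x, r = divrem(x, 1_000_000_000)  # one wide division per nine digits
        push!(chunks, UInt32(r))         # each chunk fits in a UInt32
        iszero(x) && break
    end
    io = IOBuffer()
    print(io, chunks[end])                     # leading chunk: no padding
    for i in length(chunks)-1:-1:1
        print(io, string(chunks[i], pad = 9))  # inner chunks: exactly 9 digits
    end
    return String(take!(io))
end

chunked_decimal(typemax(Int128)) == string(typemax(Int128))  # true
```

For `typemax(Int128)` (39 digits) this needs only five `Int128` divisions instead of roughly one per digit in a naive loop; all remaining digit extraction works on values below 10^9 in 32-bit arithmetic.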

@nanosoldier runbenchmarks("io" || "misc" || "problem" || "micro" || "shootout", vs=":master")

@vtjnash added the `performance` label (Must go faster) on Sep 11, 2023
@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash merged commit e9d9314 into master on Sep 13, 2023 (8 checks passed)
@vtjnash deleted the jn/faster-dec-fun branch on September 13, 2023 at 21:36
NHDaly pushed a commit that referenced this pull request Sep 20, 2023
KristofferC pushed a commit that referenced this pull request Jan 24, 2024
(cherry picked from commit e9d9314)
@KristofferC added the `backport 1.10` label (Change should be backported to the 1.10 release) on Jan 24, 2024
@KristofferC removed the `backport 1.10` label on Feb 6, 2024
Drvi pushed a commit to RelationalAI/julia that referenced this pull request Jun 7, 2024
(cherry picked from commit e9d9314)