New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib][SR-7556] Re-implement string-to-integer parsing #36623
Conversation
This comment has been minimized.
This comment has been minimized.
There's also #30094 by @PatrickPijnappel |
@benrimmington It's been long enough that I'd forgotten about that PR 🤦, and the bug doesn't make mention of it. If @PatrickPijnappel wants to finish that one up, happy to set this aside. The solution presented here is significantly less involved, and I'm curious to see what the benchmarks show. If there's sufficient incremental improvement, this could be landed without blocking a subsequent more sophisticated implementation that makes use of SWAR as @PatrickPijnappel is doing. |
This comment has been minimized.
This comment has been minimized.
I feel bad I didn't get to finishing that PR for such a long time, my work situation changed. It was very close to being merged, just stuck on a final simplification that seemed to change retain/release behavior wiping out gains. If you're interested, I'm open to collaborating on that one somehow—I can prioritize some time. Nevertheless, if this PR delivers significant gains it makes sense to merge it first, especially since it doesn't introduce anything that needs to be maintained from an ABI perspective. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@PatrickPijnappel I'm also not blessed with a large amount of time these days, sadly. I was more hoping that there was some low-hanging fruit here; if the benchmarks aren't really exciting, I think I'll have to leave this work in others' hands. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@swift-ci benchmark |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@swift-ci test Linux platform |
@swift-ci test macOS platform |
@swift-ci benchmark |
This comment has been minimized.
This comment has been minimized.
@swift-ci benchmark |
Build failed |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
This comment has been minimized.
This comment has been minimized.
@swift-ci please smoke test windows |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@swift-ci smoke test |
This comment has been minimized.
This comment has been minimized.
3 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Ugh, really? @swift-ci smoke test macOS platform |
@swift-ci test macOS platform |
@swift-ci smoke test Windows platform |
@milseman Ship it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR attempts to improve code size and performance for string-to-integer parsing without introducing any additions to the ABI. As noted in the relevant bug:
To confront this issue, this PR introduces a new
_parseASCII
. Instead of taking an argument of typeT: IteratorProtocol where T.Element == UnsignedInteger
, it expects a buffer of typeUnsafeBufferPointer<UInt8>
. This is made possible by the existence of thewithContiguousStorageIfAvailable
API onStringProtocol
.If contiguous storage is not available, then an
@inline(never)
fallback is called, which initializes a mutableString
value and then makes use of thewithUTF8
API. We consign this function to the fallback path in order to avoid creating a mutable copy if we can.The extremely pared down implementation shown here is the result of several iterative rounds of benchmarking and simplification described below. It's tempting to consider further specializations for base 8, 10, or 16, but the possible wins would appear to be negligible without a significantly more sophisticated implementation (such as that attempted in #30094, as pointed out in the conversation below). Those are deferred to prioritize addressing some low-hanging fruit.
Resolves SR-7556.
Summary of findings
Baseline
As a baseline, I used #36625 to demonstrate what would occur if the existing implementation had its
@inline(never)
and@_semantics
markings removed. This revealed sizable improvements in microbenchmark performance but a significant regression in code size (excerpted results below):Improvement
Regression
Improvement
First attempts
The first attempts to improve upon the status quo yielded similar results to the above.
A manually specialized implementation of
_parseASCII
and a new_parseASCIIDigits
were added which take anUnsafeBufferPointer<UInt8>
argument as the source. Additionally, generic versions of the above were maintained; with this setup, I attempted to mark the duplicated implementations with different attributes in the hopes of fine tuning code size and performance.However, even after marking the fallback generic helper functions using
@inline(never)
, there were improvements in microbenchmarks but significant code size regressions (excerpted results below):Improvement
Regression
Minimum code size
After removing all manually repeated code and further simplifying the implementation, I attempted to remove the
@inline(__always)
marking fromFixedWidthInteger.init(_:radix:)
and to test the effect of explicitly requiring partial specializations forS == String
and forS == Substring
using@_specialize(kind: partial, ...)
.This produced a result that decreased the compiled size of the standard library itself by ~1%, as well as improvements in the code size of the
IntegerParsing
microbenchmarks. However, it wiped out most performance improvements at-O
, except forStrToInt
(excerpted results below):Regression
Improvement
Regression
Improvement
CODE SIZE: -swiftlibs
Inlined performance
The final form of this PR restores the
@inline(__always)
marking toFixedWidthInteger.init(_:radix:)
(and simplifies the implementation further). Doing so produces a result where a sizable proportion of the performance benefit seen in the baseline benchmarks can be recovered with a very modest code size increase. As before, the compiled size of the standard library itself is decreased by ~1%.This implementation relies on no
@_semantics
annotations and, perhaps relatedly, exhibits performance improvements at-O
,-Osize
, and-Onone
(excerpted-O
results below):Improvement
Regression
CODE SIZE: -swiftlibs
(All versions of this PR show varying degrees of code size regressions in
LuhnAlgoEager
,LuhnAlgoLazy
,RangeReplaceableCollectionPlusDefault
, andDictionaryCompactMapValues
. I have to presume that they are attributable to emitting this new implementation into the client; in the final iteration, these code size increases are the most modest yet.)