New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[string] String(decoding:as:) fast path for withContiguousStorageIfAvailable #30729
Conversation
@swift-ci please benchmark |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful!
4192fef
to
97426c5
Compare
@swift-ci please benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you! @milseman shouldn't we add a test thought that validates we're doing the right thing?
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
There shouldn't be observable behavior differences from this change, just performance. But I do agree we should add a benchmark. I'll work on that. |
@milseman what about a LIT test like the ones I wrote when I last made sure more things are hitting the fast paths? https://github.com/apple/swift/pull/21743/files#diff-ba4d75f225736a97fc5a116d6ecf0821R1 |
97426c5
to
b19e658
Compare
@swift-ci please test |
@swift-ci please benchmark |
Build failed |
Build failed |
Doh, I think you're right. The benchmarks I added are probably still useful (especially the non-contiguous one and the contrast between the two), but a check on the SIL is nicer to enforce this. |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
That's a little all over the place. @swift-ci please benchmark |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Ugh, so with this change, release-no-assertions pass the test, but release-with-assertions have different IRGen behavior. Calls are now denoted as
Meaning that in assertion builds, the UMBP version just calls the UMB version (which is actually fine and a good idea, it's probably that there's an assertion preventing some other optimization from messing this up). |
... and code size regressions for the benchmark file is not great, since that will be conflated with the new benchmarks I'm adding. I'm going to spin those off because I think I have a change that will also improve code size. |
@milseman you could try converting those tests to .sil to see if that helps. I'm not sure why they were .ll tests to begin with. I personally don't think the benchmark code size numbers are meaningful. They might correlate with real code size or might just be random. SCK code size is relevant. I think "swift-ci compiler performance" or something like that gets you aggregate code size for the SCK. We don't have any way to get a per project breakdown without running it locally. This takes time, which is why I haven't checked in anything that significantly perturbs codegen in the past year! |
@milseman awesome. Definitely agree that the benchmarks are super useful but SIL checks are just more stable. Regarding the tail calls to other functions: I've added |
@atrick It's more like I saw that both the fast-path and the path that makes an Array out of the code units were always-inline, and Array initializer loops is probably also always inline. I haven't checked the disassembly, but this would be tragic if so. edit: That is only relevant if whether the collection supports |
@milseman I think it's a good idea to hand-outline the slow paths. If you see problems where always-inline functions are inlined into slow paths, I think that should be fixed by adding an |
@swift-ci please benchmark |
For those following along, migrating these test cases to SIL is pending @atrick's SIL optimizer changes. The optimizer in unable to constant-fold the metatype comparison, and thus the entirety of the code is present in SIL form (LLVM later optimizes this). It is important to fold that comparison at the SIL level to accurately represent the actual program being compiled. Since 99+% of the time these generics are known at compilation time, failing to eliminate the branch to the slow path inside this always-inline function grossly distorts any size-based heuristics. |
Update: UnsafeRawBufferPointer (and mutable, and slices) cannot implement withContiguousStorageIfAvailable as that would create a typed pointer from untyped memory. The longer-term solution to avoid having two checks is to add a withContiguousRawStorageIfAvailable that runs a closure over UnsafeRawBufferPointer. It can have a default implementation that calls withContiguousStorageIfAvailable and constructs the raw pointer from that (so existing wCSIA clients get this new path for free). Then, we can run only over the new API. For now, I'll have to keep the |
@milseman Yes, that’s what I implemented in #22028 Unfortunately, basically the entire internal (ABI-exposed) String API works in terms of typed pointers. So supporting URBP would require duplicating all of those entry points. Even though _HasContiguousBytes vends a raw pointer (so URBP can conform), the String initialiser makes a typed pointer using assumingMemoryBound, which might be an invalid binding. So the old code technically has UB, if I understand memory binding correctly (which I might not). |
@karwa Doh, I forgot about that. I like most of the changes in that PR, but we'll want to do a little extra to preserve the ABI. For this PR I'll try to add the
We can change that by just having them remove the typed-ness and call raw internal functions. It is always type safe to go from typed to untyped, it's the reverse that's tricky.
This would expose the potential for UB if there was an existing type binding for that memory, that is if there were any typed pointers that weren't over So in this case, since the set of Let me get this performance optimization in now, and we can try to get the right ABI in place. |
TLDR: @milseman's plan is good Ultimately, I think it would be better if the character decoding utilities did not require typed pointers. Ideally, there would be a different abstraction over a raw buffer and an encoding type. However, that is not necessary in order to avoid undefined behavior. Internally, the String implementation can expose a typed pointer for the duration of a closure. As long as we know that arbitrary code does not access the same memory using a pointer with a different type, then it's safe. The String implementation can guarantee this because it does not execute any arbitrary user code during decoding. To make this 100% safe, we eventually need some compiler support ( Sorry, I don't have time to show example code, but hopefully you get the idea. The important thing is that we have multiple viable approaches to the String implementation. The real question is how to allow 3rd party Collections to participate in fast character decoding, and other fast paths the require contiguous storage. There has never been any question in my mind that we need something like |
Switch String(decoding:as) and other entry points to call withContiguousStorageIfAvailable rather than use _HasContiguousBytes.
Outline the cold, non-contiguous UTF-8 path from String(decoding:as:), saving ~40 bytes (33%) of code size (x86_64 and arm64) from every call site where the contiguity check cannot be constant folded away.
UnsafeRawBufferPointer cannot implement withContiguousStorageIfAvailable because doing so would potentially create a typed pointer from untyped data.
f041354
to
ae224ca
Compare
@swift-ci please benchmark |
Performance: -O
Code size: -OPerformance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Interesting that restoring the @Catfish-Man, it seems highly likely that the check isn't getting eliminated, even at the LLVM IR level. That might be an interesting thing to explore for some easy Data->String wins. |
The style of the benchmarks make them frustratingly hard to investigate to crawl through the disassembly, but I found it does not remove the check:
|
@milseman I'm not sure what |
Currently working around the issue that Substring's wCSIA closure is not being inlined, despite the |
@swift-ci please test |
// | ||
// NOTE: The SIL optimizer cannot currently fold away a (UTF16.self == | ||
// UTF8.self) metatype comparison, so we have to disabel the check-not for UTF-8 | ||
// construction :-( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atrick should metatype inequality be foldable as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be easy, just didn't fall out naturally from handling equality. You have to also check that there's no potential relationship between the type values. If you file a bug I or someone else can circle around to it.
Build failed |
Build failed |
fixed. |
Switch String(decoding:as) and other entry points to call
withContiguousStorageIfAvailable rather than use _HasContiguousBytes.
rdar://problem/59148099, SR-12125
rdar://problem/59148099