Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-15624] withContiguousStorageIfAvailable is not inlined early #57920

Open
karwa opened this issue Dec 20, 2021 · 3 comments
Open

[SR-15624] withContiguousStorageIfAvailable is not inlined early #57920

karwa opened this issue Dec 20, 2021 · 3 comments

Comments

@karwa
Copy link
Contributor

@karwa karwa commented Dec 20, 2021

Previous ID SR-15624
Radar rdar://problem/86754177
Original Reporter @karwa
Type Bug

Attachment: Download

Environment

Apple Swift version 5.5.2 (swiftlang-1300.0.47.5 clang-1300.0.29.30)

Target: x86_64-apple-macosx11.0

Xcode 13.2

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug
Assignee None
Priority Medium

md5: 59c0f16facc590c7ededabf32822573e

Issue Description:

I've been using the `@_semantics("optremark")` feature to try and optimise code-size. While doing so, I noticed something interesting. Consider these remarks in one function, which operates on generic Collections of UInt8:

As you can see, the optimiser is doing a ton of work, and a huge number of specializations get generated for all kinds of collection types - slices of ContiguousArray, UnsafeBufferPointer, UnsafeBoundsCheckedBufferPointer, WebURL.UTF8View, etc - and it tries to devirtualise all of these calls.

Now, what is really interesting is that almost none of these collection types actually enter this function. That's because, in the caller, we use withContiguousStorageIfAvailable to collapse all contiguous collections to UnsafeBufferPointer, then wrap that in an UnsafeBoundsCheckedBufferPointer. That's the only contiguous storage type which is ever used by this function.

So why is the optimiser doing all of this work?

Well, it turns out, the caller looks something like this:

return inputString.withContiguousStorageIfAvailable {
  _urlFromBytes_impl($0.boundsChecked, baseURL: baseURL, callback: &callback)
} ?? _urlFromBytes_impl(inputString, baseURL: baseURL, callback: &callback)

So - use the contiguous storage if possible, otherwise fall back to using the collection type itself.

What I believe is happening (what it looks like) is that this function gets constant-folded after generic specialisation, so the optimiser does a bunch of work specialising my entire library, devirtualising stuff, calculating whether things should be inlined, etc – and then just discards it all later.

We can test this. I added a protocol, "KnownContiguousStorage", which includes a "withUnsafeBufferPointer" method (unlike wCSIA, it does not return an optional - there is guaranteed contiguous storage), and conformed the types listed above to that protocol. Next, I added a dynamic downcast before using wCSIA:

if let buffer = inputString as? KnownContiguousStorage {
  return buffer._withUnsafeBufferPointer {
    _urlFromBytes_impl($0.boundsChecked, baseURL: baseURL, callback: &callback)
  }
}

return inputString.withContiguousStorageIfAvailable {
  _urlFromBytes_impl($0.boundsChecked, baseURL: baseURL, callback: &callback)
} ?? _urlFromBytes_impl(inputString, baseURL: baseURL, callback: &callback)

Now, let's look at what happens to the function mentioned at the start, at the same location:

Boom. We've gone from 26 remarks at this location down to 9. The optimiser no longer spends time generating specialisations which it will not use.

Let's take a look at that caller, and how its remarks have changed.

Before:

We see that the compiler recognises that it can devirtualise and specialise wCSIA, but still generates the specialisation in case UnsafeBufferPointer is not contiguous. Obviously, this will never be used.

After:

Now the compiler sees that UnsafeBufferPointer only has one path to follow, and that path results in it being wrapped in a UnsafeBoundsCheckedBufferPointer. As we've seen, it is thus able to do much less work.

It's worth pointing out that this doesn't affect binary size (as far as I can tell), although one would expect that it does impact compile time. It might be worth trying to make wCSIA inline(__always), so that the result can be used earlier by the optimiser and it can avoid optimising code that will certainly be culled later.

@karwa
Copy link
Contributor Author

@karwa karwa commented Dec 20, 2021

Ping @glessard @atrick

I think this may be of interest to one/both of you. I don't know whether this can be fixed in the standard library (perhaps making wCISA inline(__always)), or whether the optimiser can be a bit smarter about where it spends its time. But there are potentially compile-time savings here.

@glessard
Copy link
Contributor

@glessard glessard commented Dec 20, 2021

@swift-ci create

@atrick
Copy link
Member

@atrick atrick commented Dec 22, 2021

Thank you @karwa . That's an excellent writeup. I think the optimizer will need to work around this issue since `withContiguousStorage` is central to stdlib collections. It does raise the question of better ways to design customization hooks going forward. I'm not sure what that answer is yet.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants