Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-10683] unsafely getting Swift to trust me that it doesn't need to ARC #53081

Closed
weissi opened this issue May 14, 2019 · 19 comments
Closed

[SR-10683] unsafely getting Swift to trust me that it doesn't need to ARC #53081

weissi opened this issue May 14, 2019 · 19 comments

Comments

@weissi
Copy link
Member

@weissi weissi commented May 14, 2019

Previous ID SR-10683
Radar rdar://problem/52529016
Original Reporter @weissi
Type Bug
Status Closed
Resolution Done

Attachment: Download

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, Performance
Assignee None
Priority Medium

md5: f50041869021cf418f7c6f49e80278d5

Issue Description:

I'm currently working on getting SwiftNIO's ChannelPipeline as fast as Netty's. Obviously, the JVM is garbage-collected so certain micro benchmarks naturally suit that sort of language quite well and one of them is repeatedly accessing pointers that are not changed very often. The GC won't actually have much work to do because there's not much garbage produced so the program can happily chase the pointers without every writing to memory. Swift uses ARC and ARC does (through retain/release) come with constant, very small overheads so in simple pointer chasing benchmarks, ARC will likely look worse than a GC'd language.

Right now, SwiftNIO's ChannelPipeline is about 6x slower than Netty's and that's pretty much exclusively due to ref count operations done whilst walking the pipeline (which happens all the time). Luckily, there's a simple fix but it involves _withUnsafeGuaranteedRef and I'm not happy with that.

Please find attached a program that is a pretty good approximation of SwiftNIO's ChannelPipeline and it demonstrates the problem very well.

public final class Holder {
    unowned(unsafe) var next: Holder!
    var nextSafe: Holder?
    let node: EventHandler

    init(_ node: EventHandler) {
        self.next = nil
        self.node = node
    }

    func invokeEvent() {
        self.node.event(holder: self)
    }

    @inline(never)
    public func fireEvent() {
#if FAST
        Unmanaged<Holder>.passUnretained(self.next!)._withUnsafeGuaranteedRef {
            $0.invokeEvent()
        }
#else
        // swift_retain(next = self.next!)   // <<< I WANT TO KILL THIS
        self.next!.invokeEvent()
        // swift_release(next)               // <<< AND THAT
#endif
    }
}

Naturally, ARC is correct and the retain of self.next is necessary because self.next could be reassigned during the self.next!.invokeEvent() call.

However, we could make a very simple change in SwiftNIO that would guarantee that this could never happen but obviously this will be hard to prove to the compiler. I tried expressing this with unowned(unsafe) but still, the compiler doesn't trust me that it doesn't need to perform ARC here...

I would like a way to express this but apart from _withUnsafeGuaranteedRef I don't think Swift has anything like that that isn't underscored...

repro

  • Download the attached program:

  • run

$ swiftc -Ounchecked cp_perf.swift && time ./cp_perf

real    0m2.746s
user    0m2.738s
sys 0m0.005s

and then

$ swiftc -Ounchecked -DFAST cp_perf.swift && time ./cp_perf

real    0m1.080s
user    0m1.070s
sys 0m0.005s

The interesting thing is that the second version pretty much exactly matches Netty's speed 🙂

@weissi
Copy link
Member Author

@weissi weissi commented May 14, 2019

CC @eeckstein/aschwaighofer@apple.com (JIRA User)/@gottesmm

@weissi
Copy link
Member Author

@weissi weissi commented Jul 2, 2019

@swift-ci create

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Jul 8, 2019

Taking a quick look at this.

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Jul 8, 2019

I can fix this without unsafe guaranteed with semantic arc that is on master. Let me cook something up real quick.

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Jul 8, 2019

Also, can you add this as a benchmark to the swift benchmark suite?

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Jul 8, 2019

So some interesting stuff. I can get this to work, but I would need to make some changes to SemanticARCOpts that would be slightly larger than I want to make to it right now (even though I can do it, it would just take a little longer).

Part of the problem you are running into is that in your API you are not passing Unmanaged things. So to pass it, the compiler will want a safe +0 value, meaning it will insert a copy. If you do like what I did below, you get the behavior you are looking for. That being said, I wonder if we can use a property wrapper to hide the withUnsafeGuaranteed.

// MODULE: Framework
import Foundation

public protocol EventHandler: class {
  func event(holder: Unmanaged<Holder>)
}

extension EventHandler {
  public func event(holder: Unmanaged<Holder>) {
    holder._withUnsafeGuaranteedRef { $0.fireEvent() }
  }
}

public final class Pipeline {
  var head: Holder? = nil

  public init() {}

  public func addHandler(_ handler: EventHandler) {
    if self.head == nil {
      self.head = Holder(handler)
      return
    }

    var node = self.head
    while node?.next != nil {
      node = node?.next
    }
    node?.nextSafe = Holder(handler)
    node?.next = node?.nextSafe
  }

  public func fireEvent() {
    self.head!.invokeEvent()
  }
}

public final class Holder {
  unowned(unsafe) var next: Holder!
  var nextSafe: Holder?
  let node: EventHandler

  private var unsafeSelf: Unmanaged<Holder> {
    return Unmanaged<Holder>.passUnretained(self)
  }

  init(_ node: EventHandler) {
    self.next = nil
    self.node = node
  }

  func invokeEvent() {
    self.node.event(holder: unsafeSelf)
  }

  @inline(never)
  public func fireEvent() {
    #if FAST
    Unmanaged<Holder>.passUnretained(self.next!)._withUnsafeGuaranteedRef { $0.invokeEvent() }
    #else
    self.unsafeSelf._withUnsafeGuaranteedRef { $0.invokeEvent() }
    #endif
  }
}

// MODULE: Middleware
//import Framework

public final class NoOpHandler: EventHandler {
  public init() {}
}

public final class ConsumingHandler: EventHandler {
  var consumed = 0
  public init() {}
  public func event(holder: Holder) {
    self.consumed += 1
  }
}

// MODULE: Main
//import Middleware
//import Framework

@inline(never)
func runBench() {
  let pipeline = Pipeline()
  for _ in 0..<5 {
    pipeline.addHandler(NoOpHandler())
  }
  pipeline.addHandler(ConsumingHandler())

  for _ in 0 ..< 30_000_000 {
    pipeline.fireEvent()
  }
}

runBench()

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Jul 8, 2019

Another thing to consider here is that since you made the underling type unsafe guaranteed, I can't use borrows.

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Jul 8, 2019

Actually. I figured out how to do this. Took me a second.

@weissi
Copy link
Member Author

@weissi weissi commented Jul 9, 2019

@gottesmm Thanks so much for looking into this, that sounds really promising. I had added the base case (no unmanage/unsafeGuaranteedRef) to the benchmark suite (https://github.com/apple/swift/pull/24765/files). That base case IMHO can't be optimised because the compiler won't be able to proof that node.next won't become nil.

I haven't actually added any of the Unmanaged/_withUnsafeGuaranteedRef yet because I didn't know what the "preferred" spelling in actual Swift will be that tell the compiler 'this variable won't be mutated during a call on it' actually is. @gottesmm does that make sense? Is unowned(unsafe) var next: Holder! the right spelling or rather with Unmanaged.takeUnretained or what would be the best way to spell this?

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Jul 9, 2019

That actually is not an issue since ossa can express that property and optimize even so.

I have a patch ready that fixes this for ossa. I am just adding some tests.

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Aug 13, 2019

For my memory. I think this is fixed by the forwarding ossa optimization that I need to finish.

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Mar 4, 2020

I landed the forwarding optimizations on master some time ago. Lets see if the problem goes away.

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Mar 4, 2020

Looked at this a bit more. We now are just hitting the strong_retain_* from the use of unsafe (unowned). But beyond that all of the ARC traffic is gone. That being said, I do think we can get rid of all of the ARC traffic with a small source change and a few other changes I have in the pipeline. Consider the following rewrite:

public final class Holder {
    var next: Holder!

    var nextSafe: Holder?
    let node: EventHandler

    init(_ node: EventHandler) {
        self.next = nil
        self.node = node
    }

    func invokeEvent() {
        self.node.event(holder: self)
    }

    @inline(never)
    public func fireEvent() {
        self.next!.invokeEvent()
        var this = self
        @_transparent
        func inoutWrapper(_ h: inout Holder!) {
            h!.invokeEvent()
        }
        inoutWrapper(&this.next)
    }
}

The key thing is to note is that I eliminated the usage of unsafe(unowned) to prevent the copy_unmanaged_value from getting emitted (with its semantics of having a retain that can not be eliminated by the optimizer). The other thing to note is that the inoutWrapper function forces SILGen to open an access scope over the entire switch instead of just over the load of the value to be switched on. E.x.:

%0 = begin_access
%1 = load [copy] %0 // <---- Can not eliminate this load [copy] b/c of the end_access
end_access %0
switch_enum %1 

Until we get true +0 return values and the ability to express "this access scope needs to be opened over multiple statements", we can not optimize the code in this form. That being said, we can use a transparent helper function that takes the ivar inout to expand the access scope of the switch_enum. Example:

    @inline(never)
    public func fireEvent() {
        var this = self
        @_transparent
        func inoutWrapper(_ h: inout Holder!) {
            h!.invokeEvent()
        }
        inoutWrapper(&this.next)
    }

In this case, we get the lvalue end_access over the entirety of inoutWrapper. This yields the following SIL when we hit SemanticARCOpts with near ToT SIL:

// Holder.fireEvent()
sil [noinline] [ossa] @$s7cp_perf6HolderC9fireEventyyF : $@convention(method) (@guaranteed Holder) -> () {
// %0                                             // users: %3, %1
bb0(%0 : @guaranteed $Holder):
  debug_value %0 : $Holder, let, name "self", argno 1 // id: %1
  %2 = alloc_stack $Holder, var, name "this"      // users: %5, %19, %4
  %3 = copy_value %0 : $Holder                    // user: %4
  store %3 to [init] %2 : $*Holder                // id: %4
  %5 = load [take] %2 : $*Holder                  // users: %18, %6
  %6 = begin_borrow %5 : $Holder                  // users: %17, %7
  %7 = ref_element_addr %6 : $Holder, #Holder.next // user: %8
  %8 = begin_access [modify] [dynamic] %7 : $*Optional<Holder> // users: %9, %16
  %9 = load [copy] %8 : $*Optional<Holder>        // user: %10
  switch_enum %9 : $Optional<Holder>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb2 // id: %10

// %11                                            // users: %14, %13
bb1(%11 : @owned $Holder):                        // Preds: bb0
  // function_ref Holder.invokeEvent()
  %12 = function_ref @$s7cp_perf6HolderC11invokeEventyyF : $@convention(method) (@guaranteed Holder) -> () // user: %13
  %13 = apply %12(%11) : $@convention(method) (@guaranteed Holder) -> ()
  destroy_value %11 : $Holder                     // id: %14
  %15 = tuple ()
  end_access %8 : $*Optional<Holder>              // id: %16
  end_borrow %6 : $Holder                         // id: %17
  destroy_value %5 : $Holder                      // id: %18
  dealloc_stack %2 : $*Holder                     // id: %19
  %20 = tuple ()                                  // user: %21
  return %20 : $()                                // id: %21

bb2:                                              // Preds: bb0
  %22 = integer_literal $Builtin.Word, 14
   ...
} // end sil function '$s7cp_perf6HolderC9fireEventyyF'

This is a form that we can get to zero rr today with a few patches that I have in stream. Specifically:

1. I have a patch that eliminates the alloc_stack in the example above. It comes from teaching temprvalue opts how to eliminate read only allocations that are initialized by a store. Today we only support copy_addr. That will change the above SIL to the following:

// Holder.fireEvent()
sil [noinline] [ossa] @$s7cp_perf6HolderC9fireEventyyF : $@convention(method) (@guaranteed Holder) -> () {
// %0                                             // users: %3, %1
bb0(%0 : @guaranteed $Holder):
  debug_value %0 : $Holder, let, name "self", argno 1 // id: %1
  %5 = copy_value %0 : $Holder                    // user: %4
  %6 = begin_borrow %5 : $Holder                  // users: %17, %7
  %7 = ref_element_addr %6 : $Holder, #Holder.next // user: %8
  %8 = begin_access [modify] [dynamic] %7 : $*Optional<Holder> // users: %9, %16
  %9 = load [copy] %8 : $*Optional<Holder>        // user: %10
  switch_enum %9 : $Optional<Holder>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb2 // id: %10

// %11                                            // users: %14, %13
bb1(%11 : @owned $Holder):                        // Preds: bb0
  // function_ref Holder.invokeEvent()
  %12 = function_ref @$s7cp_perf6HolderC11invokeEventyyF : $@convention(method) (@guaranteed Holder) -> () // user: %13
  %13 = apply %12(%11) : $@convention(method) (@guaranteed Holder) -> ()
  destroy_value %11 : $Holder                     // id: %14
  %15 = tuple ()
  end_access %8 : $*Optional<Holder>              // id: %16
  end_borrow %6 : $Holder                         // id: %17
  destroy_value %5 : $Holder                      // id: %18
  %20 = tuple ()                                  // user: %21
  return %20 : $()                                // id: %21

bb2:                                              // Preds: bb0
  %22 = integer_literal $Builtin.Word, 14
   ...
} // end sil function '$s7cp_perf6HolderC9fireEventyyF'

That will then allow for us to eliminate the copy_value of %5. Yielding:

// Holder.fireEvent()
sil [noinline] [ossa] @$s7cp_perf6HolderC9fireEventyyF : $@convention(method) (@guaranteed Holder) -> () {
// %0                                             // users: %3, %1
bb0(%0 : @guaranteed $Holder):
  debug_value %0 : $Holder, let, name "self", argno 1 // id: %1
  %7 = ref_element_addr %0 : $Holder, #Holder.next // user: %8
  %8 = begin_access [modify] [dynamic] %7 : $*Optional<Holder> // users: %9, %16
  %9 = load [copy] %8 : $*Optional<Holder>        // user: %10
  switch_enum %9 : $Optional<Holder>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb2 // id: %10

// %11                                            // users: %14, %13
bb1(%11 : @owned $Holder):                        // Preds: bb0
  // function_ref Holder.invokeEvent()
  %12 = function_ref @$s7cp_perf6HolderC11invokeEventyyF : $@convention(method) (@guaranteed Holder) -> () // user: %13
  %13 = apply %12(%11) : $@convention(method) (@guaranteed Holder) -> ()
  destroy_value %11 : $Holder                     // id: %14
  %15 = tuple ()
  end_access %8 : $*Optional<Holder>              // id: %16
  %20 = tuple ()                                  // user: %21
  return %20 : $()                                // id: %21

bb2:                                              // Preds: bb0
  %22 = integer_literal $Builtin.Word, 14
   ...
} // end sil function '$s7cp_perf6HolderC9fireEventyyF'

2. Finally, we need to generalize my patch that proves that we can promote inout parameters that are never mutating to also support mutating begin_access that are never actually written to. Should be relatively simple. Once we have that, we should be able to eliminate the copy_value above, yielding what we want:

// Holder.fireEvent()
sil [noinline] [ossa] @$s7cp_perf6HolderC9fireEventyyF : $@convention(method) (@guaranteed Holder) -> () {
// %0                                             // users: %3, %1
bb0(%0 : @guaranteed $Holder):
  debug_value %0 : $Holder, let, name "self", argno 1 // id: %1
  %7 = ref_element_addr %0 : $Holder, #Holder.next // user: %8
  %8 = begin_access [modify] [dynamic] %7 : $*Optional<Holder> // users: %9, %16
  %9 = load_borrow %8 : $*Optional<Holder>        // user: %10
  switch_enum %9 : $Optional<Holder>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb2 // id: %10

// %11                                            // users: %14, %13
bb1(%11 : @guaranteed $Holder):                        // Preds: bb0
  // function_ref Holder.invokeEvent()
  %12 = function_ref @$s7cp_perf6HolderC11invokeEventyyF : $@convention(method) (@guaranteed Holder) -> () // user: %13
  %13 = apply %12(%11) : $@convention(method) (@guaranteed Holder) -> ()
  end_borrow %11 : $Holder                     // id: %14
  %15 = tuple ()
  end_access %8 : $*Optional<Holder>              // id: %16
  %20 = tuple ()                                  // user: %21
  return %20 : $()                                // id: %21

bb2:                                              // Preds: bb0
  %22 = integer_literal $Builtin.Word, 14
   ...
} // end sil function '$s7cp_perf6HolderC9fireEventyyF'

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Mar 5, 2020

PR #30225 takes care of part 1. So now all I would need to do is the inout thing I mentioned. Then you can use the pattern I mentioned there.

@weissi
Copy link
Member Author

@weissi weissi commented Mar 5, 2020

That's awesome. Thanks very much @gottesmm, I filed apple/swift-nio#1435 to look into porting this change into the actual NIO codebase 🙂

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Mar 5, 2020

Did the last piece of work here:

#30227

@gottesmm
Copy link
Member

@gottesmm gottesmm commented Mar 5, 2020

So JW you should be good to go!

@weissi
Copy link
Member Author

@weissi weissi commented Apr 14, 2020

Thanks @gottesmm, this can be closed. If I come across more complicated cases where that I still can’t fix, I’ll file new issues

@glessard
Copy link
Contributor

@glessard glessard commented Jun 4, 2021

Reporter confirmed issue is resolved.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants