Optimizing Accessors in Swift

Definitions

An abstract storage declaration is a language construct that declares a means of accessing some sort of abstract entity. I'll just say "storage" hereafter.

Swift provides three storage declarations:

  • a single named entity, called a variable and declared with var
  • a single named entity which can never be reassigned, called a constant and declared with let
  • a compound unnamed entity accessed with an index, called a subscript and declared with subscript

These features are similar to those in other languages. Swift notably lacks compound named entities, such as C#'s indexed properties; the design team intentionally chose to favor the use of named single entities of subscriptable type.

It's useful to lump the two kinds of single named entities together; I'll just call them both "variables" hereafter.

Subscripts must always be instance members of a type. When a variable is a member of a type, it's called a property.

When I say that these entities are abstract, I mean that they're not directly tied to any particular implementation. All of them may be backed directly by memory storing values of the entity's type, or they may simply provide a way of invoking arbitrary code to access a logical memory, or they may be a little of both.
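To make the terminology concrete, here is a hedged illustration (the types and names are mine, not from the original discussion) of a variable backed directly by memory, a purely computed variable, and a subscript:

struct Temperature {
    var celsius: Double = 0           // a variable backed directly by memory
    let freezingPoint: Double = 0     // a constant

    // A computed variable: no storage of its own, just code.
    var fahrenheit: Double {
        get { return celsius * 9 / 5 + 32 }
        set { celsius = (newValue - 32) * 5 / 9 }
    }
}

struct Histogram {
    private var counts = Array(repeating: 0, count: 10)

    // A compound unnamed entity accessed with an index.
    subscript(bucket: Int) -> Int {
        get { return counts[bucket] }
        set { counts[bucket] = newValue }
    }
}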

Full-value accesses

All accesses to storage, no matter the implementation, can be performed with two primitive operations:

  • a full-value load, which creates a new copy of the current value
  • a full-value store, which overwrites the current value with a different, independent value

A function which implements a full-value load is called a getter; a full-value store, a setter.

An operation which calls for a full-value load into a temporary, then a modification of the temporary, then a full-value store of the temporary into the original entity, is called a write-back.
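As a minimal sketch (the names here are hypothetical), this shows a computed variable with a getter and setter, and the write-back that a mutating use of it expands to:

struct Box {
    private var storage: Int = 0
    var value: Int {
        get { return storage }        // full-value load
        set { storage = newValue }    // full-value store
    }
}

func increment(_ x: inout Int) { x += 1 }

var box = Box()
// `increment(&box.value)` is performed as a write-back:
var temporary = box.value   // getter: full-value load into a temporary
increment(&temporary)       // modify the temporary
box.value = temporary       // setter: full-value store of the temporary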

Implementing accesses with full-value accesses introduces two problems: subobject clobbering and performance.

Subobject clobbering

Subobject clobbering is a semantic issue. It occurs when there are two changes to an entity in flight at once, as in:

swap(&point.x, &point.y)

(By "at once", I mean synchronously. Unlike Java, Swift is willing to say that an asynchronous simultaneous access (mutating or not) to an entity that's being modified is completely undefined behavior, with any sort of failure permitted, up to and including memory corruption and crashes. As Swift transitions towards being a systems language, it can selectively choose to define some of this behavior, e.g. allowing racing accesses to different elements of an array.)

Subobject clobbering is a problem with user expectation. Users are unlikely to write code that intentionally modifies two obviously overlapping entities, but they might very reasonably write code that modifies two "separate" sub-entities. For example, they might write code to swap two properties of a struct, or two elements of an array. The natural expansion of these subobject accesses using whole-object accesses generates code that "loses" the changes made to one of the objects:

var point0 = point
var x = point0.x
var point1 = point
var y = point1.y
swap(&x, &y)
point1.y = y
point = point1
point0.x = x
point = point0

Note that point.y is left unchanged.

Local analysis

There are two straightforward solutions to subobject clobbering, both reliant on doing local analysis to recognize two accesses which obviously alias (like the two references to point in the above example). Once you've done this, you can either:

  • Emit an error, outlawing such code. This is what Swift currently does (but only when the aliasing access must be implemented with full-value loads and stores).
  • Use a single temporary, potentially changing the semantics.

It's impossible to guarantee the absence of subobject clobbering with this analysis without extremely heavy-handed language changes. Fortunately, subobject clobbering is "only" a semantics issue, not a memory-safety issue, at least as long as it's implemented with full-value accesses.

Reprojection

A more general solution is to re-project the modified subobject from scratch before writing it back. That is, you first acquire an initial value like normal for the subobject you wish to modify, then modify that temporary copy in-place. But instead of then recursively consuming all the intermediate temporaries when writing them back, you drop them all and recompute the current value, then write the modified subobject to it, then write back all the way up. That is:

var point0 = point
var x = point0.x
var point1 = point
var y = point1.y
swap(&x, &y)
point1 = point      // reload point1
point1.y = y
point = point1
point0 = point      // reload point0
point0.x = x
point = point0

In this example, I've applied the solution consistently to all accesses, which protects against unseen modifications (e.g. during the call to swap) at the cost of performing two extra full-value loads.

You can heuristically lower this cost by combining it with a simple local analysis and only re-projecting when writing back to l-values other than the last. In other words, generate code that will work as long as the entity is not modified behind abstraction barriers or through unexpected aliases:

var point0 = point
var x = point0.x
var point1 = point
var y = point1.y
swap(&x, &y)
point1.y = y        // do not reload point1
point = point1
point0 = point      // reload point0
point0.x = x
point = point0

Note that, in either solution, you've introduced extra full-value loads. This may be quite expensive, and it's not guaranteed to be semantically equivalent.

Performance

There are three major reasons why full-value accesses are inefficient.

Unnecessary subobject accesses

The first is that they may load or store more than is necessary.

As an obvious example, imagine a variable of type (Int, Int); even if my code only accesses the first element of the tuple, full-value accesses force me to read or write the second element as well. That means that, even if I'm purely overwriting the first element, I actually have to perform a full-value load first so that I know what value to use for the second element when performing the full-value store.

Additionally, while unnecessarily loading the second element of an (Int, Int) pair might seem trivial, consider that the tuple could actually have twenty elements, or that the second element might be non-trivial to copy (e.g. if it's a retainable pointer).
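A hedged, self-contained sketch of the problem (the backing variable is mine): even a pure overwrite of the first element must read and rewrite the second one:

var backingStorage = (0, 0)

var pair: (Int, Int) {
    get { return backingStorage }      // full-value load
    set { backingStorage = newValue }  // full-value store
}

// `pair.0 = 42` must be performed as a write-back:
var temporary = pair   // full-value load: the second element is read too
temporary.0 = 42
pair = temporary       // full-value store: the second element is rewritten too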

Abstraction barriers

A full-value load or store which you can completely reason about is one thing, but if it has to be performed as a call, it can be a major performance drag.

For one, calls do carry a significant amount of low-level overhead.

For another, optimizers must be extremely conservative about what a call might do. A retainable pointer might have to be retained and later released purely to protect against the possibility that a getter might, somehow, cause the pointer to otherwise be deallocated.

Furthermore, the conventions of the call might restrict performance. One way or another, a getter for a retainable pointer generally returns at +1, meaning that as part of the return, it is retained, forcing the caller to later release. If the access were instead direct to memory, this retain might be avoidable, depending on what the caller does with the pointer.

Copy-on-write

These problems are compounded by copy-on-write (COW) types. In Swift, a copy-on-write value embeds an object reference. Copying the value has low immediate cost, because it simply retains the existing reference. However, modifying a value requires the reference to be made unique, generally by copying the data held by the value into a fresh object. I'll call this operation a structural copy in an effort to avoid the more treacherous term "deep copy".
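For readers who want the pattern spelled out, here is a hedged sketch of a copy-on-write value using the standard library's isKnownUniquelyReferenced(_:) check; the types and method names are illustrative, not taken from the standard library's actual implementation:

final class IntStorage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct IntBag {
    private var buffer = IntStorage([])

    // Copying an IntBag is cheap: it just retains `buffer`.

    private mutating func makeUnique() {
        // A mutation needs a unique reference; otherwise, perform a
        // structural copy into a fresh buffer.
        if !isKnownUniquelyReferenced(&buffer) {
            buffer = IntStorage(buffer.elements)
        }
    }

    mutating func append(_ value: Int) {
        makeUnique()
        buffer.elements.append(value)
    }
}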

COW types are problematic with full-value accesses for several reasons.

First, COW types are often used to implement aggregates and thus often have several distinguishable subobjects which users are likely to think of as independent. This heightens the dangers of subobject clobbering.

Second, a full-value load of a COW type implies making the object reference non-unique. Changing the value at this point will force a structural copy. This means that modifying a temporary copy has dramatically worse performance compared to modifying the original entity in-place. For example:

window.name += " (closing)"

If &window.name can be passed directly to the operator, and the string buffer is uniquely referenced by that string, then this operation may be as cheap as copying a few characters into the tail of the buffer. But if this must be done with a write-back, then the temporary will never have a unique reference, and there will always be an unneeded structural copy.

Conservative access patterns

When you know how storage is implemented, it's straightforward to generate an optimal access to it. There are several major reasons why you might not know how a storage declaration is implemented, though:

  • It might be an abstract declaration, not a concrete declaration. Currently this means a protocol member, but Swift may someday add abstract class members.
  • It might be a non-final class member, where the implementation you can see is potentially overridable by a subclass.
  • It might be a resilient declaration, where you know only that the entity exists and know nothing statically about its implementation.

In all of these cases, you must generate code that will handle the worst possible case, which is that the entity is implemented with a getter and a setter. Therefore, the conservative access pattern includes opaque getter and setter functions.

However, for all the reasons discussed above, using unnecessary full-value accesses can be terrible for performance. It's really bad if a little conservatism --- e.g. because Swift failed to devirtualize a property access --- causes asymptotic inefficiencies. Therefore, Swift's native conservative access pattern also includes a third accessor which permits direct access to storage when possible. This accessor is called materializeForSet.

materializeForSet receives an extra argument, which is an uninitialized buffer of the value type, and it returns a pointer and a flag. When it can provide direct access to storage for the entity, it constructs a pointer to the storage and returns false. When it can't, it performs a full-value load into the buffer and returns true. The caller performs the modification in-place on the returned pointer and then, if the flag is true, passes the value to the setter.
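Here is a hedged model of that caller-side protocol. The real materializeForSet is an ABI-level function with a buffer-and-flag signature; the enum and names below are illustrative stand-ins for its dynamic behavior:

enum MaterializeResult<T> {
    case direct(UnsafeMutablePointer<T>)  // direct access to the storage
    case temporary(T)                     // full-value load; setter must be called
}

func performMutation<T>(
    materializeForSet: () -> MaterializeResult<T>,
    setter: (T) -> Void,
    mutation: (inout T) -> Void
) {
    switch materializeForSet() {
    case .direct(let address):
        mutation(&address.pointee)   // modify the storage in place
    case .temporary(var value):
        mutation(&value)             // modify the temporary copy
        setter(value)                // write the temporary back with the setter
    }
}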

The overall effect is to enable direct storage access as a dynamic optimization when it's impossible as a static optimization.

For now, materializeForSet is always automatically generated based on whether the entity is implemented with a computed setter. It is possible to imagine data structures that would benefit from having this lifted to a user-definable feature; for example, a data structure which sometimes holds its elements in memory but sometimes does not.

materializeForSet can provide direct access whenever an address for the storage can be derived. This includes when the storage is implemented with a mutableAddress accessor, as covered below. Observing accessors currently prevent materializeForSet from offering direct access; that's fixable for didSet using a slightly different code pattern, but willSet is an inherent obstacle.

Independent of any of the other optimizations discussed in this whitepaper, materializeForSet had the potential to immediately optimize the extremely important case of mutations to COW values in un-devirtualized class properties, with fairly minimal risk. Therefore, materializeForSet was implemented first, and it shipped in Xcode 6.1.

Direct access at computed addresses

What entities can be directly accessed in memory? Non-computed variables make up an extremely important set of cases; Swift has enough built-in knowledge to know that it can provide direct access to them. But there are a number of other important cases where the address of an entity is not built-in to the compiler, but where direct access is nonetheless possible. For example, elements of a simple array always have independent storage in memory. Most benchmarks on arrays would profit from being able to modify array elements in-place.

There's a long chain of proposals in this area, many of which are refinements of previous proposals. None of these proposals has yet shipped in Xcode.

Addressors

For something like a simple array (or any similar structure, like a deque) which is always backed by a buffer, it makes sense for the implementor to simply define accessors which return the address of the element. Such accessors are called addressors, and there are two: address and mutableAddress.

The conservative access pattern can be generated very easily from this: the getter calls address and loads from it, the setter calls mutableAddress and stores to it, and materializeForSet provides direct access to the address returned from mutableAddress.
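As a hedged sketch (addressors are modeled here as ordinary methods, since the accessor syntax itself is not user-facing Swift; the Buffer type and names are mine), this is how the getter and setter can be derived from the addressors:

final class Buffer {
    var storage: UnsafeMutablePointer<Int>
    init(count: Int) {
        storage = UnsafeMutablePointer<Int>.allocate(capacity: count)
        storage.initialize(repeating: 0, count: count)
    }
    // Deallocation is omitted for brevity.
}

struct SimpleArray {
    private var buffer = Buffer(count: 16)
    // Uniqueness checking is omitted for brevity.

    func address(at index: Int) -> UnsafePointer<Int> {
        return UnsafePointer(buffer.storage + index)
    }
    mutating func mutableAddress(at index: Int) -> UnsafeMutablePointer<Int> {
        return buffer.storage + index
    }

    // The access pattern derived from the addressors:
    subscript(index: Int) -> Int {
        get { return address(at: index).pointee }             // load from address
        set { mutableAddress(at: index).pointee = newValue }  // store to mutableAddress
    }
}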

If the entity has type T, then address returns an UnsafePointer<T> and mutableAddress returns an UnsafeMutablePointer<T>. This means that the formal type of the entity must exactly match the formal type of the storage. Thus, the standard subscript on Dictionary<K,V> cannot be implemented using addressors, because the formal type of the entity is V?, but the backing storage holds a V. (And this is in keeping with user expectations about the data structure: assigning nil at a key is supposed to erase any existing entry there, not create a new entry to hold nil.)
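Concretely, the user expectation mentioned above:

var scores: [String: Int] = ["alice": 3]
scores["alice"] = nil          // erases the entry; does not store a nil value
print(scores.count)            // prints 0
print(scores["alice"] as Any)  // prints nil: the entity's formal type is Int?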

This simple addressor proposal was the first prong of our efforts to optimize array element access. Unfortunately, while it is useful for several other types (such as ContiguousArray and UnsafeMutablePointer), it is not flexible enough for the Array type.

Mixed addressors

Swift's chief Array type is only a simple array when it is not interacting with Objective-C. Type bridging requires Array to be able to store an immutable NSArray instance, and the NSArray interface does not expose the details of how it stores elements. An NSArray is even permitted to dynamically generate its values in its objectAtIndex: method. And it would be absurd for Array to perform a structural copy during a load just to make non-mutating accesses more efficient! So the load access pattern for Array's subscript declaration must use a getter.

Fortunately, this requirement does not preclude using an addressor for mutating accesses. Mutations to Array always transition the array to a unique contiguous buffer representation as their first step. This means that the subscript operator can sensibly return an address when it's used for the purposes of mutation: in other words, exactly when mutableAddress would be invoked.

Therefore, the second prong of our efforts to optimize array element access was to allow entities to be implemented with the combination of a get accessor and a mutableAddress accessor. This is straightforward in the user model, where it simply means lifting a restriction. It's more complex behind the scenes because it broke what was previously a clean conceptual division between "physical" and "logical" l-values.

Mixed addressors have now been adopted by Array to great success. As expected, they substantially improved performance mutating COW array elements. But they also fix an important instance of subobject clobbering, because modifications to different subobjects (notably, different elements of the same array) can occur simultaneously by simply projecting out their addresses in the unique buffer. For example, this means that it's possible to simply swap two elements of an array directly:

swap(&array[i], &array[j])

// Expanded:
array.transitionToUniquelyReferenced()
let address_i = array.buffer.storage + i
array.transitionToUniquelyReferenced()
let address_j = array.buffer.storage + j
swap(address_i, address_j)

Mixed addressors weren't completely implemented until very close to the Xcode 6.1 deadline, and they changed code-generation patterns enough to break a number of important array-specific optimizations. Therefore, the team sensibly decided that they were too risky for that release, and that there wasn't enough benefit from other applications to justify including any of the addressor work.

In a way, that was a fortunate decision, because the naive version of addressors implemented so far in Swift creates a safety hole which would otherwise have been exposed to users.

Memory unsafety of addressors

The semantics and memory safety of operations on COW types rely on a pair of simple rules:

  • A non-mutating operation must own a reference to the buffer for the full course of the read.
  • A mutating operation must own a unique reference to the buffer for the full course of the mutation.

Both rules tend to be naturally satisfied by the way that operations are organized into methods. A value must own a reference to its buffer at the moment that a method is invoked on it. A mutating operation immediately transitions the buffer to a unique reference, performing a structural copy if necessary. This reference will remain valid for the rest of the method as long as the method is atomic: as long as it does not synchronously invoke arbitrary user code.

(This is a single-threaded notion of atomicity. A second thread which modifies the value simultaneously can clearly invalidate the assumption. But that would necessarily be a data race, and the language design team is willing to say that such races have fully undefined behavior, and arbitrary consequences like memory corruption and crashes are acceptable in their wake.)

However, addressors are not atomic in this way: they return an address to the caller, which may then interleave arbitrary code before completing the operation. This can present the opportunity for corruption if the interleaved code modifies the original value. Consider the following code:

func operate(_ value: inout Int, _ count: Int) { ... }

var array: [Int] = [1,2,3,4]
operate(&array[0], { array = []; return 0 }())

The dynamic sequence of operations performed here will expand like so:

var array: [Int] = [1,2,3,4]
let address = array.subscript.mutableAddress(0)
array = []
operate(address, 0)

The assignment to array within the closure will release the buffer containing address, thus passing operate a dangling pointer.

Nor can this be fixed with a purely local analysis; consider:

class C { var array: [Int] = [] }
let global_C = C()

func assign(_ value: inout Int) {
  global_C.array = []
  value = 0
}

assign(&global_C.array[0])

Fixing the memory safety hole

Conceptually, the correct fix is to guarantee that the rules are satisfied by ensuring that the buffer is retained for the duration of the operation. Any interleaving modifications will then see a non-uniquely-referenced buffer and perform a structural copy:

// Project the array element.
let address = array.subscript.mutableAddress(0)

// Remember the new buffer value and keep it retained.
let newArrayBuffer = array.buffer
retain(newArrayBuffer)

// Reassign the variable.
release(array.buffer)
array.buffer = ...

// Perform the mutation.  These changes will be silently lost, but
// they at least won't be using deallocated memory.
operate(address, 0)

// Release the "new" buffer.
release(newArrayBuffer)

Note that this still leaves a semantic hole if the original value is copied in interleaving code before the modification, because the subsequent modification will be reflected in the copy:

// Project the array element.
let address = array.subscript.mutableAddress(0)

// Remember the new buffer value and keep it retained.
let newArrayBuffer = array.buffer
retain(newArrayBuffer)

// Copy the value.  Note that arrayCopy uses the same buffer that
// 'address' points into.
let arrayCopy = array
retain(arrayCopy.buffer)

// Perform the mutation.
operate(address, 0)

// Release the "new" buffer.
release(newArrayBuffer)

This might be unexpected behavior, but the language team is willing to accept unexpected behavior for this code. What's non-negotiable is breaking memory safety.

Unfortunately, applying this fix naively reintroduces the problem of subobject clobbering: since a modification of one subobject immediately retains a buffer that's global to the entire value, an interleaved modification of a different subobject will see a non-unique buffer reference and therefore perform a structural copy. The modifications to the first subobject will therefore be silently lost.

Unlike the interleaving copy case, this is seen as unacceptable. Notably, it breaks swapping two array elements:

// Original:
swap(&array[i], &array[j])

// Expanded:

// Project array[i].
array.transitionToUniquelyReferenced()
let address_i = array.buffer.storage + i
let newArrayBuffer_i = array.buffer
retain(newArrayBuffer_i)

// Project array[j].  Note that this transition is guaranteed
// to have to do a structural copy.
array.transitionToUniquelyReferenced()
let address_j = array.buffer.storage + j
let newArrayBuffer_j = array.buffer
retain(newArrayBuffer_j)

// Perform the mutations.
swap(address_i, address_j)

// Balance out the retains.
release(newArrayBuffer_j)
release(newArrayBuffer_i)

get- and setForMutation

Some collections need finer-grained control over the entire mutation process. For instance, to support divide-and-conquer algorithms on slices, a sliceable collection must "pin" and "unpin" its buffer while a slice is being mutated, granting the slice permission to mutate the collection in place while sharing ownership. This flexibility can be exposed by a pair of accessors that are called before and after a mutation. The "get" stage produces both the value to mutate and a state value (whose type must be declared) to forward to the "set" stage. A pinning accessor can then look something like this:

extension Array {
  subscript(range: Range<Int>) -> Slice<Element> {
    // `getForMutation` must declare its return value, a pair of both
    // the value to mutate and a state value that is passed to
    // `setForMutation`.
    getForMutation() -> (Slice<Element>, PinToken) {
      let slice = _makeSlice(range)
      let pinToken = _pin()
      return (slice, pinToken)
    }

    // `setForMutation` receives two arguments--the result of the
    // mutation to write back, and the state value returned by
    // `getForMutation`.
    setForMutation(slice, pinToken) {
      _unpin(pinToken)
      _writeSlice(slice, backToRange: range)
    }
  }
}

getForMutation and setForMutation must appear as a pair; neither one is valid on its own. A get and set accessor should also still be provided for simple read and write operations. When the compiler has visibility that storage is implemented in terms of getForMutation and setForMutation, it lowers a mutable projection using those accessors as follows:

// A mutation like this (assume `reverse` is a mutating method):
array[0...99].reverse()
// Decomposes to:
let index = 0...99
(var slice, let state) = array.`subscript.getForMutation`(index)
slice.reverse()
array.`subscript.setForMutation`(index, slice, state)

To support the conservative access pattern, a materializeForSet accessor can be generated from getForMutation and setForMutation in an obvious fashion: perform getForMutation and store the state result in its scratch space, and return a callback that loads the state and hands it off to setForMutation.

The beacon of hope for a user-friendly future: Inversion of control

Addressors and {get,set}ForMutation expose important optimizations to the standard library, but are undeniably fiddly and unsafe constructs to expose to users. A more natural model would be to recognize that a compound mutation is a composition of nested scopes, and express it in the language that way. A strawman model might look something like this:

var foo: T {
  get { return getValue() }
  set { setValue(newValue) }

  // Perform a full in-out mutation. The `next` continuation is of
  // type `(inout T) -> ()` and must be called exactly once
  // with the value to hand off to the nested mutation operation.
  mutate(next) {
    var value = getValue()
    next(&value)
    setValue(value)
  }
}

This presents a natural model for expressing the lifetime extension concerns of addressors, and the state maintenance necessary for pinning getForMutation accessors:

// An addressing mutator
mutate(next) {
  withUnsafePointer(&resource) {
    next(&$0.memory)
  }
}

// A pinning mutator
mutate(next) {
  var slice = makeSlice()
  let token = pin()
  next(&slice)
  unpin(token)
  writeBackSlice(slice)
}

For various reasons of semantics and implementation efficiency, we don't want to literally implement every access as a nesting of closures like this. Doing so would allow for semantic surprises (a mutate() operation that never invokes its continuation, or invokes it more than once, would be disastrous), and it would interfere with the ability of inout and mutating functions to throw or otherwise exit non-locally. However, we could present this model using inversion of control, similar to Python generators or async-await. A mutate operation could yield the inout reference to its inner value, and the compiler could enforce that a yield occurs exactly once on every control-flow path:

// An addressing, yielding mutator
mutate {
  withUnsafePointer(&resource) {
    yield &$0.memory
  }
}

// A pinning mutator
mutate {
  var slice = makeSlice()
  let token = pin()
  yield &slice
  unpin(token)
  writeBackSlice(slice)
}

This obviously requires more implementation infrastructure than we currently have, and raises language and library design issues (in particular, lifetime-extending combinators like withUnsafePointer would need either a reyields kind of decoration, or to become macros), but represents a promising path toward exposing the full power of the accessor model to users in an elegant way.

Acceptability

This whitepaper has mentioned several times that the language team is prepared to accept such-and-such behavior but not prepared to accept some other kind of behavior. Clearly, there is a policy at work. What is it?

General philosophy

For any given language problem, a perfect solution would be one which:

  • guarantees that all operations complete without crashing or corrupting the program state,
  • guarantees that all operations produce results according to consistent, reliable, and intuitive rules,
  • does not limit or require complex interactions with the remainder of the language, and
  • imposes no performance cost.

These goals are, however, not all simultaneously achievable, and different languages reach different balances. Swift's particular philosophy is as follows:

  • The language should be as dynamically safe as possible. Straightforward uses of ordinary language features may cause dynamic failure, but they should never corrupt the program state. Any unsafe language or library features (other than simply calling into C code) should be explicitly labeled as unsafe.

    A dynamic failure should mean that the program reliably halts, ideally with a message clearly describing the source of the failure. In the future, the language may allow for emergency recovery from such failures.

  • The language should sit on top of C, relying only on a relatively unobtrusive runtime. Accordingly, the language's interactions with C-based technologies should be efficient and obvious.
  • The language should allow a static compiler to produce efficient code without dynamic instrumentation. Accordingly, static analysis should only be blocked by incomplete information when the code uses an obviously abstract language feature (such as calling a class method or an unknown function), and the language should provide tools to allow programmers to limit such cases.

    (Dynamic instrumentation can, of course, still help, but it shouldn't be required for excellent performance.)

General solutions

A language generally has six tools for dealing with code it considers undesirable. Some of this terminology is taken from existing standards; some is not.

  • The language may nonetheless take steps to ensure that the code executes with a reliable result. Such code is said to have guaranteed behavior.
  • The language may report the code as erroneous before it executes. Such code is said to be ill formed.
  • The language may reliably report the code as having performed an illegal operation when it executes. Such code is said to be asserting or aborting.
  • The language may allow the code to produce an arbitrary-but-sound result. Such code is said to have unspecified behavior or to have produced an unspecified value.
  • The language may allow the code to produce an unsound result which will result in another of these behaviors, but only if used. Such code is said to have produced a trap value.
  • The language may declare the code to be completely outside of the guarantees of the language. Such code is said to have undefined behavior.

In keeping with its design philosophy, Swift has generally limited itself to the first four solutions, with two significant exceptions.

The first exception is that Swift provides several explicitly unsafe language and library features, such as UnsafePointer<T> and unowned(unsafe). The use of these features is generally subject to undefined behavior rules.

The second exception is that Swift does not make any guarantees about programs in the presence of race conditions. It is extremely difficult to make even weak statements about the behavior of a program with a race condition without either:

  • heavily restricting shared mutable state on a language level, which would require invasive changes to how the language interacts with C;
  • forcing implicit synchronization when making any change to potentially shared memory, which would cripple performance and greatly complicate library implementation; or
  • using a garbage collector to manage all accessible memory, which would impose a very large burden on almost all of Swift's language goals.

Therefore, Swift does surrender safety in the presence of races.

Acceptability conditions for storage accesses

Storage access involves a tension between four goals:

  • Preserving all changes when making simultaneous modifications to distinct subobjects; in other words, avoiding subobject clobbering
  • Performing a predictable and intuitive sequence of operations when modifying storage that's implemented with a getter and setter
  • Avoiding unnecessary copies of a value during a modification, especially when this forces a structural copy of a COW value
  • Avoiding memory safety holes when accessing storage that's been implemented with memory.

Reprojection is good at preserving changes, but it introduces extra copies, and it's less intuitive about how many times getters and setters will be called. There may be a place for it anyway, if we're willing to accept the extra conceptual complexity for computed storage, but it's not a reasonable primary basis for optimizing the performance of storage backed by memory.

Solutions permitting in-place modification are more efficient, but they do have the inherent disadvantage of having to vend the address of a value before arbitrary interleaving code. Even if the address remains valid, and the solution that keeps it valid avoids subobject clobbering, there's an unavoidable issue: the write can be lost because the address has become dissociated from the storage. For example, if your code passes &array[i] to a function, you might plausibly argue that changes to that argument should show up in the ith element of array even if you completely reassign array. Reprojection could make this work, but in-place solutions cannot efficiently do so. So, for any in-place solution to be acceptable, there does need to be some rule specifying when it's okay to "lose track" of a change.

Furthermore, the basic behavior of COW means that it's possible to copy an array with an element under modification and end up sharing the same buffer, so that the modification will be reflected in a value that was technically copied beforehand. For example:

var array = [1,2,3]
var oldArray : [Int] = []

// This function copies array before modifying it, but because that
// copy is of a value undergoing modification, the copy will use
// the same buffer and therefore observe updates to the element.
func foo(_ element: inout Int) {
  oldArray = array
  element = 4
}

// Therefore, oldArray[2] will be 4 after this call.
foo(&array[2])

Nor can this be fixed by temporarily moving the modified array aside, because that would prevent simultaneous modifications to different elements (and, in fact, likely cause them to assert). So the rule will also have to allow this.

However, both of these possibilities already come up in the design of both the library and the optimizer. The optimizer makes a number of assumptions about aliasing; for example, the general rule is that storage bound to an inout parameter cannot be accessed through other paths, and while the optimizer is not permitted to compromise memory safety, it is permitted to introduce exactly this kind of unexpected behavior, where aliasing accesses may or may not observe the storage as a consistent entity.

Formal accesses

That rule leads to an interesting generalization. Every modification of storage occurs during a formal access (FA) to that storage. An FA is also associated with zero or more designated storage names (DSNs), which are inout arguments in particular execution records. An FA arises from an l-value expression, and its duration and DSN set depend on how the l-value is used:

  • An l-value which is simply loaded from creates an instantaneous FA at the time of the load. The DSN set is empty.

    Example:

    foo(array)
    // instantaneous FA reading array
  • An l-value which is assigned to with = creates an instantaneous FA at the time of the primitive assignment. The DSN set is empty.

    Example:

    array = []
    // instantaneous FA assigning array

    Note that the primitive assignment strictly follows the evaluation of both the l-value and r-value expressions of the assignment. For example, the following code:

    // object is a variable of class type
    object.array = object.array + [1,2,3]

    produces this sequence of formal accesses:

    // instantaneous FA reading object (in the left-hand side)
    // instantaneous FA reading object (in the right-hand side)
    // instantaneous FA reading object.array (in the right-hand side)
    // evaluation of [1,2,3]
    // evaluation of +
    // instantaneous FA assigning object.array
  • An l-value which is passed as an inout argument to a call creates an FA beginning immediately before the call and ending immediately after the call. (This includes calls where an argument is implicitly passed inout, such as to mutating methods or user-defined assignment operators such as += or ++.) The DSN set contains the inout argument within the call.

    Example:

    func swap<T>(_ lhs: inout T, _ rhs: inout T) {}
    
    // object is a variable of class type
    swap(&leftObject.array, &rightObject.array)

    This results in the following sequence of formal accesses:

    // instantaneous FA reading leftObject
    // instantaneous FA reading rightObject
    // begin FA for inout argument leftObject.array (DSN={lhs})
    // begin FA for inout argument rightObject.array (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument rightObject.array
    // end FA for inout argument leftObject.array
  • An l-value which is used as the base of a member storage access begins an FA whose duration is the same as the duration of the FA for the subobject l-value. The DSN set is empty.

    Example:

    swap(&leftObject.array[i], &rightObject.array[j])

    This results in the following sequence of formal accesses:

    // instantaneous FA reading leftObject
    // instantaneous FA reading i
    // instantaneous FA reading rightObject
    // instantaneous FA reading j
    // begin FA for subobject base leftObject.array (DSN={})
    // begin FA for inout argument leftObject.array[i] (DSN={lhs})
    // begin FA for subobject base rightObject.array (DSN={})
    // begin FA for inout argument rightObject.array[j] (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument rightObject.array[j]
    // end FA for subobject base rightObject.array
    // end FA for inout argument leftObject.array[i]
    // end FA for subobject base leftObject.array
  • An l-value which is the base of an ! operator begins an FA whose duration is the same as the duration of the FA for the resulting l-value. The DSN set is empty.

    Example:

    // left is a variable of type T
    // right is a variable of type T?
    swap(&left, &right!)

    This results in the following sequence of formal accesses:

    // begin FA for inout argument left (DSN={lhs})
    // begin FA for ! operand right (DSN={})
    // begin FA for inout argument right! (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument right!
    // end FA for ! operand right
    // end FA for inout argument left
  • An l-value which is the base of a ? operator begins an FA whose duration begins during the formal evaluation of the l-value and ends either immediately (if the operand was nil) or at the end of the duration of the FA for the resulting l-value. In either case, the DSN set is empty.

    Example:

    // left is a variable of optional struct type
    // right is a variable of type Int
    left?.member += right

    This results in the following sequence of formal accesses, assuming that left contains a value:

    // begin FA for ? operand left (DSN={})
    // instantaneous FA reading right (DSN={})
    // begin FA for inout argument left?.member (DSN={lhs})
    // evaluation of +=
    // end FA for inout argument left?.member
    // end FA for ? operand left

The FAs for all inout arguments to a call begin simultaneously at a point strictly following the evaluation of all the argument expressions. For example, in the call foo(&array, array), the evaluation of the second argument produces a defined value, because the FA for the first argument does not begin until after all the arguments are formally evaluated. No code should actually be emitted during the formal evaluation of &array, but for an expression like someClassReference.someArray[i], the class r-value and index expressions would be fully evaluated at that time, and then the l-value would be kept abstract until the FA begins. Note that this requires changes in SILGen's current code generation patterns.
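A hedged illustration of that ordering (the function and variable names are mine): the by-value argument is evaluated before the FA for the inout argument begins, so it sees the unmodified value:

var array = [1, 2, 3]

func foo(_ target: inout [Int], _ snapshot: [Int]) {
    target.append(4)
    print(snapshot)   // prints [1, 2, 3]: copied before the inout FA began
}

foo(&array, array)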

The FA rule for the chaining operator ? is an exception to the otherwise-simple intuition that formal accesses begin immediately before the modification begins. This is necessary because the evaluation rules for ? may cause arbitrary computation to be short-circuited, and therefore the operand must be accessed during the formal evaluation of the l-value. There were three options here:

  • Abandon short-circuiting for assignments to optional l-values. This is a very high price; short-circuiting fits into user intuitions about the behavior of the chaining operator, and it can actually be quite awkward to replicate with explicit accesses.
  • Short-circuit using an instantaneous formal access, then start a separate formal access before the actual modification. In other words, evaluation of X += Y would proceed by first determining whether X exists (capturing the results of any r-value components), discarding any projection information derived from that, evaluating Y, reprojecting X again (using the saved r-value components and checking again for whether the l-value exists), and finally calling the += operator.

    If X involves any sort of computed storage, the steps required to evaluate this might be... counter-intuitive.

  • Allow the formal access to begin during the formal evaluation of the l-value. This means that code like the following will have unspecified behavior:

    array[i]?.member = deriveNewValueFrom(array)

In the end, I've gone with the third option. The intuitive explanation is that array has to be accessed early in order to continue the evaluation of the l-value. I think that's comprehensible to users, even if it's not immediately obvious.

Disjoint and non-disjoint formal accesses

I'm almost ready to state the core rule about formal accesses, but first I need to build up a few more definitions.

An abstract storage location (ASL) is:

  • a global variable declaration;
  • an inout parameter declaration, along with a reference to a specific execution record for that function;
  • a local variable declaration, along with a reference to a specific execution record for that declaration statement;
  • a static/class property declaration, along with a type having that property;
  • a struct/enum instance property declaration, along with an ASL for the base;
  • a struct/enum subscript declaration, along with a concrete index value and an ASL for the base;
  • a class instance property declaration, along with an instance of that class; or
  • a class instance subscript declaration, along with a concrete index value and an instance of that class.

Two abstract storage locations may be said to overlap. Overlap corresponds to the imprecise intuition that a modification of one location directly alters the value of another location. Overlap is an "open" property of the language: every new declaration may introduce its own overlap behavior. However, the language and library make certain assertions about the overlap of some locations:

  • An inout parameter declaration overlaps exactly the set of ASLs overlapped by the ASL which was passed as an argument.
  • If two ASLs are both implemented with memory, then they overlap only if they have the same kind in the above list and the corresponding data match:

    • execution records must represent the same execution
    • types must be the same
    • class instances must be the same
    • base ASLs must overlap
  • For the purposes of the above rule, the subscript of a standard library array type is implemented with memory, and the two indexes match if they have the same integer value.
  • For the purposes of the above rule, the subscript of a standard library dictionary type is implemented with memory, and the two indexes match if they compare equal with ==.

Because this definition is open, it is impossible to decide it completely, either statically or dynamically. However, it would still be possible to write a dynamic analysis which decides it for common location kinds. Such a tool would be useful as part of, say, an ASan-like dynamic tool to diagnose violations of the unspecified-behavior rule below.

The overlap rule is vague about computed storage partly because computed storage can have non-obvious aliasing behavior and partly because the subobject clobbering caused by the full-value accesses required by computed storage can introduce unexpected results that can be reasonably glossed as unspecified values.

This notion of abstract storage location overlap can be applied to formal accesses as well. Two FAs x and y are said to be disjoint if:

  • they refer to non-overlapping abstract storage locations or
  • they are the base FAs of two disjoint member storage accesses x.a and y.b.

Given these definitions, the core unspecified-behavior rule is:

If two non-disjoint FAs have intersecting durations, and neither FA is derived from a DSN for the other, then the program has unspecified behavior in the following way: if the second FA is a load, it yields an unspecified value; otherwise, both FAs store an unspecified value in the storage.

Note that you cannot have two loads with intersecting durations, because the FAs for loads are instantaneous.

Non-overlapping subobject accesses make the base accesses disjoint; otherwise, code like swap(&a[0], &a[1]) would have unspecified behavior, since the two base FAs refer to clearly overlapping locations and have intersecting durations.

Note that the optimizer's aliasing rule falls out from this rule. If storage has been bound as an inout argument, accesses to it through any path not derived from the inout argument will start a new FA for overlapping storage, the duration of which will necessarily intersect with that of the FA through which the inout argument was bound, causing unspecified behavior. If the inout argument is forwarded to another call, that will start a new FA which is validly based on a DSN of the first; but an attempt to modify the storage through the first inout argument while the second call is active will create a third FA not based on the DSN from the second inout call, causing a conflict there. Therefore a function may assume that it can see all accesses to the storage bound to an inout argument.

If you didn't catch all that...

That may have been a somewhat intense description, so here's a simple summary of the rule being proposed.

If storage is passed to an inout argument, then any other simultaneous attempt to read or write to that storage, including to the storage containing it, will have unspecified behavior. Reads from it may see partially-updated values, or even values which will change as modifications are made to the original storage; and writes may be clobbered or simply disappear.

But this only applies during the call with the inout argument: the evaluation of other arguments to the call will not be interfered with, and as soon as the call ends, all these modifications will resolve back to a quiescent state.

And this unspecified behavior has limits. The storage may end up with an unexpected value, with only a subset of the writes made to it, and copies from it may unexpectedly reflect modifications made after they were copied. However, the program will otherwise remain in a consistent and uncorrupted state. This means that execution will be able to continue apace as long as these unexpected values don't trip up some higher-level invariant.

Tracking formal accesses

Okay, now that I've analyzed this to death, it's time to make a concrete proposal about the implementation.

As discussed above, the safety hole with addressors can be fixed by always retaining the buffer which keeps the address valid. Assuming that other uses of the buffer follow the general copy-on-write pattern, this retain will prevent structural changes to the buffer while the address is in use.

But, as I also discussed above, this introduces two problems:

Copies during modification

Copying a COW aggregate value always shares the same buffer that was stored there at the time of the copy; there is no uniqueness check done as part of the copy. Changes to subobjects will then be instantly reflected in the "copy" as they are made to the original. The structure of the copy will stay the same, but the values of its subobjects will appear to spontaneously change.

I want to say that this behavior is acceptable according to the formal-access rule I laid out above. How does that reasoning work?

First, I need to establish what kind of behavior is at work here. It clearly isn't guaranteed behavior: copies of COW values are normally expected to be independent. The code wasn't rejected by the compiler, nor did it dynamically assert; it simply seems to misbehave. But there are limits to the misbehavior:

  • By general COW rules, there's no way to change the structure of an existing buffer unless the retain count is 1. For the purposes of this analysis, that means that, as long as the retain count is above 1, there's no way to invalidate the address returned by the addressor.
  • The buffer will be retained for as long as the returned address is being modified. This retain is independent of any storage which might hold the aggregate value (and thus also retain the buffer).
  • Because of this retain, the only way for the retain count to drop to 1 is for no storage to continue to refer to the buffer.
  • But if no storage refers to the buffer, there is no way to initiate an operation which would change the buffer structure.

Thus the address will remain valid, and there's no danger of memory corruption. The only thing is that the program no longer makes useful guarantees about the value of the copied aggregate. In other words, the copy yielded an unspecified value.

The formal-access rule allows loads from storage to yield an unspecified value if there's another formal access to that storage in play and the load is (1) not from an l-value derived from a name in the other FA's DSN set and (2) not from a non-overlapping subobject. Are these conditions true?

Recall that an addressor is invoked for an l-value of the form:

base.pointee

or:

base[i]

Both cases involve a formal access to the storage base as the base of a subobject formal access. This kind of formal access always has an empty DSN set, regardless of how the subobject is used. A COW mutable addressor will always ensure that the buffer is uniquely referenced before returning, so the only way that a value containing that buffer can be copied is if the load is a non-subobject access to base. Therefore, there are two simultaneous formal accesses to the same storage, and the load is not from an l-value derived from the modification's DSN set (which is empty), nor is it for a non-overlapping subobject. So the formal-access rule applies, and an unspecified value is an acceptable result.

The implementation requirement here, then, is simply that the addressor must be called, and the buffer retained, within the duration of the formal access. In other words, the addressor must only be called immediately prior to the call, rather than at the time of the formal evaluation of the l-value expression.

What would happen if there were a simultaneous load from a non-overlapping subobject? Accessing the subobject might cause a brief copy of base, but only for the duration of copying the subobject. If the subobject does not overlap the subobject which was projected out for the addressor, then this is harmless, because the addressor will not allow modifications to those subobjects; there might be other simultaneous formal accesses which do conflict, but these two do not. If the subobject does overlap, then a recursive analysis must be applied; but note that the exception to the formal-access rule will only apply if non-overlapping subobjects were projected out from both formal accesses. Otherwise, it will be acceptable for the access to the overlapping subobject to yield an unspecified value.

Avoiding subobject clobbering during parallel modification

The other problem is that the retain will prevent simultaneous changes to the same buffer. The second change will cause a structural copy, and the first address will end up modifying a buffer which is no longer referenced: in other words, the program will observe subobject clobbering. A similar analysis to the one from the last section suggests that this can be described as unspecified behavior.

Unfortunately, this unspecified behavior is unwanted: it violates the guarantees of the formal-access rule as I laid it out above, because it occurs even if you have formal accesses to two non-overlapping subobjects. So something does need to be done here.

One simple answer is to dynamically track whether a COW buffer is currently undergoing a non-structural mutation. I'll call this NSM tracking, and I'll call buffers which are undergoing non-structural mutations NSM-active.

The general rules of COW say that mutating operations must ensure that their buffer is uniquely referenced before performing the modification. NSM tracking works by having non-structural mutations perform a weaker check: the buffer must be either uniquely referenced or be NSM-active. If the non-structural mutation allows arbitrary code to run between the start of the mutation and the end --- as an addressor does --- it must both retain the buffer and flag it as NSM-active for the entire duration.

Because the retain still occurs, and because any structural changes to the buffer that might invalidate the addresses of subobjects are still blocked by that retain, all of the earlier analysis about the memory safety of simultaneous accesses still applies. The only change is that simultaneous non-structural modifications, as would be created by simultaneous formal accesses to subobjects, will now be able to occur on a single buffer.

A set of simultaneous formal accesses on a single thread follows a natural stack protocol, or can be made to do so with straightforward SILGen and SIL optimizer consideration. Therefore, the runtime can track whether a buffer is NSM-active on a thread using a single bit, which nested modifications can be told not to clear. Call this the NSM bit. Ignoring multithreading considerations for a moment, since the NSM bit is only ever set at the same time as a retain and only ever cleared at the same time as a release, it makes sense to pack this into the strong reference count. There is no need to support this operation on non-Swift objects. The runtime should provide three new functions:

  • A function to test whether an object is either uniquely referenced or NSM-active. Call this swift_isUniquelyReferencedForNSM.
  • A function to perform the above test and, if the test passes and the NSM bit is not set, atomically retain the object and set the NSM bit. It should return both the result of the test and an object to later set as NSM-inactive. That object will be nil if the test failed or the NSM bit was already set. Call this swift_tryRetainForNSM.
  • A function to atomically clear the NSM bit and release the object. Call this swift_releaseForNSM.

These operations should also be reflected in SIL.
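As a hedged, purely illustrative model of the semantics those three entry points would implement (the real functions would live in the runtime, operate atomically on the strong reference count, and pack the NSM bit into it; the class and fields below are stand-ins):

final class BufferState {
    var strongReferenceCount = 1
    var isNSMActive = false    // the proposed NSM bit
}

// swift_isUniquelyReferencedForNSM: the weaker check used by
// non-structural mutations.
func isUniquelyReferencedForNSM(_ state: BufferState) -> Bool {
    return state.strongReferenceCount == 1 || state.isNSMActive
}

// swift_tryRetainForNSM: perform the check and, if it passes and the bit
// is not already set, retain and set the bit. The returned object is nil
// if the check failed or the bit was already set.
func tryRetainForNSM(_ state: BufferState) -> (passed: Bool, owner: BufferState?) {
    let passed = isUniquelyReferencedForNSM(state)
    guard passed, !state.isNSMActive else { return (passed, nil) }
    state.strongReferenceCount += 1
    state.isNSMActive = true
    return (true, state)
}

// swift_releaseForNSM: clear the bit and release.
func releaseForNSM(_ owner: BufferState) {
    owner.isNSMActive = false
    owner.strongReferenceCount -= 1
}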

Concurrent modifications and the non-structural modification bit

What about concurrency? Two concurrent non-structural modifications could race to set the NSM bit, and then the winning thread could clear it before the other thread's modification is complete. This could cause memory-unsafe behavior, since the losing thread would be modifying the object through an address while not retaining the value.

The major question here is whether this is a significant objection. It's accepted that race conditions have undefined behavior. Is such code inherently racy?

The answer appears to be "no", and that it is possible to write code which concurrently writes to existing non-overlapping elements of a COW aggregate without causing races; but that such code is extremely fraught, and moreover it is extremely fraught regardless of whether NSM-activeness is tracked with a single bit or a wider count. Consider:

  • If the shared aggregate value is ever non-uniquely referenced, two threads concurrently modifying it will race to unique the array. This unavoidably has undefined behavior, because uniquing the array requires the previous value to eventually be released, and a race may cause an over-release.
  • Assume that it's possible to guarantee that the aggregate value's buffer is uniquely referenced before any threads concurrently access it. Now, all of the threads are performing different concurrent accesses.
    • If any of the accesses is a structural modification, there will be a race to re-unique the buffer.
    • If all of the accesses are non-structural modifications, then there will be no races as long as the retain-and-set and release-and-clear operations are atomic: when starting any particular operation, the buffer will always either be uniquely referenced or have the bit set.
    • If any of the accesses is a read, and that read does not occur during a non-structural modification, then the buffer may briefly become non-uniquely referenced and there will be a race from concurrent modifications to re-unique it.
    • If any of the accesses is a read, and that read occurs during a non-structural modification, and the optimizer does not re-order the read's retain/release around the retainForNSM/releaseForNSM operations, then it matters how NSM-activeness is tracked.

      If there is complete tracking (i.e. a count, not just a single bit), the retain for the read will only occur while the buffer is flagged as NSM-active, and so it will have no effect.

      If there is incomplete tracking (i.e. just a single NSM bit), then there is a potential for undefined behavior. Suppose two threads race to set the NSM bit. The loser then initiates a read and retains the buffer. Before the loser releases the buffer, the winner clears the NSM bit. Now another thread might see that the buffer is non-uniquely referenced and not NSM-active, and so it will attempt to unique the buffer.

      It is probably unreasonable to require the optimizer to never reorder ordinary retains and releases past retainForNSM and releaseForNSM operations.

More importantly, the use case here (many threads concurrently accessing different elements of a shared data structure) just inherently doesn't really work well with a COW data structure. Even if the library were able to make enough guarantees to ensure that, with the right pattern of accesses, there would never be a structural copy of the aggregate, it would still be extremely inefficient, because all of the threads would be competing for atomic access to the strong reference count.

In short, I think it's reasonable for the library to say that programs which want to do this should always use a type with reference semantics. Therefore, it's reasonable to ignore concurrent accesses when deciding how to best track whether an aggregate is undergoing non-structural modification. This removes the only objection I can see to tracking this with a single NSM bit.

Code generation patterns

The signatures and access patterns for addressors will need to change in order to ensure memory-safety.

mutableAddress currently returns an UnsafeMutablePointer; it will need to return (Builtin.NativeObject?, UnsafeMutablePointer). The owner pointer must be a native object; we cannot efficiently support either uniqueness checking or the NSM bit on non-Swift objects. SILGen will mark that the address depends on the owner reference and push a cleanup to releaseForNSM it.

address currently returns an UnsafePointer; it will need to return (Builtin.NativeObject?, UnsafePointer). I do not currently see a reason to allow non-Swift owners, but the model doesn't depend on that. SILGen will mark that the address depends on the owner reference and push a cleanup to release it.

In order to support ultimately calling an addressor in the conservative access path, materializeForSet must also return an owner reference. Since materializeForSet calls mutableAddress in this case, SILGen will follow that pattern for calls. SILGen will also assume that the need to perform a releaseForNSM is exclusive with the need to call the setter.
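A hedged sketch of the changed addressor result, with AnyObject? standing in for Builtin.NativeObject? (which is not expressible in ordinary Swift); the caller-side shape is shown in comments using the document's own pseudocode notation:

struct MutableAddressResult {
    var owner: AnyObject?                   // retained, NSM-active buffer, or nil
    var address: UnsafeMutablePointer<Int>  // address of the projected storage
}

// Illustrative caller-side shape for a mutation of array[i]:
//   let result = array.subscript.mutableAddress(i)  // retain owner, set NSM bit
//   ... modify result.address.pointee in place ...
//   releaseForNSM on result.owner                   // clear the bit, release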

Mutating operations on COW types will now have two different paths for making a buffer mutable and unique: one for structural mutations and another for non-structural mutations. I expect that this will require separate semantics annotations, and the optimizer will have to recognize both.

releaseForNSM operations will not be reorderable unless the optimizer can prove that the objects are distinct.

Summary of proposal and plan

Let me summarize what I'm proposing:

  • Swift's core approach to optimizing accesses should be based around providing direct access to memory, either statically or dynamically. In other words, Swift should adopt addressors on core data structures as much as possible.
  • Swift should fix the current memory hole with addressors by retaining for the duration of the access and, for modifications, flagging the buffer as NSM-active. The implementation plan follows:
    • The runtime implements the NSM-bit and its entrypoints.
    • SIL provides operations for manipulating and querying the NSM bit. IRGen implements these operations using the runtime functions. Builtins are exposed.
    • The standard library changes data structures to do different uniquing for structural and non-structural modifications. This patch is not yet committed.
    • The optimizer reacts to the above. When both are settled, they can be committed.
    • SILGen changes the emission patterns for l-values so that addresses and writebacks are live only during the formal access.
    • Sema changes the signature of address, mutableAddress, and materializeForSet to return an optional owner reference. Sema changes materializeForSet synthesis to return the owner correctly. SILGen implements the desired code patterns.

      The standard library changes its addressor implementations to continue to compile, but for staging purposes, it only uses nil owners.

    • The standard library changes addressor implementations to use meaningful owners. This patch is not yet committed.
    • The optimizer reacts to the above. When both are settled, they can be committed.