Permalink
Branch: master
Find file Copy path
647 lines (533 sloc) 27.7 KB

SIMD Vectors

Introduction

This proposal would expose a common subset of operations on the SIMD types supported by most processors in the standard library. It is based on Apple's <simd/simd.h> module, which is used throughout Apple's platforms as the common currency type for fixed-size vectors and matrices. It is not a complete re-implementation; rather it provides the low-level support needed to import any such library, and tries to make a number of things much nicer in Swift than they are in C or C++.

Preliminary Swift-evolution discussion.

Motivation

Task 1: SIMD programming

Essentially every modern CPU has support for SIMD ("Single Instruction, Multiple Data") instructions in hardware. Without getting into a long discussion of the architectural details, effective use of these instructions allows 2-10x better performance than is otherwise possible for a large class of data-parallel problems, without incurring the synchronization and data-movement hassles of working with the GPU.

Historically, there have been a number of obstacles to taking advantage of this hardware. Four programming models have been commonly used, all of which the author has considerable experience with:

  • Assembly: this has the advantage that you get exactly the code you want. It has numerous disadvantages--another language to learn, requiring either macro soup or separate implementations for every target (in effectively a different language for each), there are few good learning resources, and you have to slog through all the tedious things that the compiler normally does for you, like allocating registers and getting the calling conventions right (or wrong, in subtle ways that bite your users many years later).
  • Intrinsics: The model historically pushed by hardware vendors. Each architecture has its own set of C "intrinsic" types like __m128 (x86) or int8x8_t (ARM). These are superficially nicer than assembly, but in practice incur nearly all of the downsides, plus a few additional ones. The biggest problem is that these types are often bolted awkwardly onto the language, and so are incompatible with fundamental language or runtime assumptions about size or alignment. Innumerable bugs have been created by people attempting to use vector intrinsic types together with C++ containers, for example. These types move your implementation into a portable language (C or C++), and then immediately remove that portability by being tied to a specific architecture.
  • "Vector class" libraries: Agner's is the most well-known. What these generally bring to the table is support for familiar operators rather than arcane intrinsics (e.g. a + b instead of _mm_addps(a, b)). Most are narrowly focused on a single architecture (almost always x86, so they still don't provide real portability). Apple's <simd/simd.h> is similar to these, but with full support for every architecture that Apple uses, which in practice makes code written against it fairly portable.
  • Autovectorization: works well for simple problems, but no one has yet demonstrated a satisfactory solution for more general tasks. One of several problems is that the underlying machine model assumed by vector code is fundamentally distinct from the underlying machine model that most scalar code is written against, forcing an autovectorizing compiler to map between the two. Historically this was largely done via an ad-hoc set of transformations, which never really worked all that well. Newer approaches show some promise, but explicit manual vectorization is still frequently needed.

A major goal of these new data types in Swift is to provide a better API for vector programming. We want to capture the cross-platform niceties of the <simd/simd.h> module, but also add some new features that were difficult or impossible to do in C; things like enabling generic programming, but also things as simple as a native way to express vector permutations and unaligned loads, or even just conversions between different vector types with the same number of elements (this requires a function call rather than a cast with most vector libraries).

Looking ahead, I expect that we will use these primitives to expose things like iterating over vectors extracted from a collection of scalars, to make explicitly vectorized code much more approachable.

Task 2: geometry primitives

There is a large class of computational tasks (graphics and animation, image processing, AR, VR, computer vision) that want to have 2, 3, and 4 dimensional vector and matrix types. For these applications, these types are just as fundamental as Int and Array are for "normal" programming--they are the foundation upon which everything else is constructed.

These tasks require both elementwise operations, as well as some operations on types as abstract vectors--things like the dot and cross products, vector length, and orientation tests.

Task 3: GPU data structures

Closely related to part 2, short vectors are also essential data types for representing GPU data structures. It's frequently necessary to handle these on the CPU side do do pre/post processing, or data marshalling, and exposing these types in Swift enables that task.

Putting it together

Superficially, the only thing that these tasks have in common is that the fundamental types are "homogeneous aggregates", and that they want to benefit from the SIMD hardware that is available. Two or even three sets of types may seem more appropriate. However, our experience with clients of <simd/simd.h> is that all of our clients use a diverse subset of operations from the module, and that it's difficult to draw clear boundaries of what belongs where. While it may be reasonable to refine the underlying protocols, the types should probably remain unified.

Looking ahead, we would like to enable more sophisticated storage layouts and transforms to and from them, such as SoA - AoS conversion (a fancy way of saying "matrix transpose" or "interleave / deinterleave"; these are all the same thing). Once you start thinking about such transforms, the distinction between these types really goes out the window, because you want to map between things like 16 vectors of 3 floats and 3 vectors of 16 floats.

Proposed solution

This proposal adds types like SIMD2<Int>, SIMD16<UInt16> etc. to the language, with operations whose semantics map to SIMD types available on most architectures. The full set of types are SIMD2<T>, SIMD3<T>, SIMD4<T>, SIMD8<T>, SIMD16<T>, SIMD32<T>, and SIMD64<T>, where T is any type that conforms to the SIMDScalar protocol (discussed in detail below). All of the standard library integer and floating-point types (except for Float80) conform to SIMDScalar.

Detailed design

To support SIMD vectors, a type conforms to the SIMDScalar protocol:

public protocol SIMDScalar {
  associatedtype SIMDMaskScalar : SIMDScalar & FixedWidthInteger & SignedInteger
  associatedtype SIMD2Storage : SIMDStorage where SIMD2Storage.Scalar == Self
  associatedtype SIMD4Storage : SIMDStorage where SIMD4Storage.Scalar == Self
  associatedtype SIMD8Storage : SIMDStorage where SIMD8Storage.Scalar == Self
  associatedtype SIMD16Storage : SIMDStorage where SIMD16Storage.Scalar == Self
  associatedtype SIMD32Storage : SIMDStorage where SIMD32Storage.Scalar == Self
  associatedtype SIMD64Storage : SIMDStorage where SIMD64Storage.Scalar == Self
}

Let's leave aside SIMDMaskScalar for now, and consider the other six associatedtype requirements. They all have the same form; a type that provides storage for a vector of the designated size with the correct scalar type.

The SIMDStorage protocol is extremely simple:

public protocol SIMDStorage {
  /// The type of scalars in the vector space.
  associatedtype Scalar : Hashable

  /// The number of elements in the vector.
  var scalarCount: Int { get }

  /// A vector with zero or the default value in all lanes.
  init()

  /// Element access to the vector.
  subscript(index: Int) -> Scalar { get set }
}

This is the bare minimum functionality upon which everything else is built; the expectation is that types conforming to this protocol will wrap LLVM Builtin vector types, but this is not necessary. Users can conform types to this protocol with any backing storage that can provide the necessary operations, which allows them to benefit from the semantics (and many operations will be autovectorized by the compiler in most cases, even without explicit vector types as storage).

The SIMD protocol refines SIMDStorage, adding some additional requirements:

public protocol SIMD : SIMDStorage, Hashable, CustomStringConvertible {
  associatedtype MaskStorage : SIMD
    where MaskStorage.Scalar : FixedWidthInteger & SignedInteger
}

Let's discuss Masks. SIMDs are Equatable, so they have the == and != operators, but they also provide the "pointwise comparison" .== and .!= operators, which compare the lanes of two vectors, and produce a Mask, which is a vector of boolean values. Each lane of the mask is either true or false, depending on the result of comparing the values in the corresponding lanes. An example:

(swift) let x = SIMD4<Int>(1,2,3,4)
// x : SIMD4<Int> = SIMD4<Int>(1, 2, 3, 4)
(swift) let y = SIMD4<Int>(3,2,1,0)
// y : SIMD4<Int> = SIMD4<Int>(3, 2, 1, 0)
(swift) x .== y
// r0 : SIMDMask<SIMD4<Int.SIMDMaskScalar>> = SIMDMask<SIMD4<Int>>(false, true, false, false)

here, the second lane is true, because 2 == 2, while all other lanes are false because the elements of x and y in those lanes are not equal.

Because of some technical details of how these comparisons happen in SIMD hardware on CPUs, it's often advantageous to store these vectors-of-bools as vectors-of-ints whose width matches that of the vectors being compared. Because of this, there isn't a single SIMD4<Bool> type, but rather a whole family of SIMDMask<SIMD4<IntN>> types. The MaskStorage associatedtype on SIMDScalar allows us to get to the desired mask type from SIMD4<Scalar>.

The concrete SIMD2<T>, SIMD3<T>, etc types conform to SIMD; and almost all of their operations are provided by extensions on the protocol:

public extension SIMD {
  /// The valid indices for subscripting the vector.
  var indices: Range<Int> { get }

  /// A vector with value in all lanes.
  init(repeating value: Scalar)

  /// Conformance to Equatable
  static func ==(lhs: Self, rhs: Self) -> Bool

  /// Conformance to Hashable
  func hash(into hasher: inout Hasher)
  
  /// Conformance to CustomStringConvertible
  var description: String
  
  /// Pointwise equality and inequality
  static func .==(lhs: Self, rhs: Self) -> SIMDMask<MaskStorage>
  static func .!=(lhs: Self, rhs: Self) -> SIMDMask<MaskStorage>
  static func .==(lhs: Scalar, rhs: Self) -> SIMDMask<MaskStorage>
  static func .!=(lhs: Scalar, rhs: Self) -> SIMDMask<MaskStorage>
  static func .==(lhs: Self, rhs: Scalar) -> SIMDMask<MaskStorage>
  static func .!=(lhs: Self, rhs: Scalar) -> SIMDMask<MaskStorage>
  
  /// Replaces elements of this vector with `other` in the lanes where
  /// `mask` is `true`.
  mutating func replace(with other: Self, where mask: SIMDMask<MaskStorage>)
  mutating func replace(with other: Scalar, where mask: SIMDMask<MaskStorage>)
  func replacing(with other: Self, where mask: SIMDMask<MaskStorage>) -> Self
  func replacing(with other: Scalar, where mask: SIMDMask<MaskStorage>) -> Self
  
  /// Initialize from array literal
  ///
  /// Precondition: the array must have exactly scalarCount elements.
  init(arrayLiteral elements: Scalar...)
  
  /// Initialize from sequence
  ///
  /// Precondition: the sequence must have exactly scalarCount elements.
  init<S: Sequence>(_ elements: S) where S.Element == Scalar
}

public extension SIMD where Scalar : Comparable {
  /// Pointwise ordered comparisons
  static func .<(lhs: Self, rhs: Self) -> SIMDMask<MaskStorage>
  static func .<=(lhs: Self, rhs: Self) -> SIMDMask<MaskStorage> 
  static func .>=(lhs: Self, rhs: Self) -> SIMDMask<MaskStorage>
  static func .>(lhs: Self, rhs: Self) -> SIMDMask<MaskStorage> 
  
  static func .<(lhs: Scalar, rhs: Self) -> SIMDMask<MaskStorage>
  static func .<=(lhs: Scalar, rhs: Self) -> SIMDMask<MaskStorage> 
  static func .>=(lhs: Scalar, rhs: Self) -> SIMDMask<MaskStorage>
  static func .>(lhs: Scalar, rhs: Self) -> SIMDMask<MaskStorage> 
  
  static func .<(lhs: Self, rhs: Scalar) -> SIMDMask<MaskStorage>
  static func .<=(lhs: Self, rhs: Scalar) -> SIMDMask<MaskStorage> 
  static func .>=(lhs: Self, rhs: Scalar) -> SIMDMask<MaskStorage>
  static func .>(lhs: Self, rhs: Scalar) -> SIMDMask<MaskStorage> 
}

public extension SIMDMask {
  /// Boolean operations on Masks; these use the .-prefixed form because
  /// they do not short-circuit like `&&` and `||`, but the normal
  /// boolean operators have the wrong precedence for the typical use
  /// of these operators.
  static prefix func .!(rhs: SIMDMask) -> SIMDMask
  
  static func .&(lhs: SIMDMask, rhs: SIMDMask) -> SIMDMask
  static func .^(lhs: SIMDMask, rhs: SIMDMask) -> SIMDMask
  static func .|(lhs: SIMDMask, rhs: SIMDMask) -> SIMDMask
  
  static func .&(lhs: Bool, rhs: SIMDMask) -> SIMDMask
  static func .^(lhs: Bool, rhs: SIMDMask) -> SIMDMask
  static func .|(lhs: Bool, rhs: SIMDMask) -> SIMDMask
  
  static func .&(lhs: SIMDMask, rhs: Bool) -> SIMDMask
  static func .^(lhs: SIMDMask, rhs: Bool) -> SIMDMask
  static func .|(lhs: SIMDMask, rhs: Bool) -> SIMDMask
  
  static func .&=(lhs: inout SIMDMask, rhs: SIMDMask)
  static func .^=(lhs: inout SIMDMask, rhs: SIMDMask)
  static func .|=(lhs: inout SIMDMask, rhs: SIMDMask)
  
  static func .&=(lhs: inout SIMDMask, rhs: Bool)
  static func .^=(lhs: inout SIMDMask, rhs: Bool)
  static func .|=(lhs: inout SIMDMask, rhs: Bool)
  
  static func random<T: RandomNumberGenerator>(using generator: inout T) -> SIMDMask
  static func random() -> SIMDMask
}

public extension SIMD where Scalar : FixedWidthInteger {

  static var zero: Self
  var leadingZeroBitCount: Self
  var trailingZeroBitCount: Self
  var nonzeroBitCount: Self
  
  static prefix func ~(rhs: Self) -> Self
  static func &(lhs: Self, rhs: Self) -> Self
  static func ^(lhs: Self, rhs: Self) -> Self
  static func |(lhs: Self, rhs: Self) -> Self
  static func &<<(lhs: Self, rhs: Self) -> Self
  static func &>>(lhs: Self, rhs: Self) -> Self
  static func &+(lhs: Self, rhs: Self) -> Self
  static func &-(lhs: Self, rhs: Self) -> Self
  static func &*(lhs: Self, rhs: Self) -> Self
  static func /(lhs: Self, rhs: Self) -> Self
  static func %(lhs: Self, rhs: Self) -> Self
  
  static func &(lhs: Scalar, rhs: Self) -> Self
  static func ^(lhs: Scalar, rhs: Self) -> Self
  static func |(lhs: Scalar, rhs: Self) -> Self
  static func &<<(lhs: Scalar, rhs: Self) -> Self
  static func &>>(lhs: Scalar, rhs: Self) -> Self
  static func &+(lhs: Scalar, rhs: Self) -> Self 
  static func &-(lhs: Scalar, rhs: Self) -> Self 
  static func &*(lhs: Scalar, rhs: Self) -> Self 
  static func /(lhs: Scalar, rhs: Self) -> Self
  static func %(lhs: Scalar, rhs: Self) -> Self
  
  static func &(lhs: Self, rhs: Scalar) -> Self
  static func ^(lhs: Self, rhs: Scalar) -> Self
  static func |(lhs: Self, rhs: Scalar) -> Self
  static func &<<(lhs: Self, rhs: Scalar) -> Self
  static func &>>(lhs: Self, rhs: Scalar) -> Self
  static func &+(lhs: Self, rhs: Scalar) -> Self 
  static func &-(lhs: Self, rhs: Scalar) -> Self 
  static func &*(lhs: Self, rhs: Scalar) -> Self 
  static func /(lhs: Self, rhs: Scalar) -> Self
  static func %(lhs: Self, rhs: Scalar) -> Self
  
  static func &=(lhs: inout Self, rhs: Self)
  static func ^=(lhs: inout Self, rhs: Self)
  static func |=(lhs: inout Self, rhs: Self)
  static func &<<=(lhs: inout Self, rhs: Self)
  static func &>>=(lhs: inout Self, rhs: Self)
  static func &+=(lhs: inout Self, rhs: Self) 
  static func &-=(lhs: inout Self, rhs: Self) 
  static func &*=(lhs: inout Self, rhs: Self) 
  static func /=(lhs: inout Self, rhs: Self)
  static func %=(lhs: inout Self, rhs: Self)
  
  static func &=(lhs: inout Self, rhs: Scalar) 
  static func ^=(lhs: inout Self, rhs: Scalar) 
  static func |=(lhs: inout Self, rhs: Scalar) 
  static func &<<=(lhs: inout Self, rhs: Scalar)
  static func &>>=(lhs: inout Self, rhs: Scalar)
  static func &+=(lhs: inout Self, rhs: Scalar) 
  static func &-=(lhs: inout Self, rhs: Scalar) 
  static func &*=(lhs: inout Self, rhs: Scalar) 
  static func /=(lhs: inout Self, rhs: Scalar)
  static func %=(lhs: inout Self, rhs: Scalar)
  
  static func random<T: RandomNumberGenerator>(
    in range: Range<Scalar>,
    using generator: inout T
  ) -> Self
  
  static func random(in range: Range<Scalar>) -> Self
  
  static func random<T: RandomNumberGenerator>(
    in range: ClosedRange<Scalar>,
    using generator: inout T
  ) -> Self
  
  static func random(in range: ClosedRange<Scalar>) -> Self
}

public extension SIMD where Scalar : FloatingPoint {

  static var zero: Self

  static func +(lhs: Self, rhs: Self) -> Self
  static func -(lhs: Self, rhs: Self) -> Self
  static func *(lhs: Self, rhs: Self) -> Self
  static func /(lhs: Self, rhs: Self) -> Self

  static func +(lhs: Scalar, rhs: Self) -> Self
  static func -(lhs: Scalar, rhs: Self) -> Self
  static func *(lhs: Scalar, rhs: Self) -> Self
  static func /(lhs: Scalar, rhs: Self) -> Self

  static func +(lhs: Self, rhs: Scalar) -> Self
  static func -(lhs: Self, rhs: Scalar) -> Self
  static func *(lhs: Self, rhs: Scalar) -> Self
  static func /(lhs: Self, rhs: Scalar) -> Self

  static func +=(lhs: inout Self, rhs: Self)
  static func -=(lhs: inout Self, rhs: Self)
  static func *=(lhs: inout Self, rhs: Self)
  static func /=(lhs: inout Self, rhs: Self)

  static func +=(lhs: inout Self, rhs: Scalar)
  static func -=(lhs: inout Self, rhs: Scalar)
  static func *=(lhs: inout Self, rhs: Scalar)
  static func /=(lhs: inout Self, rhs: Scalar)

  func addingProduct(_ lhs: Self, _ rhs: Self) -> Self
  func addingProduct(_ lhs: Scalar, _ rhs: Self) -> Self
  func addingProduct(_ lhs: Self, _ rhs: Scalar) -> Self
  mutating func addProduct(_ lhs: Self, _ rhs: Self)
  mutating func addProduct(_ lhs: Scalar, _ rhs: Self)
  mutating func addProduct(_ lhs: Self, _ rhs: Scalar)
  
  func squareRoot( ) -> Self
  mutating func formSquareRoot( )
  
  func rounded(_ rule: FloatingPointRoundingRule) -> Self
  mutating func round(_ rule: FloatingPointRoundingRule)
}

public extension SIMD where Scalar : BinaryFloatingPoint,
                   Scalar.RawSignificand : FixedWidthInteger {

  static func random<T: RandomNumberGenerator>(
    in range: Range<Scalar>,
    using generator: inout T
  ) -> Self
  
  static func random(in range: Range<Scalar>) -> Self

  static func random<T: RandomNumberGenerator>(
    in range: ClosedRange<Scalar>,
    using generator: inout T
  ) -> Self

  static func random(in range: ClosedRange<Scalar>) -> Self
}

As you can see, this is a fairly huge API.

These operations are mostly implemented in terms of each other; the base operations are implemented as for-loops over the elements of the vector. Most of these are already auto- vectorized by the compiler when optimization is enabled, but this will not be the full long- term solution. Instead, they will be annotated with @_semantics to allow them to be lowered to special SIL nodes and from there to the corresponding LLVM IR vector nodes so that we can guarantee vector codegen always occurs. This optimization work does not effect the semantics of the operations at the Swift language level, so it will happen over the next few months as bug fixes.

After all that, the actual machinery of the concrete types is pretty boring:

public struct SIMD4<Scalar> : SIMD
                 where Scalar : SIMDScalar {

  public var _storage: Scalar.SIMD4Storage

  public var scalarCount: Int { return 4 }

  public init() { _storage = Scalar.SIMD4Storage() }

  public subscript(index: Int) -> Scalar {
    get {
      _precondition(indices.contains(index))
      return _storage[index]
    }
    set {
      _precondition(indices.contains(index))
      _storage[index] = newValue
    }
  }

  public init(_ v0: Scalar, _ v1: Scalar, _ v2: Scalar, _ v3: Scalar) {
    self.init()
    self[0] = v0
    self[1] = v1
    self[2] = v2
    self[3] = v3
  }
  
  public init(x: Scalar, y: Scalar, z: Scalar, w: Scalar) {
    self.init(x, y, z, w)
  }

  public var x: Scalar {
    get { return self[0]}
    set { self[0] = newValue }
  }

  public var y: Scalar {
    get { return self[1]}
    set { self[1] = newValue }
  }

  public var z: Scalar {
    get { return self[2]}
    set { self[2] = newValue }
  }

  public var w: Scalar {
    get { return self[3]}
    set { self[3] = newValue }
  }

  public init(lowHalf: SIMD2<Scalar>, highHalf: SIMD2<Scalar>) {
    self.init()
    self.lowHalf = lowHalf
    self.highHalf = highHalf
  }

  public var lowHalf: SIMD2<Scalar> {
    get {
      var result = SIMD2<Scalar>()
      for i in result.indices { result[i] = self[i] }
      return result
    }
    set {
      for i in newValue.indices { self[i] = newValue[i] }
    }
  }

  public var highHalf: SIMD2<Scalar> {
    get {
      var result = SIMD2<Scalar>()
      for i in result.indices { result[i] = self[2+i] }
      return result
    }
    set {
      for i in newValue.indices { self[2+i] = newValue[i] }
    }
  }

  public var evenHalf: SIMD2<Scalar> {
    get {
      var result = SIMD2<Scalar>()
      for i in result.indices { result[i] = self[2*i] }
      return result
    }
    set {
      for i in newValue.indices { self[2*i] = newValue[i] }
    }
  }

  public var oddHalf: SIMD2<Scalar> {
    get {
      var result = SIMD2<Scalar>()
      for i in result.indices { result[i] = self[2*i+1] }
      return result
    }
    set {
      for i in newValue.indices { self[2*i+1] = newValue[i] }
    }
  }
}

Beyond the new Swift API for vector types, there's an accompanying change to the clang importer that will allow C functions and structures using vector types to be imported. When we find a "clang extended vector type" or "extvector" whose underlying scalar type is imported as a SIMDScalar type in the standard library and whose element count is 2, 3, 4, 8, 16, 32, or 64, we'll import it as the correponding standard library vector type.

Source compatibility

No source compatibility changes.

Effect on ABI stability

No effects on ABI stability at the language level. There are some minor changes around how the <simd/simd.h> types are imported on Apple platforms, but these will have no effect on source compatibility; they only may tweak some low-level ABI details.

Effect on API resilience

Because these are entirely new types, they come with a large set of API that will become part of the standard library interface. New API can be added in the future, including to protocols, because it is generally possible to provide good default implementations for simd operations.

Alternatives considered

The main alternative is "don't do anything, let people write structs, and trust the autovectorizer." This might even mostly work, but even if it worked flawlessly (which it doesn't today), you would be left with the problem that these types are common, and everyone would be using their own set of structs, with slightly incompatible size and alignment or operations provided. Even when the layout matches up, you would still need to provide conversion shims between each and every library you work with. There is a lot of merit in providing a single ground-truth for low-level vectors in the stdlib, over which libraries and programs can build additional operations.

Notes from pre-review discussion:

  1. I have added a static .zero property to integer and floating point vectors based on email conversation with Rick Roe. This is a departure from scalar numeric types, but he convinced me that these are good to have as an explicit alternative to T().

  2. As mentioned above, .-prefixes have been added to some elementwise operations. I see pretty good arguments both in favor of and against this change. I think that it largely comes down to a matter of taste. There are two reasonable alternative positions here. (a) don't add the prefixes; this makes type checking a little more complex. (b) add .-prefixes to all lanewise operations; this adds a lot of new operators and a fair bit of noise, but there's reasonable precedent for it as well (julialang).

  3. Richard Wei has argued strongly against the spelling of replacing(with: where:), objecting to both the lack of an explicit object of the verb replacing and to the use of where: as a label for an argument that is not a predicate function. I am not ignoring his concerns, but from a pragmatic point of view, I believe that the fact that this operates on elements is implicit, and where: simply reads more clearly (to me and the other people that I surveyed) than any other option we could come up with.

Changelog:

  1. The first version of this proposal used the <, <=, >=, and > operators. These have been replaced with the .-prefixed operators.

  2. The first version of this proposal used .* and .&* for elementwise multiplication; these have been replaced with the "normal" arithmetic operations, matching +, -, /, and %.

  3. We have adopted .|, .&, etc on the principle that .-prefixed operators are "pointwise".

  4. An earlier version of this proposal used names of the form Float.Vector4 instead of Vector4<Float>. The core team feels that the latter name is a better direction to take.

  5. We have removed several operations whose naming was not totally satisfactory. We expect to add those operations and more in future additive proposals; only the types are necessary for ABI stability.

  6. Updates based on discussion on Swift-Evolution and with core team: I have eliminated _-prefixed protocol requirements, to make it somewhat more obvious how user types can be made to conform. This required some renaming to make the function of these associatedtypes more clear. I have also systematically removed Vector from the naming scheme, in order to remove confusion with C++-style vectors or "mathematical" vector structures that might be added at some future point. The resulting names are generally shorter, which is an nice benefit, and they now have a uniform SIMD prefix.

Protocols renamed:

Old name New name
SIMDVector SIMD
SIMDVectorizable SIMDScalar
SIMDVectorStorage SIMDStorage
SIMDMaskVector N/A*

Associated types renamed:

Old name New name
Element Scalar
_MaskElement SIMDMaskScalar
_Vector${n} SIMD${n}Storage
Mask N/A*
N/A* MaskStorage

Concrete types renamed:

Old name New name
Vector2<Scalar> SIMD2<Scalar>
Vector3<Scalar> SIMD3<Scalar>
Vector4<Scalar> SIMD4<Scalar>
Vector8<Scalar> SIMD8<Scalar>
Vector16<Scalar> SIMD16<Scalar>
Vector32<Scalar> SIMD32<Scalar>
Vector64<Scalar> SIMD64<Scalar>
N/A* SIMDMask<Storage>

(*) The way in which masks are implemented has also been simplified somewhat, which allowed me to eliminate the SIMDMaskVector protocol entirely.