
The Swift ABI

Hard Constraints on Resilience

The root of a class hierarchy must remain stable, on pain of invalidating the metaclass hierarchy. Note that a Swift class without an explicit base class is implicitly rooted in the SwiftObject Objective-C class.

Type Layout

Fragile Struct and Tuple Layout

Structs and tuples currently share the same layout algorithm, noted as the "Universal" layout algorithm in the compiler implementation. The algorithm is as follows:

  • Start with a size of 0 and an alignment of 1.
  • Iterate through the fields, in element order for tuples, or in var declaration order for structs. For each field:
    • Update size by rounding up to the alignment of the field, that is, increasing it to the least value greater or equal to size and evenly divisible by the alignment of the field.
    • Assign the offset of the field to the current value of size.
    • Update size by adding the size of the field.
    • Update alignment to the max of alignment and the alignment of the field.
  • The final size and alignment are the size and alignment of the aggregate. The stride of the type is the final size rounded up to alignment.

Note that this differs from C or LLVM's normal layout rules in that size and stride are distinct; whereas C layout requires that an embedded struct's size be padded out to its alignment and that nothing be laid out there, Swift layout allows an outer struct to lay out fields in the inner struct's tail padding, alignment permitting. Unlike C, zero-sized structs and tuples are also allowed, and take up no storage in enclosing aggregates. The Swift compiler emits LLVM packed struct types with manual padding to get the necessary control over the binary layout. Some examples:

// LLVM <{ i64, i8 }>
struct S {
  var x: Int
  var y: UInt8
}

// LLVM <{ i8, [7 x i8], <{ i64, i8 }>, i8 }>
struct S2 {
  var x: UInt8
  var s: S
  var y: UInt8
}

// LLVM <{}>
struct Empty {}

// LLVM <{ i64, i64 }>
struct ContainsEmpty {
  var x: Int
  var y: Empty
  var z: Int
}
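
The layout algorithm above can be sketched in Swift as follows. This is a minimal model, not the compiler's implementation: FieldLayout, AggregateLayout, roundUp, and layOut are hypothetical names introduced here, and field sizes and alignments are supplied by hand for a 64-bit target.

```swift
// Round `value` up to the nearest multiple of `alignment`.
func roundUp(_ value: Int, to alignment: Int) -> Int {
  return (value + alignment - 1) / alignment * alignment
}

// Hypothetical model types; these are not compiler or runtime APIs.
struct FieldLayout {
  let size: Int
  let alignment: Int
}

struct AggregateLayout {
  var size = 0
  var alignment = 1
  var offsets: [Int] = []
  // Stride is the final size rounded up to the final alignment.
  var stride: Int { return roundUp(size, to: alignment) }
}

func layOut(_ fields: [FieldLayout]) -> AggregateLayout {
  var result = AggregateLayout()
  for field in fields {
    result.size = roundUp(result.size, to: field.alignment)
    result.offsets.append(result.size)
    result.size += field.size
    result.alignment = max(result.alignment, field.alignment)
  }
  return result
}

let int64 = FieldLayout(size: 8, alignment: 8)
let uint8 = FieldLayout(size: 1, alignment: 1)
let empty = layOut([])

// struct S { var x: Int; var y: UInt8 }
let s = layOut([int64, uint8])
// struct S2 { var x: UInt8; var s: S; var y: UInt8 }
let s2 = layOut([uint8, FieldLayout(size: s.size, alignment: s.alignment), uint8])
// struct ContainsEmpty { var x: Int; var y: Empty; var z: Int }
let containsEmpty = layOut(
  [int64, FieldLayout(size: empty.size, alignment: empty.alignment), int64])

print(s.size, s.stride)          // 9 16
print(s2.offsets, s2.size)       // [0, 8, 17] 18
print(containsEmpty.size)        // 16
```

In this model the trailing y of S2 lands at offset 17, inside what C would treat as the tail padding of S, which matches the packed LLVM type shown for S2.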

Class Layout

Swift relies on the following assumptions about the Objective-C runtime, which are therefore now part of the Objective-C ABI:

  • 32-bit platforms never have tagged pointers. ObjC pointer types are either nil or an object pointer.
  • On x86-64, a tagged pointer either sets the lowest bit of the pointer or the highest bit of the pointer. Therefore, both of these bits are zero if and only if the value is not a tagged pointer.
  • On ARM64, a tagged pointer always sets the highest bit of the pointer.
  • 32-bit platforms never perform any isa masking. object_getClass is always equivalent to *(Class*)object.
  • 64-bit platforms perform isa masking only if the runtime exports a symbol uintptr_t objc_debug_isa_class_mask;. If this symbol is exported, object_getClass on a non-tagged pointer is always equivalent to (Class)(objc_debug_isa_class_mask & *(uintptr_t*)object).
  • The superclass field of a class object is always stored immediately after the isa field. Its value is either nil or a pointer to the class object for the superclass; it never has other bits set.
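
As a rough illustration of the tagged-pointer and isa-masking assumptions, the checks can be written as plain bit tests on a 64-bit word. The names Architecture, isTaggedPointer, and classBits are hypothetical, and the ARM64 branch assumes that a set high bit is also sufficient to identify a tagged pointer, which the list above does not state outright.

```swift
// Hypothetical model of the 64-bit rules above; not a runtime API.
enum Architecture {
  case x86_64
  case arm64
}

func isTaggedPointer(_ bits: UInt64, on architecture: Architecture) -> Bool {
  let highBit: UInt64 = 1 << 63
  switch architecture {
  case .x86_64:
    // Tagged pointers set either the lowest or the highest bit.
    return bits & (highBit | 1) != 0
  case .arm64:
    // Tagged pointers always set the highest bit (assumed sufficient here).
    return bits & highBit != 0
  }
}

// Mirrors `objc_debug_isa_class_mask & *(uintptr_t*)object` when the runtime
// exports a mask, and a plain isa load when it does not.
func classBits(isaWord: UInt64, isaClassMask: UInt64?) -> UInt64 {
  guard let mask = isaClassMask else { return isaWord }
  return isaWord & mask
}
```

Passing nil for isaClassMask models a runtime that does not export objc_debug_isa_class_mask.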

The following assumptions are part of the Swift ABI:

  • Swift class pointers are never tagged pointers.

TODO

Fragile Enum Layout

In laying out enum types, the ABI attempts to avoid requiring additional storage to store the tag for the enum case. The ABI chooses one of five strategies based on the layout of the enum:

Empty Enums

In the degenerate case of an enum with no cases, the enum is an empty type.

enum Empty {} // => empty type

Single-Case Enums

In the degenerate case of an enum with a single case, there is no discriminator needed, and the enum type has the exact same layout as its case's data type, or is empty if the case has no data type.

enum EmptyCase { case X }             // => empty type
enum DataCase { case Y(Int, Double) } // => LLVM <{ i64, double }>

C-Like Enums

If none of the cases has a data type (a "C-like" enum), then the enum is laid out as an integer tag with the minimal number of bits to contain all of the cases. The machine-level layout of the type then follows LLVM's data layout rules for integer types on the target platform. The cases are assigned tag values in declaration order.

enum EnumLike2 { // => LLVM i1
  case A         // => i1 0
  case B         // => i1 1
}

enum EnumLike8 { // => LLVM i3
  case A         // => i3 0
  case B         // => i3 1
  case C         // => i3 2
  case D         // etc.
  case E
  case F
  case G
  case H
}

Discriminator values after the one used for the last case become extra inhabitants of the enum type (see Single-Payload Enums).
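
The tag width for a C-like enum, and the number of tag values left over, can be computed with a short sketch. The helper names tagBitWidth and unusedTagValues are hypothetical, and the count only considers values within the minimal bit width described above.

```swift
// Minimal number of bits needed to give each case a distinct tag value.
func tagBitWidth(caseCount: Int) -> Int {
  precondition(caseCount > 0, "an enum with no cases has no tag")
  var bits = 0
  while (1 << bits) < caseCount {
    bits += 1
  }
  return bits
}

// Tag values after the one used for the last case, within the tag width.
func unusedTagValues(caseCount: Int) -> Int {
  return (1 << tagBitWidth(caseCount: caseCount)) - caseCount
}

print(tagBitWidth(caseCount: 2))      // 1, as for EnumLike2
print(tagBitWidth(caseCount: 8))      // 3, as for EnumLike8
print(unusedTagValues(caseCount: 5))  // 3
```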

Single-Payload Enums

If an enum has a single case with a data type and one or more no-data cases (a "single-payload" enum), then the case with data type is represented using the data type's binary representation, with added zero bits for tag if necessary. If the data type's binary representation has extra inhabitants, that is, bit patterns with the size and alignment of the type but which do not form valid values of that type, they are used to represent the no-data cases, with extra inhabitants in order of ascending numeric value matching no-data cases in declaration order. If the type has spare bits (see Multi-Payload Enums), they are used to form extra inhabitants. The enum value is then represented as an integer with the storage size in bits of the data type. Extra inhabitants of the payload type not used by the enum type become extra inhabitants of the enum type itself.

enum CharOrSectionMarker { // => LLVM i32
  case Paragraph           // => i32 0x0020_0000
  case Char(UnicodeScalar) // => i32 (zext i21 %Char to i32)
  case Chapter             // => i32 0x0020_0001
}

CharOrSectionMarker.Char("\u{0}")      // => i32 0x0000_0000
CharOrSectionMarker.Char("\u{10FFFF}") // => i32 0x0010_FFFF

enum CharOrSectionMarkerOrFootnoteMarker {      // => LLVM i32
  case CharOrSectionMarker(CharOrSectionMarker) // => i32 %CharOrSectionMarker
  case Asterisk                                 // => i32 0x0020_0002
  case Dagger                                   // => i32 0x0020_0003
  case DoubleDagger                             // => i32 0x0020_0004
}
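
The encoding in the CharOrSectionMarker example can be modeled by hand as below. This is only a sketch of the scheme described above, using lowercase case names and hypothetical encode and decode helpers; it does not inspect the layout the compiler actually produces.

```swift
// A hand-written model of the single-payload layout; not compiler output.
enum CharOrSectionMarker: Equatable {
  case paragraph
  case char(Unicode.Scalar)
  case chapter
}

// The payload uses 21 bits, so extra inhabitants start at 0x0020_0000.
let firstExtraInhabitant: UInt32 = 1 << 21

func encode(_ value: CharOrSectionMarker) -> UInt32 {
  switch value {
  case .char(let scalar):
    return scalar.value                // payload stored as-is, zero-extended
  case .paragraph:
    return firstExtraInhabitant        // first no-data case in declaration order
  case .chapter:
    return firstExtraInhabitant + 1    // second no-data case
  }
}

func decode(_ bits: UInt32) -> CharOrSectionMarker? {
  if bits < firstExtraInhabitant {
    // Returns nil for values that are not valid scalars, such as surrogates.
    return Unicode.Scalar(bits).map { CharOrSectionMarker.char($0) }
  }
  switch bits - firstExtraInhabitant {
  case 0: return .paragraph
  case 1: return .chapter
  default: return nil                  // still an unused extra inhabitant
  }
}
```

Values for which decode returns nil at or above 0x0020_0002 are the extra inhabitants that an enclosing enum such as CharOrSectionMarkerOrFootnoteMarker can claim for its own no-data cases.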

If the data type has no extra inhabitants, or there are not enough extra inhabitants to represent all of the no-data cases, then a tag bit is added to the enum's representation. The tag bit is set for the no-data cases, which are then assigned values in the data area of the enum in declaration order.

enum IntOrInfinity { // => LLVM <{ i64, i1 }>
  case NegInfinity   // => <{ i64, i1 }> {    0, 1 }
  case Int(Int)      // => <{ i64, i1 }> { %Int, 0 }
  case PosInfinity   // => <{ i64, i1 }> {    1, 1 }
}

IntOrInfinity.Int(    0) // => <{ i64, i1 }> {     0, 0 }
IntOrInfinity.Int(20721) // => <{ i64, i1 }> { 20721, 0 }

Multi-Payload Enums

If an enum has more than one case with data type, then a tag is necessary to discriminate the data types. The ABI will first try to find common spare bits, that is, bits in the data types' binary representations which are either fixed-zero or ignored by valid values of all of the data types. The tag will be scattered into these spare bits as much as possible. Currently only spare bits of primitive integer types, such as the high bits of an i21 type, are considered. The enum data is represented as an integer with the storage size in bits of the largest data type.

enum TerminalChar {             // => LLVM i32
  case Plain(UnicodeScalar)     // => i32     (zext i21 %Plain     to i32)
  case Bold(UnicodeScalar)      // => i32 (or (zext i21 %Bold      to i32), 0x0020_0000)
  case Underline(UnicodeScalar) // => i32 (or (zext i21 %Underline to i32), 0x0040_0000)
  case Blink(UnicodeScalar)     // => i32 (or (zext i21 %Blink     to i32), 0x0060_0000)
  case Empty                    // => i32 0x0080_0000
  case Cursor                   // => i32 0x0080_0001
}
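
The values in the TerminalChar example follow from placing the tag in the spare bits directly above the 21-bit payload. A small sketch, with hypothetical helper names and the payload width written in by hand:

```swift
// UnicodeScalar occupies the low 21 bits; the bits above it are spare.
let payloadBitWidth: UInt32 = 21

// Payload cases: the tag is or-ed into the spare bits above the payload.
func encodePayloadCase(tag: UInt32, scalar: Unicode.Scalar) -> UInt32 {
  return scalar.value | (tag << payloadBitWidth)
}

// No-data cases share the tag after the last payload case and are numbered
// in the data area in declaration order.
func encodeNoDataCase(index: UInt32, payloadCaseCount: UInt32) -> UInt32 {
  return (payloadCaseCount << payloadBitWidth) | index
}

let bold = encodePayloadCase(tag: 1, scalar: "A")             // 0x0020_0041
let cursor = encodeNoDataCase(index: 1, payloadCaseCount: 4)  // 0x0080_0001
```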

If there are not enough spare bits to contain the tag, then additional bits are added to the representation to contain the tag. Tag values are assigned to data cases in declaration order. If there are no-data cases, they are collected under a common tag, and assigned values in the data area of the enum in declaration order.

class Bignum {}

enum IntDoubleOrBignum { // => LLVM <{ i64, i2 }>
  case Int(Int)          // => <{ i64, i2 }> {           %Int,            0 }
  case Double(Double)    // => <{ i64, i2 }> { (bitcast  %Double to i64), 1 }
  case Bignum(Bignum)    // => <{ i64, i2 }> { (ptrtoint %Bignum to i64), 2 }
}
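
When no spare bits are available, the payload and tag sit side by side. The sketch below models the Int and Double cases of the example with a hypothetical TaggedPayload struct; the Bignum case would store the object pointer's bits with tag 2 in the same way.

```swift
// Hypothetical model of <{ i64, i2 }>; the tag is widened to a byte here.
struct TaggedPayload: Equatable {
  var payload: UInt64
  var tag: UInt8
}

func encodeInt(_ value: Int64) -> TaggedPayload {
  // Tag 0: first payload case in declaration order.
  return TaggedPayload(payload: UInt64(bitPattern: value), tag: 0)
}

func encodeDouble(_ value: Double) -> TaggedPayload {
  // Tag 1: the payload is the bit pattern of the Double (a bitcast).
  return TaggedPayload(payload: value.bitPattern, tag: 1)
}
```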

Existential Container Layout

Values of protocol type, protocol composition type, or Any type are laid out using existential containers (so-called because these types are "existential types" in type theory).

Opaque Existential Containers

If there is no class constraint on a protocol or protocol composition type, the existential container has to accommodate a value of arbitrary size and alignment. It does this using a fixed-size buffer, which is three pointers in size and pointer-aligned. This either directly contains the value, if its size and alignment are both less than or equal to the fixed-size buffer's, or contains a pointer to a side allocation owned by the existential container. The type of the contained value is identified by its type metadata record, and witness tables for all of the required protocol conformances are included. The layout is as if declared in the following C struct:

struct OpaqueExistentialContainer {
  void *fixedSizeBuffer[3];
  Metadata *type;
  WitnessTable *witnessTables[NUM_WITNESS_TABLES];
};

Class Existential Containers

If one or more of the protocols in a protocol or protocol composition type have a class constraint, then only class values can be stored in the existential container, and a more efficient representation is used. Class instances are always a single pointer in size, so a fixed-size buffer and potential side allocation is not needed, and class instances always have a reference to their own type metadata, so the separate metadata record is not needed. The layout is thus as if declared in the following C struct:

struct ClassExistentialContainer {
  HeapObject *value;
  WitnessTable *witnessTables[NUM_WITNESS_TABLES];
};

Note that if no witness tables are needed, such as for the "any class" type protocol<class> or an Objective-C protocol type, then the only element of the layout is the heap object pointer. This is ABI-compatible with id and id <Protocol> types in Objective-C.
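
The sizes implied by the two C structs, and the inline-storage test for the fixed-size buffer, can be sketched as follows. The function names are hypothetical; pointer size and alignment are taken from the host platform.

```swift
let pointerSize = MemoryLayout<UnsafeRawPointer>.size
let pointerAlignment = MemoryLayout<UnsafeRawPointer>.alignment

// A value is stored inline only if both its size and alignment fit the
// three-pointer, pointer-aligned buffer; otherwise a side allocation is used.
func fitsInFixedSizeBuffer(size: Int, alignment: Int) -> Bool {
  return size <= 3 * pointerSize && alignment <= pointerAlignment
}

// Three buffer words, one metadata pointer, then the witness tables.
func opaqueContainerSize(witnessTableCount: Int) -> Int {
  return (3 + 1 + witnessTableCount) * pointerSize
}

// One heap object pointer, then the witness tables.
func classContainerSize(witnessTableCount: Int) -> Int {
  return (1 + witnessTableCount) * pointerSize
}
```

With witnessTableCount set to zero, classContainerSize is a single pointer, which is the case described above as ABI-compatible with id.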

Type Metadata

The Swift runtime keeps a metadata record for every type used in a program, including every instantiation of generic types. These metadata records can be used by (TODO: reflection and) debugger tools to discover information about types. For non-generic nominal types, these metadata records are generated statically by the compiler. For instances of generic types, and for intrinsic types such as tuples, functions, protocol compositions, etc., metadata records are lazily created by the runtime as required. Every type has a unique metadata record; two metadata pointer values are equal iff the types are equivalent.

In the layout descriptions below, offsets are given relative to the metadata pointer as an index into an array of pointers. On a 32-bit platform, offset 1 means an offset of 4 bytes, and on 64-bit platforms, it means an offset of 8 bytes.

Common Metadata Layout

All metadata records share a common header, with the following fields:

  • The value witness table pointer references a vtable of functions that implement the value semantics of the type, providing fundamental operations such as allocating, copying, and destroying values of the type. The value witness table also records the size, alignment, stride, and other fundamental properties of the type. The value witness table pointer is at offset -1 from the metadata pointer, that is, the pointer-sized word immediately before the pointer's referenced address.

  • The kind field is a pointer-sized integer that describes the kind of type the metadata describes. This field is at offset 0 from the metadata pointer.

    The current kind values are as follows:

    • Struct metadata has a kind of 1.
    • Enum metadata has a kind of 2.
    • Opaque metadata has a kind of 8. This is used for compiler Builtin primitives that have no additional runtime information.
    • Tuple metadata has a kind of 9.
    • Function metadata has a kind of 10.
    • Protocol metadata has a kind of 12. This is used for protocol types, for protocol compositions, and for the Any type.
    • Metatype metadata has a kind of 13.
    • Class metadata, instead of a kind, has an isa pointer in its kind slot, pointing to the class's metaclass record. This isa pointer is guaranteed to have an integer value larger than 4096 and so can be discriminated from non-class kind values.
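
A reader of metadata records might classify the word at offset 0 along these lines. MetadataKind and classify are hypothetical names; the numeric values and the 4096 threshold are taken from the list above.

```swift
// Hypothetical classification of the kind word; not a runtime API.
enum MetadataKind: Equatable {
  case structKind, enumKind, opaque, tuple, function, protocolKind, metatype
  case classIsa(UInt)   // the word is an isa pointer, not a kind value
  case unknown(UInt)
}

func classify(kindWord: UInt) -> MetadataKind {
  switch kindWord {
  case 1: return .structKind
  case 2: return .enumKind
  case 8: return .opaque
  case 9: return .tuple
  case 10: return .function
  case 12: return .protocolKind
  case 13: return .metatype
  default:
    // isa pointers are guaranteed to be larger than 4096.
    return kindWord > 4096 ? .classIsa(kindWord) : .unknown(kindWord)
  }
}

// Offsets are indices into an array of pointers, so offset -1 (the value
// witness table pointer) is one pointer-sized word before the address point.
func byteOffset(ofSlot index: Int, pointerSize: Int) -> Int {
  return index * pointerSize
}
```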

Struct Metadata

In addition to the common metadata layout fields, struct metadata records contain the following fields:

  • The nominal type descriptor is referenced at offset 1.

  • A reference to the parent metadata record is stored at offset 2. For structs that are members of an enclosing nominal type, this is a reference to the enclosing type's metadata. For top-level structs, this is null.

    TODO: The parent pointer is currently always null.

  • A vector of field offsets begins at offset 3. For each field of the struct, in var declaration order, the field's offset in bytes from the beginning of the struct is stored as a pointer-sized integer.

  • If the struct is generic, then the generic parameter vector begins at offset 3+n, where n is the number of fields in the struct.

Enum Metadata

In addition to the common metadata layout fields, enum metadata records contain the following fields:

  • The nominal type descriptor is referenced at offset 1.

  • A reference to the parent metadata record is stored at offset 2. For enums that are members of an enclosing nominal type, this is a reference to the enclosing type's metadata. For top-level enums, this is null.

    TODO: The parent pointer is currently always null.

  • If the enum is generic, then the generic parameter vector begins at offset 3.

Tuple Metadata

In addition to the common metadata layout fields, tuple metadata records contain the following fields:

  • The number of elements in the tuple is a pointer-sized integer at offset 1.

  • The labels string is a pointer to a list of consecutive null-terminated label names for the tuple at offset 2. Each label name is given as a null-terminated, UTF-8-encoded string in sequence. If the tuple has no labels, this is a null pointer.

    TODO: The labels string pointer is currently always null, and labels are not factored into tuple metadata uniquing.

  • The element vector begins at offset 3 and consists of a vector of type-offset pairs. The metadata for the nth element's type is a pointer at offset 3+2*n. The offset in bytes from the beginning of the tuple to the beginning of the nth element is at offset 3+2*n+1.
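
The slot arithmetic for the element vector is small enough to write out. The helper names below are hypothetical, and n is zero-based.

```swift
// Slot holding the metadata pointer for the nth element's type.
func tupleElementTypeSlot(_ n: Int) -> Int {
  return 3 + 2 * n
}

// Slot holding the byte offset of the nth element within the tuple.
func tupleElementOffsetSlot(_ n: Int) -> Int {
  return 3 + 2 * n + 1
}

// Converts a slot index to a byte offset from the metadata pointer.
func byteOffset(ofSlot slot: Int, pointerSize: Int) -> Int {
  return slot * pointerSize
}

// On a 64-bit platform, the offset of element 1 is stored 48 bytes past
// the metadata pointer.
print(byteOffset(ofSlot: tupleElementOffsetSlot(1), pointerSize: 8))  // 48
```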

Function Metadata

In addition to the common metadata layout fields, function metadata records contain the following fields:

  • The number of arguments to the function is stored at offset 1.

  • A reference to the result type metadata record is stored at offset 2. If the function has multiple returns, this references a tuple metadata record.

  • The argument vector begins at offset 3 and consists of pointers to metadata records of the function's arguments.

    If the function takes any inout arguments, a pointer to each argument's metadata record will be appended separately, the lowest bit being set if it is inout. Because of pointer alignment, the lowest bit will always be free to hold this tag.

    If the function takes no inout arguments, there will be only one pointer in the vector for the following cases:

    • 0 arguments: a tuple metadata record for the empty tuple
    • 1 argument: the first and only argument's metadata record
    • >1 argument: a tuple metadata record containing the arguments
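
Decoding an entry of the argument vector when inout arguments are present amounts to splitting off the low bit. ArgumentEntry and decodeArgumentEntry are hypothetical names, and addresses are modeled as plain unsigned integers.

```swift
// Hypothetical decoded form of one argument vector entry.
struct ArgumentEntry: Equatable {
  var metadataAddress: UInt
  var isInout: Bool
}

func decodeArgumentEntry(_ word: UInt) -> ArgumentEntry {
  // Pointer alignment keeps the low bit of the address free for the tag.
  return ArgumentEntry(metadataAddress: word & ~1, isInout: word & 1 != 0)
}
```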

Protocol Metadata

In addition to the common metadata layout fields, protocol metadata records contain the following fields:

  • A layout flags word is stored at offset 1. The bits of this word describe the existential container layout used to represent values of the type. The word is laid out as follows:

    • The number of witness tables is stored in the least significant 31 bits. Values of the protocol type contain this number of witness table pointers in their layout.
    • The class constraint is stored at bit 31. This bit is set if the type is not class-constrained, meaning that struct, enum, or class values can be stored in the type. If not set, then only class values can be stored in the type, and the type uses a more efficient layout.

    Note that the field is pointer-sized, even though only the lowest 32 bits are currently inhabited on all platforms. These values can be derived from the protocol descriptor records, but are pre-calculated for convenience.

  • The number of protocols that make up the protocol composition is stored at offset 2. For the "any" types Any or Any : class, this is zero. For a single-protocol type P, this is one. For a protocol composition type P & Q & ..., this is the number of protocols.

  • The protocol descriptor vector begins at offset 3. This is an inline array of pointers to the protocol descriptor for every protocol in the composition, or the single protocol descriptor for a protocol type. For an "any" type, there is no protocol descriptor vector.
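
The layout flags word at offset 1 can be unpacked as follows. ExistentialLayoutFlags and its property names are hypothetical; the bit positions are those given above.

```swift
// Hypothetical wrapper over the layout flags word; not a runtime type.
struct ExistentialLayoutFlags {
  var rawValue: UInt

  // The least significant 31 bits hold the witness table count.
  var witnessTableCount: Int {
    return Int(rawValue & 0x7FFF_FFFF)
  }

  // Bit 31 is set when the type is NOT class-constrained.
  var allowsNonClassValues: Bool {
    return rawValue & (1 << 31) != 0
  }
}

let opaque = ExistentialLayoutFlags(rawValue: 0x8000_0002)
print(opaque.witnessTableCount, opaque.allowsNonClassValues)  // 2 true
```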

Metatype Metadata

In addition to the common metadata layout fields, metatype metadata records contain the following fields:

  • A reference to the metadata record for the instance type that the metatype represents is stored at offset 1.

Class Metadata

Class metadata is designed to interoperate with Objective-C; all class metadata records are also valid Objective-C Class objects. Class metadata pointers are used as the values of class metatypes, so a derived class's metadata record also serves as a valid class metatype value for all of its ancestor classes.

  • The destructor pointer is stored at offset -2 from the metadata pointer, behind the value witness table. This function is invoked by Swift's deallocator when the class instance is destroyed.

  • The isa pointer pointing to the class's Objective-C-compatible metaclass record is stored at offset 0, in place of an integer kind discriminator.

  • The super pointer pointing to the metadata record for the superclass is stored at offset 1. If the class is a root class, it is null.

  • Two words are reserved for use by the Objective-C runtime at offset 2 and offset 3.

  • The rodata pointer is stored at offset 4; it points to an Objective-C compatible rodata record for the class. This pointer value includes a tag. The low bit is always set to 1 for Swift classes and always set to 0 for Objective-C classes.

  • The class flags are a 32-bit field at offset 5.

  • The instance address point is a 32-bit field following the class flags. A pointer to an instance of this class points this number of bytes after the beginning of the instance.

  • The instance size is a 32-bit field following the instance address point. This is the number of bytes of storage present in every object of this type.

  • The instance alignment mask is a 16-bit field following the instance size. This is a set of low bits which must not be set in a pointer to an instance of this class.

  • The runtime-reserved field is a 16-bit field following the instance alignment mask. The compiler initializes this to zero.

  • The class object size is a 32-bit field following the runtime-reserved field. This is the total number of bytes of storage in the class metadata object.

  • The class object address point is a 32-bit field following the class object size. This is the offset in bytes from the beginning of the class metadata object to its address point.

  • The nominal type descriptor for the most-derived class type is referenced at an offset immediately following the class object address point. This is offset 8 on a 64-bit platform or offset 11 on a 32-bit platform.

  • For each Swift class in the class's inheritance hierarchy, in order starting from the root class and working down to the most derived class, the following fields are present:

    • First, a reference to the parent metadata record is stored. For classes that are members of an enclosing nominal type, this is a reference to the enclosing type's metadata. For top-level classes, this is null.

      TODO: The parent pointer is currently always null.

    • If the class is generic, its generic parameter vector is stored inline.

    • The vtable is stored inline and contains a function pointer to the implementation of every method of the class in declaration order.

    • If the layout of a class instance is dependent on its generic parameters, then a field offset vector is stored inline, containing offsets in bytes from an instance pointer to each field of the class in declaration order. (For classes with fixed layout, the field offsets are accessible statically from global variables, similar to Objective-C ivar offsets.)

    Note that none of these fields are present for Objective-C base classes in the inheritance hierarchy.
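
The nominal type descriptor offsets quoted above (8 on 64-bit, 11 on 32-bit) follow from the widths of the fixed-size fields that begin at offset 5. A short check, using hypothetical helper names:

```swift
// Byte widths of the fields starting at slot 5, in order: class flags,
// instance address point, instance size, instance alignment mask,
// runtime-reserved, class object size, class object address point.
let fixedWidthFieldSizes = [4, 4, 4, 2, 2, 4, 4]

func nominalTypeDescriptorSlot(pointerSize: Int) -> Int {
  let bytes = fixedWidthFieldSizes.reduce(0, +)   // 24 bytes in total
  return 5 + bytes / pointerSize
}

// The low bit of the rodata pointer is the Swift class tag.
func isSwiftClass(rodataWord: UInt) -> Bool {
  return rodataWord & 1 == 1
}

print(nominalTypeDescriptorSlot(pointerSize: 8))  // 8
print(nominalTypeDescriptorSlot(pointerSize: 4))  // 11
```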

Generic Parameter Vector

Metadata records for instances of generic types contain information about their generic parameters. For each parameter of the type, a reference to the metadata record for the type argument is stored. After all of the type argument metadata references, for each type parameter, if there are protocol requirements on that type parameter, a reference to the witness table for each protocol it is required to conform to is stored in declaration order.

For example, given a generic type with the parameters <T, U, V>, its generic parameter record will consist of references to the metadata records for T, U, and V in succession, as if laid out in a C struct:

struct GenericParameterVector {
  TypeMetadata *T, *U, *V;
};

If we add protocol requirements to the parameters, for example, <T: Runcible, U: Fungible & Ansible, V>, then the type's generic parameter vector contains witness tables for those protocols, as if laid out:

struct GenericParameterVector {
  TypeMetadata *T, *U, *V;
  RuncibleWitnessTable *T_Runcible;
  FungibleWitnessTable *U_Fungible;
  AnsibleWitnessTable *U_Ansible;
};
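
The ordering rule (all type metadata references first, then witness tables grouped by parameter) can be sketched as a function that lists the slots of the vector. GenericParameter and genericParameterVectorSlots are hypothetical names, and the slot descriptions are plain strings for illustration.

```swift
// Hypothetical description of one generic parameter and its requirements.
struct GenericParameter {
  let name: String
  let requirements: [String]
}

func genericParameterVectorSlots(_ parameters: [GenericParameter]) -> [String] {
  var slots: [String] = []
  // First, one metadata reference per type parameter.
  for parameter in parameters {
    slots.append("metadata: \(parameter.name)")
  }
  // Then, per parameter, one witness table per protocol requirement.
  for parameter in parameters {
    for requirement in parameter.requirements {
      slots.append("witness table: \(parameter.name): \(requirement)")
    }
  }
  return slots
}

let slots = genericParameterVectorSlots([
  GenericParameter(name: "T", requirements: ["Runcible"]),
  GenericParameter(name: "U", requirements: ["Fungible", "Ansible"]),
  GenericParameter(name: "V", requirements: []),
])
```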

Nominal Type Descriptor

The metadata records for class, struct, and enum types contain a pointer to a nominal type descriptor, which contains basic information about the nominal type such as its name, members, and metadata layout. For a generic type, one nominal type descriptor is shared for all instantiations of the type. The layout is as follows:

  • The kind of type is stored at offset 0, which is as follows:
    • 0 for a class,
    • 1 for a struct, or
    • 2 for an enum.
  • The mangled name is referenced as a null-terminated C string at offset 1. This name includes no bound generic parameters.
  • The following four fields depend on the kind of nominal type.
    • For a struct or class:
      • The number of fields is stored at offset 2. This is the length of the field offset vector in the metadata record, if any.
      • The offset to the field offset vector is stored at offset 3. This is the offset in pointer-sized words of the field offset vector for the type in the metadata record. If no field offset vector is stored in the metadata record, this is zero.
      • The field names are referenced as a doubly-null-terminated list of C strings at offset 4. The order of names corresponds to the order of fields in the field offset vector.
      • The field type accessor is a function pointer at offset 5. If non-null, the function takes a pointer to an instance of type metadata for the nominal type, and returns a pointer to an array of type metadata references for the types of the fields of that instance. The order matches that of the field offset vector and field name list.
    • For an enum:
      • The number of payload cases and payload size offset are stored at offset 2. The least significant 24 bits are the number of payload cases, and the most significant 8 bits are the offset of the payload size in the type metadata, if present.
      • The number of no-payload cases is stored at offset 3.
      • The case names are referenced as a doubly-null-terminated list of C strings at offset 4. The names are ordered such that payload cases come first, followed by no-payload cases. Within each half of the list, the order of names corresponds to the order of cases in the enum declaration.
      • The case type accessor is a function pointer at offset 5. If non-null, the function takes a pointer to an instance of type metadata for the enum, and returns a pointer to an array of type metadata references for the types of the cases of that instance. The order matches that of the case name list. This function is similar to the field type accessor for a struct, except that the least significant bit of each element in the result is also set if the enum case is an indirect case.
  • If the nominal type is generic, a pointer to the metadata pattern that is used to form instances of the type is stored at offset 6. The pointer is null if the type is not generic.
  • The generic parameter descriptor begins at offset 7. This describes the layout of the generic parameter vector in the metadata record:
    • The offset of the generic parameter vector is stored at offset 7. This is the offset in pointer-sized words of the generic parameter vector inside the metadata record. If the type is not generic, this is zero.
    • The total number of type parameters is stored at offset 8. This count includes associated types of type parameters with protocol constraints.
    • The number of primary type parameters is stored at offset 9. This count includes only the primary formal type parameters.
    • For each type parameter n, the following fields are stored:
      • The number of witnesses for the type parameter is stored at offset 10+n. This is the number of witness table pointers that are stored for the type parameter in the generic parameter vector.
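
Two of the encodings above lend themselves to a short sketch: the doubly-null-terminated name lists and the packed payload-case word of an enum descriptor. The helper names are hypothetical, the name list is modeled as a byte array rather than a raw pointer, and the payload-case word is assumed to be 32 bits wide.

```swift
// Splits a doubly-null-terminated list of C strings into Swift strings.
func parseNameList(_ bytes: [UInt8]) -> [String] {
  var names: [String] = []
  var current: [UInt8] = []
  for byte in bytes {
    if byte != 0 {
      current.append(byte)
    } else if current.isEmpty {
      break   // an empty string marks the end of the list
    } else {
      names.append(String(decoding: current, as: UTF8.self))
      current.removeAll()
    }
  }
  return names
}

// Low 24 bits: number of payload cases. High 8 bits: payload size offset.
func decodePayloadCaseWord(_ word: UInt32) -> (payloadCases: Int, payloadSizeOffset: Int) {
  return (Int(word & 0x00FF_FFFF), Int(word >> 24))
}

let names = parseNameList(Array("x\0y\0\0".utf8))
let decoded = decodePayloadCaseWord(0x0300_0002)
```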

Note that there is no nominal type descriptor for protocols or protocol types. See the protocol descriptor description below.

Protocol Descriptor

Protocol metadata contains references to zero, one, or more protocol descriptors that describe the protocols values of the type are required to conform to. The protocol descriptor is laid out to be compatible with Objective-C Protocol objects. The layout is as follows:

  • An isa placeholder is stored at offset 0. This field is populated by the Objective-C runtime.
  • The mangled name is referenced as a null-terminated C string at offset 1.
  • If the protocol inherits one or more other protocols, a pointer to the inherited protocols list is stored at offset 2. The list starts with the number of inherited protocols as a pointer-sized integer, and is followed by that many protocol descriptor pointers. If the protocol inherits no other protocols, this pointer is null.
  • For an ObjC-compatible protocol, its required instance methods are stored at offset 3 as an ObjC-compatible method list. This is null for native Swift protocols.
  • For an ObjC-compatible protocol, its required class methods are stored at offset 4 as an ObjC-compatible method list. This is null for native Swift protocols.
  • For an ObjC-compatible protocol, its optional instance methods are stored at offset 5 as an ObjC-compatible method list. This is null for native Swift protocols.
  • For an ObjC-compatible protocol, its optional class methods are stored at offset 6 as an ObjC-compatible method list. This is null for native Swift protocols.
  • For an ObjC-compatible protocol, its instance properties are stored at offset 7 as an ObjC-compatible property list. This is null for native Swift protocols.
  • The size of the protocol descriptor record is stored as a 32-bit integer at offset 8. This is currently 72 on 64-bit platforms and 40 on 32-bit platforms.
  • Flags are stored as a 32-bit integer after the size. The following bits are currently used (counting from least significant bit zero):
    • Bit 0 is the Swift bit. It is set for all protocols defined in Swift and unset for protocols defined in Objective-C.
    • Bit 1 is the class constraint bit. It is set if the protocol is not class-constrained, meaning that any struct, enum, or class type may conform to the protocol. It is unset if only classes can conform to the protocol. (The inverted meaning is for compatibility with Objective-C protocol records, in which the bit is never set. Objective-C protocols can only be conformed to by classes.)
    • Bit 2 is the witness table bit. It is set if dispatch to the protocol's methods is done through a witness table, which is either passed as an extra parameter to generic functions or included in the existential container layout of protocol types. It is unset if dispatch is done through objc_msgSend and requires no additional information to accompany a value of conforming type.
    • Bit 31 is set by the Objective-C runtime when it has done its initialization of the protocol record. It is unused by the Swift runtime.
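
The record sizes quoted above and the flag bits can be checked with a small sketch. ProtocolDescriptorFlags and protocolDescriptorSize are hypothetical names; the size assumes eight pointer-sized fields (offsets 0 through 7) followed by the two 32-bit integers.

```swift
// Hypothetical wrapper over the 32-bit flags word; not a runtime type.
struct ProtocolDescriptorFlags {
  var rawValue: UInt32

  // Bit 0: defined in Swift rather than Objective-C.
  var isSwift: Bool { return rawValue & (1 << 0) != 0 }
  // Bit 1: set when the protocol is NOT class-constrained.
  var allowsNonClassTypes: Bool { return rawValue & (1 << 1) != 0 }
  // Bit 2: dispatch goes through a witness table.
  var needsWitnessTable: Bool { return rawValue & (1 << 2) != 0 }
}

// Eight pointer-sized fields, then the 32-bit size and 32-bit flags.
func protocolDescriptorSize(pointerSize: Int) -> Int {
  return 8 * pointerSize + 4 + 4
}

print(protocolDescriptorSize(pointerSize: 8))  // 72
print(protocolDescriptorSize(pointerSize: 4))  // 40
```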

Heap Objects

Heap Metadata

Heap Object Header

Mangling

mangled-name ::= '_S' global

All Swift-mangled names begin with this prefix.

The basic mangling scheme is a list of 'operators' where the operators are structured in a post-fix order. For example, a mangling may start with an identifier, and only a later type-like operator defines how that identifier is to be interpreted:

4Test3FooC   // The trailing 'C' says that 'Foo' is a class in module 'Test'

Operators are either identifiers or a sequence of one or more characters, like C for class. All operators share the same name-space. Important operators are a single character, which means that no other operator may start with the same character.

Some less important operators are longer and may also contain one or more natural numbers. But it's always important that the demangler can identify the end (the last character) of an operator. For example, it's not possible to determine the last character if there are two operators M and Ma: a could belong to M or it could be the first character of the next operator.

The intention of the post-fix order is to optimize for common pre-fixes. Regardless of whether it's the mangling for a metatype or for a function in a module, the mangled name will start with the module name (after the _S).
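
To make the post-fix structure concrete, here is a toy tokenizer for manglings like 4Test3FooC. It is a deliberately simplified sketch: it assumes identifiers are written as a decimal length followed by that many characters, treats every other character as a single-character operator, and ignores multi-character operators and any other features of the real grammar. ManglingToken and tokenize are hypothetical names.

```swift
// Hypothetical token model; not the demangler's actual representation.
enum ManglingToken: Equatable {
  case identifier(String)
  case op(Character)
}

func tokenize(_ mangled: String) -> [ManglingToken] {
  var tokens: [ManglingToken] = []
  var rest = Substring(mangled)
  while let first = rest.first {
    let digits = rest.prefix(while: { ("0"..."9").contains($0) })
    if let length = Int(digits), length > 0, length <= rest.count - digits.count {
      // A length prefix followed by that many identifier characters.
      rest = rest.dropFirst(digits.count)
      tokens.append(.identifier(String(rest.prefix(length))))
      rest = rest.dropFirst(length)
    } else {
      // Anything else is treated as a one-character operator.
      tokens.append(.op(first))
      rest = rest.dropFirst()
    }
  }
  return tokens
}

let tokens = tokenize("4Test3FooC")
// [.identifier("Test"), .identifier("Foo"), .op("C")]
```

The operator arrives last, so a reader only learns that Foo is a class in module Test after both identifiers have been consumed.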

In the following, productions that are only part of an operator are named with uppercase letters.

Globals

global ::= type 'N'                    // type metadata (address point)
                                       // -- type starts with [BCOSTV]
global ::= type 'Mf'                   // 'full' type metadata (start of object)
global ::= type 'MP'                   // type metadata pattern
global ::= type 'Ma'                   // type metadata access function
global ::= type 'ML'                   // type metadata lazy cache variable
global ::= nominal-type 'Mm'           // class metaclass
global ::= nominal-type 'Mn'           // nominal type descriptor
global ::= protocol 'Mp'               // protocol descriptor
global ::= type 'MF'                   // metadata for remote mirrors: field descriptor
global ::= type 'MB'                   // metadata for remote mirrors: builtin type descriptor
global ::= protocol-conformance 'MA'   // metadata for remote mirrors: associated type descriptor
global ::= nominal-type 'MC'           // metadata for remote mirrors: superclass descriptor

// TODO check this
global ::= mangled-name 'TA'                     // partial application forwarder
global ::= mangled-name 'Ta'                     // ObjC partial application forwarder

global ::= type 'w' VALUE-WITNESS-KIND // value witness
global ::= protocol-conformance 'Wa'   // protocol witness table accessor
global ::= protocol-conformance 'WG'   // generic protocol witness table
global ::= protocol-conformance 'WI'   // generic protocol witness table instantiation function
global ::= type protocol-conformance 'WL'   // lazy protocol witness table cache variable
global ::= entity 'Wo'                 // witness table offset
global ::= protocol-conformance 'WP'   // protocol witness table

global ::= protocol-conformance identifier 'Wt' // associated type metadata accessor
global ::= protocol-conformance identifier nominal-type 'WT' // associated type witness table accessor
global ::= type protocol-conformance 'Wl' // lazy protocol witness table accessor
global ::= type 'WV'                   // value witness table
global ::= entity 'Wv' DIRECTNESS      // field offset

global ::= type 'Wy' // Outlined Copy Function Type
global ::= type 'We' // Outlined Consume Function Type

DIRECTNESS ::= 'd'                         // direct
DIRECTNESS ::= 'i'                         // indirect

A direct symbol resolves directly to the address of an object. An indirect symbol resolves to the address of a pointer to the object. They are distinct manglings to make a certain class of bugs immediately obvious.

The terminology is slightly overloaded when discussing offsets. A direct offset resolves to a variable holding the true offset. An indirect offset resolves to a variable holding an offset to be applied to type metadata to get the address of the true offset. (Offset variables are required when the object being accessed lies within a resilient structure. When the layout of the object may depend on generic arguments, these offsets must be kept in metadata. Indirect field offsets are therefore required when accessing fields in generic types where the metadata itself has unknown layout.)

global ::= global 'TO'                 // ObjC-as-swift thunk
global ::= global 'To'                 // swift-as-ObjC thunk
global ::= global 'TD'                 // dynamic dispatch thunk
global ::= global 'Td'                 // direct method reference thunk
global ::= global 'TV'                 // vtable override thunk
global ::= type 'D'                    // type mangling for the debugger. TODO: check if we really need this
global ::= protocol-conformance entity 'TW' // protocol witness thunk
global ::= context identifier identifier 'TB' // property behavior initializer thunk (not used currently)
global ::= context identifier identifier 'Tb' // property behavior setter thunk (not used currently)
global ::= global 'T_' specialization  // reset substitutions before demangling specialization
global ::= entity                      // some identifiable thing
global ::= type type generic-signature? 'T' REABSTRACT-THUNK-TYPE   // reabstraction thunk helper function

REABSTRACT-THUNK-TYPE ::= 'R'          // reabstraction thunk helper function
REABSTRACT-THUNK-TYPE ::= 'r'          // reabstraction thunk

The types in a reabstraction thunk helper function are always non-polymorphic <impl-function-type> types.

VALUE-WITNESS-KIND ::= 'al'           // allocateBuffer
VALUE-WITNESS-KIND ::= 'ca'           // assignWithCopy
VALUE-WITNESS-KIND ::= 'ta'           // assignWithTake
VALUE-WITNESS-KIND ::= 'de'           // deallocateBuffer
VALUE-WITNESS-KIND ::= 'xx'           // destroy
VALUE-WITNESS-KIND ::= 'XX'           // destroyBuffer
VALUE-WITNESS-KIND ::= 'Xx'           // destroyArray
VALUE-WITNESS-KIND ::= 'CP'           // initializeBufferWithCopyOfBuffer
VALUE-WITNESS-KIND ::= 'Cp'           // initializeBufferWithCopy
VALUE-WITNESS-KIND ::= 'cp'           // initializeWithCopy
VALUE-WITNESS-KIND ::= 'TK'           // initializeBufferWithTakeOfBuffer
VALUE-WITNESS-KIND ::= 'Tk'           // initializeBufferWithTake
VALUE-WITNESS-KIND ::= 'tk'           // initializeWithTake
VALUE-WITNESS-KIND ::= 'pr'           // projectBuffer
VALUE-WITNESS-KIND ::= 'xs'           // storeExtraInhabitant
VALUE-WITNESS-KIND ::= 'xg'           // getExtraInhabitantIndex
VALUE-WITNESS-KIND ::= 'Cc'           // initializeArrayWithCopy
VALUE-WITNESS-KIND ::= 'Tt'           // initializeArrayWithTakeFrontToBack
VALUE-WITNESS-KIND ::= 'tT'           // initializeArrayWithTakeBackToFront
VALUE-WITNESS-KIND ::= 'ug'           // getEnumTag
VALUE-WITNESS-KIND ::= 'up'           // destructiveProjectEnumData
VALUE-WITNESS-KIND ::= 'ui'           // destructiveInjectEnumTag

<VALUE-WITNESS-KIND> differentiates the kinds of value witness functions for a type.

Entities

entity ::= nominal-type                    // named type declaration
entity ::= context entity-spec static? curry-thunk?

static ::= 'Z'
curry-thunk ::= 'Tc'

// The leading type is the function type
entity-spec ::= type 'fC'                  // allocating constructor
entity-spec ::= type 'fc'                  // non-allocating constructor
entity-spec ::= type 'fU' INDEX            // explicit anonymous closure expression
entity-spec ::= type 'fu' INDEX            // implicit anonymous closure
entity-spec ::= 'fA' INDEX                 // default argument N+1 generator
entity-spec ::= 'fi'                       // non-local variable initializer
entity-spec ::= 'fD'                       // deallocating destructor; untyped
entity-spec ::= 'fd'                       // non-deallocating destructor; untyped
entity-spec ::= 'fE'                       // ivar destroyer; untyped
entity-spec ::= 'fe'                       // ivar initializer; untyped

entity-spec ::= decl-name function-signature generic-signature? 'F'    // function
entity-spec ::= decl-name type 'i'                 // subscript ('i'ndex) itself (not the individual accessors)
entity-spec ::= decl-name type 'v'                 // variable
entity-spec ::= decl-name type 'f' ACCESSOR
entity-spec ::= decl-name type 'fp'                // generic type parameter (not used?)
entity-spec ::= decl-name type 'fo'                // enum element (currently not used)

ACCESSOR ::= 'm'                           // materializeForSet
ACCESSOR ::= 's'                           // setter
ACCESSOR ::= 'g'                           // getter
ACCESSOR ::= 'G'                           // global getter
ACCESSOR ::= 'w'                           // willSet
ACCESSOR ::= 'W'                           // didSet
ACCESSOR ::= 'a' ADDRESSOR-KIND            // mutable addressor
ACCESSOR ::= 'l' ADDRESSOR-KIND            // non-mutable addressor

ADDRESSOR-KIND ::= 'u'                     // unsafe addressor (no owner)
ADDRESSOR-KIND ::= 'O'                     // owning addressor (non-native owner)
ADDRESSOR-KIND ::= 'o'                     // owning addressor (native owner)
ADDRESSOR-KIND ::= 'p'                     // pinning addressor (native owner)

decl-name ::= identifier
decl-name ::= identifier 'L' INDEX         // locally-discriminated declaration
decl-name ::= identifier identifier 'LL'    // file-discriminated declaration

The first identifier in a file-discriminated <decl-name> is a string that represents the file the original declaration came from. It should be considered unique within the enclosing module. The second identifier is the name of the entity. Not all declarations marked private will use this mangling; if the entity's context is enough to uniquely identify the entity, the simple identifier form is preferred.

Declaration Contexts

These manglings identify the enclosing context in which an entity was declared, such as its enclosing module, function, or nominal type.

context ::= module
context ::= entity
context ::= entity module generic-signature? 'E'

An extension mangling is used whenever an entity's declaration context is an extension and the entity being extended is in a different module. In this case, as the production above shows, the entity being extended is mangled first, followed by the extension's module. If the extension and the extended entity are in the same module, the plain entity mangling is preferred. If the extension is constrained, the constraints on the extension are mangled in its generic signature.

When mangling the context of a local entity within a constructor or destructor, the non-allocating or non-deallocating variant is used.

module ::= identifier                      // module name
module ::= known-module                    // abbreviation

known-module ::= 's'                       // Swift
known-module ::= 'SC'                      // C
known-module ::= 'So'                      // Objective-C

The Objective-C module is used as the context for mangling Objective-C classes as <type>s.

Types

nominal-type ::= substitution
nominal-type ::= context decl-name 'C'     // nominal class type
nominal-type ::= context decl-name 'O'     // nominal enum type
nominal-type ::= context decl-name 'V'     // nominal struct type
nominal-type ::= protocol 'P'              // nominal protocol type

nominal-type ::= known-nominal-type

known-nominal-type ::= 'Sa'                // Swift.Array
known-nominal-type ::= 'Sb'                // Swift.Bool
known-nominal-type ::= 'Sc'                // Swift.UnicodeScalar
known-nominal-type ::= 'Sd'                // Swift.Float64
known-nominal-type ::= 'Sf'                // Swift.Float32
known-nominal-type ::= 'Si'                // Swift.Int
known-nominal-type ::= 'SV'                // Swift.UnsafeRawPointer
known-nominal-type ::= 'Sv'                // Swift.UnsafeMutableRawPointer
known-nominal-type ::= 'SP'                // Swift.UnsafePointer
known-nominal-type ::= 'Sp'                // Swift.UnsafeMutablePointer
known-nominal-type ::= 'SQ'                // Swift.ImplicitlyUnwrappedOptional
known-nominal-type ::= 'Sq'                // Swift.Optional
known-nominal-type ::= 'SR'                // Swift.UnsafeBufferPointer
known-nominal-type ::= 'Sr'                // Swift.UnsafeMutableBufferPointer
known-nominal-type ::= 'SS'                // Swift.String
known-nominal-type ::= 'Su'                // Swift.UInt

protocol ::= context decl-name

type ::= 'Bb'                              // Builtin.BridgeObject
type ::= 'BB'                              // Builtin.UnsafeValueBuffer
type ::= 'Bf' NATURAL '_'                  // Builtin.Float<n>
type ::= 'Bi' NATURAL '_'                  // Builtin.Int<n>
type ::= 'BO'                              // Builtin.UnknownObject
type ::= 'Bo'                              // Builtin.NativeObject
type ::= 'Bp'                              // Builtin.RawPointer
type ::= type 'Bv' NATURAL '_'             // Builtin.Vec<n>x<type>
type ::= 'Bw'                              // Builtin.Word
type ::= context decl-name 'a'             // Type alias (DWARF only)
type ::= function-signature 'c'            // function type
type ::= function-signature 'X' FUNCTION-KIND // special function type
type ::= type 'y' (type* '_')* type* 'G'   // bound generic type (one type-list per nesting level of type)
type ::= type 'Sg'                         // optional type, shortcut for: type 'ySqG'
type ::= type 'Xo'                         // @unowned type
type ::= type 'Xu'                         // @unowned(unsafe) type
type ::= type 'Xw'                         // @weak type
type ::= impl-function-type 'XF'           // function implementation type (currently unused)
type ::= type 'Xb'                         // SIL @box type (deprecated)
type ::= type-list 'Xx'                    // SIL box type
type ::= type-list type-list generic-signature 'XX'
                                           // Generic SIL box type
type ::= type 'XD'                         // dynamic self type
type ::= type 'm'                          // metatype without representation
type ::= type 'XM' METATYPE-REPR           // metatype with representation
type ::= type 'Xp'                         // existential metatype without representation
type ::= type 'Xm' METATYPE-REPR           // existential metatype with representation
type ::= 'Xe'                              // error or unresolved type


FUNCTION-KIND ::= 'f'                      // @thin function type
FUNCTION-KIND ::= 'U'                      // uncurried function type (currently not used)
FUNCTION-KIND ::= 'K'                      // @auto_closure function type
FUNCTION-KIND ::= 'B'                      // objc block function type
FUNCTION-KIND ::= 'C'                      // C function pointer type

function-signature ::= params-type params-type throws? // results and parameters

params-type ::= type                       // tuple in case of multiple parameters
params-type ::= empty-list                 // shortcut for no parameters

throws ::= 'K'                             // 'throws' annotation on function types

type-list ::= list-type '_' list-type* 'd'?  // list of types with optional variadic specifier
type-list ::= empty-list

list-type ::= type identifier? 'z'?        // type with optional label and inout convention

METATYPE-REPR ::= 't'                      // Thin metatype representation
METATYPE-REPR ::= 'T'                      // Thick metatype representation
METATYPE-REPR ::= 'o'                      // ObjC metatype representation

type ::= archetype
type ::= associated-type
type ::= nominal-type
type ::= protocol-list 'p'                 // existential type
type ::= type-list 't'                     // tuple
type ::= type generic-signature 'u'        // generic type
type ::= 'x'                               // generic param, depth=0, idx=0
type ::= 'q' GENERIC-PARAM-INDEX           // dependent generic parameter
type ::= type assoc-type-name 'qa'         // associated type of non-generic param
type ::= assoc-type-name 'Qy' GENERIC-PARAM-INDEX  // associated type
type ::= assoc-type-name 'Qz'                      // shortcut for 'Qyz'
type ::= assoc-type-list 'QY' GENERIC-PARAM-INDEX  // associated type at depth
type ::= assoc-type-list 'QZ'                      // shortcut for 'QYz'

protocol-list ::= protocol '_' protocol*
protocol-list ::= empty-list

assoc-type-list ::= assoc-type-name '_' assoc-type-name*

archetype ::= context 'Qq' INDEX           // archetype+context (DWARF only)
archetype ::= associated-type

associated-type ::= substitution
associated-type ::= protocol 'QP'          // self type of protocol
associated-type ::= archetype identifier 'Qa' // associated type

assoc-type-name ::= identifier                // associated type name without protocol
assoc-type-name ::= identifier protocol 'P'   //

empty-list ::= 'y'

Associated types use an abbreviated mangling when the base generic parameter or associated type is constrained by a single protocol requirement. The associated type in this case can be referenced unambiguously by name alone. If the base has multiple conformance constraints, then the protocol name is mangled in to disambiguate.

impl-function-type ::= type* 'I' FUNC-ATTRIBUTES '_'
impl-function-type ::= type* generic-signature 'I' PSEUDO-GENERIC? FUNC-ATTRIBUTES '_'

FUNC-ATTRIBUTES ::= CALLEE-CONVENTION? FUNC-REPRESENTATION PARAM-CONVENTION* RESULT-CONVENTION* ('z' RESULT-CONVENTION)?

PSEUDO-GENERIC ::= 'P'

CALLEE-CONVENTION ::= 'y'                  // @callee_unowned
CALLEE-CONVENTION ::= 'g'                  // @callee_guaranteed
CALLEE-CONVENTION ::= 'x'                  // @callee_owned
CALLEE-CONVENTION ::= 't'                  // thin

FUNC-REPRESENTATION ::= 'B'                // C block invocation function
FUNC-REPRESENTATION ::= 'C'                // C global function
FUNC-REPRESENTATION ::= 'M'                // Swift method
FUNC-REPRESENTATION ::= 'J'                // ObjC method
FUNC-REPRESENTATION ::= 'K'                // closure
FUNC-REPRESENTATION ::= 'W'                // protocol witness

PARAM-CONVENTION ::= 'i'                   // indirect in
PARAM-CONVENTION ::= 'l'                   // indirect inout
PARAM-CONVENTION ::= 'b'                   // indirect inout aliasable
PARAM-CONVENTION ::= 'n'                   // indirect in guaranteed
PARAM-CONVENTION ::= 'x'                   // direct owned
PARAM-CONVENTION ::= 'y'                   // direct unowned
PARAM-CONVENTION ::= 'g'                   // direct guaranteed
PARAM-CONVENTION ::= 'e'                   // direct deallocating

RESULT-CONVENTION ::= 'r'                  // indirect
RESULT-CONVENTION ::= 'o'                  // owned
RESULT-CONVENTION ::= 'd'                  // unowned
RESULT-CONVENTION ::= 'u'                  // unowned inner pointer
RESULT-CONVENTION ::= 'a'                  // auto-released

For the most part, manglings follow the structure of formal language types. However, in some cases it is more useful to encode the exact implementation details of a function type.

The type* list contains parameter and return types (including the error result), in that order. The number of parameters and results must match the number of <PARAM-CONVENTION> and <RESULT-CONVENTION> characters after the <FUNC-REPRESENTATION>. The <generic-signature> is used if the function is polymorphic.

Generics

protocol-conformance ::= type protocol module generic-signature?

<protocol-conformance> refers to a type's conformance to a protocol. The named module is the one containing the extension or type declaration that declared the conformance.

protocol-conformance ::= context identifier protocol identifier generic-signature?  // Property behavior conformance

Property behaviors are implemented using private protocol conformances.

generic-signature ::= requirement* 'l'     // one generic parameter
generic-signature ::= requirement* 'r' GENERIC-PARAM-COUNT* 'l'

GENERIC-PARAM-COUNT ::= 'z'                // zero parameters
GENERIC-PARAM-COUNT ::= INDEX              // N+1 parameters

requirement ::= protocol 'R' GENERIC-PARAM-INDEX                  // protocol requirement
requirement ::= protocol assoc-type-name 'Rp' GENERIC-PARAM-INDEX // protocol requirement on associated type
requirement ::= protocol assoc-type-list 'RP' GENERIC-PARAM-INDEX // protocol requirement on associated type at depth
requirement ::= protocol substitution 'RQ'                        // protocol requirement with substitution
requirement ::= type 'Rb' GENERIC-PARAM-INDEX                     // base class requirement
requirement ::= type assoc-type-name 'Rc' GENERIC-PARAM-INDEX     // base class requirement on associated type
requirement ::= type assoc-type-list 'RC' GENERIC-PARAM-INDEX     // base class requirement on associated type at depth
requirement ::= type substitution 'RB'                            // base class requirement with substitution
requirement ::= type 'Rs' GENERIC-PARAM-INDEX                     // same-type requirement
requirement ::= type assoc-type-name 'Rt' GENERIC-PARAM-INDEX     // same-type requirement on associated type
requirement ::= type assoc-type-list 'RT' GENERIC-PARAM-INDEX     // same-type requirement on associated type at depth
requirement ::= type substitution 'RS'                            // same-type requirement with substitution
requirement ::= type 'Rl' GENERIC-PARAM-INDEX LAYOUT-CONSTRAINT   // layout requirement
requirement ::= type assoc-type-name 'Rm' GENERIC-PARAM-INDEX LAYOUT-CONSTRAINT    // layout requirement on associated type
requirement ::= type assoc-type-list 'RM' GENERIC-PARAM-INDEX LAYOUT-CONSTRAINT    // layout requirement on associated type at depth
requirement ::= type substitution 'RM' LAYOUT-CONSTRAINT                           // layout requirement with substitution

GENERIC-PARAM-INDEX ::= 'z'                // depth = 0,   idx = 0
GENERIC-PARAM-INDEX ::= INDEX              // depth = 0,   idx = N+1
GENERIC-PARAM-INDEX ::= 'd' INDEX INDEX    // depth = M+1, idx = N

LAYOUT-CONSTRAINT ::= 'N'  // NativeRefCountedObject
LAYOUT-CONSTRAINT ::= 'R'  // RefCountedObject
LAYOUT-CONSTRAINT ::= 'T'  // Trivial
LAYOUT-CONSTRAINT ::= 'E' LAYOUT-SIZE-AND-ALIGNMENT  // Trivial of exact size
LAYOUT-CONSTRAINT ::= 'e' LAYOUT-SIZE  // Trivial of exact size
LAYOUT-CONSTRAINT ::= 'M' LAYOUT-SIZE-AND-ALIGNMENT  // Trivial of size at most N bits
LAYOUT-CONSTRAINT ::= 'm' LAYOUT-SIZE  // Trivial of size at most N bits
LAYOUT-CONSTRAINT ::= 'U'  // Unknown layout

LAYOUT-SIZE ::= INDEX // Size only
LAYOUT-SIZE-AND-ALIGNMENT ::= INDEX INDEX // Size followed by alignment

A generic signature begins with an optional list of requirements. The <GENERIC-PARAM-COUNT> describes the number of generic parameters at each depth of the signature. As a special case, no <GENERIC-PARAM-COUNT> values indicates a single generic parameter at the outermost depth:

x_xCru                           // <T_0_0> T_0_0 -> T_0_0
d_0__xCr_0_u                     // <T_0_0><T_1_0, T_1_1> T_0_0 -> T_1_1

A generic signature must only precede an operator character which is different from any character in a <GENERIC-PARAM-COUNT>.
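The <GENERIC-PARAM-INDEX> production can be decoded with a few lines of code. This is a rough sketch under stated assumptions: `readIndex` and `decodeGenericParamIndex` are hypothetical names, and `0_` is assumed to be the <INDEX> encoding of 1, as the N+1 convention implies.

```swift
// Reads one INDEX ('_' = 0, N '_' = N+1) from the front of `input` and
// consumes it. Returns nil if the input does not start with an INDEX.
func readIndex(_ input: inout Substring) -> Int? {
    let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
    let rest = input.dropFirst(digits.count)
    guard rest.first == "_" else { return nil }
    let value: Int
    if digits.isEmpty {
        value = 0
    } else if let n = Int(digits) {
        value = n + 1
    } else {
        return nil
    }
    input = rest.dropFirst()
    return value
}

// Decodes a GENERIC-PARAM-INDEX into (depth, index). Illustrative only.
func decodeGenericParamIndex(_ mangled: String) -> (depth: Int, index: Int)? {
    var input = Substring(mangled)
    if input.first == "z" { return (0, 0) }          // depth = 0, idx = 0
    if input.first == "d" {                          // depth = M+1, idx = N
        input.removeFirst()
        guard let m = readIndex(&input), let n = readIndex(&input) else { return nil }
        return (m + 1, n)
    }
    guard let n = readIndex(&input) else { return nil }
    return (0, n + 1)                                // depth = 0, idx = N+1
}
```

For instance, `decodeGenericParamIndex("d_0_")` yields depth 1 and index 1, which corresponds to the parameter written `T_1_1` in the comments above.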

Identifiers

identifier ::= substitution
identifier ::= NATURAL IDENTIFIER-STRING   // identifier without word substitutions
identifier ::= '0' IDENTIFIER-PART         // identifier with word substitutions

IDENTIFIER-PART ::= NATURAL IDENTIFIER-STRING
IDENTIFIER-PART ::= [a-z]                  // word substitution (except the last one)
IDENTIFIER-PART ::= [A-Z]                  // last word substitution in identifier

IDENTIFIER-STRING ::= IDENTIFIER-START-CHAR IDENTIFIER-CHAR*
IDENTIFIER-START-CHAR ::= [_a-zA-Z]
IDENTIFIER-CHAR ::= [_$a-zA-Z0-9]

<identifier> is run-length encoded: the natural indicates how many characters follow. Operator characters are mapped to letter characters as given. In neither case can an identifier start with a digit, so there's no ambiguity with the run-length.
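The simple run-length form can be illustrated with a short sketch. It covers only `NATURAL IDENTIFIER-STRING`; substitutions, word substitutions and Punycode are left out, and the function names are hypothetical rather than compiler API.

```swift
// Sketch of the simple form `NATURAL IDENTIFIER-STRING` only.
func mangleSimpleIdentifier(_ name: String) -> String {
    // The simple form is ASCII-only, so the UTF-8 length equals the character count.
    return "\(name.utf8.count)\(name)"
}

// Reads one run-length encoded identifier from the front of `mangled` and
// returns it together with the unconsumed remainder, or nil on malformed input.
func demangleSimpleIdentifier(_ mangled: Substring) -> (name: Substring, rest: Substring)? {
    let digits = mangled.prefix(while: { $0.isASCII && $0.isNumber })
    // A leading '0' introduces a different identifier form, so reject it here.
    guard digits.first != "0", let length = Int(digits), length > 0 else { return nil }
    let body = mangled.dropFirst(digits.count)
    guard body.count >= length else { return nil }
    return (body.prefix(length), body.dropFirst(length))
}

print(mangleSimpleIdentifier("zang"))  // 4zang
if let first = demangleSimpleIdentifier("3zim4zang") {
    print(first.name, first.rest)      // zim 4zang
}
```

Because an identifier string never starts with a digit, the parser can read digits greedily to find the run length.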

If the run-length starts with a 0, the identifier string contains word substitutions. A word is a sub-string of an identifier which contains letters and digits [A-Za-z0-9]. Words are separated by underscores _. In addition, a new word begins with an uppercase letter [A-Z] if the previous character is not an uppercase letter:

Abc1DefG2HI          // contains four words: 'Abc1', 'Def', 'G2' and 'HI'
_abc1_def_G2hi       // contains three words: 'abc1', 'def' and 'G2hi'
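The word-splitting rule can be expressed as a small function. This sketch implements only the rule as stated in this section; the compiler may apply further restrictions that are not described here, and `splitIntoWords` is a hypothetical name.

```swift
// Splits an ASCII identifier into words: runs of letters and digits,
// separated by any other character, with a new word starting at an
// uppercase letter that does not follow another uppercase letter.
func splitIntoWords(_ identifier: String) -> [String] {
    var words: [String] = []
    var current = ""
    var previous: Character? = nil
    for ch in identifier {
        let isWordChar = ch.isASCII && (ch.isLetter || ch.isNumber)
        let startsNewWord = ch.isUppercase && !(previous?.isUppercase ?? false)
        if !isWordChar || startsNewWord {
            // Close the word collected so far, if any.
            if !current.isEmpty { words.append(current) }
            current = ""
        }
        if isWordChar { current.append(ch) }
        previous = ch
    }
    if !current.isEmpty { words.append(current) }
    return words
}

print(splitIntoWords("Abc1DefG2HI"))     // ["Abc1", "Def", "G2", "HI"]
print(splitIntoWords("_abc1_def_G2hi"))  // ["abc1", "def", "G2hi"]
```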

The words of all identifiers, which are encoded in the current mangling are enumerated and assigned to a letter: a = first word, b = second word, etc.

An identifier containing word substitutions is a sequence of run-length encoded sub-strings and references to previously mangled words. All but the last word-references are lowercase letters and the last one is an uppercase letter. If there is no literal sub-string after the last word-reference, the last word-reference is followed by a 0.

Let's assume the current mangling already encoded the identifier AbcDefGHI:

02Myac1_B0   // expands to: MyAbcGHI_Def
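The expansion can be traced with a small decoder. This is a sketch under the rules stated above, not the compiler's demangler: `expandWordSubstitutions` is a hypothetical name, and the input is written as `02Myac1_B0`, with the terminating 0 that the rule above calls for when no literal sub-string follows the last word-reference.

```swift
// Expands an identifier that uses word substitutions. `words` holds the
// words collected so far in the mangling (index 0 is 'a', 1 is 'b', ...).
// Returns nil for input it cannot parse.
func expandWordSubstitutions(_ mangled: String, words: [String]) -> String? {
    var input = Substring(mangled)
    guard input.first == "0" else { return nil }
    input.removeFirst()  // the leading '0' marks "contains word substitutions"
    var result = ""
    var sawLastReference = false
    while let ch = input.first {
        if ch.isASCII && ch.isLetter && !sawLastReference {
            // Word reference: lowercase means more follow, uppercase is the last.
            let base: Character = ch.isUppercase ? "A" : "a"
            let index = Int(ch.asciiValue! - base.asciiValue!)
            guard index < words.count else { return nil }
            result.append(contentsOf: words[index])
            sawLastReference = ch.isUppercase
            input.removeFirst()
        } else if ch == "0" {
            // Terminating '0': only valid directly after the last reference.
            return sawLastReference ? result : nil
        } else {
            // Run-length encoded literal sub-string.
            let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
            guard let length = Int(digits) else { return nil }
            let body = input.dropFirst(digits.count)
            guard body.count >= length else { return nil }
            result.append(contentsOf: body.prefix(length))
            input = body.dropFirst(length)
            if sawLastReference { return result }
        }
    }
    return nil  // ran out of input before the identifier was complete
}

let knownWords = ["Abc", "Def", "GHI"]  // words of the identifier AbcDefGHI
print(expandWordSubstitutions("02Myac1_B0", words: knownWords) ?? "invalid")
// prints "MyAbcGHI_Def"
```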

A maximum of 26 words in a mangling can be used for substitutions.

identifier ::= '00' NATURAL '_'? IDENTIFIER-CHAR+  // '_' is inserted if the identifier starts with a digit or '_'.

Identifiers that contain non-ASCII characters are encoded using the Punycode algorithm specified in RFC 3492, with two modifications: _ is used as the encoding delimiter, and uppercase letters A through J are used in place of digits 0 through 9 in the encoding character set. The mangling then consists of 00 followed by the run length of the encoded string and the encoded string itself. For example, the identifier vergüenza is mangled to 0012vergenza_JFa. (The encoding in standard Punycode would be vergenza-95a.)

If the encoded string starts with a digit or an _, an additional _ is inserted between the run length and the encoded string.
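The two modifications to standard Punycode, and the framing with `00` and a run length, can be sketched as a post-processing step on an already encoded string. The Punycode encoder itself is out of scope here, and `manglePunycodedIdentifier` is a hypothetical name. The sketch assumes that only the part after the last delimiter uses digits as encoding characters.

```swift
// Converts RFC 3492 Punycode output into the variant described above and
// wraps it as an identifier. Illustrative only.
func manglePunycodedIdentifier(standardPunycode: String) -> String {
    // In RFC 3492 output the last '-' separates the basic code points from
    // the encoded tail; it is replaced by '_'.
    var basic = ""
    var tail = Substring(standardPunycode)
    if let delimiter = standardPunycode.lastIndex(of: "-") {
        basic = String(standardPunycode[..<delimiter]) + "_"
        tail = standardPunycode[standardPunycode.index(after: delimiter)...]
    }
    // Digits 0-9 in the encoded tail become uppercase letters A-J.
    let mappedTail = String(tail.map { ch -> Character in
        guard let value = ch.wholeNumberValue, ch.isASCII else { return ch }
        return Character(UnicodeScalar(UInt8(ascii: "A") + UInt8(value)))
    })
    let encoded = basic + mappedTail
    // An extra '_' separates the run length from a leading digit or '_';
    // it is not counted in the run length.
    let needsSeparator = encoded.first.map { $0 == "_" || ($0.isASCII && $0.isNumber) } ?? false
    return "00\(encoded.count)" + (needsSeparator ? "_" : "") + encoded
}

print(manglePunycodedIdentifier(standardPunycode: "vergenza-95a"))
// prints "0012vergenza_JFa"
```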

identifier ::= identifier 'o' OPERATOR-FIXITY

OPERATOR-FIXITY ::= 'p'                    // prefix operator
OPERATOR-FIXITY ::= 'P'                    // postfix operator
OPERATOR-FIXITY ::= 'i'                    // infix operator

OPERATOR-CHAR ::= 'a'                      // & 'and'
OPERATOR-CHAR ::= 'c'                      // @ 'commercial at'
OPERATOR-CHAR ::= 'd'                      // / 'divide'
OPERATOR-CHAR ::= 'e'                      // = 'equals'
OPERATOR-CHAR ::= 'g'                      // > 'greater'
OPERATOR-CHAR ::= 'l'                      // < 'less'
OPERATOR-CHAR ::= 'm'                      // * 'multiply'
OPERATOR-CHAR ::= 'n'                      // ! 'not'
OPERATOR-CHAR ::= 'o'                      // | 'or'
OPERATOR-CHAR ::= 'p'                      // + 'plus'
OPERATOR-CHAR ::= 'q'                      // ? 'question'
OPERATOR-CHAR ::= 'r'                      // % 'remainder'
OPERATOR-CHAR ::= 's'                      // - 'subtract'
OPERATOR-CHAR ::= 't'                      // ~ 'tilde'
OPERATOR-CHAR ::= 'x'                      // ^ 'xor'
OPERATOR-CHAR ::= 'z'                      // . 'zperiod'

If an identifier is followed by an o its text is interpreted as an operator. Each lowercase character maps to an operator character (OPERATOR-CHAR).

Operators that contain non-ASCII characters are mangled by first mapping the ASCII operator characters to letters as for pure ASCII operator names, then Punycode-encoding the substituted string. For example, the infix operator «+» is mangled to 007p_qcaDcoi (p_qcaDc being the encoding of the substituted string «p»).
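For pure ASCII operators, the mapping amounts to a table lookup followed by the usual run-length encoding. The sketch below is illustrative: `operatorLetters`, `OperatorFixity` and `mangleOperator` are hypothetical names, and non-ASCII operators (which need the Punycode path) are rejected.

```swift
// Maps each ASCII operator character to its OPERATOR-CHAR letter.
let operatorLetters: [Character: Character] = [
    "&": "a", "@": "c", "/": "d", "=": "e", ">": "g", "<": "l",
    "*": "m", "!": "n", "|": "o", "+": "p", "?": "q", "%": "r",
    "-": "s", "~": "t", "^": "x", ".": "z",
]

// Raw values are the OPERATOR-FIXITY characters.
enum OperatorFixity: Character {
    case prefixOperator = "p"
    case postfixOperator = "P"
    case infixOperator = "i"
}

// Mangles a pure-ASCII operator name as `identifier 'o' OPERATOR-FIXITY`.
// Returns nil if any character has no OPERATOR-CHAR mapping.
func mangleOperator(_ name: String, fixity: OperatorFixity) -> String? {
    var letters = ""
    for ch in name {
        guard let letter = operatorLetters[ch] else { return nil }
        letters.append(letter)
    }
    guard !letters.isEmpty else { return nil }
    return "\(letters.count)\(letters)o\(fixity.rawValue)"
}

print(mangleOperator("==", fixity: .infixOperator) ?? "unsupported")  // 2eeoi
print(mangleOperator("!", fixity: .prefixOperator) ?? "unsupported")  // 1nop
```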

Substitutions

substitution ::= 'A' INDEX                  // substitution of N+26
substitution ::= 'A' [a-z]* [A-Z]           // One or more consecutive substitutions of N < 26

<substitution> is a back-reference to a previously mangled entity. The mangling algorithm maintains a mapping of entities to substitution indices as it runs. When an entity that can be represented by a substitution (a module, nominal type, or protocol) is mangled, a substitution is first looked for in the substitution map, and if it is present, the entity is mangled using the associated substitution index. Otherwise, the entity is mangled normally, and it is then added to the substitution map and associated with the next available substitution index.

For example, in mangling a function type (zim.zang.zung, zim.zang.zung, zim.zippity) -> zim.zang.zoo (with module zim and class zim.zang), the recurring contexts zim, zim.zang, and zim.zang.zung will be mangled using substitutions after being mangled for the first time. The first argument type will mangle in long form, 3zim4zang4zung, and in doing so, zim will acquire substitution AA, zim.zang will acquire substitution AB, and zim.zang.zung will acquire AC. The second argument is the same as the first and will mangle using its substitution, AC. The third argument type will mangle using the substitution for zim, AA7zippity. (It also acquires substitution AD which would be used if it mangled again.) The result type will mangle using the substitution for zim.zang, AB3zoo (and acquire substitution AE).
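The bookkeeping in this example can be reproduced with a toy mangler. It is a sketch only: `SubstitutionSketch` is a hypothetical type, it leaves out the kind operators (such as C or V) just as the prose example does, and it supports only the first 26 substitutions.

```swift
// Toy mangler for dotted context paths such as ["zim", "zang", "zung"].
struct SubstitutionSketch {
    // Maps a dotted path to the substitution index it acquired.
    var indices: [String: Int] = [:]

    mutating func mangle(_ path: [String]) -> String {
        precondition(!path.isEmpty, "a path needs at least one component")
        let key = path.joined(separator: ".")
        if let index = indices[key] {
            // Already mangled once: emit the back-reference instead.
            precondition(index < 26, "sketch only covers substitutions AA..AZ")
            let letter = Character(UnicodeScalar(UInt8(ascii: "A") + UInt8(index)))
            return "A\(letter)"
        }
        var out = ""
        if path.count > 1 {
            out += mangle(Array(path.dropLast()))  // enclosing context first
        }
        let name = path[path.count - 1]
        out += "\(name.count)\(name)"
        // Register the entity only after it has been mangled in full.
        let nextIndex = indices.count
        indices[key] = nextIndex
        return out
    }
}

var sketch = SubstitutionSketch()
print(sketch.mangle(["zim", "zang", "zung"]))  // 3zim4zang4zung
print(sketch.mangle(["zim", "zang", "zung"]))  // AC
print(sketch.mangle(["zim", "zippity"]))       // AA7zippity
print(sketch.mangle(["zim", "zang", "zoo"]))   // AB3zoo
```

Registering an entity only after its enclosing contexts have been mangled is what gives zim, zim.zang and zim.zang.zung the indices 0, 1 and 2 in that order.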

There are some pre-defined substitutions, see <known-nominal-type>.

If the mangling contains two or more consecutive substitutions, they can be abbreviated with a single A substitution. As with word substitutions, each index is encoded as a letter, and the last letter is uppercase:

AaeB      // equivalent to AAAEAB
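Decoding the letter form is a matter of mapping each letter to its position in the alphabet and stopping at the uppercase letter. The sketch below assumes exactly the grammar given above (`'A' [a-z]* [A-Z]`) and does not handle the <INDEX> form; `decodeSubstitutionIndices` is a hypothetical name.

```swift
// Decodes `'A' [a-z]* [A-Z]` into substitution indices (a/A = 0, b/B = 1, ...).
// Returns nil if the input does not match that form.
func decodeSubstitutionIndices(_ mangled: String) -> [Int]? {
    var chars = Substring(mangled)
    guard chars.first == "A" else { return nil }
    chars.removeFirst()
    var indices: [Int] = []
    for ch in chars {
        guard ch.isASCII, ch.isLetter, let ascii = ch.asciiValue else { return nil }
        if ch.isLowercase {
            indices.append(Int(ascii - UInt8(ascii: "a")))
        } else {
            // The uppercase letter is the last substitution in the run.
            indices.append(Int(ascii - UInt8(ascii: "A")))
            return indices
        }
    }
    return nil  // ran out of input before the terminating uppercase letter
}

print(decodeSubstitutionIndices("AaeB") ?? [])  // [0, 4, 1]
```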

Numbers and Indexes

INDEX ::= '_'                               // 0
INDEX ::= NATURAL '_'                       // N+1
NATURAL ::= [1-9] [0-9]*
NATURAL_ZERO ::= [0-9]+

<INDEX> is a production for encoding numbers in contexts that can't end in a digit; it's optimized for encoding smaller numbers.
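A minimal sketch of the <INDEX> encoding follows. The function names are hypothetical, and the sketch assumes that the value 1 is written `0_`, as the N+1 comment implies, even though <NATURAL> as defined above excludes a leading zero.

```swift
// Encodes a non-negative number as INDEX: 0 is "_", and N+1 is N followed by "_".
func mangleIndex(_ value: Int) -> String {
    precondition(value >= 0, "INDEX cannot encode negative numbers")
    return value == 0 ? "_" : "\(value - 1)_"
}

// Reads one INDEX from the front of `mangled`; returns the value and the
// unconsumed remainder, or nil if the input does not start with an INDEX.
func demangleIndex(_ mangled: Substring) -> (value: Int, rest: Substring)? {
    let digits = mangled.prefix(while: { $0.isASCII && $0.isNumber })
    let rest = mangled.dropFirst(digits.count)
    // The trailing '_' is what lets an INDEX precede another digit safely.
    guard rest.first == "_" else { return nil }
    if digits.isEmpty { return (0, rest.dropFirst()) }
    guard let n = Int(digits) else { return nil }
    return (n + 1, rest.dropFirst())
}

print(mangleIndex(0), mangleIndex(1), mangleIndex(5))  // _ 0_ 4_
```

The most common value, 0, costs a single character, which is the sense in which the encoding favors small numbers.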

Function Specializations

specialization ::= type '_' type* 'Tg' SPEC-INFO     // Generic re-abstracted specialization
specialization ::= type '_' type* 'TG' SPEC-INFO     // Generic not re-abstracted specialization

The types are the replacement types of the substitution list.

specialization ::= type 'Tp' SPEC-INFO // Partial generic specialization
specialization ::= type 'TP' SPEC-INFO // Partial generic specialization, not re-abstracted

The type is the function type of the specialized function.

specialization ::= spec-arg* 'Tf' SPEC-INFO UNIQUE-ID? ARG-SPEC-KIND* '_' ARG-SPEC-KIND  // Function signature specialization kind

The <ARG-SPEC-KIND> describes how arguments are specialized. Some kinds need arguments, which precede Tf.

spec-arg ::= identifier
spec-arg ::= type

SPEC-INFO ::= FRAGILE? PASSID

PASSID ::= '0'                             // AllocBoxToStack
PASSID ::= '1'                             // ClosureSpecializer
PASSID ::= '2'                             // CapturePromotion
PASSID ::= '3'                             // CapturePropagation
PASSID ::= '4'                             // FunctionSignatureOpts
PASSID ::= '5'                             // GenericSpecializer

FRAGILE ::= 'q'

UNIQUE-ID ::= NATURAL                      // Used to make unique function names

ARG-SPEC-KIND ::= 'n'                      // Unmodified argument
ARG-SPEC-KIND ::= 'c'                      // Consumes n 'type' arguments which are closed over types in argument order
                                           // and one 'identifier' argument which is the closure symbol name
ARG-SPEC-KIND ::= 'p' CONST-PROP           // Constant propagated argument
ARG-SPEC-KIND ::= 'd' 'G'? 'X'?            // Dead argument, with optional owned=>guaranteed or exploded-specifier
ARG-SPEC-KIND ::= 'g' 'X'?                 // Owned => Guaranteed, with optional exploded-specifier
ARG-SPEC-KIND ::= 'x'                      // Exploded
ARG-SPEC-KIND ::= 'i'                      // Box to value
ARG-SPEC-KIND ::= 's'                      // Box to stack

CONST-PROP ::= 'f'                         // Consumes one identifier argument which is a function symbol name
CONST-PROP ::= 'g'                         // Consumes one identifier argument which is a global symbol name
CONST-PROP ::= 'i' NATURAL_ZERO            // 64-bit-integer
CONST-PROP ::= 'd' NATURAL_ZERO            // float-as-64-bit-integer
CONST-PROP ::= 's' ENCODING                // string literal. Consumes one identifier argument.

ENCODING ::= 'b'                           // utf8
ENCODING ::= 'w'                           // utf16
ENCODING ::= 'c'                           // Objective-C selector

If the first character of the string literal is a digit [0-9] or an underscore _, the identifier for the string literal is prefixed with an additional underscore _.