Motivations & Concerns (What goals must this design achieve? What are the concerns that this design addresses?)
The primary goal of this document is to specify the Itanium C++ ABI for the contract entrypoint function. By fully specifying the ABI, we ensure interoperability between different compilers (GCC and Clang) and standard libraries (libc++ and libstdc++).
The ABI proposed in this document is designed to meet the following goals:
- Future changes cannot break existing code.
- The ABI cannot preclude future extensions.
- Code generation at the call site must be minimal and not affect other code.
- Allow users control over object size increases generated by contracts.
This section briefly describes the motivations and concerns that this design addresses. It assumes most readers of this document have heard Eric and Josh drone on about these ad nauseam.
libc++ and libstdc++ must support contracts generated by both GCC and Clang. The compiler must generate the same calls to the runtime entrypoint function regardless of which runtime it is targeting (often it doesn't know, MSVC notwithstanding).
Therefore, this specification aims to define a portable ABI for the entrypoint function, which is used by both compilers.
- Currently needed data
- std::source_location
- Source text
- Assertion kind (pre/post/contract_assert)
- Evaluation semantic (enforced/observed)
- Failure kind (exception thrown / assertion failed)
Data | Data Type | Static/Dynamic | Description |
---|---|---|---|
Source location | std::source_location | Static | Location of the contract in the source code. |
Source text | const char* | Static | The source text of the contract assertion. |
Assertion kind | std::assertion_kind | Static | pre/post/contract_assert (may be parsed from source string?) |
Evaluation semantic | std::evaluation_semantic | Static or Dynamic | In future, may be a runtime property, must support both modes |
Detection mode | std::detection_mode | Static or Dynamic | Known at code generation time, but storing in static storage requires duplicating the data. |
- Future needed data (likely)
- Custom labels to identify or group the contract
- Custom violation handler for the contract
- Custom source text (in addition to the source text, or as a replacement for)
Example:
#define CONTRACT(assertion, message) \
  contract_assert [[clang::assertion_message(message)]] (assertion)
void f(int x) {
  CONTRACT(x > 0, "x must be positive");
}
- The compiler must generate a call to the entrypoint function without seeing it, or even knowing which runtime it will eventually be linked against.
- At the contract violation site, the more arguments the entrypoint function takes, the worse the code generation will be, not only for the contract violation, but also for the surrounding code.
Therefore, we want to minimize the number of arguments the entrypoint function takes, by encoding the necessary data either into a single chunk of static storage or, when possible, into the name of the entrypoint function itself.
For example, the nature of the violation (whether it is an exception thrown or an assertion failed), is not encodable in the static data without duplicating the data in full. Therefore, we
This document proposes two generic signatures for the entrypoint function.
extern "C"
void __handle_contract_violation(
// A descriptor and its matching data,
// for data which can be stored in the data segment of the binary.
descriptor_t *static_descriptor,
void *static_data,
// predicate_false/evaluation_exception
// Special case because it's always needed.
std::detection_mode mode,
// evaluation_semantic. Currently, implementations only support compile-time
// evaluation semantics, but this may change in the future.
std::evaluation_semantic semantic,
// Dynamic data, which is only known at runtime. Because the data is dynamic,
// it doesn't make sense to emit the descriptor statically, so instead
// a descriptor is attached to each piece of dynamic data in-line.
runtime_data_t *dynamic_data
);
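To make the call-site cost concrete, here is a minimal sketch of what a compiler might emit for an observed `contract_assert(x > 0)` that calls the generic entrypoint. Everything prefixed with `demo_` is hypothetical: the placeholder `descriptor_t` and `runtime_data_t` definitions, the recording stub that stands in for the runtime, and the function being checked. The enumerator values follow the Itanium representation tables later in this document, and the enums are passed as `uint8_t` on that basis.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder types (assumption): the real layouts are specified elsewhere.
struct descriptor_t {};
struct runtime_data_t {};

// Records the most recent call so the sketch can be inspected.
struct demo_call_record {
  int calls = 0;
  std::uint8_t mode = 0;
  std::uint8_t semantic = 0;
  const void *static_data = nullptr;
  const runtime_data_t *dynamic_data = nullptr;
};
demo_call_record demo_last_call;

// Hypothetical stand-in for the generic entrypoint. A real runtime would
// build the violation object and invoke the violation handler here.
extern "C" void demo_handle_contract_violation(descriptor_t *static_descriptor,
                                               void *static_data,
                                               std::uint8_t mode,
                                               std::uint8_t semantic,
                                               runtime_data_t *dynamic_data) {
  assert(static_descriptor != nullptr);
  demo_last_call.calls += 1;
  demo_last_call.mode = mode;
  demo_last_call.semantic = semantic;
  demo_last_call.static_data = static_data;
  demo_last_call.dynamic_data = dynamic_data;
}

// Static storage the compiler would emit once per contract assertion.
static descriptor_t demo_descriptor;
static const char *demo_source_text = "x > 0";

// Roughly what `contract_assert(x > 0)` lowers to with the observed semantic:
// five arguments must be materialized on the (cold) failure path.
int demo_f(int x) {
  if (!(x > 0)) {
    demo_handle_contract_violation(&demo_descriptor, &demo_source_text,
                                   /*mode: predicate_false*/ 0x01,
                                   /*semantic: observed*/ 0x02,
                                   /*dynamic_data*/ nullptr);
  }
  return x * 2;
}
```

Three of the five arguments are constants at this call site, which is the motivation for the name-encoded wrappers described next.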
Implementations should provide as a fallback the following signatures, which call the generic entrypoint function with the appropriate arguments.
The table below describes the manual mangling of the entrypoint names. The data types and values are mangled into the function names using the following abbreviation scheme. All function signatures accept a static descriptor and static data, which are not encoded in the name.
Data Type | Value | Mangling | Order | Optional |
---|---|---|---|---|
std::detection_mode | | m | 0 | N |
std::detection_mode | predicate_false | pf | 0 | . |
std::detection_mode | evaluation_exception | pe | 0 | . |
std::evaluation_semantic | | s | 1 | Y (may also be passed as static data) |
std::evaluation_semantic | observed | so | 1 | . |
std::evaluation_semantic | enforced | se | 1 | . |
runtime_data_t | | r | 2 | N |
(1) If the signature contains an argument of a particular type, the single-letter encoding is appended to the function name, in the order specified in the table above.
(2) If the signature encodes the value of a particular type in the name, the multi-letter encoding is appended to the function name, in the order specified in the table above. No argument of that type is passed to the function.
(3) If the signature does not encode a particular type, no encoding is appended to the function name. Instead, the function will use the default value for that type when invoking the generic entrypoint function.
The value of std::detection_mode and std::evaluation_semantic must be passed or encoded in all signatures.
The runtime_data_t parameter may be omitted, and the function will forward it as nullptr to the generic entrypoint function.
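The mangling rules above can be sketched as a small name builder. This is an illustration only: `demo_mangle_entrypoint` and the two `*_encoding` enums are hypothetical, and the `_` separator between components is an assumption inferred from the wrapper names listed in this document (for example `__handle_contract_violation_pf_se`).

```cpp
#include <cassert>
#include <string>

// How each property reaches the entrypoint: as an argument (single-letter
// encoding) or as a value baked into the name (multi-letter encoding).
enum class mode_encoding { as_argument, predicate_false, evaluation_exception };
enum class semantic_encoding { as_argument, observed, enforced };

std::string demo_mangle_entrypoint(mode_encoding mode,
                                   semantic_encoding semantic,
                                   bool takes_runtime_data) {
  std::string name = "__handle_contract_violation";

  // Order 0: std::detection_mode.
  switch (mode) {
    case mode_encoding::as_argument:          name += "_m";  break;
    case mode_encoding::predicate_false:      name += "_pf"; break;
    case mode_encoding::evaluation_exception: name += "_pe"; break;
    default: assert(false && "unknown detection mode encoding");
  }

  // Order 1: std::evaluation_semantic.
  switch (semantic) {
    case semantic_encoding::as_argument: name += "_s";  break;
    case semantic_encoding::observed:    name += "_so"; break;
    case semantic_encoding::enforced:    name += "_se"; break;
    default: assert(false && "unknown evaluation semantic encoding");
  }

  // Order 2: runtime_data_t. Omitted means the wrapper forwards nullptr.
  if (takes_runtime_data) name += "_r";
  return name;
}
```

For instance, encoding `predicate_false` and `enforced` with no runtime data yields `__handle_contract_violation_pf_se`.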
The initial runtime implementation must provide the following overloads of the entrypoint function, in addition to the generic entrypoint function.
// predicate_false, evaluation_semantic::enforced
extern "C" void __handle_contract_violation_pf_se(
descriptor_t *static_descriptor,
void *static_data
);
// predicate_false, evaluation_semantic::observed
extern "C" void __handle_contract_violation_pf_so(
descriptor_t *static_descriptor,
void *static_data
);
// evaluation_exception, evaluation_semantic::enforced
extern "C" void __handle_contract_violation_pe_se(
descriptor_t *static_descriptor,
void *static_data
);
// evaluation_exception, evaluation_semantic::observed
extern "C" void __handle_contract_violation_pe_so(
descriptor_t *static_descriptor,
void *static_data
);
// predicate_false, evaluation_semantic::enforced, runtime data
extern "C" void __handle_contract_violation_pf_se_r(
descriptor_t *static_descriptor,
void *static_data,
runtime_data_t *runtime_data
);
// predicate_false, evaluation_semantic::observed, runtime data
extern "C" void __handle_contract_violation_pf_so_r(
descriptor_t *static_descriptor,
void *static_data,
runtime_data_t *runtime_data
);
// evaluation_exception, evaluation_semantic::enforced, runtime data
extern "C" void __handle_contract_violation_pe_se_r(
descriptor_t *static_descriptor,
void *static_data,
runtime_data_t *runtime_data
);
// evaluation_exception, evaluation_semantic::observed, runtime data
extern "C" void __handle_contract_violation_pe_so_r(
descriptor_t *static_descriptor,
void *static_data,
runtime_data_t *runtime_data
);
The current proposal only mandates additional signatures which encode the std::evaluation_semantic and std::detection_mode in the name, but proposes a mangling that also allows these values to be passed as arguments.
The intention is to accommodate future extensions of the C++ standard, which may add new values to these enumerations.
The quality of the generated code depends on the presence of the wrapper overloads, which let the compiler generate efficient code for each contract. If new features are added to the C++ standard which require additional function signatures, the compiler may not know if the runtime supports the new overloads.
In this case, the compiler should either:
- Use the generic entrypoint function, or
- Emit a weak or internal definition for the new overloads itself (these should be easy to emit, since they just re-arrange the arguments).
We believe the ability for the compiler to emit its own definitions is critical for the success of this design, as it allows both efficient code generation and future extensibility.
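As a sketch of the second option, a compiler-emitted weak wrapper could look like the following. The `demo_` names and the recording stub are hypothetical stand-ins for the real entrypoints, the placeholder types are assumptions, and `__attribute__((weak))` is the GCC/Clang spelling (other toolchains would need their own equivalent). The enumerator values follow the Itanium representation tables in this document.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder types (assumption): the real layouts are specified elsewhere.
struct descriptor_t {};
struct runtime_data_t {};

// What the stub below last received, for inspection.
struct demo_forwarded_args {
  std::uint8_t mode = 0;
  std::uint8_t semantic = 0;
  const runtime_data_t *dynamic_data = nullptr;
};
demo_forwarded_args demo_forwarded;

// Hypothetical stand-in for the generic entrypoint provided by the runtime.
extern "C" void demo_handle_contract_violation(descriptor_t *static_descriptor,
                                               void *static_data,
                                               std::uint8_t mode,
                                               std::uint8_t semantic,
                                               runtime_data_t *dynamic_data) {
  assert(static_descriptor != nullptr);
  (void)static_data;
  demo_forwarded.mode = mode;
  demo_forwarded.semantic = semantic;
  demo_forwarded.dynamic_data = dynamic_data;
}

// Weak definition a compiler could emit when it cannot tell whether the
// runtime ships this wrapper. If the runtime provides a strong definition,
// the linker picks that one; otherwise this one is used. The body only
// re-arranges arguments for the generic entrypoint.
extern "C" __attribute__((weak)) void demo_handle_contract_violation_pf_se(
    descriptor_t *static_descriptor, void *static_data) {
  demo_handle_contract_violation(static_descriptor, static_data,
                                 /*mode: predicate_false*/ 0x01,
                                 /*semantic: enforced*/ 0x01,
                                 /*dynamic_data*/ nullptr);
}
```

The call site then passes only two arguments, and the constant arguments are materialized once, inside the wrapper.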
When a program mixes code compiled with and without exceptions, bad things can happen. Yet, we should consider supporting this use case. As such, we may need to additionally encode whether the contract violation occurred in a context where exceptions are not enabled.
This would allow the runtime to prevent exceptions thrown by the user's violation handler from propagating to the caller, which isn't compiled to handle exceptions.
If we decide to support this, the following additional mangling would be needed for code compiled with -fno-exceptions, which will not safely tolerate an exception being thrown from the user-provided violation handler:
Case | Value | Mangling | Order | Optional |
---|---|---|---|---|
Exceptions Enabled | False | n | 3 | Y |
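One way a runtime could act on the `n` encoding is to stop any exception at the entrypoint boundary. The sketch below is an assumption about how that might look, not part of the proposal: `demo_violation_handler`, `demo_run_handler_contained`, and `demo_invoke_handler` are hypothetical names, and terminating on a contained exception is just one possible policy.

```cpp
#include <cassert>
#include <exception>

// Hypothetical handler type; the real handler receives a violation object.
using demo_violation_handler = void (*)();

// Runs the handler and reports whether it tried to exit via an exception.
// No exception can leave this function.
bool demo_run_handler_contained(demo_violation_handler handler) noexcept {
  assert(handler != nullptr);
  try {
    handler();
    return false;
  } catch (...) {
    return true;
  }
}

// Hypothetical core of an entrypoint. `exceptions_enabled` is false for call
// sites that selected an "n" entrypoint (code built with -fno-exceptions).
void demo_invoke_handler(demo_violation_handler handler,
                         bool exceptions_enabled) {
  if (exceptions_enabled) {
    handler();  // An exception may propagate to the caller as usual.
    return;
  }
  if (demo_run_handler_contained(handler)) {
    // The caller cannot unwind, so the exception must not reach it.
    std::terminate();
  }
}
```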
The encoding of the function signature in the name is done as follows:
The C++ standard doesn't specify the exact size or layout of the data types used in the contract violation object. However, the Itanium C++ ABI must specify the exact size and layout for these types, and for enumerators, the exact values as well.
This document specifies the "itanium representation" of the standard library types used in the contract violation object, which are used when passing these types to the entrypoint function.
The types in question, and their corresponding "itanium representation" are:
Standard Type | Itanium Representation | Underlying Type in Itanium |
---|---|---|
std::source_location | source_location_ptr_t | See Below |
std::assertion_kind | assertion_kind_t | uint8_t |
std::evaluation_semantic | evaluation_semantic_t | uint8_t |
std::detection_mode | detection_mode_t | uint8_t |
With the exception of std::source_location, this document places no requirements on the types or values of the standard library types. It instead specifies a corresponding "itanium representation", which should be used when passing these types to the entrypoint function.
std::source_location contains a single pointer to its data, which itself has the following layout for both libc++ and libstdc++.
struct _SourceLoc {
const char* file_name;
const char* function_name;
unsigned line;
unsigned column;
};
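Because the ABI depends on this exact layout, an implementation may want to pin it down at compile time. A minimal sketch, assuming a mirror struct named `demo_source_loc` (hypothetical, used here to avoid the reserved `_SourceLoc` identifier) and a hypothetical accessor `demo_line_of`:

```cpp
#include <cassert>
#include <cstddef>

// Mirror of the layout shown above, under a non-reserved name.
struct demo_source_loc {
  const char *file_name;
  const char *function_name;
  unsigned line;
  unsigned column;
};

// Fail the build if the compiler lays the fields out differently.
static_assert(offsetof(demo_source_loc, file_name) == 0,
              "file_name must come first");
static_assert(offsetof(demo_source_loc, function_name) == sizeof(const char *),
              "function_name must follow file_name");
static_assert(offsetof(demo_source_loc, line) == 2 * sizeof(const char *),
              "line must follow the two pointers");
static_assert(offsetof(demo_source_loc, column) ==
                  2 * sizeof(const char *) + sizeof(unsigned),
              "column must follow line");

// Reads the line number through the single data pointer that a
// source_location-like object is described as holding.
unsigned demo_line_of(const void *source_location_data) {
  assert(source_location_data != nullptr);
  return static_cast<const demo_source_loc *>(source_location_data)->line;
}
```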
This section specifies the values to use when passing an enumerator to the entrypoint function. In addition to the standard library enumerators, this section also specifies the values for unspecified/uninitialized enumerators (which may or may not be useful in practice).
Enumerator Value | Itanium Representation |
---|---|
Not specified | 0x00 |
std::assertion_kind::pre | 0x01 |
std::assertion_kind::post | 0x02 |
std::assertion_kind::contract_assert | 0x03 |
Enumerator Value | Itanium Representation |
---|---|
Not specified | 0x00 |
std::evaluation_semantic::enforced | 0x01 |
std::evaluation_semantic::observed | 0x02 |
Enumerator Value | Itanium Representation |
---|---|
Not specified | 0x00 |
std::detection_mode::predicate_false | 0x01 |
std::detection_mode::evaluation_exception | 0x02 |
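The three tables above translate directly into fixed-width enumerations. The sketch below uses hypothetical enumerator names and a hypothetical library-side enum (`demo_library_assertion_kind`) whose values deliberately differ, to show that the standard library's own values need not match as long as they are translated at the entrypoint boundary.

```cpp
#include <cassert>
#include <cstdint>

// Itanium representations, with the values from the tables above.
enum assertion_kind_t : std::uint8_t {
  assertion_kind_not_specified = 0x00,
  assertion_kind_pre = 0x01,
  assertion_kind_post = 0x02,
  assertion_kind_contract_assert = 0x03,
};
enum evaluation_semantic_t : std::uint8_t {
  evaluation_semantic_not_specified = 0x00,
  evaluation_semantic_enforced = 0x01,
  evaluation_semantic_observed = 0x02,
};
enum detection_mode_t : std::uint8_t {
  detection_mode_not_specified = 0x00,
  detection_mode_predicate_false = 0x01,
  detection_mode_evaluation_exception = 0x02,
};
static_assert(sizeof(assertion_kind_t) == 1, "must be passed as one byte");

// Hypothetical library enum with unrelated underlying values.
enum class demo_library_assertion_kind { pre = 10, post = 20, assertion = 30 };

// Translates the library's value into the representation used by the ABI.
assertion_kind_t demo_to_itanium(demo_library_assertion_kind kind) {
  switch (kind) {
    case demo_library_assertion_kind::pre:
      return assertion_kind_pre;
    case demo_library_assertion_kind::post:
      return assertion_kind_post;
    case demo_library_assertion_kind::assertion:
      return assertion_kind_contract_assert;
  }
  assert(false && "unexpected assertion kind");
  return assertion_kind_not_specified;
}
```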
The static data descriptor is a fully specified structure that identifies and describes the data pointed to by the static data argument.
The goals for the static data descriptor are:
- Easily identify the layout of the "standard" or "required data" in an efficient manner.
- Allow for future extensions to the static data descriptor, without breaking existing code.
- Allow size/security-concerned users to strip bits of the static data descriptor they don't need.
The static data descriptor is specified in two parts:
- The
// We may need this, we may not. It aims to support vendor-specific extensions in a
// way that doesn't interfere with other vendors and their extensions.
//
// Suggestion: New vendors should hash the name of their runtime dylib and use the hash (truncated to 4 bits) as the vendor ID.
enum vendor_it_t : uint8_t {
VENDOR_GENERIC = 0x00, // Generic, no vendor-specific data.
VENDOR_IT_CLANG = 0x01, // Clang
VENDOR_IT_GCC = 0x02, // GCC
VENDOR_IT_MSVC = 0x03, // MSVC
// Future vendors can be added here.
};
struct descriptor_table_t {
// The version of the descriptor table layout (in case we need it).
uint8_t version;
// The number of entries in the descriptor.
uint8_t num_entries;
// The entries in the descriptor.
// Each entry describes a single piece of data in the static data.
base_descriptor_entry_t *entries[];
};
enum descriptor_entry_kind_t : uint8_t {
// unknown/reserved = 0x00, // Unknown or reserved type.
// default summary representation, containing source location, source text, and assertion kind (see below)
summary = 0x01, // A summary descriptor, which contains the source location, source text, and assertion kind.
// builtins
// a pointer to a _SourceLoc
source_location_ptr = 0x11,
// A _SourceLoc inline structure.
source_location_inline = 0x12,
// A pointer to a null-terminated string
source_text = 0x13,
// The kind of assertion, such as pre/post/contract_assert.
assertion_kind = 0x14,
// reserved = 0x21, // Reserved for future use.
// reserved = 0x22, // Reserved for future use.
// reserved = 0x2F, // Reserved for future use.
extended = 0x30, // Extended descriptor entry, of type `extended_descriptor_entry_t`
vendor = 0x40, // Vendor-specific descriptor, of type `vendor_extended_descriptor_entry_t`
};
struct base_descriptor_entry_t {
// The kind of data this entry describes.
// The kind is specified by the `descriptor_entry_kind_t` enumeration.
descriptor_entry_kind_t descripton_type;
// The offset of the data in the static data from the start of the static data.
uint16_t offset;
};
struct extended_descriptor_entry_t : base_descriptor_entry_t {
// The size of the data, in bytes.
uint16_t size;
// The type of the data, such as `std::source_location` or `std::assertion_kind`.
const char* data_type; // Or some other representation of the type
// The name of the data, which can be used to identify the data.
const char *name;
};
struct vendor_extended_descriptor_entry_t : base_descriptor_entry_t {
// The type of the data.
// This is a vendor-specific type, which can be used to identify the data.
uint8_t vendor_id ; // The vendor ID, which can be used to identify the data.
// Whatever the fudge the vendor wants to put here.
};
The extended and vendor specific descriptor tables are not required for the initial implementation, but they are provided to allow for future extensibility and vendor-specific extensions (or at least to provide an idea of how to do it).
Further, this document proposes a default layout for the needed static data, which can be used to identify the entirety of the data in a single descriptor entry.
One possible layout is as follows:
Type | Offset in Static Data | Size in Bytes |
---|---|---|
_SourceLoc pointer | 0 | sizeof(void*) |
const char* (source text) | sizeof(void*) | sizeof(void*) |
std::assertion_kind | sizeof(void*) * 2 | sizeof(uint8_t) |
Implementations could omit the source location or source text by providing a null pointer, or by using a more complex descriptor table representation.
The most basic summary descriptor table is as follows:
base_descriptor_entry_t summary_entry = {
.descripton_type = descriptor_entry_kind_t::summary,
.offset = 0,
};
descriptor_table_t descriptor_table_summary = {
.version = 1,
.num_entries = 1,
.entries = {
&summary_entry,
}
};
This would describe the same data layout as the more-detailed descriptor table below, but in a more compact form.
base_descriptor_entry_t source_location_ptr_entry = {
.descripton_type = descriptor_entry_kind_t::source_location_ptr,
.offset = 0,
};
base_descriptor_entry_t source_text_entry = {
.descripton_type = descriptor_entry_kind_t::source_text,
.offset = sizeof(void*),
};
base_descriptor_entry_t assertion_kind_entry = {
.descripton_type = descriptor_entry_kind_t::assertion_kind,
.offset = sizeof(void*) * 2,
};
descriptor_table_t default_descriptor = {
.version = 1,
.num_entries = 3,
.entries = {
&source_location_ptr_entry,
&source_text_entry,
&assertion_kind_entry,
}
};
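To show that the two tables really describe the same bytes, here is a minimal sketch of how a runtime might locate fields in the static data from either table. It makes some simplifying assumptions: `demo_descriptor_table` replaces the flexible array member with a pointer so the example stays within standard C++, only a subset of entry kinds is declared, and the `demo_` functions are hypothetical. The field name `descripton_type` is spelled as in the structures above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Subset of the entry kinds defined earlier in this document.
enum descriptor_entry_kind_t : std::uint8_t {
  summary = 0x01,
  source_location_ptr = 0x11,
  source_text = 0x13,
  assertion_kind = 0x14,
};

struct base_descriptor_entry_t {
  descriptor_entry_kind_t descripton_type;
  std::uint16_t offset;
};

// Simplified table (assumption): pointer to the entries instead of a
// flexible array member.
struct demo_descriptor_table {
  std::uint8_t version;
  std::uint8_t num_entries;
  const base_descriptor_entry_t *const *entries;
};

// Returns the offset of `wanted` in the static data, or -1 if the table
// does not describe it. A summary entry implies the default layout:
// location pointer, then source text pointer, then assertion kind.
long demo_offset_of(const demo_descriptor_table &table,
                    descriptor_entry_kind_t wanted) {
  assert(table.num_entries == 0 || table.entries != nullptr);
  for (unsigned i = 0; i < table.num_entries; ++i) {
    const base_descriptor_entry_t &entry = *table.entries[i];
    if (entry.descripton_type == wanted) return entry.offset;
    if (entry.descripton_type == summary) {
      if (wanted == source_location_ptr) return entry.offset;
      if (wanted == source_text)
        return entry.offset + static_cast<long>(sizeof(void *));
      if (wanted == assertion_kind)
        return entry.offset + static_cast<long>(2 * sizeof(void *));
    }
  }
  return -1;
}

// Returns the source text pointer, or nullptr if it is not described.
const char *demo_read_source_text(const demo_descriptor_table &table,
                                  const void *static_data) {
  const long offset = demo_offset_of(table, source_text);
  if (offset < 0) return nullptr;
  const char *text = nullptr;
  std::memcpy(&text, static_cast<const unsigned char *>(static_data) + offset,
              sizeof text);
  return text;
}

// Returns the assertion kind byte, or 0x00 ("Not specified") if absent.
std::uint8_t demo_read_assertion_kind(const demo_descriptor_table &table,
                                      const void *static_data) {
  const long offset = demo_offset_of(table, assertion_kind);
  if (offset < 0) return 0x00;
  return static_cast<const unsigned char *>(static_data)[offset];
}
```

A table that omits an entry (for example, a build that strips source locations) simply reports that field as absent, which matches the goal of letting size-concerned users drop data they don't need.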
The runtime data and descriptor can be specified at a later date, as long as the generic entrypoint function is defined to accept them.