Type System Overview
Author: Michal Strehovsky (@MichalStrehovsky) - 2016
The type system is a major component of CoreRT. It represents the modules, types, methods, and fields within a program and provides higher-level services that let its users answer questions about them.
Some of the high level services the type system provides are:
- Loading new types from the metadata
- Computing the set of interfaces implemented by a specific type
- Computing static and instance field layout (assigning offsets to individual fields)
- Computing static and instance GC layout of types (identifying GC pointers within object/class data)
- Computing VTable layout (assigning slots to virtual methods) and resolving virtual methods to slots
- Deciding whether a type can be stored to a location of another type
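To make one of these services concrete, the following is a minimal sketch of assigning offsets to instance fields. It is written in Python for brevity and assumes a simple sequential rule with natural alignment; `layout_fields` and the field tuples are hypothetical, and the layout algorithms in the actual type system handle many more cases.

```python
def layout_fields(fields):
    """Assign offsets sequentially.

    `fields` is a list of (name, size_in_bytes, alignment) tuples.
    Returns (offsets_by_name, total_size).
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size, align in fields:
        # Round the running offset up to the field's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # Pad the total size to the largest alignment seen.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


offsets, size = layout_fields([("flag", 1, 1), ("count", 4, 4), ("id", 8, 8)])
```

In this example `count` is pushed from offset 1 to offset 4 to satisfy its alignment, which is the kind of decision the metadata alone does not record.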
Three major themes drive the design of the type system:
- Low overhead and high performance
- Extensibility and reusability
Low overhead is achieved by lazy loading - instead of eagerly populating the types with fields, various attributes, names, etc. these are read on demand from the underlying data source (metadata). Caching is used conservatively.
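The on-demand reading described above can be illustrated with a toy model. This is only a sketch in Python: `FakeMetadataReader`, `LazyType`, and their members are hypothetical names and not part of the type system's API.

```python
from functools import cached_property


class FakeMetadataReader:
    """Hypothetical stand-in for a metadata source; counts how often it is read."""

    def __init__(self, records):
        self._records = records
        self.reads = 0

    def read_field_names(self, type_name):
        self.reads += 1
        return list(self._records[type_name])


class LazyType:
    """Holds only a handle to the metadata; details are read on first use."""

    def __init__(self, name, reader):
        self.name = name
        self._reader = reader

    @cached_property
    def field_names(self):
        # Runs once, on first access; the result is then stored on the instance.
        return self._reader.read_field_names(self.name)


reader = FakeMetadataReader({"Point": ["X", "Y"]})
point = LazyType("Point", reader)
reads_before_access = reader.reads   # nothing has been read yet
first = point.field_names            # triggers the single metadata read
second = point.field_names           # served from the stored value
```

Creating `point` costs nothing beyond storing two references. The metadata is touched only when a caller actually asks for the fields.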
Where necessary, partial classes, extension methods, and pluggable algorithms are used to achieve extensibility and reusability instead of polymorphism and object hierarchies. The type system is reusable at the source level: consumers source-include different sets of files to get different features. This allows extensibility without sacrifices that would compromise the low-overhead goal.
The type system in its purest form (i.e. without any partial class extensions) tries to avoid introducing concepts that are not defined in the ECMA-335 specification. The specification is suggested prerequisite reading for this document and defines many of the terms used here.
Relationship with metadata
While metadata (such as the file formats described in the ECMA-335 specification) has a close relationship with the type system, there is a clear distinction between the two: the metadata describes the physical shape of the type (e.g. what the base class of the type is, or what fields it has), but the type system builds higher-level concepts on top of that shape (e.g. how many bytes are required to store an instance of the type at runtime, or what interfaces the type implements, including the inherited ones).
The type system provides access to most of the underlying metadata, but abstracts the way it was obtained. This allows types and members that are backed by metadata in other formats, or in no physical format at all (such as methods on array types), to be representable within the same type system context.
Type system class hierarchy
The classes that represent types within the type system are:
Most of the classes in this hierarchy are not supposed to be derived by the type system user and many of them are sealed to prevent that.
The classes that are extensible (and are actually abstract classes) are shown with dark background above. The concrete class should provide implementation of the abstract and virtual methods based on some logic, such as reading metadata from an ECMA-335 module file (the type system already provides such an implementation of MetadataType in its EcmaType, for example). Ideally, type system consumers should operate on the abstract classes and use the concrete class only when creating a new instance. Casting to a concrete implementation type such as EcmaType is discouraged.
Type system classes
The following section briefly goes over the classes representing types within the type system.
TypeDesc is the base class of all types within the type system. It defines a list of operations all classes must support. Not all operations might make sense for all the children of TypeDesc (for example, it doesn't make sense to request a list of methods on a pointer type), but care is taken to provide an implementation that makes sense for each particular child (i.e. the list of methods on a pointer type is empty).
ParameterizedType (ArrayType, ByRefType, PointerType)
These are constructed types with a single parameter:
- an array (either multi-dimensional, or a vector - a single dimensional array with an implicit zero lower bound),
- a managed reference, or
- an unmanaged pointer type.
Note the distinction between multidimensional arrays of rank 1 and vectors is a crucial one, and a source of potential bugs for the type system users. Type system users should take special care.
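The pitfall can be shown with a small model. The sketch below is in Python and uses a hypothetical `ArrayShape` record; it is not how the type system stores array types. It only shows why comparing element type and rank is not enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayShape:
    """Hypothetical model of an array type: element type name plus shape."""
    element: str
    rank: int
    is_vector: bool  # True only for single-dimensional, zero-based arrays

    def __post_init__(self):
        if self.is_vector and self.rank != 1:
            raise ValueError("a vector always has rank 1")


vector = ArrayShape("int", rank=1, is_vector=True)
md_rank1 = ArrayShape("int", rank=1, is_vector=False)
md_rank2 = ArrayShape("int", rank=2, is_vector=False)


def same_rank(a, b):
    # A tempting but insufficient check: it treats a vector and a
    # rank-1 multidimensional array as interchangeable.
    return a.element == b.element and a.rank == b.rank
```

Here `same_rank(vector, md_rank1)` is true even though the two values describe different types. Code that branches on rank alone would make the same mistake.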
DefType (NoMetadataType, MetadataType)
DefType represents a value type, interface, or a class. While most DefType instances will be of types derived from MetadataType (a type that is based off of some concrete metadata describing the type in full), there will be scenarios where full metadata is no longer available. In those cases, only restricted information (such as the number of bytes occupied by the instance of the type on the GC heap, or whether the type is a value type) is available. It is important that the type system is able to operate on such types. E.g. it should be possible for a type with restricted metadata to be a base type for a type with full metadata and the field layout algorithm should be able to compute the field layout of such a type.
GenericParameterDesc
GenericParameterDesc represents a generic parameter, along with its constraints. Generic definitions are represented as instantiations over generic parameters.
Note for readers familiar with the .NET reflection type system: while the .NET reflection type system doesn't distinguish between a generic definition (e.g. List<T>) and an open instantiation of a generic type (e.g. List<!0>), the CoreRT type system draws a distinction between those two. This distinction is important when representing member references from within IL method bodies - e.g. an IL reference using an LDTOKEN instruction to List<T>.Add should always refer to the uninstantiated definition, while a reference to List<!0>.Add will refer to a concrete method after substituting the signature variable.
SignatureVariable (SignatureTypeVariable, SignatureMethodVariable)
Signature variables represent variables that can be substituted by other types within the system. They differ from generic parameters (because e.g. they don't have constraints or variance). They are simply placeholders to be replaced by other types as part of a process called instantiation. Signature variables have an index that refers to a position within the instantiation context.
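Instantiation as described above amounts to an index-based substitution. The following Python sketch models it with hypothetical `Named`, `TypeVar`, and `Instantiated` records. It covers only type-level variables and is not the type system's implementation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Named:
    """A concrete, non-generic type such as 'int'."""
    name: str


@dataclass(frozen=True)
class TypeVar:
    """Placeholder for the type argument at `index` (written !index)."""
    index: int


@dataclass(frozen=True)
class Instantiated:
    """A generic type applied to a tuple of type arguments."""
    definition: str
    args: Tuple["Type", ...]


Type = Union[Named, TypeVar, Instantiated]


def instantiate(t, type_args):
    """Replace every TypeVar in `t` with the matching entry of `type_args`."""
    if isinstance(t, TypeVar):
        return type_args[t.index]
    if isinstance(t, Instantiated):
        return Instantiated(t.definition,
                            tuple(instantiate(a, type_args) for a in t.args))
    return t


# Dictionary<!1, List<!0>> instantiated over (int, string)
open_type = Instantiated("Dictionary",
                         (TypeVar(1), Instantiated("List", (TypeVar(0),))))
closed = instantiate(open_type, (Named("int"), Named("string")))
```

The result corresponds to Dictionary<string, List<int>>: each placeholder carries nothing but its index, and the instantiation context supplies the type at that position.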
Other type system classes
Each use of a type system starts with creating a type system context. A type system context represents a type universe across which all types share reference identity (two TypeDesc objects represent identical types if and only if they are the same object instance). The type system context is used to resolve all modules and constructed types within the universe. It's not legal to create new instances of constructed types outside of the type system context.
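Reference identity of this kind is usually obtained by routing every construction through the context, which remembers what it has already created. The Python sketch below illustrates the idea; `TypeDescSketch`, `ContextSketch`, and their methods are hypothetical names, not the type system's API.

```python
class TypeDescSketch:
    """Minimal type object; equality is plain object identity."""

    def __init__(self, name):
        self.name = name


class ContextSketch:
    """Hypothetical context that owns every type in one universe."""

    def __init__(self):
        self._primitives = {}
        self._arrays = {}

    def get_primitive(self, name):
        if name not in self._primitives:
            self._primitives[name] = TypeDescSketch(name)
        return self._primitives[name]

    def get_array_type(self, element):
        # TypeDescSketch uses default identity-based hashing, so the
        # lookup key is the element type object itself.
        if element not in self._arrays:
            self._arrays[element] = TypeDescSketch(element.name + "[]")
        return self._arrays[element]


ctx = ContextSketch()
a = ctx.get_array_type(ctx.get_primitive("int"))
b = ctx.get_array_type(ctx.get_primitive("int"))

other_ctx = ContextSketch()
c = other_ctx.get_array_type(other_ctx.get_primitive("int"))
```

Within `ctx`, `a is b` holds, so an identity comparison is a complete type equality check. The object `c` belongs to a different universe and is unrelated to `a` even though the names match.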
Other important classes within the type system are MethodDesc (represents a method within the type system) and FieldDesc (represents a field within the type system). A ModuleDesc describes a single module, which can optionally implement the IAssemblyDesc interface if the module is an assembly. ModuleDesc is typically the owner of the type/method/field definitions within the module. It's the responsibility of the ModuleDesc to maintain the reference identity of those.
Most algorithms (e.g. the field layout algorithm) provided by the type system are pluggable. The type system context can influence the choice of the algorithm by providing different implementations of it.
The algorithms are used as an extensibility mechanism in places where partial classes and source inclusion wouldn't be sufficient. The choice of the particular algorithm might depend on multiple factors and the type system user might want to use multiple algorithms depending on a certain set of conditions determined at runtime (e.g. computing the list of runtime interfaces of regular DefTypes vs. the runtime interfaces of array types).
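The runtime-interfaces case mentioned above can be sketched as a context that hands out a different algorithm object depending on the kind of type. The Python below is an illustration only: every class and method name is hypothetical, and the interface list returned for arrays is invented for the example.

```python
class DefTypeSketch:
    def __init__(self, name, declared_interfaces):
        self.name = name
        self.declared_interfaces = declared_interfaces


class ArrayTypeSketch:
    def __init__(self, element):
        self.element = element


class MetadataInterfacesAlgorithm:
    def compute(self, t):
        # Regular types: take what the (fake) metadata declares.
        return list(t.declared_interfaces)


class ArrayInterfacesAlgorithm:
    def compute(self, t):
        # Arrays have no metadata of their own; synthesize the list.
        return ["IList<" + t.element + ">"]


class AlgorithmSelectingContext:
    """Hypothetical context that picks the algorithm per type."""

    def get_interfaces_algorithm(self, t):
        if isinstance(t, ArrayTypeSketch):
            return ArrayInterfacesAlgorithm()
        return MetadataInterfacesAlgorithm()


def runtime_interfaces(ctx, t):
    # Callers never name a concrete algorithm; they ask the context.
    return ctx.get_interfaces_algorithm(t).compute(t)


ctx = AlgorithmSelectingContext()
for_class = runtime_interfaces(ctx, DefTypeSketch("Foo", ["IDisposable"]))
for_array = runtime_interfaces(ctx, ArrayTypeSketch("int"))
```

A consumer with different needs would supply its own context class and return other algorithm objects, leaving `runtime_interfaces` and its callers unchanged.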
Hash codes within the type system
An interesting property of the type system lies in its ability to reliably compute the same hash code for any type or method represented within the system, both at compile time and at runtime. Having the same hash code available at both compile time and runtime is leveraged to build high performance lookup tables used by the CoreRT runtime. The hash code is computed from type names and gets preserved as part of the runtime data structures so that it's available in situations when the type name has been optimized away by the compiler.
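The essential requirement is a hash that depends only on the name, never on the process or machine computing it. The Python sketch below uses 32-bit FNV-1a purely as a stand-in. It is not the hash function CoreRT uses, and `stable_name_hash`, `table`, and `lookup` are hypothetical. Python's built-in `hash()` is avoided because string hashing is randomized per process by default.

```python
def stable_name_hash(name):
    """32-bit FNV-1a over the UTF-8 bytes of `name`."""
    h = 0x811C9DC5
    for byte in name.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


# "Compile time": build a table keyed by hash code. Only the hash and an
# index are stored, so the names themselves could be dropped afterwards.
known_types = ["System.Object", "System.String", "MyApp.Widget"]
table = {}
for index, name in enumerate(known_types):
    table.setdefault(stable_name_hash(name), []).append(index)


# "Runtime": recompute the hash from a name and fetch candidate entries.
def lookup(name):
    return table.get(stable_name_hash(name), [])


candidates = lookup("System.String")
```

Because distinct names may collide, each bucket holds a list of candidates, and a real lookup would still have to confirm the match by other means.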
Throwing exceptions from the type system
Throwing an exception from within the type system is a bit more involved than a simple throw statement. This is because the type system is designed to be usable in many places and each could have a different requirement about how exceptions are thrown. For example, when the type system is included from the runtime, a System.TypeLoadException should be thrown when type loading fails. On the other hand, if a type loading error occurs in a compiler or IL verifier, a System.TypeLoadException would be indistinguishable from an actual problem with the managed assemblies that comprise the compiler. Therefore a different exception should be thrown.
Exception throwing within the type system is wrapped in a ThrowHelper class. The consumer of the type system provides a definition of this class and its methods. The methods control what exception type will be thrown.
The type system provides a default implementation of the ThrowHelper class that throws exceptions deriving from a TypeSystemException exception base class. This default implementation is suitable for use in non-runtime scenarios.
The exception messages are assigned string IDs and get consumed by the throw helper as well. We require this indirection to support the compiler scenarios: when a type loading exception occurs during an AOT compilation, the AOT compiler has two tasks: emit a warning to tell the user that this occurred, and potentially generate a method body that will throw this exception at runtime when the problematic type is accessed. The localization of the compiler might not match the localization of the class library the compiler output is linking against. Indirecting the actual exception message through the string ID lets us handle both cases. The consumer of the type system may reuse the throw helper in places outside the type system where this functionality is needed.
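The string ID indirection can be sketched as follows. This Python model is hypothetical throughout (`ExceptionStringID`, `TypeSystemErrorSketch`, `throw_type_load`, and both message tables are invented for the example). It shows one error being rendered with two different message tables, as a compiler and a class library with different localizations would need.

```python
from enum import Enum


class ExceptionStringID(Enum):
    CLASS_LOAD_GENERAL = "ClassLoadGeneral"


class TypeSystemErrorSketch(Exception):
    """Carries a message ID and arguments instead of formatted text."""

    def __init__(self, string_id, *arguments):
        super().__init__(string_id.value, *arguments)
        self.string_id = string_id
        self.arguments = arguments


def throw_type_load(type_name, module_name):
    # The throw helper decides the exception type; callers only name the error.
    raise TypeSystemErrorSketch(ExceptionStringID.CLASS_LOAD_GENERAL,
                                type_name, module_name)


# Each consumer owns its message table, so wording and language can differ.
COMPILER_MESSAGES = {
    ExceptionStringID.CLASS_LOAD_GENERAL:
        "warning: could not load type '{0}' from '{1}'",
}
RUNTIME_MESSAGES = {
    ExceptionStringID.CLASS_LOAD_GENERAL:
        "Could not load type '{0}' from assembly '{1}'.",
}


def format_message(error, messages):
    return messages[error.string_id].format(*error.arguments)


try:
    throw_type_load("Foo.Bar", "Lib")
except TypeSystemErrorSketch as e:
    caught = e

warning = format_message(caught, COMPILER_MESSAGES)
runtime_text = format_message(caught, RUNTIME_MESSAGES)
```

The exception itself never contains user-facing text, which is what allows the compile-time warning and the message embedded in generated code to be produced independently.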
The type system implementation is found in:
- src/Common/src/TypeSystem/Common: most of the common type system is here
- src/Common/src/TypeSystem/Ecma: concrete implementations of FieldDesc etc. that read metadata from ECMA-335 module files are here
- src/ILCompiler.TypeSystem/tests: unit tests that shed some light on the operation and features of the type system. This is a good starting point for learning about the code.