This repository has been archived by the owner on Mar 15, 2022. It is now read-only.

Reader: support union types #275

Closed
AndyAyersMS opened this issue Mar 10, 2015 · 1 comment · Fixed by #553

@AndyAyersMS
Member

Union types aren't common but can be created with explicit layout. Right now we bail out on any method that refers to a union.

See this TODO.

@AndyAyersMS
Member Author

The general idea is as follows.

In `GenIR::getClassType` we already enumerate all fields (as offset, handle pairs) and sort them by starting offset. We also know whether a type has overlapping fields. If it does, we pass the sorted field collection to a helper method that locates the sets of overlapping fields.
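A rough sketch of how such a helper could find the overlap sets, assuming each field is reduced to an offset, a size, and a handle. `FieldInfo`, `OverlapSet`, and `findOverlapSets` are names made up for illustration, not the actual LLILC code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical field record; the names are illustrative, not LLILC's.
struct FieldInfo {
  uint32_t Offset; // starting byte offset reported by the EE
  uint32_t Size;   // size in bytes
  int Handle;      // stand-in for a field handle
};

// A run of two or more fields whose byte ranges overlap transitively.
struct OverlapSet {
  uint32_t Begin;           // first byte covered
  uint32_t End;             // one past the last byte covered
  std::vector<int> Handles; // handles of the fields in the set
};

// Sorts by offset, then sweeps once: a field joins the current run when it
// starts before the run's end. Runs with a single field are not overlaps.
std::vector<OverlapSet> findOverlapSets(std::vector<FieldInfo> Fields) {
  std::sort(Fields.begin(), Fields.end(),
            [](const FieldInfo &A, const FieldInfo &B) {
              return A.Offset < B.Offset;
            });
  std::vector<OverlapSet> Sets;
  OverlapSet Current{0, 0, {}};
  auto Flush = [&]() {
    if (Current.Handles.size() > 1)
      Sets.push_back(Current);
  };
  for (const FieldInfo &F : Fields) {
    assert(F.Size > 0 && "zero-sized fields are not modeled in this sketch");
    if (!Current.Handles.empty() && F.Offset < Current.End) {
      Current.End = std::max(Current.End, F.Offset + F.Size);
      Current.Handles.push_back(F.Handle);
    } else {
      Flush();
      Current = OverlapSet{F.Offset, F.Offset + F.Size, {F.Handle}};
    }
  }
  Flush();
  return Sets;
}
```

Because the input is sorted, a single pass is enough: any field that overlaps an earlier one must start before the furthest end seen so far.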

Within each overlap set, we check whether any of the field types are or contain GC references. If not, we remove all of the set's fields from the original collection and replace them with a single int8[] field that has the same extent and a null handle.
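The non-GC case could look roughly like the following. This is only a sketch: the `Field` record and `collapseNonGcOverlap` are hypothetical, and the overlap set is assumed to be given as a byte range `[Begin, End)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical field record; not the actual LLILC data structure.
struct Field {
  uint32_t Offset;
  uint32_t Size;
  bool HasGcRef;      // the field type is or contains a GC reference
  const void *Handle; // null for synthetic placeholder fields
  bool IsByteArray;   // true when this models an int8[] placeholder
};

// If no field starting in [Begin, End) involves a GC reference, replaces all
// of them with one null-handle byte array covering the range and returns
// true. Otherwise leaves Fields untouched and returns false.
bool collapseNonGcOverlap(std::vector<Field> &Fields, uint32_t Begin,
                          uint32_t End) {
  assert(Begin < End);
  auto InSet = [=](const Field &F) {
    return F.Offset >= Begin && F.Offset < End;
  };
  for (const Field &F : Fields)
    if (InSet(F) && F.HasGcRef)
      return false;
  Fields.erase(std::remove_if(Fields.begin(), Fields.end(), InSet),
               Fields.end());
  Field Placeholder{Begin, End - Begin, false, nullptr, true};
  // Keep the collection sorted by offset.
  auto Pos = std::find_if(Fields.begin(), Fields.end(),
                          [=](const Field &F) { return F.Offset > Begin; });
  Fields.insert(Pos, Placeholder);
  return true;
}
```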

If any field type is or contains a GC reference, we recursively expand any structs that contain GC references until we have only non-GC types and GC references. We verify that the GC references all line up and aren't overlapped by other fields. We then split the union range into sub-ranges around each set of GC references, and report int8[] fields for the non-GC parts and object* for the GC parts. (All of this is done mainly so the LLVM type accurately describes the location of the GC references.) All these new fields are given null handles, and the original set of overlapping fields is removed from the field collection.
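The splitting step might be sketched as below, assuming the overlap set has already been flattened into leaf ranges that are each either a GC reference or plain data. `Range` and `partitionUnion` are illustrative names only, and the pointer size is passed in rather than queried from the target.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical byte range: either one GC reference or a run of non-GC bytes.
struct Range {
  uint32_t Offset;
  uint32_t Size;
  bool IsGcRef;
};

// Splits [Begin, End) into GC-reference slots and byte-array gaps. Returns
// false if GC references overlap one another only partially, or if non-GC
// data overlaps a GC slot.
bool partitionUnion(const std::vector<Range> &Leaves, uint32_t Begin,
                    uint32_t End, uint32_t PointerSize,
                    std::vector<Range> &Pieces) {
  assert(Begin < End && PointerSize > 0);
  std::vector<uint32_t> GcOffsets;
  for (const Range &L : Leaves) {
    if (!L.IsGcRef)
      continue;
    if (L.Size != PointerSize || L.Offset < Begin ||
        L.Offset + PointerSize > End)
      return false;
    GcOffsets.push_back(L.Offset);
  }
  std::sort(GcOffsets.begin(), GcOffsets.end());
  GcOffsets.erase(std::unique(GcOffsets.begin(), GcOffsets.end()),
                  GcOffsets.end());
  // Distinct GC slots must not intersect.
  for (size_t I = 1; I < GcOffsets.size(); ++I)
    if (GcOffsets[I] < GcOffsets[I - 1] + PointerSize)
      return false;
  // Non-GC data must not intersect any GC slot.
  for (const Range &L : Leaves) {
    if (L.IsGcRef)
      continue;
    for (uint32_t G : GcOffsets)
      if (L.Offset < G + PointerSize && G < L.Offset + L.Size)
        return false;
  }
  // Emit byte-array gaps and GC slots in offset order.
  Pieces.clear();
  uint32_t Cursor = Begin;
  for (uint32_t G : GcOffsets) {
    if (G > Cursor)
      Pieces.push_back({Cursor, G - Cursor, false});
    Pieces.push_back({G, PointerSize, true});
    Cursor = G + PointerSize;
  }
  if (Cursor < End)
    Pieces.push_back({Cursor, End - Cursor, false});
  return true;
}
```

For a 16-byte union with GC references at offset 0 and 4-byte integers at offset 8, with 8-byte pointers, this yields one GC slot at offset 0 followed by an 8-byte array.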

Later on when looking up an LLVM type index by field handle via the FieldIndexMap, we will fail to locate an index for any field that is part of an overlap set, and we'll fall back to using the raw field offset reported by the EE, as is done here. This means we'll use a flattened GEP instead of a struct GEP, and the callers will need to cast the pointer type appropriately (which they already should be doing).
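The lookup-with-fallback can be pictured roughly as follows. This sketch leaves LLVM out entirely and only decides which addressing form to use; `FieldAddressPlan` and `planFieldAccess` are hypothetical, and the map is modeled as a plain `std::map` from handle to struct index.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using FieldHandle = const void *; // stand-in for the EE's field handle type

// Hypothetical result: which addressing form a field access should use.
struct FieldAddressPlan {
  bool UseStructGep; // true: index into the LLVM struct type
  uint32_t Value;    // struct index if UseStructGep, else raw byte offset
};

// Fields in an overlap set were never entered in the map, so the lookup
// misses and the raw EE offset is used instead.
FieldAddressPlan
planFieldAccess(const std::map<FieldHandle, uint32_t> &FieldIndexMap,
                FieldHandle Handle, uint32_t EEOffset) {
  assert(Handle != nullptr && "real fields always have a handle");
  auto It = FieldIndexMap.find(Handle);
  if (It != FieldIndexMap.end())
    return {true, It->second};
  return {false, EEOffset};
}
```

In the fallback case the caller ends up with a byte-offset address and has to cast the pointer to the field's type before loading or storing.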

AndyAyersMS added a commit to AndyAyersMS/llilc that referenced this issue May 7, 2015
Closes dotnet#275.

Types with explicit layout may have fields that overlap one another (aka union types). Before this change we'd fail to compile any method that referenced one of these union types. We now model them as best we can using LLVM types.

LLVM doesn't provide a strong way to describe unions. Instead we generally provide a byte array that covers the extent of the union as a representative placeholder. That means for some fields there is no exact LLVM type counterpart. Downstream consumers cope with this as follows: for any field within the extent of a set of overlapping fields, we omit that field handle from the `FieldIndexMap`, so that when a `ldfld` or similar looks up which LLVM type index to use, it comes up empty-handed. We already had a fallback path that simply uses the EE-provided field offset when this happens, so that path now kicks in for accesses to overlapping fields, and subsequent pointer casts fix up the types properly.

However, things are not quite that simple. We also want the LLVM type to fully describe the location of any GC references within the type, and it is valid for GC references to appear in one of these overlap sets. GC references must fully overlap one another and can't be safely overlapped with non-GC data (see ECMA-335, II.10.7). This means the extents of the GC references in a union partition the union into ranges of either non-GC data or GC references. We report the former using byte arrays and the latter as object references. Thus, for instance, a type `C` like
```
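using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct A
{
    [FieldOffset(0)] public string Name;
    [FieldOffset(8)] public int Size;
}

[StructLayout(LayoutKind.Explicit)]
public struct C
{
    [FieldOffset(0)] public A X;          // overlaps Name and Size below
    [FieldOffset(0)] public string Name;  // GC reference, lines up with X.Name
    [FieldOffset(8)] public int Size;     // non-GC data, lines up with X.Size
}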
```
would on x64 be described as
```
type <{ %System.Object addrspace(1)*, [8 x i8] }>
```
This special handling for GC references is strictly only necessary for value types (since GC reference locations for value types on the stack must be reported at GC safe points) but for uniformity we do it for all types. We also continue to double-check value type GC locations against the GC pointer info provided by the EE.
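As a rough illustration of that double-check, assume the EE's GC pointer info is available as one flag per pointer-sized slot. The shape of that input and the name `gcLayoutMatches` are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Compares the GC reference offsets derived from the LLVM type layout with a
// per-slot GC flag array (hypothetical shape for the EE's GC pointer info).
bool gcLayoutMatches(const std::vector<uint32_t> &GcOffsets,
                     const std::vector<bool> &EeSlotIsGc,
                     uint32_t PointerSize) {
  assert(PointerSize > 0);
  std::vector<bool> Derived(EeSlotIsGc.size(), false);
  for (uint32_t Offset : GcOffsets) {
    if (Offset % PointerSize != 0)
      return false; // GC references must be pointer-aligned
    uint32_t Slot = Offset / PointerSize;
    if (Slot >= Derived.size())
      return false; // reference lies outside the type
    Derived[Slot] = true;
  }
  return Derived == EeSlotIsGc;
}
```

For the type `C` above on x64, the derived offsets would be just `{0}` and the EE-side flags `{true, false}`, so the check passes.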
AndyAyersMS added a commit to AndyAyersMS/llilc that referenced this issue May 7, 2015
AndyAyersMS added a commit to AndyAyersMS/llilc that referenced this issue May 8, 2015
swaroop-sridhar pushed a commit to swaroop-sridhar/llilc that referenced this issue May 9, 2015