Skip to content

[NativeAOT] Add cDAC data descriptor infrastructure#126972

Draft
max-charlamb wants to merge 8 commits intomainfrom
dev/max-charlamb/managed-type-descriptors
Draft

[NativeAOT] Add cDAC data descriptor infrastructure#126972
max-charlamb wants to merge 8 commits intomainfrom
dev/max-charlamb/managed-type-descriptors

Conversation

@max-charlamb
Copy link
Copy Markdown
Member

Note

This PR was created with assistance from GitHub Copilot.

Summary

Adds the cDAC data descriptor infrastructure for NativeAOT, enabling diagnostic tools (cDAC reader, SOS) to inspect NativeAOT runtime state through the same contract-based mechanism used by CoreCLR.

Changes

Native data descriptor (datadescriptor.inc)

  • Thread/ThreadStore: Thread state, OS ID, exception tracker, stack bounds, alloc context, transition frame
  • EEAllocContext/GCAllocContext: Allocation pointer, limit, bytes allocated
  • MethodTable (EEType): Flags, base size, related type, vtable slots, interfaces, hash code — with flag constants exposed via cdac_data<> friend pattern
  • ExInfo: Exception linked list traversal
  • RuntimeInstance: ThreadStore pointer
  • Globals: Runtime instance, free object MethodTable, GC bounds, thread state flags, object unmask
  • Contracts: Thread (1001), Exception (1), RuntimeTypeSystem (1001)
  • Sub-descriptors: GC (workstation + server) and managed type descriptors

ILC managed type descriptor (ManagedDataDescriptorNode)

  • Computes managed type field offsets at compile time in ILC
  • Emits a ContractDescriptor (DotNetManagedContractDescriptor) with JSON-encoded type layouts
  • Referenced by the native descriptor as a sub-descriptor via CDAC_GLOBAL_SUB_DESCRIPTOR
  • Currently registers System.Threading.Thread fields (ManagedThreadId, Name, StartHelper)

Build integration

  • CMake integration using shared clrdatadescriptors.cmake infrastructure
  • Separate GC descriptor targets for workstation and server GC
  • Symbol export via --export-dynamic-symbol in Microsoft.NETCore.Native.targets

Validation

  • Build: build.cmd clr.aot+libs -rc release — ✅ 0 errors, 0 warnings
  • Symbol verified in Runtime.WorkstationGC.lib via dumpbin
  • cDAC reader tests: ✅ 1586/1586 passed
  • tools.cdac tests: ✅ All passed

@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @agocke, @dotnet/ilc-contrib
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds cDAC contract descriptor generation to the NativeAOT runtime, plus an ILC-emitted managed sub-descriptor so diagnostic tools can inspect NativeAOT runtime/managed state via the shared contract mechanism.

Changes:

  • Integrates NativeAOT cDAC contract descriptor (and GC sub-descriptors) into the NativeAOT CMake build and runtime libraries.
  • Introduces a managed type layout sub-descriptor emitted by ILC (DotNetManagedContractDescriptor) and wires it into the NativeAOT descriptor as a sub-descriptor.
  • Exposes select private NativeAOT runtime offsets/constants to the descriptor via the cdac_data<T> friend pattern and exports the main contract descriptor symbol for diagnostics.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/coreclr/tools/aot/ILCompiler/Program.cs Adds the managed descriptor root provider to ILC compilation roots.
src/coreclr/tools/aot/ILCompiler.Compiler/ILCompiler.Compiler.csproj Includes new managed descriptor provider/node sources in the build.
src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/ManagedDataDescriptorProvider.cs Registers managed types to be described and roots the descriptor + JSON blob.
src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/ManagedDataDescriptorNode.cs Emits a ContractDescriptor-shaped symbol containing JSON type layout data.
src/coreclr/nativeaot/Runtime/threadstore.h Exposes ThreadStore private offsets for descriptor generation via cdac_data<>.
src/coreclr/nativeaot/Runtime/inc/MethodTable.h Exposes MethodTable offsets and flag constants for descriptor consumption via cdac_data<>.
src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc Defines the NativeAOT data descriptor types/globals/contracts and sub-descriptors.
src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.h Provides includes and declares the managed sub-descriptor symbol address for inclusion.
src/coreclr/nativeaot/Runtime/datadescriptor/CMakeLists.txt Adds descriptor generation targets for NativeAOT runtime + GC (wks/svr).
src/coreclr/nativeaot/Runtime/RuntimeInstance.h Exposes RuntimeInstance private offsets via cdac_data<>.
src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt Links the generated descriptor libraries into WorkstationGC/ServerGC runtime libs.
src/coreclr/nativeaot/Runtime/CMakeLists.txt Adds the datadescriptor subdirectory to the NativeAOT runtime build (non-WASM).
src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.targets Exports DotNetRuntimeContractDescriptor symbol for diagnostics on all OSes.
Comments suppressed due to low confidence (1)

src/coreclr/nativeaot/Runtime/datadescriptor/CMakeLists.txt:73

  • target_compile_definitions entries should be raw preprocessor symbols (e.g., SERVER_GC), not compiler flags. Passing -DSERVER_GC here will typically result in an invalid definition being forwarded to the compiler. Use SERVER_GC (or SERVER_GC=1) instead.

Comment thread src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc Outdated
Comment thread src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.h Outdated
}
}

private static void TryRegisterConditionalWeakTableTypes(ManagedDataDescriptorNode descriptorNode, EcmaModule systemModule)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We avoid hardcoding BCL implementation details like this in the compiler. Can the list of fields to include in the data contract .json be specified via an input file?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either input file or custom attributes on the types/fields in question (can be a corelib private attribute that we match by name).

Do we need to generate these even if the type was never allocated, or only for types that could really exist on the GC heap?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We avoid hardcoding BCL implementation details like this in the compiler. Can the list of fields to include in the data contract .json be specified via an input file?

Makes sense, I'll switch to using an attribute.

Do we need to generate these even if the type was never allocated, or only for types that could really exist on the GC heap?

In theory we would not. I do think that all of the types we would access are part of the nativeaot internal system and would likely exist. If they don't exist, we wouldn't need this data.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would likely exist

It depends on how far we go with marking things with this attribute. For example, some debugging scenarios need offset for async infrastructure. The async infrastructure won't be part of the app if the app does not use async.


CDAC_TYPE_BEGIN(RuntimeInstance)
CDAC_TYPE_INDETERMINATE(RuntimeInstance)
CDAC_TYPE_FIELD(RuntimeInstance, T_POINTER, ThreadStore, cdac_data<RuntimeInstance>::ThreadStore)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move this to ThreadStore, so it is looks the same as regular CoreCLR

}
}

private static void TryRegisterConditionalWeakTableTypes(ManagedDataDescriptorNode descriptorNode, EcmaModule systemModule)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either input file or custom attributes on the types/fields in question (can be a corelib private attribute that we match by name).

Do we need to generate these even if the type was never allocated, or only for types that could really exist on the GC heap?

Comment on lines 237 to +240
<IlcArg Condition="'$(_targetOS)' == 'win' and '$(DebuggerSupport)' != 'false'" Include="--export-dynamic-symbol:DotNetRuntimeDebugHeader,DATA" />
<IlcArg Condition="'$(_targetOS)' != 'win' and '$(DebuggerSupport)' != 'false'" Include="--export-dynamic-symbol:DotNetRuntimeDebugHeader" />
<IlcArg Condition="'$(_targetOS)' == 'win' and '$(DebuggerSupport)' != 'false'" Include="--export-dynamic-symbol:DotNetRuntimeContractDescriptor,DATA" />
<IlcArg Condition="'$(_targetOS)' != 'win' and '$(DebuggerSupport)' != 'false'" Include="--export-dynamic-symbol:DotNetRuntimeContractDescriptor" />
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the relationship between DotNetRuntimeDebugHeader and DotNetRuntimeContractDescriptor?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DotNetRuntimeDebugHeader export exposes a set of offsets that the current NativeAOT SOS uses to implement some commands ad-hoc.

The DotNetRuntimeContractDescriptor is the cDAC contract blob, similar to CoreCLR builds. Once the cDAC NAOT infrastructure is in place and working, the legacy DotNetRuntimeDebugHeader export and blob will be removed.

CDAC_GLOBAL(ObjectToMethodTableUnmask, uint8, 3)
#endif

// Contracts: declare which contracts this runtime supports
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add stresslog here as well (without the module table). It can use the same contract version as CoreCLR.

max-charlamb and others added 2 commits April 16, 2026 16:17
Add the native cDAC data descriptor for NativeAOT, enabling diagnostic
tools (cDAC reader, SOS) to inspect NativeAOT runtime state through the
same contract-based mechanism used by CoreCLR.

Includes:
- datadescriptor.inc with Thread, ThreadStore, MethodTable, ExInfo,
  EEAllocContext, GCAllocContext, RuntimeInstance types and globals
- CMake integration using shared clrdatadescriptors.cmake infrastructure
- GC sub-descriptor for workstation and server GC
- Contract declarations: Thread (1001), Exception (1), RuntimeTypeSystem (1001)
- Symbol export via --export-dynamic-symbol in build targets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ILC infrastructure to emit managed type field offsets as a cDAC
sub-descriptor, enabling diagnostic tools to inspect managed type
instances without runtime metadata.

Types opt-in via [CdacType] and [CdacField] attributes in CoreLib.
ILC discovers annotated types at compile time, computes field offsets,
and emits a ContractDescriptor (DotNetManagedContractDescriptor) that
the native runtime references as a sub-descriptor.

Includes:
- CdacTypeAttribute / CdacFieldAttribute in NativeAOT CoreLib
- ManagedDataDescriptorNode: emits ContractDescriptor with JSON layout
- ManagedDataDescriptorProvider: attribute-based type discovery with
  validation (rejects generics, duplicates, empty names)
- Thread.NativeAot.cs annotated with ManagedThread descriptor fields

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 16, 2026 20:49
@max-charlamb max-charlamb force-pushed the dev/max-charlamb/managed-type-descriptors branch from 9462d5c to f226bc3 Compare April 16, 2026 20:49
@max-charlamb max-charlamb deleted the dev/max-charlamb/managed-type-descriptors branch April 16, 2026 20:53
@max-charlamb max-charlamb restored the dev/max-charlamb/managed-type-descriptors branch April 16, 2026 20:54
@max-charlamb max-charlamb reopened this Apr 16, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

Comment thread src/coreclr/nativeaot/Runtime/datadescriptor/CMakeLists.txt Outdated
Comment thread src/coreclr/nativeaot/Runtime/datadescriptor/CMakeLists.txt
Comment thread src/coreclr/nativeaot/Runtime/datadescriptor/CMakeLists.txt
Comment thread src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc
private ManagedThreadId _managedThreadId;
[CdacField("Name")]
private string? _name;
[CdacField("StartHelper")]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the StartHelper field actually used in the diagnostic stack somewhere?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, I was using these values to test that the descriptor was generated properly.

- Move ThreadStore from RuntimeInstance to direct global pointer (jkotas)
- Update doc comment to clarify SystemModule-only scope (copilot-bot)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

}

/// <summary>
/// The cDAC descriptor field name. If not specified, the field's declared name is used.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to always publish managed types and fields under their actual names and disallow any renaming, so that the lookup using cdac contract and lookup using symbols is seamlessly interchangeable.

rootProvider.AddCompilationRoot(descriptorNode.JsonBlobNode, "Managed descriptor JSON data");
}

private void DiscoverAnnotatedTypes(ManagedDataDescriptorNode descriptorNode)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, we should do this in EETypeNode so that:

a) we don't need to load all types from corelib
b) we don't generate this info for types that don't have a MethodTable

(EETypeNode is the thing that generates the MethodTable)

EETypeNode.ComputeNonRelocationDependencies should check for the custom attribute on the type and if it exists, drop a factory.ManagedTypeDataDescriptor(_type) to the graph. This will be a new node (e.g. ManagedTypeDataDescriptorNode) that will largely work the same as e.g. ExactMethodInstantiationsEntryNode - there will be one of these nodes in the graph for each type with a CDAC descriptor. We'll collect them in MetadataManager as they show up (same as ExactMethodInstantiationsEntryNode) and there will be an API on MetadataManager to get a list of them when we're writing the object file (same as for ExactMethodInstantiationsEntryNode).

Copy link
Copy Markdown
Member

@jkotas jkotas Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect we will end up with some static fields too. Is this going to work well for static fields?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more node then - not just ManagedTypeDataDescriptorNode: instead of that, add InstanceFieldDataDescriptorNode and StaticFieldDataDescritorNode. The static field one will be added when the static base for the type in question is generated (GCStaticsNode, NonGCStaticsNode, ThreadStaticsNode).

How are we going to embed their addresses in a textual JSON though?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we going to embed their addresses in a textual JSON though?

The textual JSON has an index into a side-table with absolute addresses.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, we could also get away without the node, we already have API on MetadataManager to go over all EETypes (GetTypesWithEETypes) and all static bases (GetTypesWithStaticBases). We can just go over those and check for the attributes once we're ready to generate the JSON. So skip the new nodes.

Add ClrNativeAotSubset to the HasCdacBuildTool condition in runtime.proj
so the cdac-build-tool processes datadescriptor.inc for NativeAOT instead
of linking a stub contract descriptor. Add missing GPTR_DECL for
g_pFreeObjectEEType in datadescriptor.h.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 17, 2026 16:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Comment on lines +174 to +185
if (desc.FieldMappings is not null)
{
// Check if any cDAC name maps to this managed field name
cdacFieldName = null;
foreach (var kvp in desc.FieldMappings)
{
if (kvp.Value == fieldName)
{
cdacFieldName = kvp.Key;
break;
}
}
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field selection with FieldMappings currently does an O(fields × mappings) scan by iterating the entire FieldMappings dictionary for every instance field to find a matching managed field name. This is fine for a few fields, but will scale poorly as more managed types/fields are added. Consider building a reverse map (managed field name → cDAC field name) once per ManagedTypeDescriptor so the loop is O(fields).

Copilot uses AI. Check for mistakes.
- Refactor ManagedDataDescriptorNode to use MetadataManager.GetTypesWithEETypes()
  for type discovery (Michal), ensuring only types with MethodTables are included
- Remove name overriding from CdacTypeAttribute and CdacFieldAttribute (Jan),
  types and fields are published under their actual managed names
- Remove StartHelper field annotation (Jan) — not used in diagnostics
- Fix Linux build break: add SKIP_TRACING_DEFINITIONS to datadescriptor
  CMake interface to avoid clretwallmain.h dependency
- Embed JSON inline in descriptor node instead of separate BlobNode
- Restore RuntimeInstance.h whitespace

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
};
typedef DPTR(RuntimeInstance) PTR_RuntimeInstance;


Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this code diff

}

GPTR_IMPL_INIT(RuntimeInstance, g_pTheRuntimeInstance, NULL);
GPTR_IMPL_INIT(ThreadStore, g_pThreadStore, NULL);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a static field on ThreadStore like Jan mentioned. We want it to match the coreclr implementation

Move g_pThreadStore global to ThreadStore::s_pThreadStore static member,
matching how CoreCLR exposes its ThreadStore pointer for the cDAC.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 17, 2026 19:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Comment on lines +116 to +122
// Use 0 (indeterminate) for reference types
int typeSize = type.IsValueType ? type.InstanceFieldSize.AsInt : 0;

sb.Append('"').Append(type.GetName()).Append("\":[");
sb.Append(typeSize);
sb.Append(",{");

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON emitted for each entry in the "types" dictionary is not in the compact data-descriptor format expected by ContractDescriptorParser.TypeDescriptorConverter. The converter requires each type value to be a JSON object (with optional "!" size sigil and field-name properties), but this code emits an array [typeSize,{...}], which will fail to deserialize and cause the managed sub-descriptor to be ignored by the reader.

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +122
// Use 0 (indeterminate) for reference types
int typeSize = type.IsValueType ? type.InstanceFieldSize.AsInt : 0;

sb.Append('"').Append(type.GetName()).Append("\":[");
sb.Append(typeSize);
sb.Append(",{");

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference types are currently encoded with a size value of 0 (see comment and typeSize computation). In the compact descriptor format, an indeterminate-size type must omit the "!" size property entirely; specifying size 0 makes the type determinate with an incorrect size and can break pointer arithmetic logic in consumers.

Suggested change
// Use 0 (indeterminate) for reference types
int typeSize = type.IsValueType ? type.InstanceFieldSize.AsInt : 0;
sb.Append('"').Append(type.GetName()).Append("\":[");
sb.Append(typeSize);
sb.Append(",{");
sb.Append('"').Append(type.GetName()).Append("\":[");
if (type.IsValueType)
{
sb.Append(type.InstanceFieldSize.AsInt);
sb.Append(",{");
}
else
{
sb.Append('{');
}

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +119
private static void EmitTypeJson(StringBuilder sb, EcmaType type)
{
// Use 0 (indeterminate) for reference types
int typeSize = type.IsValueType ? type.InstanceFieldSize.AsInt : 0;

sb.Append('"').Append(type.GetName()).Append("\":[");
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type.GetName() returns only the metadata type name (no namespace), so the descriptor keys can easily collide (e.g., two annotated types named "Timer" in different namespaces). Consider using a fully-qualified managed name (namespace + nesting) for the JSON key to ensure uniqueness within the descriptor.

Suggested change
private static void EmitTypeJson(StringBuilder sb, EcmaType type)
{
// Use 0 (indeterminate) for reference types
int typeSize = type.IsValueType ? type.InstanceFieldSize.AsInt : 0;
sb.Append('"').Append(type.GetName()).Append("\":[");
private static string GetQualifiedTypeName(MetadataType type)
{
if (type.ContainingType is not null)
{
return GetQualifiedTypeName(type.ContainingType) + "+" + type.GetName();
}
if (string.IsNullOrEmpty(type.Namespace))
{
return type.GetName();
}
return type.Namespace + "." + type.GetName();
}
private static void EmitTypeJson(StringBuilder sb, EcmaType type)
{
// Use 0 (indeterminate) for reference types
int typeSize = type.IsValueType ? type.InstanceFieldSize.AsInt : 0;
sb.Append('"').Append(GetQualifiedTypeName(type)).Append("\":[");

Copilot uses AI. Check for mistakes.
Move add_subdirectory(datadescriptor) after add_subdirectory(eventpipe)
so that the generated clretwallmain.h header exists before the
datadescriptor intermediary compiles. This removes the need for the
SKIP_TRACING_DEFINITIONS workaround.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 17, 2026 21:25
@max-charlamb max-charlamb force-pushed the dev/max-charlamb/managed-type-descriptors branch from 7f16fd5 to 3b5d334 Compare April 17, 2026 21:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated no new comments.

The datadescriptor compilation unit only needs type layouts (offsetof),
not ETW tracing headers. Define SKIP_TRACING_DEFINITIONS on the
interface target to skip the clretwallmain.h include from gcenv.h.
This matches the pattern used by clrgc.enabled.cpp.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the dev/max-charlamb/managed-type-descriptors branch from 3b5d334 to 656e2c7 Compare April 18, 2026 01:02
@github-actions
Copy link
Copy Markdown
Contributor

🤖 Copilot Code Review — PR #126972

Note

This review was generated by GitHub Copilot using multi-model analysis (Claude Opus 4.6 primary, Claude Sonnet 4.5 sub-agent). A GPT-5.3-Codex sub-agent was also launched but did not complete within the 10-minute timeout.

Holistic Assessment

Motivation: Well-justified. NativeAOT needs cDAC support for diagnostic tool compatibility. The infrastructure enables SOS and the cDAC reader to inspect NativeAOT runtime state through the same contract-based mechanism used by CoreCLR.

Approach: Sound. The PR reuses the existing shared clrdatadescriptors.cmake infrastructure, adds a novel managed sub-descriptor mechanism leveraging ILC's compile-time knowledge, and correctly gates everything on DebuggerSupport. The attribute-based discovery pattern ([CdacType]/[CdacField]) is clean and minimal.

Summary: ⚠️ Needs Human Review. The code is well-structured and follows established patterns. However, there are two areas that warrant human attention: (1) the Thread/RuntimeThreadLocals offset chain in datadescriptor.inc may double-count the alloc context offset when the reader is eventually implemented, and (2) JSON generation lacks string escaping. Both are low-risk today (no reader for version 1001 exists yet, and current annotated names are safe), but they are latent issues worth verifying.


Detailed Findings

⚠️ Thread.RuntimeThreadLocals offset may double-count m_eeAllocContext

In datadescriptor.inc, both the Thread to RuntimeThreadLocals and RuntimeThreadLocals to AllocContext paths use the same offset:

// Line 31: Thread field
CDAC_TYPE_FIELD(Thread, TYPE(RuntimeThreadLocals), RuntimeThreadLocals,
    offsetof(RuntimeThreadLocals, m_eeAllocContext))

// Line 42: RuntimeThreadLocals field - same offset value
CDAC_TYPE_FIELD(RuntimeThreadLocals, TYPE(EEAllocContext), AllocContext,
    offsetof(RuntimeThreadLocals, m_eeAllocContext))

In NativeAOT, Thread IS RuntimeThreadLocals (validated by static_assert(sizeof(Thread) == sizeof(RuntimeThreadLocals))). If a future cDAC reader (Thread_1001) chains these inline type offsets additively, the total offset to AllocContext would be 2 * offsetof(m_eeAllocContext) instead of the correct offsetof(m_eeAllocContext).

In CoreCLR's descriptor, Thread.RuntimeThreadLocals is a T_POINTER (pointer dereference), so there is no additive chaining. The NativeAOT version uses TYPE(RuntimeThreadLocals) (inline), which changes the semantics.

Suggested fix - since Thread IS RuntimeThreadLocals, the inline sub-object starts at offset 0:

CDAC_TYPE_FIELD(Thread, TYPE(RuntimeThreadLocals), RuntimeThreadLocals, 0)

This is not immediately broken (the reader for version 1001 does not exist yet), but it should be resolved before the reader-side implementation. (Flagged by both models)

⚠️ Missing JSON string escaping in ManagedDataDescriptorNode.cs

EmitTypeJson (lines 119, 136) inserts type/field names directly into JSON without escaping:

sb.Append('"').Append(type.GetName()).Append("\":[");
sb.Append('"').Append(field.GetName()).Append("\":");

If a type or field name contained ", \, or control characters, the generated JSON would be malformed. Currently safe (the only annotated type is System.Threading.Thread with fields _managedThreadId and _name), but a latent bug for future annotated types.

Suggestion: Add a minimal JSON escaping helper, or add a Debug.Assert / comment noting that names are constrained to safe identifiers. (Flagged by both models)

⚠️ No cDAC reader implementations for contract versions 1001

The descriptor declares:

CDAC_GLOBAL_CONTRACT(Thread, 1001)
CDAC_GLOBAL_CONTRACT(RuntimeTypeSystem, 1001)

The cDAC reader in CoreCLRContracts.cs only registers version 1 for these contracts. CachingContractRegistry.GetContract<T>() will throw NotImplementedException for version 1001.

This is presumably intentional infrastructure-first work, with reader-side implementations planned as follow-up. The version numbers (1001 vs 1) correctly prevent the CoreCLR reader from misinterpreting NativeAOT data layouts (which differ, e.g. Thread.RuntimeThreadLocals is inline vs pointer). This is an observation for tracking, not a blocking issue. (Flagged by both models)

✅ ContractDescriptor binary layout is correct

The ManagedDataDescriptorNode.GetData() emits a header matching the ContractDescriptor C struct layout exactly:

  • magic(8) + flags(4) + descriptor_size(4) + descriptor(ptr) + pointer_data_count(4) + pad0(4) + pointer_data(ptr)
  • EmitPointerReloc(this, headerSize) correctly points the descriptor field to the inline JSON bytes
  • The descriptor_size is jsonBytes.Length (excluding null terminator), matching the native convention

Verified against src/coreclr/debug/datadescriptor-shared/inc/contract-descriptor.h.

✅ CMake integration follows established patterns

  • generate_data_descriptors() calls with proper INTERFACE_TARGET and EXPORT_VISIBLE
  • SKIP_TRACING_DEFINITIONS on the interface target correctly avoids the clretwallmain.h dependency, matching the pattern in clrgc.enabled.cpp
  • WorkstationGC links nativeaot_gc_wks_descriptor, ServerGC links nativeaot_gc_svr_descriptor - correct GC variant separation
  • datadescriptor subdirectory is added before Full in the non-WASM guard, ensuring targets are available

✅ ThreadStore static pointer matches CoreCLR pattern

SPTR_DECL(ThreadStore, s_pThreadStore) + SPTR_IMPL follows the established DAC-compatible pattern. The assignment in RuntimeInstance::Initialize() runs during single-threaded startup (guarded by ASSERT_MSG(g_pTheRuntimeInstance == NULL, "multi-instances are not supported")), so no thread-safety concern.

✅ Symbol export and build integration are correct

  • DotNetRuntimeContractDescriptor export is correctly gated on DebuggerSupport != 'false', matching DotNetRuntimeDebugHeader
  • Windows gets ,DATA suffix for proper PE section annotation
  • ClrNativeAotSubset is correctly added to HasCdacBuildTool condition in runtime.proj

✅ Attributes are internal - no public API surface change

CdacTypeAttribute and CdacFieldAttribute are internal sealed in NativeAOT's System.Private.CoreLib. No ref/ assembly changes. No API review needed.

💡 Consider adding minimal validation in ManagedDataDescriptorNode

The node accepts any type annotated with [CdacType] without checking for edge cases. Consider adding guards for:

  • Generic types (whose GetName() may produce unexpected results)
  • Types with no [CdacField]-annotated fields (would emit an empty type entry)

This is low-priority since the attribute is internal and currently only applied to Thread.

💡 Magic number could be a named constant

builder.EmitLong(0x0043414443434e44L); // "DNCCDAC\0"

Consider extracting as private const long ContractDescriptorMagic = 0x0043414443434e44L; for self-documentation, matching how the native side uses it in contract-descriptor.c.in.

Generated by Code Review for issue #126972 ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants