[cDAC] Source generator for IData<T> data classes #128356

Draft
max-charlamb wants to merge 9 commits into dotnet:main from max-charlamb:dev/max-charlamb/cdac-source-generator-prototype

@max-charlamb
Member

Note

This PR was prepared with assistance from GitHub Copilot.

Introduces a Roslyn incremental source generator (DataGenerator) for cDAC IData<T> data classes, and ports ~150 hand-written classes under Microsoft.Diagnostics.DataContractReader.Contracts/Data/ to the new attribute-driven form.

Depends on

  • #127310, [cdac] Add IManagedTypeSource contract for FQN based type access. The first commit on this branch (Implement ManagedTypeSource contract) is that PR's work; this PR builds on top of it. The managed type source is needed for IData classes that use [CdacType(ManagedFullName = "...")] instead of a native descriptor.

Commits in this PR

  1. [cDAC] Add IData<T> source-generator infrastructure — adds the DataGenerator project, the [CdacType]/[Field]/[FieldAddress]/[InstanceDataStart]/[FieldOffset]/[StaticAddress]/[StaticReference]/[ThreadStaticAddress] attributes, an OnInit partial hook, and a Write{Name} write-back path for [Field(Writable = true)]. No existing classes are converted; build remains green.
  2. [cDAC] Convert IData<T> classes to source-generator form — ports ~150 IData classes to use the new attributes. Net diff is roughly -1300 lines.

Design notes

See docs/design/datacontracts/IData.md (added in commit 1) for the full attribute surface and the "good practices" guidance the conversions follow:

  • Don't put logic in IData classes — declarative attributes only; use OnInit for things that don't fit.
  • Don't eagerly dereference pointers to other IData classes — store as TargetPointer and let callers materialize, to avoid ambiguous null semantics. Inline structs ([Field(InPlace = true)]) are fine.
  • One class, one descriptor.
  • [Field(Writable = true)] properties must be declared { get; private set; } so the generated Write{Name} method can update the in-memory cache.
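
As a rough illustration of the last rule, here is a hand-written sketch of why a writable property wants a private setter. `FakeTarget`, `ThreadData`, the offset `8`, and the body of `WriteState` are all hypothetical stand-ins; the code the generator actually emits is not shown in this description and may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the cDAC Target: a sparse map of address -> 32-bit value.
public sealed class FakeTarget
{
    private readonly Dictionary<ulong, uint> _memory = new();

    public uint ReadUInt32(ulong address) =>
        _memory.TryGetValue(address, out uint value) ? value : 0u;

    public void WriteUInt32(ulong address, uint value) => _memory[address] = value;
}

// Hypothetical data class. In the PR, the constructor and the Write method
// would come from the generator; here they are written by hand.
public sealed class ThreadData
{
    private const ulong StateOffset = 8; // assumed field offset, for illustration only
    private readonly ulong _address;

    public ThreadData(FakeTarget target, ulong address)
    {
        _address = address;
        State = target.ReadUInt32(address + StateOffset); // snapshot taken at construction
    }

    // The private setter is what lets the write-back method refresh the snapshot.
    public uint State { get; private set; }

    public void WriteState(FakeTarget target, uint value)
    {
        target.WriteUInt32(_address + StateOffset, value); // write through to the target
        State = value;                                     // keep the cached copy in sync
    }
}
```

With a get-only property, the write would reach the target but the cached instance would keep reporting the old value.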

Intentional surface refinements

A handful of conversions go slightly beyond a mechanical port (also documented in IData.md):

  • Thread.RuntimeThreadLocals — eager IData deref → lazy TargetPointer; Thread_1.cs materializes via ProcessedData.GetOrAdd.
  • InteropSyncBlockInfo.{RCW,CCW,CCF,TaggedMemory} — always-non-null TargetPointer (with .Null sentinel) → nullable TargetPointer?; SyncBlock_1.cs updated.
  • Thread.DebuggerControlledThreadState — now a real [Field(Writable = true)]; Set/Reset paths use the generated WriteDebuggerControlledThreadState against the cached Data.Thread.

Holdout

JITNotification is intentionally left hand-written; its mutable, count-driven layout doesn't map cleanly onto the current generator surface.

Test results

dotnet build cdac.slnx is clean (0 warnings, 0 errors). All 2177 cDAC unit tests pass; 16 skipped (unchanged from main).

Max Charlamb and others added 3 commits May 18, 2026 23:22
Squashed from 7 commits:

- base implementation

- update users

- Address Copilot review: fix Lock _owningThreadId type and ComWrappers null handling

- Register ManagedTypeSource contract in datadescriptor.inc

- Document ManagedTypeSource contract and update consumers

- Potential fix for pull request finding

- Add object data offset to SyncBlock.md ManagedTypeSource reads
Introduces a Roslyn incremental source generator (DataGenerator) that
emits the ctor, `IData<T>.Create` factory, `Address` property, and
optional `Write{Name}` write-back methods for cDAC `IData<T>` data
classes, from a small attribute surface:

  - `[CdacType("Foo")]` or `[CdacType(ManagedFullName = "...")]`
    selects native vs managed type descriptors.
  - `[Field]` on a property declares a descriptor-driven field read.
    Bool, primitive, pointer, NUInt, code pointer, in-place struct,
    and pointer-to-IData read kinds are supported. Nullable property
    types are treated as descriptor-optional.
  - `[Field(Writable = true)]` additionally emits a
    `Write{Name}(Target, T)` method.
  - `[FieldAddress]`, `[InstanceDataStart]`, `[FieldOffset(N)]`
    cover address arithmetic and hardcoded-offset reads.
  - `[StaticAddress]`, `[StaticReference]`,
    `[ThreadStaticAddress]` emit partial static accessor methods
    against the managed type source.
  - A `partial void OnInit(Target, TargetPointer)` hook lets the user
    do anything that doesn't fit the declarative surface.

No existing IData<T> classes are converted in this commit; that follows
separately. See docs/design/datacontracts/IData.md for the full
attribute surface and good-practices guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ports ~150 hand-written `IData<T>` data classes under
Microsoft.Diagnostics.DataContractReader.Contracts/Data/ to the
attribute-driven form supported by the DataGenerator source generator
introduced in the previous commit. Each class loses its hand-rolled
ctor/Create boilerplate in favor of declarative `[CdacType]`/
`[Field]` attributes on a `partial` class; the generator emits the
equivalent ctor, `IData<T>.Create`, `Address` property, and any
required `Write{Name}` write-back methods.

A handful of intentional surface refinements come along for the ride
(documented in IData.md):

  - Pointer-to-IData fields are stored as `TargetPointer` and
    materialized lazily by callers, instead of being eagerly
    dereferenced in the ctor. This avoids ambiguous null semantics for
    fields that may be optional or self-referential.
    Affected: Thread.RuntimeThreadLocals, plus a handful of similar
    fields whose callers in Contracts/*.cs have been updated.
  - InteropSyncBlockInfo.{RCW,CCW,CCF,TaggedMemory} switch from
    always-non-null `TargetPointer` (with `Null` sentinels for
    missing fields) to nullable `TargetPointer?`. Callers in
    SyncBlock_1.cs have been updated to handle the new nullability.
  - Thread.DebuggerControlledThreadState is now a real
    `[Field(Writable = true)]` property, and Set/Reset paths in
    Thread_1.cs use the generated `WriteDebuggerControlledThreadState`
    method instead of bespoke `ReadField`/`WriteField` calls.

JITNotification is intentionally left in hand-written form for now
because its mutable, count-driven layout doesn't map cleanly onto the
current generator surface.

All 2177 cDAC unit tests continue to pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a Roslyn incremental source generator (DataGenerator) that emits IData<T>.Create factories, field accessors, managed-type lookups, and write-back methods from declarative attributes ([CdacType], [Field], [FieldAddress], [InstanceDataStart], [FieldOffset], [Static…], [ThreadStaticAddress]). It then mechanically converts ~150 hand-written IData<T> classes under Microsoft.Diagnostics.DataContractReader.Contracts/Data/ to the new attribute-driven form, removing ~1300 lines of boilerplate. The PR also depends on/includes the new IManagedTypeSource contract from #127310 and refactors callers (SyncBlock_1, Thread_1, Debugger_1, AuxiliarySymbols_1, dump tests) to use it.

Changes:

  • New Microsoft.Diagnostics.DataContractReader.DataGenerator Roslyn analyzer project (Model, EquatableArray, generator entry, plus attribute/parser/emitter sources not all shown).
  • Conversion of ~150 Data/* classes to [CdacType] + partial with attribute-driven member declarations and optional partial void OnInit hooks.
  • Surface refinements: Thread.RuntimeThreadLocals becomes a lazy TargetPointer; InteropSyncBlockInfo.{RCW,CCW,CCF,TaggedMemory} become TargetPointer?; Debugger writable fields use [Field(Writable = true)] with generated Write{Name} methods; dump/SyncBlock_1/AuxiliarySymbols_1 updated to follow.

Reviewed changes

Copilot reviewed 183 out of 184 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| DataGenerator/*.cs, .csproj | New incremental generator project (model types, EquatableArray helper, IsExternalInit shim, generator entry point). |
| Contracts.csproj, cdac.slnx | Wire generator as analyzer; add generator project to solution. |
| Data/*.cs (~140 files) | Mechanical port to [CdacType] + partial class with [Field]/[FieldAddress]/[InstanceDataStart]/[FieldOffset] properties; some classes retain OnInit for non-declarative logic. |
| Data/Managed/*.cs | New per-managed-type wrappers (Lock, List, ComWrappers, NativeObjectWrapper, ConditionalWeakTable*) using ManagedFullName. |
| Data/AuxiliarySymbolInfo.cs | Address renamed to CodeAddress to avoid colliding with generator-emitted Address. |
| Data/Debugger.cs | SetField helper removed; writable fields use [Field(Writable = true)] and generated Write{Name}. |
| Data/InteropSyncBlockInfo.cs, Data/SyncBlock.cs | RCW/CCW/CCF/TaggedMemory become nullable; SyncBlock loses Address. |
| Contracts/Thread_1.cs | Materializes RuntimeThreadLocals lazily; handles new nullable ExceptionWatsonBucketTrackerBuckets / UEWatsonBucketTrackerBuckets. |
| Contracts/SyncBlock_1.cs | Uses Data.Managed.Lock + IManagedTypeSource instead of hand-rolled metadata walk; updated for nullable interop pointers. |
| Contracts/Debugger_1.cs, Contracts/AuxiliarySymbols_1.cs, CoreCLRContracts.cs | Switch to Write{Name} helpers; register ManagedTypeSource contract; rename to CodeAddress. |
| Abstractions/Contracts/IManagedTypeSource.cs, ContractRegistry.cs, IRuntimeTypeSystem.cs | New IManagedTypeSource contract; remove GetTypeByNameAndModule / GetCoreLibFieldDescAndDef from IRuntimeTypeSystem. |
| datadescriptor.inc | Register ManagedTypeSource contract version. |
| docs/design/datacontracts/ComWrappers.md | Doc updated to describe ManagedTypeSource-based lookups. |
| tests/DumpTests/*.cs | Replace rts.GetTypeByNameAndModule calls with IManagedTypeSource.GetTypeHandle/TryGetThreadStaticFieldAddress. |

Adds three good-practice sections informed by the IData<T> conversion
work:

  - Materialize cached instances through `ProcessedData.GetOrAdd<T>`,
    never via `new T(target, addr)` (avoids cache bypass and stale
    write-back snapshots).
  - Don't capture `Target` in instance state -- treat IData
    instances as snapshots and accept `Target` as a parameter on
    methods that need a live channel.
  - Match the descriptor's declared field type verbatim (no widening,
    narrowing, or sign-flipping); document the standard descriptor
    type -> C# type mapping and call out `bool` as the lone
    intentional deviation.
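
The first practice can be pictured with a miniature cache. This is only a sketch: `DataCache` and `Snapshot` are hypothetical, and the real `ProcessedData.GetOrAdd<T>` signature is not reproduced here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of a per-target data cache keyed by (type, address).
public sealed class DataCache
{
    private readonly Dictionary<(Type, ulong), object> _cache = new();

    public T GetOrAdd<T>(ulong address, Func<ulong, T> create) where T : class
    {
        if (_cache.TryGetValue((typeof(T), address), out var existing))
            return (T)existing;      // cache hit: every caller shares one instance

        T created = create(address); // cache miss: materialize once and remember it
        _cache[(typeof(T), address)] = created;
        return created;
    }
}

// Hypothetical snapshot type; Flags is settable only to imitate a write-back.
public sealed class Snapshot
{
    public Snapshot(ulong address) => Address = address;

    public ulong Address { get; }
    public uint Flags { get; set; }
}
```

Two `GetOrAdd` calls for the same address return the same object, so an update made through one reference is visible through the other. An instance built with `new` at the same address is a separate snapshot and does not see that update.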

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… RawOffset; doc cleanup

Namespace migration:
  - Move CdacAttributes.cs from
    `Abstractions/Generated/CdacAttributes.cs` to
    `Abstractions/CdacAttributes.cs`. The `Generated` subfolder was
    misleading -- the attributes are hand-authored, not source-
    generated.
  - Change the namespace from
    `Microsoft.Diagnostics.DataContractReader.Generated` to the root
    `Microsoft.Diagnostics.DataContractReader` namespace, matching
    where `Target`, `TargetPointer`, and the other foundational
    abstractions already live.
  - Strip the now-unnecessary `using ...Generated;` directive from
    the ~150 IData<T> classes (their file-scoped `...Data` namespace
    is a child of the root and resolves the attributes automatically).
  - Update the generator's FQN constants and doc-comment to match.

FieldOffset -> RawOffset rename:
  - Rename `FieldOffsetAttribute` to `RawOffsetAttribute`. The
    old name collided with `System.Runtime.InteropServices.FieldOffset`
    once the attribute moved to the root namespace; the new name is
    also more accurate (these are raw byte offsets relative to the
    instance address, not BCL-style explicit-layout offsets).
  - Rename all `[FieldOffset(...)]` uses on IData classes
    accordingly (ImageDosHeader, ImageFileHeader, ImageNTHeaders,
    ImageOptionalHeader, ImageSectionHeader, WebcilHeader,
    WebcilSectionHeader).
  - Update Parser.cs FQN constant and emitter helper to match.

IData.md cleanup (consistency with the current code):
  - Reflect the namespace + project + attribute-name changes above.
  - Update the `[CdacType]` attribute-surface table -- the
    `DataType` enum overload was removed earlier; the recommended
    form is now `[CdacType(nameof(DataType.X))]`.
  - Sweep all worked examples to use `[CdacType(nameof(DataType.X))]`
    instead of the obsolete `[CdacType(DataType.X)]`.
  - Fix the generated `WriteFlags` example to show the string form
    that the generator actually emits.
  - Correct the `[Field(Writable = true)]` rules in two places: the
    write goes through the descriptor field offset regardless of
    which side (native or managed) supplied it.
  - Soften the `init`/`required`/`= null!` blanket prohibition
    into a positive recommendation to use `[MemberNotNull]` on
    `OnInit` for properties populated by custom logic.

Build clean; all 2177 cDAC unit tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the dev/max-charlamb/cdac-source-generator-prototype branch from bc0e567 to 04d4c54 Compare May 19, 2026 14:41
Copilot AI review requested due to automatic review settings May 19, 2026 14:41
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 184 out of 184 changed files in this pull request and generated no new comments.

When an IData class supplies both a native cdac descriptor name and a
managed full name, each `[Field]` resolves at runtime via a per-field
cascade: each candidate name is tried against the native descriptor
first, then against the managed metadata. The first match wins.
Motivation: Jan's review on dotnet#127310 -- types like `Lock` may move
between sources or gain partial native coverage; a single IData class
should survive that without C# changes.

The fallback machinery is contained entirely in the generator's output;
no public type surface is added to `Abstractions`.

User-side surface (collapsed from four name-related properties to one):

  - `[Field("name1", "name2", ...)]` -- `params string[]` ctor.
    Defaults to `[propertyName]` when none given.
  - Cascade tries every name against native first, then managed.
  - `[FieldAddress(...)]` accepts the same `params string[]` shape.

LayoutPair (PostInit-emitted into the consuming assembly):

  - `LayoutPair` struct + `LayoutPairResolver.Resolve(target, ...)`
    are emitted via `RegisterPostInitializationOutput` into the
    consuming assembly, gated by a compilation check so multiple
    InternalsVisibleTo-linked assemblies don't double-emit.
  - All Read/Write/HasField/GetFieldAddress methods take a single
    `string` or `string[]` of candidate names.
  - `ManagedDataOffset` (`Object.Size` for ref types, `0` for
    value types) is applied only when the cascade resolves on the
    managed side.
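
The cascade order described above can be modelled in a few lines. This is an illustrative sketch, not the emitted `LayoutPair`: `LayoutPairSketch`, its constructor, and the dictionary-backed layouts are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the per-field cascade: every candidate name is tried
// against the native layout first, then every name against the managed layout.
public sealed class LayoutPairSketch
{
    private readonly Dictionary<string, int> _native;  // field name -> offset
    private readonly Dictionary<string, int> _managed; // field name -> offset
    private readonly int _managedDataOffset;           // extra bytes before managed instance data

    public LayoutPairSketch(
        Dictionary<string, int> native,
        Dictionary<string, int> managed,
        int managedDataOffset)
    {
        _native = native;
        _managed = managed;
        _managedDataOffset = managedDataOffset;
    }

    // On success, 'offset' is the byte offset from the instance address.
    public bool TrySelect(out int offset, params string[] names)
    {
        foreach (string name in names)
        {
            if (_native.TryGetValue(name, out offset))
                return true; // native match: offset used as-is
        }

        foreach (string name in names)
        {
            if (_managed.TryGetValue(name, out int raw))
            {
                offset = raw + _managedDataOffset; // applied only on the managed side
                return true;
            }
        }

        offset = 0;
        return false;
    }
}
```

Note that a later candidate that exists natively beats an earlier candidate that exists only in managed metadata, because the native pass runs over all names first.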

Generator/parser:

  - `Target.TryGetTypeInfo(string, out TypeInfo)` -- new abstract
    on `Target`; non-throwing form used by `LayoutPairResolver`.
  - Unified codegen: every class that needs a descriptor lookup goes
    through `LayoutPair`. The previous dual single-source vs
    cross-source code paths are gone (~120 LOC deleted from the
    emitter); `[CdacType]` parameterless + `[RawOffset]`-only
    classes still skip the resolver call.
  - `IsSourceProject=false` on the generator csproj to stop the
    repo's DownlevelLibraryImportGenerator from attaching to this
    netstandard2.0 source generator.

Existing 150 IData<T> classes are unchanged: positional forms like
`[Field("_state")]` (Lock) and `[Field("_message")]` (Exception)
still resolve through the cascade. `Exception` is the only existing
two-source class; its descriptor field names happen to match the
managed names, so the happy path is identical to before.

DataGeneratorTests: a new self-contained xUnit sub-project under
`tests/DataGenerator/` exercises the generator's emitted code via a
minimal `TestTarget` (no dependency on the cdac mocking framework)
and 12 test-only IData classes. 10 direct `LayoutPair` unit tests
+ 19 integration scenarios cover single-source, cross-source cascade,
alias resolution, writable round-trip, optional `T?`, and
`[FieldAddress]` paths.

Test counts: 29 new tests in DataGeneratorTests; 2177 existing cdac
Tests unchanged (was 2187 -- the 10 LayoutPair direct tests moved into
the new sub-project). Total 2206 passing across the cdac surface.

IData.md: new Fallback section + updated attribute surface table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the dev/max-charlamb/cdac-source-generator-prototype branch from 1f09704 to 2440300 Compare May 19, 2026 20:28
Copilot AI review requested due to automatic review settings May 19, 2026 20:28
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 194 out of 194 changed files in this pull request and generated no new comments.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb/cdac-source-generator-prototype branch from f913e03 to 2440300 Compare May 19, 2026 20:35
Copilot AI review requested due to automatic review settings May 19, 2026 20:35
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 194 out of 194 changed files in this pull request and generated no new comments.

Max Charlamb and others added 2 commits May 19, 2026 17:14
…candidate

The C# property name is now always appended as the lowest-priority
candidate in the [Field] / [FieldAddress] name cascade (de-duped if
already present). This means an explicit name list still falls back
to the property name if none of the listed names matched the
descriptor, removing the need to repeat the property name in mixed
single-source/cross-source classes.

Opt out by setting UsePropertyName = false on the attribute. This is
rarely needed; it exists for cases where the C# property name happens
to collide with an unrelated descriptor field.
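
A minimal sketch of the candidate-list rule, assuming a hypothetical helper named `BuildCandidates`; the generator's actual parser code is not shown here and may be organized differently.

```csharp
using System;
using System.Collections.Generic;

public static class FieldNames
{
    // Explicit names keep their declared order; the property name is appended
    // last unless it is already listed or the caller opted out.
    public static List<string> BuildCandidates(
        string propertyName,
        bool usePropertyName,
        params string[] explicitNames)
    {
        var result = new List<string>(explicitNames);
        if (usePropertyName && !result.Contains(propertyName))
            result.Add(propertyName);
        return result;
    }
}
```

For a property `State` declared with explicit names `_state` and `m_state`, the candidates would be `_state`, `m_state`, `State`; with `usePropertyName` set to `false` the trailing `State` is dropped.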

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…kind

Replaces the three-way pattern (string overload, string[] overload,
private ReadOnlySpan<string> core) with a single public method using
'params ReadOnlySpan<string> names' (C# 13).

- Single-name callers still bind to an inline span buffer (no heap
  allocation), matching the previous fast path.
- Multi-name callers can pass either comma-separated string literals
  or an existing string[] (implicit array-to-span conversion).
- Emitter's NameArgs no longer special-cases single vs multi: it
  always emits a comma-separated quoted list.
- WriteField parameter order swapped to put 'value' before the params
  names tail; Emitter codegen updated to match.
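
For readers unfamiliar with the C# 13 feature, a small sketch of the calling shapes it allows. `NameCascade`, `IndexOfFirstKnown`, and `DescribeWrite` are hypothetical names invented for this example, not methods from the PR; the block needs a C# 13 compiler.

```csharp
using System;

public static class NameCascade
{
    // 'params ReadOnlySpan<string>': callers may pass one literal, several
    // literals, or an existing string[] without a separate overload for each.
    public static int IndexOfFirstKnown(string[] known, params ReadOnlySpan<string> names)
    {
        for (int i = 0; i < names.Length; i++)
        {
            if (Array.IndexOf(known, names[i]) >= 0)
                return i; // position of the first candidate that is known
        }
        return -1;
    }

    // 'value' precedes the names because a params parameter must come last.
    public static string DescribeWrite(uint value, params ReadOnlySpan<string> names)
        => $"write {value} to first of {names.Length} candidate(s)";
}
```

All three call shapes bind to the same method: `IndexOfFirstKnown(known, "a")`, `IndexOfFirstKnown(known, "a", "b")`, and `IndexOfFirstKnown(known, existingArray)`.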

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 19, 2026 21:43
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 194 out of 194 changed files in this pull request and generated no new comments.

LayoutPair previously exposed one wrapper per Target read/write kind
(ReadField, ReadPointerField, ReadNUIntField, ReadCodePointerField,
ReadDataField, WriteField, GetFieldAddress, HasField). Each wrapper
did a name-cascade resolution and then forwarded to the matching
Target method.

The same shape is now generated directly into each IData ctor:
Select / TrySelect resolves once into (TypeInfo, base, name) locals
and the appropriate Target.* call runs inline. This drops the
wrapper layer entirely; optional fields also gain a free win, since
they previously did a HasField + Read pair that resolved twice.

Also folded LayoutPairResolver.Resolve into a static LayoutPair.Resolve
method -- there's no reason to keep the factory in a separate type.

Net surface: LayoutPair has TrySelect, Select, Resolve (static),
InstanceSize, ManagedDataOffset, NativeType, ManagedType. Tests use
small helpers (FieldAddress, HasField) to stay readable.
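
The "resolved twice" point for optional fields can be made concrete with a counting stand-in. `CountingLayout` and `OptionalRead` are hypothetical and only mimic the shape described above; they read offsets rather than target memory to keep the sketch small.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical layout that counts how many name resolutions it performs.
public sealed class CountingLayout
{
    private readonly Dictionary<string, int> _offsets;

    public CountingLayout(Dictionary<string, int> offsets) => _offsets = offsets;

    public int Resolutions { get; private set; }

    public bool TrySelect(string name, out int offset)
    {
        Resolutions++;
        return _offsets.TryGetValue(name, out offset);
    }

    // Old shape: separate wrappers, each resolving the name again.
    public bool HasField(string name) => TrySelect(name, out _);

    public int ReadOffset(string name) =>
        TrySelect(name, out int offset) ? offset : throw new KeyNotFoundException(name);
}

public static class OptionalRead
{
    // Wrapper style: HasField + Read resolves the same name twice.
    public static int? ViaWrappers(CountingLayout layout, string name) =>
        layout.HasField(name) ? (int?)layout.ReadOffset(name) : null;

    // Inline style: one TrySelect feeds both the presence check and the read.
    public static int? Inline(CountingLayout layout, string name) =>
        layout.TrySelect(name, out int offset) ? (int?)offset : null;
}
```

Both helpers return the same value for a present field; the wrapper style costs two resolutions per optional field and the inline style costs one.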

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2 participants