FlatExpression POC: data-oriented flat expression tree #511

Draft
Copilot wants to merge 6 commits into master from copilot/data-oriented-expression-optimization

Contributor

Copilot AI commented Apr 12, 2026

Explores the idea from #512: represent an expression tree as a single flat array of fat structs with integer index references instead of object-graph pointers, enabling stack allocation for small trees, trivial serialization, and O(n) non-recursive structural equality.

Core types (src/FastExpressionCompiler/FlatExpression.cs)

  • Idx — 1-based int index into Nodes; default (It == 0) is the nil sentinel
  • ExpressionNode — 32-byte sequential fat struct (refs-first layout eliminates padding): Type, Info, NodeType, NextIdx (next sibling), ChildIdx (first child / inline constant bits), ExtraIdx (second child / constant discriminator)
  • ExpressionTree — holds nodes in SmallList<ExpressionNode, Stack16<…>, NoArrayPool<…>> (first 16 nodes on the call-stack) and closure constants in SmallList<object, Stack4<…>, …>; factory methods for Constant, Parameter, Unary, Binary, New, Call, Lambda, Conditional, Block
  • ToSystemExpression() — converts to System.Linq.Expressions so existing FEC compilation path is reachable; uses SmallMap16<int, SysParam, IntEq> (stack-resident) instead of Dictionary to map parameter indices during conversion
  • StructurallyEqual(): O(n) structural comparison via a single pass over the flat arrays; no recursive tree descent needed
var tree = default(ExpressionTree);
var px = tree.Parameter(typeof(int), "x");
var py = tree.Parameter(typeof(int), "y");
var add = tree.Add(px, py, typeof(int));
tree.Lambda(typeof(Func<int, int, int>), body: add, parameters: [px, py]);

// Round-trip to System.Linq.Expressions and compile
var fn = ((Expression<Func<int, int, int>>)tree.ToSystemExpression()).Compile();
fn(4, 7); // 11

Key design insight surfaced

Lambda parameters cannot be chained via NextIdx — the same parameter node may already have its NextIdx occupied as part of a New/Call argument chain. Lambda stores its parameters as Idx[] in Info instead. This is the central intrusive-list tension: one small Idx[] allocation per lambda avoids silent list corruption at construction time. A future optimisation could replace it with a (start, count) slice into a dedicated side array.
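The corruption described above can be reproduced with a minimal sketch. `MiniNode`, `Link`, and `Walk` are hypothetical stand-ins, not the PR's types; only the intrusive next-sibling link is modelled, and indices are 1-based with 0 as nil.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node: only the intrusive "next sibling" link is modelled.
struct MiniNode
{
    public string Name;
    public int Next; // 1-based index of the next sibling, 0 = nil
}

static class IntrusiveListDemo
{
    // Links the given 1-based indices into a chain by overwriting each node's Next.
    public static int Link(MiniNode[] nodes, params int[] indices)
    {
        for (var i = 0; i < indices.Length; i++)
            nodes[indices[i] - 1].Next = i + 1 < indices.Length ? indices[i + 1] : 0;
        return indices.Length == 0 ? 0 : indices[0];
    }

    // Walks a chain from head and collects the node names.
    public static List<string> Walk(MiniNode[] nodes, int head)
    {
        var names = new List<string>();
        for (var cur = head; cur != 0; cur = nodes[cur - 1].Next)
            names.Add(nodes[cur - 1].Name);
        return names;
    }

    public static void Main()
    {
        var nodes = new[]
        {
            new MiniNode { Name = "x" },
            new MiniNode { Name = "y" },
            new MiniNode { Name = "z" },
        };

        var lambdaParams = Link(nodes, 1, 2); // intended parameter list: x -> y
        Link(nodes, 1, 3);                    // argument list x -> z overwrites x.Next

        // The "parameter list" now silently reads as x -> z.
        Console.WriteLine(string.Join(",", Walk(nodes, lambdaParams))); // x,z
    }
}
```

Because a node has a single `Next` slot, it can belong to only one chain at a time, which is why a separate index array per lambda sidesteps the problem.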

Constant node encoding

ExtraIdx is repurposed as a three-way discriminator, eliminating the old ConstantIndex field:

| ExtraIdx.It | Meaning |
|---|---|
| 0 (nil) | Value is in Info (boxed reference, or null) |
| > 0 | ClosureConstants[ExtraIdx.It - 1] (1-based) |
| -1 | Inline bits: value packed into ChildIdx.It, no boxing |

Types stored inline in ChildIdx.It without boxing: bool, byte, sbyte, char, short, ushort, int, uint, float (reinterpreted via a portable [StructLayout(LayoutKind.Explicit)] union — compatible with all targets including netstandard2.0). Larger types (string, long, double, decimal, DateTime, Guid) remain in Info or closure.
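A minimal sketch of the encoding, assuming the union looks roughly like the `FloatIntBits` struct named later in this thread. The field names `F` and `I`, the `ToBits`/`FromBits` helpers, and the `Decode` method are hypothetical; `Decode` returns `object` only to keep the demo short, so it boxes on the way out, which the real read path presumably avoids.

```csharp
using System;
using System.Runtime.InteropServices;

// Overlays a float and an int at offset 0, so writing one field and reading the
// other reinterprets the same 4 bytes without BitConverter.SingleToInt32Bits.
[StructLayout(LayoutKind.Explicit)]
struct FloatIntBits
{
    [FieldOffset(0)] public float F;
    [FieldOffset(0)] public int I;

    public static int ToBits(float f) => new FloatIntBits { F = f }.I;
    public static float FromBits(int bits) => new FloatIntBits { I = bits }.F;
}

static class ConstantEncodingDemo
{
    // Hypothetical decoder for the three ExtraIdx cases.
    // `extra` and `child` stand in for ExtraIdx.It and ChildIdx.It.
    public static object Decode(Type type, object info, int child, int extra, object[] closureConstants)
    {
        if (extra == 0) return info;                       // boxed value or null kept in Info
        if (extra > 0) return closureConstants[extra - 1]; // 1-based closure constant slot
        // extra == -1: the value itself lives in the 32 bits of `child`
        if (type == typeof(int)) return child;
        if (type == typeof(bool)) return child != 0;
        if (type == typeof(float)) return FloatIntBits.FromBits(child);
        throw new NotSupportedException(type.Name);
    }

    public static void Main()
    {
        var bits = FloatIntBits.ToBits(1.5f);
        Console.WriteLine(bits.ToString("X8")); // 3FC00000
        Console.WriteLine(Decode(typeof(float), null, bits, -1, Array.Empty<object>()));
    }
}
```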

Memory layout

ExpressionNode is 32 bytes on 64-bit by placing the two reference fields (Type, Info) first — eliminating the 4-byte padding that LayoutKind.Sequential would otherwise insert after the leading NodeType int:

Type(ref,8) + Info(ref,8) + NodeType(int,4) + NextIdx(4) + ChildIdx(4) + ExtraIdx(4) = 32 bytes
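The padding arithmetic can be checked with a small sketch. This is an approximation: `IntPtr` stands in for the two object references, because the runtime may lay out structs that contain references differently from what `LayoutKind.Sequential` declares, and `Marshal.SizeOf` reports the sequential (unmanaged) size. `IntFirst` and `RefsFirst` are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct IntFirst
{
    public int NodeType;                    // 4 bytes, then 4 bytes of padding on 64-bit
    public IntPtr Type;                     // stand-in for the Type reference
    public IntPtr Info;                     // stand-in for the Info reference
    public int NextIdx, ChildIdx, ExtraIdx; // 12 bytes, then 4 bytes of tail padding
}

[StructLayout(LayoutKind.Sequential)]
struct RefsFirst
{
    public IntPtr Type;
    public IntPtr Info;
    public int NodeType, NextIdx, ChildIdx, ExtraIdx; // 16 bytes, no padding needed
}

static class LayoutDemo
{
    public static void Main()
    {
        // In a 64-bit process these are expected to be 40 and 32.
        Console.WriteLine(Marshal.SizeOf<IntFirst>());
        Console.WriteLine(Marshal.SizeOf<RefsFirst>());
    }
}
```

With the same six fields, the int-first order pays for padding twice (after the leading int and at the tail), while the refs-first order packs the four ints into exactly two pointer-sized slots.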

vs LightExpression heap objects (16-byte GC header + fields): Constant/Parameter ~40 bytes, Binary/Unary ~48–56 bytes.

Wins

  • ≤ 16 nodes → zero heap allocation (stack-resident via Stack16)
  • ExpressionNode is 32 bytes (down from 40) — refs-first layout, no padding
  • Common small constants (int, bool, float, etc.) stored inline — no boxing
  • ToSystemExpression() uses SmallMap16 (stack-resident) for the parameter map — no heap alloc for ≤ 16 parameters
  • Sibling traversal in hot paths uses direct while (!cur.IsNil) loops — no IEnumerator allocation
  • Trivially serializable: arrays of plain structs + integer refs
  • O(1) node access; O(n) structural equality without recursion
  • Closure constants collected at build time, mutable after build

Gaps / obstacles

  • Not API-compatible with System.Linq.Expressions without the adapter
  • Mutable struct: accidental by-value copy silently forks state
  • Info field boxes MethodBase/string — one allocation per call/new/parameter node

Tests (test/FastExpressionCompiler.UnitTests/FlatExpressionTests.cs)

22 tests covering node construction, parameter identity, structural equality, ToSystemExpression conversion, round-trip compile+invoke, and the mutable-closure-constant use case. Guarded with #if !LIGHT_EXPRESSION since FlatExpression.cs is not included in the LightExpression variant.

Copilot AI changed the title from "[WIP] Add data-oriented expression representation with optimization" to "FlatExpression POC: data-oriented flat expression tree" Apr 12, 2026
Copilot AI requested a review from dadhi April 12, 2026 22:17
Owner

@dadhi dadhi left a comment

Too many comments. Keep only the important stuff explaining why/how and remove the trivial what.

Contributor Author

Copilot AI commented Apr 12, 2026

Too many comments. Keep only the important stuff explaining why/how and remove the trivial what.

Done in d565578. Removed all comments that just describe what the code does. Kept only the three non-obvious why/how notes: the Lambda params-in-Info design decision, the // reset in case node was previously linked safety note in LinkList, and the InfoEqual note about array reference equality. Source went from 738 → 455 lines, tests from 391 → 329.

Copilot AI requested a review from dadhi April 12, 2026 22:46
// Builds body after registering params so they are found in paramMap when encountered in the body.
public SysExpr ToSystemExpression() => ToSystemExpression(RootIdx, new Dictionary<int, SysParam>());

private SysExpr ToSystemExpression(Idx nodeIdx, Dictionary<int, SysParam> paramMap)
Owner

Avoid allocations as much as possible. Use SmallMap and such from ImTools. Don't be afraid to add to or extend ImTools if something is missing or has errors (apply a fix). Use pooling if needed.

Contributor Author

Done in 76ffb84. Replaced Dictionary<int, SysParam> with SmallMap16<int, SysParam, IntEq> (stack-allocated for ≤16 params, uses the AddOrGetValueRef pattern). The paramMap is now default-initialized on the stack and passed ref through the recursive calls.

return indices[0];
}

public IEnumerable<Idx> Siblings(Idx head)
Owner

Is this only for tests? If so, comment it as such; otherwise optimize.

Contributor Author

Done in 76ffb84. Siblings is used in tests and as a public convenience API, so it stays — but is now commented as allocating an enumerator. The hot-path usages in SiblingList and the Block vars loop were replaced with direct while (!cur.IsNil) iterations, eliminating the enumerator allocation there.
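The difference between the two forms can be sketched as follows. `Idx`, `Node`, and both methods here are simplified hypothetical stand-ins rather than the PR's code: the iterator version allocates an enumerator object per call, while the plain loop walks the same chain without one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in: a 1-based index where 0 is nil.
readonly struct Idx
{
    public readonly int It;
    public Idx(int it) => It = it;
    public bool IsNil => It == 0;
}

struct Node
{
    public int Value;
    public Idx NextIdx;
}

static class SiblingDemo
{
    // Convenience form: the compiler turns this iterator into a
    // heap-allocated enumerator object on each call.
    public static IEnumerable<Idx> Siblings(Node[] nodes, Idx head)
    {
        for (var cur = head; !cur.IsNil; cur = nodes[cur.It - 1].NextIdx)
            yield return cur;
    }

    // Hot-path form: the same walk as a plain loop, with no enumerator object.
    public static int SumSiblings(Node[] nodes, Idx head)
    {
        var sum = 0;
        var cur = head;
        while (!cur.IsNil)
        {
            ref var node = ref nodes[cur.It - 1];
            sum += node.Value;
            cur = node.NextIdx;
        }
        return sum;
    }

    public static void Main()
    {
        var nodes = new[]
        {
            new Node { Value = 10, NextIdx = new Idx(2) },
            new Node { Value = 20, NextIdx = new Idx(3) },
            new Node { Value = 12 }, // NextIdx defaults to nil
        };
        Console.WriteLine(SumSiblings(nodes, new Idx(1))); // 42
    }
}
```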

Copilot AI requested a review from dadhi April 13, 2026 06:05
/// <summary>≥ 0: index into <see cref="ExpressionTree.ClosureConstants"/>. -1: value is inline in Info.</summary>
public int ConstantIndex;
public Idx NextIdx;
public Idx ChildIdx;
Owner

Let's find a way to compress the memory layout:

  • for a constant, could ChildIndex or ExtraIndex represent the constant index?
  • for constants up to int size (byte/bool through uint), could we store the value inline in ExtraIndex?
  • can we reinterpret (Unsafe) Info and the indexes as a value for constant/other nodes: just put byte fields and reinterpret their span based on the goal?
  • later, let's think about the actual node distribution/invariants at creation: can we compress more given the specific creation patterns?

Let's track the struct size in the comments for each specific expression type and compare it with the LightExpression analog as the smallest baseline.

Contributor Author

Done in b9df8ea. Here's what changed:

Field reorder → 40→32 bytes: placing the two ref fields (Type, Info) first eliminates the 4-byte padding that LayoutKind.Sequential inserts between the original leading NodeType (int) and the first reference. The struct comment documents the layout and compares to LightExpression heap objects (~40 bytes for Constant/Parameter, ~48–56 for Binary/Unary, all with 16-byte GC header).

ConstantIndex removed: repurposed ExtraIdx as the constant discriminator:

  • ExtraIdx.It == 0 (nil): value is in Info (boxed reference, or null)
  • ExtraIdx.It > 0: ClosureConstants[ExtraIdx.It - 1] (1-based, avoids ambiguity with nil)
  • ExtraIdx.It == -1: inline bits — value packed into ChildIdx.It, no heap allocation

Inline int32 constants: bool, byte, sbyte, char, short, ushort, int, uint, float (via BitConverter.SingleToInt32Bits) all store directly in ChildIdx.It. Larger/reference types (string, long, double, decimal, DateTime, Guid) remain in Info or closure as before.

Owner

@dadhi dadhi left a comment

@copilot Add a minimal, meaningful, focused summary comment for each public element to fix the build on CI

Error: /home/runner/work/FastExpressionCompiler/FastExpressionCompiler/src/FastExpressionCompiler/FlatExpression.cs(51,16): error CS1591: Missing XML comment for publicly visible type or member 'Idx.It' [/home/runner/work/FastExpressionCompiler/FastExpressionCompiler/src/FastExpressionCompiler/FastExpressionCompiler.csproj::TargetFramework=netstandard2.0]

Copilot AI requested a review from dadhi April 13, 2026 06:34
Owner

dadhi commented Apr 13, 2026

@copilot

Gaps / obstacles
Not API-compatible with System.Linq.Expressions without the adapter
Mutable struct: accidental by-value copy silently forks state
Info field boxes MethodBase/string — one allocation per call/new/parameter node

Let's step back and do a principled review of the Expression API differences and of how FastExpressionCompiler conforms its API to those differences: let's analyze the current diff between SE and FEC's LE, then to FE. How can we bridge the gap and polyfill into a (mostly) unified API without decreasing performance? Let's compare concrete method calls and parameter representations on some example to make it evident. Let's re-evaluate our current approach with polyfills, usings, etc., and evaluate how to do better.
Let's think outside the box and maybe explore orthogonal ways to eliminate the gaps by eliminating the problem context. Example idea: make FastExpressionCompiler a struct instead of a static class, to carry/pass the emit state in a struct instance instead of through method parameters, thereby centralizing the state and its differences in a single place with unified access methods if needed, and avoiding #if on each TryEmit method definition. Let's explore other drastic orthogonal changes. Let's also think of FlatExpression-like structures prevalent in games and ECS, or in a data-oriented/APL context: can we borrow their solutions for our cause? Say we have now trivially implemented structural comparison, compared to the recursive tree descent required in FEC. What else can we do trivially to #if the implementation for FE, and maybe avoid mixing it into the rest of FEC at all?

Owner

@dadhi dadhi left a comment

Address comments and build failures.

…ter.SingleToInt32Bits; add XML docs to all public members

Agent-Logs-Url: https://github.com/dadhi/FastExpressionCompiler/sessions/8d5fabaa-d5e2-458c-b771-309266696c44

Co-authored-by: dadhi <39516+dadhi@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 13, 2026

@copilot Add a minimal, meaningful, focused summary comment for each public element to fix the build on CI

Error: /home/runner/work/FastExpressionCompiler/FastExpressionCompiler/src/FastExpr...

Fixed in 116a593. Two CI blockers resolved:

  1. CS1591 — added <summary> XML docs to every public member (Idx.It, Idx.IsNil, Idx.Nil, Idx.Of, ExpressionNode fields, and all ExpressionTree fields/properties/methods).

  2. CS0117: BitConverter.SingleToInt32Bits/Int32BitsToSingle don't exist on netstandard2.0. Replaced with a [StructLayout(LayoutKind.Explicit)] union struct FloatIntBits that overlays a float and an int at offset 0, portable across all target frameworks (net472, netstandard2.0/2.1, net6+).

Full Release build now passes with 0 warnings and 0 errors across all targets.

@copilot

Gaps / obstacles
...

Here's the structured analysis and some concrete orthogonal directions.


SE vs LE vs FE — where the gap lives

| Aspect | SE (System.Linq.Expressions) | LE (LightExpression) | FE (FlatExpression) |
|---|---|---|---|
| Node identity | Object reference (ReferenceEquals) | Object reference | Integer index (Idx) |
| ParameterExpression sharing | Same object == same param | Same object == same param | Same Idx == same param |
| Factory API | Expression.Add(left, right) | Expression.Add(left, right) (identical) | tree.Add(left, right, typeof(int)): type is explicit |
| Return type inference | Built-in for numeric ops | Built-in | Caller must pass Type |
| Visitor pattern | ExpressionVisitor | ExpressionVisitor | Not implemented |
| #if surface | None | LIGHT_EXPRESSION on each TryEmit overload | None yet |

The biggest API friction point: SE and LE both infer result types on arithmetic ops, FE requires an explicit Type argument — callers can't just replace Expression.Add(a, b) with tree.Add(a, b).


Polyfill idea: extension helpers that mirror SE factory names

// In FlatExpression — a thin shim so existing code compiles unchanged
public Idx Add(Idx left, Idx right) => Add(left, right, NodeAt(left).Type);
public Idx Subtract(Idx left, Idx right) => Subtract(left, right, NodeAt(left).Type);

This narrows the gap for arithmetic at the cost of a NodeAt call per factory, which is O(1) but still forces a bounds check. Worth it for migration ergonomics.


Orthogonal idea 1: FastExpressionCompiler as a struct

Today the compiler passes all its mutable emit-state through method parameters with ref. A struct instance carrying that state would:

  • eliminate the need for #if LIGHT_EXPRESSION on each TryEmit signature (the struct type IS the dispatch)
  • allow FlatExpression-specific overloads to live in a FlatExpressionCompiler : ICompiler<ExpressionTree> struct instead of being mixed into the class
  • make the "which expression representation are we compiling?" a compile-time type parameter, not a runtime flag
struct FlatExpressionCompiler { 
    public void Emit(ref ExpressionTree tree, ILGenerator il, ref ClosureInfo ci) { ... }
}

Orthogonal idea 2: borrow ECS "archetype" separation

In ECS, entities of different archetypes (different component sets) live in separate SOA (Structure-of-Arrays) storages. Applied here: instead of one fat ExpressionNode for every possible node kind, split into typed arrays per node kind:

ConstantNodes[]    // only Type + inline value/index
ParameterNodes[]   // only Type + name
BinaryNodes[]      // only Type + left + right + method

Idx could encode both the kind and the position: upper 8 bits = kind tag, lower 24 bits = per-array index. This halves average node size, makes kind dispatch table-driven, and makes cache-line packing much better for homogeneous trees (e.g. all-arithmetic).

The tradeoff: cross-kind traversal (walking a tree) requires a tagged-union dispatch on every step, but FEC's emit loop already does that via switch (node.NodeType).
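The kind-plus-position packing could look roughly like this. It is a sketch only: `NodeKind`, `TaggedIdx`, and the choice of starting kinds at 1 (so that an all-zero value can still serve as nil) are assumptions, not part of the PR.

```csharp
using System;

// Hypothetical kind tags; starting at 1 keeps the all-zero value free for nil.
enum NodeKind : byte { Constant = 1, Parameter = 2, Binary = 3 }

static class TaggedIdx
{
    const int PosBits = 24;
    const uint PosMask = (1u << PosBits) - 1; // 0x00FFFFFF

    // Upper 8 bits hold the kind, lower 24 bits the position in that kind's array.
    public static uint Pack(NodeKind kind, int position)
    {
        if ((uint)position > PosMask)
            throw new ArgumentOutOfRangeException(nameof(position));
        return ((uint)kind << PosBits) | (uint)position;
    }

    public static NodeKind KindOf(uint idx) => (NodeKind)(idx >> PosBits);
    public static int PositionOf(uint idx) => (int)(idx & PosMask);

    public static void Main()
    {
        var idx = Pack(NodeKind.Binary, 5);
        Console.WriteLine(idx.ToString("X8")); // 03000005
        Console.WriteLine(KindOf(idx));        // Binary
        Console.WriteLine(PositionOf(idx));    // 5
    }
}
```

The 24-bit position caps each per-kind array at about 16.7 million entries, which seems comfortable for expression trees but is worth stating as a limit.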


What FE already trivialises that LE cannot

  • Structural equality — single memcmp-style loop, no visitor
  • Serialization — arrays of plain value-type structs; no object graph
  • Clone/fork — copy the Nodes array (only 16 × 32 = 512 bytes for a small tree)
  • Diff/patch — compare two Nodes[] pairwise; emit change log

These are cases where the "no objects" design pays off even before we touch the compiler proper. If FE can round-trip through ToSystemExpression() today, the full FEC emit path is already reachable without any #if; the #if-free compiler integration is a next step.
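As an illustration of the first bullet, a single-pass comparison might look like the sketch below. It assumes both trees were built in the same node order so that indices line up, and it omits the Info payload; `FlatNode`, the kind constants, and `BuildAdd` are hypothetical and not the PR's implementation.

```csharp
using System;

// Hypothetical flat node: the Info payload is left out to keep the sketch short.
struct FlatNode
{
    public Type Type;
    public int NodeType, NextIdx, ChildIdx, ExtraIdx;
}

static class FlatEquality
{
    const int ParameterKind = 1, AddKind = 2; // made-up tags, not ExpressionType values

    // Trees built in the same order are equal when their node arrays match
    // position by position; index fields compare as plain ints, so no recursion.
    public static bool StructurallyEqual(FlatNode[] a, FlatNode[] b)
    {
        if (a.Length != b.Length) return false;
        for (var i = 0; i < a.Length; i++)
        {
            ref var x = ref a[i];
            ref var y = ref b[i];
            if (x.NodeType != y.NodeType || x.Type != y.Type ||
                x.NextIdx != y.NextIdx || x.ChildIdx != y.ChildIdx || x.ExtraIdx != y.ExtraIdx)
                return false;
        }
        return true;
    }

    // Builds the three nodes of `x + y` in a fixed order: x, y, then the Add node.
    public static FlatNode[] BuildAdd(Type operandType) => new[]
    {
        new FlatNode { Type = operandType, NodeType = ParameterKind },
        new FlatNode { Type = operandType, NodeType = ParameterKind },
        new FlatNode { Type = operandType, NodeType = AddKind, ChildIdx = 1, ExtraIdx = 2 },
    };

    public static void Main()
    {
        Console.WriteLine(StructurallyEqual(BuildAdd(typeof(int)), BuildAdd(typeof(int))));  // True
        Console.WriteLine(StructurallyEqual(BuildAdd(typeof(int)), BuildAdd(typeof(long)))); // False
    }
}
```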

Contributor Author

Copilot AI commented Apr 13, 2026

Address comments and build failures.

Both CI build failures were fixed in 116a593:

  • CS0117 (BitConverter.SingleToInt32Bits missing on netstandard2.0) — replaced with a portable [StructLayout(LayoutKind.Explicit)] FloatIntBits union
  • CS1591 (missing XML docs) — added <summary> to every public member

The latest CI run (action_required) is GitHub's workflow approval gate for the Copilot PR, not a build failure — it needs a manual approval click from the repo owner to start the jobs. Local build is clean: 0 warnings, 0 errors across all targets (net472, netstandard2.0/2.1, net6/8/9); all 1677 tests pass.

Development

Successfully merging this pull request may close these issues.

Data-oriented representation of Expression partly on stack and optimized for constant collection

2 participants