Switch to non-generated materialization #38310

@roji

EF's materialization currently happens via runtime code generation. Translation (and post-processing) yields a ShapedQueryExpression whose shaper represents the thing that is projected out from the database query. This shaper is then fed into ShapedQueryCompilingExpressionVisitor, whose job is to generate a LINQ expression tree representing the code for reading the query results back and materializing .NET objects out of them.
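To make the current approach concrete, here is a minimal sketch of runtime code generation with LINQ expression trees. It is not EF's actual shaper: the `Blog` type, the `object[]` row standing in for a data reader, and the `GeneratedMaterializerSketch` class are all hypothetical names used only for illustration.

```csharp
#nullable enable
using System;
using System.Linq.Expressions;

public sealed class Blog
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class GeneratedMaterializerSketch
{
    // Builds and compiles the equivalent of:
    //   row => new Blog { Id = (int)row[0], Name = (string)row[1] }
    public static Func<object[], Blog> Build()
    {
        var row = Expression.Parameter(typeof(object[]), "row");

        // Reads row[ordinal] and converts (unboxes or casts) it to the target type.
        Expression Read(int ordinal, Type type) =>
            Expression.Convert(
                Expression.ArrayIndex(row, Expression.Constant(ordinal)),
                type);

        var body = Expression.MemberInit(
            Expression.New(typeof(Blog)),
            Expression.Bind(typeof(Blog).GetProperty(nameof(Blog.Id))!, Read(0, typeof(int))),
            Expression.Bind(typeof(Blog).GetProperty(nameof(Blog.Name))!, Read(1, typeof(string))));

        // Compile() is the runtime code generation step that NativeAOT cannot support.
        return Expression.Lambda<Func<object[], Blog>>(body, row).Compile();
    }
}
```

The resulting delegate is tailored to exactly one shape, at the cost of building and compiling the tree the first time it is needed.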

This issue tracks redoing that, moving away from runtime code generation and using simple, general logic written as static .NET code for materialization, calling property setters to inject each property. Motivations to do this:

  • NativeAOT: runtime code generation is simply not supported.
  • Simplification: the current code generator is among the most complicated and debt-ridden parts of EF Core; some of this complexity is inherent to code generation (which is always hard), and some is a result of us designing things in a suboptimal way. For example, we have a provider-agnostic code-generating visitor which is then partially overridden by a provider-specific visitor, which identifies patterns in the tree produced by the former, replacing them with e.g. DbDataReader read methods in relational.
  • Materialization performance: the traditional explanation for the use of runtime code generation has been to generate a 100% tailored materializer for each query, eliminating e.g. unneeded branches for materialization features that aren't needed by that specific query. Ironically enough, benchmarking of a pretty advanced (but incomplete) rewrite showed no performance benefit from code generation in typical scenarios, and a substantial performance improvement from the rewrite in more complex scenarios involving includes. It's likely that our current implementation is simply very under-optimized, but the simple non-generated code path also tends to be much friendlier to various JIT optimizations (inlining etc.).
    • Note that there are some specific scenarios whose performance does regress (e.g. anonymous type projection), as reflection is needed where previously a custom compiled delegate was used to e.g. instantiate the type. Such targeted runtime code generation can still be used in JIT mode, so that regressions (hopefully) only occur when actually running under NativeAOT.
  • Compilation performance: compiling the expression tree is expensive; switching to non-generated materialization should reduce program start-up time.
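As a rough illustration of the non-generated alternative, the sketch below materializes entities with one general, statically compiled loop that calls property setters. It is not the code in the branch: `Post`, `PropertyMap<TEntity>`, and `StaticMaterializerSketch` are hypothetical, and a real implementation would take its property accessors from the EF model rather than from a hand-written array.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Data.Common;

public sealed class Post
{
    public int Id { get; set; }
    public string? Title { get; set; }
}

// Hypothetical descriptor: which column to read and how to set it on the entity.
public sealed class PropertyMap<TEntity>
{
    public PropertyMap(int ordinal, Action<TEntity, DbDataReader, int> readAndSet)
    {
        Ordinal = ordinal;
        ReadAndSet = readAndSet;
    }

    public int Ordinal { get; }
    public Action<TEntity, DbDataReader, int> ReadAndSet { get; }
}

public static class StaticMaterializerSketch
{
    // Ordinary lambdas compiled ahead of time; nothing is generated at runtime.
    public static readonly PropertyMap<Post>[] PostMaps =
    {
        new PropertyMap<Post>(0, (p, r, i) => p.Id = r.GetInt32(i)),
        new PropertyMap<Post>(1, (p, r, i) => p.Title = r.IsDBNull(i) ? null : r.GetString(i)),
    };

    // One general loop shared by every entity type.
    public static List<TEntity> Materialize<TEntity>(
        DbDataReader reader, IReadOnlyList<PropertyMap<TEntity>> maps)
        where TEntity : new()
    {
        var results = new List<TEntity>();
        while (reader.Read())
        {
            var entity = new TEntity();
            for (var i = 0; i < maps.Count; i++)
            {
                var map = maps[i];
                map.ReadAndSet(entity, reader, map.Ordinal);
            }
            results.Add(entity);
        }
        return results;
    }
}
```

Because there is no expression tree to build or compile, the first execution of a query pays no compilation cost, and the same code path works under NativeAOT.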

Partial work has been done on this in the branch roji/non-generated-materializer-poc:

  • Work focused on getting the SQLite query functional tests to be green; almost all tests indeed pass.
  • The work was AI-driven, and while various course corrections and cleanup were done, the focus was to first get everything done and only then focus on deeply reviewing/cleaning up/optimizing the implementation. This still needs to happen, and there is lots of room for improvement.
  • The new materializer is implemented as a relational-only internal implementation detail - the existing generated materialization infra remains untouched, and it should be possible to switch between the two via a simple config flag. Once everything is done in relational, we can also see about extracting common infrastructure up to core, to avoid having to reimplement everything for every provider.
  • The only change outside of relational materialization was done to flow the query's actual projected type all the way from start to end. Our current query pipeline loses the query's returned element type extremely early, calling Execute<IEnumerable<TElement>>, so that later steps are generic over a T that represents the enumerable, and not its element (the same problem exists with async execution and Task<T>). In the branch, enumerable execution flows TElement all the way to materialization generation.
  • Having TElement flow all the way allows us to e.g. efficiently materialize projected value types without boxing (e.g. Select(b => b.SomeInt)), but it's impossible to implement fully non-boxing materialization in some other cases where the value type is in an anonymous type (e.g. Select(b => new { b.SomeInt })). We can also do non-boxing materialization for value types on modeled structural types, as the model contains property accessors for these; this includes value converters as well (already implemented in the branch).
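The value of flowing TElement through the pipeline can be sketched with a scalar projection such as Select(b => b.SomeInt). The code below is an assumption-laden illustration, not the branch's implementation: `ScalarReadSketch` and its methods are hypothetical. `DbDataReader.GetFieldValue<T>` is a real ADO.NET API, but whether it actually avoids boxing depends on the provider, since the base implementation falls back to `GetValue`.

```csharp
using System.Collections.Generic;
using System.Data.Common;

public static class ScalarReadSketch
{
    // The element type is unknown here, so every value is returned as object
    // and value types such as int are boxed.
    public static IEnumerable<object> ReadBoxed(DbDataReader reader)
    {
        while (reader.Read())
        {
            yield return reader.GetValue(0);
        }
    }

    // With TElement available, the read can be typed end to end, which lets a
    // provider return value types without boxing.
    public static IEnumerable<TElement> ReadTyped<TElement>(DbDataReader reader)
    {
        while (reader.Read())
        {
            yield return reader.GetFieldValue<TElement>(0);
        }
    }
}
```

The anonymous-type case differs because the projected value has to be passed into a constructor that static code can only reach through reflection, which takes its arguments as object.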

/cc @AndriySvyryd
