Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: ILC & R2R UTF-8 name mangling. #3

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

PaulusParssinen
Copy link
Owner

@PaulusParssinen PaulusParssinen commented May 17, 2024

wip: Previewing diffs

todo: desc
todo: r2r correctness

Background

dotnet/corert/issues/2178
dotnet/corert/pull/2200

todo: write motivation

This PR

  • Introduce ref struct Utf8StringBuilder
    • To mirror already familiar internal ValueStringBuilder but UTF-8.
    • Pooling now happens in the Utf8StringBuilder using the ArrayPool<byte>.Shared
      • Previously the builder instance was pooled (and consequently the underlying per-thread buffer, which did very well). There's pros and cons on both approaches as always but the control over the initialBuffer allows us to avoid allocating or renting altogether
      • Previously only the extension method GetName(this ISymbolNode ..) had pooling logic, but now we can fallback to pool anywhere where we build mangled symbol name strings.
  • Avoid intermediate string allocations
    • For the existing a GetMangled[Type/Method/Field/String]Name methods,
      introduce AppendMangled[Type/Method/Field/String]Name overload which can be used to append directly to the existing builder (very common operation)
      • This removes a lot of "back-and-forth transcoding between UTF-8 <-> UTF-16 due to implicit string -> Utf8String cast and its overhead.

Problems and questions

  • This PR was inteded to keep as unintrusive as possible for easier review but that turned out to be very hard due to viral nature of symbol names. In my opinion, it was more straightforward to go all in with the Utf8String so this PR flows it through all the way to the ObjectWriter logic. Choosing a cut off point for the Utf8String elsewhere would render null the transcoding/allocation savings.
  • pre-existing inconsistent name sanitization. Fix now? Still relevant as this was for primarily for the Cpp backend?

Potential future improvements

  • If we get custom UTF-8 string interpolated handlers, they'll allow tidying the mangling logic.
  • IPrefixMangled[Method/Type] could be optimized to append mangled names

Results

dotnet new webapiaot

Stage 2 Goldilocks (TodosApi)

  • -100MB of GC heap allocations removed?
    250601_todosapi

Compile ILC with ILC

  • todo:

Bench with using the NAOT compiled ILC

  • todo: how to measure NAOT GC perf

@PaulusParssinen PaulusParssinen changed the title wip: ILC & R2R name mangling. wip: ILC & R2R UTF-8 name mangling. May 17, 2024
@PaulusParssinen PaulusParssinen force-pushed the ilc-name-mangling-alloc branch 2 times, most recently from 933391b to b789b7c Compare May 29, 2024 23:05
@PaulusParssinen PaulusParssinen marked this pull request as draft May 29, 2024 23:05
* And pass it around as ref. This required dropping the fluent pattern.
accidentally fixed bugs maybe, a lot of todos left
* The last diff is inconsistent usage of AppendMangledTypeName in InterfaceDispatchMap. Other nodes append unmangled < & > characters but in this type it was sanitized, causing this diff.
* more consistent of u8str initialization

* simplify StringTableBuilder.CreateIndex
* Also reduce some allocations in unwind utf-8 paths
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant