constant fold string.concat#127655
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR experiments with enabling JIT constant-folding of string.Concat(string, string[, string[, string]]) into a single frozen string handle, aiming to produce direct “return embedded string” codegen for fully-constant concatenations.
Changes:
- Marks the string-only
String.Concatoverloads as[Intrinsic]and teaches the JIT to treat them as a special intrinsic candidate for folding. - Adds a new JIT-EE interface method (
tryAppendStrings) that concatenates frozen string handles into a new frozen string handle (best-effort). - Extends SuperPMI, NativeAOT (RyuJIT), and JIT interface thunking to record/replay the new API and enable AOT-side folding.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs | Marks string-only String.Concat overloads as [Intrinsic]. |
| src/coreclr/vm/jitinterface.cpp | Implements CEEInfo::tryAppendStrings to allocate/copy a concatenated frozen string. |
| src/coreclr/tools/superpmi/superpmi/icorjitinfo.cpp | Adds SuperPMI call logging + replay hook for tryAppendStrings. |
| src/coreclr/tools/superpmi/superpmi-shim-simple/icorjitinfo_generated.cpp | Forwards tryAppendStrings in the simple shim. |
| src/coreclr/tools/superpmi/superpmi-shim-counter/icorjitinfo_generated.cpp | Adds call counting instrumentation for tryAppendStrings. |
| src/coreclr/tools/superpmi/superpmi-shim-collector/icorjitinfo.cpp | Records/replays tryAppendStrings in the collector shim. |
| src/coreclr/tools/superpmi/superpmi-shared/methodcontext.h | Adds MethodContext packet + APIs for tryAppendStrings. |
| src/coreclr/tools/superpmi/superpmi-shared/methodcontext.cpp | Implements record/replay serialization for tryAppendStrings. |
| src/coreclr/tools/superpmi/superpmi-shared/lwmlist.h | Adds LWM map declaration for TryAppendStrings. |
| src/coreclr/tools/Common/JitInterface/ThunkGenerator/ThunkInput.txt | Updates thunk generator inputs for CORINFO_OBJECT_HANDLE* + new API. |
| src/coreclr/tools/Common/JitInterface/CorInfoImpl_generated.cs | Adds managed callback plumbing for tryAppendStrings. |
| src/coreclr/tools/aot/jitinterface/jitinterface_generated.h | Adds wrapper thunk for tryAppendStrings in AOT jitinterface. |
| src/coreclr/tools/aot/ILCompiler.RyuJit/JitInterface/CorInfoImpl.RyuJit.cs | Implements AOT-side best-effort folding by concatenating FrozenStringNode parts. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs | Adds stub tryAppendStrings implementation (currently returns null). |
| src/coreclr/jit/namedintrinsiclist.h | Adds NI_System_String_Concat named intrinsic id. |
| src/coreclr/jit/inline.def | Adds new inline fatal observation IS_FOLDABLE_INTRINSIC. |
| src/coreclr/jit/importercalls.cpp | Marks string-only String.Concat as special intrinsic; suppresses inlining when all args are constant strings. |
| src/coreclr/jit/ICorJitInfo_wrapper_generated.hpp | Adds wrapper entry for tryAppendStrings. |
| src/coreclr/jit/ICorJitInfo_names_generated.h | Adds API name entry for tryAppendStrings. |
| src/coreclr/jit/gentree.cpp | Adds morph-time folding of constant String.Concat into a single frozen string handle. |
| src/coreclr/jit/compiler.h | Declares VN-based folding helper for string concat. |
| src/coreclr/jit/assertionprop.cpp | Adds VN-based folding of String.Concat when args’ VNs resolve to constant object handles. |
| src/coreclr/inc/jiteeversionguid.h | Bumps JIT/EE interface version GUID. |
| src/coreclr/inc/icorjitinfoimpl_generated.h | Adds tryAppendStrings to generated EE-side ICorJitInfo impl header. |
| src/coreclr/inc/corinfo.h | Adds ICorStaticInfo::tryAppendStrings API contract documentation. |
| } | ||
| if (allCns) | ||
| { | ||
| inlineResult->NoteFatal(InlineObservation::CALLEE_IS_FOLDABLE_INTRINSIC); |
There was a problem hiding this comment.
inlineResult->NoteFatal(InlineObservation::CALLEE_IS_FOLDABLE_INTRINSIC) uses a CALLEE-targeted observation. In the inlining policy, CALLEE-targeted fatals become a NEVER decision, which can cause System.String.Concat to be treated as non-inlineable for all callsites after the first time this triggers. That contradicts the intent described above (allow inlining when any arg is non-constant). Consider making this a CALLSITE-targeted observation (or otherwise recording a callsite-only failure) so only this specific callsite is rejected.
| inlineResult->NoteFatal(InlineObservation::CALLEE_IS_FOLDABLE_INTRINSIC); | |
| inlineResult->NoteFatal(InlineObservation::CALLSITE_IS_FOLDABLE_INTRINSIC); |
| INLINE_OBSERVATION(IS_NOINLINE, bool, "noinline per IL/cached result", FATAL, CALLEE) | ||
| INLINE_OBSERVATION(IS_SYNCHRONIZED, bool, "is synchronized", FATAL, CALLEE) | ||
| INLINE_OBSERVATION(IS_VM_NOINLINE, bool, "noinline per VM", FATAL, CALLEE) | ||
| INLINE_OBSERVATION(IS_FOLDABLE_INTRINSIC, bool, "foldable intrinsic - keep as call", FATAL, CALLEE) |
There was a problem hiding this comment.
IS_FOLDABLE_INTRINSIC is declared with target CALLEE, which means any NoteFatal using it will set the inlining decision to NEVER (callee-wide) rather than failing just the current callsite. Given the goal is to avoid inlining only when the arguments are all constant strings, this should likely be a CALLSITE observation (and the use site updated accordingly).
| INLINE_OBSERVATION(IS_FOLDABLE_INTRINSIC, bool, "foldable intrinsic - keep as call", FATAL, CALLEE) | |
| INLINE_OBSERVATION(IS_FOLDABLE_INTRINSIC, bool, "foldable intrinsic - keep as call", FATAL, CALLSITE) |
| // Try to fold String.Concat(<cns>, <cns> [, <cns> [, <cns>]]) into a | ||
| // single frozen-string handle. We only see this case when the | ||
| // importer marked the call as a special intrinsic, which requires | ||
| // every argument to be a GT_CNS_STR. |
There was a problem hiding this comment.
The comment says the importer marking the call as a special intrinsic "requires every argument to be a GT_CNS_STR", but impIntrinsic marks the string-only overloads as special regardless of whether the args are constant; the folding logic below is what requires GT_CNS_STR. Consider adjusting the comment to avoid implying stronger invariants than actually enforced.
| bool isFrozen = false; | ||
| resultRef = AllocateString((DWORD)totalLength, /*preferFrozenHeap*/ true, &isFrozen); | ||
| if (isFrozen && (resultRef != NULL)) | ||
| { | ||
| WCHAR* dst = resultRef->GetBuffer(); | ||
| for (int i = 0; i < count; i++) | ||
| { | ||
| STRINGREF s = (STRINGREF)sourceRefs[i]; | ||
| DWORD len = s->GetStringLength(); | ||
| if (len > 0) | ||
| { | ||
| memcpyNoGCRefs(dst, s->GetBuffer(), len * sizeof(WCHAR)); | ||
| dst += len; | ||
| } | ||
| } | ||
| _ASSERTE(resultRef->GetBuffer()[totalLength] == W('\0')); | ||
|
|
||
| result = getJitHandleForObject((OBJECTREF)resultRef, /*knownFrozen*/ true); | ||
| } | ||
| // If the result didn't end up on the frozen heap, we drop it on | ||
| // the floor: we cannot keep a non-frozen handle alive across JIT | ||
| // boundaries and it's not safe to bake its address into codegen. |
There was a problem hiding this comment.
tryAppendStrings allocates a brand-new frozen string via AllocateString(..., preferFrozenHeap: true) for every fold. Because this bypasses the string literal map / interning, the same concatenated constant appearing in many methods can produce many duplicate frozen strings and permanently grow the frozen heap. Consider routing this through the existing string literal infrastructure (e.g., build an EEStringData and use the global string literal map to deduplicate) or otherwise adding a dedup/cache to avoid unbounded frozen-heap growth from repeated folds.
| bool isFrozen = false; | |
| resultRef = AllocateString((DWORD)totalLength, /*preferFrozenHeap*/ true, &isFrozen); | |
| if (isFrozen && (resultRef != NULL)) | |
| { | |
| WCHAR* dst = resultRef->GetBuffer(); | |
| for (int i = 0; i < count; i++) | |
| { | |
| STRINGREF s = (STRINGREF)sourceRefs[i]; | |
| DWORD len = s->GetStringLength(); | |
| if (len > 0) | |
| { | |
| memcpyNoGCRefs(dst, s->GetBuffer(), len * sizeof(WCHAR)); | |
| dst += len; | |
| } | |
| } | |
| _ASSERTE(resultRef->GetBuffer()[totalLength] == W('\0')); | |
| result = getJitHandleForObject((OBJECTREF)resultRef, /*knownFrozen*/ true); | |
| } | |
| // If the result didn't end up on the frozen heap, we drop it on | |
| // the floor: we cannot keep a non-frozen handle alive across JIT | |
| // boundaries and it's not safe to bake its address into codegen. | |
| // Build the concatenated contents in temporary storage and route | |
| // the lookup through the global string literal map so identical | |
| // folded constants are deduplicated instead of allocating a new | |
| // frozen string for every fold. | |
| WCHAR *buffer = (WCHAR*)_alloca((totalLength + 1) * sizeof(WCHAR)); | |
| WCHAR *dst = buffer; | |
| for (int i = 0; i < count; i++) | |
| { | |
| STRINGREF s = (STRINGREF)sourceRefs[i]; | |
| DWORD len = s->GetStringLength(); | |
| if (len > 0) | |
| { | |
| memcpyNoGCRefs(dst, s->GetBuffer(), len * sizeof(WCHAR)); | |
| dst += len; | |
| } | |
| } | |
| buffer[totalLength] = W('\0'); | |
| EEStringData stringData; | |
| stringData.Init(buffer, (DWORD)totalLength); | |
| STRINGREF literalRef = SystemDomain::GetGlobalStringLiteralMap()->GetInternedString( | |
| &stringData, | |
| /*preferFrozenObjHeap*/ true); | |
| if (literalRef != NULL) | |
| { | |
| resultRef = literalRef; | |
| result = getJitHandleForObject((OBJECTREF)resultRef, /*knownFrozen*/ true); | |
| } | |
| // If the runtime cannot provide a deduplicated frozen literal, we | |
| // drop the fold on the floor: we cannot keep a non-frozen handle | |
| // alive across JIT boundaries and it's not safe to bake its | |
| // address into codegen. |
| { | ||
| if (HandleToObject(strings[i]) is not FrozenStringNode frozenStr) | ||
| { | ||
| return null; | ||
| } | ||
| parts[i] = frozenStr.Data; | ||
| totalLength += parts[i].Length; | ||
| if (totalLength > 0x3FFFFFDF /* CORINFO_String_MaxLength */) | ||
| { | ||
| return null; |
There was a problem hiding this comment.
The 0x3FFFFFDF /* CORINFO_String_MaxLength */ literal is a hard-coded copy of the runtime max string length. This risks drifting if the runtime constant changes. Consider introducing a named constant in this file (or reusing an existing shared constant if available) and referencing that instead of embedding the magic number inline.
* fgbasic.cpp: skip the CALLEE_INTRINSIC inline observation for NI_System_String_Concat. Marking it as [Intrinsic] alone shouldn't bias the inliner toward inlining its caller — when the arguments are not all constants, the call survives as a non-trivial method call and the 'most likely lowered as single instructions' assumption doesn't hold. * vm/jitinterface.cpp: in CEEInfo::tryAppendStrings, bail out before the JIT/EE transition if the method being compiled lives in a collectible (unloadable) context. Frozen-heap objects live forever while collectible code can be unloaded, so any allocation we do for such a method would be dead weight on the frozen heap. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
will see if I can implement it differently on top of #127659 |
Just a quick experiment.
new codegen for Foo: