RyuJIT bounds checks not eliminated for readonly arrays #11797

benaadams · 2019-01-11T03:57:47Z

Range analysis doesn't take into account readonly arrays and you need to make a function local reference to eliminate the range check.

https://github.com/dotnet/coreclr/issues/5371 regressed due to dotnet/coreclr#15756

It shouldn't need to be conservative for a readonly array?

/cc @briansull @AndyAyersMS @mikedn

category:cq
theme:bounds-checks
skill-level:expert
cost:medium
impact:medium

AndyAyersMS · 2019-01-25T21:24:49Z

Might be relatively low risk and modest benefit. Keeping in 3.0 for now until we can better assess potential benefit.

AndyAyersMS · 2019-02-15T21:54:41Z

Could be as simple as a well placed GTF_IND_INVARIANT, for ref type statics from initialized classes.

benaadams · 2019-02-15T22:19:49Z

Example knock on code change for this is in ConcurrentQueue<T> where _slots is a readonly Slot[] but needed to be taken to a local array to skip the checks dotnet/coreclr#21192 /cc @stephentoub

In particular this should do interesting things for ref typed readonly initialized statics, allowing reads to be hoisted out of loops, for instance. Addresses part of #21951. Note readonly instance fields cannot be given special treatment.

AndyAyersMS · 2019-02-16T05:45:16Z

Preview of jit changes here: master...AndyAyersMS:InitializedReadonlyStaticInvariance.

Unfortunately this won't help ConcurrentQueue<T> or the examples in #5990. We can't give special treatment to for readonly instance members, because we don't have any way of knowing when an object is fully constructed, and even if we did, any method that read the field would have to handle both the fully constructed and not fully constructed cases. Furthermore, we haven't blocked the various avenues for modifying readonly instance fields after construction is done (eg reflection).

But we can give special treatment to readonly static fields: we can tell if the associated .cctor has finished executing, and we have now blocked the ability to update the field via reflection.

There seem to be a fair number of these static cases about, and initial diffs look promising (Note this used --cctors to try and increase chances that statics were initialized when jittting).

Total bytes of diff: -22601 (-0.06% of base)
    diff is an improvement.
Top file regressions by size (bytes):
         374 : Microsoft.CSharp.dasm (0.12% of base)
          48 : System.Runtime.Serialization.Formatters.dasm (0.05% of base)
          22 : NuGet.Configuration.dasm (0.04% of base)
          18 : Microsoft.DotNet.ProjectModel.dasm (0.01% of base)
          18 : System.Net.WebSockets.dasm (0.05% of base)
Top file improvements by size (bytes):
      -12225 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.43% of base)
       -1871 : System.Private.Xml.dasm (-0.05% of base)
       -1244 : System.Net.Sockets.dasm (-0.65% of base)
        -975 : System.Net.Security.dasm (-0.58% of base)
        -675 : System.Data.Common.dasm (-0.05% of base)
63 total files with size differences (57 improved, 6 regressed), 66 unchanged.
Top method regressions by size (bytes):
         403 (13.12% of base) : Microsoft.CSharp.dasm - ExpressionBinder:GetStandardAndLiftedBinopSignatures(ref,ref):bool:this
         112 ( 2.67% of base) : System.Data.Common.dasm - DataTable:SerializeTableSchema(ref,struct,bool):this
         105 (20.00% of base) : Microsoft.CodeAnalysis.dasm - SmallDictionary`2:RightComplex(ref):ref (5 base, 6 diff methods)
          96 ( 4.25% of base) : Microsoft.DotNet.ProjectModel.dasm - PatternBuilder:Build(ref):ref:this
          70 ( 9.15% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VisualBasicWarningStateMap:CreateWarningStateEntries(struct):ref
Top method improvements by size (bytes):
       -5982 (-5.44% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:EnumerateTemplates(ref,ref):this
       -2190 (-4.84% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(ref,ref):this
       -1494 (-4.53% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrPrivateTraceEventParser:EnumerateTemplates(ref,ref):this
       -1167 (-4.08% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrTraceEventParser:EnumerateTemplates(ref,ref):this
        -640 (-3.48% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - AspNetTraceEventParser:EnumerateTemplates(ref,ref):this
Top method regressions by size (percentage):
         105 (20.00% of base) : Microsoft.CodeAnalysis.dasm - SmallDictionary`2:RightComplex(ref):ref (5 base, 6 diff methods)
         403 (13.12% of base) : Microsoft.CSharp.dasm - ExpressionBinder:GetStandardAndLiftedBinopSignatures(ref,ref):bool:this
          70 ( 9.15% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VisualBasicWarningStateMap:CreateWarningStateEntries(struct):ref
          28 ( 7.45% of base) : Microsoft.CodeAnalysis.CSharp.dasm - StackOptimizerPass1:VisitArrayInitialization(ref):ref:this
           4 ( 7.14% of base) : System.Data.Common.dasm - Operators:Priority(int):int
Top method improvements by size (percentage):
        -113 (-35.53% of base) : System.Net.HttpListener.dasm - CookieTokenizer:TokenFromName(bool):int:this
        -113 (-35.53% of base) : System.Net.Primitives.dasm - CookieTokenizer:TokenFromName(bool):int:this
         -56 (-25.57% of base) : System.Data.Common.dasm - ColumnTypeConverter:ConvertFrom(ref,ref,ref):ref:this
         -27 (-22.88% of base) : System.Data.Common.dasm - DataStorage:IsSqlType(ref):bool
         -17 (-22.67% of base) : System.Net.HttpListener.dasm - HttpListenerResponse:CanSendResponseBody(int):bool
850 total methods with size differences (757 improved, 93 regressed), 192512 unchanged.

Method with the biggest percent diff shows repeated access to a readonly static array which can now be optimized as if the array ref was hoisted to a local.

https://github.com/dotnet/corefx/blob/e0ba7aa8026280ee3571179cc06431baf1dfaaac/src/Common/src/System/Net/CookieParser.cs#L472-L488

Is it my imagination, or are there two identical copies of the CookieTokenizer?

AndyAyersMS · 2019-02-22T17:52:04Z

@benaadams do you think this is still interesting, even if it can't catch the cases you'd hoped?

benaadams · 2019-02-22T18:42:05Z

Yes definitely; I actually thought readonly static arrays did work this way

AndyAyersMS · 2019-02-22T18:56:09Z

Ok. Let me look into some of the regressions.

Running the above over the jitutils\fx set augmented with ASP.Net assemblies shows some hits here and there in ASP.Net. Need to add some sort of filter option to jit-analyze so I can summarize these better.

benaadams · 2019-02-22T18:58:13Z

Two examples from Kestrel would be

private static readonly Tuple<ulong, ulong, HttpMethod, int>[] _knownMethods =
    new Tuple<ulong, ulong, HttpMethod, int>[17];

private static readonly string[] _methodNames = new string[9];

Though there are probably better examples; not sure they are used in a way that would skip bounds checking

AndyAyersMS · 2019-02-22T20:07:16Z

Filtered view ...

Summary: (using filter '*AspNet*')
(Lower is better)

Total bytes of diff: -1212 (-0.04% of base)
    diff is an improvement.

Top file regressions by size (bytes):
          18 : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm (0.34% of base)
          12 : Microsoft.AspNetCore.Server.IIS.dasm (0.01% of base)
           5 : Microsoft.AspNetCore.Http.dasm (0.01% of base)

Top file improvements by size (bytes):
        -405 : Microsoft.AspNetCore.Diagnostics.dasm (-0.48% of base)
        -171 : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm (-0.05% of base)
        -128 : Microsoft.AspNetCore.Mvc.Razor.dasm (-0.21% of base)
        -111 : Microsoft.AspNetCore.Routing.dasm (-0.05% of base)
         -48 : Microsoft.AspNetCore.Identity.dasm (-0.09% of base)
         -48 : Microsoft.AspNetCore.Mvc.TagHelpers.dasm (-0.06% of base)
         -40 : Microsoft.AspNetCore.Cors.dasm (-0.24% of base)
         -32 : Microsoft.AspNetCore.Server.Kestrel.Core.dasm (-0.01% of base)
         -27 : Microsoft.AspNetCore.StaticFiles.dasm (-0.07% of base)
         -26 : Microsoft.AspNetCore.HttpOverrides.dasm (-0.20% of base)

30 total files with size differences (27 improved, 3 regressed), 44 unchanged.

Top method regressions by size (bytes):
          49 ( 3.65% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - ChunkingCookieManager:AppendResponseCookie(ref,ref,ref,ref):this
          17 ( 3.21% of base) : Microsoft.AspNetCore.Http.dasm - DefaultHttpResponse:Redirect(ref,bool):this
          17 ( 0.84% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - TagBuilder:MergeAttributes(ref,bool):this (5 methods)
          13 ( 1.59% of base) : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm - Win7Pbkdf2Provider:DeriveKey(ref,ref,int,int,int):ref:this
          12 ( 1.36% of base) : Microsoft.AspNetCore.Server.IIS.dasm - IISHttpContext:SetResponseHeaders():this
          10 ( 2.03% of base) : Microsoft.AspNetCore.Server.Kestrel.Core.dasm - Http2FrameWriter:Write100ContinueAsync(int):struct:this
           5 ( 0.66% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - CookieAuthenticationHandler:CloneTicket(ref,ref):ref:this
           5 ( 0.96% of base) : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm - Win8Pbkdf2Provider:PasswordToPbkdfKeyHandle(ref,ref,int):ref
           3 ( 0.62% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - NameAndIdProvider:CreateSanitizedId(ref,ref,ref):ref
           3 ( 0.68% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - NameAndIdProvider:GetFullHtmlFieldName(ref,ref):ref

Top method improvements by size (bytes):
        -266 (-0.80% of base) : Microsoft.AspNetCore.Diagnostics.dasm - <ExecuteAsync>d__0:MoveNext():this (3 methods)
        -139 (-1.87% of base) : Microsoft.AspNetCore.Diagnostics.dasm - <ExecuteAsync>d__1:MoveNext():this
         -48 (-6.00% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - Compiler`2:CompileFromSimpleMemberAccess(ref,ref):ref (5 methods)
         -48 (-3.81% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - Compiler`2:CompileForChainedMemberAccess(ref,ref):ref (5 methods)
         -43 (-5.99% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - ViewLocationCacheKey:GetHashCode():int:this
         -35 (-9.19% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - RazorViewEngineOptionsSetup:Configure(ref):this
         -34 (-0.94% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - <HandleSignInAsync>d__25:MoveNext():this
         -32 (-1.56% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - Compiler`2:CompileCapturedConstant(ref,ref):ref (5 methods)
         -27 (-0.94% of base) : Microsoft.AspNetCore.StaticFiles.dasm - HtmlDirectoryFormatter:GenerateContentAsync(ref,ref):ref:this
         -25 (-8.90% of base) : Microsoft.AspNetCore.Cors.dasm - CorsPolicy:.ctor():this

Top method regressions by size (percentage):
          49 ( 3.65% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - ChunkingCookieManager:AppendResponseCookie(ref,ref,ref,ref):this
          17 ( 3.21% of base) : Microsoft.AspNetCore.Http.dasm - DefaultHttpResponse:Redirect(ref,bool):this
          10 ( 2.03% of base) : Microsoft.AspNetCore.Server.Kestrel.Core.dasm - Http2FrameWriter:Write100ContinueAsync(int):struct:this
          13 ( 1.59% of base) : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm - Win7Pbkdf2Provider:DeriveKey(ref,ref,int,int,int):ref:this
          12 ( 1.36% of base) : Microsoft.AspNetCore.Server.IIS.dasm - IISHttpContext:SetResponseHeaders():this
           5 ( 0.96% of base) : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm - Win8Pbkdf2Provider:PasswordToPbkdfKeyHandle(ref,ref,int):ref
          17 ( 0.84% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - TagBuilder:MergeAttributes(ref,bool):this (5 methods)
           3 ( 0.68% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - NameAndIdProvider:GetFullHtmlFieldName(ref,ref):ref
           5 ( 0.66% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - CookieAuthenticationHandler:CloneTicket(ref,ref):ref:this
           3 ( 0.62% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - NameAndIdProvider:CreateSanitizedId(ref,ref,ref):ref

Top method improvements by size (percentage):
         -17 (-22.67% of base) : Microsoft.AspNetCore.Server.HttpSys.dasm - Response:CanSendResponseBody(int):bool
         -18 (-21.95% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - UrlResolutionTagHelper:IsCharWhitespace(ushort):bool
         -17 (-17.71% of base) : Microsoft.AspNetCore.HttpOverrides.dasm - ForwardedHeadersMiddleware:TryValidateScheme(ref):bool:this
         -35 (-9.19% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - RazorViewEngineOptionsSetup:Configure(ref):this
         -25 (-8.90% of base) : Microsoft.AspNetCore.Cors.dasm - CorsPolicy:.ctor():this
         -15 (-8.67% of base) : Microsoft.AspNetCore.Cors.dasm - CorsResult:.ctor():this
         -25 (-8.42% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - RazorViewEngineOptions:.ctor():this
          -6 (-7.79% of base) : Microsoft.AspNetCore.Identity.dasm - <>c__1`2:<AddIdentity>b__1_0(ref):this
          -5 (-6.94% of base) : Microsoft.AspNetCore.Http.Connections.dasm - TimerAwaitable:OnCompleted(ref):this
          -5 (-6.94% of base) : Microsoft.AspNetCore.Server.Kestrel.Core.dasm - StreamCloseAwaitable:OnCompleted(ref):this

98 total methods with size differences (88 improved, 10 regressed), 19129 unchanged.

AndyAyersMS · 2019-02-22T20:29:52Z

Note readonly for ref classes is just top-level immutability.

We don't yet drill into readonly static objects where top level immutability implies existence of immutable sub-parts -- if we did this we could surface new constants (say array length or string length or possibly even string contents).

We don't know what to do with accesses to readonly static structs just yet. My prototype changes have introduced a flag we could potentially use (GTF_FLD_INVARIANT) but I have yet to plumb this through. Not sure if we have sufficient safeguards in the system to prevent fields of a readonly static struct from being modified.

benaadams · 2019-02-22T21:10:42Z

Summary: (using filter '*AspNet*')

Very nice :)

AndyAyersMS · 2019-02-22T22:49:21Z

In GetStandardAndLiftedBinopSignatures the JIT introduces a CSE for the load of s_binopSignatures (readyonly static array). But this method already has a lot of register pressure so keeping that value in a register causes other spills.

;; before
;  V05 loc2         [V05,T01] ( 21, 46   )     ref  ->  r13         class-hnd

;; after
;  V05 loc2         [V05,T01] ( 21, 46   )     ref  ->  [rsp+0xF0]   class-hnd
;  V86 cse1         [V86,T58] (  3,  6   )     ref  ->  r13         "ValNumCSE"

In the after version, r13 is only referred to in the preheader and right at the top of the loop. The preheader reference is just for a bounds check. If not spilled r13 must stay live across the whole body, and the loop body is pretty large. So the lack of r13 causes a lot of turmoil.

Not sure why LSRA doesn't spill r13 -- but as a guess, the allocator probably picks shorter lifetimes as spill victims. Also if we did spill, becasuse there are quite a few continue statements in the body, there would be a large number of reload sites, as (AFAIK) the jit does not coalesce back edges in loops like this.

Chances are good that if we implement the follow-on optimization to propagate the array length as a jit-time constant then we'd fix this as the array ref would only need to be live within a the loop iteration, and similar may hold true of many of these "for loop over elements of a static readonly array" cases.

Something similar would happen if the user decided to manually CSE the static array by creating a local. So in addition to catching the explict references it would be good if we could propagate info about the reference to any local copies so we could better optimize cases that users have already tried to optimize.

In particular this should do interesting things for ref typed readonly initialized statics, allowing reads to be hoisted out of loops, for instance. Addresses part of #21951. Note readonly instance fields cannot be given special treatment.

benaadams · 2019-02-23T02:44:43Z

Chances are good that if we implement the follow-on optimization to propagate the array length as a jit-time constant then we'd fix this as...

There are a few readonly static array accesses that are done via enum that I thought wouldn't be effected by this; however if it is a propagated const length, then it could potentially also skip them there? (e.g. recognise in range)

benaadams · 2019-02-23T03:08:21Z

we can give special treatment to readonly static fields: we can tell if the associated .cctor has finished executing, and we have now blocked the ability to update the field via reflection.

Presumably any first pass blocks (if can't tell .cctor has finished, e.g. crossgen and first encounter); will get picked up by second pass in Tier1?

AndyAyersMS · 2019-02-25T19:17:48Z

There are a few readonly static array accesses that are done via enum that I thought wouldn't be effected by this; however if it is a propagated const length, then it could potentially also skip them there? (e.g. recognise in range)

I would certainly hope so.

Presumably any first pass blocks (if can't tell .cctor has finished, e.g. crossgen and first encounter); will get picked up by second pass in Tier1?

Right, at Tier1 there's a pretty good chance all the relevant .cctors have completed. So while the optimization is not going to be tiering specific, it should happen more frequently with tiering.

AndyAyersMS · 2019-02-25T19:21:30Z

As for the optimization itself -- I am going to add in the array length bit sometime soon; I think that plus the above should come out looking pretty solid.

But we're also trying to wrap things up for the release.

So am going to mark this as future. If the ongoing work looks encouraging, we can still consider it for 3.0.

Thealexbarney · 2019-04-08T04:34:02Z

Just leaving some performance numbers from using AES-NI with .NET Core 3.0. The AES round keys are stored in a private readonly array field in my AES class.

Accessing the array directly results in an encryption speed of 4.2 GB/s.

Creating a local reference to the array outside the hot loop and then accessing the last element of the array before accessing any other elements removes the remaining bounds checks and results in an encryption speed of 5.7 GB/s.

jakobbotsch · 2022-12-12T13:52:41Z

@EgorBo maybe your recent changes fixed this?

EgorBo · 2023-05-06T21:57:40Z

@EgorBo maybe your recent changes fixed this?

Yes, we now fold nullchecks for static readonly

AndyAyersMS self-assigned this Feb 15, 2019

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

ptr-null mentioned this issue Jul 5, 2022

RyuJIT: Tight loop performance regression #71628

Open

xtqqczze mentioned this issue Jan 23, 2023

Assist JIT in eliminating bounds checks #81036

Merged

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 24, 2023

EgorBo closed this as completed May 6, 2023

dotnet locked as resolved and limited conversation to collaborators Jun 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RyuJIT bounds checks not eliminated for readonly arrays #11797

RyuJIT bounds checks not eliminated for readonly arrays #11797

benaadams commented Jan 11, 2019 •

edited by BruceForstall

AndyAyersMS commented Jan 25, 2019

AndyAyersMS commented Feb 15, 2019

benaadams commented Feb 15, 2019 •

edited

AndyAyersMS commented Feb 16, 2019

AndyAyersMS commented Feb 22, 2019

benaadams commented Feb 22, 2019 •

edited

AndyAyersMS commented Feb 22, 2019

benaadams commented Feb 22, 2019 •

edited

AndyAyersMS commented Feb 22, 2019

AndyAyersMS commented Feb 22, 2019 •

edited

benaadams commented Feb 22, 2019

AndyAyersMS commented Feb 22, 2019

benaadams commented Feb 23, 2019

benaadams commented Feb 23, 2019

AndyAyersMS commented Feb 25, 2019

AndyAyersMS commented Feb 25, 2019

Thealexbarney commented Apr 8, 2019

jakobbotsch commented Dec 12, 2022

EgorBo commented May 6, 2023

RyuJIT bounds checks not eliminated for readonly arrays #11797

RyuJIT bounds checks not eliminated for readonly arrays #11797

Comments

benaadams commented Jan 11, 2019 • edited by BruceForstall

AndyAyersMS commented Jan 25, 2019

AndyAyersMS commented Feb 15, 2019

benaadams commented Feb 15, 2019 • edited

AndyAyersMS commented Feb 16, 2019

AndyAyersMS commented Feb 22, 2019

benaadams commented Feb 22, 2019 • edited

AndyAyersMS commented Feb 22, 2019

benaadams commented Feb 22, 2019 • edited

AndyAyersMS commented Feb 22, 2019

AndyAyersMS commented Feb 22, 2019 • edited

benaadams commented Feb 22, 2019

AndyAyersMS commented Feb 22, 2019

benaadams commented Feb 23, 2019

benaadams commented Feb 23, 2019

AndyAyersMS commented Feb 25, 2019

AndyAyersMS commented Feb 25, 2019

Thealexbarney commented Apr 8, 2019

jakobbotsch commented Dec 12, 2022

EgorBo commented May 6, 2023

benaadams commented Jan 11, 2019 •

edited by BruceForstall

benaadams commented Feb 15, 2019 •

edited

benaadams commented Feb 22, 2019 •

edited

benaadams commented Feb 22, 2019 •

edited

AndyAyersMS commented Feb 22, 2019 •

edited