Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT bounds checks not eliminated for readonly arrays #11797

Closed
benaadams opened this issue Jan 11, 2019 · 19 comments
Closed

RyuJIT bounds checks not eliminated for readonly arrays #11797

benaadams opened this issue Jan 11, 2019 · 19 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization
Milestone

Comments

@benaadams
Copy link
Member

benaadams commented Jan 11, 2019

Range analysis doesn't take into account readonly arrays and you need to make a function local reference to eliminate the range check.

https://github.com/dotnet/coreclr/issues/5371 regressed due to dotnet/coreclr#15756

It shouldn't need to be conservative for a readonly array?

/cc @briansull @AndyAyersMS @mikedn

category:cq
theme:bounds-checks
skill-level:expert
cost:medium
impact:medium

@AndyAyersMS
Copy link
Member

Might be relatively low risk and modest benefit. Keeping in 3.0 for now until we can better assess potential benefit.

@AndyAyersMS AndyAyersMS self-assigned this Feb 15, 2019
@AndyAyersMS
Copy link
Member

Could be as simple as a well placed GTF_IND_INVARIANT, for ref type statics from initialized classes.

@benaadams
Copy link
Member Author

benaadams commented Feb 15, 2019

Example knock on code change for this is in ConcurrentQueue<T> where _slots is a readonly Slot[] but needed to be taken to a local array to skip the checks dotnet/coreclr#21192 /cc @stephentoub

AndyAyersMS referenced this issue in AndyAyersMS/coreclr Feb 16, 2019
In particular this should do interesting things for ref typed readonly
initialized statics, allowing reads to be hoisted out of loops, for
instance.

Addresses part of #21951.

Note readonly instance fields cannot be given special treatment.
@AndyAyersMS
Copy link
Member

Preview of jit changes here: master...AndyAyersMS:InitializedReadonlyStaticInvariance.

Unfortunately this won't help ConcurrentQueue<T> or the examples in #5990. We can't give special treatment to for readonly instance members, because we don't have any way of knowing when an object is fully constructed, and even if we did, any method that read the field would have to handle both the fully constructed and not fully constructed cases. Furthermore, we haven't blocked the various avenues for modifying readonly instance fields after construction is done (eg reflection).

But we can give special treatment to readonly static fields: we can tell if the associated .cctor has finished executing, and we have now blocked the ability to update the field via reflection.

There seem to be a fair number of these static cases about, and initial diffs look promising (Note this used --cctors to try and increase chances that statics were initialized when jittting).

Total bytes of diff: -22601 (-0.06% of base)
    diff is an improvement.
Top file regressions by size (bytes):
         374 : Microsoft.CSharp.dasm (0.12% of base)
          48 : System.Runtime.Serialization.Formatters.dasm (0.05% of base)
          22 : NuGet.Configuration.dasm (0.04% of base)
          18 : Microsoft.DotNet.ProjectModel.dasm (0.01% of base)
          18 : System.Net.WebSockets.dasm (0.05% of base)
Top file improvements by size (bytes):
      -12225 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.43% of base)
       -1871 : System.Private.Xml.dasm (-0.05% of base)
       -1244 : System.Net.Sockets.dasm (-0.65% of base)
        -975 : System.Net.Security.dasm (-0.58% of base)
        -675 : System.Data.Common.dasm (-0.05% of base)
63 total files with size differences (57 improved, 6 regressed), 66 unchanged.
Top method regressions by size (bytes):
         403 (13.12% of base) : Microsoft.CSharp.dasm - ExpressionBinder:GetStandardAndLiftedBinopSignatures(ref,ref):bool:this
         112 ( 2.67% of base) : System.Data.Common.dasm - DataTable:SerializeTableSchema(ref,struct,bool):this
         105 (20.00% of base) : Microsoft.CodeAnalysis.dasm - SmallDictionary`2:RightComplex(ref):ref (5 base, 6 diff methods)
          96 ( 4.25% of base) : Microsoft.DotNet.ProjectModel.dasm - PatternBuilder:Build(ref):ref:this
          70 ( 9.15% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VisualBasicWarningStateMap:CreateWarningStateEntries(struct):ref
Top method improvements by size (bytes):
       -5982 (-5.44% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:EnumerateTemplates(ref,ref):this
       -2190 (-4.84% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(ref,ref):this
       -1494 (-4.53% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrPrivateTraceEventParser:EnumerateTemplates(ref,ref):this
       -1167 (-4.08% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrTraceEventParser:EnumerateTemplates(ref,ref):this
        -640 (-3.48% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - AspNetTraceEventParser:EnumerateTemplates(ref,ref):this
Top method regressions by size (percentage):
         105 (20.00% of base) : Microsoft.CodeAnalysis.dasm - SmallDictionary`2:RightComplex(ref):ref (5 base, 6 diff methods)
         403 (13.12% of base) : Microsoft.CSharp.dasm - ExpressionBinder:GetStandardAndLiftedBinopSignatures(ref,ref):bool:this
          70 ( 9.15% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VisualBasicWarningStateMap:CreateWarningStateEntries(struct):ref
          28 ( 7.45% of base) : Microsoft.CodeAnalysis.CSharp.dasm - StackOptimizerPass1:VisitArrayInitialization(ref):ref:this
           4 ( 7.14% of base) : System.Data.Common.dasm - Operators:Priority(int):int
Top method improvements by size (percentage):
        -113 (-35.53% of base) : System.Net.HttpListener.dasm - CookieTokenizer:TokenFromName(bool):int:this
        -113 (-35.53% of base) : System.Net.Primitives.dasm - CookieTokenizer:TokenFromName(bool):int:this
         -56 (-25.57% of base) : System.Data.Common.dasm - ColumnTypeConverter:ConvertFrom(ref,ref,ref):ref:this
         -27 (-22.88% of base) : System.Data.Common.dasm - DataStorage:IsSqlType(ref):bool
         -17 (-22.67% of base) : System.Net.HttpListener.dasm - HttpListenerResponse:CanSendResponseBody(int):bool
850 total methods with size differences (757 improved, 93 regressed), 192512 unchanged.

Method with the biggest percent diff shows repeated access to a readonly static array which can now be optimized as if the array ref was hoisted to a local.

https://github.com/dotnet/corefx/blob/e0ba7aa8026280ee3571179cc06431baf1dfaaac/src/Common/src/System/Net/CookieParser.cs#L472-L488

Is it my imagination, or are there two identical copies of the CookieTokenizer?

@AndyAyersMS
Copy link
Member

@benaadams do you think this is still interesting, even if it can't catch the cases you'd hoped?

@benaadams
Copy link
Member Author

benaadams commented Feb 22, 2019

Yes definitely; I actually thought readonly static arrays did work this way

@AndyAyersMS
Copy link
Member

Ok. Let me look into some of the regressions.

Running the above over the jitutils\fx set augmented with ASP.Net assemblies shows some hits here and there in ASP.Net. Need to add some sort of filter option to jit-analyze so I can summarize these better.

@benaadams
Copy link
Member Author

benaadams commented Feb 22, 2019

Two examples from Kestrel would be

private static readonly Tuple<ulong, ulong, HttpMethod, int>[] _knownMethods =
    new Tuple<ulong, ulong, HttpMethod, int>[17];

private static readonly string[] _methodNames = new string[9];

Though there are probably better examples; not sure they are used in a way that would skip bounds checking

@AndyAyersMS
Copy link
Member

Filtered view ...

Summary: (using filter '*AspNet*')
(Lower is better)

Total bytes of diff: -1212 (-0.04% of base)
    diff is an improvement.

Top file regressions by size (bytes):
          18 : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm (0.34% of base)
          12 : Microsoft.AspNetCore.Server.IIS.dasm (0.01% of base)
           5 : Microsoft.AspNetCore.Http.dasm (0.01% of base)

Top file improvements by size (bytes):
        -405 : Microsoft.AspNetCore.Diagnostics.dasm (-0.48% of base)
        -171 : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm (-0.05% of base)
        -128 : Microsoft.AspNetCore.Mvc.Razor.dasm (-0.21% of base)
        -111 : Microsoft.AspNetCore.Routing.dasm (-0.05% of base)
         -48 : Microsoft.AspNetCore.Identity.dasm (-0.09% of base)
         -48 : Microsoft.AspNetCore.Mvc.TagHelpers.dasm (-0.06% of base)
         -40 : Microsoft.AspNetCore.Cors.dasm (-0.24% of base)
         -32 : Microsoft.AspNetCore.Server.Kestrel.Core.dasm (-0.01% of base)
         -27 : Microsoft.AspNetCore.StaticFiles.dasm (-0.07% of base)
         -26 : Microsoft.AspNetCore.HttpOverrides.dasm (-0.20% of base)

30 total files with size differences (27 improved, 3 regressed), 44 unchanged.

Top method regressions by size (bytes):
          49 ( 3.65% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - ChunkingCookieManager:AppendResponseCookie(ref,ref,ref,ref):this
          17 ( 3.21% of base) : Microsoft.AspNetCore.Http.dasm - DefaultHttpResponse:Redirect(ref,bool):this
          17 ( 0.84% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - TagBuilder:MergeAttributes(ref,bool):this (5 methods)
          13 ( 1.59% of base) : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm - Win7Pbkdf2Provider:DeriveKey(ref,ref,int,int,int):ref:this
          12 ( 1.36% of base) : Microsoft.AspNetCore.Server.IIS.dasm - IISHttpContext:SetResponseHeaders():this
          10 ( 2.03% of base) : Microsoft.AspNetCore.Server.Kestrel.Core.dasm - Http2FrameWriter:Write100ContinueAsync(int):struct:this
           5 ( 0.66% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - CookieAuthenticationHandler:CloneTicket(ref,ref):ref:this
           5 ( 0.96% of base) : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm - Win8Pbkdf2Provider:PasswordToPbkdfKeyHandle(ref,ref,int):ref
           3 ( 0.62% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - NameAndIdProvider:CreateSanitizedId(ref,ref,ref):ref
           3 ( 0.68% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - NameAndIdProvider:GetFullHtmlFieldName(ref,ref):ref

Top method improvements by size (bytes):
        -266 (-0.80% of base) : Microsoft.AspNetCore.Diagnostics.dasm - <ExecuteAsync>d__0:MoveNext():this (3 methods)
        -139 (-1.87% of base) : Microsoft.AspNetCore.Diagnostics.dasm - <ExecuteAsync>d__1:MoveNext():this
         -48 (-6.00% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - Compiler`2:CompileFromSimpleMemberAccess(ref,ref):ref (5 methods)
         -48 (-3.81% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - Compiler`2:CompileForChainedMemberAccess(ref,ref):ref (5 methods)
         -43 (-5.99% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - ViewLocationCacheKey:GetHashCode():int:this
         -35 (-9.19% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - RazorViewEngineOptionsSetup:Configure(ref):this
         -34 (-0.94% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - <HandleSignInAsync>d__25:MoveNext():this
         -32 (-1.56% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - Compiler`2:CompileCapturedConstant(ref,ref):ref (5 methods)
         -27 (-0.94% of base) : Microsoft.AspNetCore.StaticFiles.dasm - HtmlDirectoryFormatter:GenerateContentAsync(ref,ref):ref:this
         -25 (-8.90% of base) : Microsoft.AspNetCore.Cors.dasm - CorsPolicy:.ctor():this

Top method regressions by size (percentage):
          49 ( 3.65% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - ChunkingCookieManager:AppendResponseCookie(ref,ref,ref,ref):this
          17 ( 3.21% of base) : Microsoft.AspNetCore.Http.dasm - DefaultHttpResponse:Redirect(ref,bool):this
          10 ( 2.03% of base) : Microsoft.AspNetCore.Server.Kestrel.Core.dasm - Http2FrameWriter:Write100ContinueAsync(int):struct:this
          13 ( 1.59% of base) : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm - Win7Pbkdf2Provider:DeriveKey(ref,ref,int,int,int):ref:this
          12 ( 1.36% of base) : Microsoft.AspNetCore.Server.IIS.dasm - IISHttpContext:SetResponseHeaders():this
           5 ( 0.96% of base) : Microsoft.AspNetCore.Cryptography.KeyDerivation.dasm - Win8Pbkdf2Provider:PasswordToPbkdfKeyHandle(ref,ref,int):ref
          17 ( 0.84% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - TagBuilder:MergeAttributes(ref,bool):this (5 methods)
           3 ( 0.68% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - NameAndIdProvider:GetFullHtmlFieldName(ref,ref):ref
           5 ( 0.66% of base) : Microsoft.AspNetCore.Authentication.Cookies.dasm - CookieAuthenticationHandler:CloneTicket(ref,ref):ref:this
           3 ( 0.62% of base) : Microsoft.AspNetCore.Mvc.ViewFeatures.dasm - NameAndIdProvider:CreateSanitizedId(ref,ref,ref):ref

Top method improvements by size (percentage):
         -17 (-22.67% of base) : Microsoft.AspNetCore.Server.HttpSys.dasm - Response:CanSendResponseBody(int):bool
         -18 (-21.95% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - UrlResolutionTagHelper:IsCharWhitespace(ushort):bool
         -17 (-17.71% of base) : Microsoft.AspNetCore.HttpOverrides.dasm - ForwardedHeadersMiddleware:TryValidateScheme(ref):bool:this
         -35 (-9.19% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - RazorViewEngineOptionsSetup:Configure(ref):this
         -25 (-8.90% of base) : Microsoft.AspNetCore.Cors.dasm - CorsPolicy:.ctor():this
         -15 (-8.67% of base) : Microsoft.AspNetCore.Cors.dasm - CorsResult:.ctor():this
         -25 (-8.42% of base) : Microsoft.AspNetCore.Mvc.Razor.dasm - RazorViewEngineOptions:.ctor():this
          -6 (-7.79% of base) : Microsoft.AspNetCore.Identity.dasm - <>c__1`2:<AddIdentity>b__1_0(ref):this
          -5 (-6.94% of base) : Microsoft.AspNetCore.Http.Connections.dasm - TimerAwaitable:OnCompleted(ref):this
          -5 (-6.94% of base) : Microsoft.AspNetCore.Server.Kestrel.Core.dasm - StreamCloseAwaitable:OnCompleted(ref):this

98 total methods with size differences (88 improved, 10 regressed), 19129 unchanged.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Feb 22, 2019

Note readonly for ref classes is just top-level immutability.

We don't yet drill into readonly static objects where top level immutability implies existence of immutable sub-parts -- if we did this we could surface new constants (say array length or string length or possibly even string contents).

We don't know what to do with accesses to readonly static structs just yet. My prototype changes have introduced a flag we could potentially use (GTF_FLD_INVARIANT) but I have yet to plumb this through. Not sure if we have sufficient safeguards in the system to prevent fields of a readonly static struct from being modified.

@benaadams
Copy link
Member Author

Summary: (using filter '*AspNet*')

Very nice :)

@AndyAyersMS
Copy link
Member

In GetStandardAndLiftedBinopSignatures the JIT introduces a CSE for the load of s_binopSignatures (readyonly static array). But this method already has a lot of register pressure so keeping that value in a register causes other spills.

;; before
;  V05 loc2         [V05,T01] ( 21, 46   )     ref  ->  r13         class-hnd

;; after
;  V05 loc2         [V05,T01] ( 21, 46   )     ref  ->  [rsp+0xF0]   class-hnd
;  V86 cse1         [V86,T58] (  3,  6   )     ref  ->  r13         "ValNumCSE"

In the after version, r13 is only referred to in the preheader and right at the top of the loop. The preheader reference is just for a bounds check. If not spilled r13 must stay live across the whole body, and the loop body is pretty large. So the lack of r13 causes a lot of turmoil.

Not sure why LSRA doesn't spill r13 -- but as a guess, the allocator probably picks shorter lifetimes as spill victims. Also if we did spill, becasuse there are quite a few continue statements in the body, there would be a large number of reload sites, as (AFAIK) the jit does not coalesce back edges in loops like this.

Chances are good that if we implement the follow-on optimization to propagate the array length as a jit-time constant then we'd fix this as the array ref would only need to be live within a the loop iteration, and similar may hold true of many of these "for loop over elements of a static readonly array" cases.

Something similar would happen if the user decided to manually CSE the static array by creating a local. So in addition to catching the explict references it would be good if we could propagate info about the reference to any local copies so we could better optimize cases that users have already tried to optimize.

AndyAyersMS referenced this issue in AndyAyersMS/coreclr Feb 23, 2019
In particular this should do interesting things for ref typed readonly
initialized statics, allowing reads to be hoisted out of loops, for
instance.

Addresses part of #21951.

Note readonly instance fields cannot be given special treatment.
@benaadams
Copy link
Member Author

Chances are good that if we implement the follow-on optimization to propagate the array length as a jit-time constant then we'd fix this as...

There are a few readonly static array accesses that are done via enum that I thought wouldn't be effected by this; however if it is a propagated const length, then it could potentially also skip them there? (e.g. recognise in range)

@benaadams
Copy link
Member Author

we can give special treatment to readonly static fields: we can tell if the associated .cctor has finished executing, and we have now blocked the ability to update the field via reflection.

Presumably any first pass blocks (if can't tell .cctor has finished, e.g. crossgen and first encounter); will get picked up by second pass in Tier1?

@AndyAyersMS
Copy link
Member

There are a few readonly static array accesses that are done via enum that I thought wouldn't be effected by this; however if it is a propagated const length, then it could potentially also skip them there? (e.g. recognise in range)

I would certainly hope so.

Presumably any first pass blocks (if can't tell .cctor has finished, e.g. crossgen and first encounter); will get picked up by second pass in Tier1?

Right, at Tier1 there's a pretty good chance all the relevant .cctors have completed. So while the optimization is not going to be tiering specific, it should happen more frequently with tiering.

@AndyAyersMS
Copy link
Member

As for the optimization itself -- I am going to add in the array length bit sometime soon; I think that plus the above should come out looking pretty solid.

But we're also trying to wrap things up for the release.

So am going to mark this as future. If the ongoing work looks encouraging, we can still consider it for 3.0.

@Thealexbarney
Copy link

Just leaving some performance numbers from using AES-NI with .NET Core 3.0. The AES round keys are stored in a private readonly array field in my AES class.

Accessing the array directly results in an encryption speed of 4.2 GB/s.

Creating a local reference to the array outside the hot loop and then accessing the last element of the array before accessing any other elements removes the remaining bounds checks and results in an encryption speed of 5.7 GB/s.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@jakobbotsch
Copy link
Member

@EgorBo maybe your recent changes fixed this?

@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 24, 2023
@EgorBo
Copy link
Member

EgorBo commented May 6, 2023

@EgorBo maybe your recent changes fixed this?

Yes, we now fold nullchecks for static readonly

@EgorBo EgorBo closed this as completed May 6, 2023
@dotnet dotnet locked as resolved and limited conversation to collaborators Jun 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization
Projects
None yet
Development

No branches or pull requests

7 participants