Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RyuJIT] Recognize and optimize constant set membership tests #12477

Open
mikedn opened this issue Jun 27, 2017 · 5 comments

Comments

@mikedn
Copy link
Contributor

commented Jun 27, 2017

It would be useful if the JIT recognized tests like:

  • if ('0' <= c && c <= '9') - can be reduced to a single branch + compare + subtract
  • if (c == ' ' || c == '\t' || c == '\r' || c == '\n') - can be reduced to 2 branches + compare + bit test (BT instruction)

The second variant is sometimes implemented using a switch/jump table (in which case is much more simpler to recognize):

switch (c) {
case ' ':
case '\t':
case '\r':
case '\n':
    return true;
default:
    return false;
}

The reduction in number of conditional branches to 1 or 2 can make other optimizations more effective - if-conversion in particular.

Similar issue in the Roslyn repository: dotnet/roslyn#20375

The first variant can be implemented by managed compilers but getting good code for the second is problematic if the target arch bitness is not known. Also, the BT instruction is not available in IL so it has to be simulated with shift/and/compare which the JIT has to recognize.

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

@mikedn

This comment has been minimized.

Copy link
Contributor Author

commented Jun 27, 2017

An experimental implementation shows a ~2700 bytes improvement for the first variant and a ~170 bytes improvement for the second variant. A proper implementation may do better but I don't expect it to be significantly better.

Example codegen for the first variant:

; before
       cmp      eax, 16
       jl       SHORT G_M38823_IG03
       cmp      eax, 18
       jle      SHORT G_M38823_IG05
; after
       lea      edx, [rax-16]
       cmp      edx, 2
       jbe      SHORT G_M38823_IG05

Example codegen for the second variant:

; before
       cmp      ecx, 0x1F41
       je       G_M36321_IG13
       cmp      ecx, 0x1F42
       je       G_M36321_IG13
       cmp      ecx, 0x1F49
       je       G_M36321_IG13
; after
       mov      eax, ecx
       sub      eax, 0x1F41
       cmp      eax, 8
       ja       G_M36321_IG05
       mov      r9d, 259
       bt       r9, rcx
       jb       G_M36321_IG14
@BruceForstall

This comment has been minimized.

Copy link
Member

commented Jun 27, 2017

@mikedn

This comment has been minimized.

Copy link
Contributor Author

commented Jun 27, 2017

Switch to BT conversion yields a ~3700 bytes improvement. Example codegen:

; before
       cmp      eax, 3
       ja       SHORT G_M9431_IG07
       mov      eax, eax
       lea      rdx, [reloc @RWD00]
       mov      edx, dword ptr [rdx+4*rax]
       lea      rcx, G_M9431_IG02
       add      rdx, rcx
       jmp      rdx
; after
       cmp      eax, 3
       ja       SHORT G_M9431_IG07
       mov      edx, 3
       bt       edx, eax
       jb       SHORT G_M9431_IG05

It might be better but Roslyn seems to change rather quickly from switch table to binary search and that is much more difficult to recognize.

This transform is the cheapest of all in terms of compilation throughput since it's just a matter of looking at the number of switch arms.

@mikedn

This comment has been minimized.

Copy link
Contributor Author

commented Aug 8, 2019

FWIW I've updated my old experiment (mikedn@991e2a0) and run the diffs again:

Total bytes of diff: -3905 (-0.01% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          33 : System.Runtime.Numerics.dasm (0.05% of base)
           4 : System.IO.Compression.dasm (0.00% of base)
           4 : System.ObjectModel.dasm (0.02% of base)
           3 : System.IO.IsolatedStorage.dasm (0.02% of base)
           2 : System.Runtime.Serialization.Formatters.dasm (0.00% of base)
Top file improvements by size (bytes):
       -1437 : System.Private.CoreLib.dasm (-0.04% of base)
        -318 : System.Private.Xml.dasm (-0.01% of base)
        -248 : Newtonsoft.Json.dasm (-0.04% of base)
        -190 : System.Data.Common.dasm (-0.02% of base)
        -178 : System.Text.Encoding.CodePages.dasm (-0.25% of base)
66 total files with size differences (61 improved, 5 regressed), 63 unchanged.
Top method regressions by size (bytes):
          68 ( 1.84% of base) : System.Runtime.Numerics.dasm - Number:ParseNumber(byref,long,int,byref,ref,ref,bool):bool
          41 ( 4.20% of base) : System.Net.Primitives.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
          41 ( 4.20% of base) : System.Private.Uri.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
          19 (11.24% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Scanner:ScanMultilineTrivia():struct:this
          19 ( 2.18% of base) : System.Private.CoreLib.dasm - Convert:TryFromBase64Chars(struct,struct,byref):bool
Top method improvements by size (bytes):
        -113 (-1.66% of base) : System.Private.CoreLib.dasm - Number:NumberToStringFormat(byref,byref,struct,ref)
         -62 (-12.40% of base) : System.Private.CoreLib.dasm - DateTime:TryCreate(int,int,int,int,int,int,int,byref):bool
         -60 (-1.54% of base) : System.Private.Xml.dasm - XmlSchemaInference:InferSimpleType(ref,byref):int
         -59 (-4.20% of base) : System.Security.AccessControl.dasm - GenericAce:CreateFromBinaryForm(ref,int):ref
         -51 (-2.17% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendFormatHelper(ref,ref,struct):ref:this
Top method regressions by size (percentage):
           6 (15.38% of base) : Microsoft.CodeAnalysis.dasm - AssemblyIdentity:IsWhiteSpace(ushort):bool
           6 (15.38% of base) : System.Private.CoreLib.dasm - Convert:IsSpace(ushort):bool
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlBaseWriter:IsWhitespace(ushort):bool:this
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - EncodingStreamWrapper:IsWhitespace(ubyte):bool
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlJsonReader:IsWhitespace(ubyte):bool
Top method improvements by size (percentage):
         -49 (-53.85% of base) : System.Net.HttpListener.dasm - HttpListener:IsClientFault(int):bool
         -34 (-44.74% of base) : System.Data.Common.dasm - ExpressionNode:IsIntegerSql(int):bool
         -35 (-37.23% of base) : System.Private.Uri.dasm - Uri:IsBadFileSystemCharacter(ushort):bool:this
         -20 (-35.71% of base) : System.Data.Common.dasm - ExpressionNode:IsInteger(int):bool
         -20 (-35.71% of base) : System.Private.Xml.dasm - XmlEntity:IsValidChildType(int):bool:this
756 total methods with size differences (644 improved, 112 regressed), 145991 unchanged.

Most of it comes from recognizing if ('0' <= c && c <= '9'). Some 700 bytes of diff come from recognizing if (c == ' ' || c == '\t' || c == '\r' || c == '\n').

The experiment was mostly written to get an idea how many opportunities for such transforms exist. It has various problems:

  • bool b = ('0' <= c && c <= '9') isn't recognized because the second operand of && is just c <= '9' rather than c <= '9' ? true : false. The current implementation only looks at compares involved in flow control.
  • ('0' <= c && c <= '9') only works with constant bounds. Something like (0 <= c && c < x) should also work.
  • (c == ' ' || c == '\t' || c == '\r' || c == '\n') generates its own range check (since the bitmask has to have at most 64 bits) while the code may already contain a suitable range check (e.g. (c < ' ' && (c == ' ' || c == '\t' || c == '\r' || c == '\n')))
  • (c == ' ' || c == '\t' || ...) also fails if the range exceeds 64 and the checks aren't suitably ordered.
  • It's not clear where's the best place to do these transforms. I've done this after lowering mostly because this experiment was done together with if-conversion. It also makes use of the BT instruction (which is available only in LIR) but that could be replaced by RSZ and AND in HIR.

@AndyAyersMS Any opinions about implementing such optimizations in the JIT?

@AndyAyersMS

This comment has been minimized.

Copy link
Member

commented Aug 9, 2019

I think doing it relatively late makes sense.

Running it earlier might reduce time and space needed by some analyses, but it fires pretty rarely, and the transformation doesn't seem to unblock any higher-level opts.

cc @dotnet/jit-contrib

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.