
RyuJIT: RegexDNA benchmark: morphing of args could leave dead/unnecessary temps behind when method exceeds tracked local limit #7310

Description

@sivarv

Many such dead/unnecessary locals occur in the RegexDNA benchmark's hot method, System.dll!System.Text.RegularExpressions.RegexInterpreter.Go(), which accounts for nearly 50% of the benchmark's execution time. Go() is a large method with a large switch statement inside a while loop. It contains many calls of the following form:

C#
this.SetOperator(this.runcodes[this.runcodepos]);

IR:
               [002620] ------------             *  stmtExpr  void  
               [002618] I-CXG-------             \--*  call      void   RegexInterpreter.SetOperator 
               [002613] ------------ this in rcx    +--*  lclVar    ref    V00 this         
               [002616] ------------                |  /--*  const     int    0
               [002617] ---XG------- arg1           \--*  []        int   
               [002615] ---XG-------                   \--*  field     ref    runcodes
               [002614] ------------                      \--*  lclVar    ref    V00 this

Morphing of arg1 = this.runcodes[this.runcodepos] introduces the following comma tree, which contains an assignment:

GT_COMMA(GT_COMMA(tmp = &(this.runcodes), bounds-chk on runcodes), GT_IND to read array elem)
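
To make the shape of that comma tree concrete, here is a minimal toy model in Python. It is not the JIT's code: the `Node` class, the operator names, and `morph_array_index` are hypothetical and only mirror the structure described above, where the array reference is saved to a temp, bounds-checked, and then read through an indirection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy IR node (hypothetical; not the JIT's GenTree)."""
    op: str
    operands: List["Node"] = field(default_factory=list)
    name: str = ""


def has_assignment(node: Node) -> bool:
    """True if the tree rooted at `node` contains an assignment."""
    return node.op == "ASG" or any(has_assignment(c) for c in node.operands)


def morph_array_index(array_expr: Node, index_expr: Node, tmp_name: str) -> Node:
    """Expand array[index] into COMMA(COMMA(tmp = array, bounds-check), IND)."""
    tmp = Node("LCL_VAR", name=tmp_name)
    save = Node("ASG", [tmp, array_expr])            # tmp = array reference
    check = Node("BOUNDS_CHECK", [index_expr, tmp])  # index checked against tmp
    load = Node("IND", [Node("ELEM_ADDR", [tmp, index_expr])])
    return Node("COMMA", [Node("COMMA", [save, check]), load])


this = Node("LCL_VAR", name="V00")
runcodes = Node("FIELD", [this], name="runcodes")
runcodepos = Node("FIELD", [this], name="runcodepos")

before = Node("INDEX", [runcodes, runcodepos])
after = morph_array_index(runcodes, runcodepos, "tmp")

print(has_assignment(before))  # False
print(has_assignment(after))   # True
```

The point of the sketch is only that the argument contains no assignment before the expansion and does contain one afterwards.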

Since this arg contains an assignment, the ArgsComplete() routine marks all the previous args as needing to be evaluated into temps. In the above case, the "this" arg gets marked to be evaluated into a temp. At a high level, the new IR looks as follows:

tmpForThis = this
call RegexInterpreter.SetOperator
   arg1 = GT_COMMA(GT_COMMA(tmp = &(this.runcodes), bounds-chk on runcodes), GT_IND to read array elem)
   arg0 = tmpForThis
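
The marking rule described above can be sketched as a small Python model. This is an illustration under stated assumptions, not the actual ArgsComplete() logic: the `Arg` class and `mark_args_needing_temps` are hypothetical names, and the real routine applies additional conditions that are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Arg:
    """Toy call argument (hypothetical; not the JIT's arg table entry)."""
    name: str
    has_assignment: bool = False
    needs_temp: bool = False


def mark_args_needing_temps(args: List[Arg]) -> List[Arg]:
    """If an arg contains an assignment, mark every earlier arg as needing a temp."""
    for i, arg in enumerate(args):
        if arg.has_assignment:
            for prev in args[:i]:
                prev.needs_temp = True
    return args


# SetOperator(this, this.runcodes[this.runcodepos]) after morphing arg1:
call_args = [Arg("this"), Arg("arg1", has_assignment=True)]
mark_args_needing_temps(call_args)

print(call_args[0].needs_temp)  # True: "this" is now evaluated into tmpForThis
```

In this model the rule is conservative: it does not ask whether the assignment in arg1 could actually affect the earlier arg, which is how a temp for "this" appears even though the assignment only writes a fresh temp.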

Later, CopyProp replaces tmpForThis with the actual this (V00). Since the Go() method has more than 700 locals/temps, which exceeds the tracked local limit of 512, 'tmpForThis' is not treated as tracked due to its low refCnt (=1). As a result, the dead assignment to tmpForThis doesn't get eliminated.
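
A rough way to see why the dead store survives is the following Python sketch. It assumes, purely for illustration, that tracked locals are chosen by ranking on reference count and keeping the top 512, and that dead-store removal only looks at tracked locals. The function names and the ranking rule are hypothetical simplifications, not the JIT's implementation.

```python
from typing import Dict, List, Set

TRACKED_LIMIT = 512  # tracked local limit quoted in this issue


def select_tracked(ref_counts: Dict[str, int], limit: int = TRACKED_LIMIT) -> Set[str]:
    """Keep the `limit` locals with the highest ref counts (assumed ranking rule)."""
    ranked = sorted(ref_counts, key=lambda name: ref_counts[name], reverse=True)
    return set(ranked[:limit])


def removable_dead_stores(dead_store_targets: List[str], tracked: Set[str]) -> List[str]:
    """Only dead stores to tracked locals are candidates for removal in this model."""
    return [t for t in dead_store_targets if t in tracked]


# A large method: 700 frequently used locals plus one low-refCnt temp.
big = {f"V{i:03d}": 10 for i in range(700)}
big["tmpForThis"] = 1
print(removable_dead_stores(["tmpForThis"], select_tracked(big)))  # []

# A small method: everything fits under the limit, so the dead store can go.
small = {"V00": 10, "tmpForThis": 1}
print(removable_dead_stores(["tmpForThis"], select_tracked(small)))  # ['tmpForThis']
```

The same temp is cleaned up in the small method and left behind in the large one, which matches the behavior reported for Go().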

As an experiment, I commented out the logic in ArgsComplete() that marks all the previous args as needing to be evaluated into temps if any arg has an assignment. The resulting Go() method has only 706 locals, and RegexDNA execution perf beats Jit64 by 5%. That is, the original Go() method has at least 60 dead/unnecessary locals along with their dead assignments. These unnecessary temps also occupy stack frame space and are zero-initialized in the prolog.

category:cq
theme:morph
skill-level:expert
cost:medium

Labels

- area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
- enhancement: Product code improvement that does NOT require public API changes/additions
- optimization
- tenet-performance: Performance related issue
