Many such dead/unnecessary locals occur in the RegexDna benchmark's hot method, System.dll!System.Text.RegularExpressions.RegexInterpreter.Go(), which accounts for nearly 50% of the benchmark's execution time. Go() is a large method with a large switch statement inside a while loop. It contains many calls of the following form:
C#
this.SetOperator(this.runcodes[this.runcodepos]);
IR:
[002620] ------------ * stmtExpr void
[002618] I-CXG------- \--* call void RegexInterpreter.SetOperator
[002613] ------------ this in rcx +--* lclVar ref V00 this
[002616] ------------ | /--* const int 0
[002617] ---XG------- arg1 \--* [] int
[002615] ---XG------- \--* field ref runcodes
[002614] ------------ \--* lclVar ref V00 this
Morphing of arg1 = this.runcodes[this.runcodepos] introduces the following comma tree, which contains an assignment:
GT_COMMA(GT_COMMA(tmp = &(this.runcodes), bounds-chk on runcodes), GT_IND to read array elem)
Since this arg contains an assignment, the ArgsComplete() routine marks all preceding args as needing to be evaluated into temps. In the above case, the "this" arg gets marked to be evaluated into a temp. At a high level, the new IR looks as follows:
tmpForThis = this
call RegexInterpreter.SetOperator
arg1 = GT_COMMA(GT_COMMA(tmp = &(this.runcodes), bounds-chk on runcodes), GT_IND to read array elem)
arg0 = tmpForThis
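The marking rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Arg` and `markArgsNeedingTemps` are hypothetical names invented for illustration and are not the JIT's actual data structures or the real ArgsComplete() logic, which handles many more cases.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a call argument (not a real JIT type).
struct Arg {
    std::string name;
    bool hasAssignment;  // the morphed arg tree contains an assignment (e.g. tmp = ...)
    bool needsTemp;      // set when the arg must be evaluated into a temp
};

// Toy version of the rule: if an arg contains an assignment, every arg
// before it is marked as needing evaluation into a temp.
// Returns the number of temps this rule introduces for the call.
inline std::size_t markArgsNeedingTemps(std::vector<Arg>& args) {
    std::size_t temps = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (!args[i].hasAssignment) {
            continue;
        }
        for (std::size_t j = 0; j < i; ++j) {
            if (!args[j].needsTemp) {
                args[j].needsTemp = true;
                ++temps;
            }
        }
    }
    return temps;
}
```

For a call with args `{"this", false, false}` and `{"arg1", true, false}`, the model marks only `this` as needing a temp, which corresponds to the `tmpForThis` assignment shown above.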
Later, CopyProp replaces the use of tmpForThis with the actual this (V00). Since the Go() method has more than 700 locals/temps, which exceeds the tracked-local limit of 512, 'tmpForThis' is not treated as tracked because of its low refCnt (=1). As a result, the dead assignment to tmpForThis does not get eliminated.
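Why a low ref count leaves the temp untracked can be sketched as follows. This is a simplified illustration, assuming locals are ranked purely by refCnt and only the top `limit` are tracked; `Local` and `selectTrackedLocals` are hypothetical names, and the real JIT's selection criteria may differ (for example, by using weighted ref counts).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a local variable descriptor.
struct Local {
    std::string name;
    unsigned refCnt;  // number of references to the local
    bool tracked;     // only tracked locals take part in liveness analysis
};

// Assumed policy: rank locals by refCnt (highest first) and track at most
// `limit` of them. Returns the number of locals left untracked.
inline std::size_t selectTrackedLocals(std::vector<Local>& locals, std::size_t limit) {
    std::vector<std::size_t> order(locals.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        order[i] = i;
    }
    std::stable_sort(order.begin(), order.end(),
                     [&locals](std::size_t a, std::size_t b) {
                         return locals[a].refCnt > locals[b].refCnt;
                     });
    for (std::size_t rank = 0; rank < order.size(); ++rank) {
        locals[order[rank]].tracked = rank < limit;
    }
    return locals.size() > limit ? locals.size() - limit : 0;
}
```

Under this model, a temp with refCnt 1 competing against several hundred hotter locals falls outside the limit, so liveness never sees it and its dead store survives.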
As an experiment, I commented out the logic in ArgsComplete() that marks all preceding args as needing to be evaluated into temps when any arg contains an assignment. The resulting Go() method has only 706 locals, and RegexDna execution perf beats Jit64 by 5%. That is, the original Go() method has at least 60 dead/unnecessary locals along with their dead assignments. These unnecessary temps also occupy stack frame space and are zero-initialized in the prolog.
category:cq
theme:morph
skill-level:expert
cost:medium