Multidimensional array performance #5481

colgreen · 2016-04-03T13:40:33Z

There appears to be a long running issue with the performance of multidimensional arrays compared to a comparable jagged array. Are there any plans to look into this?

Some existing discussion here:

http://stackoverflow.com/questions/468832/why-are-multi-dimensional-arrays-in-net-slower-than-normal-arrays
http://stackoverflow.com/questions/5476000/c-multi-dimensional-array-vs-one-dimensional

I ran a quick test using .Net 4.6.1 in x64 mode (i.e. using ryuJIT), and the issue appears to still be present.

category:cq
theme:md-arrays
skill-level:expert
cost:large

cmckinsey · 2016-04-11T18:41:44Z

Comparing the performance of various array representations is complex and depends on a lot of workload specifics. I won't attempt to spend time to cover it in this thread. But for this specific example, each implementation should each largely be able to perform the same speed for the specific dimensions sizes.

I'm going to use this issue to cover what is missing work in RyuJIT handling rectangular arrays to match .NET 4.5.3 (JIT64) performance for the sum(double[,], d) method.

Timings with RyuJIT:
sum took 00:00:01.7960414
sum took 00:00:03.2205602 // over 2x slower than vector version
sum took 00:00:01.7834780
sum took 00:00:01.7827718
sum took 00:00:03.2271725 // over 2x slower than vector version
sum took 00:00:01.7818470

Timings with JIT64:
sum took 00:00:01.4961448
sum took 00:00:01.6203301
sum took 00:00:01.5992218
sum took 00:00:01.4797479
sum took 00:00:01.6191445
sum took 00:00:01.5965326

The main reason for the deficiency is the failure of the d[I,..] calculation to be hoisted out of the inner loop, the array reference only varies on the inner-most dimension

G_M11570_IG04:        ; offs=000039H, size=0035H, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, isz

IN0010: 000039 mov      r8d, edx
IN0011: 00003C sub      r8d, dword ptr [rsi+24]
IN0012: 000040 cmp      r8d, dword ptr [rsi+16]
IN0013: 000044 jae      SHORT G_M11570_IG08
IN0014: 000046 mov      r9d, ecx
IN0015: 000049 sub      r9d, dword ptr [rsi+28]
IN0016: 00004D cmp      r9d, dword ptr [rsi+20]
IN0017: 000051 jae      SHORT G_M11570_IG08
IN0018: 000053 mov      r10d, dword ptr [rsi+20]
IN0019: 000057 imul     r10, r8
IN001a: 00005B mov      r8, r9
IN001b: 00005E add      r8, r10
IN001c: 000061 addsd    xmm6, qword ptr [rsi+8*r8+32]
IN001d: 000068 inc      ecx
IN001e: 00006A cmp      ecx, eax
IN001f: 00006C jl       SHORT G_M11570_IG04

The internal representation of the rectangular array is done through the arrMD&[] operator and is not decomposed until the Lower phase which runs after the global and loop optimizers. It is only some of these individual addressing calculations that are invariant, but not the entire operator hence it cannot be fully optimized unless it is lowered earlier.

From the dumps you can see this happening during the Lowering phase in lowerxarch:


Lowering ArrElem
============
N001 (  1,  1) [000042] ------------             /--*  lclVar    ref    V00 arg0         u:2 $80
N002 (  1,  1) [000043] ------------             +--*  lclVar    int    V04 loc3         u:4 $380
N003 (  1,  1) [000044] ------------             +--*  lclVar    int    V05 loc4         u:4 $381
N004 ( 13,  9) [000045] ---X--------             *  arrMD&[,] byref  $3c0

Results of lowering ArrElem:
     (  1,  1) [000140] ------------                   /--*  const     long   0
N001 (  1,  1) [000042] ------------                   |  /--*  lclVar    ref    V00 arg0         u:2 $80
N002 (  1,  1) [000043] ------------                   |  +--*  lclVar    int    V04 loc3         u:4 $380
     (  3,  3) [000141] ---X--------                   +--*  arrMDIdx[i, ] int   
     (  1,  1) [000142] ------------                   +--*  lclVar    ref    V00 arg0         
     (  5,  5) [000143] ---X--------                /--*  arrMDOffs[i, ] long  
     (  1,  1) [000144] ------------                |  /--*  lclVar    ref    V00 arg0         
N003 (  1,  1) [000044] ------------                |  +--*  lclVar    int    V05 loc4         u:4 $381
     (  3,  3) [000145] ---X--------                +--*  arrMDIdx[*,j] int   
     (  1,  1) [000146] ------------                +--*  lclVar    ref    V00 arg0         
     (  9,  9) [000147] ---X--------             /--*  arrMDOffs[*,j] long  
     (  1,  1) [000148] ------------             +--*  lclVar    ref    V00 arg0         
     ( 11, 11) [000149] ---X----R---             *  lea(b+(i*8)+32) byref 

Resulting statement:
     ( 22, 20) [000050] ------------             *  stmtExpr  void  (top level) (IL 0x023...0x02E)
     (  1,  1) [000140] ------------             |                 /--*  const     long   0
N001 (  1,  1) [000042] ------------             |                 |  /--*  lclVar    ref    V00 arg0         u:2 $80
N002 (  1,  1) [000043] ------------             |                 |  +--*  lclVar    int    V04 loc3         u:4 $380
     (  3,  3) [000141] ---X--------             |                 +--*  arrMDIdx[i, ] int   
     (  1,  1) [000142] ------------             |                 +--*  lclVar    ref    V00 arg0         
     (  5,  5) [000143] ---X--------             |              /--*  arrMDOffs[i, ] long  
     (  1,  1) [000144] ------------             |              |  /--*  lclVar    ref    V00 arg0         
N003 (  1,  1) [000044] ------------             |              |  +--*  lclVar    int    V05 loc4         u:4 $381
     (  3,  3) [000145] ---X--------             |              +--*  arrMDIdx[*,j] int   
     (  1,  1) [000146] ------------             |              +--*  lclVar    ref    V00 arg0         
     (  9,  9) [000147] ---X--------             |           /--*  arrMDOffs[*,j] long  
     (  1,  1) [000148] ------------             |           +--*  lclVar    ref    V00 arg0         
     ( 11, 11) [000149] ---X----R---             |        /--*  lea(b+(i*8)+32) byref 
N005 ( 15, 13) [000046] ---X--------             |     /--*  indir     double $141
N006 (  1,  2) [000041] ------------             |     +--*  lclVar    double V01 loc0         u:7 (last use) $341
N007 ( 21, 19) [000047] ---X--------             |  /--*  +         double $302
N009 ( 22, 20) [000049] DA-X--------             \--*  st.lclVar double V01 loc0         d:8

The work item here is to move the rectangular array lowering to run before the optimizer so that the invariant components of the array calculation are exposed to invariant code motion. There may be additional work in invariant code motion to properly hoist the arrMDIdx operator correctly.

colgreen · 2019-07-13T07:42:16Z

Is this still an issue in dotnet core 3? If so is there a process I can follow to get it triaged/tagged appropriately? Thanks.

djsell · 2019-08-23T19:58:54Z

The benchmark is inaccurate for Release mode. What is actually slow is the call to multidimensional.GetLength(i). If you change that code to use the constant size 1024x1024, multidimensional is on par with the others (in .NET Core 2.2.6 on my MBP).

Release with d.GetLength(0) and d.GetLength(1)

sum took 00:00:01.3568572
sum took 00:00:03.0242274
sum took 00:00:01.4116762
sum took 00:00:01.3788118
sum took 00:00:03.0524923
sum took 00:00:01.3533231

Release with constant 1024x1024

sum took 00:00:01.3522575
sum took 00:00:01.3834065
sum took 00:00:01.3678978
sum took 00:00:01.3548391
sum took 00:00:01.3873318
sum took 00:00:01.3667302

Multidimensional array access is still considerably slower in Debug mode even with that change, though.

BruceForstall · 2022-07-05T21:23:08Z

This is addressed in #70271. There are follow-up items there to continue to improve MD array performance.

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

AndyAyersMS mentioned this issue May 6, 2020

Codegen for multi-dimensional arrays is poor #35293

Closed

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

gfoidl mentioned this issue Nov 9, 2020

Optimize HTTP2 HPack huffman decoding #43603

Merged

BruceForstall mentioned this issue Oct 12, 2021

JIT: inefficient multidimensional array access vis Jit32 (and in general) #8569

Closed

BruceForstall mentioned this issue Oct 22, 2021

RyuJIT: Improving multi-dimensional array accesses #60785

Closed

BruceForstall closed this as completed Jul 5, 2022

ghost locked as resolved and limited conversation to collaborators Aug 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multidimensional array performance #5481

Multidimensional array performance #5481

colgreen commented Apr 3, 2016

cmckinsey commented Apr 11, 2016

colgreen commented Jul 13, 2019

djsell commented Aug 23, 2019

BruceForstall commented Jul 5, 2022

Multidimensional array performance #5481

Multidimensional array performance #5481

Comments

colgreen commented Apr 3, 2016

cmckinsey commented Apr 11, 2016

colgreen commented Jul 13, 2019

djsell commented Aug 23, 2019

BruceForstall commented Jul 5, 2022