This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Commit

Extending optimized JIT helpers to Buffer.MemoryCopy (#9786)
vkvenkat authored and jkotas committed Mar 4, 2017
1 parent d22a221 commit c6372c5
Showing 1 changed file with 151 additions and 290 deletions.

1 comment on commit c6372c5

@glenn-slayden commented on c6372c5, Feb 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why all these improvements to Buffer.Memmove (plus the more recent work) aren't deployed in the IL cpblk instruction instead? It appears that this commit c6372c5 removes a comment to the effect that cpblk is where all this effort naturally belongs (in case the proliferation of conditional #if code here wasn't already suggestive enough), and I agree.

I know everyone wants there to be one central nexus for obtaining an optimal memmove, with proper alignment, chunking, tuning, and everything else done right; shouldn't cpblk be that place?
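
For what it's worth, managed code can already reach cpblk directly: to my understanding, System.Runtime.CompilerServices.Unsafe.CopyBlock is a thin IL wrapper around a single cpblk instruction. A minimal sketch (the wrapper method name here is mine, not anything in the repository):

using System.Runtime.CompilerServices;

static unsafe void CopyViaCpblk(byte* dest, byte* src, uint byteCount)
{
    // Unsafe.CopyBlock emits cpblk in its IL body, so the JIT is free to expand
    // it with whatever alignment/chunking strategy it sees fit.
    Unsafe.CopyBlock(dest, src, byteCount);
}

One caveat for cpblk as the central nexus: ECMA-335 leaves its behavior unspecified when the source and destination regions overlap, so Buffer.Memmove would still need its own overlap check in front of it.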

I also note that this commit removes all address alignment behavior. Now, while it's true that the alignment code in ea9bee5 was a bit meek in considering only the destination address, at the possible expense of source alignment:

if (((int)dest & 3) != 0)
{
    if (((int)dest & 1) != 0)
    {
        *(dest + i) = *(src + i);
        i += 1;
        if (((int)dest & 2) != 0)
            goto IntAligned;
    }
    *(short*)(dest + i) = *(short*)(src + i);
    i += 2;
}

IntAligned:

#if BIT64
    // On 64-bit IntPtr.Size == 8, so we want to advance to the next 8-aligned address. If
    // (int)dest % 8 is 0, 5, 6, or 7, we will already have advanced by 0, 3, 2, or 1
    // bytes to the next aligned address (respectively), so do nothing. On the other hand,
    // if it is 1, 2, 3, or 4 we will want to copy-and-advance another 4 bytes until
    // we're aligned.
    // The thing 1, 2, 3, and 4 have in common that the others don't is that if you
    // subtract one from them, their 3rd lsb will not be set. Hence, the below check.
    if ((((int)dest - 1) & 4) == 0)
    {
        *(int*)(dest + i) = *(int*)(src + i);
        i += 4;
    }
#endif // BIT64

...at least this was something, and a reasonably motivated approach, since misaligned-store penalties are typically higher than misaligned-read penalties.

But in this commit there's neither. Did the experiments in #9786 empirically determine that physically aligning these operations is always a waste of time? For severe misalignment between source and destination, it would be possible to write a split-access loop, albeit a complex one, that marshals quad-aligned reads into quad-aligned writes; a rough sketch follows. Was this kind of approach evaluated before the decision was taken to abandon all efforts at alignment?
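
To make the idea concrete, here is a sketch of such an inner loop, assuming little-endian 64-bit, a destination already aligned to 8 bytes, and a source offset by a non-zero amount within its 8-byte word. This is illustrative only, not code from the repository, and it glosses over edge handling: the first and last aligned reads touch a few bytes just outside the exact source range.

static unsafe void SplitAccessCopy(byte* dest, byte* src, int words)
{
    // Bit offset of src within its containing 8-byte word; non-zero by assumption.
    int shift = (int)((ulong)src & 7) * 8;

    ulong* d = (ulong*)dest;                   // dest assumed 8-byte aligned
    ulong* s = (ulong*)((ulong)src & ~7UL);    // aligned read pointer at/below src

    ulong lo = *s++;                           // first aligned word covering src
    for (int i = 0; i < words; i++)            // produce 'words' aligned 8-byte writes
    {
        ulong hi = *s++;                       // next aligned word
        // Merge two aligned reads into one aligned write (little-endian order).
        d[i] = (lo >> shift) | (hi << (64 - shift));
        lo = hi;
    }
}

Whether the extra shifting actually beats plain unaligned loads on modern hardware is, of course, exactly the kind of thing the experiments would need to measure.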

Finally, note that by insisting on alignment to the destination address exclusively, the alignment code (removed in this commit, see above) may have been ruining a naturally ideal source alignment, with the overall result of making things worse. This could have unfairly penalized test results for the existing code, when it would have been trivial to apply that adjustment only when src/dest are jointly aligned to some appropriate level, e.g., by wrapping the code above with:

if ((((int)src ^ (int)dest) & 0x7) == 0)    // (or some other alignment value)
{
    // ... (the destination-alignment code shown above)
}
As far as I can tell, there has never been code in buffer.cs that considers the source and destination physical alignment jointly, in order to then proceed sensibly on the basis of 3 broadly defined cases (sketched in code after this list):

  • Input src/dst physical addresses already jointly aligned (or len less than the maximum alignment size): proceed as-is;
  • Joint alignment can be improved (to some degree) prior to starting the gang operation: the code removed in c6372c5;
  • Antagonistic src/dst alignment detected: either proceed as-is or, if worthwhile, use a split-access loop to fully remediate.
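
For illustration, a sketch of what that three-way dispatch might look like; every name below is hypothetical (none of this is from buffer.cs), and the helper bodies are elided:

static unsafe void MemmoveSketch(byte* dest, byte* src, uint len)
{
    const uint MaxAlign = 8;
    uint delta = (uint)(((ulong)src ^ (ulong)dest) & (MaxAlign - 1));

    if (len < MaxAlign || delta == 0)
    {
        // Case 1: jointly aligned already (or too short to matter). Aligning dest
        // also aligns src, since the two pointers advance in lockstep.
        CopyJointlyAligned(dest, src, len);
    }
    else if ((delta & 3) == 0)
    {
        // Case 2: 8-byte joint alignment is unreachable, but src and dest agree
        // mod 4, so aligning dest to 4 also aligns src to 4.
        CopyAfterPartialAlignment(dest, src, len);
    }
    else
    {
        // Case 3: antagonistic alignment. Proceed as-is, or, when len is large
        // enough to amortize the setup, use a split-access loop as sketched above.
        CopyAntagonistic(dest, src, len);
    }
}

// Hypothetical helpers; bodies elided.
static unsafe void CopyJointlyAligned(byte* d, byte* s, uint n) { /* ... */ }
static unsafe void CopyAfterPartialAlignment(byte* d, byte* s, uint n) { /* ... */ }
static unsafe void CopyAntagonistic(byte* d, byte* s, uint n) { /* ... */ }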
