
RyuJIT drops unused volatile reads. #6280

Closed
pgavlin opened this issue Jul 7, 2016 · 17 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug help wanted [up-for-grabs] Good issue for external contributors JitUntriaged CLR JIT issues needing additional triage memory model issues associated with memory model
Milestone

Comments

@pgavlin
Contributor

pgavlin commented Jul 7, 2016

If the following program is compiled with optimizations enabled (e.g. the /o+ flag is provided on the csc command line), RyuJIT will drop the volatile read of a:

using System.Runtime.CompilerServices;

static class C
{
    static volatile int a;
    static int b;

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void LightweightStoreFence()
    {
        a = 1;
        var dummy = a;
        b = 2;
    }

    static void Main()
    {
        LightweightStoreFence();
    }
}

The IL as reported by a JIT dump is:

IL_0000  17                ldc.i4.1    
IL_0001  fe 13             volatile.   
IL_0003  80 01 00 00 04    stsfld       0x4000001
IL_0008  fe 13             volatile.   
IL_000a  7e 01 00 00 04    ldsfld       0x4000001
IL_000f  26                pop         
IL_0010  18                ldc.i4.2    
IL_0011  80 02 00 00 04    stsfld       0x4000002
IL_0016  2a                ret

The generated assembly is:

G_M23768_IG01:        ; func=00, offs=000000H, size=0000H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
G_M23768_IG02:        ; offs=000000H, size=0014H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref

IN0001: 000000 mov      dword ptr [reloc classVar[0xd2155a38]], 1
IN0002: 00000A mov      dword ptr [reloc classVar[0xd2155a58]], 2

G_M23768_IG03:        ; offs=000014H, size=0001H, epilog, nogc, emitadd

IN0003: 000014 ret

From the dump, it appears that the importer drops the volatile ldsfld on the floor when it imports the pop instruction. This is not correct per the ECMA spec, which states that volatile operations may not be removed or coalesced.

Thanks to @omariom for reporting the original issue in #6257.

category:correctness
theme:volatile
skill-level:expert
cost:medium

@omariom
Contributor

omariom commented Jul 7, 2016

It doesn't happen if the result of volatile read is put in an observable location.
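A sketch of the workaround being described (the `sink` field is illustrative, not from the original report): routing the read's result into a location the JIT must treat as observable keeps the load alive.

```csharp
using System.Runtime.CompilerServices;

static class C
{
    static volatile int a;
    static int b;
    static int sink; // observable location: a static field whose store cannot be proven dead

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void LightweightStoreFence()
    {
        a = 1;
        sink = a; // the volatile read now feeds a visible store, so it is not dropped
        b = 2;
    }
}
```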

@benaadams
Member

If it happens in stack space, did it really happen? (Understand the motivation is for a barrier)

@GSPP

GSPP commented Jul 11, 2016

Is it ever observable if an unused volatile read is removed? ECMA surely cannot mean that they may never be dropped, because dropping one is not detectable. Coalescing is likewise fine because it is not detectable: it just looks as if the two reads happened without any interleaved writes, which was a possible outcome anyway.

So even if ECMA says otherwise, that must refer to the abstract machine, not to the generated x86. Under the as-if rule this is OK.

@mikedn
Contributor

mikedn commented Jul 11, 2016

ECMA surely does not say they can't be dropped because it's not detectable that they are dropped

The rationale provided by ECMA references memory-mapped hardware registers, and as such it's pretty clear that there's no such thing as an "unused volatile read".

@GSPP

GSPP commented Jul 11, 2016

I did not know that ECMA assigned hardware-register semantics to volatile! This seems like a bad hybrid of C++ volatile and memory barriers. ECMA should be changed to drop support for hardware registers in memory and assign only barrier semantics to volatile. .NET is not used in kernel mode, so that scenario is unrealistic.

Indeed, I.12.6.7 makes this optimization illegal. I think this is a design defect in ECMA: the rule should be changed, and the JIT should ignore it, since adhering to that paragraph can't possibly do any good.

This is not a breaking change, since hardware registers were never accessible from user mode; nobody could have relied on this rule.

@mikedn
Contributor

mikedn commented Jul 11, 2016

This seems like a bad hybrid of C++ volatile and memory barriers

More or less. It's what VC++ does with volatile and is an extension of the C++ standard.

ECMA should be changed to not support hardware registers in memory and only assign barrier semantics to volatile.

I'm not sure what actual problem that would solve.

@GSPP

GSPP commented Jul 11, 2016

It would allow the JIT to coalesce volatile accesses. Currently, that is illegal (although I think the JIT should just not comply).

C++ compilers already do this for std::atomic.

@pgavlin
Contributor Author

pgavlin commented Jul 11, 2016

Even if the ECMA memory model only dealt with barrier semantics, the removal of a volatile read is potentially observable. For example, given the code in the original post, if the load is removed, the compiler is free to reorder the non-volatile write to b with respect to the volatile write to a that precedes it. If another thread were to read b before reading a (ensuring ordering using volatile reads), that thread may observe the reordered writes. The generated code does not exhibit that sort of code motion, but it is theoretically possible (though unlikely given the types of optimizations currently performed by the JIT).
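As a sketch of the scenario described above (the observer code is hypothetical, not from the issue; field names follow the original program): if the unused volatile read is dropped and the writes are then reordered, a second thread can observe `b == 2` while `a == 0`.

```csharp
static class ReorderingSketch
{
    static volatile int a;
    static int b;

    static void Writer()
    {
        a = 1;         // volatile write
        var dummy = a; // volatile read: if dropped, the write to b below
                       // may legally move above the volatile write to a
        b = 2;
    }

    static void Reader()
    {
        int rb = b; // plain read
        int ra = a; // volatile (acquire) read
        if (rb == 2 && ra == 0)
        {
            // Reachable only if the writes were reordered, which removing
            // the unused volatile read of 'a' would make legal.
        }
    }
}
```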

@GSPP

GSPP commented Jul 11, 2016

Yes, the barrier must stay, but the load can go. For example, it could be a dummy load from a stack location. Ideally, the compiler would have an internal representation for a partial barrier; code generation could then pick the cheapest instructions for it.

@pgavlin
Contributor Author

pgavlin commented Jul 11, 2016

Yes, the barrier must stay, but the load can go. For example, it could be a dummy load from a stack location. Ideally, the compiler would have an internal representation for a partial barrier; code generation could then pick the cheapest instructions for it.

Yes, that's a good point.

This is not a breaking change since hardware registers never were accessible to user mode. Nobody could have made use of this rule.

I'm not terribly familiar with the intricacies of user-mode drivers, but this statement only holds so long as there is no way for a user-mode program to access device memory directly. If such memory has ever been accessible to a user-mode .NET program (e.g. via a call to MmMapLockedPagesSpecifyCache in a cooperating kernel-mode driver), then allowing the removal of unused volatile reads may break programs that expect them to be side-effecting.

@mikedn
Contributor

mikedn commented Jul 12, 2016

It would allow the JIT to coalesce volatile accesses. Currently, that is illegal (although I think the JIT should just not comply). C++ compilers already do this for std::atomic.

Coalesce as in "eliminate redundant loads"? Again, I'm not sure what actual problem that would solve.

Also, I don't remember seeing a C++ compiler do that, which is not surprising considering that it's not exactly trivial to do correctly.

@GSPP

GSPP commented Jul 12, 2016

Here are some resources that I liked:

  1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
  2. https://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations
  3. There was an article demonstrating that GCC optimized a loop using std::atomic. I'm pretty sure it was on http://preshing.com/ but I can't find it right now.

Other compilers are starting to build understanding about (partial) barriers into their engines. I'm personally not sure that time is well spent for .NET since .NET code usually does not spend much time with such low level primitives. But it's useful to understand what's possible and what approaches are being taken by others.

@pgavlin
Contributor Author

pgavlin commented Jul 12, 2016

@GSPP these are good resources; thanks for sharing them!

Regarding the original issue: the likelihood of changes to ECMA, or of a decision to disregard the semantics it prescribes, is remote. A more realistic (but not necessarily likely) path forward for enabling some of these transformations is a new set of APIs, similar to the Volatile.* APIs, that provide the desired barrier semantics. If you think such APIs would be valuable, I'd encourage you to file an issue to that effect.

@mikedn
Contributor

mikedn commented Jul 12, 2016

Here are some resources that I liked:

Exaggerating a bit those resources can be summed up by the following quote from N4455:

We aren’t aware of compilers performing this optimization yet, but it is being discussed. std::atomic_signal_fence could be used to prevent this reordering and fusion, or one could use a stronger memory ordering for the operations: this optimization is only valid on relaxed operations which aren’t ordered with respect to each other.

That "being discussed" discussion is a link to an LLVM bug created some 3 years ago. After more than a year someone replied and asked for a use case; nothing has happened since.

I'm not saying that there aren't any such optimizations that are useful but it's very difficult to justify changing an existing spec based on what at the moment looks more like speculation. As already pointed out by @pgavlin above this would be better handled by adding new APIs, similar to what C++ did with std::atomic. Hacking volatile just won't do.

@mikedn
Contributor

mikedn commented Jul 12, 2016

Even if the ECMA memory model only dealt with barrier semantics, the removal of a volatile read is potentially observable. For example, given the code in the original post, if the load is removed, the compiler is free to reorder the non-volatile write to b with respect to the volatile write to a that precedes it

And even on x86, where stores aren't reordered with other stores, you can still run into issues if a load is removed, because that load might have been there to prevent store/load reordering (a later load moving ahead of an earlier store), which x86 does allow. Consider Intel's store/load reordering example:

x = 1       y = 1
r1 = y      r2 = x

It is possible to get r1 == 0 and r2 == 0. What if we don't want that to happen? Let's add a load from the previously stored location:

x = 1       y = 1
t = x       t = y
r1 = y      r2 = x

The added load can't be reordered with the previous store because they access the same location. It also can't be reordered with the following load because x86 (supposedly) doesn't reorder loads with other loads.

Now, imagine what happens if the JIT comes in and eliminates the added load because t isn't used anywhere.
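The pattern above, transcribed into C# terms (illustrative only; the original example is pseudo-code, and the `t = x` / `t = y` read-backs are exactly the unused volatile reads at risk of elimination):

```csharp
static class StoreLoadExample
{
    static volatile int x;
    static volatile int y;

    static void Thread1()
    {
        x = 1;
        var t = x;  // read-back from the same location, meant to keep the
                    // following load of y from moving above the store to x
        int r1 = y;
    }

    static void Thread2()
    {
        y = 1;
        var t = y;  // same idea on the other thread
        int r2 = x;
    }
}
```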

That said, fixing the JIT's volatile load elimination doesn't actually fix this; that's another story. At least then the JIT will follow Intel's memory model, even if that doesn't accurately match .NET's.

@VSadov
Member

VSadov commented Mar 30, 2022

I believe this can be closed.

ECMA reserves the possibility of using device memory, where reads may have nontrivial side effects, but the possibility of such a scenario in combination with the GC seems very doubtful.
If device memory is supported one day, it would need to be very explicit unmanaged access, most likely via special intrinsics (i.e. Device.Read<T>(T* ptr)).

For the regular case we should treat the effect of a volatile read as "once we read, we may observe a different value," not as "once we read, a bulb lights up on the wall."
Therefore an unused volatile read has no observable effects.

With the above said, the JIT is right. Unused volatile reads can be dropped.

@VSadov VSadov self-assigned this Sep 5, 2022
@VSadov VSadov added the memory model issues associated with memory model label Sep 7, 2022
@VSadov
Member

VSadov commented Dec 27, 2022

Per the memory model document, it is permitted to drop unused reads (volatile or ordinary).

NB: coalescing adjacent volatile reads is not permitted, but dropping unused ones is ok.
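A small sketch of that distinction (illustrative only, assuming a volatile field `v`):

```csharp
static class DropVsCoalesce
{
    static volatile int v;

    static void Examples()
    {
        // Dropping: permitted. The read's value is never used, and under the
        // .NET memory model an unused volatile read has no observable effect.
        var unused = v;

        // Coalescing: not permitted. Both reads must actually happen; folding
        // them into a single read would hide a write by another thread that
        // lands between them.
        int r1 = v;
        int r2 = v;
    }
}
```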

@VSadov VSadov closed this as completed Dec 27, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jan 26, 2023