Question: .NET memory model and lightweight barrier for writes #6257

Closed
omariom opened this issue Jul 5, 2016 · 19 comments
Labels
memory model issues associated with memory model question Answer questions and provide assistance, not an issue with source code or documentation.

Comments

Contributor

omariom commented Jul 5, 2016

@stephentoub @jkotas

Volatile writes in .NET don't prevent subsequent writes from floating above them; that's the semantics of store-release. So in the following scenario:

Volatile.Write(ref locationA, 1);
locationB = 2; // normal write

those 2 writes can be reordered.

@JPWatson suggested adding a dummy volatile read of the just-written location, so that the pair forms a single barrier: earlier reads/writes cannot sink below it, and later writes cannot float above it.

Volatile.Write(ref locationA, 1);
_dummy = Volatile.Read(ref locationA);
locationB = 2; // normal write

The reasoning is this:

  1. volatile reads can't be reordered with the writes to the same memory location
  2. volatile operations can't be eliminated
  3. no memory operations can be reordered with a previous volatile read
  4. no memory operations can be reordered with a following volatile write
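
A minimal sketch of the pattern these four points describe. The names (WriteFenceSketch, Publish, _locationA, _locationB, _dummy) are hypothetical and not from the thread. The sketch only shows where the dummy read sits; whether it is a sufficient barrier is exactly the question being asked.

```csharp
using System.Threading;

// Hypothetical illustration; the class and member names are assumptions.
public static class WriteFenceSketch
{
    private static int _locationA;
    private static int _locationB;
    private static int _dummy;

    public static void Publish()
    {
        // Store-release: earlier memory operations cannot move below this write.
        Volatile.Write(ref _locationA, 1);

        // Load-acquire from the same location: later memory operations
        // cannot move above this read.
        _dummy = Volatile.Read(ref _locationA);

        // Ordinary write that is meant to stay after the write to _locationA.
        _locationB = 2;
    }

    // Accessors so a caller can observe the final values.
    public static int LocationA => Volatile.Read(ref _locationA);
    public static int LocationB => Volatile.Read(ref _locationB);
}
```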

It looks legitimate according to the ECMA CLI spec:

A volatile read has “acquire semantics” meaning that the read is guaranteed to occur prior to any
references to memory that occur after the read instruction in the CIL instruction sequence. A
volatile write has “release semantics” meaning that the write is guaranteed to happen after any
memory references prior to the write instruction in the CIL instruction sequence.
...
An optimizing compiler that converts CIL to native code shall not remove any volatile operation,
nor shall it coalesce multiple volatile operations into a single operation.

For reads this trick won't work, because in such a scenario, thanks to store buffer forwarding, reads can be reordered with the dummy read. @joeduffy explained it here and here.

Is this a legitimate way to create a barrier for subsequent writes, or do we have to use a full memory barrier?

Member

jkotas commented Jul 5, 2016

The volatile read in your example prevents reordering of the writes. However, making the second write volatile instead should have about the same effect:

// These 2 writes cannot be reordered
locationA = 1;
Volatile.Write(ref locationB, 2);

If your context is more complex, it may be useful to show the code for the other processor and then ask whether a certain result is possible, kind of like the examples in the "Examples Illustrating the Memory-Ordering Principles" chapter of the Intel manual.
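
As an illustration of that style of question, here is a sketch that shows both "processors" for the two-write example above. All names (MessagePassingSketch, _data, _ready, RunOnce) are hypothetical. The question it poses is "can the reader see _ready == 1 but _data == 0?"; with a volatile write on one side and a volatile read on the other, the ECMA release/acquire rules quoted earlier say it cannot.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical litmus-style example; all names are assumptions.
public static class MessagePassingSketch
{
    private static int _data;   // written with an ordinary store
    private static int _ready;  // written with a volatile (release) store

    // "Processor 0": the ordinary write cannot move below the volatile write.
    private static void Writer()
    {
        _data = 42;
        Volatile.Write(ref _ready, 1);
    }

    // "Processor 1": reads after the volatile (acquire) read cannot move above it.
    private static int Reader()
    {
        var spin = new SpinWait();
        while (Volatile.Read(ref _ready) == 0)
        {
            spin.SpinOnce();
        }
        return _data;
    }

    // Runs one writer and one reader concurrently and returns what the reader saw.
    public static int RunOnce()
    {
        _data = 0;
        Volatile.Write(ref _ready, 0);
        Task<int> reader = Task.Run(() => Reader());
        Task writer = Task.Run(() => Writer());
        Task.WaitAll(reader, writer);
        return reader.Result;
    }
}
```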

Contributor Author

omariom commented Jul 5, 2016

It all started here

Volatile.Write(ref length, 1);
// bunch of normal writes to other memory locations
Volatile.Write(ref length, 2);

The idea is to avoid a heavyweight MemoryBarrier between the first write to length and the normal writes of data, relying only on the ECMA CLI rules.
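
For comparison, a sketch of the heavyweight variant that this is trying to avoid, with a full fence after the first write to length. The names (LengthUpdateSketch, _length, _items, UpdateWithFullFence) and the array of data are assumptions made for illustration; Thread.MemoryBarrier() is the standard full fence.

```csharp
using System.Threading;

// Hypothetical baseline; the class, fields and data layout are assumptions.
public static class LengthUpdateSketch
{
    private static int _length;
    private static readonly int[] _items = new int[4];

    public static void UpdateWithFullFence()
    {
        Volatile.Write(ref _length, 1);

        // Full fence: no memory operation may be reordered across this point,
        // so the ordinary writes below cannot move above the first write.
        Thread.MemoryBarrier();

        // Bunch of normal writes to other memory locations.
        _items[0] = 10;
        _items[1] = 20;

        // Store-release: the ordinary writes above cannot move below this write.
        Volatile.Write(ref _length, 2);
    }

    public static int Length => Volatile.Read(ref _length);
    public static int ItemAt(int index) => _items[index];
}
```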

Contributor Author

omariom commented Jul 5, 2016

I can't check what code is generated for ARM.
This is how it looks on .NET CLR 4.6.1 x64.

static volatile int a;
static int b, dummy;

[MethodImpl(MethodImplOptions.NoInlining)]
static void LightweightStoreFence()
{
    a = 1;
    dummy = a;
    b = 2;
}

mov         dword ptr [7FE988F47A0h],1  
mov         eax,dword ptr [000007FE988F47A0h]  
mov         dword ptr [000007FE988F47A8h],eax  
mov         dword ptr [7FE988F47A4h],2  
ret

Everything seems OK: b = 2 comes after a = 1, and x86 doesn't reorder writes.

But will it be the same on other CPUs? For example, if we read into a local variable:

a = 1;
var dummy = a;
b = 2;

the JIT removes the volatile read completely:

mov         dword ptr [7FE989047A0h],1
mov         dword ptr [7FE989047A4h],2
ret  

And it is not clear whether the JIT did this because it determined that the dummy read can be removed while still preserving the order dictated by the CLI rules, or whether the order just happened to be preserved.

Member

jkotas commented Jul 5, 2016

The JIT does not do any reordering over volatile ops today - https://github.com/dotnet/coreclr/blob/master/src/jit/importer.cpp#L11296

Contributor

mikedn commented Jul 5, 2016

The JIT does not do any reordering over volatile ops today

Unless it happens to be buggy :)

Contributor Author

omariom commented Jul 5, 2016

@mikedn That's not counted :)

@jkotas

The JIT does not do any reordering over volatile ops today

So can we rely on that dummy volatile read into a local variable to act as a barrier for the following writes? In the future and across different CPUs, because this is how the ECMA CLI rules must be understood?

Member

jkotas commented Jul 5, 2016

I think so.

@jkotas jkotas closed this as completed Jul 7, 2016
Contributor Author

omariom commented Jul 7, 2016

@jkotas
It would be nice to have simple tests that validate the JIT against the model described in the ECMA spec.
How are similar conditions usually tested by the team?

Member

jkotas commented Jul 7, 2016

cc @dotnet/jit-contrib

There are a number of tests validating this in the repo (look for volatile under tests\src\JIT). If you believe that anything in particular is missing, we would be happy to take a PR with improvements.

Fully validating it is a hard problem. It would require proving that no combination of different optimizations violates the invariants ... https://en.wikipedia.org/wiki/Formal_methods or https://en.wikipedia.org/wiki/Model_checking are there to help.

Contributor

pgavlin commented Jul 7, 2016

@omariom: the ECMA volatile semantics are read-acquire and write-release, so the volatile read followed by a volatile write will do what you want. To be more explicit, memory operations that occur after a volatile read in program order cannot be moved above that volatile read, and memory operations that occur before a volatile write in program order cannot be moved below that volatile write. So if you took the sequence you mentioned:

Volatile.Write(ref length, 1);
// bunch of normal writes to other memory locations
Volatile.Write(ref length, 2);

And performed the transformation you suggested:

Volatile.Write(ref length, 1);
Volatile.Read(ref length);
// bunch of normal writes to other memory locations
Volatile.Write(ref length, 2);

  1. The bunch of normal writes would not be allowed to move above the volatile read due to read-acquire semantics.
  2. The bunch of normal writes would not be allowed to move below the volatile write due to write-release semantics.
  3. The fact that the same memory location is being accessed by the volatile read and the volatile write prevents the volatile read from being moved above the volatile write.

(3) implies that the read-acquire behavior of the volatile read applies transitively to the preceding volatile write, which means that the bunch of normal writes would not be allowed to move above the first volatile write. IIRC, if a different memory location was used for the volatile read, the third property would not be in play and the bunch of normal writes could still be reordered w.r.t. the first volatile write.

HTH.

Contributor

pgavlin commented Jul 7, 2016

(@ericeil can point out any flaws in my understanding and/or logic 😄)

Contributor

CarolEidt commented

@pgavlin - I don't believe that you need condition 3 to ensure that the volatile read is not moved above the volatile write. The first paragraph of I.12.6.4 in the spec ensures that:

Conforming implementations of the CLI are free to execute programs using any technology that guarantees, within a single thread of execution, that side-effects and exceptions generated by a thread are visible in the order specified by the CIL. For this purpose only volatile operations (including volatile reads) constitute visible side-effects.

Contributor

pgavlin commented Jul 7, 2016

Ah, perfect! I couldn't find that language under I.12.6.7, which covers volatile read and write semantics specifically. Thanks! In that case, any volatile read will do.

Contributor

pgavlin commented Jul 7, 2016

@omariom: @RussKeldorph pointed out that I managed to skim over the assembly dump you posted. That assembly (which I've reproduced locally) indicates that there is a bug in the JIT that is causing us to drop the volatile read, as ECMA specifically disallows removing or coalescing volatile operations. I've opened dotnet/coreclr#6172 to track.

Contributor Author

omariom commented Jul 7, 2016

@CarolEidt

side-effects and exceptions generated by a thread are visible in the order specified by the CIL. For this purpose only volatile operations (including volatile reads) constitute visible side-effects.

Does it mean a volatile write can't be reordered with a following volatile read from a different location?

Contributor Author

omariom commented Jul 7, 2016

@pgavlin wow! I thought that was legal.

Contributor

CarolEidt commented

@omariom - yes, the way I would read that section is that it would be illegal to reorder any volatile operations relative to one another, even if they involve different locations. Though, to be fair, it has often been observed that the section in question is not as rigorous as it could be.

Contributor

mikedn commented Jul 12, 2016

yes, the way I would read that section is that it would be illegal to reorder any volatile operations relative to one another, even if they involve different locations.

This is not what happens currently. The JIT doesn't use any fences for volatile loads/stores, and x86 allows some load/store reordering (a load may be reordered with an earlier store to a different location). That said, on x86 stores aren't reordered with other stores, so in this particular case the load isn't actually needed.
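
A sketch of the classic store-buffering litmus test that illustrates this kind of store/load reordering. All names (StoreBufferingSketch, CountBothZero, _x, _y) are hypothetical. Each thread does a volatile write to one location and then a volatile read from the other. Without a full fence, both reads may return 0 on x86 if no fence is emitted for volatile accesses; with Thread.MemoryBarrier() between the write and the read, that outcome is ruled out.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical store-buffering litmus test; all names are assumptions.
public static class StoreBufferingSketch
{
    private static int _x, _y;

    // Returns how many of 'iterations' runs observed r1 == 0 && r2 == 0.
    public static int CountBothZero(int iterations, bool useFullFence)
    {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++)
        {
            _x = 0;
            _y = 0;
            int r1 = -1, r2 = -1;

            // The barrier releases both tasks at roughly the same moment.
            using (var start = new Barrier(2))
            {
                Task t1 = Task.Run(() =>
                {
                    start.SignalAndWait();
                    Volatile.Write(ref _x, 1);
                    if (useFullFence) Thread.MemoryBarrier();
                    r1 = Volatile.Read(ref _y);
                });
                Task t2 = Task.Run(() =>
                {
                    start.SignalAndWait();
                    Volatile.Write(ref _y, 1);
                    if (useFullFence) Thread.MemoryBarrier();
                    r2 = Volatile.Read(ref _x);
                });
                Task.WaitAll(t1, t2);
            }

            if (r1 == 0 && r2 == 0) bothZero++;
        }
        return bothZero;
    }
}
```

Calling CountBothZero(n, useFullFence: false) may or may not return a non-zero count on a given machine and run; the only outcome the sketch relies on is that the fenced variant never observes both zeros.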

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2020
@VSadov VSadov self-assigned this Sep 7, 2022
@VSadov VSadov added the memory model issues associated with memory model label Sep 7, 2022