### Lecture 15 — Memory Consistency

Patrick Lam & Jeff Zarnett patrick.lam@uwaterloo.ca

Department of Electrical and Computer Engineering University of Waterloo

April 4, 2019

ECE 459 Winter 2019 1/25

#### **Memory Models**

Sequential program: statements execute in order.
Your expectation for concurrency: sequential consistency.

"... the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." — Leslie Lamport

#### In brief:

- 1 for each thread: in-order execution;
- interleave the threads' executions.

No one has it: too expensive; recall the worked example for **flush** last time.

ECE 459 Winter 2019 2/

# Memory Models: Sequential Consistency

Another view of sequential consistency:

- each thread induces an execution trace.
- always: program has executed some prefix of each thread's trace.

ECE 459 Winter 2019 3/2!

# The Blind Men and Elephant



But unfortunately, threads have their own view of the world.

ECE 459 Winter 2019 4/25

Compilers and processors may reorder non-interfering memory operations.

$$T1: x = 1; r1 = y;$$

If two statements are independent:

- OK to execute them in either order.
- (equivalently: publish their results to other threads).

Reordering is a major compiler tactic to produce speedup.

ECE 459 Winter 2019 5/

### **Memory Consistency Models**

#### Sequential consistency:

■ No reordering of loads/stores.

Sequential consistency for datarace-free programs:

■ If your program has no data races, then sequential consistency.

Relaxed consistency (only some types of reorderings):

- Loads can be reordered after loads/stores; and
- Stores can be reordered after loads/stores.

#### Weak consistency:

■ Any reordering is possible.

Still, **reorderings** only allowed if they look safe in current context (i.e. independent; different memory addresses).

ECE 459 Winter 2019 6 / 25

#### 2011 Final Exam Question

```
x = y = 0

/* thread 1 */

x = 1;

r1 = y;

x = 0

/* thread 2 */

y = x;

r2 = x;
```

Assume architecture not sequentially consistent (weak consistency).

Show me all possible (intermediate and final) memory values and how they arise.

ECE 459 Winter 2019 7/25

#### 2011 Final Exam Question: Solution

Must include every permutation of lines (since they can be in any order); then iterate over all the values.

Probably too long, but shows how memory reorderings complicate things.

ECE 459 Winter 2019 8 / 25

## The Compiler Reorders Memory Accesses

When it can prove safety, the **compiler** may reorder instructions (not just the hardware).

**Example:** want thread 1 to print value set in thread 2.

■ If thread 2 reorders its instructions, will we get our intended result?

No.

ECE 459 Winter 2019 9/25

## The Compiler Reorders Memory Accesses

When it can prove safety, the **compiler** may reorder instructions (not just the hardware).

**Example:** want thread 1 to print value set in thread 2.

■ If thread 2 reorders its instructions, will we get our intended result?

No.

ECE 459 Winter 2019 9/25

#### **KEEP OUT**



Image Credit: MB298

## **Preventing Memory Reordering**

A **memory fence** prevents memory operations from crossing the fence (also known as a **memory barrier**).

■ Now prevents reordering; get expected result.

ECE 459 Winter 2019 11/25

## **Preventing Memory Reordering in Programs**

Step 1: Don't use volatile on C/C++ variables <sup>1</sup>. Syntax depends on the compiler.

■ Microsoft Visual Studio C++ Compiler:

```
_ReadWriteBarrier()
```

■ Intel Compiler:

```
__memory_barrier()
```

■ GNU Compiler:

```
__asm__ __volatile__ ("" ::: "memory");
```

The compiler also shouldn't reorder across e.g. pthreads mutex calls.

### Aside: gcc Inline Assembly

Just as an aside, here's gcc's inline assembly format

Last slide used \_\_volatile\_\_ with \_\_asm\_\_. This isn't the same as the normal C volatile. It means:

■ The compiler may not reorder this assembly code and put it somewhere else in the program.

ECE 459 Winter 2019 13/25

# Memory Fences: Preventing HW Memory Reordering

Memory barrier: no access after the barrier becomes visible to the system (i.e. takes effect) until after all accesses before the barrier become visible.

Note: these are all x86 asm instructions.

#### mfence:

 All loads and stores before the fence finish before any more loads or stores execute.

#### sfence:

■ All stores before the fence finish before any more stores execute.

#### I fence:

■ All loads before the fence finish before any more loads execute.

ECE 459 Winter 2019 14/25

## Preventing Hardware Memory Reordering (Option 2)

Some compilers also support preventing hardware reordering:

■ Microsoft Visual Studio C++ Compiler:

```
MemoryBarrier();
```

Solaris Studio (Oracle) Compiler:

```
__machine_r_barrier();
__machine_w_barrier();
__machine_rw_barrier();
```

■ GNU Compiler:

```
___sync_synchronize();
```

ECE 459 Winter 2019 15 / 25

### **Memory Barriers and OpenMP**

Fortunately, an OpenMP **flush** (or, better yet, mutexes) also preserve the order of variable accesses.

Stops reordering from both the compiler and hardware.

For GNU, flush is implemented as \_\_sync\_synchronize();

**Note:** proper use of memory fences makes volatile not very useful (again, volatile is not meant to help with threading, and will have a different behaviour for threading on different compilers/hardware).

ECE 459 Winter 2019 16/25

## **Atomic Operations**



https://commons.wikimedia.org/w/index.php?curid=1675352

ECE 459 Winter 2019 17 / 25

### **Atomic Operations**

We saw the **atomic** directive in OpenMP, plus C++11 atomics.

Most OpenMP atomic expressions map to atomic hardware instructions.

Other atomic instructions exist.

ECE 459 Winter 2019 18 / 25

Also called **compare and exchange** (cmpxchg instruction).

```
int compare_and_swap (int* reg, int oldval, int newval) {
  int old_reg_val = *reg;
  if (old_reg_val == oldval)
    *reg = newval;
  return old_reg_val;
}
```

- Afterwards, you can check if it returned oldval.
- If it did, you know you changed it.

ECE 459 Winter 2019 19 / 25

### Implementing a Spinlock

Use compare-and-swap to implement spinlock:

```
void spinlock_init(int* lock) { *lock = 0; }

void spinlock_lock(int* lock) {
    while(compare_and_swap(lock, 0, 1) != 0) {}
    __asm__ ("mfence");
}

void spinlock_unlock(int* lock) {
    __asm__ ("mfence");
    *lock = 0;
}
```

You'll see **cmpxchg** quite frequently in the Linux kernel code.

ECE 459 Winter 2019 20 / 25

#### **ABA Problem**

Sometimes you'll read a location twice.

If the value is the same, nothing has changed, right?

ECE 459 Winter 2019 21/25

#### **ABA Problem**

Sometimes you'll read a location twice.

If the value is the same, nothing has changed, right?

No. This is an ABA problem.

You can combat this by "tagging": modify value with nonce upon each write.

Can keep value separately from nonce; double compare and swap atomically swaps both value and nonce.

ECE 459 Winter 2019 21/25



The ABA problem is not any sort of acronym nor a reference to this: https://www.youtube.com/watch?v=Sj\_9CiNkkn4

ECE 459 Winter 2019 22 / 25

It's a value that is A, then changed to B, then changed back to A.

The ABA problem is a big mess for the designer of lock-free Compare-And-Swap routines.

- 1  $P_1$  reads  $A_i$  from location  $L_i$ .
- 2  $P_k$  interrupts  $P_1$ ;  $P_k$  stores the value B into  $L_i$ .
- $P_j$  stores the value  $A_i$  into  $L_i$ .
- $P_1$  resumes; it executes a false positive CAS.

ECE 459 Winter 2019 23 / 25

#### False Positive

It's a "false positive" because  $P_1$ 's compare-and-swap operation succeeds even though the value at  $L_i$  has been modified in the meantime.

If this doesn't seem like a bad thing, consider this.

If you have a data structure that will be accessed by multiple threads, you might be controlling access to it by the compare-and-swap routine.

What should happen is the algorithm should keep trying until the data structure in question has not been modified by any other thread in the meantime.

But with a false positive we get the impression that things didn't change, even though they really did.

ECE 459 Winter 2019 24 / 25

### Napoleon was defeated...

You can combat this by "tagging": modify value with nonce upon each write.

You can also keep the value separately from the nonce; double compare and swap atomically swaps both value and nonce.

Another example of this: Java ConcurrentModificationException is detected by checking the modification count of a collection.

ECE 459 Winter 2019 25 / 25