### CppCon 2017: Fedor Pikus “C++ atomics, from basic to advanced. What do they really do?”

https://www.youtube.com/watch?v=ZQFzMfHIxng&list=PLR2BwNxHx0z8ccXrXKzuTnsfB17fidY3M&t=2962s&index=10

#### 2:25 https://youtu.be/ZQFzMfHIxng?t=145

```cpp
#include <atomic>
#include <cstddef>
#include <iostream>
#include <mutex>
```

```cpp
// Program A:
{
    const size_t N = 5;
    long a[N] = { 1, 2, 3, 4, 5 }; // can be very long

    std::atomic<long> sum{};
    auto do_work = [&](size_t N, long* a) {
        for (size_t i = 0; i < N; ++i) {
            sum += a[i];
        }
    };

    do_work(N, a);
    std::cout << sum << '\n';
}
```

15




```cpp
// Program B:
{
    const size_t N = 5;
    long a[N] = { 1, 2, 3, 4, 5 }; // can be very long

    long sum(0);
    std::mutex M;
    auto do_work = [&](size_t N, long* a) {
        long s = 0;
        for (size_t i = 0; i < N; ++i) {
            s += a[i];
        }
        std::lock_guard<std::mutex> L(M);
        sum += s;
    };

    do_work(N, a);
    std::cout << sum << '\n';
}
```

15




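**2:32 and 3:07**: the benchmark shows that the "lock-free" Program A is much slower! Why? Program A performs an atomic read-modify-write on the shared `sum` for every single element, so concurrent threads fight over the same cache line on every iteration; Program B accumulates into the local `s` and takes the lock only once per call.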

### Is lock-free faster?
- Algorithm rules supreme: **the algorithm matters most**
- "Wait-free" has nothing to do with time
  - Wait-free refers to the number of compute "steps"
  - Steps do not have to be of the same duration
- **Atomic operations do not guarantee good performance**
- There is no substitute for understanding what you're doing
  - This class is the next best thing

#### 4:24 What is an atomic operation?
#### 8:50 Data sharing in C++

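```cpp
{
    std::atomic<int> x(0); // good
    // std::atomic<int> x = 0; // ill-formed before C++17 (copy constructor is deleted); compiles since C++17 thanks to guaranteed copy elision, so not a cling bug

    ++x; // atomic!!!
    std::cout << x;
}
```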


1



#### 10:44 What types can be made atomic?
- Any **trivially copyable** type can be made atomic
- What is trivially copyable?
  - Contiguous chunk of memory
  - Copying the object means copying all bits (memcpy)
  - No virtual functions; copy/move constructors, copy/move assignment, and destructor are trivial (compiler-generated)

#### 11:55 What operations can be done on `std::atomic<int>`?

One of these is not the same as the others:
```cpp
++x;
x++;
x += 1;
x != 2;
x *= 2;
int y = x * 2;
x = y + 1;
x = x + 1;
x = x * 2;
```

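- `x *= 2;` does not compile! There is no atomic multiply on most hardware. Whenever an operator on `std::atomic` compiles, C++ must guarantee that the operation is atomic, so `*=` is simply not provided.
- The last two lines (`x = x + 1;` and `x = x * 2;`) are not atomic either, even though they compile
  - There are two atomic operations in one line: an atomic read followed by an atomic write!
  - Another thread can change the atomic variable between these two atomic operations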


#### `std::atomic<T>` and overloaded operators

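- `std::atomic<T>` provides overloaded operators only for operations that can be done atomically; anything else does not compile
- Note: an expression containing an atomic variable can still compile; the problem is that the expression as a whole is not necessarily one atomic operation. **This makes bugs very easy to write!**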

#### 15:41 What "other operations" can be done on `std::atomic<T>`?

- Explicit reads/writes
```cpp
T y = x.load(); // same as T y = x;
x.store(y); // same as x = y;
```
- Atomic exchange:
```cpp
T z = x.exchange(y); // Atomically: z = x; x = y;
```

- Compare-and-swap (conditional exchange):
```cpp
bool success = x.compare_exchange_strong(y, z); // T& y;
    // if x==y, make x=z and return true;
    // Otherwise, set y=x and return false
```

- CAS is the **key to most lock-free algorithms**

#### 17:07 What is so special about CAS?

##### Example: atomic increment with CAS:

```cpp
{
    std::atomic<int> x{0};
    int x0 = x;
    while(!x.compare_exchange_strong(x0, x0+1)) {}
    
    std::cout << "x = " << x << ", x0 = " << x0;
}
```

x = 1, x0 = 0



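In the example above:
- If no other thread is operating on the atomic variable x, the CAS succeeds right away, returns true, and the loop exits
- If another thread's CAS changes x in the meantime, **x0 is reassigned the new value of x**, the CAS returns false, and the loop repeats until this CAS beats everyone else's CAS
- This is **lock-free, not wait-free**: some thread always makes progress, but a particular thread may keep losing and retrying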

Some atomic operations are simpler, but a CAS loop can do anything:
- increment doubles
- multiply integers
- and many more
```cpp
while(!x.compare_exchange_strong(x0, x0*2)) {}
```

#### 18:54 What "other operations" can be done on `std::atomic<T>`?

- fetch_add

```cpp
{
    std::atomic<int> x{1};
    int y = 2;
    int z = x.fetch_add(y); // same as x += y, but returns the old x
    std::cout << "x = " << x << ", z = " << z;
}
```

x = 3, z = 1



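- fetch_sub, fetch_and, fetch_or, fetch_xor
  - same as the `-=`, `&=`, `|=`, `^=` operators, except that they return the old value

- more verbose, but **less error-prone than operators and expressions**
  - With operators and expressions, it is hard to notice that an expression is made of several atomic operations and is not atomic as a whole
  - With explicit function calls, each call visibly corresponds to one atomic operation, so it is intuitive that several calls together are not atomic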

#### 21:22 How fast are atomic operations?
- Performance should be measured
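- Results depend on the hardware and on the compiler!

#### 22:30 atomic vs. non-atomic: results
- atomic is slightly slower

#### 23:19 atomic vs. locks: results
- mutex is considerably slower
- spinlock is almost as fast as atomic
- 26:23: CAS is somewhat slower than atomic/spinlock but faster than mutex, i.e. in between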

#### 26:30 Is atomic the same as lock-free?
`std::atomic` hides a huge secret: it is not always lock-free

```cpp
{
    struct A {long x;};
    struct B {long x; long y;};
    struct C {long x; long y; long z;};
    std::cout << "A lock-free: " << std::atomic<A>{}.is_lock_free() << '\n'
              << "B lock-free: " << std::atomic<B>{}.is_lock_free() << '\n'; // maybe
            //<< "C lock-free: " << std::atomic<C>{}.is_lock_free() << '\n'; // cling error, should return 0
}
```

A lock-free: 1
B lock-free: 1




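- `is_lock_free()` is a runtime function; why not compile time?
  - Because of alignment: whether a particular object can be accessed atomically without a lock can depend on how that object is aligned in memory
- C++17 provides a compile-time constant:
  - `static constexpr bool is_always_lock_free` (a static data member, not a function)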

#### 29:43 Do atomic operations wait on each other?
Testing of 3 cases:
1. shared: `std::atomic<int> x;` ++x in two threads
2. not shared: like case 1, but each thread increments its own atomic variable, and the two variables are in different cache lines
3. non-shared (false sharing): `std::atomic<int> x[2];` ++x[0] in thread 1, ++x[1] in thread 2
  - this is actually falsely shared, because x[0] and x[1] are in the same cache line

The test results are at 31:52
- cases 1 and 3 are slower than case 2

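**Conclusion**:
- Atomic operations do wait on each other: they have to wait for access to the cache line
  - This is the price of data sharing without races
  - Even accesses to different atomic variables can land on the same cache line (false sharing) and still pay the real-time penalty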

#### 33:03 Strong and weak CAS

- `x.compare_exchange_strong(old_x, new_x); // T& old_x`
```cpp
// performed as one atomic step:
if (x == old_x) { x = new_x; return true; }
else { old_x = x; return false; }
```
- `x.compare_exchange_weak(old_x, new_x);`
  - same thing, but can "spuriously fail" and return false even if x == old_x
  - what is the value of old_x if this happens?
  - if weak CAS can correctly tell that x == old_x, why would it fail?

##### CAS, conceptually (pseudo-code):
```cpp
bool compare_exchange_strong(T& old_v, T new_v) {
    Lock L;        // Get exclusive access
    T tmp = value; // Current value of the atomic
    if (tmp != old_v) { old_v = tmp; return false; }
    value = new_v;
    return true;
}
```