![Erudio logo](../img/erudio-logo-small.png)

# Thread Data and Race Conditions

We saw at the end of the last lesson that threads can access shared data. This is useful for sharing configuration, but it introduces problems. The main one is _[**race conditions**](https://en.wikipedia.org/wiki/Race_condition)_. Let us create an example.

In [1]:
from threading import Thread, current_thread, Lock
from time import sleep, time
from sys import stderr

## Shared state

In the previous lesson, we created a dictionary and ran into issues when different threads updated its values. But a single shared scalar value is enough to see the problem.

In [2]:
def increment(n):
    global counter
    for _ in range(n):
        counter += 1

In [3]:
counter, nthread, nloop = 0, 100, 55_000 
threads = [Thread(target=increment, args=(nloop,)) for _ in range(nthread)]
for t in threads:
    t.start()

# Make sure they have finished before we report
while alive := sum(t.is_alive() for t in threads):
    sleep(2)
    print("Num threads alive:", sum(t.is_alive() for t in threads))
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")    

55,000 loops X 100 threads -> counter is 5,500,000


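### Where did we go wrong?

In this particular run the final count happens to be correct, but that is luck rather than correctness: whether updates get lost depends on exactly when the interpreter switches between threads, which varies from run to run and between Python versions. Nothing in the code guarantees the right answer. The subtlety in our program is that the augmented assignment `+=` is not **atomic**.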

Behind the scenes, `+=` runs 2 (or more) operations. In pseudo-code, this looks like:

```
add(counter, 1) TO <temp>  # A
store("counter", <temp>)   # B
```

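Because multiple threads execute concurrently, one of them may hold a value that becomes stale between the time the addition is performed (A) and the time the name `counter` is rebound (B). When that thread finally stores its result, it overwrites any increments the other threads made in the meantime, and those increments are lost.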
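To make the interleaving concrete, here is a deterministic, single-threaded replay of one unlucky schedule. It is only a simulation: the variables `t1_temp` and `t2_temp` are hypothetical stand-ins for the `<temp>` value each thread holds between steps A and B.

```python
# Single-threaded replay of one unlucky interleaving of two "threads".
# t1_temp / t2_temp stand in for each thread's private <temp> value.
counter = 0

t1_temp = counter + 1   # Thread 1, step A: reads 0, computes 1
# ... the interpreter switches to Thread 2 before Thread 1 reaches step B
t2_temp = counter + 1   # Thread 2, step A: also reads 0, computes 1
counter = t2_temp       # Thread 2, step B: counter is now 1
# ... switch back to Thread 1, which still holds its stale result
counter = t1_temp       # Thread 1, step B: counter is still 1

print(counter)  # 1, even though two increments ran
```

Two increments ran, but the counter only advanced by one: Thread 2's update was silently overwritten.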

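To be more exact, we can look at the Python bytecode (wordcode). In principle, a thread may be suspended between any two instructions, and **four** instructions are involved in the single augmented assignment (source line 4 in the listing below). The exact opcode names vary between Python versions: this listing comes from Python 3.10, while Python 3.11 and later use `BINARY_OP` in place of `INPLACE_ADD`.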

In [4]:
import dis
dis.dis(increment)

  3           0 LOAD_GLOBAL              0 (range)
              2 LOAD_FAST                0 (n)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                 6 (to 22)
             10 STORE_FAST               1 (_)

  4          12 LOAD_GLOBAL              1 (counter)
             14 LOAD_CONST               1 (1)
             16 INPLACE_ADD
             18 STORE_GLOBAL             1 (counter)
             20 JUMP_ABSOLUTE            4 (to 8)

  3     >>   22 LOAD_CONST               0 (None)
             24 RETURN_VALUE


## Thread Synchronization

<img src="../img/recording_studio_light.png" width="25%" align="right"/>How can we fix the race condition? 

We need a way to keep the threads from stepping onto each other's data, some signal that a resource is **busy**.

*(For example, in INE's studios, when the recording light is on, the studio is busy and nobody enters the room.)*

The most basic synchronization mechanism is a [Lock](https://en.wikipedia.org/wiki/Lock_(computer_science)), or a Mutex (mutual exclusion lock). Python includes the very intuitive `threading.Lock` class. 

A Lock works like the RECORDING light pictured. The first person (thread) that "arrives" turns on the light (acquires the lock). Anyone else has to wait for the person/thread to turn the light off and make the room (resource) available again.

## Locking

In [5]:
lock = Lock()

def lock_hogger(lock, wait=5):
    name = current_thread().name
    print(f"{name}: acquiring lock.")
    lock.acquire()
    print(f"{name}: Lock acquired, sleeping")
    sleep(wait)
    print(f"{name}: Woke up, releasing lock")
    lock.release()

Thread(target=lock_hogger, args=(lock,)).start()

Thread-105 (lock_hogger): acquiring lock.
Thread-105 (lock_hogger): Lock acquired, sleeping


We can ask about the state of a lock.

In [6]:
print("Is lock currently being used?", lock.locked())

# Claim the lock
lock.acquire()
print("Lock acquired?", lock.locked())

# We are done now
lock.release()
print("Still being used?", lock.locked())

Is lock currently being used? True
Thread-105 (lock_hogger): Woke up, releasing lock
Lock acquired? True
Still being used? False


If we only ever ran sequential code, the lock would just be equivalent to a Boolean value.  But it is shared across all threads that have it in scope.  When another thread tries to acquire a lock that is in use, it will block until the lock becomes free.
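A minimal sketch of that blocking behavior, using only the standard library: the main thread holds a lock while a helper thread (the hypothetical `waiter` function) tries to take it. The 0.2-second delay is arbitrary, and `demo_lock` is a separate name so the notebook's `lock` is left alone.

```python
from threading import Thread, Lock
from time import sleep, monotonic

demo_lock = Lock()
waited = []   # filled in by the helper thread

def waiter():
    start = monotonic()
    with demo_lock:                  # blocks while the main thread holds it
        waited.append(monotonic() - start)

demo_lock.acquire()                  # the main thread takes the lock first
t = Thread(target=waiter)
t.start()
sleep(0.2)                           # meanwhile, waiter is stuck in acquire()
demo_lock.release()                  # the lock is free: waiter can proceed
t.join()
print(f"waiter blocked for about {waited[0]:.2f} seconds")
```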

It is important (and sometimes tricky) to get the sequence of actions right, since releasing a lock that is not currently held raises a `RuntimeError`.

In [7]:
lock = Lock()
t = Thread(target=lock_hogger, args=(lock, 0.1))
t.start()
print("Releasing lock in MainThread...")
lock.release()

Thread-106 (lock_hogger): acquiring lock.
Thread-106 (lock_hogger): Lock acquired, sleeping
Releasing lock in MainThread...


Exception in thread Thread-106 (lock_hogger):
Traceback (most recent call last):
  File "C:\Users\Laisha\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Laisha\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Laisha\AppData\Local\Temp\ipykernel_7092\1266546085.py", line 10, in lock_hogger
RuntimeError: release unlocked lock


Thread-106 (lock_hogger): Woke up, releasing lock


Which of the competing `lock.release()` calls raises the exception depends on the timing of the threads.

In [8]:
lock = Lock()
t = Thread(target=lock_hogger, args=(lock, 0))
t.start()
sleep(0.1)
try:
    print("Releasing lock in MainThread...")
    lock.release()
except Exception as err:
    print(f"{repr(err)} in {current_thread().name}", file=stderr)

Thread-107 (lock_hogger): acquiring lock.
Thread-107 (lock_hogger): Lock acquired, sleeping
Thread-107 (lock_hogger): Woke up, releasing lock
Releasing lock in MainThread...


RuntimeError('release unlocked lock') in MainThread
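In both runs, one thread successfully released a lock that a *different* thread had acquired. That is because a plain `threading.Lock` has no owner: the Python documentation states that `release()` may be called from any thread, not only the one that acquired it. The deterministic sketch below (the `releaser` helper is hypothetical) shows both halves of this behavior without depending on timing.

```python
from threading import Thread, Lock

demo_lock = Lock()
demo_lock.acquire()                  # acquired by the main thread

def releaser():
    demo_lock.release()              # released by a *different* thread: allowed

t = Thread(target=releaser)
t.start()
t.join()
print("Still locked?", demo_lock.locked())   # Still locked? False

# Releasing a lock that is not currently held is always an error
try:
    demo_lock.release()
    raised = False
except RuntimeError as err:
    raised = True
    print(f"{err!r} in MainThread")
```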


## Fixing the counter

Now that we know about locks, we can use them to fix our counter example:

In [9]:
lock, counter, nthread, nloop = Lock(), 0, 100, 70_000 

def increment(n, lock):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1
        lock.release()

threads, now = [], time()
for _ in range(nthread):
    t = Thread(target=increment, args=(nloop, lock))
    threads.append(t)
    t.start()

In [10]:
while alive := sum(t.is_alive() for t in threads):
    sleep(5)
    print("Num threads alive:", sum(t.is_alive() for t in threads))
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
print(f"Running everything took {time()-now:.2f} seconds!")

Num threads alive: 100
Num threads alive: 100
Num threads alive: 99
Num threads alive: 99
Num threads alive: 99
Num threads alive: 99
Num threads alive: 99
Num threads alive: 98
Num threads alive: 0
70,000 loops X 100 threads -> counter is 7,000,000
Running everything took 45.93 seconds!


## Problems with synchronization

Locks are acquired before entering what we call **critical sections**: sections of code that access shared resources and can therefore introduce race conditions.

The problem is that locks are *cooperative*: nothing forces any code to use them. If just one function in the codebase handles the lock incorrectly, the problems can propagate.
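A small sketch of what *cooperative* means in practice (the `shared` dictionary and the `rude` function are hypothetical): a lock only protects data from code that bothers to acquire it.

```python
from threading import Thread, Lock

demo_lock = Lock()
shared = {"value": 0}

def rude():
    # Never looks at the lock, so nothing stops it from writing
    shared["value"] += 100

demo_lock.acquire()          # the main thread "protects" shared...
t = Thread(target=rude)
t.start()
t.join()                     # ...yet rude() still ran to completion
demo_lock.release()
print(shared["value"])       # 100
```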

Possible problems:

1. You fail to recognize that there is a "critical section".
2. You fail to acquire the lock before entering the critical section.
3. The critical section might use resources NOT protected by the lock.
4. You fail to release the lock (e.g. the code raises an exception before releasing it).
5. Deadlocks! (more in the next lesson).

### Exceptions

What if our function doesn't run as hoped?

In [11]:
lock = Lock()

def faulty_lock_handler(lock, wait=10):
    print("\t\tThread: Acquiring lock.")
    lock.acquire()
    print("\t\tThread: Lock acquired")
    sleep(wait)
    print("\t\tThread: Woke up, releasing lock")
    lock.release()

In [12]:
# The `wait` param is incorrect, should be a number
Thread(target=faulty_lock_handler, args=(lock, 'x')).start()

		Thread: Acquiring lock.

Exception in thread Thread-208 (faulty_lock_handler):
Traceback (most recent call last):
  File "C:\Users\Laisha\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Laisha\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Laisha\AppData\Local\Temp\ipykernel_7092\1416126950.py", line 7, in faulty_lock_handler
TypeError: 'str' object cannot be interpreted as an integer



		Thread: Lock acquired


The thread crashed while holding the lock, so `lock.release()` was never called. Trying to acquire the lock now will block **FOREVER**:

In [None]:
lock.acquire()

We can pass a _timeout_ to the `acquire()` method: it blocks for at most the specified number of seconds, and returns `False` if it could not acquire the lock (`True` if it did):

In [None]:
lock.acquire(timeout=2)
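A self-contained sketch of the timeout behavior. Here the lock is simply held by the main thread itself, standing in for the crashed thread; the 0.1-second timeout is arbitrary, and `demo_lock` is a separate name so the notebook's `lock` is untouched.

```python
from threading import Lock
from time import monotonic

demo_lock = Lock()
demo_lock.acquire()                       # stands in for a lock nobody will release

start = monotonic()
got_it = demo_lock.acquire(timeout=0.1)   # gives up after roughly 0.1 s
elapsed = monotonic() - start
print(got_it, f"after {elapsed:.2f} s")
```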

We can also make it non-blocking: if the lock cannot be acquired right away, `acquire(blocking=False)` returns `False` immediately:

In [None]:
lock.acquire(blocking=False)
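A common use of a non-blocking acquire is a "try-lock": do the work if the lock is free, otherwise do something else instead of waiting. A minimal sketch, where `try_work` is a hypothetical helper and the returned strings stand in for real work:

```python
from threading import Lock

def try_work(lock):
    """Do the critical work only if the lock is free right now."""
    if not lock.acquire(blocking=False):
        return "busy, will try later"   # returned immediately, no waiting
    try:
        return "did the work"           # the critical section would go here
    finally:
        lock.release()                  # runs even after `return`

demo_lock = Lock()
print(try_work(demo_lock))   # did the work
demo_lock.acquire()          # now someone else "holds" the lock
print(try_work(demo_lock))   # busy, will try later
demo_lock.release()
```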

Since we have a handle on the lock, we can release it ourselves. This is dangerous, though: what if the thread really **is** still using the resource?

In [None]:
lock.release()
lock.acquire(blocking=False)

## Context Manager

To greatly ease this problem, we can use locks as Context Managers. This will release the lock **even if** something goes wrong within the critical section:

In [None]:
def fixed_lock_handler(lock, wait=10):
    print("\t\tThread: Acquiring lock.")
    with lock:
        print("\t\tThread: Lock acquired")
        sleep(wait)
        # Still inside the `with` block: the lock is released right after this
        print("\t\tThread: Woke up, releasing lock")

In [None]:
lock = Lock()
Thread(target=fixed_lock_handler, args=(lock, 5)).start()

In [None]:
lock.acquire()

In [None]:
lock.release()

In [None]:
# The `wait` param is incorrect, should be a number
Thread(target=fixed_lock_handler, args=(lock, 'x')).start()

Is the lock still acquired?

In [None]:
lock.locked()

In [None]:
lock.acquire()

In [None]:
lock.release()

The critical section failed with an exception, but the lock was released on the way out. Using the lock as a context manager in a `with` statement is syntactic sugar for this pattern:

```python
lock.acquire()
try:
    critical_section()
finally:
    lock.release()  # We'll release the lock no matter what
```
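A quick deterministic check of that guarantee, without threads: an exception raised inside the `with` block still leaves the lock free afterwards. The `ValueError` is just a stand-in for any failure in the critical section.

```python
from threading import Lock

demo_lock = Lock()
try:
    with demo_lock:
        raise ValueError("critical section failed")
except ValueError as err:
    print("Caught:", err)

print("Still locked?", demo_lock.locked())   # Still locked? False
```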

## Improving the threaded counter

The final touch for our counter is to use the lock's context manager protocol:

In [None]:
lock, counter, nthread, nloop = Lock(), 0, 100, 50_000 

def increment(n, lock):
    global counter
    for _ in range(n):
        with lock:
            counter += 1
            
threads, now = [], time()
for _ in range(nthread):
    t = Thread(target=increment, args=(nloop, lock))
    threads.append(t)
    t.start()

In [None]:
while alive := sum(t.is_alive() for t in threads):
    sleep(5)
    print("Num threads alive:", sum(t.is_alive() for t in threads))
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
print(f"Running everything took {time()-now:.2f} seconds!")

## Coarser Locks

The threaded program is a **whole lot** slower than a serial version! The augmented assignment `+=` is not *atomic*, but it is still far less work than acquiring and releasing a lock, so the running time is dominated by juggling the lock.
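To get a rough feel for that cost, here is a sketch that times a bare increment against the same increment wrapped in `with lock:`, using the standard `timeit` module. The absolute numbers depend on the machine and Python version, so none are shown here; only the ratio is of interest.

```python
import timeit

N = 200_000
plain_time = timeit.timeit("x += 1", setup="x = 0", number=N)
lock_time = timeit.timeit(
    "with lock:\n    x += 1",
    setup="from threading import Lock; lock = Lock(); x = 0",
    number=N,
)
print(f"plain += : {plain_time:.4f} s")
print(f"with lock: {lock_time:.4f} s ({lock_time / plain_time:.1f}x slower)")
```

This only measures uncontended locking in a single thread; with 100 threads competing for the same lock, each acquisition can be considerably more expensive.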

In [None]:
%%time
counter, nthread, nloop = 0, 100, 50_000 

def increment(nthread, nloop):
    global counter
    for n in range(nloop):
        for m in range(nthread):
            counter += 1
            
increment(nthread, nloop)

Even the threaded version without locks is slower than the serial version, but not hundreds of times slower. By acquiring and releasing the lock less often, and doing more work each time we hold it, we can strike a compromise.

In [None]:
lock, counter, nthread, nloop = Lock(), 0, 100, 50_000
coarseness = 1000

def increment(nloop, lock, coarseness):
    global counter
    assert nloop % coarseness == 0, "Cannot evenly divide work"
    # Acquire the lock once per chunk instead of once per increment
    for _ in range(nloop // coarseness):
        with lock:
            for _ in range(coarseness):
                counter += 1

threads, now = [], time()
for _ in range(nthread):
    t = Thread(target=increment, args=(nloop, lock, coarseness))
    threads.append(t)
    t.start()

In [None]:
while alive := sum(t.is_alive() for t in threads):
    sleep(5)
    print("Num threads alive:", sum(t.is_alive() for t in threads))
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
print(f"Running everything took {time()-now:.2f} seconds!")
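The arithmetic behind the compromise: with `nthread` threads each doing `nloop` increments, the number of lock acquisitions is `nthread * nloop // coarseness`. A small sketch using the values from the cells above:

```python
nthread, nloop = 100, 50_000

# Total lock acquisitions for a few chunk sizes
acquisitions = {c: nthread * nloop // c for c in (1, 10, 100, 1000)}
for coarseness, count in acquisitions.items():
    print(f"coarseness {coarseness:>5,}: {count:>9,} lock acquisitions")
```

The flip side is that while one thread holds the lock for a whole chunk, every other thread waits, so coarser locking cuts overhead at the cost of concurrency.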

## Summary

We've seen the importance of protecting our critical sections to avoid race conditions. But there's no free lunch: to prevent race conditions we have to use synchronization mechanisms, and, as we saw, those carry issues of their own.

In the next lesson we'll explore one of the many things that can go wrong with manual synchronization, and one of the scariest words in computer science: **Deadlocks**.

# Exercise

## Description

Continuing the theme of the last two lessons' exercises, we again work with a collection of 1000 files, each of which contains 20 integers, one per line. As in the earlier exercises, threading is unlikely to prove faster on these small files than a purely serial approach (but with larger files or slower reads, the balance could change).

You should operate on all the data using 20 threads. The calculation in this exercise is a bit different from the prior ones. Taking the data files in strictly alphabetical order, we want to perform (nearly) a thousand operations: a cycle of 4 basic operations, repeated on a cumulative result (i.e. 250 times for the exercise).

The sequence will be add, multiply, exponentiate, then modulo. For example, with each column header standing for one file (in alphabetical order) and each row for one line number:

| Line# | AA | BB | (+)    | CC | (\*)   | DD  | (\*\*)      | EE | (%)    | FF  | (+)    | GG  | (\*)
|-------|----|----|--------|----|--------|-----|-------------|----|--------|-----|--------|-----|------
|  _1_  |  7 |  8 | **15** |  3 | **45** |  4  | **4100625** | 13 | **9**  | 3   | **12** | ... | ...
|  _2_  |  2 |  2 | **4**  |  3 | **12** |  2  | **144**     | 99 | **45** | 5   | **50** | ... | ...
|  _3_  | ...| ...| ...    | ...| ...    | ... | ...         | ...| ...    | ... | ...    | ... | ...

Notice that the first repetition of the 4 binary operations consumes 5 numbers, while each subsequent repetition consumes only 4 new numbers, operating on the accumulated result. With 1000 numbers per line, the last repetition is therefore missing its modulus, so we do not perform that final modulo operation.
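A sketch of the operation cycle on a single line, checked against line 2 of the table. The helper `running_results` is hypothetical, not part of the exercise setup: it applies add, multiply, exponentiate, modulo in rotation and stops when it runs out of numbers, which is exactly why no final modulo is attempted.

```python
import operator

# The four operations, applied in rotation
OPS = [operator.add, operator.mul, operator.pow, operator.mod]

def running_results(numbers):
    """Return the accumulated value after each operation on one line."""
    accum = numbers[0]
    steps = []
    for k, value in enumerate(numbers[1:]):
        accum = OPS[k % 4](accum, value)
        steps.append(accum)
    return steps

print(running_results([2, 2, 3, 2, 99, 5]))  # [4, 12, 144, 45, 50]
```

With 1000 numbers per line there are 999 operations, so the last one applied is an exponentiation and the missing modulo is simply never reached.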

Each of the 20 threads should do this calculation for the 1000 numbers taken from corresponding lines of each file.  The results need to be stored in a global list whose order correctly corresponds to the line numbers of each calculation.  The trick, obviously, is that you do not want multiple threads to cause a race condition on that global list, and yet you also want the calculations to happen concurrently.

## Setup

In [None]:
from threading import Thread, Lock
from generate import create_files

create_files('lesson-3')

results_list = []

# Use additional shared state as needed
other_global_var = ...

def calculate():
    results_list.append(4100625)

# After running all threads, `results_list` should contain right answers
threads = [Thread(target=calculate) for _ in range(20)]

## Solution

In [None]:
from glob import glob

locks = [Lock() for _ in range(20)]
results_list = [None] * 20

# Hold every line's lock *before* any thread starts, so each
# calculate(i) is guaranteed to block until file_reader has
# stored line i. (A plain Lock may be released by any thread.)
for lock in locks:
    lock.acquire()

def file_reader():
    # Read every file, in strictly alphabetical order
    numbers = []
    for fname in sorted(glob('tmp-*.numbers')):
        with open(fname) as fh:
            numbers.append(fh.readlines())
    for i in range(20):
        # Collect the ith line of every file, store it in results_list,
        # then release the lock so calculate(i) can proceed
        results_list[i] = [int(seq[i]) for seq in numbers]
        locks[i].release()

def calculate(i):
    # Perform the calculation on the ith line of data;
    # block until file_reader releases the lock on that line
    with locks[i]:
        # Here's a trick: append a final modulus larger than the largest
        # value the accumulator can reach (given values of at most 99)
        largest = ((99 + 99) * 99) ** 99
        my_data = results_list[i] + [largest]
        # Start the accumulator with the first number
        accum = my_data[0]
        # Consume each chunk of 4 new values
        for j in range(1, len(my_data), 4):
            b, c, d, e = my_data[j:j+4]
            accum = (((accum + b) * c) ** d) % e
        # Replace the line's data with its single integer result
        results_list[i] = accum

threads = [Thread(target=file_reader)]
for i in range(20):
    threads.append(Thread(target=calculate, args=(i,)))

## Test Cases

In [None]:
def test_thread_count():
    assert len(threads) >= 20
    
test_thread_count()

In [None]:
def test_threads():
    [t.start() for t in threads]
    [t.join() for t in threads]
    correct = [int(n) for n in open('answers.txt').readlines()]
    for i in range(20):
        assert results_list[i] == correct[i], f"Mismatch on line {i+1}"
        
test_threads()

-------------
Materials licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by the authors