![Erudio logo](../img/erudio-logo-small.png)

## Deadlocks

Deadlocks are the Charybdis to the Scylla of race conditions.  That is, avoiding one puts you in danger of suffering the other.

In [3]:
from threading import Thread, current_thread, Lock
from time import sleep
from random import randint

## A simple example

Let's start by analyzing a simple example: transfers between two "bank accounts":

In [4]:
def move_funds(from_, to_):
    global accounts, kill_switch
    initial_total = accounts[from_] + accounts[to_] 
    name = current_thread().name
    
    for n in range(1_000_000):
        transfer = randint(1, 100)
        accounts[from_] -= transfer
        accounts[to_] += transfer        
        total = accounts[from_] + accounts[to_]
        
        # Exit if balance wrong or if another thread thinks so
        if total != initial_total:
            print(f"{name} inconsistent balance: ${total:,} ({n:,} transactions)")
            kill_switch = True
            break
        elif kill_switch:
            print(f"{name} other thread flagged: ${total:,} ({n:,} transactions)")
            break
    else:
        print(f'{name} reached iteration limit. Stopping...')            

The augmented assignments, `accounts[from_] -= transfer` and `accounts[to_] += transfer`, can introduce race conditions because each one is a separate read, modify, and write rather than a single atomic step.
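
To see why an augmented assignment on a dictionary entry is not atomic, it can help to look at the bytecode CPython generates for it. The sketch below uses the standard `dis` module on a hypothetical helper named `debit` (not part of the lesson code); the exact opcode names vary between CPython versions, so treat the listing as illustrative.

```python
import dis

def debit(accounts, key, amount):
    # Hypothetical helper wrapping the same kind of statement used in move_funds
    accounts[key] -= amount

# Collect the names of the bytecode instructions for the function body
opnames = [ins.opname for ins in dis.get_instructions(debit)]

# Several instructions are listed: the old value is loaded, the subtraction is
# performed, and only then is the result stored back into the dictionary.
for name in opnames:
    print(name)
```

A thread switch between the load and the store is what lets one thread overwrite another thread's update.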

In [5]:
kill_switch = False
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds, args=('acc1', 'acc2'))
t2 = Thread(target=move_funds, args=('acc2', 'acc1'))

t1.start()
t2.start()
t1.join()
t2.join()

print("Balances:", accounts)

Thread-5 (move_funds) reached iteration limit. Stopping...
Thread-6 (move_funds) reached iteration limit. Stopping...
Balances: {'acc1': 8188, 'acc2': 191812}


## Adding Locks

In the last lesson, you learned about locks. We can use those to try synchronizing access to the accounts. We'll create 2 locks, one for each account:

In [6]:
def move_funds2(from_, lock_from, to_, lock_to):
    initial_total = accounts[from_] + accounts[to_]
    name = current_thread().name
    
    for n in range(1_000_000):
        amount = randint(1, 100)
        with lock_from, lock_to:
            accounts[from_] -= amount
            accounts[to_] += amount

            total = accounts[from_] + accounts[to_]
            if total != initial_total:
                print(f"{name} inconsistent balance: ${total:,} ({n:,} transactions)")
                break
    else:
        print(f'{name} reached iteration limit. Stopping...')

In [7]:
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds2, args=('acc1', lock_acc1, 'acc2', lock_acc2))
t2 = Thread(target=move_funds2, args=('acc2', lock_acc1, 'acc1', lock_acc2))

t1.start()
t2.start()

while t1.is_alive() or t2.is_alive():
    print("t1 alive?", t1.is_alive(), "| t2 alive?", t2.is_alive(), "|", accounts)
    sleep(3)

print("acc1 locked?", lock_acc1.locked(), "| acc2 locked?", lock_acc2.locked())

t1 alive? True | t2 alive? True | {'acc1': -136853, 'acc2': 336853}
t1 alive? True | t2 alive? True | {'acc1': -746345, 'acc2': 946345}
t1 alive? True | t2 alive? True | {'acc1': 646309, 'acc2': -446309}
t1 alive? True | t2 alive? True | {'acc1': -236987, 'acc2': 436987}
Thread-7 (move_funds2) reached iteration limit. Stopping...
Thread-8 (move_funds2) reached iteration limit. Stopping...
acc1 locked? False | acc2 locked? False


It worked (this time)! Access to the accounts is protected by the locks. 

But there is a danger lurking here. We succeeded more or less by accident: both threads happened to acquire `lock_acc1` first and `lock_acc2` second. In fact, that code is conceptually wrong, because the second thread used `lock_acc1` as the protection for `acc2` (and `lock_acc2` for `acc1`).

If we make a small change, altering the order of the locks that are passed to our threads, we will find ourselves deadlocked:

In [8]:
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds2, args=('acc1', lock_acc1, 'acc2', lock_acc2))
t2 = Thread(target=move_funds2, args=('acc2', lock_acc2, 'acc1', lock_acc1))
print("Threads created:", accounts)
t1.start()
t2.start()
print("Threads started:", accounts)

for _ in range(8):
    print("t1 alive?", t1.is_alive(), "| t2 alive?", t2.is_alive(), "|", accounts)
    sleep(3)

Threads created: {'acc1': 100000, 'acc2': 100000}
Threads started: {'acc1': -648918, 'acc2': 848918}
t1 alive? True | t2 alive? True | {'acc1': -648918, 'acc2': 848918}
t1 alive? True | t2 alive? True | {'acc1': -648918, 'acc2': 848918}
t1 alive? True | t2 alive? True | {'acc1': -648918, 'acc2': 848918}
t1 alive? True | t2 alive? True | {'acc1': -648918, 'acc2': 848918}
t1 alive? True | t2 alive? True | {'acc1': -648918, 'acc2': 848918}
t1 alive? True | t2 alive? True | {'acc1': -648918, 'acc2': 848918}
t1 alive? True | t2 alive? True | {'acc1': -648918, 'acc2': 848918}
t1 alive? True | t2 alive? True | {'acc1': -648918, 'acc2': 848918}


Both locks remain locked.

In [9]:
lock_acc1.locked(), lock_acc2.locked()

(True, True)

These threads will never die, and the locks will never be released.  We can do something a bit hack-ish to repeatedly release the locks until both threads crash and end.  This would almost never be a good idea in a production program, but we do it just for teaching.

In [15]:
nerr, err_info = 0, None
while t1.is_alive() or t2.is_alive():
    try:
        lock_acc1.release(), lock_acc2.release()
    except Exception as err:
        sleep(1e-9)
        err_info = repr(err)
        nerr += 1
        
print(f"{nerr:,} errors of type {err_info}")
# Status of threads and locks
t1.is_alive(), lock_acc1.locked(), t2.is_alive(), lock_acc2.locked()

Exception in thread Thread-18:
Traceback (most recent call last):
  File "/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-4-4b94722f1175>", line 14, in move_funds2
RuntimeError: release unlocked lock
Exception in thread Thread-19:
Traceback (most recent call last):
  File "/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-4-4b94722f1175>", line 14, in move_funds2
RuntimeError: release unlocked lock


25 errors of type RuntimeError('release unlocked lock')


(False, False, False, False)

## What are Deadlocks?

A deadlock occurs when multiple threads (or processes, or nodes in a cluster) are mutually suspended, each waiting for another to signal that processing may continue. None of them can proceed because the dependency is circular. As soon as there are at least two resources that a thread might need at the same time, deadlocks are a danger.


| Step | Status       | Thread-1            | Thread-2            | Thread-3 
|:-----|:-------------|:--------------------|:--------------------|:--------------------
| 1    | Success      | Acquire *lock_A*    | Acquire *lock_B*    | Acquire *lock_C*        
| 2    | **Blocked**  | **Wait for lock_C** | **Wait for lock_A** | **Wait for lock_B**
| 3    | Cannot Reach | Wait for lock_B     | Wait for lock_C     | Wait for lock_A
| 4    | Cannot Reach | Work with A/B/C     | Work with A/B/C     | Work with A/B/C
| 5    | Cannot Reach | Release *lock_B*    | Release *lock_C*    | Release *lock_A*
| 6    | Cannot Reach | Release *lock_C*    | Release *lock_A*    | Release *lock_B*
| 7    | Cannot Reach | Release *lock_A*    | Release *lock_B*    | Release *lock_C*
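
The circular wait in the table can be reproduced deterministically. The following is only a sketch under a few assumptions: the two `Barrier` objects (`ready` and `done`) and the `outcome` dictionary are hypothetical additions that line the threads up, and the second acquisition uses a timeout so the demonstration finishes instead of hanging forever as a real deadlock would.

```python
from threading import Thread, Lock, Barrier

locks = {'A': Lock(), 'B': Lock(), 'C': Lock()}
ready = Barrier(3)   # all threads hold their first lock before step 2
done = Barrier(3)    # no thread releases anything until all attempts finish
outcome = {}

def worker(name, first, second):
    with locks[first]:                    # step 1: always succeeds
        ready.wait()
        # step 2: the lock we want is held by a thread that is itself waiting
        got = locks[second].acquire(timeout=0.5)
        outcome[name] = got
        done.wait()
        if got:
            locks[second].release()

# Same first/second lock per thread as steps 1 and 2 of the table
plan = [('Thread-1', 'A', 'C'), ('Thread-2', 'B', 'A'), ('Thread-3', 'C', 'B')]
threads = [Thread(target=worker, args=p) for p in plan]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every value is False: no thread managed to obtain its second lock
print(outcome)
```

Without the timeout, each `acquire()` call would block indefinitely, which is exactly the "Cannot Reach" state in the table.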

In *Operating System Concepts*, Avi Silberschatz describes a non-computing example:

> Perhaps the best illustration of a deadlock can be drawn from a law passed by the Kansas legislature early in the 20th century. It said, in part: “When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone.”

### How to prevent deadlocks

The unfortunate truth is that it is **very hard** to prevent deadlocks. 

One simple technique is to always use timeouts when trying to acquire locks. If you need N shared locks and cannot acquire all N, release the ones you do hold and start over. We can refactor the funds transfer this way.

In [9]:
def move_funds3(from_, lock_from, to_, lock_to):
    name, T = current_thread().name, LOCK_TIMEOUT
    all_locks = [lock_from, lock_to]
    
    for n in range(10_000):
        amount = randint(1, 100)
        
        # Acquire all locks, if failure, release and keep trying
        while not all(locks_good := [l.acquire(timeout=T) for l in all_locks]):
            for i, acquired in enumerate(locks_good):
                if acquired:
                    all_locks[i].release()
        
        # Perform the action on locked resources
        # ...omitting the validation of 'accounts' done in earlier versions
        accounts[from_] -= amount
        accounts[to_] += amount
        
        # Release all locks
        for lock in all_locks:
            lock.release()

    print(f'{name} reached iteration limit. Stopping...')
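
The acquire-or-back-off loop above could also be packaged as a context manager, so the locks are released even if the protected code raises. This is a minimal sketch, not the lesson's implementation: `acquire_all` is a hypothetical name, it stops at the first lock that times out rather than trying every lock, and the small random pause before retrying is an added assumption meant to reduce the chance of two threads retrying in lockstep.

```python
from contextlib import contextmanager
from random import random
from threading import Lock
from time import sleep

@contextmanager
def acquire_all(locks, timeout=0.001):
    """Hold every lock in `locks`, or none of them, retrying until success."""
    while True:
        held = []
        for lock in locks:
            if lock.acquire(timeout=timeout):
                held.append(lock)
            else:
                break                      # give up on this round
        if len(held) == len(locks):
            break                          # got them all
        for lock in reversed(held):        # back out of a partial acquisition
            lock.release()
        sleep(random() * timeout)          # brief random pause before retrying
    try:
        yield
    finally:
        for lock in reversed(held):
            lock.release()

# Usage: both locks are held inside the block and free afterwards
lock_a, lock_b = Lock(), Lock()
with acquire_all([lock_a, lock_b]):
    inside = (lock_a.locked(), lock_b.locked())
after = (lock_a.locked(), lock_b.locked())
print(inside, after)   # → (True, True) (False, False)
```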

In [10]:
LOCK_TIMEOUT = .001
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds3, args=('acc1', lock_acc1, 'acc2', lock_acc2))
t2 = Thread(target=move_funds3, args=('acc2', lock_acc2, 'acc1', lock_acc1))

t1.start()
t2.start()

while t1.is_alive() or t2.is_alive():
    print("t1 alive?", t1.is_alive(), "| t2 alive?", t2.is_alive(), "|", accounts)
    sleep(3)

print("acc1 locked?", lock_acc1.locked(), "| acc2 locked?", lock_acc2.locked())
print(f"Validation: {sum(accounts.values()):,} == 200,000")

t1 alive? True | t2 alive? True | {'acc1': -163645, 'acc2': 363645}
t1 alive? True | t2 alive? True | {'acc1': -162924, 'acc2': 362924}
t1 alive? True | t2 alive? True | {'acc1': -155605, 'acc2': 355605}
t1 alive? True | t2 alive? True | {'acc1': -140102, 'acc2': 340102}
Thread-10 reached iteration limit. Stopping...
Thread-11 reached iteration limit. Stopping...
acc1 locked? False | acc2 locked? False
Validation: 200,000 == 200,000
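
Timeouts are not the only option. Another widely used technique, which the lesson code above does not use, is to make every thread acquire its locks in one agreed global order so that a circular wait cannot form. The sketch below assumes one lock per account, stored in a dictionary keyed by account name, and orders the locks by sorting those names; `move_funds_ordered` is a hypothetical variant for illustration.

```python
from random import randint
from threading import Thread, Lock

accounts = {'acc1': 100_000, 'acc2': 100_000}
locks = {name: Lock() for name in accounts}   # one lock per account

def move_funds_ordered(from_, to_, n_transfers=10_000):
    # Always lock in sorted-name order, regardless of transfer direction
    first, second = sorted((from_, to_))
    for _ in range(n_transfers):
        amount = randint(1, 100)
        with locks[first], locks[second]:
            accounts[from_] -= amount
            accounts[to_] += amount

t1 = Thread(target=move_funds_ordered, args=('acc1', 'acc2'))
t2 = Thread(target=move_funds_ordered, args=('acc2', 'acc1'))
t1.start()
t2.start()
t1.join()
t2.join()

print(f"Validation: {sum(accounts.values()):,} == 200,000")
```

Ordering works well when all the locks a thread needs are known up front; when they are discovered one at a time, the timeout approach may be easier to apply.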


## Thread Synchronization Summary

Other synchronization mechanisms, outside the scope of this course, include `Semaphore`, `Condition`, `Event`, and `Barrier`. These follow the same broad principles as locks, but vary in specifics.
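
As a brief taste of how one of these differs from a plain lock, here is a minimal sketch of a semaphore that admits up to two threads at a time instead of one. The names `slots`, `active`, and `peak` are hypothetical and exist only to measure how many threads were inside the guarded section at once.

```python
from threading import Thread, Semaphore, Lock
from time import sleep

slots = Semaphore(2)      # at most two threads may hold a slot at once
counter_lock = Lock()     # protects the two counters below
active = 0
peak = 0

def worker():
    global active, peak
    with slots:                       # blocks while two slots are taken
        with counter_lock:
            active += 1
            peak = max(peak, active)
        sleep(0.01)                   # pretend to do some work
        with counter_lock:
            active -= 1

threads = [Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Most threads inside at once:", peak)
```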

An unfortunate but necessary takeaway from this lesson is: **synchronization is HARD, and error/bug prone**. Even the most experienced developers create bugs when writing synchronized code.

However, synchronization is a necessary evil as well. Race conditions corrupt data, and many problems demand concurrency. In the next lessons we look at additional approaches to concurrent code that can ease the pitfalls of synchronization.



# Exercise
## Description

In this exercise we continue with a similar setup to the other exercises.  We generate 50 files on disk, each containing 20 natural numbers below 100, one per line.  As in the prior exercise, our program uses 20 threads, each responsible for one line number across the files.

For this task, each thread, named 'Line-N' (where N is the number of the line it handles), will read a series of instructions that consist of:

1. Read the content of three specified files, 'A', 'B', and 'C'.
2. Shuffle the values on line N between the files in a "clockwise" fashion.
3. Write each file to disk with adjusted content.

For example, if line 17 of the files initially contains A=23, B=14, and C=99, then after one operation line 17 of each file will hold A=99, B=23, and C=14.
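
One thing to watch for when implementing step 2 is the order of assignment. The short sketch below, using plain variables rather than files, contrasts a simultaneous tuple assignment with one-at-a-time assignment on the values from the example.

```python
# Values on line 17 of files A, B, and C
a, b, c = 23, 14, 99

# Tuple assignment evaluates the whole right-hand side before storing anything
a, b, c = c, a, b
rotated = (a, b, c)
print(rotated)      # → (99, 23, 14)

# Pitfall: assigning one value at a time overwrites values still needed
x, y, z = 23, 14, 99
x = z               # x is now 99; the original 23 is lost
y = x               # y receives 99, not 23
z = y               # z receives 99, not 14
clobbered = (x, y, z)
print(clobbered)    # → (99, 99, 99)
```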

As with other examples, for these small files on a fast disk, a serial approach remains faster.  But as the problem scales to larger files with slower access, threads would begin to win.

There are twin dangers here.  If you simply write to the same file from different threads, without locks, you will most likely encounter a race condition where one thread's shuffle overwrites a file that should have a different line shuffled in another thread.  On the other hand, if every file is locked before use, different threads may try to acquire competing resources in a circular manner, causing deadlock.

The `oplist` variable contains entries like those below, each describing an action.  Each thread should act only on the actions addressed to it and disregard all others (as mentioned, a serial approach would achieve the functional purpose, but is not what this exercise is for).  This poses no danger because each thread's pass through `oplist` is **read-only**, so no race condition can occur there. Note that, for any given line, the shuffles must be performed in exactly the listed order to reach the same final state.  Operation descriptions look something like this:

```python
[...,
 ['Line-19', 'tmp-Abume.numbers', 'tmp-DTfsx.numbers', 'tmp-jXmRn.numbers'],
 ['Line-2', 'tmp-TiyKw.numbers', 'tmp-QwKin.numbers', 'tmp-pFocs.numbers'],
 ['Line-13', 'tmp-DTSWm.numbers', 'tmp-yJmoQ.numbers', 'tmp-DTfsx.numbers'],
 ...
]
```

Create a suitable `shuffle()` function that will neither deadlock nor create a race condition.

## Setup

In [10]:
from threading import Thread, Lock
from pathlib import Path

from generate import operations
names, oplist = operations('lesson-4')

# Create lock for each file
locks = {name: Lock() for name in names}

def shuffle(lineno):
    # This version will DEADLOCK!
    line_name = f"Line-{lineno}"
    for opnum, op in enumerate(oplist):
        if op[0] == line_name:
            A, B, C = op[1:]
            with locks[A], locks[B], locks[C]:
                linesA = Path(A).read_text().split('\n')
                linesB = Path(B).read_text().split('\n')
                linesC = Path(C).read_text().split('\n')
                # 1-based line numbers, 0-based list
                # Rotate in one tuple assignment so no value is overwritten early
                i = lineno - 1
                linesA[i], linesB[i], linesC[i] = linesC[i], linesA[i], linesB[i]
                # Write the shuffled data back
                Path(A).write_text('\n'.join(linesA))
                Path(B).write_text('\n'.join(linesB))
                Path(C).write_text('\n'.join(linesC))

# After running all threads, the files on disk should hold the shuffled values
threads = [Thread(target=shuffle, args=(i,), name=f"Line-{i}") for i in range(1, 21)]

# Should be able to run these to get correct modifications
# [t.start() for t in threads]
# [t.join() for t in threads]

## Solution

In [11]:
# Note, the solution presented here is the SIMPLEST refactoring
# For extra credit consider other approaches!
#  * For example, one thread could handle I/O while others shuffle
#  * Or, first pull off all "line N" values, and operate each
#       shuffle thread on independent data (assemble at end)
#  * Or ....

def shuffle(lineno):
    line_name = f"Line-{lineno}"
    for opnum, op in enumerate(oplist):
        if op[0] == line_name:
            A, B, C = op[1:]
            all_locks = (locks[A], locks[B], locks[C])
            
            # Keep trying to get all locks
            while not all(locks_good := [l.acquire(timeout=0.01) for l in all_locks]):
                for i, acquired in enumerate(locks_good):
                    if acquired:
                        all_locks[i].release()
            
            linesA = Path(A).read_text().split('\n')
            linesB = Path(B).read_text().split('\n')
            linesC = Path(C).read_text().split('\n')
            # 1-based line numbers, 0-based list
            # Rotate in one tuple assignment so no value is overwritten early
            i = lineno - 1
            linesA[i], linesB[i], linesC[i] = linesC[i], linesA[i], linesB[i]
            # Write the shuffled data back
            Path(A).write_text('\n'.join(linesA))
            Path(B).write_text('\n'.join(linesB))
            Path(C).write_text('\n'.join(linesC))
            
            # Release all locks
            for lock in all_locks:
                lock.release()

## Test Cases

In [12]:
def test_thread_count():
    assert len(threads) >= 20
    
test_thread_count()

### NOTE 
If you are running this code on Windows, the function below will fail because the `SIGALRM` signal, which `timeout` uses internally, is not implemented on Windows.

In [13]:
def test_deadlock():
    from timeout import timeout
    
    # Setup original state of files
    names, oplist = operations('lesson-4')
    # Create fresh threads
    threads = [Thread(target=shuffle, args=(i,), name=f"Line-{i}") for i in range(1, 21)]
    
    # Try to process all operations
    with timeout:
        try:
            [t.start() for t in threads]
            [t.join() for t in threads]
        except TimeoutError as err:
            assert False, "Timeout indicates probable deadlock"
        
test_deadlock()

AttributeError: module 'signal' has no attribute 'SIGALRM'
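
On platforms without `SIGALRM`, a rougher but portable check can be built from `Thread.join(timeout=...)`, which is part of the standard library on every platform. The sketch below is an assumption-laden alternative, not the course's `timeout` helper: `join_all` is a hypothetical function, and the "stuck" thread simply waits on an `Event` to stand in for a deadlocked worker.

```python
from threading import Thread, Event
from time import monotonic

def join_all(threads, timeout):
    """Join threads within one shared time budget; return those still alive."""
    deadline = monotonic() + timeout
    for t in threads:
        t.join(timeout=max(0.0, deadline - monotonic()))
    return [t for t in threads if t.is_alive()]

# Demonstration: one thread finishes at once, the other blocks until released
release = Event()
quick = Thread(target=lambda: None, name="quick")
stuck = Thread(target=release.wait, name="stuck", daemon=True)
for t in (quick, stuck):
    t.start()

hung = [t.name for t in join_all([quick, stuck], timeout=0.2)]
print("Still running after timeout:", hung)   # → ['stuck']

# Clean up so the stand-in thread can exit
release.set()
stuck.join()
```

A non-empty result indicates a probable deadlock, though unlike a signal-based timeout this approach cannot interrupt the hung threads; marking them as daemon threads at least lets the interpreter exit.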

In [14]:
def test_modifications():
    # DO NOT run this test until test_deadlock() passes!
    from hashlib import md5
    from pathlib import Path
    # Setup original state of files
    names, oplist = operations('lesson-4')
    
    # Should NEVER fail since it only verifies utility func
    cat = ''.join(Path(name).read_text() for name in names)
    hash_orig = md5(cat.encode()).hexdigest()
    assert hash_orig == '32f11af64b0391f624d954b2988695c6', f"Wrong MD5sum {hash_orig}"

    # Create fresh threads
    threads = [Thread(target=shuffle, args=(i,)) for i in range(1, 21)]
    [t.start() for t in threads]
    [t.join() for t in threads]
    cat = ''.join(Path(name).read_text() for name in names)
    hash_mod = md5(cat.encode()).hexdigest()
    assert hash_mod == 'f1fc8076c8905c27a196fb41d8454b2a', f"Wrong MD5sum {hash_mod}"  
    
test_modifications()

-------------
Materials licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by the authors