# Description

Continuing the theme of the last two lesson exercises, we again work with a collection of 1000 files, each of which contains 20 integers, one per line.  As in the earlier exercise, the threaded style on these small files is unlikely to prove faster than a purely serial approach (but with larger files or slower reads, the balance could change).

You should operate on all the data using 20 threads.  The calculation we make in this exercise is a bit different from prior ones.  Considering the data files in strictly alphabetical order, we want to perform roughly a thousand operations on each line of data; these are 4 basic operations applied in rotation to a cumulative result, so the first repetition consumes 5 numbers and each later one consumes 4 more (i.e. about 250 repetitions for the exercise).

The sequence will be add, multiply, exponentiate, then modulo.  For example, with one column per file (named in the header), one row per line number, and each running result shown in bold:

| Line# | AA | BB | (+)    | CC | (\*)   | DD  | (\*\*)      | EE | (%)    | FF  | (+)    | GG  | (\*)
|-------|----|----|--------|----|--------|-----|-------------|----|--------|-----|--------|-----|------
|  _1_  |  7 |  8 | **15** |  3 | **45** |  4  | **4100625** | 13 | **9**  | 3   | **12** | ... | ...
|  _2_  |  2 |  2 | **4**  |  3 | **12** |  2  | **144**     | 99 | **45** | 5   | **50** | ... | ...
|  _3_  | ...| ...| ...    | ...| ...    | ... | ...         | ...| ...    | ... | ...    | ... | ...
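
As a minimal sketch of the sequence in the table, the function below applies the four operations in rotation and records every running result.  The name `running_results` is hypothetical and is not part of the exercise files; the input is the second row of the table.

```python
from itertools import cycle
from operator import add, mod, mul


def running_results(numbers):
    """Return each intermediate value of the add, multiply, exponentiate, modulo rotation."""
    ops = cycle([add, mul, pow, mod])
    accum = numbers[0]          # the first number only seeds the accumulator
    results = []
    # zip() stops when the numbers run out, even in the middle of a rotation
    for op, n in zip(ops, numbers[1:]):
        accum = op(accum, n)
        results.append(accum)
    return results


# Row 2 of the table: AA=2, BB=2, CC=3, DD=2, EE=99, FF=5
print(running_results([2, 2, 3, 2, 99, 5]))  # [4, 12, 144, 45, 50]
```

Because `zip` stops at the shorter input, a sequence that ends partway through a rotation needs no special handling in this sketch.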

Notice that the first repetition of the 4 binary operations takes 5 numbers, but each subsequent repetition starts from the accumulated result and takes only 4 more.  With 1000 numbers per line, the last repetition therefore has no modulus, and we do not perform that final modulo operation.
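
The count can be checked with a little arithmetic.  The sketch below assumes exactly 1000 numbers per line, as described above; the variable names are illustrative only.

```python
from collections import Counter

OPS = ["add", "multiply", "exponentiate", "modulo"]
n_numbers = 1000
n_operations = n_numbers - 1    # the first number only seeds the accumulator

# How many complete rotations fit, and how many operations are left over
full_cycles, leftover = divmod(n_operations, len(OPS))
counts = Counter(OPS[k % len(OPS)] for k in range(n_operations))

print(full_cycles, leftover)    # 249 3
print(counts["add"], counts["multiply"], counts["exponentiate"], counts["modulo"])
# 250 250 250 249
```

The three leftover operations are the final add, multiply, and exponentiate; the modulo is the one that goes missing.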

Each of the 20 threads should do this calculation for the 1000 numbers taken from corresponding lines of each file.  The results need to be stored in a global list whose order correctly corresponds to the line numbers of each calculation.  The trick, obviously, is that you do not want multiple threads to cause a race condition on that global list, and yet you also want the calculations to happen concurrently.
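
One way to keep the results in line order, sketched here with hypothetical in-memory data rather than the exercise files, is to allocate one slot per line up front and have each thread write only to its own index.  Appending to a shared list would instead record results in completion order.

```python
from threading import Thread

# Hypothetical stand-in data: one short sequence per "line"
lines = [[7, 8], [2, 2], [1, 1], [5, 6]]
sums = [None] * len(lines)      # one slot per line, allocated before any thread runs


def work(i):
    # Each thread writes only to index i, so no two threads touch the same slot
    sums[i] = sum(lines[i])


workers = [Thread(target=work, args=(i,)) for i in range(len(lines))]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(sums)  # [15, 4, 2, 11]
```

The order of `sums` depends only on the index each thread was given, not on which thread happened to finish first.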

# Setup



In [2]:
from threading import Thread, Lock
from generate import create_files

create_files('lesson-3')

results_list = []

# Use additional shared state as needed
other_global_var = ...

def calculate():
    results_list.append(4100625)

# After running all threads, `results_list` should contain right answers
threads = [Thread(target=calculate) for _ in range(20)]

# Solution

In [3]:
from glob import glob

# One lock per data line.  Every lock is acquired here, in the main thread,
# before any worker starts, so calculate(i) must wait until file_reader has
# published line i.  (A Lock may be released by a thread other than the one
# that acquired it.)
locks = [Lock() for _ in range(20)]
for lock in locks:
    lock.acquire()
results_list = [None] * 20

def file_reader():
    # Read every file, in alphabetical order, as a list of its lines
    numbers = []
    for path in sorted(glob('tmp-*.numbers')):
        with open(path) as fh:
            numbers.append(fh.readlines())
    for i in range(20):
        # Collect the ith line of every file, put it in results_list,
        # then release the lock on that offset
        results_list[i] = [int(seq[i]) for seq in numbers]
        locks[i].release()

def calculate(i):
    # Perform calculation on ith line of data
    # Block until that ith line is released by file_reader
    with locks[i]:
        # Here's a trick: the final modulus is something larger
        # than largest possible value accumulated
        largest = ((99 + 99) * 99) ** 99
        my_data = results_list[i] + [largest]
        # After calculation put single integer in results_list[i]
        assert isinstance(my_data, list)
        # Start accumulator with first number
        accum = my_data[0]
        # Get each chunk of 4 new values
        for j in range(1, len(my_data), 4):
            b, c, d, e = my_data[j:j+4]
            accum = (((accum + b) * c) ** d) % e
        results_list[i] = accum
        
threads = [Thread(target=file_reader)]
for i in range(20):
    threads.append(Thread(target=calculate, args=(i,)))
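
The per-line locks above serve as one-shot signals from the reader to each worker.  As an alternative sketch, not the approach used in this solution, `threading.Event` expresses the same hand-off directly and does not depend on the order in which threads start.  The data and all names below are hypothetical stand-ins.

```python
from threading import Event, Thread

N_LINES = 4
# Hypothetical in-memory stand-in for the data files: 3 "files" of 4 lines each
fake_files = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]

ready = [Event() for _ in range(N_LINES)]
slots = [None] * N_LINES


def reader():
    for i in range(N_LINES):
        slots[i] = [f[i] for f in fake_files]
        ready[i].set()          # signal that line i is available


def worker(i):
    ready[i].wait()             # block until the reader has published line i
    slots[i] = sum(slots[i])    # stand-in for the real calculation


# Workers are deliberately started before the reader; they simply wait
demo_threads = [Thread(target=worker, args=(i,)) for i in range(N_LINES)]
demo_threads.append(Thread(target=reader))
for t in demo_threads:
    t.start()
for t in demo_threads:
    t.join()

print(slots)  # [111, 222, 333, 444]
```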

# Test Cases

In [4]:
def test_thread_count():
    assert len(threads) >= 20
    
test_thread_count()

In [5]:
def test_threads():
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open('answers.txt') as fh:
        correct = [int(n) for n in fh]
    for i in range(20):
        assert results_list[i] == correct[i], f"Mismatch on line {i+1}"
        
test_threads()