Consider a list of integers:

```
[2, 3, 3, 0, 0, 1]
```

Suppose we are interested in finding the sum of the elements in this list. A sequential algorithm would step through each element of the list and keep track of the sum $S_i$ after $i$ elements have been processed. Initially, $S_0 = 0$; when the $i$th element is encountered, we add it to $S_{i - 1}$ to get $S_i$, and finally $S_n$ is returned, where $n$ is the length of the input list. In total, this algorithm takes $n$ steps to execute.

A parallel algorithm is not hard to imagine. We can split the input into pieces, each of length $k$, and assign each piece to a separate thread. Each thread can sequentially sum its piece, giving us a list of $n/k$ numbers. We can split this list into pieces of length $k$ again, and repeat the process until only one element remains. For example, the sum of the list above could be parallelized by splitting it into pieces of length $2$:

```

 [ 2, 3, 3, 0, 0, 1 ]
   │  │  │  │  │  │
   └─┬┘  └─┬┘  └─┬┘
     │     │     │
     5     3     1
     │     │     │
     └──┬──┘     │
        │        │
        8        1
        │        │
        └────┬───┘
             │
             9

```

The parallel solution would execute in $O(\log_k(n))$ rounds, provided there are enough threads to process every piece of a round at once, which is an improvement over the $O(n)$ performance of the sequential algorithm.
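
The round-by-round structure of the diagram can be sketched in Python. This is only an illustration under stated assumptions: the function name `tree_sum` is hypothetical, and the pieces are summed one after another rather than on separate threads, so the sketch shows the shape of the computation and not an actual speed-up.

```python
def tree_sum(values, k=2):
    """Sum `values` by repeatedly summing pieces of length k.

    Each piece of a round is independent of the others, so every piece
    could be given to its own thread. Here they are processed in turn.
    Returns the total and the list of rounds, for inspection.
    """
    if k < 2:
        raise ValueError("k must be at least 2, or the list never shrinks")
    level = list(values)
    levels = [level]
    while len(level) > 1:
        # One round: replace every piece of length k by its sum
        level = [sum(level[i:i + k]) for i in range(0, len(level), k)]
        levels.append(level)
    total = level[0] if level else 0
    return total, levels


total, levels = tree_sum([2, 3, 3, 0, 0, 1], k=2)
for level in levels:
    print(level)
# [2, 3, 3, 0, 0, 1]
# [5, 3, 1]
# [8, 1]
# [9]
```

The printed rounds match the rows of the diagram: three partial sums, then two, then the total.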

Now, suppose that we were interested in producing not only the total, but a *prefix sum*: given a list of $n$ numbers, produce a list of $n$ numbers, so that the $i$th element of the output is the sum of the elements with indices $0, \ldots, i$ of the input list. The last element of such a list would be the total. The sequential solution is easy: simply push each $S_i$ into a list, and return this list instead of only $S_n$. It requires a little more thought to parallelize the algorithm, however, because now there is some state dependency. If $L$ is the result list, then clearly $L_i$ depends upon $L_{i - 1}$, and it would seem that the problem must be executed sequentially, as we must compute $L_{i - 1}$ to get $L_i$. However, a little more thought can convince us that we could still break the problem down into pieces, and combine the result of each piece into the final solution.

Let $;$ represent the "compose" operation, which combines two prefix sum lists. The operation is very much like list or string concatenation, and the following cases are relevant:

```
0) (empty list composed with some other list) 
   [] ; [x_0, x_1, ..., x_n] 
== [x_0, x_1, ..., x_n]

   [x_0, x_1, ..., x_n] ; []
== [x_0, x_1, ..., x_n]
        
1) (two non-empty lists combined) 
   [x_0, x_1, ..., x_m] ; [y_0, y_1, ..., y_n] 
== [x_0, x_1, ..., x_m, y_0 + x_m, y_1 + x_m, ..., y_n + x_m]
        
2) ((non-)importance of order of combination)
   [x_0,  ..., x_l] ; ([y_0, ..., y_m] ; [z_0, ..., z_n])
== [x_0, ..., x_l] ; [y_0, ..., y_m, z_0 + y_m, ..., z_n + y_m]
== [x_0, ..., x_l, y_0 + x_l, ..., y_m + x_l, z_0 + y_m + x_l, z_1 + y_m + x_l, ..., z_n + y_m + x_l]
== [x_0, ..., x_l, y_0 + x_l, ..., y_m + x_l] ; [z_0, ..., z_n]
== ([x_0, ..., x_l] ; [y_0, ..., y_m]) ; [z_0, ..., z_n]
```

Note that the `;` operation is not **symmetric** (also known as **commutative**); that is, `x ; y != y ; x` in general, unlike addition where `x + y = y + x`. With addition, `0 + x = x`, so $0$ is considered an **identity**, as operating with it always returns the other number back; similarly, $;$ has the empty list `[]` as its identity. Finally, like addition, $;$ is **associative**: the grouping of operations does not matter, so $x ; (y ; z) = (x ; y) ; z$. A set equipped with an associative operation and an identity element is called a **monoid**. Now, with the $;$ operation in hand, parallelizing prefix sum is just as easy, conceptually, as parallelizing total sum.
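
As a rough sketch of how the `;` operation could be used, the Python below computes a prefix sum piece by piece. The names `compose` and `chunked_prefix_sum` are hypothetical, and the pieces are combined left to right in a plain loop; associativity is what would allow the same combination to be arranged as a tree across threads.

```python
from itertools import accumulate


def compose(xs, ys):
    """The `;` operation on two prefix-sum lists."""
    if not xs:            # case 0: [] is the identity on the left
        return list(ys)
    if not ys:            # case 0: ... and on the right
        return list(xs)
    offset = xs[-1]       # case 1: shift ys by the total of the left list
    return list(xs) + [y + offset for y in ys]


def chunked_prefix_sum(values, k=2):
    # The local prefix sum of each piece does not depend on any other piece,
    # so this is the step that could run in parallel.
    pieces = [list(accumulate(values[i:i + k]))
              for i in range(0, len(values), k)]
    result = []
    for piece in pieces:
        result = compose(result, piece)
    return result


print(chunked_prefix_sum([2, 3, 3, 0, 0, 1]))  # [2, 5, 8, 8, 8, 9]
```

With `k=2` the pieces are `[2, 5]`, `[3, 3]` and `[0, 1]`, and composing them yields the same list a sequential pass would produce.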

There is another problem, which is easy to manage sequentially, but is tricky to parallelize. Consider a string of parentheses. For example:

```
()()
```

A parentheses string is balanced if each `(` has a matching `)`; all parentheses strings considered in this document will be balanced, so we shall omit the qualifier "balanced" and write only "parentheses string". The empty parentheses string is also balanced, and we denote it `x`.

The parentheses matching problem requires resolving the nested (tree-like) structure conveyed by a parentheses string. Each `(` has an enclosing `(`, and each `)` has a matching `(`; we want to find the indices of these enclosing/matching parentheses within the string. Here are some example parentheses strings, their tree structures (`x` marks an empty string, and it is also the default parent of any `(` that is not enclosed by another `(`), and the output produced by a program which resolves the tree structure (`-1` is the index of the root):

```
1) string:  <empty string>

   indices:     
   tree:    x 
   output: 

2) string:  ()
                (   )   
   indices:     0   1   
   tree:    x◄──(◄──)
                :   :   
   output:     -1   0   


3) string:  (())
                (   (   )   )
   indices:     0   1   2   3
   tree:    x◄──(◄──────────)
                ▲   :   :   :
                └───(◄──)   :
                :   :   :   :
   output:     -1   0   1   0


4) string:  ()()
                (   )   (   )
   indices:     0   1   2   3
   tree:    x◄──(◄──)   :   :
            ▲   :   :   :   :
            └───────────(◄──)
                :   :   :   :
   output:     -1   0  -1   2

```

A sequential solution would step through each element of the input string, using a stack to keep track of the unmatched open parentheses encountered so far.

In [98]:
# A stack is like a list: we can push an element on top, and pop the topmost element. Python lists already have pop,
# so let us define the methods push (an alias for append), and last, for convenience.
class Stack(list):
    def push(self, x):
        self.append(x)
        
    def last(self):
        return self[-1]

In [99]:
from copy import copy

def stack_match(inp):
    state = Stack([-1])
    out = []
    states = []
    
    for ix, p in enumerate(inp):
        states.append(copy(state))
        out.append(state.last())
        if p == "(":
            state.push(ix)
        else:
            state.pop()
    
    return out, states

def print_out_states(out, states):
    out_str = f"output: {out}"
    index_str = ""
    k = 0
    for i, c in enumerate(out_str):
        if c == "," or c == "]":
            index_str += f"{k}"
            k += 1
        else:
            index_str += " "
    index_str = index_str[1:]
    print(index_str)
    print(out_str)
    r = "states: [\n"
    for i, state in enumerate(states):
        r += f"       {i}: {state},\n"
    r += "        ]"
    print(r)

In [100]:
out, states = stack_match("(()(()))")
print_out_states(out, states)

          0  1  2  3  4  5  6  7
output: [-1, 0, 1, 0, 3, 4, 3, 0]
states: [
       0: [-1],
       1: [-1, 0],
       2: [-1, 0, 1],
       3: [-1, 0],
       4: [-1, 0, 3],
       5: [-1, 0, 3, 4],
       6: [-1, 0, 3],
       7: [-1, 0],
        ]


Recall that our stack keeps track of unmatched `(` encountered as we traverse the parentheses string, and `(` are parent items in the corresponding tree structure of our parentheses string. So, there is a close connection between the state of the stack and the output format we chose. The output allows us to reconstruct the state of the stack at any given step: starting from item `i` in the output, if we follow the parents, we can reconstruct the state of the stack at the `i`th step. For example, follow the parents from `out[5]`, and keep track of the parents encountered so far:
```
out[5] == 4    [4]
out[4] == 3    [3, 4]
out[3] == 0    [0, 3, 4]
out[0] == -1   [-1, 0, 3, 4]
```
The stack at step `5` is precisely `[-1, 0, 3, 4]`. This observation suggests that a useful monoid for constructing a parallel algorithm might involve keeping track of the stack. We can then combine the stack from various steps to generate the output. 
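
The parent-following walk above can be written as a short Python sketch. The function name `stack_at` is hypothetical, and the `out` list is copied from the run of `stack_match("(()(()))")` shown earlier.

```python
def stack_at(out, i):
    """Rebuild the stack as it stands at step i by following parents in `out`."""
    stack = []
    parent = out[i]
    while parent != -1:
        stack.append(parent)
        parent = out[parent]   # the item beneath `parent` is its own parent
    stack.append(-1)           # the root always sits at the bottom
    stack.reverse()            # collected top-down, so flip to bottom-up
    return stack


out = [-1, 0, 1, 0, 3, 4, 3, 0]
print(stack_at(out, 5))  # [-1, 0, 3, 4]
```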

Examining the stack at each step, notice that we can construct the $j$th stack, from the $i$th stack, for any $i < j$, by first performing a number of pops on the $i$th stack, and then a number of pushes. This suggests an element of our monoid needs to keep track of two pieces of information, pops and pushes: `<num_pops, push_list>`. The identity must be `<0, []>`: pop zero elements and push nothing. `<0, [i]>` models encountering `(` at position `i`, while `<1, []>` models encountering `)`. 

Given two stack-transform elements `<a, xs>` and `<b, ys>`, how should they be combined? Suppose our stack is currently `S`. The first element will pop the stack `a` times, and then push `xs`. Now, the second element will pop the stack `b` times. Suppose `b >= len(xs)`; then we would pop all the indices in `xs` from `S`, and perform `(b - len(xs))` additional pops on `S`. In total, `S` would be popped `a + (b - len(xs))` times. Then, we would push `ys`. So, the net transformation would be `<a + (b - len(xs)), ys>`. If `b < len(xs)`, we effectively only push `xs[:len(xs) - b]` onto the stack (i.e. `xs` without its last `b` elements). Then, we push `ys`. So, the net transformation would be `<a, xs[:(len(xs) - b)] + ys>`. Putting the two cases together:
```
<a, xs> ; <b, ys> == <a + b - min(len(xs), b), xs[:max(0, len(xs) - b)] + ys>
``` 
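
To make the two cases concrete, here is a small sketch that models a stack-transform as a plain `(n_pops, pushes)` tuple. The helper names `apply` and `combine` are hypothetical. It checks, for one example of each case, that applying the combined element gives the same stack as applying the two elements one after the other.

```python
def apply(transform, stack):
    """Pop `n_pops` items from a copy of `stack`, then push `pushes`."""
    n_pops, pushes = transform
    # Assumption: the stack is deep enough for the requested pops
    assert n_pops <= len(stack)
    return stack[:len(stack) - n_pops] + pushes


def combine(t1, t2):
    """<a, xs> ; <b, ys>, following the formula above."""
    (a, xs), (b, ys) = t1, t2
    return (a + b - min(len(xs), b), xs[:max(0, len(xs) - b)] + ys)


stack = [-1, 0, 3]
t1 = (1, [4, 5])   # pop once, then push 4 and 5
t2 = (3, [6])      # b >= len(xs): removes 4 and 5, plus one more item
t3 = (1, [7])      # b <  len(xs): removes only 5

print(combine(t1, t2), apply(combine(t1, t2), stack))  # (2, [6]) [-1, 6]
print(combine(t1, t3), apply(combine(t1, t3), stack))  # (1, [4, 7]) [-1, 0, 4, 7]
```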

For our stack-transform elements to form a monoid under the `;` operation, we must verify that `;` is associative. The `min` and `max` operations in our definition of stack-transform composition make it tedious to prove associativity using that definition directly. Instead, observe that the `min`/`max` operations exist to handle the cases where `b < len(xs)` and `b >= len(xs)`. So, if we want to prove that `(<a, xs> ; <b, ys>) ; <c, zs> == <a, xs> ; (<b, ys> ; <c, zs>)`, we need to handle the following cases:

```
0) assume: b < len(xs) && c < len(ys)
   (!) b < len(xs) ==> <a, xs> ; <b, ys> == <a, xs[:(len(xs) - b)] + ys>
   (!) c < len(ys) ==> <b, ys> ; <c, zs> == <b, ys[:(len(ys) - c)] + zs>
   therefore:
       (<a, xs> ; <b, ys>) ; <c, zs>
   ==  <a, xs[:(len(xs) - b)] + ys> ; <c, zs>
   ==  <a + c - min(len(xs[:(len(xs) - b)] + ys), c), (xs[:(len(xs) - b)] + ys)[:max(0, len(xs[:(len(xs) - b)] + ys) - c)] + zs>
       { len(us + vs) = len(us) + len(vs) }
   ==  <a + c - min(len(xs[:(len(xs) - b)]) + len(ys), c), (xs[:(len(xs) - b)] + ys)[:max(0, len(xs[:(len(xs) - b)]) + len(ys) - c)] + zs>
       { len(xs[:(len(xs) - b)]) >= 0 && (len(ys) - c) >= 0 }
   ==  <a + c - c, (xs[:(len(xs) - b)] + ys)[:(len(xs[:(len(xs) - b)]) + len(ys) - c)] + zs>
   ==  <a, (xs[:(len(xs) - b)] + ys)[:(len(xs[:(len(xs) - b)]) + len(ys) - c)] + zs>
   ==  <a, xs[:(len(xs) - b)] + ys[:(len(ys) - c)] + zs>
       {by assumption, b < len(xs)}
   ==  <a + b - min(len(xs), b), xs[:max(0, len(xs) - b)] + ys[:(len(ys) - c)] + zs>
   ==  <a, xs> ; <b, ys[:(len(ys) - c)] + zs>
   ==  <a, xs> ; (<b, ys> ; <c, zs>)
   

1) assume: b >= len(xs) && c < len(ys)
   (!) b >= len(xs) ==> <a, xs> ; <b, ys> == <a + b - len(xs), ys>
   (!) c <  len(ys) ==> <b, ys> ; <c, zs> == <b, ys[:(len(ys) - c)] + zs>
   therefore:
       (<a, xs> ; <b, ys>) ; <c, zs>
   ==  <a + b - len(xs), ys> ; <c, zs>
   ==  <a + b - len(xs) + c - min(len(ys), c), ys[:max(0, len(ys) - c)] + zs>
       {by assumption, min(len(ys), c) == c}
   ==  <a + b - len(xs), ys[:(len(ys) - c)] + zs>
       {by assumption, min(len(xs), b) == len(xs) && xs[:max(0, len(xs) - b)] == []}
   ==  <a + b - min(len(xs), b), xs[:max(0, len(xs) - b)] + ys[:(len(ys) - c)] + zs>
   ==  <a, xs> ; <b, ys[:(len(ys) - c)] + zs>
   ==  <a, xs> ; (<b, ys> ; <c, zs>)

2a) b < len(xs) && c >= len(ys) && len(xs) - b + len(ys) < c
   (!) b < len(xs) ==> <a, xs> ; <b, ys> == <a, xs[:(len(xs) - b)] + ys>
   (!) c >= len(ys) ==> <b, ys> ; <c, zs> == <b + c - len(ys), zs>
   therefore:
       (<a, xs> ; <b, ys>) ; <c, zs>
   ==  <a, xs[:(len(xs) - b)] + ys> ; <c, zs>
   ==  <a + c - min(len(xs[:(len(xs) - b)] + ys), c), (xs[:(len(xs) - b)] + ys)[:max(0, len(xs[:(len(xs) - b)] + ys) - c)] + zs>
       { len(xs[:(len(xs) - b)]) == len(xs) - b (from definition of slicing), len(us + vs) == len(us) + len(vs) }
   ==  <a + c - min(len(xs) - b + len(ys), c), (xs[:(len(xs) - b)] + ys)[:max(0, len(xs) - b + len(ys) - c)] + zs>
       { by assumption, min(len(xs) - b + len(ys), c) == len(xs) - b + len(ys) }
   ==  <a + c - len(xs) + b - len(ys), zs>
       { by assumption, min(len(xs), b + c - len(ys)) == len(xs) }
   ==  <a + b + c - len(ys) - min(len(xs), b + c - len(ys)), xs[:max(0, len(xs) - b - c + len(ys))] + zs>
   ==  <a, xs> ; <b + c - len(ys), zs>
   ==  <a, xs> ; (<b, ys> ; <c, zs>)
   
2b) b < len(xs) && c >= len(ys) && len(xs) - b + len(ys) >= c
   (!) b < len(xs) ==> <a, xs> ; <b, ys> == <a, xs[:(len(xs) - b)] + ys>
   (!) c >= len(ys) ==> <b, ys> ; <c, zs> == <b + c - len(ys), zs>
   therefore:
       (<a, xs> ; <b, ys>) ; <c, zs>
   ==  <a, xs[:(len(xs) - b)] + ys> ; <c, zs>
   ==  <a + c - min(len(xs[:(len(xs) - b)] + ys), c), (xs[:(len(xs) - b)] + ys)[:max(0, len(xs[:(len(xs) - b)] + ys) - c)] + zs>
       { len(xs[:(len(xs) - b)]) == len(xs) - b (from definition of slicing), len(us + vs) == len(us) + len(vs) }
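   ==  <a + c - min(len(xs) - b + len(ys), c), (xs[:(len(xs) - b)] + ys)[:max(0, len(xs) - b + len(ys) - c)] + zs>
       { by assumption, min(len(xs) - b + len(ys), c) == c }
   ==  <a, (xs[:(len(xs) - b)] + ys)[:(len(xs) - b + len(ys) - c)] + zs>
       { c >= len(ys), so the slice ends inside xs[:(len(xs) - b)] }
   ==  <a, xs[:(len(xs) - b - c + len(ys))] + zs>
       { by assumption, min(len(xs), b + c - len(ys)) == b + c - len(ys) }
   ==  <a + b + c - len(ys) - min(len(xs), b + c - len(ys)), xs[:max(0, len(xs) - b - c + len(ys))] + zs>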
   ==  <a, xs> ; <b + c - len(ys), zs>
   ==  <a, xs> ; (<b, ys> ; <c, zs>)
   
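3) b >= len(xs) && c >= len(ys)
   (!) b >= len(xs) ==> <a, xs> ; <b, ys> == <a + b - len(xs), ys>
   (!) c >= len(ys) ==> <b, ys> ; <c, zs> == <b + c - len(ys), zs>
   therefore:
       (<a, xs> ; <b, ys>) ; <c, zs>
   ==  <a + b - len(xs), ys> ; <c, zs>
   ==  <a + b - len(xs) + c - min(len(ys), c), ys[:max(0, len(ys) - c)] + zs>
       { by assumption, min(len(ys), c) == len(ys) && ys[:max(0, len(ys) - c)] == [] }
   ==  <a + b + c - len(xs) - len(ys), zs>
       { b + c - len(ys) >= b >= len(xs), so min(len(xs), b + c - len(ys)) == len(xs) }
   ==  <a + b + c - len(ys) - min(len(xs), b + c - len(ys)), xs[:max(0, len(xs) - b - c + len(ys))] + zs>
   ==  <a, xs> ; <b + c - len(ys), zs>
   ==  <a, xs> ; (<b, ys> ; <c, zs>)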
```
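
Since the case analysis is tedious, a brute-force check over small inputs can serve as a sanity test alongside it. The sketch below is not a proof; it only confirms associativity for every combination of pop counts and push-list lengths up to a small bound. The tuple representation and the names `combine` and `check_associativity` are assumptions made for this illustration.

```python
from itertools import product


def combine(t1, t2):
    """<a, xs> ; <b, ys> on (n_pops, pushes) tuples."""
    (a, xs), (b, ys) = t1, t2
    return (a + b - min(len(xs), b), xs[:max(0, len(xs) - b)] + ys)


def check_associativity(max_pops=3, max_len=3):
    """Compare both groupings for all small triples; return how many were checked."""
    pops = range(max_pops + 1)
    sizes = range(max_len + 1)
    checked = 0
    for a, b, c in product(pops, repeat=3):
        for l, m, n in product(sizes, repeat=3):
            # Distinct value ranges, so items from xs, ys and zs cannot be confused
            x = (a, [100 + i for i in range(l)])
            y = (b, [200 + i for i in range(m)])
            z = (c, [300 + i for i in range(n)])
            assert combine(combine(x, y), z) == combine(x, combine(y, z)), (x, y, z)
            checked += 1
    return checked


print(check_associativity())  # 4096
```

The bounds cover all five cases of the proof, because every ordering of `b` against `len(xs)` and of `c` against `len(ys)` occurs among the generated triples.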

Having convinced ourselves that there is some merit in this monoid, we can now examine how we might combine stack-transforms and derive from them, eventually, the total output.

In [101]:
class StackTransform:
    def __init__(self, n_pops, pushes):
        self.n_pops = n_pops
        self.pushes = pushes
        self.n_pushes = len(pushes)
        
    def combine(self, rhs):
        # <a, xs> ; <b, ys> == <a + b - min(len(xs), b), xs[:max(0, len(xs) - b)] + ys>
        new_pops = self.n_pops + rhs.n_pops - min(self.n_pushes, rhs.n_pops)
        new_pushes = self.pushes[:max(0, self.n_pushes - rhs.n_pops)] + rhs.pushes
        return StackTransform(new_pops, new_pushes)
    
    def __str__(self):
        return f'({self.n_pops}, {self.pushes})'
    
    def __repr__(self):
        return self.__str__()

Let us consider an example: `(())`. Applying the sequential stack algorithm:

In [102]:
out, states = stack_match("(())")
print_out_states(out, states)

          0  1  2  3
output: [-1, 0, 1, 0]
states: [
       0: [-1],
       1: [-1, 0],
       2: [-1, 0, 1],
       3: [-1, 0],
        ]


Our parentheses string will map to a list of stack-transform elements: `[(0, [0]), (0, [1]), (1, []), (1, [])]`. Our monoid is associative, so these elements can be combined pairwise, level by level, just as in the parallel sum:

In [103]:

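def map_input(inp):
    # Map each parenthesis to its stack-transform element
    r = []
    for ix, p in enumerate(inp):
        if p == '(':
            r.append(StackTransform(0, [ix]))
        else:
            r.append(StackTransform(1, []))
    return r
    
inp = "(())"
expected_out = [-1, 0, 1, 0]  # what stack_match produces for this string
transforms = map_input(inp)

# Start from the identity, then combine neighbouring pairs until one element remains
current = [StackTransform(0, [])] + transforms
states = [copy(current)]
while len(current) > 1:
    current = [
        current[ix].combine(current[ix + 1]) if ix + 1 < len(current)
        else StackTransform(0, []).combine(current[ix])  # odd element out: combine with the identity
        for ix in range(0, len(current), 2)
    ]
    states.append(copy(current))
    
for level, elems in enumerate(states):
    print(f"{level}: {elems}")

0: [(0, []), (0, [0]), (0, [1]), (1, []), (1, [])]
1: [(0, [0]), (0, []), (1, [])]
2: [(0, [0]), (1, [])]
3: [(0, [])]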
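
A pairwise reduction yields only the combined transform of the whole string, which for a balanced string is the identity. One possible way to recover the per-position output, offered here as an assumption about where this construction leads and not as the definitive method, is to take a *prefix* combination, exactly as prefix sum generalizes total sum: the output at position `i` is the top of the stack described by the combination of all elements before `i`. The sketch below uses plain tuples and hypothetical names, and performs the scan sequentially; associativity of `combine` is what would permit a parallel scan.

```python
def combine(t1, t2):
    """<a, xs> ; <b, ys> on (n_pops, pushes) tuples."""
    (a, xs), (b, ys) = t1, t2
    return (a + b - min(len(xs), b), xs[:max(0, len(xs) - b)] + ys)


def to_transform(ix, p):
    # "(" pushes its own index; ")" pops once
    return (0, [ix]) if p == "(" else (1, [])


def match_via_transforms(inp):
    """Exclusive prefix scan over stack-transforms (sequential sketch)."""
    prefix = (0, [])   # identity: nothing popped, nothing pushed
    out = []
    for ix, p in enumerate(inp):
        _, pushes = prefix
        # Top of the stack before step ix; -1 (the root) if nothing is pushed
        out.append(pushes[-1] if pushes else -1)
        prefix = combine(prefix, to_transform(ix, p))
    return out


print(match_via_transforms("(())"))      # [-1, 0, 1, 0]
print(match_via_transforms("(()(()))"))  # [-1, 0, 1, 0, 3, 4, 3, 0]
```

Both results agree with the output of the sequential `stack_match` runs shown earlier.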