Consider a list of integers:

```
[2, 3, 3, 0, 0, 1]
```

Suppose we want the sum of the elements in this list. A sequential algorithm steps through the list and keeps track of the sum $S_i$ after $i$ elements have been processed. Initially $S_0 = 0$; each time an element is encountered, we add it to $S_{i - 1}$ to get $S_i$; finally $S_n$ is returned, where $n$ is the length of the input list. In total, this algorithm takes about $n$ steps, one addition per element.
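
As a minimal sketch of the sequential approach (the name `sequential_sum` and the step counter are illustrative additions, not part of the description above), the running sum could be written as:

```python
def sequential_sum(xs):
    """Return (total, number of additions) for the list xs."""
    total = 0  # plays the role of S_0
    steps = 0
    for x in xs:
        total += x  # S_i is obtained from S_{i-1} by adding the next element
        steps += 1
    return total, steps


print(sequential_sum([2, 3, 3, 0, 0, 1]))  # (9, 6)
```

The step count grows linearly with the length of the list, which is the baseline a parallel version has to beat.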

A parallel algorithm is not hard to imagine. We can split the input into pieces, each of length $k$, and assign each piece to a separate thread. Each thread sequentially sums its piece, giving us a list of $n/k$ partial sums. We can split this list into pieces of length $k$ again, and repeat the process until only one element remains. For example, the sum above could be parallelized by splitting the list into pieces of length $2$:

```

 [ 2, 3, 3, 0, 0, 1 ]
   │  │  │  │  │  │
   └─┬┘  └─┬┘  └─┬┘
     │     │     │
     5     3     1
     │     │     │
     └──┬──┘     │
        │        │
        8        1
        │        │
        └────┬───┘
             │
             9

```

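Given enough threads, the parallel solution finishes in about $\log_k(n)$ rounds, each costing at most $k - 1$ sequential additions per thread. This is an improvement over the sequential algorithm, whose step count grows linearly with $n$.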
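
The rounds in the diagram can be simulated with a short sketch. It runs the pieces one after another rather than on real threads, and the name `tree_sum` and the round counter are assumptions made for illustration:

```python
def tree_sum(xs, k=2):
    """Sum xs by repeatedly summing pieces of length k.

    Returns (total, number of rounds).
    """
    level = list(xs)
    rounds = 0
    while len(level) > 1:
        # Each piece could be handled by its own thread; here they run in turn.
        level = [sum(level[i:i + k]) for i in range(0, len(level), k)]
        rounds += 1
    return (level[0] if level else 0), rounds


print(tree_sum([2, 3, 3, 0, 0, 1]))  # (9, 3)
```

For the six-element list, the intermediate levels are `[5, 3, 1]`, `[8, 1]` and `[9]`, matching the diagram.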

Now, suppose that we are interested in producing not only the total, but a *prefix sum*: given a list of $n$ numbers, produce a list of $n$ numbers such that the $i$th element is the sum of the input elements with indices $0, \ldots, i$. The last element of such a list is the total. The sequential solution is easy: push each $S_i$ into a list and return it, instead of returning only $S_n$.

Parallelizing the algorithm takes a little more thought, because now there is a state dependency. If $L$ is the result list, then $L_i$ depends upon $L_{i - 1}$, so it would seem that the problem must be solved sequentially, as we must compute $L_{i - 1}$ to get $L_i$. However, we can still break the problem into pieces and combine the results of the pieces into the final solution. Let $;$ represent the "compose" operation, which combines two prefix-sum lists. The operation is very much like list or string concatenation, and the following cases are relevant:

```
0) (empty list composed with some other list) 
  [] ; [x_0, x_1, ..., x_n] 
= [x_0, x_1, ..., x_n]
	
  [x_0, x_1, ..., x_n] ; []
= [x_0, x_1, ..., x_n]
        
1) (two non-empty lists combined) 
  [x_0, x_1, ..., x_m] ; [y_0, y_1, ..., y_n] 
= [x_0, x_1, ..., x_m, y_0 + x_m, y_1 + x_m, ..., y_n + x_m]
        
2) ((non-)importance of order of combination)
  [x_0,  ..., x_l] ; ([y_0, ..., y_m] ; [z_0, ..., z_n])
= [x_0, ..., x_l] ; [y_0, ..., y_m, z_0 + y_m, ..., z_n + y_m]
= [x_0, ..., x_l, y_0 + x_l, ..., y_m + x_l, z_0 + y_m + x_l, z_1 + y_m + x_l, ..., z_n + y_m + x_l]
= [x_0, ..., x_l, y_0 + x_l, ..., y_m + x_l] ; [z_0, ..., z_n]
= ([x_0, ..., x_l] ; [y_0, ..., y_m]) ; [z_0, ..., z_n]
```

Note that the $;$ operation is not **commutative** (also called **symmetric**): in general `x ; y != y ; x`, unlike addition, where `x + y = y + x`. With addition, `0 + x = x`, so $0$ is an **identity**: operating with it always returns the other number. Similarly, $;$ has the identity `[]`. Finally, like addition, $;$ is **associative**: the grouping of operations does not matter, so $x ; (y ; z) = (x ; y) ; z$. A set equipped with an associative operation and an identity is called a **monoid**. With the $;$ operation in hand, parallelizing prefix sum is conceptually just as easy as parallelizing the total sum: each thread computes the prefix sum of its piece, and the pieces are then composed in any grouping.
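
The cases listed for $;$ translate fairly directly into code. The following is only a sketch: the names `compose`, `prefix_sum` and `chunked_prefix_sum` are chosen here for illustration, and the pieces are processed in turn instead of on separate threads.

```python
from functools import reduce


def compose(xs, ys):
    """The `;` operation: join two prefix-sum lists."""
    if not xs:
        return list(ys)  # [] is the identity on the left
    # Every entry of ys is shifted by the last entry of xs (the total of xs).
    return list(xs) + [y + xs[-1] for y in ys]


def prefix_sum(xs):
    """Sequential prefix sum of a single piece."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out


def chunked_prefix_sum(xs, k=2):
    # Each piece could be scanned by its own thread before composing.
    pieces = [prefix_sum(xs[i:i + k]) for i in range(0, len(xs), k)]
    return reduce(compose, pieces, [])


print(chunked_prefix_sum([2, 3, 3, 0, 0, 1]))  # [2, 5, 8, 8, 8, 9]
```

`reduce` folds the pieces from left to right here, but because `compose` is associative the same pieces could equally be combined pairwise in a tree, as in the total-sum case.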

There is a closely related problem that is instructive. Consider a string of parentheses. For example:

```
()()
```

A parentheses string is *balanced* if each `(` has a matching `)`, and each `)` a matching `(`. All parentheses strings considered in this document will be balanced, so we shall omit the qualifier in "balanced parentheses string" and write "parentheses string".
The parentheses matching problem requires finding, for each parenthesis in a parentheses string, the index of its matching parenthesis. For example:

```
index:   0  1  2  3
input:   (  )  (  )
output:  1  0  3  2
```

A sequential solution steps through each element of the input string, keeping track of unmatched open parentheses and matching each newly encountered close parenthesis with the most recently encountered unmatched open parenthesis. Typically, this algorithm uses a stack.

In [156]:
# A stack is like a list: we can push an element on top, and pop the topmost element. Python lists already have pop,
# so let us just add push, and last, for convenience.
class Stack(list):
    def push(self, x):
        self.append(x)
        
    def last(self):
        return self[-1]

In [157]:
from copy import deepcopy

def stack_match(inp):
    state = Stack([])
    out = [-1 for ix in range(len(inp))]
    states = [deepcopy(state)]
    
    for ix, p in enumerate(inp):
        if p == "(":
            state.push(ix)
        else:
            match = state.last()
            out[ix] = match
            out[match] = ix
            state.pop()
        states.append(deepcopy(state))
    
    return out, states

def print_states(states):
    r = "[\n"
    for state in states:
        r += f" {state},\n"
    r += "]"
    print(r)

In [158]:
out, states = stack_match("(()(()))")
print(out)
print_states(states)

[7, 2, 1, 6, 5, 4, 3, 0]
[
 [],
 [0],
 [0, 1],
 [0],
 [0, 3],
 [0, 3, 4],
 [0, 3],
 [0],
 [],
]


This stack-based sequential algorithm does not seem so easy to parallelize, but again, let us try the trick of finding a suitable monoid (an associative operation with an identity, like $;$ above).

First, note that in the prefix sum problem the $i$th element of the result depends only upon preceding elements, but in the matching problem as described so far, the $i$th element can refer to a succeeding element. When a `(` is encountered, we record the index of the matching `)`, which necessarily appears later. When a `)` is encountered, we only need to record something that we encountered previously. The current solution also has some "symmetry" in it: if there is a `(` at position `i` such that `result[i] = i + k` (i.e. the matching `)` is at position `i + k`), then `result[i + k] = i`. If we record matches only for `)`, we need not record matches for `(`, because that information is mirrored and easy to recover. This eliminates the problem of some elements of the result depending upon information that comes after their index, but it would still be nice to store something for `(` brackets.

Looking at the state of our stack per step above, we see that each open bracket in the stack is preceded by the open bracket that immediately encloses it (i.e. the one that is one level up in the nesting hierarchy suggested by the parentheses). In the above example, the open parenthesis at `1` is immediately enclosed by the open parenthesis at `0`, `3` is immediately enclosed by `0` as well, and `4` is immediately enclosed by `3`. Let us define `-1` to be the index of an imaginary open parenthesis that immediately encloses `0`, and record for each `(` the index of its immediately enclosing open parenthesis.

In [159]:
def stack_match2(inp):
    state = Stack([-1])
    # we no longer need to pre-initialize the output, we may simply build it up
    # as we proceed through the list, since we eliminated successor reference
    out = []
    states = [deepcopy(state)]
    
    for ix, p in enumerate(inp):
        out.append(state.last())
        if p == "(":
            state.push(ix)
        else:
            state.pop()
        states.append(deepcopy(state))
    
    return out, states
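
The mirroring argument can be checked with a small helper. This is a sketch under assumptions: the name `full_match` is made up for illustration, and the list `reduced` is written out by hand to be what `stack_match2` produces for the input `"(()(()))"`.

```python
def full_match(inp, reduced):
    """Rebuild the two-way match list from the reduced output."""
    match = [-1] * len(inp)
    for ix, p in enumerate(inp):
        if p == ")":
            # For `)`, the reduced output holds the index of its matching `(`.
            match[ix] = reduced[ix]
            # The match is mirrored, so the `(` entry can be filled in as well.
            match[reduced[ix]] = ix
    return match


reduced = [-1, 0, 1, 0, 3, 4, 3, 0]
print(full_match("(()(()))", reduced))  # [7, 2, 1, 6, 5, 4, 3, 0]
```

The entries stored for `(` are not needed for this reconstruction; only the entries for `)` are read.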

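In [160]:
out, states = stack_match2("(()(()))")
print(out)
# Show each stack (left) beside the output built up to that step (right).
rows = [f" {str(s) + ',':<17}{out[:i]}," for i, s in enumerate(states)]
print("[\n" + "\n".join(rows) + "\n]")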

[-1, 0, 1, 0, 3, 4, 3, 0]
[
 [-1],            [],
 [-1, 0],         [-1],
 [-1, 0, 1],      [-1, 0],
 [-1, 0],         [-1, 0, 1],
 [-1, 0, 3],      [-1, 0, 1, 0],
 [-1, 0, 3, 4],   [-1, 0, 1, 0, 3],
 [-1, 0, 3],      [-1, 0, 1, 0, 3, 4],
 [-1, 0],         [-1, 0, 1, 0, 3, 4, 3],
 [-1],            [-1, 0, 1, 0, 3, 4, 3, 0],
]


Above, we see the stack at each step on the left, and the corresponding state of the output on the right. Writing `stack[i]` for the stack after step `i`, notice that for any `i` and any `j > i`, `stack[j]` can be obtained from `stack[i]` by first performing a number of pops, and then a number of pushes. This is a promising lead for designing a relevant monoid. We need to keep track of two pieces of information, pops and pushes: `(num_pops, push_list)`.

The identity must be `(0, [])`: pop nothing, and push nothing. `(0, [i])` models encountering `(` at position `i`. `(1, [])` models encountering `)`. 
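
One possible way to combine two such elements is sketched below. The combining rule is an assumption derived from the "pops first, then pushes" reading of `(num_pops, push_list)`, not something stated above, and the names `compose_stack`, `encode` and `IDENTITY` are illustrative.

```python
from functools import reduce

IDENTITY = (0, [])  # pop nothing, push nothing


def compose_stack(a, b):
    """Combine two stack effects: apply a, then b."""
    pops_a, pushes_a = a
    pops_b, pushes_b = b
    # b's pops first cancel elements pushed by a; any remaining pops
    # reach the stack that existed before a was applied.
    cancelled = min(pops_b, len(pushes_a))
    kept = pushes_a[:len(pushes_a) - cancelled]
    return (pops_a + pops_b - cancelled, kept + pushes_b)


def encode(inp):
    """Map each parenthesis to its (num_pops, push_list) element."""
    return [(0, [ix]) if p == "(" else (1, []) for ix, p in enumerate(inp)]


xs = encode("(()(()))")
print(reduce(compose_stack, xs[:5], IDENTITY))  # (0, [0, 3, 4])
print(reduce(compose_stack, xs, IDENTITY))      # (0, [])
```

Composing the first five elements gives `(0, [0, 3, 4])`: applied to the initial stack `[-1]`, it yields `[-1, 0, 3, 4]`, the stack after five steps. The whole balanced string composes to the identity.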

In [None]:
import math

# log base 2
def log(x):
    return math.log(x, 2)

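def par_match(inp):
    # log needs a number, so use the length of the input (assumed non-empty).
    k = round(log(len(inp)))
    xs = []
    out = []
    
    # Encode each parenthesis as a (num_pops, push_list) element.
    for ix, p in enumerate(inp):
        if p == '(':
            xs.append((0, [ix]))
        elif p == ')':
            xs.append((1, []))
            
    # unfinished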
        