## Exercise 1.11

*The recursive doubling algorithm for summing the elements of an array is:*

```
for (s=2; s<2*n; s*=2)
    for (i=0; i<n-s/2; i+=s)
        x[i] += x[i+s/2]
```
*Analyze bank conflicts for this algorithm. Assume $n=2^p$ and banks have $2^k$ elements where $k<p$.*
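
To make the bank model concrete, here is a minimal sketch. It assumes that an element's bank slot is its index modulo $2^k$, which is the same modulo test the implementations in this answer rely on. The helper names `bank_of` and `conflicting_strides` are illustrative only and are not part of the exercise.

```python
def bank_of(index, k):
    """Bank slot of an array index, assuming the slot is index mod 2**k (an assumption)."""
    return index % (2 ** k)


def conflicting_strides(p, k):
    """Strides s of the doubling loop whose operands x[i] and x[i + s//2] share a bank slot."""
    n = 2 ** p
    strides = []
    s = 2
    while s < 2 * n:
        # The two operands are s//2 apart, so they share a slot exactly when
        # s//2 is a multiple of 2**k; checking i = 0 is enough.
        if bank_of(0, k) == bank_of(s // 2, k):
            strides.append(s)
        s *= 2
    return strides


print(conflicting_strides(5, 2))
```

Under this model, only the strides with $s/2 \ge 2^k$ pair two elements from the same bank slot, so exactly $p-k$ of the $p$ strides are affected.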

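Below is a Python implementation of the recursive doubling algorithm. It walks the loop indices without touching the array and counts an access pair as a bank conflict when the two indices are equal modulo $2^k$.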

In [47]:
def recursive_doubling(p, k):

    # Initializing parameters
    s = 2
    n = 2 ** p
    bank_conflicts = 0
    bank_elements = 2 ** k

    # Runs the algorithm with the above parameters, skipping the actual calculations of array elements
    while s < 2 * n:
        for i in range(0, n - s // 2, s):

            # Checks if the two elements of the array are being accessed from the same bank
            # (the parentheses matter: % binds tighter than +, and // keeps the index an integer)
            if (i % bank_elements) == ((i + s // 2) % bank_elements):
                bank_conflicts += 1

        s *= 2

    print(f"Bank conflicts (p = {p}, k = {k}):", bank_conflicts)

recursive_doubling(20, 10)
recursive_doubling(15, 10)
recursive_doubling(15, 5)
recursive_doubling(22, 8)

Bank conflicts (p = 20, k = 10): 1023
Bank conflicts (p = 15, k = 10): 31
Bank conflicts (p = 15, k = 5): 1023
Bank conflicts (p = 22, k = 8): 16383


Varying the values of $p$ and $k$ shows that the number of bank conflicts with this algorithm is $2^{p-k}-1$. The two operands share a bank only when $s/2$ is a multiple of $2^k$, that is for the $p-k$ strides $s = 2^{k+1}, \dots, 2^p$, and each such stride contributes $n/s$ iterations.

In [48]:
import numpy as np

def parallel_doubling(p, k):

    # Initializing parameters
    s = 2
    n = 2 ** p
    bank_elements = 2 ** k
    accessed_indices = np.zeros(bank_elements)

    # Runs the algorithm with the above parameters, skipping the actual calculations of array elements
    while s < 2 * n:
        for i in range(0, n - s // 2, s):
            accessed_indices[i % bank_elements] += 1
            accessed_indices[(i + s // 2) % bank_elements] += 1

        s *= 2

    print(f"Bank conflicts (p = {p}, k = {k}):", np.sum(accessed_indices), f"(Min: {np.min(accessed_indices)})")

parallel_doubling(20, 10)
parallel_doubling(15, 10)
parallel_doubling(15, 5)
parallel_doubling(22, 8)

Bank conflicts (p = 20, k = 10): 2097150.0 (Min: 1024.0)
Bank conflicts (p = 15, k = 10): 65534.0 (Min: 32.0)
Bank conflicts (p = 15, k = 5): 65534.0 (Min: 1024.0)
Bank conflicts (p = 22, k = 8): 8388606.0 (Min: 16384.0)


When the algorithm is parallel (so all iterations of the inner loop can be done simultaneously), the count is significantly higher, and its total depends only on $p$, not on $k$. The printed totals match $2^{p+1} - 2$, because the summed counter records two accesses for each of the $n-1$ additions. The value of $k$ only changes how those accesses are spread over the banks: the least-used bank receives $2^{p-k}$ of them.
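
The counter above sums over all strides. As a complementary view, the sketch below tallies, for each stride separately, how many simultaneous accesses land on the busiest bank. It assumes the same modulo bank model as the cell above; `max_bank_load_per_stride` is a hypothetical helper that does not appear in the original analysis.

```python
from collections import Counter


def max_bank_load_per_stride(p, k):
    """For each stride s, the largest number of accesses that hit a single bank slot."""
    n = 2 ** p
    banks = 2 ** k
    loads = {}
    s = 2
    while s < 2 * n:
        hits = Counter()
        for i in range(0, n - s // 2, s):
            hits[i % banks] += 1             # access to x[i]
            hits[(i + s // 2) % banks] += 1  # access to x[i + s/2]
        loads[s] = max(hits.values())
        s *= 2
    return loads


print(max_bank_load_per_stride(6, 2))
```

In this small case, every access of a step falls on a single bank slot once $s/2$ reaches $2^k$, so the later steps are fully serialized under this model even though they involve few accesses.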

Now we can run the same two tests for the recursive halving algorithm.

In [53]:
def recursive_halving(p, k):

    # Initializing parameters
    n = 2 ** p
    s = (n + 1) // 2  # integer division keeps s an exact power of two
    bank_conflicts = 0
    bank_elements = 2 ** k

    # Runs the algorithm with the above parameters, skipping the actual calculations of array elements
    while s > 1:
        for i in range(n):

            # Checks if the two elements of the array are being accessed from the same bank
            # (the parentheses matter: % binds tighter than +)
            if (i % bank_elements) == ((i + s) % bank_elements):
                bank_conflicts += 1

        s //= 2

    print(f"Bank conflicts (p = {p}, k = {k}):", bank_conflicts)

recursive_halving(20, 10)
recursive_halving(15, 10)
recursive_halving(15, 5)
recursive_halving(22, 8)
recursive_halving(22, 2)

Bank conflicts (p = 20, k = 10): 10485760
Bank conflicts (p = 15, k = 10): 163840
Bank conflicts (p = 15, k = 5): 327680
Bank conflicts (p = 22, k = 8): 58720256
Bank conflicts (p = 22, k = 2): 83886080


In [58]:
def parallel_halving(p, k):

    # Initializing parameters
    n = 2 ** p
    s = (n + 1) // 2
    bank_elements = 2 ** k
    accessed_indices = np.zeros(bank_elements)

    # Runs the algorithm with the above parameters, skipping the actual calculations of array elements
    while s > 1:
        for i in range(n):
            accessed_indices[i % bank_elements] += 1
            accessed_indices[(i + s) % bank_elements] += 1

        s = s // 2

    print(f"Bank conflicts (p = {p}, k = {k}):", np.sum(accessed_indices), f"(Min: {np.min(accessed_indices)})")

parallel_halving(20, 10)
parallel_halving(15, 10)
parallel_halving(15, 5)
parallel_halving(22, 8)

Bank conflicts (p = 20, k = 10): 39845888.0 (Min: 38912.0)
Bank conflicts (p = 15, k = 10): 917504.0 (Min: 896.0)
Bank conflicts (p = 15, k = 5): 917504.0 (Min: 28672.0)
Bank conflicts (p = 22, k = 8): 176160768.0 (Min: 688128.0)


For the non-parallel case, recursive halving produces $(p-k)\,2^p$ bank conflicts: every stride $s \ge 2^k$ is a multiple of the bank size, so all $n$ iterations at that stride pair two elements from the same bank. This is far more than recursive doubling, so halving is the worse algorithm in terms of bank conflicts. In the parallel case, recursive halving also performs worse than recursive doubling: its total of $(p-1)\,2^{p+1}$ is spread evenly over the banks, but it is roughly $p-1$ times larger. Also of note, the recursive halving algorithm takes longer to complete, because its inner loop visits all $n$ indices at every stride. So I believe bank conflicts are not the only aspect of performance to consider.
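
Part of the runtime gap can be seen by counting inner-loop iterations alone. The sketch below assumes the same loop bounds as the functions above, with integer division for `s`; `inner_iterations` is a hypothetical helper introduced only for this comparison.

```python
def inner_iterations(p):
    """Count inner-loop iterations of both loop nests for n = 2**p (no array work is done)."""
    n = 2 ** p

    # Recursive doubling: the number of visited indices halves with every stride.
    doubling = 0
    s = 2
    while s < 2 * n:
        doubling += len(range(0, n - s // 2, s))
        s *= 2

    # Recursive halving: every stride visits all n indices.
    halving = 0
    s = (n + 1) // 2
    while s > 1:
        halving += n
        s //= 2

    return doubling, halving


print(inner_iterations(10))
```

With these bounds, doubling performs $n-1$ inner iterations in total, while halving performs $(p-1)\,n$, which is consistent with the longer running time observed for the halving cells.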