-----

## Batcher Sorting Networks: Bitonic Sort

Burton Rosenberg

_Creation Date:_ June 2023

_Last update:_ 5 October 2024

&copy; Copyright 2024 Burton Rosenberg. All rights reserved.


----


### Table of contents.

1. <a href="#bitonic">Batcher Bitonic Sorting Network</a>
1. <a href="#bitonic-python">Batcher Bitonic Sort in Python</a>


-----

### <a name=bitonic>Batcher Bitonic Sort</a>

----



In a 1968 report, Ken Batcher presented two sorting networks that have $O((\log n)^2)$ layers. Since each layer is computed in unit time, either as a circuit or on a GPU, the time to sort is also $O((\log n)^2)$. In the circuit model, $O(n\,(\log n)^2)$ swap units are needed. In the GPU model, $O(n)$ threads are needed in each thread launch.
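
The $O((\log n)^2)$ bound can be made concrete with a short count. This is a sketch that assumes $n = 2^k$ inputs and the bitonic network, in which the $j$-th merging stage has $j$ layers of $n/2$ swap units each:

$$
\text{layers} = \sum_{j=1}^{k} j = \frac{k(k+1)}{2},
\qquad
\text{swap units} = \frac{n}{2}\cdot\frac{k(k+1)}{2} = \frac{n\,\log_2 n\,(\log_2 n + 1)}{4}.
$$

For example, $n=8$ gives $k=3$, so 6 layers and 24 swap units.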


__Definition__ A _bitonic sequence_ is a sequence of integers that can increase or decrease, and can change direction at most twice.

__Definition__ A _bitonic sorter_ $B_n$ on $n$ integers is a device that takes a bitonic sequence of $2n$ integers 
and splits it into two sequences of $n$ integers, each bitonic, such that no number in the first half is
larger than any number in the second half.

The bitonic sorter is the fundamental unit from which we build the following two networks.

__Definition__ A _half sorter_ $S_n$ on $n$ integers is a device that takes a bitonic sequence of $2n$ integers and returns the sequence sorted. 

A half sorter $S_n$ can be built recursively: a $B_n$ sorter, followed by a half sorter $S_{n/2}$ on
the top half and a half sorter $S_{n/2}$ on the bottom half. Unrolled, the layers are one $B_n$, then two copies of $B_{n/2}$, and so on.


The final sorter for $2n$ integers is built up from $n=1$: it produces a sorted sequence of $n$ integers
and a second sorted sequence of $n$ integers with the opposite sort order. Concatenated, these form a 
bitonic sequence of $2n$ integers, which is the input to a half sorter.


#### Bitonic sequences

The definition of bitonic is nuanced. It says that, for some rotation, the sequence moves in one direction, ascending or descending, and changes direction at most once. A sorted sequence is bitonic, and so is the concatenation of a sequence sorted upwards with a sequence sorted downwards. 

However, a sequence can ascend, descend and then ascend again and still be bitonic, if it is possible to rotate the sequence such that the two ascending sections can be merged.

__Example:__ The sequence 

$$
0, 1, 2, 1, 0, -1, -2, -1
$$

is bitonic, as it can be rotated to the form 

$$
2, 1, 0, -1, -2, -1, 0, 1
$$

or

$$
-2, -1, 0, 1, 2, 1, 0, -1
$$
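
The rotation argument can be turned into a direct test: walk around the sequence cyclically and count how often the direction changes. The sketch below assumes that ties (equal neighbours) do not count as a change; the function name `is_bitonic` is illustrative and not part of the original notes.

```python
def is_bitonic(seq):
    """Return True if some rotation of seq rises and then falls (or falls then rises)."""
    n = len(seq)
    # Signs of the cyclic differences, skipping equal neighbours.
    signs = []
    for k in range(n):
        diff = seq[(k + 1) % n] - seq[k]
        if diff != 0:
            signs.append(1 if diff > 0 else -1)
    # Count direction changes around the cycle; signs[k - 1] wraps for k == 0.
    changes = sum(1 for k in range(len(signs)) if signs[k] != signs[k - 1])
    return changes <= 2


print(is_bitonic([0, 1, 2, 1, 0, -1, -2, -1]))  # the example sequence
print(is_bitonic([0, 1, 0, 1]))                 # four cyclic changes
```

Counting cyclically makes the test independent of the rotation chosen, so all three forms of the example sequence give the same answer.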

We will rely on the zero-one principle: a comparison network that sorts every sequence of 0's and 1's sorts every sequence. Hence we are only concerned with bitonic sequences restricted to the integers 0 and 1. Such sequences come in only three forms,

- The constant sequences of all 1's or all 0's.
- The sequences with a single change, so $i$ 1's followed by $n-i$ 0's, or else $i$ 0's followed by $n-i$ 1's.
- Sequences with two changes. These are sequences with $i_1$ 0's to begin, $i_2$ 0's to end, and $n-i_1-i_2$ 1's in the middle; or reversing the role of 0's and 1's.
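
The classification above can be checked by brute force for small $n$. The sketch below groups every 0-1 sequence of length $n$ by its number of changes between neighbours and keeps those with at most two; the helper names `changes` and `zero_one_bitonic` are illustrative.

```python
from itertools import product


def changes(bits):
    """Number of positions where a 0-1 sequence switches value."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def zero_one_bitonic(n):
    """Group the bitonic 0-1 sequences of length n by their number of changes."""
    groups = {0: [], 1: [], 2: []}
    for bits in product((0, 1), repeat=n):
        c = changes(bits)
        if c <= 2:
            groups[c].append(bits)
    return groups


groups = zero_one_bitonic(4)
for c in (0, 1, 2):
    print(c, len(groups[c]))
```

For $n=4$ this yields 2, 6 and 6 sequences, which is 14 of the 16 possible; only 0101 and 1010 are excluded.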

The bitonic sorter is a network that places a swap unit between corresponding wires in the upper and lower halves of the inputs to the sorter. Here is an example for 8 inputs.


<pre> 
 A -----+--------- min(A,a)
        |
 B -------+------- min(B,b)
        | |
 C ---------+----- min(C,c)
        | | |
 D -----------+--- min(D,d)
        | | | |
        | | | |       
 a -----+--------- max(A,a)
          | | |
 b -------+------- max(B,b)
            | |
 c ---------+----- max(C,c)
              |
 d -----------+--- max(D,d)
</pre> 
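
The single layer in the diagram can be simulated in a few lines. This is a minimal sketch that assumes an even-length list as input; the name `split_layer` is illustrative.

```python
def split_layer(seq):
    """Apply one layer of swap units between wire k and wire k + half."""
    half = len(seq) // 2
    top = [min(seq[k], seq[k + half]) for k in range(half)]      # upper outputs
    bottom = [max(seq[k], seq[k + half]) for k in range(half)]   # lower outputs
    return top, bottom


print(split_layer([1, 1, 0, 0, 0, 0, 0, 1]))
# ([0, 0, 0, 0], [1, 1, 0, 1])
```

Every swap unit touches its own pair of wires, so all of them can act at the same time, which is what makes this a single layer.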



#### Proof of the bitonic sorter

One considers all cases and proves that $B_n$ works.

When the input bitonic sequence has no change or one change, or has two changes that both occur in the first half or both in the second half, the output is either the same as the input or the input with its top and bottom halves exchanged, which gives the correct sorting order between top and bottom.

The final case is where there is exactly one change in the first half and exactly one change in the second half. It is helpful to figure out whether overall there are more 0's than 1's, more 1's than 0's, or they are equal.

In the case of equality, the result is two constant halves. Otherwise, consider the value that occurs less often: one half of the output is a bitonic sequence that mixes this value with the majority value, and the other half is constant at the majority value. In each case the sorting property between the halves is correct.
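
The case analysis can be cross-checked exhaustively for small sizes. The sketch below assumes the 0-1 setting of the zero-one principle and tests every bitonic 0-1 input of length $2n$; the names `changes`, `split_layer` and `check_split` are illustrative.

```python
from itertools import product


def changes(bits):
    """Number of positions where a 0-1 sequence switches value."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def split_layer(seq):
    """One layer of swap units between wire k and wire k + half."""
    half = len(seq) // 2
    top = [min(seq[k], seq[k + half]) for k in range(half)]
    bottom = [max(seq[k], seq[k + half]) for k in range(half)]
    return top, bottom


def check_split(n):
    """Check the splitting property on every 0-1 bitonic input of length 2n."""
    for bits in product((0, 1), repeat=2 * n):
        if changes(bits) > 2:
            continue                      # not bitonic, so not a legal input
        top, bottom = split_layer(bits)
        if changes(top) > 2 or changes(bottom) > 2:
            return False                  # an output half is not bitonic
        if max(top) > min(bottom):
            return False                  # halves are in the wrong order
        if changes(top) != 0 and changes(bottom) != 0:
            return False                  # the proof says one half is constant
    return True


print(all(check_split(n) for n in (1, 2, 4, 8)))
```

An exhaustive check is not a proof for all $n$, but it catches mistakes in the case analysis quickly.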


#### Example:

<pre>
    B_4( 1 1 0 0 0 0 0 1 ) = min( 1 1 0 0 , 0 0 0 1 ) | max ( 1 1 0 0 , 0 0 0 1 ) 
                           = 0 0 0 0 1 1 0 1
</pre>

#### Recursive structure

The simple single layer described above is denoted $S_n$ in this section, where $n$ is the number of wires, both input and output. Note that the roles of the letters $S$ and $B$ are exchanged relative to the definitions at the start. The circuit takes a bitonic sequence and splits it into two half-length sequences, each bitonic, with no number in the upper sequence larger than any number in the lower sequence. A sorting unit $B_n$ on $n$ inputs, which takes a bitonic sequence as input and outputs the values sorted, is recursively defined as,

<pre>
               -----  B_n -----
              +-----+
              |     |    +-----+
              |     |    |     |   
              |     | => |B_n/2| =>|
              |     |    |     |   |
              |     |    +-----+   |
   bitonic => | S_n |              | => sorted
              |     |    +-----+   |
              |     |    |     |   |
              |     | => |B_n/2| =>|
              |     |    |     |   
              |     |    +-----+           
              +-----+
</pre>

The basis case is $B_1$, which is a straight wire, and $S_2$ is a single swap unit.

We also define $B'_n$ which is $B_n$ with the order of the sort reversed.
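
The recursion for $B_n$ and $B'_n$ can be written out directly. This is a minimal sequential sketch, assuming the input is bitonic and its length is a power of two; the name `bitonic_merge` and the `descending` flag (standing in for the prime) are illustrative.

```python
def bitonic_merge(seq, descending=False):
    """Sort a bitonic sequence: one split layer, then recurse on each half."""
    n = len(seq)
    if n == 1:
        return list(seq)                  # basis case: a straight wire
    half = n // 2
    small = [min(seq[k], seq[k + half]) for k in range(half)]
    large = [max(seq[k], seq[k + half]) for k in range(half)]
    if descending:
        small, large = large, small       # reversed sort order: larger values on top
    return bitonic_merge(small, descending) + bitonic_merge(large, descending)


print(bitonic_merge([0, 1, 2, 1, 0, -1, -2, -1]))
# [-2, -1, -1, 0, 0, 1, 1, 2]
```

The two recursive calls work on disjoint wires, so in a network or on a GPU they run side by side and the depth is $\log_2 n$ layers.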

We construct a merge structure that creates one bitonic sequence of length $2n$ from two bitonic sequences of length $n$, using one instance of $B_{n}$ and one instance of $B'_{n}$ stacked so that their sorting directions oppose,

<pre>
              +-----+
              |     |
   bitonic => | B_n | =>|
              |     |   |
              +-----+   |
                        | => bitonic
              +-----+   |
              |     |   |
   bitonic => |B'_n | =>|
              |     |    
              +-----+
</pre>

So an entire sort is depicted here,

<pre>
       +----+   
   ----|    |   +----+
       |B_2 |---|    |
   ----|    |   |    |    
       +----+   |    |   +----+
                |B_4 |---|    |
       +----+   |    |   |    |        
   ----|    |   |    |   |    |            
       |B_2'|---|    |   |    |            
   ----|    |   +----+   |    |              
       +----+            |    |
                         |B_8 | => sorted
       +----+            |    |
   ----|    |   +----+   |    |        
       |B_2 |---|    |   |    |        
   ----|    |   |    |   |    |                 
       +----+   |    |   |    |           
                |B_4'|---|    |       
       +----+   |    |   +----+   
   ----|    |   |    |   
       |B_2'|---|    |
   ----|    |   +----+      
       +----+  
</pre>
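
The whole diagram corresponds to a short recursive program. The sketch below is a sequential simulation, assuming the input length is a power of two; `bitonic_merge` and `bitonic_sort` are illustrative names, with `descending=True` playing the role of the primed units.

```python
import random


def bitonic_merge(seq, descending=False):
    """Sort a bitonic sequence (the role of B_n, or B'_n when descending)."""
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    small = [min(seq[k], seq[k + half]) for k in range(half)]
    large = [max(seq[k], seq[k + half]) for k in range(half)]
    if descending:
        small, large = large, small
    return bitonic_merge(small, descending) + bitonic_merge(large, descending)


def bitonic_sort(seq, descending=False):
    """Sort any sequence whose length is a power of two."""
    n = len(seq)
    assert n > 0 and (n & (n - 1)) == 0, "length must be a power of two"
    if n == 1:
        return list(seq)
    half = n // 2
    first = bitonic_sort(seq[:half], descending=False)   # sorted upwards
    second = bitonic_sort(seq[half:], descending=True)   # sorted downwards
    return bitonic_merge(first + second, descending)     # concatenation is bitonic


data = [random.randrange(100) for _ in range(8)]
print(bitonic_sort(data) == sorted(data))
```

Sorting the two halves in opposite directions is what guarantees that their concatenation is a legal bitonic input to the final merge.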




### <a name="bitonic-python">Batcher Bitonic Sort in Python</a>

The challenge is to navigate the double recursion and to determine where the swaps should be, based only on the thread index and a few per-level global variables.

The two parameters $i$ and $j$ are interpreted with the thread index $t$ as follows. The lower $i$ bits of $t$ are the offset inside a $B^l_k$. Bit $j$ of $t$ selects the sort direction: the superscript $l$ is absent if bit $j$ is 0, and is the prime (for the inverted sort order) if bit $j$ is 1.

The parameter $j$ controls the outer recursion structure. For each $j$, the parameter $i$ begins at $j$ and counts down, following the inner recursion structure.



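```python
def bitonic_wiring(tid, j, i):
    """Return (u, v, direction) for the swap unit handled by thread tid."""
    assert j >= i
    d = 2**i                          # distance between the two wires of the swap
    mask = d - 1
    tid_top = (tid >> i) << (i + 1)   # upper bits of tid, with a 0 inserted at bit i
    tid_bot = tid & mask              # lower i bits: offset inside the block
    tid_dir = (tid >> j) % 2          # bit j of tid: 0 is normal order, 1 is inverted
    return (tid_top + tid_bot, tid_top + tid_bot + d, tid_dir)


def bitonic_wiring_test(bits):

    def bitonic_wiring_test_aux(j, i):
        u_prev = 0
        for tid in range(2**bits):
            (u, v, color) = bitonic_wiring(tid, j, i)
            direction = '-' if color else '+'
            if u - u_prev > 1:        # a gap in u marks the start of a new block
                print('---')
            u_prev = u
            print(f'{u}\t{v}\t{direction}')

    for c in range(bits):
        print(f'\ni==j=={c}')
        bitonic_wiring_test_aux(c, c)


bitonic_wiring_test(4)
```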


<pre>
i==j==0
0	1	+
---
2	3	-
---
4	5	+
---
6	7	-
---
8	9	+
---
10	11	-
---
12	13	+
---
14	15	-
---
16	17	+
---
18	19	-
---
20	21	+
---
22	23	-
---
24	25	+
---
26	27	-
---
28	29	+
---
30	31	-

i==j==1
0	2	+
1	3	+
---
4	6	-
5	7	-
---
8	10	+
9	11	+
---
12	14	-
13	15	-
---
16	18	+
17	19	+
---
20	22	-
21	23	-
---
24	26	+
25	27	+
---
28	30	-
29	31	-

i==j==2
0	4	+
1	5	+
2	6	+
3	7	+
---
8	12	-
9	13	-
10	14	-
11	15	-
---
16	20	+
17	21	+
18	22	+
19	23	+
---
24	28	-
25	29	-
26	30	-
27	31	-

i==j==3
0	8	+
1	9	+
2	10	+
3	11	+
4	12	+
5	13	+
6	14	+
7	15	+
---
16	24	-
17	25	-
18	26	-
19	27	-
20	28	-
21	29	-
22	30	-
23	31	-
</pre>
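
To see the wiring drive a complete sort, the launches can be simulated sequentially. The sketch below repeats `bitonic_wiring` from the text so that it runs on its own; the driver `bitonic_sort_by_wiring` is a hypothetical name, and it assumes $n = 2^{bits}$ elements handled by $n/2$ threads per launch, with one launch for each pair $(j, i)$.

```python
import random


def bitonic_wiring(tid, j, i):
    """Wiring function from the text: (upper wire, lower wire, direction bit)."""
    assert j >= i
    d = 2**i
    mask = d - 1
    tid_top = (tid >> i) << (i + 1)
    tid_bot = tid & mask
    tid_dir = (tid >> j) % 2
    return (tid_top + tid_bot, tid_top + tid_bot + d, tid_dir)


def bitonic_sort_by_wiring(data):
    """Sequential simulation; each pass over tid stands for one thread launch."""
    n = len(data)
    assert n >= 2 and (n & (n - 1)) == 0, "length must be a power of two"
    bits = n.bit_length() - 1             # n == 2**bits
    a = list(data)
    for j in range(bits):                 # outer recursion: block size 2**(j+1)
        for i in range(j, -1, -1):        # inner recursion: swap distance 2**i
            for tid in range(n // 2):     # one simulated thread per swap unit
                u, v, descending = bitonic_wiring(tid, j, i)
                out_of_order = a[u] < a[v] if descending else a[u] > a[v]
                if out_of_order:
                    a[u], a[v] = a[v], a[u]
    return a


data = [random.randrange(100) for _ in range(16)]
print(bitonic_sort_by_wiring(data) == sorted(data))
```

Within one launch the pairs $(u, v)$ are disjoint, so the order in which the simulated threads run does not matter. In the last stage, $j = bits - 1$, bit $j$ of every thread index is 0, so the final merge sorts upwards throughout.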


### END