# Radix_2_bowers FFT
Bowers FFT is an optimized variant of the classical DIT (Decimation-In-Time) and DIF (Decimation-In-Frequency) FFT algorithms. Its primary goal is to improve computational efficiency by reducing the number of twiddle factor accesses and optimizing memory access patterns. On modern computing architectures, memory access is often the bottleneck. Bowers FFT addresses this by lowering twiddle factor overhead and making better use of hardware vectorization.

## 1. DIT and DIF FFT
In the DIT FFT, the input data is in bit-reversed order, while the output is in natural order. It starts with 2-point DFTs, then 4-point DFTs, and so on up to the N-point DFT. In the DIF FFT, the input data is in natural order, while the output is in bit-reversed order. It starts with the N-point DFT, then N/2-point DFTs, and so on down to 2-point DFTs. In both the DIT and DIF FFT, the butterfly operations are performed iteratively rather than recursively.<br>
This part presents only the code for the DIF FFT; the section on the Four-Step FFT demonstrates both the DIT and DIF FFT (including forward and inverse transforms).<br>
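
The bit-reversed ordering mentioned above can be illustrated with a minimal sketch. The helper names `bit_reversed_indices` and `bit_reverse` mirror the methods used later in this document, but this standalone version is only an illustration and assumes the length is a power of two.

```python
def bit_reversed_indices(n):
    """Return the bit-reversal permutation of range(n); n is assumed to be a power of two."""
    logn = n.bit_length() - 1
    # Write each index with logn binary digits, reverse the digits, read it back.
    return [int(format(i, f"0{logn}b")[::-1], 2) for i in range(n)]


def bit_reverse(values):
    """Return a copy of values reordered by the bit-reversal permutation."""
    return [values[i] for i in bit_reversed_indices(len(values))]


print(bit_reversed_indices(8))                # [0, 4, 2, 6, 1, 5, 3, 7]
print(bit_reverse([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 5, 3, 7, 2, 6, 4, 8]
```

Applying the permutation twice restores the original order, which is why the same helper can convert in both directions between natural and bit-reversed order.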

## 2. Bowers FFT
Bowers FFT is very similar to the DIF FFT; the core difference lies in the way twiddle factors are accessed. Taking an 8-point FFT as an example, the twiddle factors used by the four butterflies of each stage of the DIF FFT are as follows:<br>
First layer:    $$w_{8}^{0}, w_{8}^{1}, w_{8}^{2}, w_{8}^{3}$$
Second layer:   $$w_{8}^{0}, w_{8}^{2}, w_{8}^{0}, w_{8}^{2}$$
Third layer:   $$w_{8}^{0}, w_{8}^{0}, w_{8}^{0}, w_{8}^{0}$$
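In the Bowers FFT, the twiddle factors for the stages are as follows:<br>
First layer:    $$w_{8}^{0}, w_{8}^{0}, w_{8}^{0}, w_{8}^{0}$$
Second layer:   $$w_{8}^{0}, w_{8}^{0}, w_{8}^{2}, w_{8}^{2}$$
Third layer:   $$w_{8}^{0}, w_{8}^{2}, w_{8}^{1}, w_{8}^{3}$$
Each block of butterflies shares a single twiddle factor, and the blocks read the twiddle table in bit-reversed order, which is why the third layer uses the exponents 0, 2, 1, 3 in that order (this matches the implementation in section 3). Consider the second layer: in the DIF FFT the twiddle factor changes between adjacent butterflies, whereas in the Bowers FFT adjacent butterflies share the same twiddle factor, so memory access is more contiguous. The difference is shown in the following figure (a is the DIF FFT, b is the Bowers G^T FFT):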
<img src="bowers & dif fft.png" alt="Example Image" style="background-color: #f0f0f0; width:1000px; display: block; margin: auto;">
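
The twiddle schedules of the two networks can be reproduced with a short sketch. It only tracks exponents of the primitive 8th root of unity, listed per layer in butterfly order. The function names `dif_schedule` and `bowers_schedule` are hypothetical, and the Bowers schedule follows the indexing used by the `bower_g_t` code in section 3 (one twiddle per block, taken from a bit-reversed table), so its last layer comes out as 0, 2, 1, 3.

```python
def bit_reverse(values):
    """Reorder values by the bit-reversal permutation (length assumed a power of two)."""
    n = len(values)
    logn = n.bit_length() - 1
    return [values[int(format(i, f"0{logn}b")[::-1], 2)] for i in range(n)]


def dif_schedule(n):
    """Twiddle exponents per layer of a radix-2 DIF FFT, largest blocks first."""
    layers = []
    m = n
    while m >= 2:
        stride = n // m
        # Inside each block of size m the exponent advances with the butterfly index j.
        layers.append([j * stride for _ in range(0, n, m) for j in range(m // 2)])
        m //= 2
    return layers


def bowers_schedule(n):
    """Twiddle exponents per layer of the Bowers G^T network, largest blocks first."""
    table = bit_reverse(list(range(n // 2)))  # bit-reversed twiddle table
    layers = []
    m = n
    while m >= 2:
        # Every butterfly in a block reuses the single twiddle selected for that block.
        layers.append([table[k // m] for k in range(0, n, m) for _ in range(m // 2)])
        m //= 2
    return layers


print(dif_schedule(8))     # [[0, 1, 2, 3], [0, 2, 0, 2], [0, 0, 0, 0]]
print(bowers_schedule(8))  # [[0, 0, 0, 0], [0, 0, 2, 2], [0, 2, 1, 3]]
```

In the Bowers schedule a twiddle is loaded once per block rather than once per butterfly, so the inner loop carries no twiddle update.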

## 3. Code implementation of Bowers FFT
Plonky3 implements two butterfly networks, Bowers G and Bowers G^T, which are used for the forward and inverse transforms respectively. The Bowers G network is used for the DFT: its input is bit-reversed first, so that its output is directly in natural order (note that the twiddle factors are also stored in bit-reversed order). The Bowers G^T network is used for the inverse DFT: it takes its input in natural order, and its output needs to be bit-reversed.
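
As a small illustration of the bit-reversed twiddle table, the sketch below builds it for the parameters used in this document's test (modulus 17, 8 points). The constant names are hypothetical, and the generator 11 is the same assumption made by the code in section 3.1.

```python
MODULUS = 17
N = 8
GENERATOR = 11  # assumed primitive root modulo 17, as in the code of section 3.1

# Primitive N-th root of unity and its powers.
root = pow(GENERATOR, (MODULUS - 1) // N, MODULUS)
roots = [pow(root, i, MODULUS) for i in range(N)]

# Only the first half of the roots is needed; store it in bit-reversed order.
half = roots[: N // 2]
log_half = (N // 2).bit_length() - 1
twiddles = [half[int(format(i, f"0{log_half}b")[::-1], 2)] for i in range(N // 2)]

print(roots)     # [1, 2, 4, 8, 16, 15, 13, 9]
print(twiddles)  # [1, 4, 2, 8]
```

With this layout, a stage made of blocks of size m reads only the first n/m entries of the table, one per block.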

### 3.1 Implementation of DIF and Bowers FFT
The following code can be run on a simple 8-point input [1, 2, 3, 4, 5, 6, 7, 8] to compute the DIF FFT and its inverse transform, which serve as a reference for comparison later.
Note that bower_g computes the forward FFT, and bower_g_t computes the inverse FFT.

In [2]:
import numpy as np
class Field:
    # basic arithmetic in a prime field (the modulus is assumed to be prime)
    def __init__(self, modulus):
        self.modulus = modulus

    def add(self, x, y):
        return (x + y) % self.modulus

    def sub(self, x, y):
        return (x - y) % self.modulus

    def mul(self, x, y):
        return (x * y) % self.modulus

    def pow(self, x, exp):
        return pow(x, exp, self.modulus)

    def inv(self, x):
        # Fermat's little theorem: x^(p-2) is the inverse of x modulo a prime p (x != 0)
        return pow(x, self.modulus - 2, self.modulus)

    def roots_of_unity(self, n):
        # assumes 11 is a primitive root modulo self.modulus (true for 17)
        # and that n divides modulus - 1
        root = self.pow(11, (self.modulus - 1) // n)
        return [self.pow(root, i) for i in range(n)]
    

class FTT:
    # forward and inverse roots of unity, bit-reversal, forward and inverse DIF FFT, Bowers G and G^T networks
    def __init__(self, modulus, n):
        self.gf = Field(modulus)
        self.n = n
    
    def get_forward_roots(self,n):
        return self.gf.roots_of_unity(n)
    
    def get_inverse_roots(self,n):
        forward_roots=self.gf.roots_of_unity(n)        
        return [self.gf.inv(r) for r in forward_roots]

    def bit_reversed_indices(self, n):
        logn = n.bit_length() - 1
        return [int(f"{i:0{logn}b}"[::-1], 2) for i in range(n)]

    def bit_reverse(self, a):
        n = len(a)
        indices = self.bit_reversed_indices(n)
        return [a[i] for i in indices]

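    def dif(self, a, roots):
        # radix-2 DIF on natural-order input; a is modified in place and
        # the result is bit-reversed at the end to return natural order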
        n = len(a)
        logn = n.bit_length() - 1
        for s in range(logn, 0, -1):
            m = 1 << s
            wm = roots[n//m]
            for k in range(0, n, m):
                w = 1
                for j in range(m // 2):
                    u = a[k + j]
                    v = a[k + j + m // 2]
                    a[k + j] = self.gf.add(u, v)
                    a[k + j + m // 2] = self.gf.mul(w, self.gf.sub(u, v))
                    w = self.gf.mul(w, wm)
        return self.bit_reverse(a)

    def forward_dif(self, a):
        roots=self.get_forward_roots(len(a))
        return self.dif(a,roots)

    def inverse_dif(self, a):
        inverse_roots=self.get_inverse_roots(len(a))
        a = self.dif(a, inverse_roots)
        n_inv = self.gf.inv(len(a))
        return [self.gf.mul(x, n_inv) for x in a]
    
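    def bower_g(self, a):
        # Bowers G (forward): bit-reversed input, bit-reversed twiddles,
        # one twiddle per block, stages from small blocks to large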
        n = len(a)
        a = self.bit_reverse(a)
        roots=self.get_forward_roots(n)
        roots=self.bit_reverse(roots[:len(roots) // 2])
        logn = n.bit_length() - 1
        for s in range(1, logn + 1):
            m = 1 << s
            for k in range(0, n, m):
                w = roots[k//m]
                for j in range(m // 2):
                    u = a[k + j]
                    v = a[k + j + m // 2]
                    a[k + j] = self.gf.add(u, v)
                    a[k + j + m // 2] = self.gf.mul(w, self.gf.sub(u, v))
        return a
        
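    def bower_g_t(self, a):
        # Bowers G^T (inverse): natural-order input, inverse twiddles in bit-reversed order,
        # stages from large blocks to small, then scale by 1/n and bit-reverse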
        n = len(a)
        roots=self.get_inverse_roots(n)
        roots=self.bit_reverse(roots[:len(roots) // 2])
        logn = n.bit_length() - 1
        for s in range(logn, 0, -1):
            m = 1 << s
            for k in range(0, n, m):
                w = roots[k//m]
                for j in range(m // 2):
                    u = a[k + j]
                    v = self.gf.mul(w, a[k + j + m // 2])
                    a[k + j] = self.gf.add(u, v)
                    a[k + j + m // 2] = self.gf.sub(u, v)
        n_inv = self.gf.inv(len(a))
        a=self.bit_reverse(a)
        return [self.gf.mul(x, n_inv) for x in a]


### 3.2 Test for DIF and Bowers FFT


In [3]:

def test_fft():
    # 1.set the input and other params:
    modulus = 17 
    input_array = [1,2,3,4,5,6,7,8] 
    n = len(input_array)         
    ntt = FTT(modulus, n)
    forward_roots=ntt.get_forward_roots(n)
    print("forward_roots:", forward_roots)
    inverse_roots=ntt.get_inverse_roots(n)
    print("inverse_roots:", inverse_roots)

    # 2. test for Dif fft
    forward_result_dif = ntt.forward_dif(input_array[:]) 
    print("result of forward_result_dif is:", forward_result_dif)
    # test for inverse fft
    inverse_result_dif = ntt.inverse_dif(input_array[:]) 
    print("result of inverse_result_dif is:", inverse_result_dif)
    # test if it can be restored to the original input
    result_back=ntt.inverse_dif(forward_result_dif)
    print("inverse back result is:", result_back)  
    assert result_back == input_array, "DIF FFT test failed!"
    print("Dif FFT tests passed!")

    # 3. test for Bowers fft
    forward_result_dif = ntt.forward_dif(input_array[:]) 
    bowers_g_result= ntt.bower_g(input_array[:]) 
    print("bowers_g_result is:", bowers_g_result)
    assert forward_result_dif==bowers_g_result,"forward_result_dif is not equal to bowers_g_result!"

    bowers_g_t_result=ntt.bower_g_t(input_array[:])
    print("bowers_g_t_result is:", bowers_g_t_result)
    assert inverse_result_dif==bowers_g_t_result,"inverse_result_dif is not equal to bowers_g_t_result!"
    print("Bowers FFT tests passed!")

if __name__ == "__main__":
    test_fft()

forward_roots: [1, 2, 4, 8, 16, 15, 13, 9]
inverse_roots: [1, 9, 13, 15, 16, 8, 4, 2]
result of forward_result_dif is: [2, 8, 14, 6, 13, 3, 12, 1]
result of inverse_result_dif is: [13, 15, 10, 11, 8, 5, 6, 1]
inverse back result is: [1, 2, 3, 4, 5, 6, 7, 8]
Dif FFT tests passed!
bowers_g_result is: [2, 8, 14, 6, 13, 3, 12, 1]
bowers_g_t_result is: [13, 15, 10, 11, 8, 5, 6, 1]
Bowers FFT tests passed!
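
As an independent cross-check of the output above, the forward result can be recomputed directly from the definition of the DFT in O(n^2) operations. This is a minimal sketch: `naive_dft` is a hypothetical helper that is not part of the code above, and the roots 2 and 9 are the entries forward_roots[1] and inverse_roots[1] from the printed output.

```python
def naive_dft(values, root, modulus):
    """Evaluate X[k] = sum_j values[j] * root^(j*k) mod modulus for every k."""
    n = len(values)
    return [
        sum(x * pow(root, j * k, modulus) for j, x in enumerate(values)) % modulus
        for k in range(n)
    ]


MODULUS = 17
ROOT = 2          # forward_roots[1] in the output above
INVERSE_ROOT = 9  # inverse_roots[1] in the output above

data = [1, 2, 3, 4, 5, 6, 7, 8]
forward = naive_dft(data, ROOT, MODULUS)
print(forward)  # [2, 8, 14, 6, 13, 3, 12, 1]

# Inverse transform: use the inverse root, then scale by n^(-1) modulo the prime.
n_inv = pow(len(data), MODULUS - 2, MODULUS)
restored = [(x * n_inv) % MODULUS for x in naive_dft(forward, INVERSE_ROOT, MODULUS)]
print(restored)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The forward values agree with both forward_result_dif and bowers_g_result, so the fast networks reproduce the plain DFT in natural order.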
