Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generation of bit reversal tables for CMSIS-DSP #858

Closed
christophe0606 opened this issue Mar 11, 2020 · 9 comments
Closed

Generation of bit reversal tables for CMSIS-DSP #858

christophe0606 opened this issue Mar 11, 2020 · 9 comments

Comments

@christophe0606
Copy link
Contributor

christophe0606 commented Mar 11, 2020

Generating bit reversal tables for CMSIS-DSP is a question which is coming back often.
In this post, I'll explain how to do it. (I'll use the naming "bit reversal" even if not totally correct).

Pure radix

Pure radix case is the simplest. The reversal formula is obvious and the
out of place reversal and in-place reversal are the same.

In CMSIS the bit reversal tables are in bytes and take into account the fact that the FFT is complex. So there is a factor of 8 applied to the sample indexes.

In this post I'll just consider the index. So you'll need to multiply by 8 to generate the final CMSIS-DSP tables.

Radix-8

Let's take the example of the 64 samples CFFT F32.

The index of a sample is written

8 n1 + n2 with 0 <= n1,n2 < 8

The reversal of this index is just a reversal of the values n1, n2 and is written

8 n2 + n1

The correspondence

8 n1 + n2 -> 8 n2 + n1 

is giving the transposition required to implement the bit reversing table.

You must of course remove the cases where 8 n1 + n2 is equal to 8 n2 + n1.
And if you include the case a -> b, don't include the case b -> a.

This list of transpositions is working for an out of place reversal and also for an in-place one because the transpositions are always disjoint ones. It won't be the case any more with mixed radix as we will see. And since CMSIS-DSP is doing an in-place reversing, it will have to be taken into account.

Radix-4

The radix-4 FFT in CMSIS-DSP are implemented in such a way that the bit reversal is in fact a base 2 one. So, if you decompose your index as:

4 n1 + 2 n2 + n3 

and reverse it as

4 n3 + 2 n2 + n1 

then you'll get the tables for the radix-4 FFT and radix 4x2 of CMSIS.

Mixed radix

Mixed radix is more complex for two reasons:

  • The reversal formula is not as obvious.
  • The reversal formula is not giving directly the list of transpositions to apply for the in-place algorithm. It is just giving a transformation for the out-of-place algorithm.

Radix 8x2

Example with 16 samples

Let's take as example a 16 sample FFT.

The reversal formula is:

2 b + a -> 8 a + b 
with 0 <= a < 1 and 0 <= b < 7.

If we compute all the values we get:

{0 -> 0, 2 -> 1, 4 -> 2, 6 -> 3, 8 -> 4, 10 -> 5, 12 -> 6, 14 -> 7, 
1 -> 8, 3 -> 9, 5 -> 10, 7 -> 11, 9 -> 12, 11 -> 13, 13 -> 14, 
15 -> 15}

We see a problem with the transform 8 -> 4 and 4 -> 2. They are not disjoint since both involve the position 4.

If we compute the out-of-order reordering by applying above rules to the sequence [1..15], we get:

{8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15}

Permutations

To go from [1..15] to the above order, the permutation required is:

(1 2 4 8)(3 6 12 9)(5 10)(7 14 13 11)

Where (a b c d) is a cycle representing the transform a -> b -> c -> d -> a

This permutation can be computed using FindPermutation in Mathematica, or Permutation.from_sequence using the Python symbolic package sympy.

So, we see that the effect of having overlap in the transformation rules is to generate a cycle in the final permutation.

To do this permutation in place we need to decompose the cycle in a sequence of transpositions.
There is not an unique way to do it. So you may not (and will not) recover the table of the CMSIS-DSP but you'll get tables with the same length and the same effect on the array of samples.

In addition to that, the disjoint cycles being independent they can be reordered in the permutation. It means that the order in the final CMSIS-DSP table is also a bit arbitrary.

So the best way to check that the resulting table is correct is to compute its equivalent permutation. You don't need to be bit exact with the CMSIS-DSP bit reversal tables to be right.

I choose to decompose a cycle

(a b c d) as (a d)(b d)(c d).

So I finally get the following list of transpositions:

(1 8)(2 8)(4 8)(3 9)(6 9)(12 9)(5 10)(7 11)(14 11)(13 11)

If we flatten the list of numbers and multiply by 8, we get the following table:

8,64, 16,64, 32,64, 24,72, 48,72, 96,72, 40,80, 56,88, 112,88, 104,88

The original CMSIS-DSP table is:

8,64, 24,72, 16,64, 40,80, 32,64, 56,88, 48,72, 88,104, 72,96, 104,112

First obvious difference : the order is different. But it is not the only difference.

In our table we have the pairs (56,88), (112,88) and (104,88).
In CMSIS-DSP table, we have (56,88), (88,104) and (104,112).

And you can check that:

(56 88)(112 88)(104 88) = (56 88)(88 104)(104 112)

So both tables are representing the same permutation of the samples.

Example with 8192 samples

For a 8192 table, the correspondence formula would be:

1024 e + 128 d + 16 c + 2 b + a -> 4096 a + 512 b + 64 c + 8 d  + e
with 0 <= a < 2 and 0 <= b,c,d,e < 8.

Radix 8x4

Formula for 32 samples

For a 32 samples FFT, the correspondence formula is:

4 b + a -> 8 a + b 
with 0 <= a < 4 and  0 <= b < 8

Formula for 2028 samples

For a 2048 FFT, the correspondence formula is:

256 d + 32 c + 4 b + a -> 512 a + 64 b + 8 c + d
with 0 <= a < 4 and  0 <= b,c,d < 8

I have attached the bit reversal table and twiddle table to implement the 8192 CFFT.
Note that the index type for bit reversal has to be changed from uint16_t to uint32_t to support 8192. So it impacts the API and other bit reversal tables.

TablesFor8192.zip

@rosek86
Copy link

rosek86 commented Mar 29, 2020

Hello,

Thank you for the guide, why do you think that index type for bit reversal has to be changed from uint16_t to uint32_t to support 8192, please? I've generated tables for radix 4 and radix 8 and they all seem to fit in 0xFFFF.

If someone is struggling with this one too, I prepared code snippet here:
https://gist.github.com/rosek86/d0d709852fddf36193071d7f61987bae

Currently radix 4, 4x2 and radix 8.

@christophe0606
Copy link
Contributor Author

Hello @rosek86, I have had some warnings with the compiler.
But if you confirm it is working with uint16_t then it is great. Less changes to the code.

@rosek86
Copy link

rosek86 commented Mar 30, 2020

Hello @christophe0606, I confirm that arm_cfft_q15 and arm_bitreversal_16 (both c/asm) works fine. I haven't compiled tested anything else apart from q15.

@rosek86
Copy link

rosek86 commented Apr 4, 2020

I updated code snippet to support 8x4 and 8x2, maybe someone will find this useful.

@Lijiugen
Copy link

Lijiugen commented Dec 1, 2021

Hello @rosek86, I have had some warnings with the compiler. But if you confirm it is working with uint16_t then it is great. Less changes to the code.

How to generate bitreversal tables ? Do you know? I couldn't understand?Can you send me the code?

@rosek86
Copy link

rosek86 commented Dec 1, 2021

Hi @Lijiugen, Have you tried the code I posted above?

@Lijiugen
Copy link

Lijiugen commented Dec 1, 2021

@Lijiugen,您是否尝试过我上面发布的代码?

Thank you very much. I'm so excited. You returned to me so soon. I just opened the link, but I couldn't open it before. It's so slow for us to visit GitHub here

@alireza-tabatabaee
Copy link

Thanks for sharing this guide! Just one issue, by looking through the code comments in arm_common_tables.c in Revision v1.9.0, I've noticed that all of the bit reversal tables follow a periodic sequence of 8x2, 8x4 and radix-8, except 2048!

16 -> 8x2
32 -> 8x4
64 -> 8
128 -> 8x2
256 -> 8x4
512 -> 8
1024 -> 8x2
2048 -> 8x2
4096 -> 8

Is there any reason for this? Or is it a bug? Or just a wrong comment? Screenshot attached below as evidence.

image

@christophe0606
Copy link
Contributor Author

christophe0606 commented May 16, 2022

@alireza-tabatabaee It should work in theory with both 8x2 and 8x4.
But 8x4 may be a bit more efficient. So I don't know the reason why 8x2 was kept for 2048. Perhaps the bit reversal table was smaller ? Perhaps the performance difference between 8x2 and 8x4 was not big enough.

But once you choose to use 8x2 or 8x4 then the code has to take it into account. You cannot just replace the table. The code for 2048 (arm_cfft_f32.c) is using arm_cfft_radix8by4_f32

So, I assume the comment is wrong and the table is a 8x4 (Otherwise tests would fail).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants