Skip to content

AhmedAalaaa/32-point-FFT-Verilog-design-based-DIT-butterfly-algorithm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 

Repository files navigation

32-point-FFT-Verilog-design-based-DIT-butterfly-algorithm

This project aims to design an 32-point FFT (Fast Fourier Transform) based DIT (decimation in time) Butterfly Algorithm with multiple clock domains and time-shared design.

The design procedure is aimed to be work for the both digital designs flow FPGA and ASIC.

The FPGA flow will start from writing the RTL to the bitstream generation on the Xilinx Vivado tool.

The ASIC flow will start from writing the RTL to GDS file generation on Synopsys EDA tools.

Introduction

Transforms are generally used to convert a function from time domain to frequency domain without any loss of information, The Fast Fourier transforms (FFT) are the numerically efficient algorithms to compute the Discrete Fourier Transform (DFT).

FFT algorithms are based on the concept of divide and conquer approach. The FFT is used in so many applications as In the field of communications, the FFT is important because of its use in orthogonal frequency division multiplexing (OFDM) systems.

In this project it was implemented the FFT for a 32-points sequence with the help of Decimation In Time algorithm with radix-2.

Butterfly diagram

image

Design Approach

Here is more details on the modules that used in the design.

MAC

The MAC unit consists of eight multipliers, four adders and four two’s complement sages. MAC unit Design which is named “butterfly2” in our design and has 8 inputs (real and imaginary of input 1and input 2, real and imaginary of twiddle 1 and twiddle 2) and 4 imaginary (real and imaginary of output 1and output 2). image

Two’s complement Design

The function of this stage is only to get the two’s complement of the fixed-point number, instead of making subtractions blocks.

Multiplier Design

the multiplier is designed for a fixed-point operation, the Idea of the multiplier is that the ordinary multiplier is worked for two positive fixed-point number, so it is needed to ensure that the two multiplicands are positive to give the correct answer, so if one of the inputs or both of them are negative fixed-point numbers, first it is needed to take the two’s compliment for them, then apply the multiplication and depending on the inputs decide the shape of the output which is two’s complement or a positive fixed-point value.

image

Adder design

The design of the adder is very simple, it is need to add three numbers together, so by adding the two of them then take the result and add it with the third and put it into the register that clocked with 50 MHz image

The final design of the mac block which is run on 50MHz.

Register Files

Registers are necessary to save outputs after each stage, as we achieved the concept of the pipeline by putting 2 register files one at the input stage and the other on the output stage. Each register file is of size 32 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟𝑠 × 2 (𝑟𝑒𝑎𝑙 𝑜𝑟 𝑖𝑚𝑎𝑔) × 8 𝑏𝑖𝑡𝑠 = 32 × 2 × 8 registers and run on Clk 10MHz. as we can enter input and get output every 100ns.

Muxs

Mux will be used in second approach, As we use Muxs to choose which inputs of the stage will enter to 16 MAC circuits to achieve time sharing concept to produce the output. As in the second approach we will use 16 MAC circuits only to cover all the 5 stages but in the first approach we use 16 MAC circuits in every stage. And in the time share implementation, the MUX will choose between 𝑀 = 5 buses, each of size 32 × 2 × 8, with counter selection that resets after reaching count #5 by control unit.

Block Diagram

The hardware which used in this approach is 16 MAC’s, 2 register files, a MUX that chooses which inputs will be loaded this cycle and a Control unit to choose a selection lines of the Muxs. Each MAC unit operates at 50 MHz and the pipeline clock at 10 MHz, so we can operate 5 operations of the MAC until new inputs enter. As We can use time sharing concept to operate 16 MAC unit to serve 5 stages in the algorithm and hence 5 cycles of clk 50 MHz are needed to get an FFT output for a specific input.So, we could reduce the Utilization of the hardware.

image

As when inputs enter, it will pass to Reg1 of clk 10Mhz and then will choose of the mux and then will go through MAC units and the outputs of this MAC units will go through Muxes and it will choose according to which stage operate now and that will be repeated until reach to the final stage and the output will go through reg 2 to save it.

Software Design

Cooley-Tukey DIT-FFT Algorithm was implement using MATLAB to compare the results with the built-if function fft as depicted form the following figures the result of the built-in function and the estimated values of the Algorithm for a random sample of data.

image image

Synthesis and Implementation

This design was Synthesized and implemented for Xilinx Zynq UltraScale+ (xazu3eg-sfva625-1-i) FPGA, and all the following results are met.

Utilization report

image image

Timing Constrains

image image

Critical path

image image

Power Report

image

About

This project aims to design an 32-point FFT (Fast Fourier Transform) based DIT (decimation in time) Butterfly Algorithm with multiple clock domains and time-shared design

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published