
conv2d

This is an implementation of a 2D convolution in VHDL.

Features

  • stride 1

  • 3x3 filters

  • zero padding

  • one pixel per clock input/output

  • reloadable parameters

  • input image size can be changed at runtime (max image size is fixed)

  • square images only

Focus

  • low resource usage

  • short critical path

  • no vendor libraries

  • bit depths and max image size can be changed by generics

Implementation

To achieve a one-pixel-per-clock input and output rate, 9 multiply-accumulate operations (one per tap of the 3x3 filter) have to be processed in parallel. Two line buffers (module pixbuf.vhd) store intermediate results of the computation.

The convolution operation can be written as

P'_{(0,0)} = P_{(0,0)}c_0 + P_{(0,1)}c_1 + P_{(0,2)}c_2 + P_{(1,0)}c_3 + P_{(1,1)}c_4 + P_{(1,2)}c_5 + P_{(2,0)}c_6 + P_{(2,1)}c_7 + P_{(2,2)}c_8

Where c_0 to c_8 are the filter coefficients from the top left to the bottom right and P_{(y,x)} is the input pixel value at the location (y,x).
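The window sum above can be expressed directly as a short behavioral reference in Python (a sketch of the arithmetic only, not of the hardware structure; the function name and argument layout are illustrative):

```python
# Behavioral reference for the 3x3 convolution: stride 1, zero padding,
# square input image, coefficients c_0..c_8 ordered top left to bottom
# right, exactly as in the formula above.

def conv2d_3x3(img, coeffs):
    n = len(img)

    def pix(y, x):
        # Zero padding: pixels outside the image read as 0.
        return img[y][x] if 0 <= y < n and 0 <= x < n else 0

    out = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += pix(y + dy - 1, x + dx - 1) * coeffs[dy * 3 + dx]
            out[y][x] = acc
    return out
```

With the identity kernel (c_4 = 1, all others 0) the output equals the input, which makes for a quick sanity check.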

This computation is now split into partial sums y_{(y,x)} to model the pixel-wise input. When the first input pixel arrives we can immediately compute the first term.

y_{(0,0)}=P_{(0,0)}c_0

In the next clock cycle, when we get the second pixel, we can calculate

y_{(0,1)}=P_{(0,1)}c_1+y_{(0,0)}

and after three pixels have been seen

y_{(0,2)}=P_{(0,2)}c_2+y_{(0,1)}

y_{(0,2)} is now stored in the line buffer until P_{(1,0)} is fed to the input, at which point we can calculate

y_{(1,0)}=P_{(1,0)}c_3+y_{(0,2)}

and so on. The final value y_{(2,2)} then contains all nine terms and is driven to the output.

To achieve maximum throughput, all of the calculations above have to be executed in parallel.
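The accumulation schedule above can be modeled in software: pixels arrive row-major, one per "clock", and each incoming pixel updates the partial sums of up to nine output windows. This is a sketch of the algorithm only; the dict `partial` stands in for the two hardware line buffers, and no names are taken from the VHDL sources.

```python
# Streaming model of the pixel-wise accumulation: every incoming pixel
# (y, x) of the zero-padded image is tap (dy, dx) of the window whose
# top-left corner sits at (y - dy, x - dx); that window produces the
# output pixel with the same (unpadded) coordinates.

def stream_conv(img, coeffs):
    n = len(img)
    # Feed the zero padding explicitly, as the module's interface requires.
    padded = [[0] * (n + 2)] \
           + [[0] + row + [0] for row in img] \
           + [[0] * (n + 2)]
    partial = {}                                # stand-in for the line buffers
    for y in range(n + 2):
        for x in range(n + 2):                  # one pixel per "clock cycle"
            p = padded[y][x]
            for dy in range(3):
                for dx in range(3):
                    oy, ox = y - dy, x - dx
                    if 0 <= oy < n and 0 <= ox < n:
                        partial[(oy, ox)] = partial.get((oy, ox), 0) \
                                            + p * coeffs[dy * 3 + dx]
    return [[partial[(y, x)] for x in range(n)] for y in range(n)]
```

In hardware the nine tap updates of one clock cycle happen in parallel in the nine MAC units, which is exactly the "maximum throughput" requirement stated above.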

Integration

There is an example of how to use the module in the testbench (conv2d_tb.vhd). To start a convolution, the first line of zeros (padding) has to be fed into the module. Zero values also have to be applied at the left and right borders of the input image.

The last row of zeros for the bottom zero padding is generated by the module itself. For each pixel fed to the module, the current pixel's address (x/y) and a valid signal have to be assigned.
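A driver-side stimulus following these rules could be sketched as below: a top row of zeros plus a zero column on each side, with an (x, y) address and a valid flag for every pixel. The bottom padding row is generated inside the module, so it is not part of the stimulus. The tuple layout is illustrative, not the actual port list of the entity.

```python
# Generate the (x, y, pixel, valid) stream the README describes:
# one leading zero row, zero borders left and right, no bottom row.

def stimulus(img):
    n = len(img)
    rows = [[0] * (n + 2)]                      # top zero-padding row
    rows += [[0] + row + [0] for row in img]    # left/right zero borders
    for y, row in enumerate(rows):
        for x, p in enumerate(row):
            yield (x, y, p, 1)                  # valid asserted with each pixel
```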

Parameters

NUM_BITS_PIXEL

This is the number of bits per input pixel. To preserve precision, the module performs all arithmetic operations with NUM_BITS_PIXEL*2 bits.

NUM_BITS_ADDR

Number of bits used to encode the pixel addresses x and y; essentially ceil(lb(MAX_IMG_WIDTH)).
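The relationship between the two generics can be written out as a one-liner (lb denotes the base-2 logarithm; the helper name is illustrative):

```python
import math

def num_bits_addr(max_img_width):
    # Smallest address width that can index every pixel coordinate,
    # i.e. ceil(lb(MAX_IMG_WIDTH)).
    return math.ceil(math.log2(max_img_width))
```

For the example configuration below, `num_bits_addr(64)` gives 6.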

NUM_BITS_COEFF

Word width of the coefficients.

MAX_IMG_WIDTH

Maximum image width the block expects. Images must be square.

MAX_IMG_HEIGHT

Maximum image height the block expects. Images must be square.

Test

A test file generated by a Python model (data_py.bin) can be checked against a file generated by the testbench of the conv2d module (data_sim.bin).

Resource utilization

Because of the 9 parallel MACs, the module needs 9 DSPs. The maximum image size and the internal bit width determine the number of BRAM instances the synthesis tool will infer. The memory needed, in bits, can be calculated with the formula

D_{\text{linebuffer}} = 2 \cdot \mathrm{MAX\_IMG\_WIDTH} \cdot 2 \cdot \mathrm{NUM\_BITS\_PIXEL}\ \text{bits}

Using the parameters MAX_IMG_WIDTH=64 and NUM_BITS_PIXEL=16, a standard synthesis in Vivado 2019.2 for 7 Series FPGAs boils down to

  • 256 LUTs
  • 1 BRAM
  • 9 DSPs
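Plugging the example parameters into the line-buffer formula: two line buffers, each holding MAX_IMG_WIDTH partial sums of 2*NUM_BITS_PIXEL bits (NUM_BITS_PIXEL = 16 is assumed here, matching the formula's second parameter):

```python
MAX_IMG_WIDTH = 64
NUM_BITS_PIXEL = 16

# Two line buffers x MAX_IMG_WIDTH entries x double-width partial sums.
d_linebuffer_bits = 2 * MAX_IMG_WIDTH * 2 * NUM_BITS_PIXEL
print(d_linebuffer_bits)  # 4096
```

4096 bits fit comfortably inside a single 7 Series block RAM, which is consistent with the single BRAM reported above.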

Todo

  • write a wrapper for an on-chip bus (AXI Stream, ...) for easier integration

  • add different strides/paddings/filter sizes that can be changed on the fly
