fpga_soft_mpeg

Project Goal

This project intends to create an MPEG1-capable video and audio decoder for FPGAs. It uses a hybrid approach: software running on a soft core, combined with hardware acceleration of math-intensive tasks.

Multiple soft cores based on the RISC-V architecture are available for evaluation:

PicoRV32
VexiiRiscv (with TileLink and Wishbone bus)

Both cores are configured to implement the rv32imc instruction set. There is no FPU, but the multiply unit and the compressed instruction set are used.

FFmpeg was tried first, but pl_mpeg was eventually chosen instead because of its clean code base.

This project aims to decode MPEG1 audio at a 30 MHz clock rate so that it can be used with the MiSTer CD-i core without clock domain crossing. Note: The original MPEG1 audio decoder of the CD-i also had only this clock rate available.
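
To put the 30 MHz target into perspective, the following minimal sketch computes how many clock cycles are available per output sample period. The 44100 Hz sample rate is the one used in the benchmarks of this README; the function name `cycles_per_sample` is hypothetical and not part of the project.

```shell
# Clock cycles available per output sample period (integer division).
# Usage: cycles_per_sample <clock_hz> <sample_rate_hz>
# Hypothetical helper for illustration only, not part of the repository.
cycles_per_sample() {
    clock_hz=$1
    sample_rate_hz=$2
    echo $(( clock_hz / sample_rate_hz ))
}
```

Running `cycles_per_sample 30000000 44100` prints `680`, so decoding and output together have to fit into roughly 680 cycles per sample period to keep up in real time.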

Status

Currently, only audio is supported.

Simulation

Verilator is used as the simulation tool.

First, some example data is needed. To avoid any legal issues, public domain files are used. Run the following to generate the test data:

cd sim
./prepare_memory.sh

Now run one of these commands to simulate the model with one of the available soft cores:

./sim_top.sh vexii
./sim_top.sh vexiiwb
./sim_top.sh picorv32

The simulation runs until the MPEG stream has ended. To listen to the result, run:

./create_wav.sh && mplayer audio.wav

How to enable tracing using gtkwave?

For performance reasons, the trace is not created by default. To enable it, uncomment this line in sim_top.cpp:

//#define TRACE
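
If you prefer not to edit the file by hand, the uncommenting step can be scripted. This is a minimal sketch that assumes the line appears exactly as a `//`-commented `#define` on its own line; the function name `enable_define` is hypothetical and not part of the project.

```shell
# Uncomment a line of the form "//#define NAME" in a C/C++ source file.
# Usage: enable_define <file> <NAME>
# Hypothetical helper for illustration only, not part of the repository.
enable_define() {
    file=$1
    name=$2
    tmp=$(mktemp) || return 1
    # Strip the leading "//" only from the matching #define line;
    # all other lines are copied unchanged.
    sed "s|^//[[:space:]]*\(#define[[:space:]][[:space:]]*${name}\)[[:space:]]*\$|\1|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, `enable_define sim_top.cpp TRACE` run from the directory that contains sim_top.cpp. The same function would apply to other commented-out defines, such as `SOFT_CONVOLVE` in main.c.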

How to verify the results of the hardware vector unit

Uncomment this line in main.c. This slows down the calculations, because each vector multiplication is then performed both in software and in hardware, and the two results are compared to ensure that the hardware result is correct.

//#define SOFT_CONVOLVE

Results

The expected MD5 checksums of the result files are:

e025598c855ae4d39c9cc642d929de0c  audio_left.bin
ef552bf5104375e11a6689f883bf2068  audio_right.bin

For reference, the input files have these MD5 checksums:

b1a53b51752ca0b3dc92c6f83bbd99c1  Arpent.mp3
eb87dddfa689ce122cb1bfe606e2f9e5  fma.mpg
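
The checksums above can be compared automatically. This is a minimal sketch that assumes the `md5sum` tool from GNU coreutils is installed; the function name `check_md5` is hypothetical, and the location of the files depends on where the scripts place them.

```shell
# Compare the MD5 digest of a file with an expected value.
# Usage: check_md5 <expected-hex-digest> <file>
# Hypothetical helper for illustration only, not part of the repository.
check_md5() {
    expected=$1
    file=$2
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)"
        return 1
    fi
}
```

For example, `check_md5 e025598c855ae4d39c9cc642d929de0c audio_left.bin` after a simulation run. Because the listings above already use the `md5sum` output format, saving them to a file and running `md5sum -c` on that file is an alternative.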

Benchmarks with vector math acceleration unit

Evaluation shows that VexiiRiscv with TileLink is the most efficient solution: it needs only 47% of the available clock ticks to keep up with the incoming data stream. PicoRV32 is too slow, even with hardware acceleration.

These results are based on software compiled with -O3, a bitrate of 387 kb/s, and a sample rate of 44100 Hz:

Debug out a1a1a1a1  Waterlevel:        98008        98008 Samples decoded:      186624  Samples played:       88889  Load:          47 %   vexii
Debug out a1a1a1a1  Waterlevel:        61851        61851 Samples decoded:      186624  Samples played:      125047  Load:          67 %   vexiiwb
Debug out a1a1a1a1  Waterlevel:       -76603       -76603 Samples decoded:      186624  Samples played:      263501  Load:         141 %   picorv32
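
In all log lines of this README, the Load value matches the number of samples played as an integer percentage of the number of samples decoded, rounded down. This relation is inferred from the printed numbers, not taken from the source code, so treat the following as a sketch for cross-checking the logs. The function name `load_percent` is hypothetical.

```shell
# Load in percent, rounded down, from two counters of a log line.
# Usage: load_percent <samples_played> <samples_decoded>
# Hypothetical helper for illustration only; the formula is inferred
# from the log output, not from the project's source code.
load_percent() {
    played=$1
    decoded=$2
    echo $(( 100 * played / decoded ))
}
```

For example, `load_percent 88889 186624` prints `47` (vexii) and `load_percent 263501 186624` prints `141` (picorv32). A value above 100 means the decoder falls behind playback.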

Reducing the bitrate to 224 kb/s to align it with VideoCD results in slightly better performance:

Debug out a1a1a1a1  Waterlevel:       105441       105441 Samples decoded:      186624  Samples played:       81456  Load:          43 %   vexii
Debug out a1a1a1a1  Waterlevel:        74385        74385 Samples decoded:      186624  Samples played:      112513  Load:          60 %   vexiiwb
Debug out a1a1a1a1  Waterlevel:       -58413       -58413 Samples decoded:      186624  Samples played:      245312  Load:         131 %   picorv32

Compiling the software with -Os reduces the number of instruction words from 3101 to 1929 but also reduces performance. These results still use 224 kb/s:

Debug out a1a1a1a1  Waterlevel:        49404        49404 Samples decoded:      186624  Samples played:      137493  Load:          73 %   vexii
Debug out a1a1a1a1  Waterlevel:        22340        22340 Samples decoded:      186624  Samples played:      164557  Load:          88 %   vexiiwb
Debug out a1a1a1a1  Waterlevel:      -137554      -137554 Samples decoded:      186624  Samples played:      324453  Load:         173 %   picorv32

Switching from a muxed system stream to an elementary stream reduces memory consumption. The text segment shrinks from 1929 to 1357 instruction words because the demuxing logic is removed. The total memory requirement drops from 28000 to 21300 bytes because the elementary stream buffer is no longer required. Performance, however, is nearly unchanged:

Debug out a1a1a1a1  Waterlevel:        50769        50769 Samples decoded:      186624  Samples played:      136129  Load:          72 %   vexii
Debug out a1a1a1a1  Waterlevel:        23769        23770 Samples decoded:      186624  Samples played:      163128  Load:          87 %   vexiiwb
Debug out a1a1a1a1  Waterlevel:      -134253      -134253 Samples decoded:      186624  Samples played:      321151  Load:         172 %   picorv32

Benchmarks using only the soft core

Running pl_mpeg without any modifications shows how inefficient the RISC-V implementations are at executing multiply-accumulate operations. A RISC-V core with vector instructions and a suitable compiler might fix this problem, though.

These results are based on software compiled with -O3, a bitrate of 387 kb/s, and a sample rate of 44100 Hz:

Debug out a1a1a1a1  Waterlevel:       -22333       -22333 Samples decoded:      186624  Samples played:      209231  Load:         112 %   vexii
Debug out a1a1a1a1  Waterlevel:      -125639      -125639 Samples decoded:      186624  Samples played:      312537  Load:         167 %   vexiiwb
Debug out a1a1a1a1  Waterlevel:      -444427      -444427 Samples decoded:      186624  Samples played:      631325  Load:         338 %   picorv32
