Dynamically Allocated Neural Network (DANA) Accelerator for the RISC-V Rocket microprocessor

Dynamically Allocated Neural Network (DANA) Accelerator

A Chisel3 implementation of a neural network accelerator, DANA, capable of supporting simultaneous inferences or learning transactions originating from the same or different contexts [1]. DANA integrates with the RISC-V Rocket microprocessor as a Rocket Custom Coprocessor (RoCC).

This is currently compatible with rocket-chip:f3299ae9, an older rocket-chip version used by fpga-zynq.

Setup

Requirements:

  1. Clone the Rocket Chip Repository

This is not currently a standalone repository and must be cloned inside an existing Rocket Chip clone. The following will grab a supported version of rocket-chip and clone DANA inside of it:

git clone https://github.com/ucb-bar/rocket-chip $ROCKETCHIP_DIR
cd $ROCKETCHIP_DIR
git reset --hard f3299ae91d3f01d0349eb4746886e303e8fb1b41
git clone https://github.com/bu-icsg/dana
cd dana
git submodule update --init --recursive
  2. Build the RISC-V Toolchain

This requires a supported version of the RISC-V toolchain. Go ahead and build the version of the toolchain pointed at by the rocket-chip repository. This requires setting the RISCV environment variable and satisfying any dependencies required to build the toolchain.

cd $ROCKETCHIP_DIR/riscv-tools
./build.sh
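
Before starting the (long) toolchain build, it can help to confirm that the two prerequisites described above are in place. The following is a minimal sketch, not part of the DANA repository: the helper names check_riscv_env and is_supported_commit are hypothetical, and the script only warns rather than aborting.

```shell
#!/bin/sh
# Illustrative pre-flight checks; helper names are hypothetical.
# Commit hash taken from the clone instructions in this README.
SUPPORTED_COMMIT=f3299ae91d3f01d0349eb4746886e303e8fb1b41

# Succeeds only if RISCV is set to a non-empty value.
check_riscv_env() {
  [ -n "${RISCV:-}" ]
}

# Succeeds if the given hash is the supported rocket-chip commit.
is_supported_commit() {
  [ "$1" = "$SUPPORTED_COMMIT" ]
}

# Only query git when ROCKETCHIP_DIR points at an existing clone.
if [ -d "${ROCKETCHIP_DIR:-}/.git" ]; then
  head=$(git -C "$ROCKETCHIP_DIR" rev-parse HEAD)
  if ! is_supported_commit "$head"; then
    echo "warning: rocket-chip is at $head, expected $SUPPORTED_COMMIT" >&2
  fi
fi

if ! check_riscv_env; then
  echo "warning: RISCV is not set; build.sh needs it as the install prefix" >&2
fi
```

Where RISCV should point is up to you; any writable install directory should work, but that choice is an assumption of this sketch rather than something this README prescribes.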

Emulation (Functional Verification)

This project uses Chisel3 and FIRRTL for hardware design and Verilog generation. The Verilog emitted by FIRRTL can then be tested either as a standalone accelerator or integrated with Rocket Chip.

Standalone Emulation

This is currently WIP.

Rocket Chip Emulation

You can build a complete version of Rocket Chip that includes DANA in a RoCC socket.

You can build an emulator of Rocket + DANA using the rocket-chip make target inside the rocket-chip/emulator directory. The Makefile just needs to know what configuration we're using and that we have additional Chisel code located in the dana directory. Below we build a Rocket + DANA configuration with a DANA unit having 4 processing elements and using a block width of 4 32-bit elements:

cd $ROCKETCHIP_DIR/emulator
make CONFIG=XFilesDanaCppPe4Epb4Config ROCKETCHIP_ADDONS=dana

We provide bare-metal test programs inside the tests directory.

Emulation Debugging

For debugging or running the emulator more verbosely, you have the option of either relying on Chisel's printf or building a version of the emulator that supports full VCD dumping.

Printf Debugging

Chisel's printf writes to STDERR, and all printf statements are disabled by default. You can enable all Chisel-included printf commands with the +verbose option:

cd $ROCKETCHIP_DIR/emulator
./emulator-Top-XFilesDanaCppPe4Epb4Config +verbose [binary] 2>&1 | tee run.log

Note: Rocket Chip dumps information every cycle and it is often useful to grep for the exact printf that you're looking for.
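
As a rough illustration of the grep step, the sketch below wraps a fixed-string search over a saved log. The helper name filter_log and the "[DANA]" tag are placeholders, not output that DANA is known to produce; substitute text from the printf you actually care about.

```shell
#!/bin/sh
# Hypothetical helper: print only the log lines containing a literal string.
# grep -F treats the pattern as plain text, so brackets need no escaping.
filter_log() {
  grep -F -- "$1" "$2"
}

# Example usage on the log captured by the tee command above.
# "[DANA]" is a placeholder tag, not a documented DANA prefix.
if [ -f run.log ]; then
  filter_log "[DANA]" run.log || true
fi
```

The same filter can be applied live by piping the emulator's output through grep -F instead of tee, at the cost of not keeping the full log.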

Waveform Debugging

You can build a "debug" version of the emulator, which provides full support for generating VCD traces, with:

cd $ROCKETCHIP_DIR/emulator
make debug

This creates a *-debug emulator which supports a -v[FILE] option for generating a VCD file and a +start option for starting VCD dumping at a specific cycle.
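
The emulator binaries in this README appear to follow the pattern emulator-Top-<CONFIG>, with a -debug suffix for the VCD-capable build. The sketch below derives both names from a configuration so they do not have to be retyped; the helper name emulator_name is hypothetical and the pattern is inferred from the commands shown here, not from the Makefile itself.

```shell
#!/bin/sh
# Hypothetical helper: build the emulator binary name for a configuration.
# Pattern inferred from this README: emulator-Top-<CONFIG>[-debug]
emulator_name() {
  config="$1"
  variant="${2:-}"   # pass "debug" for the VCD-capable build
  if [ "$variant" = "debug" ]; then
    echo "emulator-Top-${config}-debug"
  else
    echo "emulator-Top-${config}"
  fi
}

CONFIG=XFilesDanaCppPe4Epb4Config
echo "./$(emulator_name "$CONFIG")"
echo "./$(emulator_name "$CONFIG" debug)"
```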

To further reduce the size of the VCD file, we provide a tool, vcd-prune, that prunes a VCD file to only include signals in a specific module and its children. Example usage to only emit DANA signals:

cd $ROCKETCHIP_DIR/emulator
./emulator-Top-XFilesDanaCppPe4Epb4Config-debug -v- [binary] 2>&1 | ../dana/util/hdl-tools/scripts/vcd-prune -m Dana > run.vcd

This waveform can then be viewed using GTKWave by building GTKWave locally and using a helper script to pre-populate the waveform window:

cd $ROCKETCHIP_DIR/emulator
make -C ../dana/util/hdl-tools gtkwave
../dana/util/hdl-tools/scripts/gtkwave-helper run.vcd

Hardware Evaluation

Rocket + DANA can be evaluated on a Zynq FPGA using the Berkeley-provided fpga-zynq repository.

Known Issues and WIP Features

There are a few remaining things that we're working on closing out which limit the set of available features.

Configuration Size

Currently, the neural network configuration must fit completely in one of DANA's configuration cache memories. We plan to enable the ability for weight data to be loaded as needed for large configurations that do not wholly fit in a cache memory.
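
To get a feel for this limit, the sketch below estimates a lower bound on the size of a fully connected network's weight data and compares it to a cache size. Everything here is an assumption made for illustration: 32-bit elements, one bias per neuron, no headers or block padding, a 784-30-10 topology, and a 512 KiB cache. The real configuration format will be larger than this estimate.

```shell
#!/bin/sh
# Rough lower bound (in bytes) on the weights of a fully connected network.
# Assumes 32-bit elements and one bias per neuron; ignores headers/padding.
# usage: weights_bytes <inputs> <layer1> [<layer2> ...]
weights_bytes() {
  total=0
  prev="$1"
  shift
  for n in "$@"; do
    # each of the n neurons has prev weights plus one bias
    total=$(( total + (prev + 1) * n ))
    prev="$n"
  done
  echo $(( total * 4 ))
}

cache_bytes=$(( 512 * 1024 ))          # assumed cache size, not a fixed DANA value
size=$(weights_bytes 784 30 10)        # assumed small MNIST-style topology
if [ "$size" -le "$cache_bytes" ]; then
  echo "estimate: $size bytes, fits in $cache_bytes bytes"
else
  echo "estimate: $size bytes, exceeds $cache_bytes bytes"
fi
```

For the assumed topology this gives (785 * 30 + 31 * 10) * 4 = 95440 bytes, well under the assumed cache size.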

Linux Support

We're working on a full integration of the X-FILES supervisor library with the Linux kernel. Supervisor features are currently supported via system calls added to the RISC-V Proxy Kernel via an included patch.

IO Queues

While neural network configurations are loaded from the memory of the microprocessor, all input and output data is transferred from Rocket to DANA hardware through the Rocket Custom Coprocessor (RoCC) register interface. We have plans to enable asynchronous transfer through in-memory queues.

Additional Documentation

Additional documentation can be found in the dana/doc directory or in some of our publications.

Attribution

If you use this for research, please cite the original PACT paper:

@inproceedings{eldridge2015,
  author    = {Schuyler Eldridge and
               Amos Waterland and
               Margo Seltzer and
               Jonathan Appavoo and
               Ajay Joshi},
  title     = {Towards General-Purpose Neural Network Computing},
  booktitle = {2015 International Conference on Parallel Architecture and Compilation,
               {PACT} 2015, San Francisco, CA, USA, October 18-21, 2015},
  pages     = {99--112},
  year      = {2015},
  url       = {http://dx.doi.org/10.1109/PACT.2015.21},
  doi       = {10.1109/PACT.2015.21},
  timestamp = {Wed, 04 May 2016 14:25:23 +0200},
  biburl    = {http://dblp.uni-trier.de/rec/bib/conf/IEEEpact/EldridgeWSAJ15},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}

Doc Directory

Specific documentation includes:

Publications

[1] S. Eldridge, A. Waterland, M. Seltzer, J. Appavoo, and A. Joshi, "Towards General-Purpose Neural Network Computing", in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). 2015.

[2] S. Eldridge, "Neural Network Computing Using On-Chip Accelerators", Boston University. 2016.

Workshop Presentations and Posters

[3] S. Eldridge, T. Unger, M. Sahaya Louis, A. Waterland, M. Seltzer, J. Appavoo, and A. Joshi, "Neural Networks as Function Primitives: Software/Hardware Support with X-FILES/DANA", Boston Area Architecture Workshop (BARC). 2016.

Contributors and Acknowledgments

The following people, while not mentioned in the commit log, have contributed directly or indirectly to the development of this work:

This work was funded by a NASA Space Technology Research Fellowship.