Gemmini-SEA is a weight-stationary systolic array implementation based on Sign-Separated Accumulation (SEA). The design targets the resource efficiency of floating-point (FP) addition and accumulation, which are central to DNN accelerators. Unlike conventional accumulation, Gemmini-SEA accumulates same-signed terms separately using efficient same-signed FP adders, then performs one final addition of the oppositely-signed sub-accumulations. This substantially improves overall resource efficiency and preserves accuracy, because it introduces no approximation.
In the systolic array, we perform matrix multiplication
On the right side of the array, each processing element (PE) starts by loading its weight, denoted as
As the computation progresses, each PE transmits its received
Note that the blue FP adder in each PE is a generic FP adder.
In our SEA-based systolic array, we focus on separately accumulating quantities with the same sign. This means we're dealing with two different kinds of partial sums, labeled as
PEs take in two distinct partial sums (
This design does not introduce any approximation; the only differences from conventional accumulation are the small ones that arise because FP operations are not associative. Although the design appears to increase the number of logic components and FP adders, it actually achieves lower ADP and energy. Please see our paper for details.
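The accumulation order described above can be sketched in software. This is a minimal illustration only, not the hardware implementation: it uses Python double-precision floats rather than the FP formats of the accelerator, and the function names `sea_accumulate` and `sequential_accumulate` are hypothetical.

```python
def sea_accumulate(products):
    """Sign-separated accumulation: sum same-signed terms separately,
    then combine the two sub-accumulations with one final addition."""
    pos_sum = 0.0  # sub-accumulation of non-negative terms
    neg_sum = 0.0  # sub-accumulation of negative terms
    for p in products:
        if p >= 0.0:
            pos_sum += p  # same-signed addition
        else:
            neg_sum += p  # same-signed addition
    return pos_sum + neg_sum  # the only mixed-sign addition


def sequential_accumulate(products):
    """Conventional accumulation: add terms in arrival order."""
    acc = 0.0
    for p in products:
        acc += p
    return acc


# With exactly representable terms, both orderings agree.
exact_terms = [1.5, -0.25, 2.0, -3.0, 0.75]
print(sea_accumulate(exact_terms))         # 1.0
print(sequential_accumulate(exact_terms))  # 1.0

# A contrived case where rounding makes the two orderings differ,
# because FP addition is not associative. Neither result is exact here.
contrived_terms = [1e17, 1.0, -1e17, 1.0]
print(sea_accumulate(contrived_terms))         # 0.0
print(sequential_accumulate(contrived_terms))  # 1.0
```

Neither function approximates any individual addition; the results differ only because the additions are performed in a different order.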
Our implementation is based on Gemmini V0.6.4.
git clone git@github.com:AaronJing/Chipyard-SEA.git
cd Chipyard-SEA
git checkout sea
./scripts/init-submodules-no-riscv-tools.sh
./scripts/build-toolchains.sh esp-tools
source env.sh
cd generators/gemmini
git fetch && git checkout sea
git submodule update
cd -
cd toolchains/esp-tools/riscv-isa-sim/build
git fetch && git checkout sea
make && make install
cd Chipyard-SEA/generators/gemmini
./scripts/setup-paths.sh
./scripts/build-verilator.sh
cd software/gemmini-rocc-tests
./build.sh
cd -
./scripts/run-verilator.sh template
The run should produce output without any errors.
To generate the SEA-based implementation, set the following in configs/GemminiCustomConfigs.scala:
sea = true,
samesigned = true,
To generate the original implementation instead, set:
sea = false,
samesigned = false,
Then run the bare-metal test matmul_ws_sea. Note that matmul_ws_sea uses BF16, FP32 and FP32 for inputType, spatialArrayOutputType and accType, respectively. If Gemmini is generated with other data types, this test cannot be run.
matmul_ws_sea contains 100 GEMM tests; each test multiplies two 4-by-4 BF16 matrices and outputs one 4-by-4 matrix.
./scripts/build-verilator.sh
cd software/gemmini-rocc-tests
./build.sh
cd -
./scripts/run-verilator.sh matmul_ws_sea
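For readers who want to reason about the data types used by matmul_ws_sea, the following is a rough software model of a 4-by-4 GEMM with BF16 inputs and FP32 accumulation. It is a sketch under stated assumptions and is not the test's source code: the helper names (`to_bf16`, `to_f32`, `gemm_bf16_fp32`) are hypothetical, round-to-nearest-even is assumed for the BF16 conversion, and only finite inputs are handled.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x):
    """Round a finite Python float to the nearest BF16 value
    (assumed round-to-nearest-even), returned as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # BF16 keeps the upper 16 bits of the FP32 encoding.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def gemm_bf16_fp32(a, b):
    """Multiply two square matrices with BF16 inputs and FP32 accumulation."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                prod = to_f32(to_bf16(a[i][k]) * to_bf16(b[k][j]))
                acc = to_f32(acc + prod)  # FP32 accumulator
            c[i][j] = acc
    return c


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [
    [1.0, 2.0, 0.5, -1.5],
    [0.25, -4.0, 3.0, 8.0],
    [-0.5, 1.5, -2.0, 0.75],
    [16.0, -0.125, 6.0, -3.0],
]
result = gemm_bf16_fp32(identity, b)
print(result == b)         # True: every entry of b is exactly representable in BF16
print(to_bf16(3.14159))    # 3.140625
```

This model accumulates in arrival order; it does not model the sign-separated datapath.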
@INPROCEEDINGS{gong2024,
author={Gong, Jing and Saadat, Hassaan and Javaid, Haris and Gamaarachchi, Hasindu and Taubman, David and Parameswaran, Sri},
booktitle={To appear: 2024 Design Automation and Test in Europe (DATE)},
title={SEA: Sign-Separated Accumulation Scheme for Resource-Efficient DNN Accelerators},
year={2024}}




