Gemmini-SEA

Gemmini-SEA is a weight-stationary systolic array implementation based on Sign-Separated Accumulation (SEA). The design targets the resource efficiency of floating-point (FP) addition and accumulation, which are central operations in DNN accelerators. Instead of accumulating all terms in a single running sum, Gemmini-SEA accumulates same-signed terms separately using efficient same-signed FP adders, then performs one final addition of the two oppositely-signed sub-accumulations. This substantially improves overall resource efficiency and preserves accuracy, because no approximation is introduced.

Design Overview

Original Weight Stationary Systolic Array

In the systolic array, we perform matrix multiplication $C = A*B + D$, where $A$, $B$, and $D$ represent activations, weights, and partial sums, respectively. In a weight stationary systolic array, the weights are preloaded into the array before the actual computation begins.

On the right side of the array, each processing element (PE) starts by loading its weight, denoted as $b$, into a specific register. Once all the weights are loaded, each PE then begins to process input activations, referred to as $a$. These activations are either passed from the adjacent left PE or come directly from the primary input for PEs at the left edge. Simultaneously, each PE receives partial sums, labeled as $d$, from the PE above or directly from the main input for the top row of PEs.

As the computation progresses, each PE transmits its received $a$ and $d$ values to the neighboring PE on its right and below, respectively. Concurrently, it calculates the product of its current input activation $a$ and the pre-stored weight $b$. This product is then added to the received partial sum $d$ to generate a new partial sum, also labeled as $d$. This newly computed $d$ is then sent down to the PE located directly below.

Note that the blue FP adder in each PE is a generic FP adder.
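The dataflow described above can be summarized as a small functional model. The following Python sketch is illustrative only and is not taken from the Gemmini code base: the names `ws_pe` and `ws_matmul` are hypothetical, pipeline timing and weight preloading are ignored, and the `+` in `ws_pe` stands in for the generic FP adder.

```python
def ws_pe(a, b, d):
    """One PE step (hypothetical model): forward `a` to the right
    and send the new partial sum a*b + d downward."""
    return a, a * b + d


def ws_matmul(A, B, D):
    """Functional (untimed) model of C = A*B + D on a weight-stationary array."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):              # each row of A streams through the array
        for j in range(cols):          # column j of PEs holds the weights B[:, j]
            d = D[i][j]                # partial sum enters at the top row
            for r in range(inner):     # the PE in row r holds weight B[r][j]
                _, d = ws_pe(A[i][r], B[r][j], d)
            C[i][j] = d                # result leaves at the bottom row
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
D = [[1.0, 1.0], [1.0, 1.0]]
print(ws_matmul(A, B, D))  # [[20.0, 23.0], [44.0, 51.0]]
```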

SEA-based Weight Stationary Systolic Array

In our SEA-based systolic array, quantities with the same sign are accumulated separately. Each PE therefore handles and forwards two kinds of partial sums, labeled $d$ and $d'$. These partial sums have opposite signs and flow from top to bottom through the array. As shown in our diagram, every PE has two partial-sum inputs, $d$ and $d'$, and two pipeline registers that pass them along. At the very bottom of the array, a generic FP adder adds $d$ and $d'$ together to produce the final result. An initialization stage at the top of the systolic array makes sure the first row of PEs starts with two inputs.

Each PE takes in two distinct partial sums ($d$ and $d'$) with opposing signs. Inside each PE, you'll find a multiplier, a register for the weight $b$, and a same-signed FP adder. A key part of this design is the swapping mechanism, made up of two multiplexers and an XOR gate. It makes sure that the partial sum fed to the same-signed adder has the same sign as the product of the input activation and weight ($a × b$). The other partial sum doesn't change. The updated $d$ is therefore either $a × b + d$ or $a × b + d'$, depending on which incoming partial sum matches the sign of the product. The partial sum that is not used takes a bypass path and is sent as the new $d'$ to the next PE down the line. This way, the $d'$ that gets passed on always has a different sign than the $a × b$ that was just accumulated.
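The swapping mechanism can be sketched in a few lines of Python. This is a behavioral illustration under stated assumptions, not the Chisel implementation: the names `sign_bit`, `sea_pe`, and `sea_column` are hypothetical, and the initialization shown (the second partial sum starts as a zero of the opposite sign) is an assumption about what the initialization stage provides.

```python
import math


def sign_bit(x):
    """Return 1 if the sign bit is set (including -0.0), else 0."""
    return 1 if math.copysign(1.0, x) < 0 else 0


def sea_pe(a, b, d, d_prime):
    """Hypothetical SEA PE: returns (a, new_d, new_d_prime)."""
    p = a * b
    swap = sign_bit(p) ^ sign_bit(d)      # XOR gate: 1 when p and d differ in sign
    same = d_prime if swap else d         # mux 1: operand for the same-signed adder
    other = d if swap else d_prime        # mux 2: bypassed partial sum
    assert sign_bit(same) == sign_bit(p)  # the adder only ever sees equal signs
    return a, p + same, other


def sea_column(activations, weights, d_init):
    """Accumulate one column of PEs, then apply the final generic FP add."""
    d = d_init
    # Assumed initialization: d' is a zero whose sign is opposite to d.
    d_prime = -0.0 if sign_bit(d_init) == 0 else 0.0
    for a, b in zip(activations, weights):
        _, d, d_prime = sea_pe(a, b, d, d_prime)
    return d + d_prime                    # generic FP adder at the array bottom


# Products are 4, -10, -18; with d_init = 0.5 the total is -23.5.
print(sea_column([1.0, -2.0, 3.0], [4.0, 5.0, -6.0], 0.5))  # -23.5
```

After every step the outgoing `d` carries the sign of the product and the outgoing `d'` carries the opposite sign, so the invariant that the two partial sums have opposing signs is preserved from row to row.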

This design does not introduce any approximation. Results can differ slightly from conventional accumulation only because FP addition is not associative. Even though it looks like we've increased the number of logic components and FP adders, our design actually achieves a lower area-delay product (ADP) and lower energy. Please check our paper for details.
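To illustrate the non-associativity point, the short Python example below accumulates the same three terms in input order and in sign-separated order. It uses Python's FP64 floats rather than the BF16/FP32 types used in the hardware, and the variable names are illustrative. Every individual addition is correctly rounded in both cases; only the grouping differs.

```python
terms = [0.1, -0.3, 0.2]

# Conventional accumulation: add terms in the order they arrive.
sequential = 0.0
for t in terms:
    sequential += t

# Sign-separated accumulation: one running sum per sign, one final add.
pos, neg = 0.0, -0.0
for t in terms:
    if t >= 0:
        pos += t
    else:
        neg += t
sign_separated = pos + neg

# Both results are tiny rounding residues, but they are not bit-identical.
print(sequential, sign_separated, sequential == sign_separated)
```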

Getting started

Dependencies

Our implementation is based on Gemmini V0.6.4.

Installation

git clone git@github.com:AaronJing/Chipyard-SEA.git
cd Chipyard-SEA
git checkout sea
./scripts/init-submodules-no-riscv-tools.sh
./scripts/build-toolchains.sh esp-tools
source env.sh

cd generators/gemmini
git fetch && git checkout sea
git submodule update

cd -
cd toolchains/esp-tools/riscv-isa-sim/build
git fetch && git checkout sea
make && make install

Verify Installation

cd Chipyard-SEA/generators/gemmini
./scripts/setup-paths.sh
./scripts/build-verilator.sh
cd software/gemmini-rocc-tests
./build.sh
cd -
./scripts/run-verilator.sh template

The run should print test output and finish without any errors.

Running Baremetal test using Verilator

To generate the SEA-based implementation, set the following in configs/GemminiCustomConfigs.scala:

sea = true,
samesigned = true,

To generate the original implementation instead, set:

sea = false,
samesigned = false,

Then, run the baremetal test matmul_ws_sea. Note that the inputType, spatialArrayOutputType, and accType of matmul_ws_sea are BF16, FP32, and FP32, respectively. If you generate Gemmini with other data types, this test cannot be run.

matmul_ws_sea contains 100 GEMM tests. Each test multiplies two 4-by-4 BF16 matrices and outputs one 4-by-4 matrix.

./scripts/build-verilator.sh
cd software/gemmini-rocc-tests
./build.sh
cd -
./scripts/run-verilator.sh matmul_ws_sea
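For readers who want to reason about the data types involved, here is a minimal host-side sketch of a GEMM with BF16 inputs and FP32 accumulation. It is not part of the test suite and is not claimed to reproduce the exact values that matmul_ws_sea checks: the helpers `to_f32`, `to_bf16`, and `gemm_ref` are hypothetical, the accumulation order is the conventional one, and NaN/Inf handling is omitted.

```python
import struct


def to_f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x):
    """Round to BF16 by keeping the top 16 bits of the FP32 encoding
    (round-to-nearest-even). NaN and Inf are not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)   # rounding bias, ties to even
    bits &= 0xFFFF0000                    # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def gemm_ref(A, B):
    """C = A*B with BF16 inputs and an FP32 accumulator (conventional order)."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for r in range(inner):
                prod = to_f32(to_bf16(A[i][r]) * to_bf16(B[r][j]))
                acc = to_f32(acc + prod)  # round after every accumulation step
            C[i][j] = acc
    return C


I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
M = [[1.5, -2.25, 0.5, 4.0]] * 4
print(to_bf16(3.14159))   # 3.140625
print(gemm_ref(I4, M) == M)  # True
```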

Cite us if it helps your research :)

@INPROCEEDINGS{gong2024,
  author={Gong, Jing and Saadat, Hassaan and Javaid, Haris and Gamaarachchi, Hasindu and Taubman, David and Parameswaran, Sri},
  booktitle={To appear: 2024 Design Automation and Test in Europe (DATE)}, 
  title={SEA: Sign-Separated Accumulation Scheme for Resource-Efficient DNN Accelerators}, 
  year={2024}}
