## Eyeriss

An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks



### Memory Access is the Bottleneck





Memory Read (DRAM) → MAC → Memory Write (DRAM)

Worst case: every memory read and write is a **DRAM** access. Two techniques reduce DRAM traffic:



- data reuse
- local accumulation
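To put a number on the worst case, here is a minimal sketch that counts DRAM accesses when nothing is reused on chip. It assumes each MAC reads three operands (filter weight, fmap activation, partial sum) and writes one updated partial sum; the function names and the layer shape are hypothetical and chosen only for illustration.

```python
def conv_mac_count(out_h, out_w, out_ch, in_ch, k_h, k_w):
    """Number of MACs in one convolutional layer (batch size 1)."""
    return out_h * out_w * out_ch * in_ch * k_h * k_w


def worst_case_dram_accesses(macs):
    """DRAM accesses when every operand of every MAC comes from DRAM."""
    # Assumption: 3 reads (weight, activation, psum) + 1 write (updated psum).
    reads_per_mac = 3
    writes_per_mac = 1
    return macs * (reads_per_mac + writes_per_mac)


# Hypothetical layer shape, not taken from the slides.
macs = conv_mac_count(out_h=8, out_w=8, out_ch=4, in_ch=3, k_h=3, k_w=3)
accesses = worst_case_dram_accesses(macs)
print(macs, accesses)
```

Data reuse lowers the read count per MAC, and local accumulation removes the psum read and write from DRAM for all but the final result.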



#### Architecture



#### **Local Memory Hierarchy**

- Global Buffer
- Direct inter-PE network
- PE-local memory (RF)

## Processing Element (PE)









National Chung Hsing University

## Dataflow Taxonomy

- Weight Stationary
- Output Stationary
- Row Stationary



## Filter

[Figure: filter illustration; the extracted labels repeat the sequence a b c d e.]










Multimedia & Communication IC Design Lab










Filter rows are reused across PEs horizontally



Fmap rows are reused across PEs diagonally
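The two reuse patterns above can be sketched as a software model of the row-stationary mapping. This is a minimal sketch under stated assumptions: stride 1, no padding, one PE per (filter row, output row) pair, and psums accumulated down each PE column. The function names are hypothetical, and the loops stand in for PEs that would run in parallel in hardware.

```python
def conv1d_row(filter_row, fmap_row):
    """1D convolution primitive of one PE: slide a filter row over an fmap row."""
    k = len(filter_row)
    out_len = len(fmap_row) - k + 1
    return [sum(filter_row[t] * fmap_row[x + t] for t in range(k))
            for x in range(out_len)]


def row_stationary_conv2d(filt, fmap):
    """2D convolution on a logical PE array with R rows and E columns."""
    R = len(filt)            # number of filter rows = PE array rows
    E = len(fmap) - R + 1    # number of output rows = PE array columns
    out = []
    for j in range(E):       # one PE column per output row
        psum_row = None
        for i in range(R):   # psums accumulate down the column
            # PE(i, j) uses filter row i (shared along PE row i, i.e. horizontally)
            # and fmap row i + j (shared along a diagonal of the array).
            partial = conv1d_row(filt[i], fmap[i + j])
            if psum_row is None:
                psum_row = partial
            else:
                psum_row = [a + b for a, b in zip(psum_row, partial)]
        out.append(psum_row)
    return out


filt = [[1, 0], [0, 1]]
fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = row_stationary_conv2d(filt, fmap)
print(result)
```

Every PE with the same `i` sees the same filter row, and every PE with the same `i + j` sees the same fmap row, which is where the horizontal and diagonal reuse comes from.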

### Multiple Channels





Partial sums (psums) from the different channels are accumulated.
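As a small illustration of channel accumulation, the sketch below sums the per-channel 1D-convolution results into a single psum row. It assumes stride 1 and no padding, and the function names are hypothetical; how channels are interleaved inside a PE in the actual hardware is not modeled here.

```python
def conv1d_row(filter_row, fmap_row):
    """1D convolution of one filter row with one fmap row (single channel)."""
    k = len(filter_row)
    return [sum(filter_row[t] * fmap_row[x + t] for t in range(k))
            for x in range(len(fmap_row) - k + 1)]


def multi_channel_psum_row(filter_rows, fmap_rows):
    """Accumulate the psums of all channels element-wise into one psum row."""
    acc = None
    for f_row, x_row in zip(filter_rows, fmap_rows):  # one pair per channel
        partial = conv1d_row(f_row, x_row)
        acc = partial if acc is None else [a + b for a, b in zip(acc, partial)]
    return acc


# Two channels with made-up values.
psums = multi_channel_psum_row([[1, 1], [2, 0]], [[1, 2, 3], [4, 5, 6]])
print(psums)
```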



### Run-Length Coding (RLC)

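Input: 0, 0, 12, 0, 0, 0, 0, 53, 0, 0, 22, ...

Each 64b output word packs three (Run, Level) pairs and a 1b Term flag. Run is 5b (the number of zeros preceding a value) and Level is 16b (the value itself): 3 × (5b + 16b) + 1b = 64b.

Output (64b): Run 2, Level 12 | Run 4, Level 53 | Run 2, Level 22 | Term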
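The encoding can be sketched in a few lines. This is a minimal sketch, not the hardware implementation: the field widths (5b Run, 16b Level, 1b Term) follow the layout above, while the bit order inside the word, the handling of runs longer than 31, and the handling of trailing zeros are assumptions made for illustration. The function names are hypothetical.

```python
RUN_BITS, LEVEL_BITS = 5, 16
MAX_RUN = (1 << RUN_BITS) - 1  # longest run a 5b field can hold (31)


def rlc_pairs(values):
    """Turn a value sequence into (run, level) pairs; run = zeros before level."""
    pairs, run = [], 0
    for v in values:
        if v == 0 and run < MAX_RUN:
            run += 1
        else:
            # A nonzero value, or a zero emitted as a level because the run is full.
            pairs.append((run, v))
            run = 0
    if run:
        # Assumption: trailing zeros become (run - 1) zeros followed by a zero level.
        pairs.append((run - 1, 0))
    return pairs


def pack_word(three_pairs, term=0):
    """Pack exactly three (run, level) pairs plus a term bit into a 64-bit integer."""
    assert len(three_pairs) == 3
    word = 0
    for run, level in three_pairs:
        word = (word << RUN_BITS) | run
        word = (word << LEVEL_BITS) | (level & 0xFFFF)  # keep the low 16 bits
    return (word << 1) | (term & 1)  # assumed order: first pair in the top bits


pairs = rlc_pairs([0, 0, 12, 0, 0, 0, 0, 53, 0, 0, 22])
word = pack_word(pairs)
print(pairs, hex(word))
```

On this input the encoder yields the (Run, Level) pairs 2/12, 4/53, and 2/22, which fill one 64b word.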



### Eyeriss system architecture





### Logical to Physical Mappings



