Pimsim-nn is a simulator designed for RRAM-/PIM-based neural network accelerators. By taking an instruction sequence as input, pimsim-nn evaluates performance (inference latency and/or throughput), power dissipation, and energy consumption under a given architecture configuration.
Pimsim-nn is meant to be used with its associated compiler, pimcomp-nn. The compiler takes an ONNX file and the architecture configuration (the same one used by pimsim-nn) as inputs and produces the instruction sequence.
The ISA of pimsim-nn is based on the document published here.
- cmake >= 3.6
- gcc >= 4.8.5
CMake is used to build the whole project. Run the commands below:
cd pimsim-nn
mkdir build
cd build
cmake ..
make
In the build directory, check for the executable file ChipTest.
There is a built-in ResNet-18 example. Its configuration and instruction files are in the folder test/resnet18. Use the command below to simulate ResNet-18:
ChipTest ~/pimsim-nn/test/resnet18/full.gz ~/pimsim-nn/test/resnet18/config.json
Output:
SystemC 2.3.4-Accellera --- Jul 4 2023 15:44:33
Copyright (c) 1996-2022 by all Contributors,
ALL RIGHTS RESERVED
Loading Inst and Config
Load finish
Reading Inst From Json
hereRead finish
Start Simulation
Progress --- <10%>
Progress --- <20%>
Progress --- <30%>
Progress --- <40%>
Progress --- <50%>
Progress --- <60%>
Progress --- <70%>
Progress --- <80%>
Progress --- <90%>
Simulation Finish
|*************** Simulation Report ***************|
Basic Information:
- config file: ../test/resnet18/config.json
- inst file: ../test/resnet18/full.gz
- verbose level: 0
- core count: 136
- simulation mode: 0
- simulation time: 200 ms
Chip Simulation Result:
- output count: 2.24 samples
- throughput: 11.2 samples/s
- average latency: 89.5 ms
- average power: 6.09e+03 mW
- average energy: 5.45e+11 pJ/it
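The report lines above follow a simple `- key: value unit` pattern, so they can be collected into a dictionary for further processing. The following is a minimal sketch, assuming the report is available as text in exactly the layout shown; `parse_report` is a hypothetical helper and not part of pimsim-nn.

```python
import re

def parse_report(text):
    """Collect '- key: value [unit]' lines into {key: (value, unit)}."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*-\s*([^:]+):\s*(\S+)(?:\s+(\S+))?\s*$", line)
        if not m:
            continue  # skips banners, progress lines, and section titles
        key, raw, unit = m.group(1).strip(), m.group(2), m.group(3)
        try:
            value = float(raw)
        except ValueError:
            value = raw  # non-numeric values such as file paths stay as text
        fields[key] = (value, unit)
    return fields

# A few lines copied from the sample report above.
sample = """\
Progress --- <90%>
- config file: ../test/resnet18/config.json
- simulation time: 200 ms
- output count: 2.24 samples
- throughput: 11.2 samples/s
- average latency: 89.5 ms
"""

report = parse_report(sample)

# Sanity check on the sample numbers: throughput x simulation time.
sim_time_s = report["simulation time"][0] / 1000.0
expected_outputs = report["throughput"][0] * sim_time_s
print(report["output count"][0], round(expected_outputs, 2))
```

In the sample run, the throughput multiplied by the simulation time (11.2 samples/s over 200 ms) agrees with the reported output count of 2.24 samples.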
Pimsim-nn assumes a chip consists of many cores connected via a NoC, and the core architecture is shown below:
The core architecture is very similar to that of a RISC processor, but with four dedicated execution units: the Scalar Unit, Vector Unit, Matrix Unit, and Transfer Unit. The Scalar Unit processes scalar operations. The Vector Unit performs vector-vector operations. The Matrix Unit is mainly composed of RRAM crossbar arrays and executes matrix-vector multiplication efficiently. The Transfer Unit is responsible for inter-core data exchange and synchronization.
The simulator requires three files:
- Architecture Configuration file
- NoC Configuration file
- Program Instructions file
The architecture configuration file primarily defines the latency and power of the different components in the simulator. The NoC configuration file gives the latency and power of the NoC. The NoC configuration is logically part of the architecture configuration, but it is kept in a separate file because of the large number of parameters it requires. For simplicity, a parameter in the architecture configuration gives the path of the NoC configuration file, so the simulator loads the NoC configuration automatically. The program instruction file is generated by pimcomp-nn.
As a result, only two inputs are required on the command line: the path of the program instruction file, followed by the path of the architecture configuration file.
ChipTest path_to_program_instructions_file path_to_architecture_configuration_file
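When running many simulations, the invocation above can be wrapped in a script. The sketch below assumes the ChipTest binary sits in the current directory and that the report is printed to standard output; both are assumptions, and `chiptest_command` and `run_chiptest` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

def chiptest_command(inst_file, config_file, binary="./ChipTest"):
    """Build the argument list: instructions file first, then architecture config."""
    return [
        str(binary),
        str(Path(inst_file).expanduser()),
        str(Path(config_file).expanduser()),
    ]

def run_chiptest(inst_file, config_file, binary="./ChipTest"):
    """Run the simulator and return whatever it wrote to standard output."""
    result = subprocess.run(
        chiptest_command(inst_file, config_file, binary),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout

# Only builds the command here; call run_chiptest(...) once the binary exists.
cmd = chiptest_command("test/resnet18/full.gz", "test/resnet18/config.json")
print(" ".join(cmd))
```

Keeping the command construction separate from the execution makes the argument order easy to check without having the binary built.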
Several parameters in the architecture configuration file change the simulation behavior.
| Parameter | Description |
|---|---|
| sim_time | Simulation time, in ms. |
| sim_mode | When set to 0, the simulator assumes enough input samples are available and reports throughput. When set to 1, the simulator processes only one input sample and reports its latency. |
| report_verbose_level | When set to 0, the simulator reports only chip-level performance and power consumption statistics. When set to 1, the simulator also reports core-level statistics. |
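To switch between these settings without editing the file by hand, the parameters can be rewritten programmatically. This is a sketch only: the position of these keys inside the configuration JSON is not described here, so the hypothetical helper `set_key_everywhere` searches the whole tree for a key that already exists, and the `sim_config` nesting in the example is an assumption made up for illustration.

```python
import json

def set_key_everywhere(node, key, value):
    """Set `key` to `value` wherever it already exists; return how many were set."""
    hits = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                hits += 1
            else:
                hits += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            hits += set_key_everywhere(item, key, value)
    return hits

# Illustrative stand-in for a loaded config file; the nesting is assumed.
config = json.loads(
    '{"sim_config": {"sim_time": 200, "sim_mode": 0, "report_verbose_level": 0}}'
)

# Single-sample latency run with core-level statistics.
changed = set_key_everywhere(config, "sim_mode", 1)
changed += set_key_everywhere(config, "report_verbose_level", 1)
print(changed, json.dumps(config))
```

Because the helper only updates keys that are already present, a return value of 0 signals that the parameter name was not found and the file was left unchanged.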
- Xinyu Wang (Institute of Computing Technology, Chinese Academy of Sciences)