# Performance Counters in XiangShan

When running benchmarks, we need to collect and record hardware behavior (performance events) for analysis and tuning. Different kinds of behavior call for different kinds of counters, so XiangShan's RTL implements the following three types of performance counters:

- Accumulate: a basic counter that is incremented whenever a performance event occurs;
- Histogram: records the distribution of a value sampled whenever a performance event occurs;
- Rolling: similar to a segmented Accumulate; it records the number of performance events in each short segment of the run, so that changes over the course of execution become visible.

The following pseudocode describes the behavior of each of the three counter types:

- Accumulate:
  ```c
  if (valid)
    counter += diff;
  ```
- Histogram:
  ```c
  if (valid)
    distribution[value / step] += 1;
  ```
- Rolling:
  ```c
  if (valid)
    counters[segment] += diff;
  if (++cycles == segment_size) {
    cycles = 0;
    segment++;
  }
  ```
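
The pseudocode above can be turned into a small software model. The sketch below is a minimal illustration in Python, not XiangShan's RTL: the class names, the `tick` method, and the synthetic event stream are all hypothetical and exist only to make the three behaviors concrete.

```python
from collections import defaultdict


class Accumulate:
    """Adds diff to a single counter whenever the event is valid."""

    def __init__(self):
        self.counter = 0

    def tick(self, valid, diff=1):
        if valid:
            self.counter += diff


class Histogram:
    """Counts how often a sampled value falls into each step-wide bucket."""

    def __init__(self, step):
        self.step = step
        self.distribution = defaultdict(int)

    def tick(self, valid, value):
        if valid:
            self.distribution[value // self.step] += 1


class Rolling:
    """Keeps one accumulating counter per segment of segment_size cycles."""

    def __init__(self, segment_size):
        self.segment_size = segment_size
        self.counters = defaultdict(int)
        self.cycles = 0
        self.segment = 0

    def tick(self, valid, diff=1):
        if valid:
            self.counters[self.segment] += diff
        self.cycles += 1
        if self.cycles == self.segment_size:
            # Segment is full: start counting into the next one.
            self.cycles = 0
            self.segment += 1


# Synthetic event stream: one (valid, value) pair per cycle.
events = [(True, 3), (False, 0), (True, 12), (True, 25),
          (True, 7), (False, 0), (True, 18), (True, 1)]

acc, hist, roll = Accumulate(), Histogram(step=10), Rolling(segment_size=4)
for valid, value in events:
    acc.tick(valid)
    hist.tick(valid, value)
    roll.tick(valid)

print(acc.counter)              # 6
print(dict(hist.distribution))  # {0: 3, 1: 2, 2: 1}
print(dict(roll.counters))      # {0: 3, 1: 3}
```

All three counters see the same six valid events. Accumulate reports only the total, Histogram shows how the sampled values are distributed, and Rolling shows when the events happened.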

## Accumulate & Histogram

Both types of counters are printed to stderr when the simulation ends. Because the full output is large, the cell below shows only the last 10 lines, plus two examples: the total number of committed instructions counted by the ROB (Accumulate) and the distribution of the L2 Cache acquire latency (Histogram).

In XiangShan's RTL code, the two examples are defined as follows:

```scala
def ifCommitReg(counter: UInt): UInt = Mux(isCommitReg, counter, 0.U)
XSPerfAccumulate("commitInstr", ifCommitReg(trueCommitCnt), XSPerfLevel.CRITICAL)
```

```scala
XSPerfHistogram("acquire_period", acquire_period, acquire_period_en, 0, 30, 1, true, true)
XSPerfHistogram("acquire_period", acquire_period, acquire_period_en, 30, 100, 5, true, true)
XSPerfHistogram("acquire_period", acquire_period, acquire_period_en, 100, 200, 10, true, true)
XSPerfHistogram("acquire_period", acquire_period, acquire_period_en, 200, 1000, 100, true, true)
XSPerfHistogram("acquire_period", acquire_period, acquire_period_en, 1000, 5000, 1000, true, false)
```
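
The five histogram calls above split the latency range into sub-ranges with increasingly coarse steps. The sketch below shows one way a sample could be mapped to a bucket. It rests on assumptions the text does not confirm: that the numeric arguments are `start`, `stop`, and `step`, that the two booleans mean "strict left bound" and "strict right bound", and that a non-strict bound folds out-of-range samples into the edge bucket. The function names are hypothetical.

```python
# (start, stop, step, left_strict, right_strict), copied from the five calls above.
RANGES = [
    (0, 30, 1, True, True),
    (30, 100, 5, True, True),
    (100, 200, 10, True, True),
    (200, 1000, 100, True, True),
    (1000, 5000, 1000, True, False),
]


def bucket_for(value, start, stop, step, left_strict, right_strict):
    """Return the (low, high) bucket holding value, or None if it is not counted."""
    if value < start:
        # Assumed semantics: a strict bound drops the sample, a loose one clamps it.
        return None if left_strict else (start, start + step)
    if value >= stop:
        return None if right_strict else (stop - step, stop)
    low = start + (value - start) // step * step
    return (low, low + step)


def classify(value):
    """Find the first range that counts value and return its bucket."""
    for params in RANGES:
        bucket = bucket_for(value, *params)
        if bucket is not None:
            return bucket
    return None


print(classify(0))     # (0, 1)
print(classify(42))    # (40, 45)
print(classify(250))   # (200, 300)
print(classify(9999))  # (4000, 5000)
```

Under these assumptions, short latencies are resolved cycle by cycle, while very long ones all land in the last bucket, which keeps the number of counters small.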

```bash
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

$(get_asset emu-precompile/emu) \
    -i $(get_asset workload/hello-riscv64-xs.bin) \
    --no-diff 2>stderr.log

echo "=== Last 10 lines:"
tail -n 10 stderr.log

echo "=== Example of XSPerfAccumulate: rob commitInstr"
grep -n "rob: commitInstr," stderr.log

echo "=== Example of XSPerfHistogram: l2cache acquire period"
grep -n "l2cache.slices_0.mshrCtl: acquire_period" stderr.log
```

More data analysis scripts are available at [https://github.com/OpenXiangShan/env-scripts/blob/main/perf/perf.py](https://github.com/OpenXiangShan/env-scripts/blob/main/perf/perf.py).
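
For quick checks, the counters in `stderr.log` can also be collected with a few lines of Python. The sketch below is not taken from `perf.py`. It assumes each counter line ends with `<module path>: <counter name>, <value>`, which is the shape implied by the `grep` patterns above. The sample lines, their prefix, and their values are synthetic.

```python
import re

# Assumed line shape: "<anything> <module path>: <counter name>, <value>".
PERF_LINE = re.compile(r"(?P<module>\S+): (?P<name>\w+),\s*(?P<value>\d+)\s*$")


def parse_perf_counters(lines):
    """Map (module, counter name) to the integer value on matching lines."""
    counters = {}
    for line in lines:
        match = PERF_LINE.search(line)
        if match:
            key = (match.group("module"), match.group("name"))
            counters[key] = int(match.group("value"))
    return counters


# Synthetic input: the prefix, names, and values are made up for illustration.
sample = [
    "hypothetical-prefix rob: commitInstr,      1234",
    "hypothetical-prefix l2cache.slices_0.mshrCtl: acquire_period_example,   7",
    "an unrelated log line",
]

counters = parse_perf_counters(sample)
print(counters[("rob", "commitInstr")])  # 1234
```

To run it on a real log, pass an open file instead of `sample`, for example `parse_perf_counters(open("stderr.log"))`, and adjust the pattern if the actual line format differs.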

## Rolling

The previous two counter types cannot show how different segments of a program's execution differ, so we may miss the impact of a microarchitectural change on one specific segment (a critical region). This is why we need rolling analysis.

![rolling](../images/03-performance/02-xsperf/rolling-en.png)

This type of performance counter uses the ChiselDB framework introduced in 02-functional/04-chiseldb to store the collected data in a SQLite3 database file.

To enable RollingDB, specify `WITH_ROLLINGDB=1` at compile time and pass the `--dump-db` option at run time.

⚠️ Note: If you are reading this notebook on the tutorial demo server, please do not recompile XiangShan; doing so takes a long time and consumes a lot of computing resources.

```bash
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

#make clean

# compile emu with rolling db enabled
#make emu \
#    EMU_THREADS=4 \
#    WITH_CHISELDB=1 \
#    WITH_ROLLINGDB=1 \
#    -j8

# run emu with rolling db enabled
#./build/emu \
#    -i $(get_asset workload/coremark-2-iteration.bin) \
#    --diff $(get_asset workload/riscv64-nemu-interpreter-so) \
#    --dump-db

mkdir -p ${WORK_DIR}/03-performance/02-xsperf

# copy the latest generated rolling db to tutorial dir
#cp $(find ./build/ -type f -name "*.db" | tail -1) \
#    ${WORK_DIR}/03-performance/02-xsperf/xs-perf-rolling.db

# for tutorial: copy a pre-generated rolling db to tutorial dir
cp $(get_asset emu-perf-result/xs-perf-rolling.db) \
    ${WORK_DIR}/03-performance/02-xsperf/xs-perf-rolling.db
```

Once we have the database file, we analyze it with a Python script.
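
Before plotting, it can help to see what the database actually contains. The sketch below uses only Python's standard `sqlite3` module and makes no assumption about the RollingDB schema: it lists every table and its row count. The helper name `summarize_tables` is hypothetical, and the demo runs on an in-memory database with a made-up table so that it is self-contained.

```python
import sqlite3
from contextlib import closing


def summarize_tables(conn):
    """Return {table name: row count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Self-contained demo on an in-memory database.
# "demo_rolling" and its columns are hypothetical, not the real RollingDB schema.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE demo_rolling (x INTEGER, y INTEGER)")
    conn.executemany("INSERT INTO demo_rolling VALUES (?, ?)",
                     [(1000, 1500), (2000, 2300)])
    summary = summarize_tables(conn)

print(summary)  # {'demo_rolling': 2}
```

To inspect the tutorial database instead, open it with `sqlite3.connect("xs-perf-rolling.db")` and pass that connection to the same function.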

In the following example, we use the `rollingplot.py` script to plot the IPC data.

The IPC data is collected in XiangShan's RTL code as follows:

```scala
// every 1000 cycles
XSPerfRolling("ipc", ifCommitReg(trueCommitCnt), 1000, clock, reset)
```
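
The counter above accumulates committed instructions over 1000-cycle windows, so turning a window's count into IPC is a single division. The sketch below illustrates that arithmetic. It assumes each record holds the number of instructions committed in one window; the function names and sample counts are hypothetical.

```python
GRANULARITY = 1000  # cycles per window, from the XSPerfRolling call above


def ipc_per_window(commit_counts, granularity=GRANULARITY):
    """Convert per-window committed-instruction counts into IPC values."""
    return [count / granularity for count in commit_counts]


def overall_ipc(commit_counts, granularity=GRANULARITY):
    """IPC over the whole run: total instructions divided by total cycles."""
    return sum(commit_counts) / (len(commit_counts) * granularity)


counts = [1500, 2300, 800]  # synthetic values, one per window
ipc = ipc_per_window(counts)

# Index of the window with the lowest IPC, a candidate critical region.
slowest = min(range(len(ipc)), key=ipc.__getitem__)

print(ipc)                            # [1.5, 2.3, 0.8]
print(slowest)                        # 2
print(round(overall_ipc(counts), 3))  # 1.533
```

The overall IPC of about 1.53 hides the fact that the third window runs at only 0.8, which is exactly the kind of detail a rolling counter is meant to expose.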

```bash
%%bash
cd .. && source env.sh
cd ${WORK_DIR}/03-performance/02-xsperf

# Use python scripts to analyze the rolling db, for example, plot ipc
python3 ${NOOP_HOME}/scripts/rolling/rollingplot.py \
    ./xs-perf-rolling.db \
    ipc

ls -lh ${WORK_DIR}/03-performance/02-xsperf/results/perf.png
```

The script produces the following image, which shows how XiangShan's IPC changes over time while running this program:

![perf](../work/03-performance/02-xsperf/results/perf.png)

If the image does not load correctly, try closing and reopening the notebook.