# Checkpoint

To evaluate performance, we usually run benchmark suites in simulation, e.g., pure software simulation with Verilator, or hardware-accelerated simulation on an FPGA or an emulator.

However, each existing approach has its own drawbacks:
- Software simulation is too slow;
- FPGAs have limited on-chip resources, which makes them hard to use for complex designs such as XiangShan;
- Emulators are too expensive.

We are aware of ongoing work on accelerating software simulation and on making FPGAs easier to use. Meanwhile, we believe that checkpointing, which reduces the number of instructions that must be simulated and increases simulation parallelism, is a simpler and effective approach.

![intro](../images/03-performance/01-checkpoint/intro-en.png)

Put simply, checkpointing means selecting a few segments of a program's execution and saving the architectural state (i.e., registers and memory) at the start of each segment. Every later simulation then starts from a saved state and runs until the end of that segment.

Different segments of the same program can be simulated in parallel, which increases simulation parallelism.
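To make the idea concrete, here is a toy model in which every name (`State`, `step`, `run`) is hypothetical. The "architectural state" is just a PC and two registers; one full run saves a checkpoint at the start of each segment, and each segment is then re-simulated independently from its checkpoint. The sketch covers every segment so it can check that the pieces reproduce the full run; a real flow simulates only the selected segments.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Toy architectural state: a PC and two registers (no memory, for brevity)."""
    pc: int
    regs: tuple


def step(s):
    """Hypothetical one-instruction semantics; any deterministic update works."""
    r0, r1 = s.regs
    return State(pc=s.pc + 1, regs=((r0 + r1) % 1000, (r1 * 3 + 1) % 1000))


def run(s, n):
    """Execute n instructions starting from state s."""
    for _ in range(n):
        s = step(s)
    return s


INTERVAL = 100      # segment length in instructions
NUM_SEGMENTS = 4

# One full run that saves a checkpoint at the start of every segment.
checkpoints = []
state = State(pc=0, regs=(1, 2))
for _ in range(NUM_SEGMENTS):
    checkpoints.append(state)
    state = run(state, INTERVAL)
full_run_end = state

# Each segment is simulated independently from its own checkpoint, so the
# segments can run in parallel. (Threads keep the toy simple; a real setup
# would launch one simulator process per checkpoint.)
with ThreadPoolExecutor() as pool:
    segment_ends = list(pool.map(lambda cpt: run(cpt, INTERVAL), checkpoints))

# Each segment ends exactly where the next one begins, and the last one ends
# where the full run ended.
assert segment_ends[:-1] == checkpoints[1:]
assert segment_ends[-1] == full_run_end
print("all", NUM_SEGMENTS, "segments reproduce the full run")
```

The property being checked (restore, then run to the segment end) only holds because simulation is deterministic given the saved state. This is why the checkpoint must capture the complete architectural state.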

Then, by taking a weighted average of the performance data collected from each segment, we can estimate the performance of the entire program.
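A minimal sketch of this weighted average, using made-up numbers (not measured data). Every segment is one checkpoint interval long, so total cycles are proportional to the weighted sum of per-segment CPI. This is why the sketch averages CPI and converts to IPC only at the end: a weighted arithmetic mean of IPC would be biased. The function name `weighted_cpi` is our own.

```python
def weighted_cpi(cpis, weights):
    """Estimate whole-program CPI from per-segment CPI and segment weights.

    All segments have the same length (one checkpoint interval), so total
    cycles are proportional to sum(w_i * CPI_i).
    """
    if len(cpis) != len(weights):
        raise ValueError("need one weight per segment")
    total_weight = sum(weights)
    # Normalize in case the weights do not sum to exactly 1 (e.g. rounding).
    return sum(c * w for c, w in zip(cpis, weights)) / total_weight


# Hypothetical values for three segments.
cpi = weighted_cpi([0.8, 1.6, 1.2], [0.5, 0.3, 0.2])
print(f"estimated CPI = {cpi:.3f}, IPC = {1 / cpi:.3f}")
```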

There are two common ways to select segments:
1. Uniform sampling: select one segment every fixed number of instructions;
2. SimPoint: profile the program and select the segments that best represent its overall behavior.
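Uniform sampling is simple enough to sketch in a few lines. The sketch assumes that segments are `seg_len` instructions long, that one segment starts every `period` instructions, and that all segments have the same weight. The function and parameter names are hypothetical. SimPoint instead chooses both the segments and their weights by clustering, as shown later in this section.

```python
def uniform_sample(total_insts, period, seg_len):
    """Pick one segment of seg_len instructions every `period` instructions.

    Returns a list of (start_instruction, weight) pairs with equal weights.
    """
    if seg_len > period:
        raise ValueError("segments must not overlap")
    starts = list(range(0, total_insts - seg_len + 1, period))
    if not starts:
        return []
    weight = 1 / len(starts)
    return [(s, weight) for s in starts]


# A 1e9-instruction run, one 10M-instruction segment every 100M instructions.
segments = uniform_sample(1_000_000_000, 100_000_000, 10_000_000)
print(len(segments), segments[:2])
```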

![method](../images/03-performance/01-checkpoint/method-en.png)

In this section, we demonstrate how to profile a program, select representative segments with SimPoint, generate checkpoints with NEMU, and run simulations from those checkpoints.

This section uses some paths and constants that differ from those in `../env.sh`. For convenience, we provide `01-env.sh`, which sets the environment variables used throughout this section. Run the following cell to view them.

```bash
%%bash
source 01-env.sh

env | grep WORKLOAD=  # workload to be profiled / checkpointed / simulated
env | grep CHECKPOINT_INTERVAL=
env | grep NEMU=
env | grep _HOME
env | grep _PATH
```

To create checkpoints, we first need to build SimPoint and NEMU (in checkpoint mode), and then build the checkpoint restorer (`gcpt_restore`) for the workload.

```bash
%%bash
source 01-env.sh

cd ${NEMU_HOME}
git submodule update --init

# Build SimPoint
cd ${NEMU_HOME}/resource/simpoint/simpoint_repo
make clean
make

# Build NEMU in checkpoint mode
cd ${NEMU_HOME}
make clean
make riscv64-xs-cpt_defconfig
make -j8

# Build the checkpoint restorer for ${WORKLOAD}
# (the workload binary is passed in as the restorer's payload)
cd ${NEMU_HOME}/resource/gcpt_restore
rm -rf ${GCPT_PATH}
make -C ${NEMU_HOME}/resource/gcpt_restore/ \
    O=${GCPT_PATH} \
    GCPT_PAYLOAD_PATH=$(get_asset workload/${WORKLOAD}.bin) \
    CROSS_COMPILE=riscv64-linux-gnu-
```

Next, we run the program to be checkpointed on NEMU and collect its behavior for profiling.
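SimPoint works on basic block vectors (BBVs): for each interval of `CHECKPOINT_INTERVAL` instructions, a BBV records how many instructions were executed in each basic block. The cluster step below reads these from `simpoint_bbv.gz`. The sketch builds BBVs from a hypothetical basic-block trace to show the idea. It does not reproduce NEMU's actual implementation or file format, and details such as how interval boundaries are handled may differ.

```python
from collections import Counter


def basic_block_vectors(trace, interval):
    """Split a basic-block trace into intervals and build one BBV per interval.

    trace: iterable of (block_id, block_len) pairs in execution order.
    Each BBV maps block_id -> instructions executed in that block during the
    interval (execution count weighted by block length).
    """
    bbvs, current, executed = [], Counter(), 0
    for block_id, block_len in trace:
        current[block_id] += block_len
        executed += block_len
        if executed >= interval:  # close the interval (blocks are not split)
            bbvs.append(current)
            current, executed = Counter(), 0
    if current:
        bbvs.append(current)  # trailing partial interval
    return bbvs


# Hypothetical trace: one loop (block 1), then a different loop (block 2).
trace = [(0, 4)] + [(1, 5)] * 8 + [(2, 10)] * 4
bbvs = basic_block_vectors(trace, interval=20)
for i, bbv in enumerate(bbvs):
    print(i, dict(bbv))
```

Intervals that execute the same code produce similar vectors. Clustering exploits this property.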

```bash
%%bash
source 01-env.sh

# Start from a clean result directory
rm -rf ${RESULT_PATH}

_LOG_PATH=${LOG_PATH}/profiling
mkdir -p ${_LOG_PATH}

# ${GCPT}:            the binary NEMU loads is the checkpoint restorer
# -w:                 the actual workload is ${WORKLOAD}
# -D:                 results are saved to ${RESULT_PATH}
# -C:                 task name is "profiling"
# -b:                 batch mode (no interactive debugger)
# --simpoint-profile: collect basic block vectors for SimPoint
# --cpt-interval:     profiling/checkpoint interval (in instructions)
${NEMU} ${GCPT} \
    -w ${WORKLOAD} \
    -D ${RESULT_PATH} \
    -C profiling \
    -b \
    --simpoint-profile \
    --cpt-interval ${CHECKPOINT_INTERVAL} \
    > >(tee ${_LOG_PATH}/${WORKLOAD}-out.txt) 2> >(tee ${_LOG_PATH}/${WORKLOAD}-err.txt)
```


Then, we use SimPoint to cluster the collected program behavior and select representative segments.

```bash
%%bash
source 01-env.sh

CLUSTER=${RESULT_PATH}/cluster/${WORKLOAD}
mkdir -p ${CLUSTER}

random1=$(head -20 /dev/urandom | cksum | cut -c 1-6)
random2=$(head -20 /dev/urandom | cksum | cut -c 1-6)

_LOG_PATH=${LOG_PATH}/cluster
mkdir -p ${_LOG_PATH}

# -loadFVFile          load a frequency vector (BBV) file
# -saveSimpoints       file to save the selected simpoints
# -saveSimpointWeights file to save the simpoint weights
# -inputVectorsGzipped the input vectors are gzip-compressed
# -maxK                maximum number of clusters
# -numInitSeeds        number of random initializations per run; only the best clustering is kept
# -iters               maximum number of k-means iterations
# -seedkm              random seed for choosing the initial k-means centers
# -seedproj            random seed for the random linear projection
${SIMPOINT} \
    -loadFVFile ${PROFILING_RESULT_PATH}/${WORKLOAD}/simpoint_bbv.gz \
    -saveSimpoints ${CLUSTER}/simpoints0 \
    -saveSimpointWeights ${CLUSTER}/weights0 \
    -inputVectorsGzipped \
    -maxK 3 \
    -numInitSeeds 2 \
    -iters 1000 \
    -seedkm ${random1} \
    -seedproj ${random2} \
    > >(tee ${_LOG_PATH}/${WORKLOAD}-out.txt) 2> >(tee ${_LOG_PATH}/${WORKLOAD}-err.txt)
```
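The following is a simplified sketch of the selection SimPoint performs. It normalizes each BBV, clusters the vectors with k-means, picks the interval closest to each centroid as that cluster's representative, and weights each representative by the fraction of intervals in its cluster. Real SimPoint additionally applies a random linear projection (`-seedproj`) and tries several values of k up to `-maxK`, choosing one with a BIC-based score. This sketch skips both and uses a fixed k. All function names are our own.

```python
import random


def normalize(v):
    s = sum(v)
    return [x / s for x in v] if s else list(v)


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster becomes empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def pick_simpoints(bbvs, k, seed=0):
    """Return {interval_index: weight}, one representative interval per cluster."""
    points = [normalize(v) for v in bbvs]
    labels, centers = kmeans(points, k, seed=seed)
    result = {}
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        rep = min(members, key=lambda i: dist2(points[i], centers[c]))
        result[rep] = len(members) / len(points)
    return result


# Hypothetical dense BBVs (one column per basic block) for six intervals
# that fall into two program phases.
bbvs = [[10, 0, 0], [9, 1, 0], [10, 0, 0], [0, 5, 5], [0, 4, 6], [0, 5, 5]]
print(pick_simpoints(bbvs, k=2))
```

SimPoint's `simpoints0` and `weights0` files carry the same information in text form: one `<interval index> <cluster id>` line per simpoint, and one `<weight> <cluster id>` line per cluster.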

Finally, we rerun the program on NEMU to generate checkpoints at the selected segments.

```bash
%%bash
source 01-env.sh

CLUSTER=${RESULT_PATH}/cluster
_LOG_PATH=${LOG_PATH}/checkpoint
mkdir -p ${_LOG_PATH}

# -C:             task name is "checkpoint"
# -S:             directory containing the SimPoint results (simpoints0 / weights0)
# --cpt-interval: must match the interval used for profiling
${NEMU} ${GCPT} \
    -w ${WORKLOAD} \
    -D ${RESULT_PATH} \
    -C checkpoint \
    -b \
    -S ${CLUSTER} \
    --cpt-interval ${CHECKPOINT_INTERVAL} \
    > >(tee ${_LOG_PATH}/${WORKLOAD}-out.txt) 2> >(tee ${_LOG_PATH}/${WORKLOAD}-err.txt)
```
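To see where the checkpoints come from, we can compute the position of each selected segment from the cluster files. This sketch assumes the usual SimPoint convention that interval index i starts at instruction i × `CHECKPOINT_INTERVAL`. We have not checked whether NEMU shifts this position, for example to allow for warm-up. The file formats match those parsed by `cluster_weight` in the script later in this section. The function name is hypothetical.

```python
def checkpoint_positions(simpoints_text, weights_text, interval):
    """Return sorted (start_instruction, weight) pairs for each simpoint.

    simpoints0 lines: "<interval index> <cluster id>"
    weights0 lines:   "<weight> <cluster id>"
    """
    weight_of = {}
    for line in weights_text.splitlines():
        if line.strip():
            weight, cluster = line.split()
            weight_of[cluster] = float(weight)
    positions = []
    for line in simpoints_text.splitlines():
        if line.strip():
            index, cluster = line.split()
            positions.append((int(index) * interval, weight_of[cluster]))
    return sorted(positions)


# Hypothetical SimPoint output with CHECKPOINT_INTERVAL = 1_000_000.
simpoints0 = "12 0\n3 1\n40 2\n"
weights0 = "0.5 0\n0.3 1\n0.2 2\n"
print(checkpoint_positions(simpoints0, weights0, 1_000_000))
```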


The generated checkpoint files are in `${RESULT_PATH}/checkpoint/${WORKLOAD}`. There is one `.gz` file per cluster, and each file name records the weight of that checkpoint.
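When collecting results, it can help to read each checkpoint's weight back from its file name. The pattern below is an assumption: file names of the form `_<N>_<weight>_.gz`, which is consistent with the `*_.gz` glob used in the next cell. Check it against the files you actually generated before relying on it. The helper name is hypothetical.

```python
import re
from pathlib import Path

# ASSUMPTION: checkpoint files are named like "_<N>_<weight>_.gz", e.g. "_12_0.5_.gz".
CPT_NAME = re.compile(r"^_(\d+)_([0-9.eE+-]+)_\.gz$")


def checkpoint_weight(path):
    """Extract the weight from a checkpoint file name (under the assumed pattern)."""
    m = CPT_NAME.match(Path(path).name)
    if m is None:
        raise ValueError(f"unexpected checkpoint file name: {path}")
    return float(m.group(2))


print(checkpoint_weight("_12_0.5_.gz"))
```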

Let's run one of the generated checkpoints on emu to see the result.

When emu detects that the input file is a gzip-compressed checkpoint, it decompresses the file automatically and restores the memory and architectural state from it.
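One common way to recognize gzip data is its two-byte magic number `1f 8b`. We have not checked whether emu uses the magic number or the file name. The sketch below, with hypothetical helper names, only illustrates how a loader can accept both raw and gzip-compressed images transparently.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def load_image(path):
    """Return the raw image bytes, decompressing first if the file is gzip."""
    opener = gzip.open if is_gzip(path) else open
    with opener(path, "rb") as f:
        return f.read()


# Demo: the same payload stored raw and gzip-compressed loads identically.
payload = b"\x13\x00\x00\x00" * 4  # arbitrary bytes standing in for an image
with tempfile.TemporaryDirectory() as d:
    raw, packed = os.path.join(d, "image.bin"), os.path.join(d, "image.gz")
    with open(raw, "wb") as f:
        f.write(payload)
    with gzip.open(packed, "wb") as f:
        f.write(payload)
    print(is_gzip(raw), is_gzip(packed))
    assert load_image(raw) == load_image(packed) == payload
```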

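```bash
%%bash
source 01-env.sh

# Pick one of the generated checkpoints
CHECKPOINT=$(find ${RESULT_PATH}/checkpoint/${WORKLOAD} -type f -name "*_.gz" | tail -1)

# -i:           checkpoint to restore from
# --diff:       reference model (NEMU) used for difftest
# --max-cycles: stop after 50000 cycles
# stderr is discarded
$(get_asset emu-precompile/emu) \
    -i ${CHECKPOINT} \
    --diff $(get_asset emu-precompile/riscv64-nemu-interpreter-so) \
    --max-cycles=50000 \
    2>/dev/null
```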


The following Python script generates configuration files for the checkpoints, making it easier to run them in batches on GEM5 or XiangShan.

```python
import os
import re
import json
from itertools import product

app_list = [
    "bwaves", "gamess_cytosine", "gamess_gradient", "gamess_triazolium",
    "milc", "zeusmp", "gromacs", "cactusADM", "leslie3d", "namd", "dealII",
    "soplex_pds-50", "soplex_ref", "povray", "calculix", "GemsFDTD", "tonto",
    "lbm", "wrf", "sphinx3"
]

spec_2017_list = [
    "bwaves_1", "bwaves_2", "bwaves_3", "bwaves_4", "cactuBSSN", "namd",
    "parest", "povray", "lbm", "wrf", "blender", "cam4", "imagick", "nab",
    "fotonik3d", "roms", "perlbench_diff", "perlbench_spam", "perlbench_split",
    "gcc_pp_O2", "gcc_pp_O3", "gcc_ref32_O3", "gcc_ref32_O5", "gcc_small_O3",
    "mcf", "omnetpp", "xalancbmk", "x264_pass1", "x264_pass2", "x264_seek",
    "deepsjeng", "leela", "exchange2", "xz_cld", "xz_combined", "xz_cpu2006"
]

spec2017_int_list = [
    "perlbench_diff", "perlbench_spam", "perlbench_split", "gcc_pp_O2",
    "gcc_pp_O3", "gcc_ref32_O3", "gcc_ref32_O5", "gcc_small_O3", "mcf",
    "omnetpp", "xalancbmk", "x264_pass1", "x264_pass2", "x264_seek",
    "deepsjeng", "leela", "exchange2", "xz_cld", "xz_combined", "xz_cpu2006"
]

spec2017_fp_list = list(set(spec_2017_list) - set(spec2017_int_list))


def profiling_instrs(profiling_log, spec_app, using_new_script=False):
    """Return the total instruction count from the profiling log (0 if not found)."""
    regex = r".*total guest instructions = (.*)\x1b.*"
    new_path = os.path.join(profiling_log, spec_app, "profiling.out.log")
    old_path = os.path.join(profiling_log, "{}-out.txt".format(spec_app))
    path = new_path if using_new_script else old_path

    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if "total guest instructions" in line:
                match = re.findall(regex, line)
                if match:
                    return match[0].replace(',', '')
    return 0


def cluster_weight(cluster_path, spec_app):
    """Map each simpoint (interval index) to the weight of its cluster.

    weights0 lines are "<weight> <cluster id>";
    simpoints0 lines are "<interval index> <cluster id>".
    """
    points = {}
    weights = {}

    weights_path = os.path.join(cluster_path, spec_app, "weights0")
    simpoints_path = os.path.join(cluster_path, spec_app, "simpoints0")

    with open(weights_path, "r") as f:
        for line in f:
            weight, cluster_id = line.split()
            weights[cluster_id] = weight

    with open(simpoints_path, "r") as f:
        for line in f:
            simpoint, cluster_id = line.split()
            points[simpoint] = weights.get(cluster_id)

    return points


def per_checkpoint_generate_json(profiling_log, cluster_path, app_list,
                                 target_path):
    result = {}
    for spec in app_list:
        result[spec] = {
            "insts": profiling_instrs(profiling_log, spec),
            "points": cluster_weight(cluster_path, spec),
        }
    with open(target_path, "w") as f:
        json.dump(result, f)


def per_checkpoint_generate_worklist(cpt_path, target_path):
    cpt_path = cpt_path + "/"
    # One sub-directory per workload...
    checkpoints = [item.path for item in os.scandir(cpt_path) if item.is_dir()]

    # ...each containing one entry per checkpoint
    checkpoint_dirs = []
    for item in checkpoints:
        for entry in os.scandir(item):
            checkpoint_dirs.append(entry.path)

    with open(target_path, "w") as f:
        for i in checkpoint_dirs:
            path = i.replace(cpt_path, "")
            name = path.replace('/', "_", 1)
            print("{} {} 0 0 20 20".format(name, path), file=f)


def generate_result_list(base_path, times, ids):
    result_list = []

    # One entry per (i, j, k) combination; with times=[1, 1, 1] and
    # ids=[0, 0, 0] there is exactly one.
    for _ in product(range(ids[0], times[0]), range(ids[1], times[1]),
                     range(ids[2], times[2])):
        cluster = "cluster"
        profiling = "profiling"
        checkpoint = "checkpoint"
        result_list.append({
            "cl_res": os.path.join(base_path, "result", cluster),
            "profiling_log": os.path.join(base_path, "logs", profiling),
            "checkpoint_path": os.path.join(base_path, "result", checkpoint),
            "json_path": os.path.join(base_path, "result", checkpoint, f"{cluster}.json"),
            "list_path": os.path.join(base_path, "result", checkpoint, "checkpoint.lst"),
        })

    print("Result list:")
    print(json.dumps(result_list, indent=2, separators=(",", ": ")))
    return result_list


def dump_result(base_path, spec_app_list, times, ids):
    result_list = generate_result_list(base_path, times, ids)

    for result in result_list:
        per_checkpoint_generate_json(result["profiling_log"], result["cl_res"],
                                     spec_app_list, result["json_path"])
        per_checkpoint_generate_worklist(result["checkpoint_path"],
                                         result["list_path"])


# NOTE: must match the settings in 01-env.sh (WORKLOAD and paths)
spec_list = ["stream_100000"]
base_path = os.path.join(os.getcwd(), "..", "work", "03-performance", "01-checkpoint")
times = [1, 1, 1]
ids = [0, 0, 0]

dump_result(base_path, spec_list, times, ids)
```

The results are written to the `${RESULT_PATH}/checkpoint` directory.

```bash
%%bash
source 01-env.sh

# List file for GEM5
cat ${RESULT_PATH}/checkpoint/checkpoint.lst
echo

# JSON file for XiangShan
cat ${RESULT_PATH}/checkpoint/cluster.json
```
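A downstream script might load `cluster.json` as shown in the sketch below. The structure follows the generator script above, with `insts` and per-simpoint weights both stored as strings. The function name and the weight-sum sanity check are our own additions. This is a sketch, not XiangShan's actual loader.

```python
import json


def load_cluster_json(text):
    """Parse cluster.json into {app: (total_insts, {simpoint_index: weight})}."""
    data = json.loads(text)
    parsed = {}
    for app, info in data.items():
        # Keys and weights are strings in the generated file; convert them.
        points = {int(idx): float(w) for idx, w in info["points"].items()}
        total = sum(points.values())
        # The weights of one program's simpoints should add up to about 1.
        if abs(total - 1.0) > 1e-3:
            raise ValueError(f"{app}: weights sum to {total}, expected about 1")
        parsed[app] = (int(info["insts"]), points)
    return parsed


# Hypothetical file contents (values are illustrative only).
sample = '{"stream_100000": {"insts": "10000000", "points": {"3": "0.6", "7": "0.4"}}}'
print(load_cluster_json(sample))
```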

Appendix: the file structure of a checkpoint is shown below (low addresses at the top):

![structure](../images/03-performance/01-checkpoint/structure-en.png)