<h1 style="text-align:center">Basic Introduction for XiangShan</h1>

# 00-Welcome to the XiangShan Tutorial

In this section, we will give some notes on this tutorial.

- Cells that start with %%bash are Bash scripts; the rest are Python code.

- Lines that start with # are comments.

You can click ▶ in the top-left corner of a cell to run that single cell; the output will be displayed below the cell.

In [None]:
%%bash
echo "Welcome to the XiangShan Tutorial!"

Each cell has its own working directory and environment variables, so some commands may need to be rerun. If you execute these commands directly in the shell, you can skip the repetitive parts, e.g. `source env.sh`.

In [None]:
%%bash
# Change the working directory in this cell.
cd ../
pwd

In [None]:
%%bash
# Changing the working directory in other cells does not affect this cell.
pwd
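The isolation shown in the two cells above can also be reproduced from a Python cell. The sketch below is only an illustration: `run_cell` is a hypothetical helper (not part of the tutorial environment) that starts a fresh `sh` process per "cell", and a temporary directory stands in for the notebook directory.

```python
import os
import subprocess
import tempfile


def run_cell(script: str, cwd: str) -> str:
    """Run `script` in a brand-new shell started in `cwd`; return its stdout."""
    result = subprocess.run(
        ["sh", "-c", script],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Stand-in for the directory the notebook lives in (an assumption for the demo).
notebook_dir = os.path.realpath(tempfile.mkdtemp())

first = run_cell("cd / && pwd -P", notebook_dir)  # this shell moves to /
second = run_cell("pwd -P", notebook_dir)         # a new shell starts over

print(first)
print(second == notebook_dir)
```

The second call is unaffected by the `cd` in the first one, because the directory change lives and dies with the shell process that made it.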

XiangShan has design documentation synchronized with development; the GitHub repository is [https://github.com/OpenXiangShan/XiangShan-Design-Doc](https://github.com/OpenXiangShan/XiangShan-Design-Doc)

We have also deployed the design documentation website at [https://docs.xiangshan.cc/projects/design](https://docs.xiangshan.cc/projects/design)


# First Run

In this section, we present the basic workflow for building and running XiangShan.

The bootcamp repository contains the environment setup scripts necessary for compiling and running XiangShan and can be cloned directly from GitHub.

In [None]:
%%bash
# For this tutorial, the local directory has been preconfigured; therefore, you do not need to execute these commands.
# The following commands are provided for reference.

# git submodule update --init --recursive # init submodule

Now we can get started!

The build and execution of XiangShan rely on specific environment variables, which are provisioned by the `env.sh` script. This script must be sourced whenever a new terminal session is started; to automate this, you can add it to your `.bashrc`. As shown in Section 00-welcome, within this tutorial each cell constitutes a fresh Bash environment; therefore, the script must be re-sourced in every cell.

In [None]:
%%bash
cd ../ && source env.sh

env | grep _HOME
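If a later step fails with a confusing path error, a quick check of the variables is often the first thing to try. The following is a minimal sketch, assuming that `NOOP_HOME`, `NEMU_HOME`, and `AM_HOME` (the variables used by the cells in this tutorial) are the ones that matter; `missing_vars` is a hypothetical helper name. Note that a Python cell only sees these variables if the notebook server itself was started from a shell that had sourced `env.sh`.

```python
import os

# Variables referenced by the cells in this tutorial (assumed to be the key ones).
REQUIRED_VARS = ("NOOP_HOME", "NEMU_HOME", "AM_HOME")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_vars(os.environ)
if missing:
    print("Not set (did you source env.sh?):", ", ".join(missing))
else:
    print("Environment looks ready.")
```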

Running the code block above completes the environment variable setup. After the setup, go to `$NOOP_HOME` (`xs-env/XiangShan`) to build XiangShan. 


The build parameters will be introduced later.

We can also use the tree command to view the project structure.

In [None]:
%%bash
tree -d -L 1 ..

XiangShan provides hundreds of user-configurable parameters, including:

- The processor core parameters. (`src/main/scala/top/Configs.scala`)
- The SoC parameters. (`src/main/resources/config/Default.yaml`)

Press Ctrl+P to open file search, then type the file name above to jump to it quickly.

With the configuration finalized, we can proceed to build XiangShan!

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

# Warning ⚠️: Building XiangShan is highly resource-intensive. For this tutorial, we’ve prepared a precompiled binary for you.
# Reference setup: 16 CPU cores, 64 GB RAM.
# make emu -j16 CONFIG=MinimalConfig

# Additional build options
# CONFIG=MinimalConfig  XiangShan configuration
# EMU_THREADS=4         Simulation thread count
# EMU_TRACE=1           Enable waveforms
# WITH_DRAMSIM=1        Simulate DRAM with DRAMSim3
# WITH_CHISELDB=1       Enable ChiselDB
# WITH_CONSTANTIN=1     Enable Constantin
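When several options are combined, it can help to assemble the command programmatically. The sketch below is an illustration only: `make_emu_command` is a hypothetical helper, and it simply formats the option names listed in the cell above as `NAME=value` arguments (with no spaces around `=`).

```python
def make_emu_command(config="MinimalConfig", jobs=16, **options):
    """Build the argument list for `make emu` from NAME=value build options."""
    argv = ["make", "emu", f"-j{jobs}", f"CONFIG={config}"]
    for name, value in options.items():
        # Each option becomes one argument, e.g. "EMU_TRACE=1".
        argv.append(f"{name}={value}")
    return argv


cmd = make_emu_command(EMU_THREADS=4, EMU_TRACE=1, WITH_CHISELDB=1)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` from `$NOOP_HOME`; keep in mind the resource warning above before actually launching a build.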

The commands above will generate outputs like `build/emu` and `build/rtl`.

- `build/rtl/*.sv` are the Verilog files generated by Chisel.
- `build/emu` is the simulation executable, compiled from that RTL with Verilator.

You can run `./build/emu` to simulate XiangShan. 

Since we haven’t built emu in this tutorial, we’ll use the precompiled emu.

We’ll introduce the run-time arguments later.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

$(get_asset emu-precompile/emu) \
    -i $(get_asset workload/hello-riscv64-xs.bin) \
    --no-diff \
    2>/dev/null | outputBuffer

# Some key runtime parameters.
# -i                        Workload path
# -C / -I                   Maximum cycle count / Maximum instruction count
# --diff=PATH / --no-diff   Reference model path / Disable difftest

Note: XiangShan prints performance counters to stderr at the end of a run, and the output is very large. We recommend always redirecting stderr to a file. In the example above, since we don’t need the counters, we redirect it to `/dev/null`; adjust as needed.
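As a rough sketch of the advice above, the helper below composes an emu command line that always sends stderr to a file. It uses only the flags shown in the cell (`-i`, `-C`, `--diff`, `--no-diff`); the function name `emu_command` and the default file name `emu.err` are assumptions made for illustration.

```python
import shlex


def emu_command(emu, workload, ref_so=None, max_cycles=None, stderr_path="emu.err"):
    """Compose a shell command line for emu with stderr redirected to a file."""
    argv = [emu, "-i", workload]
    if ref_so is None:
        argv.append("--no-diff")         # run without a reference model
    else:
        argv += ["--diff", ref_so]       # path to the reference model
    if max_cycles is not None:
        argv += ["-C", str(max_cycles)]  # cap the simulated cycle count
    return shlex.join(argv) + " 2>" + shlex.quote(stderr_path)


print(emu_command("./build/emu", "hello.bin", max_cycles=20000))
```

Writing the counters to a file instead of `/dev/null` keeps them available for later performance analysis without flooding the notebook output.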

# Build the workload using Nexus-AM


XiangShan is a bare-metal machine: programs running on it need a runtime environment, which is normally provided by an operating system. However, operating systems like Linux are too **heavyweight** and inconvenient for rapid testing and iteration.

**We present a bare metal runtime environment called Nexus-AM**

**Purpose**

- Generate workloads agilely without an OS

- Provide runtime framework for bare metal machines like XiangShan

Nexus-AM is a bare-metal runtime and test-generation environment. It is lightweight and easy to use, implements basic system call interfaces and exception handlers, and supports multiple ISAs and configurations. 

The `am/` directory contains the Nexus-AM framework sources; `apps/` and `tests/` hold common workload sources, and you can create your own apps and tests.



In [None]:
%%bash
cd .. && source env.sh
cd ${AM_HOME}

tree -d -L 1

echo apps: $(ls ./apps)
echo tests: $(ls ./tests)

We start with the "Hello, XiangShan" sample (`apps/hello`) and replace "Hello, XiangShan" with "Welcome to XiangShan Tutorial".

Then we compile it with Nexus-AM.

In [None]:
%%bash
cd ../ && source env.sh >/dev/null
cd $AM_HOME/apps/hello

# Use sed to replace "Hello, XiangShan" with "Welcome to XiangShan Tutorial".
sed -i 's/Hello, XiangShan/Welcome to XiangShan Tutorial/' hello.c

# compiling
make ARCH=riscv64-xs LINUX_GNU_TOOLCHAIN=1

# check output
ls -l build

After compilation, the following three files will be generated:
- `hello-riscv64-xs.bin`: the program's binary image for emu (the ELF header and other metadata are removed).
- `hello-riscv64-xs.elf`: the program's ELF file.
- `hello-riscv64-xs.txt`: the program's disassembly, for debugging.
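One practical way to tell the `.elf` and `.bin` outputs apart is the ELF magic number: an ELF file starts with the four bytes `0x7f 'E' 'L' 'F'`, whereas a raw image starts directly with program contents. The sketch below demonstrates this on two small stand-in files created in a temporary directory (they are not real build outputs); `is_elf` is a hypothetical helper name.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if the file starts with the 4-byte ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


# Stand-in files; the real ones would be hello-riscv64-xs.elf and .bin.
workdir = tempfile.mkdtemp()
fake_elf = os.path.join(workdir, "demo.elf")
fake_bin = os.path.join(workdir, "demo.bin")
with open(fake_elf, "wb") as f:
    f.write(ELF_MAGIC + bytes(12))         # magic followed by padding
with open(fake_bin, "wb") as f:
    f.write(bytes.fromhex("97020000"))     # arbitrary bytes standing in for raw code

print(is_elf(fake_elf), is_elf(fake_bin))
```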

In [None]:
%%bash
cd ../ && source env.sh >/dev/null
cd $NOOP_HOME

# Use emu to run workload.
$(get_asset emu-precompile/emu) -i $AM_HOME/apps/hello/build/hello-riscv64-xs.bin --no-diff 2>/dev/null | outputBuffer

# Run the RTL simulation

XiangShan's emu supports many options; run `emu --help` to see the usage.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

$(get_asset emu-precompile/emu) --help | outputBuffer

Unlike the 01-first-run section, in this section we'll run a more complex program: CoreMark (2 iterations). The binary is already prepared in the `Xiangshan/ready-to-run` folder.

Running the full CoreMark can take 5–10 minutes. To save time, use the -C option to cap the simulation at 20,000 cycles.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

$(get_asset emu-precompile/emu) \
    -i $(get_asset workload/coremark-2-iteration.bin) \
    --no-diff \
    -C 20000 \
    2>/dev/null | outputBuffer

We've also prepared a fault-injection simulation program for XiangShan; feel free to try running it.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

# err 1
$(get_asset emu-precompile/emu-alu-err) \
    -i $(get_asset workload/coremark-2-iteration.bin) \
    --no-diff \
    -C 20000 \
    2>/dev/null | outputBuffer || true

The faulty simulation program does not correctly print “Running CoreMark for 2 iterations,” and when it reaches the 20,000-cycle limit, the PC is 0xE, clearly not the behavior of a normal program.



<span style="color:red; font-weight:700; font-size:24px;">
Great! 

We have learned the basic simulation process of XiangShan.
</span>

<span style="color:black; font-weight:500; font-size:24px;">
    
Once we get the RTL simulator ready, what's next? 

How can we 

- find and fix functional bugs
- do performance analysis
- do research on micro-architecture


</span>



![MinJie Cycle](../images/02-functional/00-intro/overal-en.png)

As mentioned before, we use the MinJie platform to construct the workflow; the picture shows its functional verification toolchain.

The functional verification loop usually includes four parts, and we provide tools for each step:

- Test generation (the head of the loop): we developed Nexus-AM to generate bare-metal tests.
- Bug detection: NEMU provides the golden result, and DiffTest compares the results.
- Preserving the bug context: we will demonstrate the usage of LightSSS.
- Troubleshooting and bug fixing (the tail of the loop): we use waveforms and ChiselDB.


# NEMU: ISA Reference

To address the issue “How does the simulation program know it has already encountered an error?”, we must first define what “correct” means; in other words, we need a reference model.

We developed NEMU, a Spike-like ISA simulator. With targeted optimizations, NEMU achieves QEMU-class performance and exposes APIs to compare and verify XiangShan's architectural state.

In this section, we demonstrate how to compile and run NEMU.

NEMU provides two default configurations:
- `xxx_defconfig`: the default configuration of `xxx` for standalone run mode
- `xxx-ref_defconfig`: the default configuration of `xxx` for DiffTest co-simulation (reference) mode



In [None]:
%%bash
cd .. && source env.sh
cd ${NEMU_HOME}

make clean

# compile default config as standalone mode
make riscv64-xs_defconfig
make -j

make clean-softfloat

# compile default config as reference mode
make riscv64-xs-ref_defconfig
make -j

Execute CoreMark on NEMU.

Use the -b option to start NEMU in batch mode and avoid manually entering commands to run the workload.

In [None]:
%%bash
cd .. && source env.sh
cd ${NEMU_HOME}

./build/riscv64-nemu-interpreter \
    -b \
    $(get_asset workload/coremark-2-iteration.bin) | outputBuffer

# DiffTest: ISA Co-simulation Framework

To pinpoint when the simulation goes wrong, we introduce DiffTest, an ISA co-simulation framework. The flow is as follows: whenever the RTL core (DUT) commits an instruction or updates state, the ISA simulator (REF) executes the same instruction, and DiffTest compares the architectural state of the DUT and the REF. On any mismatch it halts and reports an error; otherwise it continues.

<div align="center">
  <img src="../images/02-functional/02-difftest/difftest-arch-en.png" alt="difftest-arch">
</div>
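The compare-after-every-commit flow can be illustrated with a toy model. This is a conceptual sketch only, not DiffTest's actual implementation or API: the architectural state is reduced to a small dictionary, `ref_step` is a made-up reference model, and the injected value mimics a register mismatch of the kind DiffTest reports.

```python
def first_mismatch(dut_state, ref_state):
    """Return (register, dut_value, ref_value) for the first difference, or None."""
    for reg in sorted(ref_state):
        if dut_state.get(reg) != ref_state[reg]:
            return reg, dut_state.get(reg), ref_state[reg]
    return None


def cosimulate(dut_commits, ref_step):
    """Step the REF once per DUT commit and stop at the first mismatch."""
    for index, dut_state in enumerate(dut_commits):
        ref_state = ref_step(index)
        diff = first_mismatch(dut_state, ref_state)
        if diff is not None:
            return index, diff  # halt and report, like DiffTest does
    return None                 # every commit matched


def ref_step(index):
    """Toy reference model: the PC advances by 4 and a0 stays 0."""
    return {"pc": 0x80000000 + 4 * index, "a0": 0}


# Toy DUT trace that is correct except for a corrupted a0 at commit 3.
dut_commits = [ref_step(i) for i in range(5)]
dut_commits[3]["a0"] = 0x2000

print(cosimulate(dut_commits, ref_step))
```

Because the comparison happens at every commit, the error is reported at the first divergent instruction rather than many cycles later when the program visibly misbehaves.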


Run the workload on XiangShan and use NEMU for Difftest.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

$(get_asset emu-precompile/emu) \
    -i $(get_asset workload/hello-riscv64-xs.bin) \
    --diff ${NEMU_HOME}/build/riscv64-nemu-interpreter-so \
    2>/dev/null | outputBuffer

You can run workloads on the prebuilt XiangShan processor with injected bugs and use NEMU for Difftest.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

$(get_asset emu-precompile/emu-alu-err) \
    -i $(get_asset workload/hello-riscv64-xs.bin) \
    --diff ${NEMU_HOME}/build/riscv64-nemu-interpreter-so \
    2>/dev/null | tee emu_err.log > /dev/null || true # Tutorial only: "|| true" avoids notebook errors; it is not needed in real usage.

echo "DiffTest directly points out the mismatching registers and PC"

tail -n 7 emu_err.log


At PC 0x0080000078, the REF and the DUT do not match: a0 is 0 in the REF, but 0x2000 in the DUT.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

echo "----------------------------------------------------------------------------------------------"
echo "DiffTest dumps the register state, including integer/float/CSR registers"

tail -n 95 emu_err.log | head -n 19


In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

echo "-----------------------------------------------------------------------------------------------"
echo "DiffTest shows which commit group the error occurs in"

tail -n 124 emu_err.log | head -n 11

After Difftest detects an error, we can rerun the simulation and enable waveform output around the failing cycle reported by Difftest.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

mkdir -p build
rm -f ./build/*.vcd

$(get_asset emu-precompile/emu-alu-err) \
    -i $(get_asset workload/hello-riscv64-xs.bin) \
    --diff ${NEMU_HOME}/build/riscv64-nemu-interpreter-so \
    -b 8000 \
    -e 10000 \
    --dump-wave \
    >/dev/null 2>&1 || true

echo -n "Dump wave: "
realpath ./build/*.vcd
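Choosing the `-b` and `-e` values is simple arithmetic around the failing cycle. The sketch below shows one way to derive them; the helper names, the 2000-cycle margin, and the example failing cycle of 10000 are assumptions for illustration, chosen so that the result matches the window used in the cell above.

```python
def wave_window(fail_cycle, before=2000, after=0):
    """Return (begin, end) cycles for waveform dumping around `fail_cycle`."""
    begin = max(0, fail_cycle - before)  # never start before cycle 0
    end = fail_cycle + after
    return begin, end


def wave_flags(fail_cycle, before=2000, after=0):
    """Format the window as the -b/-e/--dump-wave arguments used above."""
    begin, end = wave_window(fail_cycle, before, after)
    return ["-b", str(begin), "-e", str(end), "--dump-wave"]


print(wave_flags(10000))
```

Keeping the window narrow matters because waveform files grow quickly with the number of dumped cycles.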

![LightSSS](../images/02-functional/03-lightSSS/lightSSS-overall-en.png)

Re-running a simulation to reproduce a bug is time-consuming, especially for long workloads such as SPEC CPU. Snapshots are the way out, so we built LightSSS, a lightweight simulation snapshot mechanism.

During simulation, LightSSS records snapshots of the simulation process with the `fork()` function.

When a bug is detected, a snapshot process is woken up and generates the waveform for the cycles between the latest snapshot taken before the bug and the bug itself.

LightSSS scales well: it can snapshot any external model (such as a model written in C++) without needing to understand the model's details.

The overhead of taking a snapshot is also low, only about 500 microseconds, which is far less than the overhead of RTL snapshots from Verilator.
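The fork-based idea can be sketched in a few lines of Python on a Unix-like system. This is a toy model under stated assumptions, not the LightSSS implementation: the "simulation" is just a cycle counter, only one snapshot is kept at a time, the bug cycle is passed in directly instead of being re-detected, and the "waveform" is the list of replayed cycle numbers.

```python
import os
import signal


def drop(snapshot):
    """Discard a sleeping snapshot process that is no longer needed."""
    pid, wake_w, res_r = snapshot
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    os.close(wake_w)
    os.close(res_r)


def take_snapshot(cycle, bug_cycle):
    """Fork a child that sleeps until woken, then replays from `cycle`."""
    wake_r, wake_w = os.pipe()
    res_r, res_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: a frozen copy of the simulation at `cycle`
        try:
            os.close(wake_w)
            os.close(res_r)
            if os.read(wake_r, 1):  # blocks until the parent reports a bug
                # Replay with "waveform" recording enabled up to the bug.
                trace = list(range(cycle, bug_cycle + 1))
                os.write(res_w, ",".join(map(str, trace)).encode())
        finally:
            os._exit(0)
    os.close(wake_r)
    os.close(res_w)
    return pid, wake_w, res_r


def simulate(total_cycles, interval, bug_cycle):
    """Run the toy simulation; return the cycles replayed by the snapshot."""
    snapshot = None
    for cycle in range(total_cycles):
        if cycle % interval == 0:
            if snapshot is not None:
                drop(snapshot)  # keep only the latest snapshot
            snapshot = take_snapshot(cycle, bug_cycle)
        if cycle == bug_cycle:  # the "bug" is detected here
            pid, wake_w, res_r = snapshot
            os.write(wake_w, b"w")  # wake the sleeping snapshot
            chunks = []
            while chunk := os.read(res_r, 4096):
                chunks.append(chunk)
            os.waitpid(pid, 0)
            os.close(wake_w)
            os.close(res_r)
            return [int(c) for c in b"".join(chunks).decode().split(",")]
    if snapshot is not None:
        drop(snapshot)
    return []


replayed = simulate(total_cycles=100, interval=10, bug_cycle=37)
print(replayed[0], replayed[-1])
```

The parent never serializes any simulator state: `fork()` copies the whole process, which is why the same trick works for external models without knowing their internals.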


If you see "the oldest checkpoint start to dump wave and dump nemu log...", LightSSS is active. The simulation will then restart from the latest snapshot and record waveforms.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

mkdir -p build
rm -f ./build/*.vcd

$(get_asset emu-precompile/emu-alu-err) \
    -i $(get_asset workload/hello-riscv64-xs.bin) \
    --diff ${NEMU_HOME}/build/riscv64-nemu-interpreter-so \
    --enable-fork \
    2> /dev/null | outputBuffer || true

echo -n "Dump wave: "
realpath ./build/*.vcd

# ChiselDB: Debug-friendly Structured Database

<!-- <div align="center">
  <img src="../images/02-functional/chiselDB-overview.png" alt="chiselDB-overview" style="width: auto; height: 50%;">
</div> -->

<img src="../images/02-functional/04-chiseldb/chiseldb-en.png" alt="chiselDB-overview" style="float:right; width:500px; margin-left:5px;">

**Motivation**

- Waveforms are large and hard to analyze further

- We need to analyze structured data such as memory transaction traces

**We propose ChiselDB for storage of structured data to support faster bug localization.**

**Highlights**

- Inserts probes between module interfaces in hardware

- DPI-C: calls C++ functions from Chisel code to transfer data

- Persists data in a database; SQL queries are supported
<div style="clear:both;"></div>


LightSSS is powerful, but the waveforms it produces are still large and hard to analyze further, and we often want to analyze structured data such as memory transaction traces. ChiselDB addresses this: it inserts probes between module interfaces in hardware and uses DPI-C directly in Chisel code to transfer Bundle information and data. Because the recorded data can be queried with SQL, it is much easier to work with than waveforms.


ChiselDB is a structured database for aiding functional and performance debugging. It uses a DPI-C interface to log Chisel Bundle data into an SQLite database.

Compared to Verilog wires, Chisel Bundles carry higher-level semantics: each Bundle is essentially a group of multiple wires. You can then access this structured data with SQL queries for analysis.

We provide a prebuilt simulator `emu-cdb-err` with an injected bug that forces all data released from L2 Cache to L3 Cache to a constant value.

Enable ChiselDB with `--dump-db` and turn on DiffTest; after running, DiffTest reports an error and a `.db` file is generated under `./build`.

In [None]:
%%bash
cd .. && source env.sh
cd ${NOOP_HOME}

rm -f ./build/*.db # clean old files
mkdir -p build

$(get_asset emu-precompile/emu-cdb-err) \
    -i $(get_asset workload/stream_100000.bin) \
    --diff $(get_asset emu-precompile/riscv64-nemu-interpreter-so) \
    --dump-db \
    2>linux.err || true

echo -n "Dump DB: "
realpath ./build/*.db

Then use SQLite to read the `.db` file for analysis: query all TileLink transactions at address `0x80048f00`, and format the output with `./scripts/cache/convert_tllog.sh`.

In [None]:
%%bash
cd .. && source env.sh

DB=$(ls -t ${NOOP_HOME}/build/*db | head -n 1)

sqlite3 ${DB} "select * from TLLog where ADDRESS=0x80048f00" | sh ${NOOP_HOME}/scripts/cache/convert_tllog.sh | outputBuffer

Result: [Time/To_From/Channel/Opcode/Permission/Address/Data]

**Data successfully transferred from L1D to L2**

16171 L2_L1D_0 C ProbeAckData Shrink TtoN 0 5 80048f00 0000000080048f50 0000000080014328 0000000000000000 0000000000000000 user: 0 echo: 0 

16172 L2_L1D_0 C ProbeAckData Shrink TtoN 0 5 80048f00 0000000000000000 0000000000000000 000000008001e000 0000000080042060 user: 0 echo: 0 



**Data successfully transferred from L2 to L3**

16179 L3_L2_0 C ProbeAckData Shrink TtoN 0 2 80048f00 0000000080048f50 0000000080014328 0000000000000000 0000000000000000 user: 0 echo: 1 

16180 L3_L2_0 C ProbeAckData Shrink TtoN 0 2 80048f00 0000000000000000 0000000000000000 000000008001e000 0000000080042060 user: 0 echo: 1

**But when L1D acquires the error address again, the data loaded from L3 is wrong**

16457 L2_L1D_0 A AcquireBlock Grow NtoT 0 0 80048f00 0000000000000000 0000000000000000 0000000000000000 0000000000000000 user: 80048f07 echo: 0 

16463 L3_L2_0 A AcquireBlock Grow NtoT 0 0 80048f00 0000000000000000 0000000000000000 0000000000000000 0000000000000000 user: 0 echo: 1 

16486 L3_L2_0 D GrantData Cap toT 1 0 80048f00 **0000000000abcdef** 0000000000000000 0000000000000000 0000000000000000 user: 0 echo: 1 

**So something must go wrong when L3 records the released data**
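The manual comparison above can also be scripted with Python's built-in `sqlite3` module. The sketch below runs against an in-memory stand-in rather than the real `.db` file: only the table name `TLLog` and the `ADDRESS` column come from the query above, while the reduced schema (`STAMP`, `SITE`, `CHANNEL`, `OPCODE`, `DATA`) is an assumption for illustration, and only the first 64-bit word of each transaction is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical schema; the real TLLog table has more columns.
conn.execute(
    "CREATE TABLE TLLog (STAMP INTEGER, SITE TEXT, CHANNEL TEXT,"
    " OPCODE TEXT, ADDRESS INTEGER, DATA TEXT)"
)
# First data word of three transactions taken from the log excerpt above.
rows = [
    (16179, "L3_L2_0", "C", "ProbeAckData", 0x80048F00, "0000000080048f50"),
    (16463, "L3_L2_0", "A", "AcquireBlock", 0x80048F00, "0000000000000000"),
    (16486, "L3_L2_0", "D", "GrantData", 0x80048F00, "0000000000abcdef"),
]
conn.executemany("INSERT INTO TLLog VALUES (?, ?, ?, ?, ?, ?)", rows)


def last_data(opcode, address):
    """Return the DATA of the latest `opcode` transaction at `address`."""
    row = conn.execute(
        "SELECT DATA FROM TLLog WHERE ADDRESS = ? AND OPCODE = ?"
        " ORDER BY STAMP DESC LIMIT 1",
        (address, opcode),
    ).fetchone()
    return row[0] if row else None


written = last_data("ProbeAckData", 0x80048F00)
granted = last_data("GrantData", 0x80048F00)
print("written to L3:", written)
print("granted by L3:", granted)
print("mismatch:", written != granted)
```

This kind of query is where a structured database pays off: the same check on a waveform would mean locating and decoding the relevant signals by hand.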


![TL-TEST](../images/02-functional/05-tl_test/tl-test-overall-en.png)


Co-verification of the Cache system with upstream modules is complex and prevents rapid iteration.

To address this issue, we developed TL-Test: a unit-level cache-system verification framework that supports the TileLink protocol, cache-coherence checking, and randomized test-case generation.

Here is another example, in which TL-Test detects a cache coherence violation. We inject a bug that wrongly shifts the grant data.


In [None]:
%%bash
cd ../ && source env.sh

cat $(get_asset tltest-precompile/tlt_err.patch) | outputBuffer

TL-Test generates randomized tests and pinpoints a transfer problem at a specific address in our cache design. It logs all bus transactions; we then use grep to extract the relevant log entries for analysis.

You can run the demo on the prebuilt tl-test.

In [None]:
%%bash
cd ../ && source env.sh
# cd $TLT_HOME && make coupledL2-test-l2l3-v3 run THREADS_BUILD=16 CXX_COMPILER=clang++-17
# cd $TLT_HOME/run && ./tltest_v3lt 2>&1 | tee tltest_v3lt.log

get_asset tltest-precompile/tltest_err

cd assets/tltest-precompile && $(get_asset tltest-precompile/tltest_err) 2>&1 | tee tltest_v3lt.log > /dev/null

echo "run complete!"

tail -n 50 tltest_v3lt.log | head -n 15

Error address: 0x80

In [None]:
%%bash
cd ../ && source env.sh > /dev/null
# grep "addr: 0x80" $TLT_HOME/run/tltest_v3lt.log

cd assets/tltest-precompile && grep "addr: 0x80," tltest_v3lt.log | head -n 10

Result: [Time/INFO-Level/Node-Idx/Core/Channel/Opcode/Source/Address/alias/Data]

**L1D acquires the error address**

[236] [tl-test-new-INFO] #0 L2[0].C[0] [fire A] [AcquirePerm NtoT] source: 0x3, addr: 0x80, alias: 0

**L1D releases the error address, and the data is successfully transferred from L1D to L2**

[806] [tl-test-new-INFO] #0 L2[0].C[0] [fire C] [ReleaseData TtoN] source: 0x3, addr: 0x80, alias: 0, data: [ c7 a5 ... ]

[808] [tl-test-new-INFO] #0 L2[0].C[0] [fire C] [ReleaseData TtoN] source: 0x3, addr: 0x80, alias: 0, data: [ fe 14 ... ]

**But when L2 grants the data of the error address, the data loaded from L2 is wrong**

[2036] [tl-test-new-INFO] #0 L2[0].C[0] [fire D] [GrantData toT] source: 0xf, addr: 0x80, alias: 0x1, data: [ 00 c7 ... ]

[2038] [tl-test-new-INFO] #0 L2[0].C[0] [fire D] [GrantData toT] source: 0xf, addr: 0x80, alias: 0x1, data: [ 00 fe ... ]

**So something must go wrong when L2 grants the data!**
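The same reading of the log can be done with a short script. The sketch below parses the five lines quoted above with a regular expression written for exactly this line shape (it is an assumption that other TL-Test log lines look the same), then checks whether each granted beat looks like the released beat shifted by one byte. Only the bytes visible in the excerpt are compared.

```python
import re

LOG = """\
[236] [tl-test-new-INFO] #0 L2[0].C[0] [fire A] [AcquirePerm NtoT] source: 0x3, addr: 0x80, alias: 0
[806] [tl-test-new-INFO] #0 L2[0].C[0] [fire C] [ReleaseData TtoN] source: 0x3, addr: 0x80, alias: 0, data: [ c7 a5 ... ]
[808] [tl-test-new-INFO] #0 L2[0].C[0] [fire C] [ReleaseData TtoN] source: 0x3, addr: 0x80, alias: 0, data: [ fe 14 ... ]
[2036] [tl-test-new-INFO] #0 L2[0].C[0] [fire D] [GrantData toT] source: 0xf, addr: 0x80, alias: 0x1, data: [ 00 c7 ... ]
[2038] [tl-test-new-INFO] #0 L2[0].C[0] [fire D] [GrantData toT] source: 0xf, addr: 0x80, alias: 0x1, data: [ 00 fe ... ]
"""

LINE_RE = re.compile(
    r"\[(?P<time>\d+)\] \[tl-test-new-INFO\] .*?"
    r"\[fire (?P<channel>[A-E])\] \[(?P<opcode>\w+) (?P<param>\w+)\] "
    r"source: (?P<source>0x[0-9a-f]+), addr: (?P<addr>0x[0-9a-f]+), "
    r"alias: \w+(?:, data: \[(?P<data>[^\]]*)\])?"
)


def parse(line):
    """Turn one log line into a dict, or None if it has a different shape."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = int(rec["time"])
    # Keep only the bytes that are printed; "..." marks elided bytes.
    rec["data"] = [b for b in (rec["data"] or "").split() if b != "..."]
    return rec


def shifted_by_one_byte(sent, received):
    """True if `received` looks like `sent` moved one byte to the right."""
    return received != sent and received[1:] == sent[:-1]


records = [r for r in map(parse, LOG.splitlines()) if r]
released = [r["data"] for r in records if r["opcode"] == "ReleaseData"]
granted = [r["data"] for r in records if r["opcode"] == "GrantData"]

for sent, received in zip(released, granted):
    print(sent, "->", received, "shifted:", shifted_by_one_byte(sent, received))
```

Both beats show the released bytes reappearing one position later in the granted data, which is consistent with the injected shift bug.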