This repository builds a benchmark and a library dataset from a selection of the open-source PULP platform's SystemVerilog designs, with two goals:
- Use the generated benchmark to assess LLM-based RTL design and verification capabilities.
- Use the generated library as a retrieval database for LLM-based hardware design agents, so that they can consult and reuse modules much as human designers reference existing third-party IPs.
The reference RTL designs and testbenches are single-source, self-contained files derived from the original PULP codebase, generated using the open-source tools bender and morty. Specification prompts are generated by an LLM.
This repository is inspired by NVIDIA's verilog-eval.
- Bender >= 0.27.1
- Morty >= 0.9.0
- Python >= 3.11
Additionally, to use a cloud LLM provider's API, set "OPENAI_API_KEY", "ANTHROPIC_API_KEY", or the corresponding key for your provider as an environment variable, or create a key.cfg file in the following format:
OPENAI_API_KEY= 'xxxxxxx'
ANTHROPIC_API_KEY= 'xxxxxxx'
VERTEX_SERVICE_ACCOUNT_PATH= 'xxxxxxx'
VERTEX_REGION= 'xxxxxxx'
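As an illustration of the key.cfg format above, the following minimal sketch parses such a file into a dictionary and exports the values as environment variables. It is not the repository's own loader, whose behavior is not documented here; the function names `load_key_cfg` and `export_keys` are hypothetical, and the handling of quotes, blank lines, and `#` comments is an assumption.

```python
import os


def load_key_cfg(text):
    """Parse lines of the form NAME= 'value' into a dict (assumed format)."""
    keys = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Drop surrounding whitespace and single or double quotes.
        keys[name.strip()] = value.strip().strip("'\"")
    return keys


def export_keys(keys, environ=os.environ):
    """Copy parsed keys into the environment without overriding existing ones."""
    for name, value in keys.items():
        environ.setdefault(name, value)
    return environ


sample = "OPENAI_API_KEY= 'xxxxxxx'\nVERTEX_REGION= 'xxxxxxx'\n"
parsed = load_key_cfg(sample)
```

Using `setdefault` means that variables already set in the shell take precedence over the file, which is one reasonable policy among several.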
This project uses Git submodules that have to be initialized. Either clone the repository recursively using:
git clone --recursive <url>
or fetch the submodules afterwards in the repository:
git submodule update --init --recursive
Install the dependencies:
# get latest stable rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# bender
cargo install bender
# morty
cargo install --git https://github.com/pulp-platform/morty.git
# python environment
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
The (RTL DUT, TB) pairs and their original assets must be listed in a JSON file. A sample file is provided at $ROOT/assets.json. Then call:
./scripts/bench-lib-gen.sh \
--json assets.json \
--out out \
--provider openai \
--model gpt-4o-2024-08-06 \
--key-cfg ./key.cfg \
--max-token 8192 \
--tokens 60000 \
--temperature 0.6 \
--top-p 0.95
Name | Description |
---|---|
$ROOT/out/bench/ProbXXX_<dut_name>_ref.sv | Reference DUT RTL design, used for automatic spec generation. |
$ROOT/out/bench/ProbXXX_<dut_name>_test.sv | Reference testbench for the DUT. Used by the assessed LLM to verify its generated RTL DUT in-the-loop. TBs are self-checking. |
$ROOT/out/bench/ProbXXX_<dut_name>_test_golden.sv | Reference testbench instantiating the reference DUT. Serves as a golden reference for comparison with the LLM-generated RTL DUT. |
$ROOT/out/bench/ProbXXX_<dut_name>_prompt.txt | LLM-generated natural-language input spec based on the reference design and, if one is specified, the testbench. Can also be written from scratch if a reference design is missing. |
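
The naming scheme in the table above makes it straightforward to group the benchmark files by problem. The sketch below is one way to do this and is not part of the repository's scripts: the function name `group_bench_files` is hypothetical, and it assumes that `XXX` in `ProbXXX` is a run of digits and that a DUT name does not itself end in `_test`.

```python
import re
from collections import defaultdict

# Suffix of each benchmark file, mapped to a short role name.
ROLES = {
    "ref.sv": "ref",
    "test.sv": "test",
    "test_golden.sv": "test_golden",
    "prompt.txt": "prompt",
}

# ProbXXX_<dut_name>_<suffix>; XXX is assumed to be numeric.
PATTERN = re.compile(
    r"(Prob\d+)_(.+)_(ref\.sv|test\.sv|test_golden\.sv|prompt\.txt)"
)


def group_bench_files(filenames):
    """Return {(problem_id, dut_name): {role: filename}} for matching names."""
    problems = defaultdict(dict)
    for name in filenames:
        match = PATTERN.fullmatch(name)
        if match is None:
            continue  # Ignore files that do not follow the naming scheme.
        prob, dut, suffix = match.groups()
        problems[(prob, dut)][ROLES[suffix]] = name
    return dict(problems)


def missing_roles(problems):
    """List the roles absent for each problem, e.g. a missing prompt."""
    return {
        key: sorted(set(ROLES.values()) - set(files))
        for key, files in problems.items()
        if set(ROLES.values()) - set(files)
    }
```

In practice the input list could come from `os.listdir` on the `out/bench` directory; `missing_roles` then flags problems for which one of the four files was not generated.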
Name | Description |
---|---|
$ROOT/out/lib/<dut_name>.json | LLM-generated structured input spec (json) based on the reference design. |
An example output with the designs fifo_v3 and credit_counter from PULP platform's common_cells is provided in out/{bench,lib}.