PULP SystemVerilog evaluation benchmark and library for large language models

This repository builds a benchmark and a library dataset from a selection of the open-source PULP platform's SystemVerilog designs, with two goals:

Using the generated benchmark to assess the RTL design and verification capabilities of LLMs.
Using the generated library as a retrieval database for LLM-based hardware design agents, so that they can consult and reuse existing modules much as human designers reference third-party IPs.

The reference RTL designs and testbenches are extracted from the original PULP codebase as single-source, self-contained files using the open-source tools Bender and Morty. The specification prompts are generated by an LLM.

This repository is inspired by NVIDIA's verilog-eval.

Dependencies

Bender >= 0.27.1
Morty >= 0.9.0
Python >= 3.11
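To check these minimums from a script, the sketch below compares dotted version strings. It is a minimal example, not part of the repository: how you obtain each version string (for instance from a tool's `--version` output) is an assumption left to you, and the parser simply takes the first dotted number it finds. parse_version, meets_minimum and MINIMUMS are hypothetical names.

```python
# Minimal sketch (not part of the repository): compare a reported version
# string against the minimum versions listed in Dependencies.
import re

# Minimum versions copied from the Dependencies list.
MINIMUMS = {"bender": "0.27.1", "morty": "0.9.0", "python": "3.11"}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number (e.g. '0.27.1') from text."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found: str, required: str) -> bool:
    """Return True if the version in `found` is >= the version in `required`."""
    have, need = parse_version(found), parse_version(required)
    # Pad the shorter tuple with zeros so '3.11' compares equal to '3.11.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

For example, meets_minimum("bender 0.27.1", MINIMUMS["bender"]) is True, while a 0.26.x release would fail the check.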

To use a cloud LLM provider's API, set the corresponding key (e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY) as an environment variable, or create a key.cfg file in the following format:

OPENAI_API_KEY= 'xxxxxxx'
ANTHROPIC_API_KEY= 'xxxxxxx'
VERTEX_SERVICE_ACCOUNT_PATH= 'xxxxxxx'
VERTEX_REGION= 'xxxxxxx'

Getting started

This project uses Git submodules that must be initialized. Either clone the repository recursively:

git clone --recursive <url>

or, from inside an existing clone, fetch the submodules afterwards:

git submodule update --init --recursive

Setup

Install the dependencies:

# get the latest stable Rust toolchain (includes cargo)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# make cargo available in the current shell
. "$HOME/.cargo/env"

# bender
cargo install bender

# morty
cargo install --git https://github.com/pulp-platform/morty.git

# python environment
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Usage

The (RTL DUT, TB) pairs and their original assets must be listed in a JSON file; a sample is provided in $ROOT/assets.json. Then run:

./scripts/bench-lib-gen.sh \
  --json assets.json \
  --out out \
  --provider openai \
  --model gpt-4o-2024-08-06 \
  --key-cfg ./key.cfg \
  --max-token 8192 \
  --tokens 60000 \
  --temperature 0.6 \
  --top-p 0.95
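To sweep settings (for example several temperatures or models), the same invocation can be rebuilt from Python. This hypothetical wrapper only reproduces the flags shown above; build_command and run are illustrative names, and since the exact semantics of --max-token and --tokens are not described here, their values are passed through unchanged.

```python
# Hypothetical wrapper around scripts/bench-lib-gen.sh. Every flag name and
# default value is copied from the example command above.
import subprocess


def build_command(
    json_path: str = "assets.json",
    out_dir: str = "out",
    provider: str = "openai",
    model: str = "gpt-4o-2024-08-06",
    key_cfg: str = "./key.cfg",
    max_token: int = 8192,
    tokens: int = 60000,
    temperature: float = 0.6,
    top_p: float = 0.95,
) -> list[str]:
    """Return the argv list for bench-lib-gen.sh with the given settings."""
    return [
        "./scripts/bench-lib-gen.sh",
        "--json", json_path,
        "--out", out_dir,
        "--provider", provider,
        "--model", model,
        "--key-cfg", key_cfg,
        "--max-token", str(max_token),
        "--tokens", str(tokens),
        "--temperature", str(temperature),
        "--top-p", str(top_p),
    ]


def run(**overrides) -> None:
    """Run the script from the repository root; raises on a non-zero exit."""
    subprocess.run(build_command(**overrides), check=True)
```

For instance, calling run(temperature=t, out_dir=f"out_t{t}") inside a loop writes each sweep point to its own output directory instead of overwriting out/.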

Output

Name	Description
`$ROOT/out/bench/ProbXXX_<dut_name>_ref.sv`	Reference DUT RTL design for automatic spec generation.
`$ROOT/out/bench/ProbXXX_<dut_name>_test.sv`	Reference testbench for the DUT. The assessed LLM uses it to verify its generated RTL DUT in the loop. TBs are self-checking.
`$ROOT/out/bench/ProbXXX_<dut_name>_test_golden.sv`	Reference testbench instantiating the reference DUT. Serves as a golden reference for comparison with the LLM-generated RTL DUT.
`$ROOT/out/bench/ProbXXX_<dut_name>_prompt.txt`	LLM-generated natural-language input spec based on the reference design and, if specified, the testbench. It can also be written from scratch if no reference design is available.
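Because every problem shares the ProbXXX_<dut_name> prefix, the bench files can be grouped per problem. A minimal sketch, assuming XXX is a run of digits and that no unrelated files in the directory follow this pattern; group_bench and group_bench_dir are hypothetical helpers.

```python
# Minimal sketch: group out/bench files by problem, assuming names follow
# ProbXXX_<dut_name>_<kind> with XXX being one or more digits.
import re
from collections import defaultdict
from pathlib import Path

# The suffix alternatives are anchored at the end of the name, so DUT names
# that contain underscores (e.g. fifo_v3) are still split correctly.
BENCH_NAME = re.compile(
    r"^(?P<prob>Prob\d+)_(?P<dut>.+)_"
    r"(?P<kind>ref\.sv|test_golden\.sv|test\.sv|prompt\.txt)$"
)


def group_bench(names):
    """Map (prob, dut) -> {kind: filename}; names that do not match are ignored."""
    groups = defaultdict(dict)
    for name in names:
        m = BENCH_NAME.match(name)
        if m:
            groups[(m["prob"], m["dut"])][m["kind"]] = name
    return dict(groups)


def group_bench_dir(bench_dir="out/bench"):
    """Apply group_bench to the regular files in bench_dir."""
    return group_bench(p.name for p in Path(bench_dir).iterdir() if p.is_file())
```

A problem whose group lacks one of the four kinds is then easy to spot, for example a missing prompt.txt when the spec still has to be written by hand.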

Name	Description
`$ROOT/out/lib/<dut_name>.json`	LLM-generated structured input spec (JSON) based on the reference design.
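For the retrieval use case, the library files can be loaded into a lookup table keyed by <dut_name>. The JSON schema is not documented here, so this sketch keeps each spec as a parsed object without assuming any field names, and find_modules uses a naive substring match as a hypothetical stand-in for a real retrieval step such as embedding search.

```python
# Minimal sketch: index out/lib/<dut_name>.json files by DUT name. No field
# names inside the specs are assumed; each file is kept as parsed JSON.
import json
from pathlib import Path


def load_library(lib_dir="out/lib"):
    """Return {dut_name: parsed_spec} for every *.json file in lib_dir."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(lib_dir).glob("*.json"))
    }


def find_modules(library, query):
    """Hypothetical naive retrieval: DUT names whose name or spec contains query."""
    q = query.lower()
    return [
        name
        for name, spec in library.items()
        if q in name.lower() or q in json.dumps(spec).lower()
    ]
```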

Example outputs for the fifo_v3 and credit_counter designs from the PULP platform's common_cells are provided in out/{bench,lib}.

alex96295/pulp-verilog-eval
