BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Ali Taheri, Alireza Taban, Sadegh Soudjani, Ashutosh Trivedi

Isfahan University of Technology, Max Planck Institute for Software Systems, University of Colorado Boulder

Brief Introduction

This paper, accepted at the 8th Annual Learning for Dynamics & Control (L4DC) 2026, introduces BarrierBench, a benchmark of 100 dynamical systems for evaluating LLMs on safety verification via barrier certificate synthesis. We propose an LLM-based agentic framework that uses natural language reasoning to propose, refine, and validate barrier certificates, integrating SMT-based verification and retrieval-augmented generation. The framework achieves over 90% success in generating valid certificates.

Benchmark

The benchmark is available at: https://hycodev.com/dataset/barrierbench

Installation

Install dependencies using the provided requirements.txt:

pip install -r requirements.txt

Overview

Our agentic framework addresses the limitations of classical barrier certificate synthesis by:

Retrieval-Augmented Generation: Retrieve similar solved problems from the benchmark dataset
Barrier Synthesis Agent: LLM-guided template discovery and candidate generation
Barrier Verifier Agent: SMT-based formal verification of candidate certificates
Iterative Refinement: Feedback loop to refine candidates based on verification results

Usage

Setup

Set your API key in main.py:

synthesizer = BarrierSynthesisAgent(api_key="YOUR_API_KEY", max_iterations=5, dataset_json_path="../Benchmark/barrier_dataset.json")

Running

python main.py

Input Format

{
  "problem": {
    "dynamics": "mathematical equations",
    "initial_set": {"type", "radius"/"bounds", "center"},
    "unsafe_set": {"type", "radius"/"bounds", "complement"},
    "controller_parameters": "control inputs (if applicable)"
  },
  "barrier": "barrier function polynomial",
  "controllers": "control law expressions",
  "template_type": "solution classification"
}

Results

Approach	Claude Sonnet 4	ChatGPT-4o
Baseline (Single Prompt)	41%	17%
Full Framework	90%	46%
Improvement	+49%	+29%

Citation

@inproceedings{taheri2026barrierbench,
  title={BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems},
  author={Taheri, Ali and Taban, Alireza and Soudjani, Sadegh and Trivedi, Ashutosh},
  booktitle={8th Annual Learning for Dynamics \& Control (L4DC)},
  year={2026}
}

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
Benchmark		Benchmark
Source-Code		Source-Code
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Brief Introduction

Benchmark

Installation

Overview

Usage

Setup

Running

Input Format

Results

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Brief Introduction

Benchmark

Installation

Overview

Usage

Setup

Running

Input Format

Results

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages