Skip to content

AliTaheri2002/BarrierBench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 

Repository files navigation

BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Ali Taheri, Alireza Taban, Sadegh Soudjani, Ashutosh Trivedi

Isfahan University of Technology, Max Planck Institute for Software Systems, University of Colorado Boulder


Brief Introduction

This paper, accepted at the 8th Annual Learning for Dynamics & Control (L4DC) 2026, introduces BarrierBench, a benchmark of 100 dynamical systems for evaluating LLMs on safety verification via barrier certificate synthesis. We propose an LLM-based agentic framework that uses natural language reasoning to propose, refine, and validate barrier certificates, integrating SMT-based verification and retrieval-augmented generation. The framework achieves over 90% success in generating valid certificates.

Benchmark

The benchmark is available at: https://hycodev.com/dataset/barrierbench

Installation

Install dependencies using the provided requirements.txt:

pip install -r requirements.txt

Overview

Our agentic framework addresses the limitations of classical barrier certificate synthesis by:

  1. Retrieval-Augmented Generation: Retrieve similar solved problems from the benchmark dataset
  2. Barrier Synthesis Agent: LLM-guided template discovery and candidate generation
  3. Barrier Verifier Agent: SMT-based formal verification of candidate certificates
  4. Iterative Refinement: Feedback loop to refine candidates based on verification results

Usage

Setup

Set your API key in main.py:

synthesizer = BarrierSynthesisAgent(api_key="YOUR_API_KEY", max_iterations=5, dataset_json_path="../Benchmark/barrier_dataset.json")

Running

python main.py

Input Format

{
  "problem": {
    "dynamics": "mathematical equations",
    "initial_set": {"type", "radius"/"bounds", "center"},
    "unsafe_set": {"type", "radius"/"bounds", "complement"},
    "controller_parameters": "control inputs (if applicable)"
  },
  "barrier": "barrier function polynomial",
  "controllers": "control law expressions",
  "template_type": "solution classification"
}

Results

Approach Claude Sonnet 4 ChatGPT-4o
Baseline (Single Prompt) 41% 17%
Full Framework 90% 46%
Improvement +49% +29%

Citation

@inproceedings{taheri2026barrierbench,
  title={BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems},
  author={Taheri, Ali and Taban, Alireza and Soudjani, Sadegh and Trivedi, Ashutosh},
  booktitle={8th Annual Learning for Dynamics \& Control (L4DC)},
  year={2026}
}

About

[L4DC 2026] BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages