This folder is a local reproducibility pipeline for the paper:

*Preregistered Belief Revision Contracts (PBRC): a Logic-Based Protocol for Conformity-Resilient Multi-Agent Deliberation*
It contains:
- Fully runnable simulations that reproduce the core theoretical predictions:
  - elimination of social-only wrong-but-sure cascades under PBRC enforcement;
  - token-sufficiency / persuasion separation (token-equivalent events induce identical enforced updates);
  - topology affects PBRC only via token arrival times (distance/diameter effects under flooding);
  - sound-but-incomplete routers preserve safety but reduce liveness;
  - ablations over the fallback dilution parameter $\lambda$ (safety vs. epistemic conservatism);
  - a DoS/cost stress test for token flooding (validation-cost scaling and short-circuiting).
- Benchmark adapters (BenchForm, KAIROS) that let you run PBRC-enforced variants on real LLM MAS conformity benchmarks, once you provide:
  - model access (OpenAI / local vLLM / Ollama), and
  - the benchmark repositories/datasets.
Important: this package includes complete results for the simulation experiments (already generated in `results/sim/`). For the LLM benchmarks, it provides a complete pipeline (scripts + adapters), but the results depend on your model/API and compute.
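The enforcement rule at the heart of the protocol, as exercised by the simulations, reduces to a simple guard on belief updates. The following is an illustrative sketch only; the `Message` and `enforce_update` names are ours, not the package's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    claim: str                   # the answer/belief a peer proposes
    token: Optional[str] = None  # external evidence token (None = social-only)

def enforce_update(current_belief: str, msg: Message) -> str:
    """PBRC social_only rule: a belief may flip only if the incoming
    message carries an external evidence token; bare social pressure
    (a claim with no token) never triggers revision."""
    if msg.token is None:
        return current_belief    # social-only message: revision rejected
    return msg.claim             # token-backed message: enforced update

# A peer's unsupported disagreement cannot flip the belief...
assert enforce_update("A", Message(claim="B")) == "A"
# ...but the same claim accompanied by a verification token does.
assert enforce_update("A", Message(claim="B", token="verified:B")) == "B"
```

This is why token-equivalent events induce identical enforced updates: the update depends only on the token, not on how persuasive the surrounding message is.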
## Quick start

```
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python -m pbrc.experiments.run_all --out results/sim --seed 0
```

Outputs:
- CSV summaries in `results/sim/*.csv`
- Figures in `results/sim/*.png`
- A short aggregate report in `results/sim/REPORT.md`
## BenchForm

Official repo: https://github.com/Zhiyuan-Weng/BenchForm (see its README for `eval.py` / `analysis.py`)

Reproducible download (pinned commits):

```
bash scripts/download_benchmarks.sh
```

Then follow BenchForm's README for environment setup, e.g.:

```
cd BenchForm
pip install -r requirements.txt
export OPENAI_API_KEY="..."   # if using OpenAI
cd ..
```

PBRC adapter (from this repo):
```
python -m pbrc.benchmarks.benchform_pbrc \
    --benchform_root /path/to/BenchForm \
    --model gpt-4o \
    --out results/benchform_pbrc \
    --mode social_only
```

Modes:
- `social_only`: the PBRC contract forbids belief revision without external tokens (maximally conformity-resilient).
- `tool_tokens`: enables a tool-token plugin (a deterministic verifier for some BBH tasks, or custom validators).
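A custom validator for `tool_tokens` mode can be pictured as a function that issues an evidence token only when a claim checks out deterministically. This is a hypothetical sketch; the adapter's real plugin interface may differ:

```python
# Hypothetical validator shape for tool_tokens mode (illustrative only):
# return an evidence token iff the proposed answer verifies deterministically.
from typing import Optional

def arithmetic_validator(question: str, proposed_answer: str) -> Optional[str]:
    """Deterministic verifier for questions of the form '<expr> = ?'."""
    try:
        lhs, _rhs = question.split("=")
        expected = eval(lhs, {"__builtins__": {}})  # arithmetic only, no builtins
        if int(proposed_answer) == expected:
            return f"token:arith:{expected}"
    except (ValueError, SyntaxError):
        pass
    return None  # no token: the claim stays unverified and cannot force an update

assert arithmetic_validator("17 + 25 = ?", "42") == "token:arith:42"
assert arithmetic_validator("17 + 25 = ?", "41") is None
```

Under PBRC, only claims for which such a validator emits a token can trigger enforced belief revision; unverified claims fall back to the social-only rule.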
The adapter emits JSONL traces with certificates, which can be post-processed by BenchForm's `analysis.py`.
BenchForm already produces both RAW and social-protocol outputs (e.g., trust, doubt).
PBRC social_only enforcement can therefore be applied without re-running the LLM:
- Run BenchForm normally to generate RAW + social results:

  ```
  python eval.py --model gpt-4o --save_path benchform_results
  ```

- Apply PBRC social_only post-processing (reject any flip relative to RAW):

  ```
  python -m pbrc.benchmarks.benchform_socialonly_postprocess \
      --raw benchform_results/<RAW_FILE>.json \
      --social benchform_results/<SOCIAL_FILE>.json \
      --out benchform_results_pbrc/<PBRC_FILE>.json
  ```

Then analyze the PBRC-processed file with BenchForm's `analysis.py`.
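The reject-any-flip rule the post-processor applies can be sketched in a few lines. Field names and the JSON shape here are illustrative, not BenchForm's actual schema:

```python
def pbrc_socialonly(raw: dict, social: dict) -> dict:
    """Reject any answer flip relative to RAW: wherever the social run
    changed an answer (and no external token backs the change, which in
    social_only mode is always), restore the RAW answer."""
    out = {}
    for qid, raw_ans in raw.items():
        social_ans = social.get(qid, raw_ans)
        out[qid] = raw_ans if social_ans != raw_ans else social_ans
    return out

raw    = {"q1": "A", "q2": "C", "q3": "B"}
social = {"q1": "A", "q2": "D", "q3": "B"}   # q2 flipped under social pressure
assert pbrc_socialonly(raw, social) == raw   # the flip on q2 is rejected
```

In social_only mode the enforced output therefore coincides with RAW on every flipped item, which is what makes re-running the LLM unnecessary.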
## KAIROS

Official repo: https://github.com/declare-lab/KAIROS
Dataset: https://huggingface.co/datasets/declare-lab/KAIROS_EVAL

Reproducible download (pinned commits):

```
bash scripts/download_benchmarks.sh
```

Then follow KAIROS's README for environment setup, e.g.:

```
cd KAIROS
pip install -r requirements.txt
pip install -e .
cd ..
```

PBRC adapter (from this repo):
```
python -m pbrc.benchmarks.kairos_pbrc \
    --kairos_root /path/to/KAIROS \
    --subject_model <vllm_endpoint_or_openai_model> \
    --out results/kairos_pbrc \
    --mode social_only
```

KAIROS evaluation already computes RAW predictions and then evaluates social protocols, so PBRC social_only can be applied as a post-processing step:
```
python -m pbrc.benchmarks.kairos_socialonly_postprocess \
    --raw eval_results/<MODEL_DIR>/<RAW_JSON>.json \
    --social eval_results/<MODEL_DIR>/<PROTOCOL_JSON>.json \
    --out eval_results_pbrc/<MODEL_DIR>/<PBRC_JSON>.json
```

## Pre-generated results

The repository ships with pre-generated simulation results in:
- `results/sim/exp1_social_cascades.csv` + `exp1_social_cascades.png`
- `results/sim/exp2_token_sufficiency.csv`
- `results/sim/exp3_topology_token_flow.csv` + figure
- `results/sim/exp4_incomplete_router.csv` + figure
- `results/sim/exp1c_ablation_lambda.csv` + figures
- `results/sim/exp5_cost_dos.csv` + figure
To regenerate exactly, run the command in Quick start.
This package does not bundle BenchForm or KAIROS datasets/code. It provides adapters and instructions. Please respect the upstream benchmark licenses.