This repository provides the pipeline for constructing the LogiConBench dataset, a large-scale benchmark for evaluating logical consistency in LLMs. The dataset contains 280K samples distributed across four difficulty levels.
The dataset is built through five main steps, each implemented in a corresponding Jupyter notebook:
1. **Generate logical graphs and sample nodes**
   - Notebook: `step1_generate logical graph and select.ipynb`
   - Construct logical graphs where nodes are symbolic propositions and edges denote reasoning relations.
   - Sample subsets of nodes for downstream processing.
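Step 1 can be sketched roughly as follows. This is an illustrative assumption, not the notebook's actual code: the function names (`build_graph`, `sample_nodes`) and the random acyclic-graph construction are hypothetical.

```python
import random

def build_graph(n_props: int, n_edges: int, seed: int = 0):
    """Build a toy logical graph: nodes are propositions p0..p{n-1};
    an edge (u, v) denotes the reasoning relation 'u implies v'.
    Hypothetical sketch -- the real construction may differ."""
    rng = random.Random(seed)
    nodes = [f"p{i}" for i in range(n_props)]
    edges = set()
    while len(edges) < n_edges:
        i, j = rng.sample(range(n_props), 2)
        if i < j:  # orient edges forward so the graph stays acyclic
            edges.add((nodes[i], nodes[j]))
    return nodes, sorted(edges)

def sample_nodes(nodes, k: int, seed: int = 0):
    """Sample a subset of k nodes for downstream processing."""
    return sorted(random.Random(seed).sample(nodes, k))

nodes, edges = build_graph(n_props=6, n_edges=5)
subset = sample_nodes(nodes, k=3)
```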
2. **Reorder edges into walk sequences**
   - Notebook: `step2_reorder.ipynb`
   - Reorder the edges of sampled nodes into sequential walk structures to ensure consistent reasoning paths.
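One plausible reading of "reorder into walk sequences" is that each edge should appear only after its source proposition has already been reached, so the edge list reads as a coherent chain of reasoning steps. The sketch below implements that interpretation; `reorder_edges` and the DFS-based ordering are assumptions, not the notebook's actual logic.

```python
from collections import defaultdict

def reorder_edges(edges):
    """Reorder edges into a walk: an edge (u, v) is emitted only after
    its source u has been visited, starting from root premises that
    have no incoming edge. Illustrative sketch only."""
    adj = defaultdict(list)
    targets = set()
    for u, v in edges:
        adj[u].append(v)
        targets.add(v)
    roots = [u for u in adj if u not in targets]  # premises with no incoming edge
    walk, visited = [], set()
    stack = list(reversed(roots))
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        visited.add(u)
        for v in adj[u]:
            walk.append((u, v))
            stack.append(v)
    return walk

edges = [("p2", "p4"), ("p0", "p1"), ("p1", "p2"), ("p1", "p3")]
walk = reorder_edges(edges)
```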
3. **Label consistency sets and count inconsistent nodes**
   - Notebook: `step3_label and count.ipynb`
   - Assign consistency labels to sampled nodes.
   - Count how many nodes admit inconsistent sets.
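A minimal sketch of the labeling step, under the assumption that a node's set is inconsistent when it contains both a proposition and its negation (written `~p`). The labeling rule and function names here are hypothetical stand-ins for the notebook's actual criteria.

```python
def is_consistent(literal_set):
    """A set of literals is consistent iff it never contains both a
    proposition p and its negation ~p. Illustrative rule only."""
    return not any(
        ("~" + lit) in literal_set
        for lit in literal_set
        if not lit.startswith("~")
    )

def label_and_count(node_sets):
    """Label each node's literal set and count how many nodes
    admit an inconsistent set."""
    labels = {node: is_consistent(lits) for node, lits in node_sets.items()}
    n_inconsistent = sum(1 for ok in labels.values() if not ok)
    return labels, n_inconsistent

node_sets = {
    "n0": {"p0", "p1"},
    "n1": {"p0", "~p0", "p2"},  # contains p0 and its negation
}
labels, n_bad = label_and_count(node_sets)
```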
4. **Rewrite equivalent logical expressions**
   - Notebook: `step4_rewrite.ipynb`
   - Apply symbolic rewriting rules to generate multiple logically equivalent variants of each node, enhancing structural diversity.
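The rewriting idea can be illustrated with two standard equivalences on a simple implication: the contrapositive and material implication. The rule set below is a toy example; the notebook's actual rewrite rules may be richer.

```python
def rewrite_variants(expr):
    """Generate logically equivalent variants of a simple implication
    'A -> B' via standard rewrite rules. Toy sketch only."""
    a, b = (s.strip() for s in expr.split("->"))

    def neg(x):
        # Negate a literal, cancelling a leading double negation.
        return x[1:] if x.startswith("~") else "~" + x

    return [
        expr,                      # original:             A -> B
        f"{neg(b)} -> {neg(a)}",   # contrapositive:       ~B -> ~A
        f"{neg(a)} | {b}",         # material implication: ~A | B
    ]

variants = rewrite_variants("p0 -> p1")
```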
5. **Translate into natural language**
   - Notebook: `step5_nl_template.ipynb`
   - Convert symbolic expressions into natural language statements using predefined templates and WordNet-based lexical substitutions.
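A minimal sketch of template-based translation. The templates are invented for illustration, and a small hard-coded synonym table stands in for the WordNet-based lexical substitutions (which would normally come from e.g. NLTK's WordNet interface).

```python
import random

# Hypothetical templates; the notebook's actual templates may differ.
TEMPLATES = ["If {a}, then {b}.", "{a} implies {b}."]

# Toy substitution table standing in for WordNet-derived synonyms.
SYNONYMS = {"implies": ["entails", "leads to"]}

def to_natural_language(a: str, b: str, seed: int = 0) -> str:
    """Render the implication a -> b as an English sentence using a
    random template, with optional lexical substitution."""
    rng = random.Random(seed)
    sent = rng.choice(TEMPLATES).format(a=a, b=b)
    for word, alts in SYNONYMS.items():
        if word in sent and rng.random() < 0.5:
            sent = sent.replace(word, rng.choice(alts))
    return sent

sentence = to_natural_language("it rains", "the ground is wet")
```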
- All generated data is stored under the `data/` directory.
- The dataset includes 280K samples, distributed across four difficulty levels:
  - `data/k=2/`
  - `data/k=3/`
  - `data/k=4/`
  - `data/k=5/`
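Given that layout, the four levels can be traversed as follows. The per-level file pattern (`*.jsonl` here) is an assumption; only the `data/k=2/` … `data/k=5/` directory names come from the repository description.

```python
from pathlib import Path

def iter_sample_files(root: str = "data"):
    """Yield (difficulty_level, file_path) pairs across the four
    difficulty levels data/k=2 .. data/k=5. The *.jsonl file pattern
    is an illustrative guess about the storage format."""
    for k in range(2, 6):
        level_dir = Path(root) / f"k={k}"
        for path in sorted(level_dir.glob("*.jsonl")):
            yield k, path
```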