Dialectica is a design for an agent that transforms natural language into structured logical blocks before downstream prompting, retrieval, and embedding.
The goal is not to force language into brittle formal logic. The goal is to produce a stable intermediate representation that is:
- expressive enough to capture reasoning structure
- soft enough to preserve ambiguity and uncertainty
- machine-friendly enough for retrieval, planning, and synthesis
By doing so, the project explores how token-independent structure can work alongside strong language models without pretending that every task should be reduced to strict symbolic logic.
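As a rough illustration of what such an intermediate representation could look like, the sketch below models blocks and relations as plain dataclasses. The class names, field names, kind labels, and the confidence score are all assumptions made for this example; the project's target structure is the one described by schema/logical-blocks.schema.json.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Block:
    """One unit of meaning extracted from the text (hypothetical shape)."""
    id: str
    kind: str                 # e.g. "condition", "claim", "exception"
    text: str
    confidence: float = 1.0   # soft score instead of a hard true/false


@dataclass
class Relation:
    """A directed, typed link between two blocks (hypothetical shape)."""
    source: str
    target: str
    kind: str                 # e.g. "implies", "unless"


@dataclass
class LogicalForm:
    """A set of blocks plus the relations that connect them."""
    blocks: list[Block] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)


# "If the battery is dead, the car will not start unless we jump it."
form = LogicalForm(
    blocks=[
        Block("b1", "condition", "the battery is dead"),
        Block("b2", "claim", "the car will not start"),
        Block("b3", "exception", "we jump it", confidence=0.8),
    ],
    relations=[
        Relation("b1", "b2", "implies"),
        Relation("b3", "b2", "unless"),
    ],
)

# asdict() gives a plain nested dict that is easy to serialize or index.
print(asdict(form))
```

Keeping the original wording in each block and attaching a confidence score is one way to stay "soft": the structure records how clauses relate without committing to a strict truth value for each.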
The repo includes a runnable local toolchain with:
- a local parser CLI
- a small HTTP API
- a benchmark and evaluation harness
- an end-to-end answer evaluation harness
- a synthetic complexity-sweep benchmark generator
- schema, prompt, and sample output files
The parser is heuristic so the project can run locally with no external model dependency. That makes it useful for fast iteration on the intermediate representation before swapping in an LLM-backed parser later.
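To make "heuristic" concrete, here is a minimal sketch of a pattern-based sentence parser. It is not the code in src/dialectica/parser.py; the two regular expressions, the function name parse_sentence, and the output keys are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns; a real heuristic parser would need many more.
CONDITIONAL = re.compile(
    r"^if\s+(?P<condition>.+?),\s*(?P<consequence>.+?)"
    r"(?:\s+unless\s+(?P<exception>.+?))?\.?$",
    re.IGNORECASE,
)
CAUSAL = re.compile(
    r"^(?P<effect>.+?)\s+because\s+(?P<cause>.+?)\.?$",
    re.IGNORECASE,
)


def parse_sentence(sentence: str) -> dict:
    """Classify one sentence and split it into labeled clauses."""
    text = sentence.strip()
    match = CONDITIONAL.match(text)
    if match:
        # Drop the optional "exception" group when the sentence has no "unless".
        parts = {k: v for k, v in match.groupdict().items() if v is not None}
        return {"type": "conditional", **parts}
    match = CAUSAL.match(text)
    if match:
        return {"type": "causal", **match.groupdict()}
    # Fallback: treat anything unrecognized as a plain claim.
    return {"type": "claim", "text": text.rstrip(".")}


print(parse_sentence("If the battery is dead, the car will not start unless we jump it."))
print(parse_sentence("The experiment failed because the reagent was contaminated."))
```

Rules like these run instantly and need no model, which suits fast local iteration, but they only cover the phrasings they were written for.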
From the repo root, use uv as the default workflow:
uv sync
Then run commands with uv run:
uv run dialectica parse "If the battery is dead, the car will not start unless we jump it." --pretty
uv run dialectica parse "The experiment failed because the reagent was contaminated." --canonical
uv run dialectica fingerprint "If the battery is dead, the car will not start unless we jump it." --summary --pretty
uv run dialectica compare-fingerprints "The policy requires encryption at rest." "The policy does not require encryption at rest." --pretty
uv run dialectica parse --document "Employees may access the lab only if they completed safety training. Alice completed safety training." --pretty
uv run dialectica serve --host 127.0.0.1 --port 8000
Then send a request:
curl -X POST http://127.0.0.1:8000/parse ^
-H "Content-Type: application/json" ^
-d "{\"text\":\"We should allow remote work if it improves productivity.\"}"
uv run dialectica evaluate --dataset benchmarks/core.json --pretty
For a smoke test that exercises the full pipeline without an API call:
uv run dialectica evaluate-answers --dataset benchmarks/objective_qa.sample.json --provider oracle --pretty
For a real model comparison using the OpenAI Responses API:
$env:OPENAI_API_KEY="your_key_here"
uv run dialectica evaluate-answers --dataset benchmarks/objective_qa.sample.json --provider openai --model gpt-5-mini --output reports/openai_metrics.json --performance-output reports/openai_metrics.performance.json --pretty
For a no-cost local run with Ollama:
ollama pull qwen2.5:3b
uv run dialectica evaluate-answers --dataset benchmarks/objective_qa.sample.json --provider ollama --model qwen2.5:3b --output reports/ollama_metrics.json --performance-output reports/ollama_metrics.performance.json --pretty
The Ollama provider is localhost-only by default. It refuses non-local Ollama
endpoints unless you explicitly set DIALECTICA_ALLOW_REMOTE_OLLAMA=1.
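The text above states the behavior but not how the check is implemented. A minimal sketch of such a guard, assuming the endpoint is supplied as a URL and that only the value 1 enables remote hosts; the function name check_ollama_endpoint and the LOCAL_HOSTS set are hypothetical.

```python
import os
from urllib.parse import urlparse

# Hostnames treated as local in this sketch (an assumption, not the project's list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def check_ollama_endpoint(url: str, env=None) -> str:
    """Return url if it is local or remote use was explicitly allowed."""
    env = os.environ if env is None else env
    host = urlparse(url).hostname
    if host in LOCAL_HOSTS:
        return url
    # Explicit opt-in, mirroring DIALECTICA_ALLOW_REMOTE_OLLAMA=1.
    if env.get("DIALECTICA_ALLOW_REMOTE_OLLAMA") == "1":
        return url
    raise ValueError(
        f"refusing non-local Ollama endpoint {url!r}; "
        "set DIALECTICA_ALLOW_REMOTE_OLLAMA=1 to allow it"
    )


# 11434 is Ollama's default port.
print(check_ollama_endpoint("http://localhost:11434", env={}))
```

Failing closed like this means benchmark text never leaves the machine unless the environment variable is set on purpose.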
uv run dialectica generate-complexity-benchmark --levels 6 --scenarios-per-level 2 --followups 2 --output benchmarks/complexity_sweep.sample.json
uv run dialectica evaluate-complexity-sweep --dataset benchmarks/complexity_sweep.sample.json --provider ollama --model qwen2.5:3b --output reports/ollama_complexity.json --performance-output reports/ollama_complexity.performance.json --summary-output reports/ollama_complexity.sweep.json --markdown-output reports/ollama_complexity.report.md --charts-dir reports/ollama_complexity.charts --pretty
The complexity summary separates two crossover questions:
- when the compact Dialectica encoding becomes smaller than the raw context
- when the full compact prompt becomes smaller than the direct prompt
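The two crossover questions can be pictured with a small helper that scans per-level token counts. This is a sketch only: the function name first_crossover is hypothetical, and the token counts below are made up for illustration, not measured results.

```python
from typing import Optional, Sequence


def first_crossover(baseline: Sequence[int], candidate: Sequence[int]) -> Optional[int]:
    """Return the first level index where candidate is smaller than baseline."""
    for level, (base, cand) in enumerate(zip(baseline, candidate)):
        if cand < base:
            return level
    return None  # the candidate never becomes smaller


# Made-up token counts per complexity level (index 0 is the simplest level).
raw_context = [40, 80, 160, 320]
compact_encoding = [55, 75, 110, 180]
direct_prompt = [90, 130, 210, 370]
compact_prompt = [140, 160, 195, 265]

# Question 1: when does the encoding alone beat the raw context?
print(first_crossover(raw_context, compact_encoding))  # 1
# Question 2: when does the whole compact prompt beat the direct prompt?
print(first_crossover(direct_prompt, compact_prompt))  # 2
```

In this invented data the second crossover comes later than the first, which is what one would expect if the compact prompt carries some fixed overhead on top of the encoding itself.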
The report bundle can also emit:
- a markdown summary report
- SVG bar charts for accuracy, tokens, latency, and iterations
- an SVG line plot of raw context size versus average total tokens
uv run dialectica reason "If Alice completed amber orientation, then Alice holds an amber badge. If Alice holds an amber badge, then Alice may enter the amber workshop. Alice completed amber orientation. May Alice enter the amber workshop?" --pretty
If you want a step-by-step explanation of what happens during a run, see the run walkthrough in docs/run-walkthrough.md.
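The chained inference in the reason example can be reproduced with a few lines of forward chaining over Horn rules. This is a generic textbook sketch, not the engine in src/dialectica/engine.py; the predicate strings and the function name forward_chain are assumptions.

```python
def forward_chain(facts, rules):
    """Apply Horn rules (premises -> conclusion) until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Each rule is (tuple of premises, conclusion), written as plain strings.
rules = [
    (("completed_amber_orientation(alice)",), "holds_amber_badge(alice)"),
    (("holds_amber_badge(alice)",), "may_enter_amber_workshop(alice)"),
]
facts = {"completed_amber_orientation(alice)"}

derived = forward_chain(facts, rules)
# Without negative information, a missing fact means "unknown", not "no".
answer = "yes" if "may_enter_amber_workshop(alice)" in derived else "unknown"
print(answer)  # yes
```

Because the loop is deterministic and only adds facts, the same input always yields the same answer, which is the property a symbolic lane contributes alongside a language model.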
- src/dialectica/ir.py: hybrid reasoning AST
- src/dialectica/translator.py: controlled natural-language translator
- src/dialectica/compiler.py: export dispatcher
- src/dialectica/engine.py: deterministic Horn and broader FOL-style reasoning engines
- src/dialectica/explainer.py: natural-language explanation layer
- src/dialectica/parser.py: heuristic parser
- src/dialectica/canonical.py: canonicalization and serialization
- src/dialectica/complexity.py: synthetic complexity benchmark generation and summaries
- src/dialectica/filtering.py: question-conditioned logical-form pruning
- src/dialectica/fingerprint.py: Goedel-inspired structural fingerprint backend
- src/dialectica/validation.py: lightweight validation
- src/dialectica/api.py: stdlib HTTP API
- src/dialectica/cli.py: command-line entry points
- src/dialectica/reporting.py: markdown and SVG report generation
- benchmarks/core.json: initial benchmark set
- benchmarks/objective_qa.sample.json: sample objective answer benchmark format
- benchmarks/complexity_sweep.sample.json: generated scaling benchmark sample
- benchmarks/reasoning.messy.json: canonical noisy reasoning benchmark with yes / no / unknown answers
- schema/logical-blocks.schema.json: JSON schema for the target structure
- schema/logical-encoding.schema.json: TPTP-inspired encoding layer
- prompts/parser.md: parser prompt for an eventual LLM-backed version
- docs/concepts/README.md: concept-level architecture and method guides
- docs/tptp-inspired-encoding.md: notes on the richer formal encoding
- docs/structural-fingerprint.md: notes on the Goedel-inspired auxiliary backend
- docs/objective-evaluation.md: how to test whether Dialectica improves correctness
- docs/run-walkthrough.md: step-by-step explanation of what happens during a run
- docs/reasoning-pipeline.md: reasoning pipeline design and CLI usage
- docs/encoding-backends.md: practical comparison of export targets
- docs/reasoning-benchmark-suite.md: benchmark philosophy, categories, metrics, and usage
Dialectica has two main tracks:
- a retrieval-oriented logical-form lane for prompting, indexing, and hybrid retrieval
- a deterministic reasoning lane for the safe symbolic fragment
The longer architectural notes live in docs/ so this README can stay focused
on setup and navigation.
Start here:
- Concept Guide Index
- Foundations
- Retrieval Architecture
- Reasoning Methods
- Run Walkthrough
- Objective Evaluation
- Reasoning Benchmark Suite
- Encoding Backends
- TPTP-Inspired Encoding
- Structural Fingerprint
Shortest mental model:
- parse prose into blocks and relations
- canonicalize that structure into a stable form
- use the structure either as better prompt material or compile it into a local reasoning program
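The canonicalization step in this mental model can be sketched as order-independent serialization: two parses that differ only in the order of blocks or relations should serialize to the same string. The dict layout, sort keys, and function names here are assumptions for illustration, not the behavior of src/dialectica/canonical.py.

```python
import hashlib
import json


def canonicalize(form: dict) -> str:
    """Serialize a logical form so that equivalent structures give equal strings."""
    blocks = sorted(form["blocks"], key=lambda b: b["id"])
    relations = sorted(
        form["relations"],
        key=lambda r: (r["source"], r["kind"], r["target"]),
    )
    # sort_keys and fixed separators remove dict-order and whitespace variation.
    return json.dumps(
        {"blocks": blocks, "relations": relations},
        sort_keys=True,
        separators=(",", ":"),
    )


def digest(form: dict) -> str:
    """Short stable identifier for a canonical form, usable as an index key."""
    return hashlib.sha256(canonicalize(form).encode("utf-8")).hexdigest()


first = {
    "blocks": [
        {"id": "b1", "text": "the battery is dead"},
        {"id": "b2", "text": "the car will not start"},
    ],
    "relations": [{"source": "b1", "target": "b2", "kind": "implies"}],
}
# Same content, different block order and key order.
second = {
    "relations": [{"kind": "implies", "target": "b2", "source": "b1"}],
    "blocks": [
        {"text": "the car will not start", "id": "b2"},
        {"text": "the battery is dead", "id": "b1"},
    ],
}

print(canonicalize(first) == canonicalize(second))  # True
print(digest(first) == digest(second))              # True
```

A stable string like this can be embedded, hashed, or diffed, which is what makes the structure usable for retrieval as well as for prompting.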
Next steps:
- Replace or augment the heuristic parser with an LLM-backed parser
- Broaden contradiction, negation, and arithmetic coverage in the translator
- Add embedding generation for raw text and canonical graph serializations
- Add hybrid retrieval experiments and scoring
- Measure whether logical retrieval beats text-only retrieval for selected tasks