TS-Benchmarks

TS-Benchmarks is the audit-first benchmark harness for the Thinking System ecosystem.

This is not a victory-lap repo. It is a falsification harness. The first result shows clean relaxation on some graph families and failure on scale-free graphs; diagnosing and fixing that scale-free failure is the next target.

The first implemented slice is Workstream A: scalable graph/tension tests for TS-Core-style relaxation. It creates deterministic synthetic graphs, injects contradictions, runs a sparse active-frontier relaxation loop, compares against simple baselines, and writes auditable receipts.
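As a rough illustration of the pipeline described above (seeded graph, injected contradictions, sparse active-frontier relaxation), here is a minimal self-contained sketch. It is not the code in ts_benchmarks: the function names, the ring-with-chords toy graph, the squared-disagreement tension metric, and the damped averaging update are all assumptions chosen to keep the example short.

```python
import random
from collections import deque


def ring_with_chords(n, chords, seed):
    """Deterministic toy graph (hypothetical): a ring plus seeded random shortcuts."""
    rng = random.Random(seed)
    adj = {u: {(u - 1) % n, (u + 1) % n} for u in range(n)}
    for _ in range(chords):
        u, v = rng.sample(range(n), 2)
        adj[u].add(v)
        adj[v].add(u)
    # Sorted neighbor lists keep iteration order stable across runs.
    return {u: sorted(vs) for u, vs in adj.items()}


def local_tension(adj, state, u):
    """Mean squared disagreement between u and its neighbors (assumed metric)."""
    return sum((state[u] - state[v]) ** 2 for v in adj[u]) / len(adj[u])


def total_tension(adj, state):
    """Sum of squared disagreement over undirected edges."""
    return sum((state[u] - state[v]) ** 2 for u in adj for v in adj[u] if u < v)


def relax(adj, state, threshold=1e-6, rate=0.5, max_updates=200_000):
    """Update only nodes on the active frontier; return the number of updates."""
    frontier = deque(u for u in adj if local_tension(adj, state, u) > threshold)
    queued = set(frontier)
    updates = 0
    while frontier and updates < max_updates:
        u = frontier.popleft()
        queued.discard(u)
        if local_tension(adj, state, u) <= threshold:
            continue  # already calm, so it leaves the frontier
        mean = sum(state[v] for v in adj[u]) / len(adj[u])
        state[u] += rate * (mean - state[u])  # damped step toward the neighbor mean
        updates += 1
        # The step changes tension at u and at its neighbors, so re-queue them.
        for v in (u, *adj[u]):
            if v not in queued:
                queued.add(v)
                frontier.append(v)
    return updates


def run_once(n=100, chords=20, contradictions=5, seed=42):
    adj = ring_with_chords(n, chords, seed)
    rng = random.Random(seed + 1)
    state = {u: 0.0 for u in adj}
    for u in rng.sample(range(n), contradictions):
        state[u] = 1.0  # injected contradiction against the all-zero background
    before = total_tension(adj, state)
    updates = relax(adj, state)
    return before, total_tension(adj, state), updates


before, after, updates = run_once()
print(f"tension {before:.3f} -> {after:.6f} in {updates} updates")
```

Because every step moves one node toward the minimizer of its own quadratic term, total tension cannot increase in this sketch. The real harness may use a different tension definition and frontier policy, so treat this only as a mental model.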

Current Status

Implemented:

deterministic graph generators: random, scale-free, small-world, knowledge-like, provenance, temporal, multi-context
TS-style tension relaxation reference implementation
degree and PageRank-like localization baselines
scaling CLIs with JSON receipts
receipt JSON schema
markdown/CSV report generation
smoke tests
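
One way to make a JSON receipt auditable is to store a digest of its canonical serialization, so a reader can detect later edits. The sketch below assumes that approach; the field names (`config`, `metrics`, `sha256`) and the placeholder metric values are hypothetical and do not reflect the repo's receipt schema or any measured result.

```python
import hashlib
import json


def _digest(body):
    """SHA-256 over a canonical JSON form (sorted keys, no extra whitespace)."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_receipt(config, metrics):
    """Bundle config and metrics with a digest of both (hypothetical layout)."""
    body = {"config": config, "metrics": metrics}
    return {**body, "sha256": _digest(body)}


def verify_receipt(receipt):
    """Recompute the digest from everything except the stored digest."""
    body = {k: v for k, v in receipt.items() if k != "sha256"}
    return _digest(body) == receipt["sha256"]


# Placeholder values for illustration only, not benchmark output.
receipt = make_receipt(
    config={"graph": "scale_free", "nodes": 1000, "seed": 42},
    metrics={"initial_tension": 1.0, "final_tension": 0.25},
)
print(json.dumps(receipt, indent=2, sort_keys=True))
print("valid:", verify_receipt(receipt))
```

A digest like this only detects accidental or unsophisticated modification; it is not a signature and does not prove who produced the receipt.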

Not implemented yet:

hard reasoning benchmark runners
TensionLM matched softmax campaign
GPU/Triton hardware microbenchmarks
hybrid Graph+LLM contradiction demo

Quickstart

python3 -m unittest discover -s tests -p 'test*.py'
python3 -m ts_benchmarks.runners.scale_graph --nodes 1000 --graph scale_free --seed 42 --out artifacts/scaling/scale_free_1000.json
python3 -m ts_benchmarks.runners.scale_sweep --sizes 100,1000,10000 --graphs random,scale_free,small_world --seed 42 --out-dir artifacts/scaling
python3 -m ts_benchmarks.reports.plot_scaling --in-dir artifacts/scaling --out-dir artifacts/scaling/report

v0.2 failure decomposition:

python3 -m ts_benchmarks.runners.scale_free_decomposition \
  --sizes 100,1000,10000 \
  --seed 42 \
  --out-dir artifacts/decomposition

v0.3 experimental hub-normalization ablation:

python3 -m ts_benchmarks.runners.hub_normalization_ablation \
  --sizes 100,1000,10000 \
  --graphs scale_free,random,small_world \
  --seed 42 \
  --out-dir artifacts/hub-normalization

v0.4 experimental topology-aware selector:

python3 -m ts_benchmarks.runners.topology_policy_selection \
  --sizes 100,1000,10000 \
  --graphs scale_free,random,small_world \
  --seed 42 \
  --out-dir artifacts/topology-policy

Optional plot generation uses dev dependencies only:

python3 -m venv .venv
.venv/bin/python -m pip install -r requirements-dev.txt
.venv/bin/python -m ts_benchmarks.reports.plot_scaling --in-dir artifacts/scaling --out-dir artifacts/scaling/report

Issue #1

Scale-free graphs retain high final tension under the reference relaxation config. See docs/issues/001-scale-free-residual-tension.md.
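
For readers unfamiliar with why scale-free graphs are a distinct stress case, the sketch below builds a seeded preferential-attachment graph and compares maximum degree to mean degree. It assumes a Barabási-Albert-style construction; the repo's own scale_free generator may differ, and `preferential_attachment` is a hypothetical name.

```python
import random


def preferential_attachment(n, m, seed):
    """Seeded Barabasi-Albert-style graph: each new node links to m existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    endpoints = []  # every node appears once per incident edge
    # Start from a small clique on m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for u in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional pick
        for v in sorted(targets):  # sorted for a stable insertion order
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    return adj


graph = preferential_attachment(n=1000, m=2, seed=42)
degrees = [len(vs) for vs in graph.values()]
mean_degree = sum(degrees) / len(degrees)
print(f"mean degree {mean_degree:.2f}, max degree {max(degrees)}")
```

A few early nodes accumulate far more edges than the average node, which is the hub structure the later decomposition and ablation work is concerned with.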

v0.2 Task

Scale-Free Failure Decomposition asks whether the scale-free failure is caused by hub dominance, bad damping/plateau behavior, poor contradiction scoring, or active-frontier policy. It intentionally diagnoses the failure before changing the relaxation algorithm.

v0.3 Experimental Branch

Hub-Normalized Relaxation Ablation tests three remedies against the reference config:

degree-normalized update
hub damping
residual redistribution

The branch is experimental. A positive result means the remedy helped seeded synthetic graphs, not that TS-Core scales cleanly.
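
To make two of these remedies concrete, the sketch below compares a single update of a hub node under three rules. It assumes the reference update sums the pull from every neighbor, which may not match the repo's reference config; the function names and the hub-damping formula are hypothetical. Residual redistribution is not sketched because the text above does not describe its mechanics.

```python
def step_reference(adj, state, u, rate):
    """Assumed reference rule: sum the pull from every neighbor, unnormalized."""
    return state[u] + rate * sum(state[v] - state[u] for v in adj[u])


def step_degree_normalized(adj, state, u, rate):
    """Divide the summed pull by the node's degree, so hubs take bounded steps."""
    pull = sum(state[v] - state[u] for v in adj[u])
    return state[u] + rate * pull / len(adj[u])


def step_hub_damped(adj, state, u, rate, mean_degree):
    """Shrink the rate only for nodes whose degree exceeds the mean degree."""
    damp = min(1.0, mean_degree / len(adj[u]))
    return state[u] + rate * damp * sum(state[v] - state[u] for v in adj[u])


# Star graph: node 0 is a hub with 10 leaves; the hub disagrees with every leaf.
leaves = list(range(1, 11))
star = {0: leaves, **{leaf: [0] for leaf in leaves}}
state = {0: 1.0, **{leaf: 0.0 for leaf in leaves}}
mean_degree = sum(len(vs) for vs in star.values()) / len(star)

reference = step_reference(star, state, 0, rate=0.5)
normalized = step_degree_normalized(star, state, 0, rate=0.5)
damped = step_hub_damped(star, state, 0, rate=0.5, mean_degree=mean_degree)
print(reference, normalized, damped)
```

Under these assumptions the unnormalized rule overshoots far past the leaves' value on a high-degree node, while both remedies keep the hub between its old value and the neighbor consensus.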

v0.4 Experimental Branch

Topology-Aware Relaxation Policy Selection tests whether pre-run topology diagnostics can select a safer policy:

hub-heavy graphs select degree normalization
non-hub-heavy graphs keep the reference active-frontier policy

The selector uses graph diagnostics only, not outcome metrics.
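
A selector of this kind can be sketched as a pure function of the graph. The diagnostic used here (maximum degree divided by mean degree), the threshold of 3.0, and the policy labels are assumptions for illustration, not the repo's actual diagnostics or cutoff.

```python
def hub_heaviness(adj):
    """Ratio of max degree to mean degree; 1.0 for a regular graph."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    mean = sum(degrees) / len(degrees)
    return max(degrees) / mean if mean > 0 else 0.0


def select_policy(adj, threshold=3.0):
    """Choose a relaxation policy from topology alone (hypothetical threshold)."""
    if hub_heaviness(adj) > threshold:
        return "degree_normalized"
    return "reference_active_frontier"


# A star is hub-heavy; a ring is perfectly regular.
star = {0: list(range(1, 11)), **{leaf: [0] for leaf in range(1, 11)}}
ring = {u: [(u - 1) % 12, (u + 1) % 12] for u in range(12)}

print(select_policy(star))
print(select_policy(ring))
```

Note that `select_policy` never sees tension values or run outcomes, which mirrors the constraint that selection must happen before the run.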

This is not yet evidence of universal scaling; it is evidence that topology diagnostics can prevent a known policy tradeoff in the current synthetic sweep.

Claim Boundary

This repo does not prove TS is a transformer replacement. It measures specific graph/tension behavior under specified configs and emits receipts that can be audited.
