psi-loop is a small Python library for redundancy-aware context selection. Instead of ranking candidates by similarity alone, it scores them by two signals: how relevant they are to the goal, and how novel they are relative to what is already in context. The core rule is Psi0: prefer context that is both useful and non-redundant.
The package is still intentionally narrow, but it is no longer just a hardcoded demo. It now exposes pluggable embedders, pluggable candidate sources, a thin PsiLoop orchestration layer, and a zero-dependency bag-of-words fallback so the ranking thesis can be exercised locally before introducing heavier retrieval or embedding backends.
Standard retrieval tends to return whatever looks semantically similar to the goal, even when that context is repetitive or stale. psi-loop explores a different ranking rule:
- `V`: value relative to the goal
- `H`: surprise relative to the current context
- `Psi0 = H * V`
Ranking uses linear `V × H` scoring with near-tie value-priority: when two candidates' scores differ by less than `NEAR_TIE_EPSILON` (0.01), the candidate with the higher value is ranked first, so budget packing prefers usefulness over novelty when scores are effectively tied.
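As a sketch of that tie-handling rule (the `Candidate` dataclass and `rank` helper below are hypothetical stand-ins, not the library's actual API):

```python
# Illustrative sketch of linear V*H scoring with near-tie value-priority.
# Candidate and rank are hypothetical names; only NEAR_TIE_EPSILON and the
# Psi0 = H * V rule come from the README.
from dataclasses import dataclass

NEAR_TIE_EPSILON = 0.01  # score gap below which value breaks the tie


@dataclass
class Candidate:
    text: str
    value: float     # V: relevance to the goal
    surprise: float  # H: novelty relative to current context

    @property
    def score(self) -> float:
        return self.value * self.surprise  # Psi0 = H * V


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by score descending, then let value win adjacent near-ties."""
    ordered = sorted(candidates, key=lambda c: c.score, reverse=True)
    for i in range(len(ordered) - 1):
        a, b = ordered[i], ordered[i + 1]
        # Within the epsilon band, prefer the more goal-relevant candidate.
        if abs(a.score - b.score) < NEAR_TIE_EPSILON and b.value > a.value:
            ordered[i], ordered[i + 1] = b, a
    return ordered
```

A near-tie (scores 0.451 vs 0.45) resolves toward the higher-value candidate, while a clear score gap is left alone.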
The goal of this repo is not to ship a full agent system yet. The goal is to prove that this ranking rule is worth keeping, and to make it easy to plug into richer retrieval systems later.
Included so far (`Psi0` only):

- Pluggable embedders and candidate sources
- Zero-dependency default behavior
- Budgeted context selection
- Baseline comparison against plain goal similarity
- Test fixtures and GitHub Actions CI
Not included yet:
- `Psi1` mid-inference hooks
- Learning/calibration loop from `Psi0` to `Psi2`
- Dual memory store, orchestration layer, or HITL middleware
- External model or embedding services
```shell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e .[dev]
```

List the bundled sample tasks:

```shell
psi-loop --list-tasks
```

Run the bundled demo task:

```shell
psi-loop --task retry_backoff
```

You can still point the CLI at your own fixture file:

```shell
psi-loop --fixture tests/fixtures/sample_tasks.json --task retry_backoff
```

Run the test suite:

```shell
pytest
```

Run the benchmark evaluation:

```shell
python scripts/run_baseline_vs_psi0.py --backend bow
python scripts/run_baseline_vs_psi0.py --backend dense
```

The current package is organized around three main extension points, plus a forensics helper:
- `Embedder`: text-to-vector protocol
- `CandidateSource`: candidate retrieval protocol
- `PsiLoop`: thin orchestration shell that fetches candidates, scores them, and fits them to a budget
- `build_task_forensics` / `render_task_forensics`: structured forensic view comparing `Psi0` and baseline on a single task
The default install path remains zero-dependency:
- `BowEmbedder` is the fallback embedder
- `FixtureSource` is the default demo source
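As an illustration of how that protocol seam works (structural typing, so implementations need no inheritance), here is a toy sketch. The method names `embed` and `candidates`, and the `InMemorySource` class, are assumptions for illustration, not the package's actual signatures:

```python
# Toy sketch of the two extension protocols as structural types.
# The real definitions live in psi_loop; method names here are assumptions.
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


@runtime_checkable
class CandidateSource(Protocol):
    def candidates(self, goal: str) -> list[str]: ...


class InMemorySource:
    """Toy source over a fixed pool, standing in for FixtureSource."""

    def __init__(self, pool: list[str]):
        self.pool = pool

    def candidates(self, goal: str) -> list[str]:
        # A real source would filter or retrieve by goal; this one does not.
        return list(self.pool)
```

Because the protocols are structural, any object with a matching `candidates` method satisfies `CandidateSource` without subclassing it.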
Psi0 combines two simple signals:
- `V`: keyword overlap between a candidate and the goal
- `H`: surprise relative to the current context
The package ranks candidates by `H * V` (with near-tie value-priority at selection time), then fits the result into a shared token budget. A similarity-only baseline is included for comparison so fixtures can demonstrate where goal-conditioned salience beats naive retrieval.
By default, H is computed through the bundled `BowEmbedder`, which produces L2-normalized bag-of-words vectors so that short and long chunks contribute equally to the context centroid. The scoring functions also accept injected embedders, which is the seam intended for future dense-vector backends.
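A minimal sketch of what an L2-normalized bag-of-words embedding does (illustrative only; the bundled `BowEmbedder`'s tokenizer, stemming, and data layout may differ):

```python
# Sketch of L2-normalized bag-of-words vectors, stored as sparse dicts.
# Real BowEmbedder internals may differ; this only illustrates why the
# normalization makes short and long chunks contribute equally.
import math
from collections import Counter


def bow_embed(text: str) -> dict[str, float]:
    """Whitespace-tokenized, lowercased word counts, scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # With unit vectors, cosine similarity reduces to a plain dot product.
    return sum(v * b.get(w, 0.0) for w, v in a.items())
```

Because every vector has unit length, a long chunk cannot dominate the context centroid simply by containing more words.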
Selection is iterative by default: after each candidate is chosen, it is appended to the running context before the remaining candidates are re-scored, so already-selected content suppresses redundant subsequent picks. Pass `iterative=False` to `select_context` or `PsiLoop.select` to restore the original single-pass behavior.
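The iterative mode can be sketched as a greedy re-scoring loop; `score` and `cost` below are hypothetical stand-ins for the package's `Psi0` scoring and token costing, not its real functions:

```python
# Greedy iterative selection sketch: each pick is folded into the context
# before the remaining pool is re-scored, so redundancy is suppressed.
# score(candidate, context) and cost(candidate) are hypothetical callables.
def select_iteratively(candidates, context, budget, score, cost):
    selected = []
    remaining = list(candidates)
    used = 0
    while remaining:
        # Re-score against the context grown by the previous picks.
        best = max(remaining, key=lambda c: score(c, context + selected))
        if used + cost(best) > budget:
            break  # simple sketch: stop at the first over-budget pick
        selected.append(best)
        used += cost(best)
        remaining.remove(best)
    return selected
```

With a novelty-style `score`, a duplicate of an already-selected chunk drops to zero on the second pass and loses to genuinely new material.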
In the bundled example, the baseline prefers a note that repeats the existing fixed-delay retry policy, while Psi0 prefers the more novel note about exponential backoff with jitter.
If you want to evaluate whether the project is doing anything useful yet, use this sequence:
- Run `psi-loop --list-tasks` to confirm the bundled demo data is available.
- Run `psi-loop --task retry_backoff` and compare the `Psi0` selection against the baseline.
- Open `src/psi_loop/data/sample_tasks.json` and change the goal, current context, or candidates to see how ranking changes.
- Run `pytest` to make sure your changes did not break the existing behavior.
For the more formal benchmark path:
- Run `python scripts/run_baseline_vs_psi0.py --backend bow` for the zero-dependency baseline comparison.
- Optionally install the dense extras with `python -m pip install -e .[dense]` and run `python scripts/run_baseline_vs_psi0.py --backend dense`.
- Inspect `evaluation_results_baseline_vs_psi0_bow.json` and `evaluation_results_baseline_vs_psi0_dense_all-MiniLM-L6-v2.json` for structured task-level outputs, including embedder metadata. Dense output filenames are model-specific, so runs with different `--model-name` values do not overwrite each other.
- Read `evaluation_baseline_vs_psi0.md` for the scientist-style interpretation of the latest run.
The fastest way to learn the system is to tweak the fixture and rerun the CLI. That gives you immediate feedback on whether the ranking rule is behaving intuitively.
If you want to test the new protocol seam rather than just the demo:
- Inject a fake embedder in tests to force a known vector geometry.
- Use `FixtureSource` in tests or scripts to load candidate pools without going through CLI parsing.
- Instantiate `PsiLoop(source=..., embedder=...)` and compare its ranked output to `select_context_baseline(...)`.
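A fake embedder for tests can be as small as a lookup table; the `FakeEmbedder` class and `surprise` helper below are illustrative test scaffolding, not part of the package:

```python
# Sketch of injecting a fake embedder to force a known vector geometry.
# FakeEmbedder and surprise are hypothetical test helpers; the real seam
# is the Embedder protocol (structural, so no inheritance is needed).
class FakeEmbedder:
    """Maps each text to a fixed vector so test geometry is deterministic."""

    def __init__(self, table: dict[str, list[float]]):
        self.table = table

    def embed(self, text: str) -> list[float]:
        return self.table[text]


def surprise(embedder, context_text: str, candidate: str) -> float:
    # Hypothetical surprise measure: 1 - cosine similarity on unit vectors.
    ctx = embedder.embed(context_text)
    vec = embedder.embed(candidate)
    return 1.0 - sum(a * b for a, b in zip(ctx, vec))


fake = FakeEmbedder({
    "context": [1.0, 0.0],
    "repeat":  [1.0, 0.0],   # identical direction: zero surprise
    "novel":   [0.0, 1.0],   # orthogonal: maximal surprise
})
```

With the geometry pinned down, a test can assert that the orthogonal candidate always out-scores the duplicate, independent of tokenization details.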
- `src/psi_loop/embedders.py`: `Embedder` protocol, `BowEmbedder`, and shared vector math
- `src/psi_loop/sources.py`: `CandidateSource` protocol and `FixtureSource`
- `src/psi_loop/scoring.py`: scoring primitives for `V`, `H`, and `Psi0`
- `src/psi_loop/pipeline.py`: `PsiLoop`, ranking helpers, and budget fitting
- `src/psi_loop/baseline.py`: similarity-only comparison path
- `src/psi_loop/evaluation.py`: benchmark evaluation helpers and aggregation logic
- `src/psi_loop/cli.py`: fixture-backed CLI for local experiments
- `src/psi_loop/data/sample_tasks.json`: bundled demo data for first-run testing
- `tests/fixtures/benchmark_tasks.json`: hybrid benchmark task set for baseline-vs-`Psi0` evaluation
- `scripts/run_baseline_vs_psi0.py`: one-command benchmark runner
- `evaluation_baseline_vs_psi0.md`: latest written evaluation report
- `tests/`: scoring, pipeline, and fixture-based regression tests
- Tokenization and stemming are intentionally simple heuristics.
- The default embedder is still bag-of-words; the dense backend is optional and still under evaluation.
- `FixtureSource` is a demo source, not a production retrieval layer.
- The bundled benchmark is still small and hand-authored, not yet a broad external validation set.
- Add a dense embedder implementation behind the `Embedder` protocol
- Add richer source implementations behind `CandidateSource`
- Add `Psi1` hooks for triggered retrieval during reasoning
- Add post-action usefulness tracking and `Psi0`/`Psi2` calibration
- Expand fixtures into a repeatable evaluation harness
MIT. See LICENSE.