psi-loop is a small Python library for redundancy-aware context selection. Instead of ranking candidates by similarity alone, it scores them by two signals: how relevant they are to the goal, and how novel they are relative to what is already in context. The core rule is Psi0: prefer context that is both useful and non-redundant.
The package is still intentionally narrow, but it is no longer just a hardcoded demo. It now exposes pluggable embedders, pluggable candidate sources, a thin PsiLoop orchestration layer, and a zero-dependency bag-of-words fallback so the ranking thesis can be exercised locally before introducing heavier retrieval or embedding backends.
Standard retrieval tends to return whatever looks semantically similar to the goal, even when that context is repetitive or stale. psi-loop explores a different ranking rule:
- `V`: value relative to the goal
- `H`: surprise relative to the current context
- `Psi0 = H * V`
Ranking uses linear `V × H` scoring with near-tie value-priority: when two candidates' scores differ by less than `NEAR_TIE_EPSILON` (0.01), the candidate with the higher value is ranked first, so budget packing prefers usefulness over novelty when scores are effectively tied.
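As a sketch of that tie-handling rule (the `Candidate` dataclass and `rank` helper below are hypothetical stand-ins, not the library's actual API):

```python
# Illustrative sketch of linear V*H scoring with near-tie value-priority.
# Candidate and rank are hypothetical names; only NEAR_TIE_EPSILON and the
# Psi0 = H * V rule come from the README.
from dataclasses import dataclass

NEAR_TIE_EPSILON = 0.01  # score gap below which value breaks the tie


@dataclass
class Candidate:
    text: str
    value: float     # V: relevance to the goal
    surprise: float  # H: novelty relative to current context

    @property
    def score(self) -> float:
        return self.value * self.surprise  # Psi0 = H * V


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by score descending, then let value win adjacent near-ties."""
    ordered = sorted(candidates, key=lambda c: c.score, reverse=True)
    for i in range(len(ordered) - 1):
        a, b = ordered[i], ordered[i + 1]
        # Within the epsilon band, prefer the more goal-relevant candidate.
        if abs(a.score - b.score) < NEAR_TIE_EPSILON and b.value > a.value:
            ordered[i], ordered[i + 1] = b, a
    return ordered
```

A near-tie (scores 0.451 vs 0.45) resolves toward the higher-value candidate, while a clear score gap is left alone.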
The goal of this repo is not to ship a full agent system yet. The goal is to prove that this ranking rule is worth keeping, and to make it easy to plug into richer retrieval systems later.
Included so far (`Psi0` only):

- Pluggable embedders and candidate sources
- Zero-dependency default behavior
- Budgeted context selection
- Baseline comparison against plain goal similarity
- Test fixtures and GitHub Actions CI
Not included yet:
- `Psi1` mid-inference hooks
- Learning/calibration loop from `Psi0` to `Psi2`
- Dual memory store, orchestration layer, or HITL middleware
- External model or embedding services
```shell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e .[dev]
```

List the bundled sample tasks:

```shell
psi-loop --list-tasks
```

Run the bundled demo task:

```shell
psi-loop --task retry_backoff
```

You can still point the CLI at your own fixture file:

```shell
psi-loop --fixture tests/fixtures/sample_tasks.json --task retry_backoff
```

Run the test suite:

```shell
pytest
```

Run the benchmark evaluation:

```shell
python scripts/run_baseline_vs_psi0.py --backend bow
python scripts/run_baseline_vs_psi0.py --backend dense
```

The current package is organized around three main extension points, plus a forensics helper:
- `Embedder`: text-to-vector protocol
- `CandidateSource`: candidate retrieval protocol
- `PsiLoop`: thin orchestration shell that fetches candidates, scores them, and fits them to a budget
- `build_task_forensics` / `render_task_forensics`: structured forensic view comparing `Psi0` and baseline on a single task
The default install path remains zero-dependency:
- `BowEmbedder` is the fallback embedder
- `FixtureSource` is the default demo source
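As an illustration of how that protocol seam works (structural typing, so implementations need no inheritance), here is a toy sketch. The method names `embed` and `candidates`, and the `InMemorySource` class, are assumptions for illustration, not the package's actual signatures:

```python
# Toy sketch of the two extension protocols as structural types.
# The real definitions live in psi_loop; method names here are assumptions.
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


@runtime_checkable
class CandidateSource(Protocol):
    def candidates(self, goal: str) -> list[str]: ...


class InMemorySource:
    """Toy source over a fixed pool, standing in for FixtureSource."""

    def __init__(self, pool: list[str]):
        self.pool = pool

    def candidates(self, goal: str) -> list[str]:
        # A real source would filter or retrieve by goal; this one does not.
        return list(self.pool)
```

Because the protocols are structural, any object with a matching `candidates` method satisfies `CandidateSource` without subclassing it.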
Psi0 combines two simple signals:
- `V`: keyword overlap between a candidate and the goal
- `H`: surprise relative to the current context
The package ranks candidates by `H * V` (with near-tie value-priority at selection time), then fits the result into a shared token budget. A similarity-only baseline is included for comparison so fixtures can demonstrate where goal-conditioned salience beats naive retrieval.
By default, H is computed through the bundled `BowEmbedder`, which produces L2-normalized bag-of-words vectors so that short and long chunks contribute equally to the context centroid. The scoring functions also accept injected embedders, which is the seam intended for future dense-vector backends.
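A minimal sketch of what an L2-normalized bag-of-words embedding does (illustrative only; the bundled `BowEmbedder`'s tokenizer, stemming, and data layout may differ):

```python
# Sketch of L2-normalized bag-of-words vectors, stored as sparse dicts.
# Real BowEmbedder internals may differ; this only illustrates why the
# normalization makes short and long chunks contribute equally.
import math
from collections import Counter


def bow_embed(text: str) -> dict[str, float]:
    """Whitespace-tokenized, lowercased word counts, scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # With unit vectors, cosine similarity reduces to a plain dot product.
    return sum(v * b.get(w, 0.0) for w, v in a.items())
```

Because every vector has unit length, a long chunk cannot dominate the context centroid simply by containing more words.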
Selection is iterative by default: after each candidate is chosen, it is appended to the running context before the remaining candidates are re-scored, so already-selected content suppresses redundant subsequent picks. Pass `iterative=False` to `select_context` or `PsiLoop.select` to restore the original single-pass behavior.
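The iterative mode can be sketched as a greedy re-scoring loop; `score` and `cost` below are hypothetical stand-ins for the package's `Psi0` scoring and token costing, not its real functions:

```python
# Greedy iterative selection sketch: each pick is folded into the context
# before the remaining pool is re-scored, so redundancy is suppressed.
# score(candidate, context) and cost(candidate) are hypothetical callables.
def select_iteratively(candidates, context, budget, score, cost):
    selected = []
    remaining = list(candidates)
    used = 0
    while remaining:
        # Re-score against the context grown by the previous picks.
        best = max(remaining, key=lambda c: score(c, context + selected))
        if used + cost(best) > budget:
            break  # simple sketch: stop at the first over-budget pick
        selected.append(best)
        used += cost(best)
        remaining.remove(best)
    return selected
```

With a novelty-style `score`, a duplicate of an already-selected chunk drops to zero on the second pass and loses to genuinely new material.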
In the bundled example, the baseline prefers a note that repeats the existing fixed-delay retry policy, while Psi0 prefers the more novel note about exponential backoff with jitter.
If you want to evaluate whether the project is doing anything useful yet, use this sequence:
- Run `psi-loop --list-tasks` to confirm the bundled demo data is available.
- Run `psi-loop --task retry_backoff` and compare the `Psi0` selection against the baseline.
- Open `src/psi_loop/data/sample_tasks.json` and change the goal, current context, or candidates to see how ranking changes.
- Run `pytest` to make sure your changes did not break the existing behavior.
For the more formal benchmark path:
- Run `python scripts/run_baseline_vs_psi0.py --backend bow` for the zero-dependency baseline comparison.
- Optionally install the dense extras with `python -m pip install -e .[dense]` and run `python scripts/run_baseline_vs_psi0.py --backend dense`.
- Inspect `evaluation_results_baseline_vs_psi0_bow.json` and `evaluation_results_baseline_vs_psi0_dense_all-MiniLM-L6-v2.json` for structured task-level outputs, including embedder metadata. Dense output filenames are model-specific, so runs with different `--model-name` values do not overwrite each other.
- Read `evaluation_baseline_vs_psi0.md` for the scientist-style interpretation of the latest run.
The fastest way to learn the system is to tweak the fixture and rerun the CLI. That gives you immediate feedback on whether the ranking rule is behaving intuitively.
If you want to test the new protocol seam rather than just the demo:
- Inject a fake embedder in tests to force a known vector geometry.
- Use `FixtureSource` in tests or scripts to load candidate pools without going through CLI parsing.
- Instantiate `PsiLoop(source=..., embedder=...)` and compare its ranked output to `select_context_baseline(...)`.
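A fake embedder for tests can be as small as a lookup table; the `FakeEmbedder` class and `surprise` helper below are illustrative test scaffolding, not part of the package:

```python
# Sketch of injecting a fake embedder to force a known vector geometry.
# FakeEmbedder and surprise are hypothetical test helpers; the real seam
# is the Embedder protocol (structural, so no inheritance is needed).
class FakeEmbedder:
    """Maps each text to a fixed vector so test geometry is deterministic."""

    def __init__(self, table: dict[str, list[float]]):
        self.table = table

    def embed(self, text: str) -> list[float]:
        return self.table[text]


def surprise(embedder, context_text: str, candidate: str) -> float:
    # Hypothetical surprise measure: 1 - cosine similarity on unit vectors.
    ctx = embedder.embed(context_text)
    vec = embedder.embed(candidate)
    return 1.0 - sum(a * b for a, b in zip(ctx, vec))


fake = FakeEmbedder({
    "context": [1.0, 0.0],
    "repeat":  [1.0, 0.0],   # identical direction: zero surprise
    "novel":   [0.0, 1.0],   # orthogonal: maximal surprise
})
```

With the geometry pinned down, a test can assert that the orthogonal candidate always out-scores the duplicate, independent of tokenization details.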
- `src/psi_loop/embedders.py`: `Embedder` protocol, `BowEmbedder`, and shared vector math
- `src/psi_loop/sources.py`: `CandidateSource` protocol and `FixtureSource`
- `src/psi_loop/scoring.py`: scoring primitives for `V`, `H`, and `Psi0`
- `src/psi_loop/pipeline.py`: `PsiLoop`, ranking helpers, and budget fitting
- `src/psi_loop/baseline.py`: similarity-only comparison path
- `src/psi_loop/evaluation.py`: benchmark evaluation helpers and aggregation logic
- `src/psi_loop/cli.py`: fixture-backed CLI for local experiments
- `src/psi_loop/data/sample_tasks.json`: bundled demo data for first-run testing
- `tests/fixtures/benchmark_tasks.json`: hybrid benchmark task set for baseline-vs-`Psi0` evaluation
- `scripts/run_baseline_vs_psi0.py`: one-command benchmark runner
- `evaluation_baseline_vs_psi0.md`: latest written evaluation report
- `tests/`: scoring, pipeline, and fixture-based regression tests
- Tokenization and stemming are intentionally simple heuristics.
- The default embedder is still bag-of-words; the dense backend is optional and still under evaluation.
- `FixtureSource` is a demo source, not a production retrieval layer.
- The bundled benchmark is still small and hand-authored, not yet a broad external validation set.
- Add a dense embedder implementation behind the `Embedder` protocol
- Add richer source implementations behind `CandidateSource`
- Add `Psi1` hooks for triggered retrieval during reasoning
- Add post-action usefulness tracking and `Psi0`/`Psi2` calibration
- Expand fixtures into a repeatable evaluation harness
MIT. See LICENSE.