Ballast

Ground, guard, and grade your LLM apps so they don't capsize into hallucination.

Three LLM-engineering subsystems that compose over one shared core, built as a portfolio project:

Self-healing RAG (rag/) - a LangGraph stateful graph that retrieves, grades its documents, generates a cited answer, critiques that answer for groundedness and relevancy, and either re-retrieves with a rewritten query or declines gracefully instead of hallucinating.
Guardrails gateway (gateway/) - middleware wrapping the RAG that enforces input guardrails (size, PII, secrets, prompt injection), output guardrails (schema, toxicity, topicality), and a configurable YAML policy layer (must-cite, no personalized/medical/legal advice). A block short-circuits to a safe fallback, and the gateway can auto-retry with a correction.
Eval CI/CD (eval/) - a golden Q&A set, faithfulness / relevancy / hallucination / refusal / context-precision / latency / cost metrics, a local gate that fails the build on regression, a committed metrics ledger, and a trend dashboard.

The RAG knowledge base is a public, non-sensitive finance/investing corpus (paraphrased investor.gov material). No private data of any kind belongs in this repo.

Architecture

The self-healing RAG graph (rag/graph.py):

flowchart TD
    START --> retrieve
    retrieve --> grade_documents
    grade_documents -->|relevant docs| generate
    grade_documents -->|none relevant| rewrite_query
    grade_documents -->|node failed| fallback
    generate -->|ok| critic
    generate -->|node failed| fallback
    critic -->|grounded and relevant| END
    critic -->|retries left| rewrite_query
    critic -->|retries exhausted| fallback
    rewrite_query --> retrieve
    fallback --> END
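
The edges in the flowchart can be read as a plain routing function. The sketch below is not the code in rag/graph.py and does not use LangGraph; it only mirrors the arrows. The `RunState` fields, the default retry budget, and `next_node` itself are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Node names are copied from the flowchart; everything else is hypothetical.
END = "END"


@dataclass
class RunState:
    """Assumed per-run state; the real graph state may look different."""
    relevant_docs: int = 0      # documents that passed grading
    node_failed: bool = False   # set when the current node errored
    grounded: bool = False      # critic verdict: answer is supported by the docs
    relevant: bool = False      # critic verdict: answer addresses the question
    retries_left: int = 2       # assumed retry budget


def next_node(current: str, state: RunState) -> str:
    """Return the node that follows `current`, one branch per flowchart edge."""
    if current == "START":
        return "retrieve"
    if current == "retrieve":
        return "grade_documents"
    if current == "grade_documents":
        if state.node_failed:
            return "fallback"
        return "generate" if state.relevant_docs > 0 else "rewrite_query"
    if current == "generate":
        return "fallback" if state.node_failed else "critic"
    if current == "critic":
        if state.grounded and state.relevant:
            return END
        return "rewrite_query" if state.retries_left > 0 else "fallback"
    if current == "rewrite_query":
        return "retrieve"
    if current == "fallback":
        return END
    raise ValueError(f"unknown node: {current}")
```

The point of the shape is that every failure path ends in fallback rather than in an ungrounded answer: the critic can loop back only while retries remain.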

The full request path: input guardrails -> RAG graph -> output + policy guardrails, with an audit trail, cost metering, and a per-run trace on every call.
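
As a rough illustration of that request path, the sketch below chains two input checks, a RAG callable, and a must-cite policy check, and records an audit event for each rule. It is a simplified stand-in, not the gateway's implementation: the size limit, the secret pattern, the fallback text, and all class and function names are assumptions made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

MAX_INPUT_CHARS = 2000  # assumed limit, not Ballast's actual setting
# Illustrative pattern only: matches strings shaped like "sk-..." API keys.
SECRET_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b")
FALLBACK_TEXT = "I can't help with that request."  # assumed wording


@dataclass
class AuditEvent:
    rule: str
    action: str  # "allow" or "block"


@dataclass
class Response:
    text: str
    citations: list[str]
    blocked: bool = False
    audit: list[AuditEvent] = field(default_factory=list)


def check_input(question: str) -> list[AuditEvent]:
    """Run the input guardrails and report one audit event per rule."""
    too_long = len(question) > MAX_INPUT_CHARS
    has_secret = SECRET_PATTERN.search(question) is not None
    return [
        AuditEvent("max_size", "block" if too_long else "allow"),
        AuditEvent("secrets", "block" if has_secret else "allow"),
    ]


def process(question: str, rag: Callable[[str], tuple[str, list[str]]]) -> Response:
    """Input guardrails -> RAG -> must-cite policy, with an audit trail."""
    audit = check_input(question)
    if any(event.action == "block" for event in audit):
        # Short-circuit: the RAG (and the LLM behind it) is never called.
        return Response(FALLBACK_TEXT, [], blocked=True, audit=audit)
    text, citations = rag(question)
    must_cite = AuditEvent("must_cite", "allow" if citations else "block")
    audit.append(must_cite)
    if must_cite.action == "block":
        return Response(FALLBACK_TEXT, [], blocked=True, audit=audit)
    return Response(text, citations, audit=audit)
```

Passing the RAG in as a callable keeps the guardrail logic testable without a model: any function returning an answer and its citations will do.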

Stack

Python 3.11 - LangGraph - Claude API (anthropic SDK) behind a swappable core.llm.LLMClient - sentence-transformers embeddings (hashing fallback) - Chroma or in-memory vector store - Pydantic. The Anthropic key is read from AWS Secrets Manager at runtime (env var override for local dev).

Quickstart

# 1. install (editable, with dev tools)
python -m pip install -e ".[dev]"

# 2. provide a key: either AWS creds so Secrets Manager resolves ai-blog/anthropic-api-key,
#    or set a local override
cp .env.example .env        # then optionally set ANTHROPIC_API_KEY for local dev

# 3. ask a question through the full gateway + RAG stack (it indexes the corpus in-process)
make ask Q="Why does diversification reduce risk?"

No `make`? Use the module forms

Every target is a thin wrapper, so on Windows or anywhere without make you can call the modules directly (this is the supported path on those machines):

python -m ballast.rag.ask "Why does diversification reduce risk?"   # full gateway + RAG, cited answer
python -m ballast.rag.ask "Should I buy this stock?" --show-trace   # also print the node-by-node run
python -m ballast.core.ingest                                       # load corpus/*.md into the store
python -m ballast.eval.run_cli --limit 5                            # run part of the golden set
python scripts/local_ci.py                                          # the full local gate
python -m pytest                                                    # 222 tests, no API key needed

Commands

make ask Q="..."   # ask the gateway-wrapped RAG a question (add --show-trace via the module)
make ingest        # load corpus/*.md into the vector store
make eval          # run the golden set, write metrics + append the ledger (LIMIT=N for a subset)
make eval-gate     # check the latest eval run against thresholds (exit non-zero on regression)
make dashboard     # render the metrics trend dashboard to eval/dashboard/index.html
make local-ci      # the merge gate: secret scan, dep audit, ruff, mypy, pytest, eval gate

Use as a library

Ballast is a package, not just a CLI. Wrap the gateway around your own corpus and call it from your app; gateway.process(...) returns the answer, its citations, whether a guardrail blocked it, and the full audit trail.

from ballast.core.trace import Trace
from ballast.rag.ask import build_gateway

trace = Trace(run_id="my-app")
gateway = build_gateway(trace)                 # real Claude client, key from Secrets Manager
resp = gateway.process("How does compounding work?")

print(resp.answer.text)                        # the grounded answer
print(resp.answer.citations)                   # [{title, source, url, chunk_id}, ...]
print(resp.blocked, [e.rule for e in resp.audit if e.action == "block"])

To drive it in tests or offline with no API key, inject a fake client:

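from ballast.core.testing import FakeLLMClient
from ballast.core.trace import Trace
from ballast.rag.ask import build_gateway

trace = Trace(run_id="my-app")
gateway = build_gateway(trace, base_client=FakeLLMClient(["a scripted answer"]))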

This is exactly how the test suite exercises the whole stack without a key.
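
For readers unfamiliar with the pattern, a scripted fake can be as small as the sketch below. It is not Ballast's FakeLLMClient, and the `complete` method is an assumed interface, not the actual core.llm.LLMClient signature; `TextClient`, `ScriptedClient`, and `answer` are hypothetical names.

```python
from typing import Protocol


class TextClient(Protocol):
    """Hypothetical stand-in for a swappable client interface."""

    def complete(self, prompt: str) -> str: ...


class ScriptedClient:
    """Returns canned replies in order and records the prompts it received."""

    def __init__(self, replies: list[str]) -> None:
        self._replies = list(replies)
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            raise RuntimeError("script exhausted: no replies left")
        return self._replies.pop(0)


def answer(question: str, client: TextClient) -> str:
    # The code under test depends only on the interface, never on a concrete SDK.
    return client.complete(f"Answer briefly: {question}")
```

Recording the prompts lets a test assert on what was sent to the model as well as on what came back.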

Configuration

Strategy and guardrail options are config-driven (core.config.Settings, via env or .env), so the eval matrix can vary them without code edits: retrieval (vector | hybrid), rerank, query_transform (none | multi_query | hyde), vector_store (memory | chroma), embedder, max_retries, and the optional llm_injection_guard / toxicity_guard / topicality_guard.

A note on CI

GitHub Actions is not used. The merge gate runs locally via make local-ci (the eval gate fails the build when the hallucination rate or injection block rate breaches its threshold or p95 latency regresses), with a committed metrics ledger (eval/history.jsonl) carrying the trend across sessions.

Status

Specs live in specs/DESIGN.md. The build queue is specs/BACKLOG.md, drained one item per run by the local backlog loop.
