Release v0.2.0 — Foundation Hardening (5 axes engineering-complete) · Hashevolution/James-RAG-Evol

v0.2.0 — Foundation Hardening

After 44 PRs merged over the 44 days since v0.1.4 (127 unit tests passing, 0 open issues at release time), JAMES exits the v0.2 cycle with five of six axes engineering-complete. Axis 6's remaining gate, a second user running the bench end-to-end on their own corpus, is not code work: it is now in the self-feedback and recruitment phase.

This is the "trustworthy enough to recommend to one other person" milestone described for v0.2.0 in ROADMAP.md.

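Summary (translated from Korean)

JAMES has completed five of the v0.2 Foundation Hardening axes from an engineering standpoint. Code work for the sixth axis (real-data validation) is mostly done. The remaining gate, "a second user runs the bench on their own corpus," is not code but a user-recruitment and self-feedback phase.

This release ships with 44 PRs merged since v0.1.4, 127 unit tests passing, and 0 open issues.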

What's new (axis-by-axis)

Axis 1 — Architecture Separation ✅

core/reasoning_engine.py (51 KB monolith) split into:

core/reasoning/engine.py — orchestration only (16 KB)
core/reasoning/pipeline.py — RAG retrieval pipeline
core/reasoning/modes.py — 4 mode handlers (chat / wiki_edit / self_evolve / coding) + new meta mode

core/memory_*.py consolidated into core/memory/ package with documented public API.

PRs: #35, #37, #38, #39, #50.
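With engine.py reduced to orchestration, mode-specific logic can sit behind a lookup table. The sketch below assumes a simple name-to-callable registry; the handler bodies, the `MODE_HANDLERS` dict, and `dispatch()` are hypothetical and are not the real API of core/reasoning/modes.py.

```python
from typing import Callable, Dict

# Hypothetical handler signature: query text in, answer text out.
Handler = Callable[[str], str]

def handle_chat(query: str) -> str:
    return f"[chat] {query}"

def handle_wiki_edit(query: str) -> str:
    return f"[wiki_edit] {query}"

def handle_self_evolve(query: str) -> str:
    return f"[self_evolve] {query}"

def handle_coding(query: str) -> str:
    return f"[coding] {query}"

def handle_meta(query: str) -> str:
    return f"[meta] {query}"

# Registry owned by the modes module; the engine only looks names up here.
MODE_HANDLERS: Dict[str, Handler] = {
    "chat": handle_chat,
    "wiki_edit": handle_wiki_edit,
    "self_evolve": handle_self_evolve,
    "coding": handle_coding,
    "meta": handle_meta,
}

def dispatch(mode: str, query: str) -> str:
    """Orchestration only: pick the handler for `mode` and delegate to it."""
    try:
        handler = MODE_HANDLERS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return handler(query)
```

Keeping the registry in one place means adding a mode (as meta was added this cycle) changes the table, not the orchestrator.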

Axis 2 — Evaluation Harness ✅

The 13-query STEP 7 regression suite is locked at eval/regression/step7_*.json, with byte-identical invariants for security-block answers and tolerance bands for graph paths. Runner: python scripts/bench.py --suite=step7 --check.
RAGAS is integrated as the third-party harness, scoring context_precision / context_recall / faithfulness / answer_relevancy through a live /query/ driver. The baseline and drift check live at eval/ragas/baseline.json.
PR contract: every change to core/{retrieval,graph,reasoning} must include pasted bench numbers (CLAUDE.md rule 2 and CONTRIBUTING.md).

PRs: #43, #51, #52, #64, #66.
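A minimal sketch of what a `--check` pass could verify, assuming a per-query baseline in which security-block queries store an `exact_answer` and graph queries store an inclusive `graph_paths_band`, plus a RAGAS drift rule that fails any metric dropping more than an assumed 0.05 below its baseline. The key names and the threshold are illustrative, not the actual format of eval/regression/step7_*.json or eval/ragas/baseline.json.

```python
from typing import Dict, List

def check_step7(results: Dict[str, dict], baseline: Dict[str, dict]) -> List[str]:
    """Return human-readable failures; an empty list means the suite passes."""
    failures: List[str] = []
    for qid, spec in baseline.items():
        got = results.get(qid)
        if got is None:
            failures.append(f"{qid}: missing from run")
            continue
        # Security-block queries must reproduce the baseline answer exactly.
        if "exact_answer" in spec and got.get("answer") != spec["exact_answer"]:
            failures.append(f"{qid}: security block changed")
        # Graph queries only need their path count to stay inside an inclusive band.
        if "graph_paths_band" in spec:
            lo, hi = spec["graph_paths_band"]
            n = got.get("graph_paths", 0)
            if not lo <= n <= hi:
                failures.append(f"{qid}: graph_paths={n} outside [{lo}, {hi}]")
    return failures

def ragas_drift(current: Dict[str, float], baseline: Dict[str, float],
                max_drop: float = 0.05) -> List[str]:
    """Flag each RAGAS metric that fell more than `max_drop` below its baseline."""
    return [
        f"{name}: {current.get(name, 0.0):.3f} vs baseline {base:.3f}"
        for name, base in baseline.items()
        if current.get(name, 0.0) < base - max_drop
    ]
```

Exact matching suits refusals, which should never vary; bands suit graph retrieval, where small legitimate changes in path counts are expected.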

Axis 3 — Observability / Tracing ✅

trace_id ContextVar at the API edge, propagated end-to-end through core/observability::log_stage.
Per-trace JSONL files at reports/trace/<YYYY-MM-DD>/<trace_id>.jsonl covering auth → retrieve → graph → tool → answer → complete stages.
JAMES_TRACE_STDOUT console mirror — default ON for the single-user operator workflow (set =0 to silence).
GET /admin/trace/{trace_id} full pipeline replay endpoint.
GET /admin/metrics?window_hours=24 per-stage p50/p90/p99/max latency histograms.
7-day auto-prune via JAMES_TRACE_RETENTION_DAYS env (default 7, clamped to [1, 365]).

PRs: #67, #71, #75, #82, #83, #84.
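The tracing pieces above can be sketched with the standard library alone. This assumes `log_stage` appends one JSON object per stage and that retention pruning works on the dated directory names; the `start_trace` and `prune_traces` helpers and the `"no-trace"` fallback id are hypothetical, and the real core/observability module may differ.

```python
import contextvars
import json
import os
import shutil
import time
import uuid
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import List, Optional

# Set once per request at the API edge; any code running in that context can read it.
trace_id_var = contextvars.ContextVar("trace_id", default=None)

def start_trace() -> str:
    """Hypothetical API-edge hook: mint a trace_id and bind it to the current context."""
    tid = uuid.uuid4().hex
    trace_id_var.set(tid)
    return tid

def log_stage(stage: str, root: Path = Path("reports/trace"), **fields) -> Path:
    """Append one JSON line for `stage` to <root>/<YYYY-MM-DD>/<trace_id>.jsonl."""
    tid = trace_id_var.get() or "no-trace"  # fallback id is an assumption
    day_dir = root / date.today().isoformat()
    day_dir.mkdir(parents=True, exist_ok=True)
    line = json.dumps({"trace_id": tid, "stage": stage, "ts": time.time(), **fields},
                      ensure_ascii=False)
    path = day_dir / f"{tid}.jsonl"
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")
    # Console mirror: on by default, silenced with JAMES_TRACE_STDOUT=0.
    if os.environ.get("JAMES_TRACE_STDOUT", "1") != "0":
        print(line)
    return path

def retention_days() -> int:
    """JAMES_TRACE_RETENTION_DAYS, default 7, clamped to [1, 365]."""
    try:
        days = int(os.environ.get("JAMES_TRACE_RETENTION_DAYS", "7"))
    except ValueError:
        days = 7
    return max(1, min(365, days))

def prune_traces(root: Path = Path("reports/trace"), today: Optional[date] = None) -> List[str]:
    """Remove dated trace directories older than the retention window (run at startup)."""
    today = today or date.today()
    cutoff = today - timedelta(days=retention_days())
    removed: List[str] = []
    if not root.is_dir():
        return removed
    for d in sorted(root.iterdir()):
        try:
            day = datetime.strptime(d.name, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip anything that is not a YYYY-MM-DD directory
        if d.is_dir() and day < cutoff:
            shutil.rmtree(d)
            removed.append(d.name)
    return removed
```

A ContextVar fits here because asyncio copies the current context into each task, so a trace_id set by the request handler is visible to every awaited stage without being threaded through function arguments.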

Axis 4 — Security Boundary ✅

core/policy_engine.py is now the single source of role / sensitivity / capability decisions. Removing it would break 6+ production modules.
Capability tokens at every tool call site — no direct fs path strings.
Multimodal trust quarantine: image / video / audio / web inputs flagged and sanitized at a single ingestion chokepoint before joining the LLM context.
Risky-coding hard-refuse policy at pre_check. Queries that ask the model to produce destructive shell / SQL / git commands receive the same byte-identical 26-char block as prompt-injection attempts (q11 / q12 invariants in STEP 7).

PRs: #50, #53, #54, #56, #57, #58, #59, #60, #61, #63, #70.
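One way to get a byte-identical refusal is to return a single constant from `pre_check` for every blocked category. The sketch assumes regex screening; the patterns are illustrative only, and `BLOCK_MESSAGE` is a placeholder rather than the actual 26-character block text.

```python
import re
from typing import Optional

# Placeholder: the real 26-character block text is not reproduced here.
BLOCK_MESSAGE = "<blocked by policy>"

# Illustrative destructive-command patterns; a real list would be broader and reviewed.
RISKY_PATTERNS = [
    re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r", re.IGNORECASE),
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*--force\b|\bgit\s+reset\s+--hard\b", re.IGNORECASE),
]
# Illustrative prompt-injection pattern.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]

def pre_check(query: str) -> Optional[str]:
    """Return the fixed block string if the query must be refused, else None."""
    for pat in RISKY_PATTERNS + INJECTION_PATTERNS:
        if pat.search(query):
            # Same constant for every refusal, so responses stay byte-identical.
            return BLOCK_MESSAGE
    return None
```

Returning one constant, instead of a message that names the matched rule, also avoids telling a caller which filter fired.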

Axis 5 — Controlled Self-Evolution ✅

Self-evolution is opt-in via the JAMES_ENABLE_EVOLUTION env flag (default 0, off). JAMES_AUTO_APPROVE requires JAMES_DEV_MODE=1; otherwise the server refuses to start.
Every approved patch is recorded with approver_username / approver_role / approved_at / approval_method in the lifecycle JSONL.
Bench eval gate at /admin/patch/approve: after patch_apply() succeeds, scripts/bench.py --check runs as a subprocess via asyncio.to_thread, so the blocking call does not stall the event loop. A regression triggers an automatic rollback via restore_latest() and writes a ROLLED_BACK lifecycle entry.
Crash recovery mid-deploy is tested to restore files byte-identically.
GET /admin/patch/audit?since=&approver=&outcome=&limit= operator-facing query endpoint over james_patch_log.jsonl.

PRs: #69, #77, #78, #79.
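A sketch of the startup guard and the approve-then-gate flow, assuming `patch_apply` and `restore_latest` are passed in as plain callables and that a non-zero exit from `bench.py --check` signals a regression. The `APPLIED` status label and the `approval_method` value are hypothetical; only the field names and `ROLLED_BACK` come from the notes above.

```python
import asyncio
import json
import os
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List, Mapping, Optional

def check_evolution_config(env: Mapping[str, str] = os.environ) -> bool:
    """Startup guard: refuse to boot if auto-approve is on outside dev mode."""
    if env.get("JAMES_AUTO_APPROVE", "0") == "1" and env.get("JAMES_DEV_MODE", "0") != "1":
        raise SystemExit("JAMES_AUTO_APPROVE requires JAMES_DEV_MODE=1; refusing to start")
    # Evolution itself stays opt-in: anything other than "1" means off.
    return env.get("JAMES_ENABLE_EVOLUTION", "0") == "1"

async def approve_patch(
    patch_apply: Callable[[], None],
    restore_latest: Callable[[], None],
    approver_username: str,
    approver_role: str,
    log: Path,
    bench_cmd: Optional[List[str]] = None,
) -> str:
    """Apply a patch, gate it on the bench, roll back on regression, and log the outcome."""
    patch_apply()  # assumed to raise on failure, before anything needs rolling back
    cmd = bench_cmd or [sys.executable, "scripts/bench.py", "--check"]
    # subprocess.run blocks, so run it in a worker thread to keep the event loop free.
    proc = await asyncio.to_thread(
        subprocess.run, cmd, capture_output=True, encoding="utf-8", errors="replace"
    )
    status = "APPLIED"
    if proc.returncode != 0:  # assumed: non-zero exit from --check means a regression
        restore_latest()
        status = "ROLLED_BACK"
    entry = {
        "status": status,
        "approver_username": approver_username,
        "approver_role": approver_role,
        "approved_at": datetime.now(timezone.utc).isoformat(),
        "approval_method": "manual",  # hypothetical value
    }
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return status
```

Making the bench command injectable keeps the gate testable without a live wiki: a stub command that exits non-zero exercises the rollback path.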

Axis 6 — Real-Data Validation 🟡

Wiki corpus at 161 entities (concept 62 / org 57 / person 11 / document 31), hard-deduped.
13-query STEP 7 suite spans retrieve / relation / multi-hop / compare / dedup / lang-mix / negative / security / meta categories.
Edge cases discovered + closed via real-data feedback: #5, #6, #7, #8, #11, #14, #20.
Remaining: a second user running the bench end-to-end on their own corpus. This is the v0.2 → v0.3 gate and is now in recruitment phase.

User-feedback fixes (UX)

This cycle also folded in three direct user-feedback items:

Answer flow: replaced the rigid two-section template (📚 자료 기반 "source-based" / 💡 추론 "reasoning") with a Claude-style natural prose flow (핵심 답 → 근거 → 추가 시각, i.e. core answer → evidence → additional perspectives). PR #74.
Debug visibility: JAMES_TRACE_STDOUT defaults ON so operators see per-stage JSONL lines without any env-var setup. PR #75.
Meta-mode routing: chat-page inventory queries ("어떤 자료 있어?" / "데이터 뭐 있는지 보여줘", roughly "what material do you have?" / "show me what data there is") now route to a dedicated handle_meta instead of producing hallucinated answers via retrieval. PR #76.
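A minimal sketch of inventory-intent routing, assuming a keyword heuristic that runs before retrieval. The patterns and the `route()` function are hypothetical; the actual classifier in front of `handle_meta` may work differently.

```python
import re

# Hypothetical inventory-intent patterns (Korean and English); illustrative only.
META_PATTERNS = [
    re.compile(r"(어떤|무슨)\s*(자료|데이터|문서)"),            # "what kind of material/data/docs"
    re.compile(r"(자료|데이터|문서).*(뭐|무엇).*(있|보여)"),     # "data ... what ... have/show"
    re.compile(r"\bwhat (data|documents|sources) do you have\b", re.IGNORECASE),
]

def route(query: str) -> str:
    """Return 'meta' for inventory questions, otherwise fall through to 'chat'."""
    if any(p.search(query) for p in META_PATTERNS):
        return "meta"
    return "chat"
```

Routing these questions away from retrieval matters because an inventory question has no single matching passage; answering it from the wiki index is more reliable than asking the model to guess from whatever chunks happen to score highest.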

Production bug fixes

Windows path check in patch_applier.py: Path(target) normalizes away a leading ./, so str(Path(target)).startswith(".") failed on Windows and every legitimate self-evolution sandbox patch was silently rejected. Single-line fix. PR #78.
Korean encoding at 4 sites: three self-test files wrote Korean comments via open(..., "w") without encoding="utf-8", so they were saved as cp949 on Windows; the subprocess.run call in bench_gate also decoded captured output with the locale encoding (cp949) instead of UTF-8. PR #80.
cp949 console crash: ensure_utf8_console() is wired into the server entry point, admin scripts, and tests so print statements containing emoji don't crash default Windows consoles. PR #36 plus ongoing test additions.
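The three fixes share a theme: don't let platform defaults decide paths or encodings. In this sketch the containment check compares resolved paths (pathlib collapses a leading `./`, so a string-prefix test on `str(Path(target))` never sees it), subprocess output is decoded explicitly as UTF-8, and the console streams are reconfigured. `is_inside_sandbox` and `run_utf8` are hypothetical names, the `ensure_utf8_console` body is an assumption about what that helper does, and `Path.is_relative_to` needs Python 3.9+.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List

def is_inside_sandbox(target: str, sandbox: Path) -> bool:
    """Containment check that does not depend on the textual './' prefix."""
    # pathlib drops a leading './', so compare resolved paths instead of strings.
    resolved = (sandbox / target).resolve()
    return resolved.is_relative_to(sandbox.resolve())

def run_utf8(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a child Python process and decode its output as UTF-8, not the locale codec."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")  # make the child write UTF-8 too
    return subprocess.run(cmd, capture_output=True, encoding="utf-8",
                          errors="replace", env=env)

def ensure_utf8_console() -> None:
    """Switch stdout/stderr to UTF-8 so emoji or Korean output cannot crash a cp949 console."""
    for stream in (sys.stdout, sys.stderr):
        reconfigure = getattr(stream, "reconfigure", None)  # absent on some wrapped streams
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")
```

The same rule applies to files: pass `encoding="utf-8"` to every `open(..., "w")` that may receive Korean text, rather than relying on the Windows locale default.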

Breaking / behavior changes

Default JAMES_TRACE_STDOUT=1 — every server startup now prints per-stage JSONL lines to stdout. Set JAMES_TRACE_STDOUT=0 to silence. (PR #75)
Default JAMES_TRACE_RETENTION_DAYS=7 — reports/trace/ directories older than 7 days are removed on server startup. Set higher if you need longer audit retention. (PR #84)
Answer style — response_style API param + JAMES_RESPONSE_STYLE env are now no-ops (kept for back-compat); all answers use the natural-flow prompt. (PR #74)
STEP 7 baseline is step7-v3 (was v1 in v0.1.4) — q12 promoted from flaky to byte-identical block, q13 added for meta-mode. Old --check runs against step7-v1 will fail; rerun against the new baseline.

How to upgrade

```bash
git pull origin main
git checkout v0.2.0
pip install -r requirements.txt   # cryptography 47.0.0, pynvml 12.x

# new env knobs (optional):
export JAMES_TRACE_STDOUT=0          # silence per-stage console mirror
export JAMES_TRACE_RETENTION_DAYS=14 # keep 2 weeks of traces
export JAMES_EVOLUTION_GATE=0        # disable bench gate during patch deploy (debug only)

# verify:
python -m unittest discover -s tests   # 127 tests, ~6s
python scripts/bench.py --suite=step7  # live STEP 7 against your wiki
```

What's next

v0.2.1 cycle — self-feedback + second-user recruitment for Axis 6. No more code-only PRs unless a real-use bug surfaces.

v0.3.0 — platform skeleton: core/plugins/base.py (4 plugin types), JAMES_PLUGINS loader, packs/general/ dogfood, JAMES_WORKSPACE for multi-instance hosting, docs/VERSIONING.md with 12-month deprecation policy. Required before any domain pack work.

See ROADMAP.md for the full v0.3 → v0.4 → v1.0 gate definitions.

🤖 Generated with Claude Code
