Add Langfuse observability and improve Parzival context injection by Hidden-History · Pull Request #45 · Hidden-History/ai-memory

Hidden-History · 2026-03-01T20:28:15Z

Summary

v2.0.9 — Injection quality sprint (PLAN-010) + Langfuse observability + installer fixes.

Dedicated github Qdrant collection — eliminates 79.6% noise from discussions
Structured error pattern detection — eliminates false positives in code-patterns
Tier 2 context injection type filters — prevents low-value content injection
Langfuse trace visibility restored — 15 hardcoded [:300] truncations removed, TRACE_CONTENT_MAX=10000 standardized
Parzival layered priority bootstrap (L1-L4 deterministic + semantic)
Content quality gate for low-value messages
Installer Option 1 recursive copy fix (BUG-205)
Migration script: purge false positives + rename error_fix → error_pattern

Test Results

CI: All checks green (Lint, Unit Tests 3.10/3.11/3.12, Integration, CodeQL, Installation Ubuntu+macOS)
Local: 2,108 pass, 0 fail, 477 skipped
Live verification: 277 tests across 11 domains — 249 pass (89.9%), 0 critical issues
Langfuse audit: 9 traces, 42 spans — zero truncation found, full pipeline visible
Hook verification: All 14 scripts checksummed, zero [:300] instances, TRACE_CONTENT_MAX=10000 confirmed
System health: 16/16 Docker services healthy, 5/5 Qdrant collections verified

Commits (27)

27 commits covering: PLAN-010 injection quality, Langfuse observability, Parzival bootstrap, BUG-197/198/199/200/201/204/205, TD-237, CodeQL fix, installer recursive copy + chmod.

Upgrade Path

See CHANGELOG.md for complete upgrade instructions (Option 1 + container rebuild + migration).

…ion, and session_start

Instruments the retrieval pipeline with Langfuse trace events via the fire-and-forget trace buffer pattern (emit_trace_event). Previously, search.py and injection.py had zero Langfuse visibility, and session_start.py only emitted aggregate counts.

- search.py: search_query, dual_collection_search, cascading_search events with collection, model, duration, and score metadata
- injection.py: bootstrap_retrieval (per-category counts + per-result scores), greedy_fill (budget utilization, dedup/score-gap skip counts, cached token counts), format_injection (output size tracking)
- session_start.py: enhanced 4 existing trace events with per-result detail (type, collection, score, tokens) for both startup and resume/compact paths

All emit calls are guarded by `if emit_trace_event` + `try/except Exception: pass`. No direct Langfuse SDK imports. No existing signatures or behavior changed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
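A minimal sketch of the guarded, fire-and-forget emit pattern described in this commit. The function name `traced_search`, the stub result, and the keyword arguments passed to `emit_trace_event` are assumptions for illustration; the real signature lives in trace_buffer.py and is not shown in this PR.

```python
import time
from typing import Any, Callable, Optional


def traced_search(
    query: str,
    emit_trace_event: Optional[Callable[..., None]] = None,
) -> list:
    """Run a (stubbed) search and emit a trace event without risking the caller."""
    start = time.perf_counter()
    # Placeholder for the real vector search call.
    results = [{"content": f"stub result for {query!r}", "score": 0.83}]
    duration_ms = (time.perf_counter() - start) * 1000.0

    # Guard 1: tracing is optional, so the emitter may be None.
    if emit_trace_event:
        # Guard 2: a tracing failure must never break retrieval.
        try:
            emit_trace_event(
                event_type="search_query",  # event name taken from the commit message
                data={  # hypothetical payload shape
                    "query": query,
                    "result_count": len(results),
                    "duration_ms": duration_ms,
                },
            )
        except Exception:
            pass
    return results
```

With this shape, a hook keeps working whether the emitter is missing, healthy, or raising.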

…maries

Previous trace events only showed aggregate counts like "Found 5 results, top score: 0.8283", making it impossible to see WHAT was actually retrieved and injected. Now all trace event outputs include the actual content with per-result previews (type, collection, score, content).

- search_query: output now shows each result's content (500 char preview)
- bootstrap_retrieval: output shows per-result content with type/score
- greedy_fill: output shows selected results with token counts
- format_injection: output shows the actual <retrieved_context> content
- cascading_search + dual_collection_search: same content preview pattern

Also added TRACE_CONTENT_MAX constants (2000/10000 chars) for truncation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
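The per-result preview format could look roughly like the sketch below. The 500-character preview and the 10000-character cap come from this PR; the function name `format_trace_output` and the exact line layout are assumptions.

```python
TRACE_CONTENT_MAX = 10000  # cap named in the PR
PREVIEW_CHARS = 500  # per-result preview length mentioned for search_query


def format_trace_output(results, preview_chars=PREVIEW_CHARS, max_chars=TRACE_CONTENT_MAX):
    """Build a trace output that shows what was retrieved, not just counts."""
    lines = [f"Found {len(results)} results"]
    for i, r in enumerate(results, 1):
        header = (
            f"[{i}] type={r.get('type')} "
            f"collection={r.get('collection')} "
            f"score={r.get('score', 0.0):.4f}"
        )
        lines.append(header)
        lines.append(r.get("content", "")[:preview_chars])
    # Apply one overall cap so a large result set cannot flood the trace.
    return "\n".join(lines)[:max_chars]
```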

…n_id

All hook scripts now propagate LANGFUSE_TRACE_ID and CLAUDE_SESSION_ID env vars before calling library functions (search.py, injection.py). This ensures all emit_trace_event() calls within a single hook execution share the same Langfuse trace, enabling proper span nesting and session correlation.

Changes:
- trace_buffer.py: Add CLAUDE_SESSION_ID env var fallback for session_id
- session_start.py: Propagate env vars in startup + resume/compact paths, remove random trace_id on session_bootstrap events
- context_injection_tier2.py: Propagate env vars before search loop
- best_practices_retrieval.py: Generate trace_id + extract session_id from hook_input, propagate via env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
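A small sketch of the env-var propagation described above. The env var names are from the commit; the helper names, the `session_id` key on `hook_input`, and the use of a 32-character hex trace ID are assumptions.

```python
import os
import uuid


def propagate_trace_context(hook_input: dict) -> str:
    """Set trace/session env vars so library calls in this hook share one trace."""
    trace_id = uuid.uuid4().hex  # 32 hex chars (128 bits); format is an assumption
    os.environ["LANGFUSE_TRACE_ID"] = trace_id
    session_id = hook_input.get("session_id")  # hypothetical key name
    if session_id:
        os.environ["CLAUDE_SESSION_ID"] = session_id
    return trace_id


def resolve_session_id(explicit=None):
    """Library-side fallback: prefer an explicit value, else read the env var."""
    return explicit or os.environ.get("CLAUDE_SESSION_ID")
```

Because the context travels through the process environment, search.py and injection.py need no signature changes to pick it up.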

Root cause: _make_parent_context() with INVALID_SPAN_ID causes OTel to create separate traces per event instead of nesting under one trace.

Fix: Each hook generates a root_span_id (LANGFUSE_ROOT_SPAN_ID env var). Library functions (search.py, injection.py) get this as parent_span_id via env fallback in trace_buffer.py, creating proper parent→child nesting. Root span events pass parent_span_id=None explicitly to skip env fallback. Sentinel _UNSET differentiates "not provided" (use env) from "explicitly None" (root span, no parent).

Changes:
- trace_buffer.py: _UNSET sentinel for parent_span_id env fallback
- session_start.py: Root span IDs for startup + resume/compact paths
- context_injection_tier2.py: Root span ID + wall-clock start_time
- best_practices_retrieval.py: Root span ID for PreToolUse hook

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
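The _UNSET sentinel idea can be sketched as follows. The env var name and the three cases are from the commit message; the helper name `resolve_parent_span_id` is hypothetical.

```python
import os

# A unique object that no caller can pass by accident, unlike None.
_UNSET = object()


def resolve_parent_span_id(parent_span_id=_UNSET):
    """Distinguish 'argument omitted' from 'explicitly None'."""
    if parent_span_id is _UNSET:
        # Not provided: fall back to the root span ID set by the hook.
        return os.environ.get("LANGFUSE_ROOT_SPAN_ID")
    # Provided, possibly None: None means "root span, no parent".
    return parent_span_id
```

A plain `parent_span_id=None` default could not express the root-span case, because omitting the argument and passing None would look identical.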

…ntext

Root cause: SpanContext with INVALID_SPAN_ID fails is_valid(), causing OTel to ignore the trace_id and create a new trace. Root spans and child spans ended up in different Langfuse traces.

Fix: Generate random valid span_id for root spans (no parent_span_id). With is_remote=True, OTel won't look for this span locally, but will inherit the trace_id ensuring all spans share the same Langfuse trace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
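A stdlib-only sketch of generating a valid span ID for a root span. It assumes the only validity constraint that matters here is that the 64-bit span ID is non-zero (OpenTelemetry's INVALID_SPAN_ID is all zeros); the helper names are hypothetical, and the real code would pass the value into an OTel SpanContext with is_remote=True.

```python
import secrets

INVALID_SPAN_ID = 0  # an all-zero span ID is treated as invalid by OpenTelemetry


def new_span_id() -> int:
    """Return a random non-zero 64-bit span ID."""
    span_id = INVALID_SPAN_ID
    while span_id == INVALID_SPAN_ID:
        span_id = secrets.randbits(64)
    return span_id


def parent_span_id_for(parent_span_id=None) -> int:
    """Use the given parent if there is one; otherwise synthesize a valid ID."""
    return parent_span_id if parent_span_id else new_span_id()


def span_id_hex(span_id: int) -> str:
    """Render a span ID as the usual 16-character lowercase hex string."""
    return format(span_id, "016x")
```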

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace flat score-sorted pool with layered priority retrieval when parzival_enabled=True. Layers are processed in order by greedy fill: Layer 1 (handoff via get_recent), Layer 2 (decisions via get_recent), Layer 3 (insights via search), Layer 4 (GitHub via search). Conventions query removed from Parzival path (noise for PM oversight). Non-Parzival path unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
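A rough sketch of layered greedy fill, assuming each result carries a precomputed `tokens` count and that an item that does not fit is skipped rather than ending the fill. Both assumptions, and the function name, are illustrative only.

```python
def layered_greedy_fill(layers, token_budget):
    """Fill the budget layer by layer so higher-priority layers are served first.

    `layers` is an ordered list of (layer_name, results) pairs.
    """
    selected, used = [], 0
    for layer_name, results in layers:
        for result in results:
            cost = result["tokens"]
            if used + cost > token_budget:
                continue  # does not fit; a smaller later item may still fit
            selected.append({**result, "layer": layer_name})
            used += cost
    return selected, used
```

Unlike a flat score-sorted pool, a high-scoring GitHub result in Layer 4 cannot displace a handoff in Layer 1, because Layer 1 is always considered first.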

When parzival_enabled, the resume/compact path now uses get_recent() for deterministic timestamp-sorted retrieval of session summaries and decisions only (skipping conventions and code-patterns noise). Non-Parzival path remains exactly as before with all three vector search queries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
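For illustration, deterministic timestamp-sorted retrieval can be sketched over an in-memory list. The real get_recent() queries Qdrant; the name `get_recent_sketch`, the field names, and the fixed score of 1.0 (inferred from the later mention of "deterministic score=1.0") are assumptions.

```python
def get_recent_sketch(points, memory_types, limit=5):
    """Return the newest points of the given types, with no vector search involved."""
    matching = [p for p in points if p["type"] in memory_types]
    # ISO-8601 UTC timestamps sort correctly as plain strings.
    matching.sort(key=lambda p: p["timestamp"], reverse=True)
    # Mark results as deterministic so later filters can tell them from semantic hits.
    return [{**p, "score": 1.0} for p in matching[:limit]]
```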

F-1: Score gap filter excludes deterministic score=1.0 from best_score
F-2: Session summary get_recent() wrapped in try/except
F-3: searcher.close() guaranteed via try/finally
F-4: Success-path Prometheus metrics added to get_recent()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
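A sketch of why F-1 matters, assuming the score gap filter drops results that fall more than a fixed gap below the best score. The gap value of 0.3 and the function name are hypothetical.

```python
DETERMINISTIC_SCORE = 1.0  # score carried by get_recent() results


def apply_score_gap(results, max_gap=0.3):
    """Drop semantic results that trail the best *semantic* score by more than max_gap."""
    semantic = [r["score"] for r in results if r["score"] < DETERMINISTIC_SCORE]
    if not semantic:
        return list(results)
    # Deterministic results are excluded here, otherwise best_score would
    # always be 1.0 and most real semantic hits would be filtered out.
    best_score = max(semantic)
    return [
        r
        for r in results
        if r["score"] >= DETERMINISTIC_SCORE or best_score - r["score"] <= max_gap
    ]
```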

BUG-197: The top-level import of async_sdk_wrapper eagerly pulls in anthropic as a transitive dependency. Environments without anthropic (e.g. embedding container) crash on `import memory`. Wrap the import in try/except ImportError so the rest of the module loads cleanly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
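The import guard might look like this minimal sketch. The module name `_hypothetical_async_sdk_wrapper` and the class `AsyncSDKWrapper` are placeholders, not the project's real names.

```python
try:
    # Stand-in for the real wrapper module, which needs the anthropic package.
    from _hypothetical_async_sdk_wrapper import AsyncSDKWrapper
except ImportError:
    # Keep the package importable in environments without anthropic.
    AsyncSDKWrapper = None


def get_sdk_wrapper():
    """Fail only when the optional feature is actually used."""
    if AsyncSDKWrapper is None:
        raise RuntimeError("async SDK wrapper unavailable: anthropic is not installed")
    return AsyncSDKWrapper()
```

The failure moves from import time to call time, so containers that never touch the wrapper are unaffected.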

TD-219: E2E tests fail with FileNotFoundError when saving screenshots because tests/e2e/screenshots/ doesn't exist. Add a session-scoped autouse fixture that creates the directory with os.makedirs(exist_ok). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

TD-225: When searching the discussions collection with a memory_type filter that includes github_code_blob, route to the code embedding model instead of the default prose model. This matches the storage-side routing in MemoryStorage._get_embedding_model (SPEC-010 Section 4.2), ensuring query embeddings use the same model as stored embeddings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
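A sketch of the query-side routing rule for the discussions collection. The model labels "code" and "prose" and the function name are assumptions; the real routing lives alongside MemoryStorage._get_embedding_model.

```python
CODE_MEMORY_TYPES = frozenset({"github_code_blob"})


def select_discussions_query_model(memory_types=None) -> str:
    """Pick the embedding model for a discussions query from its memory_type filter."""
    if memory_types and CODE_MEMORY_TYPES & set(memory_types):
        # Stored code blobs were embedded with the code model, so the
        # query must use the same model for the vectors to be comparable.
        return "code"
    return "prose"
```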

TD-228: The compact/resume path only had a high-level session_bootstrap trace event with no per-retrieval detail. Add memory_retrieval_* trace events for each get_recent() and retrieve_session_summaries() call:

- Parzival path: trace events for session summaries (get_recent) and decisions (get_recent) with timing and result counts
- Non-Parzival path: trace event for session summaries (scroll); the search() calls already self-trace via SPEC-021

Each event includes trigger, collection, method, result_count, and retrieval_ms for full observability in Langfuse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
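A sketch of a per-retrieval trace event carrying the five fields listed above. The wrapper name, the `emit(name, data)` call shape, and the example event name are assumptions; the commit only specifies the memory_retrieval_* prefix.

```python
import time


def traced_retrieval(retrieve, emit, *, event_name, trigger, collection, method):
    """Time a retrieval call and report it as a single trace event."""
    start = time.perf_counter()
    results = retrieve()
    retrieval_ms = (time.perf_counter() - start) * 1000.0
    try:
        emit(
            event_name,  # e.g. a hypothetical "memory_retrieval_decisions"
            {
                "trigger": trigger,
                "collection": collection,
                "method": method,
                "result_count": len(results),
                "retrieval_ms": round(retrieval_ms, 2),
            },
        )
    except Exception:
        pass  # observability must not affect the compact/resume path
    return results
```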

- Add strict=False to zip() calls in select_results_greedy trace block
- Replace try/except Exception: pass with contextlib.suppress in format_injection_output
- Move E402 imports (activity_log, metrics_push) to top of search.py
- Remove f-prefix from f-string without placeholders in dual_collection_search trace

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…eval

Update 4 failing tests to match PM #136 changes where Parzival bootstrap switched from flat-pool vector search to layered priority retrieval:

- test_parzival_bootstrap_uses_agent_id: handoff now uses get_recent, not search
- test_parzival_bootstrap_full_qdrant_down: mock get_recent + add github_sync_enabled
- test_parzival_bootstrap_graceful_degradation: same mock fixes
- test_parzival_bootstrap_includes_github_enrichment: route handoff through get_recent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…r regex, add quality gate

P10-5: L4 GitHub enrichment now queries COLLECTION_GITHUB instead of COLLECTION_DISCUSSIONS
P10-9: Tier 2 discussions search filters to high-value types only (decision, guideline, session, agent_insight, agent_handoff, agent_memory)
P10-7: Rewrite detect_error_indicators() to use structured error patterns instead of bare keyword matching, preventing false positives from filenames containing 'error'
P10-10: Add quality gate to user_prompt_store_async.py and agent_response_store_async.py to skip low-value short messages (< 4 words or known low-value phrases)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
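To make P10-7 and P10-10 concrete, here is a sketch of structured error detection and a message quality gate. The specific regexes, the phrase list, and both function names are hypothetical; only the 4-word threshold and the filename false-positive case come from the commit.

```python
import re

# Hypothetical structured patterns: each one matches the shape of a real error
# report rather than the bare word "error".
ERROR_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\):"),
    re.compile(r"^\s*\w*(Error|Exception): .+", re.MULTILINE),
    re.compile(r"\bexit (code|status) [1-9]\d*\b", re.IGNORECASE),
]


def looks_like_error(output: str) -> bool:
    """True only when the output contains a structured error signature."""
    return any(p.search(output) for p in ERROR_PATTERNS)


LOW_VALUE_PHRASES = {"ok", "thanks", "yes", "continue", "sounds good to me"}  # hypothetical list
MIN_WORDS = 4  # threshold from the commit message


def passes_quality_gate(message: str) -> bool:
    """Skip messages that are too short or are known filler phrases."""
    text = message.strip().lower().rstrip(".!?")
    if text in LOW_VALUE_PHRASES:
        return False
    return len(text.split()) >= MIN_WORDS
```

A path such as src/error_handler.py no longer triggers detection, while a traceback or a non-zero exit code still does.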

Add dedicated `github` Qdrant collection to separate GitHub data from discussions. Updates config constants, setup-collections indexes, and all github connector modules (schema, code_sync, sync, __init__). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

F-1: Fix test import DISCUSSIONS_COLLECTION → GITHUB_COLLECTION
F-2: Add missing error patterns to migration ERROR_INDICATORS regex
F-3: Replace hardcoded "discussions" string with COLLECTION_DISCUSSIONS constant
F-4: Replace hardcoded GITHUB_COLLECTION with import from memory.config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…010)

Add v2.0.9 CHANGELOG section covering PLAN-010 injection quality sprint, Langfuse observability, and Parzival layered priority bootstrap. Update architecture doc from three-collection to five-collection (github + jira-data). Fix GitHub integration docs to reference github collection. Update error detection docs for structured pattern matching rewrite. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Data flow diagram: Stop hook → PreCompact hook for session summaries
- Mistake 2: align with Mistake 9 (PreCompact is correct, not Stop)
- Summary table: Stop hook → PreCompact hook
- Comparison table: mark code-patterns/conventions as conditionally searched in Tier 2 and SessionStart (non-Parzival path)
- GitHub indexes: memory_type → type (matches Qdrant field name)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test_get_collection_stats_includes_jira_data_when_exists expected 4 collections but PLAN-010 added the github collection making it 5. Also verify github collection is present in stats. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… TD-237

BUG-204: Remove 15 hardcoded [:300] truncation limits from 5 hook scripts. Standardize TRACE_CONTENT_MAX=10000 across all files (search.py, langfuse_stop_hook.py, classification_worker.py).

BUG-200 (completion): Correct type="error_fix" to "error_pattern" in error_store_async.py (5 locations). Update retrieval in error_detection.py, triggers.py, and classifier config.py to handle both type names.

BUG-201 (completion): Add post-filter in context_injection_tier2.py to exclude error_fix/error_pattern from Tier 2 code-patterns injection.

TD-237: Add error_pattern type to classifier LLM prompt template so the classifier can distinguish automated error captures from user-reported fixes.

TD-235: Fix install.sh log message "3 collections" to "5 collections".

BUG-197: Confirmed fixed (lazy imports in memory/__init__.py).

3-round adversarial review (Sonnet+Opus):
- Round 1 fixed error_detection.py, triggers.py, classifier config.py.
- Round 2 fixed test_triggers.py assertion.
- Round 3: ZERO ISSUES, APPROVED.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
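The BUG-200 and BUG-201 changes can be sketched as two small helpers. The type names come from the commit; the helper names and the dict-shaped results are assumptions.

```python
# Accept both names while old error_fix entries may still exist in storage.
ERROR_TYPES = frozenset({"error_fix", "error_pattern"})


def is_error_memory(payload: dict) -> bool:
    """Retrieval-side check that tolerates the legacy and the corrected type name."""
    return payload.get("type") in ERROR_TYPES


def filter_tier2_code_patterns(results: list) -> list:
    """Post-filter: keep error captures out of Tier 2 code-patterns injection."""
    return [r for r in results if r.get("type") not in ERROR_TYPES]
```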

scripts/migrate_v209_github_collection.py

CodeQL CWE-117: migrate_v209_github_collection.py logged last 3 chars of QDRANT_API_KEY in mismatch warning. Replace with key length to preserve debugging utility without exposing secret material. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
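A sketch of the logging change: report only the key length, never any characters of the key. The function names and the message wording are hypothetical.

```python
import logging

logger = logging.getLogger("migrate_v209")  # hypothetical logger name


def describe_key(api_key) -> str:
    """Summarize a secret without revealing any of its characters."""
    return f"set, length={len(api_key)}" if api_key else "not set"


def warn_key_mismatch(env_key, config_key) -> bool:
    """Log a mismatch warning that is still useful for debugging."""
    if env_key == config_key:
        return False
    logger.warning(
        "QDRANT_API_KEY mismatch: env (%s) vs config (%s)",
        describe_key(env_key),
        describe_key(config_key),
    )
    return True
```

Length alone is usually enough to spot a truncated or empty key, which is the debugging utility the commit aims to preserve.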

update_shared_scripts() used a non-recursive glob (*.py) that only copied top-level Python files, missing scripts/memory/ (33 files) and all .sh files (6 files). Replaced with cp -r matching copy_files() pattern. Added chmod +x for executable permissions parity with copy_files().

Fixes: BUG-205

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
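The installer itself is a shell script, so the following is only a Python illustration of the same bug and fix: a top-level glob misses subdirectories and non-.py files, while a recursive copy does not. The function names and the choice to mark .py and .sh files executable are assumptions.

```python
import shutil
import stat
from pathlib import Path


def copy_top_level_py(src: Path, dst: Path) -> None:
    """Old behavior: only *.py files directly under src are copied."""
    dst.mkdir(parents=True, exist_ok=True)
    for f in src.glob("*.py"):
        shutil.copy2(f, dst / f.name)


def copy_recursive(src: Path, dst: Path) -> None:
    """Fixed behavior: copy the whole tree, then add execute bits (like chmod +x)."""
    shutil.copytree(src, dst, dirs_exist_ok=True)
    for f in dst.rglob("*"):
        if f.is_file() and f.suffix in {".py", ".sh"}:
            f.chmod(f.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```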

Migration script now renames any error_fix entries that survive the false-positive purge to the correct error_pattern type (BUG-200). Updated CHANGELOG with BUG-204/205, TD-237, CodeQL fix, and classifier-worker rebuild requirement in upgrade instructions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

WB Solutions and others added 24 commits March 1, 2026 09:33

feat(search): add get_recent() deterministic timestamp-sorted retrieval

ee3fe19

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: fix black formatting and ruff contextlib.suppress lint

c6d39a8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(test): update github enrichment test for COLLECTION_GITHUB (PLAN-010)

ebbadcf

github-advanced-security bot found potential problems Mar 2, 2026


scripts/migrate_v209_github_collection.py: Fixed

WB Solutions and others added 3 commits March 2, 2026 10:55

Hidden-History merged commit b99b510 into main Mar 2, 2026
12 checks passed

Hidden-History deleted the feature/v2.0.9-observability branch March 2, 2026 21:24

Hidden-History merged 27 commits into main from feature/v2.0.9-observability
