Add Langfuse observability and improve Parzival context injection #45
Merged
Hidden-History merged 27 commits into main on Mar 2, 2026
…ion, and session_start

Instruments the retrieval pipeline with Langfuse trace events via the fire-and-forget trace buffer pattern (emit_trace_event). Previously, search.py and injection.py had zero Langfuse visibility, and session_start.py only emitted aggregate counts.

- search.py: search_query, dual_collection_search, cascading_search events with collection, model, duration, and score metadata
- injection.py: bootstrap_retrieval (per-category counts + per-result scores), greedy_fill (budget utilization, dedup/score-gap skip counts, cached token counts), format_injection (output size tracking)
- session_start.py: enhanced 4 existing trace events with per-result detail (type, collection, score, tokens) for both startup and resume/compact paths

All emit calls are guarded by `if emit_trace_event` plus `try/except Exception: pass`. No direct Langfuse SDK imports. No existing signatures or behavior changed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
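The double guard described in this commit can be illustrated with a minimal sketch. The `traced_search` wrapper, the keyword arguments passed to the emitter, and the payload keys are assumptions made for illustration; the real `emit_trace_event` signature is not shown in this PR text.

```python
import time
from typing import Any, Callable, Optional


def traced_search(
    query: str,
    search_fn: Callable[[str], list],
    emit_trace_event: Optional[Callable[..., Any]] = None,
) -> list:
    """Run search_fn and emit one trace event without affecting the result."""
    start = time.perf_counter()
    results = search_fn(query)
    duration_ms = (time.perf_counter() - start) * 1000.0

    # Guard 1: tracing may be unavailable (the emitter is None).
    if emit_trace_event:
        # Guard 2: a tracing failure must never break retrieval.
        try:
            emit_trace_event(
                event_type="search_query",  # event name taken from the commit text
                data={  # hypothetical payload shape
                    "result_count": len(results),
                    "top_score": max((r["score"] for r in results), default=None),
                    "duration_ms": duration_ms,
                },
            )
        except Exception:
            pass
    return results
```

Passing the emitter as a parameter keeps the sketch self-contained; in the hooks it is presumably a module-level name that may be `None` when tracing is not installed.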
…maries

Previous trace events only showed aggregate counts like "Found 5 results, top score: 0.8283", making it impossible to see WHAT was actually retrieved and injected. Now all trace event outputs include the actual content with per-result previews (type, collection, score, content).

- search_query: output now shows each result's content (500 char preview)
- bootstrap_retrieval: output shows per-result content with type/score
- greedy_fill: output shows selected results with token counts
- format_injection: output shows the actual <retrieved_context> content
- cascading_search + dual_collection_search: same content preview pattern

Also added TRACE_CONTENT_MAX constants (2000/10000 chars) for truncation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n_id

All hook scripts now propagate LANGFUSE_TRACE_ID and CLAUDE_SESSION_ID env vars before calling library functions (search.py, injection.py). This ensures all emit_trace_event() calls within a single hook execution share the same Langfuse trace, enabling proper span nesting and session correlation.

Changes:
- trace_buffer.py: Add CLAUDE_SESSION_ID env var fallback for session_id
- session_start.py: Propagate env vars in startup + resume/compact paths, remove random trace_id on session_bootstrap events
- context_injection_tier2.py: Propagate env vars before search loop
- best_practices_retrieval.py: Generate trace_id + extract session_id from hook_input, propagate via env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
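A minimal sketch of the propagation step, assuming the hook input is a dict with an optional `session_id` key. The helper name `propagate_trace_context` and the use of `uuid4().hex` for the trace id are illustrative assumptions; only the two environment variable names come from the commit.

```python
import os
import uuid


def propagate_trace_context(hook_input: dict) -> str:
    """Publish one trace id (and the session id, if any) for downstream library calls."""
    trace_id = uuid.uuid4().hex  # 32 hex characters; the generation scheme is an assumption
    os.environ["LANGFUSE_TRACE_ID"] = trace_id

    session_id = hook_input.get("session_id")
    if session_id:
        os.environ["CLAUDE_SESSION_ID"] = str(session_id)
    return trace_id
```

A hook would call this once before invoking any search or injection function, so every event emitted during that hook execution reads the same ids from the environment.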
Root cause: _make_parent_context() with INVALID_SPAN_ID causes OTel to create separate traces per event instead of nesting under one trace.

Fix: Each hook generates a root_span_id (LANGFUSE_ROOT_SPAN_ID env var). Library functions (search.py, injection.py) get this as parent_span_id via env fallback in trace_buffer.py, creating proper parent→child nesting. Root span events pass parent_span_id=None explicitly to skip env fallback. Sentinel _UNSET differentiates "not provided" (use env) from "explicitly None" (root span, no parent).

Changes:
- trace_buffer.py: _UNSET sentinel for parent_span_id env fallback
- session_start.py: Root span IDs for startup + resume/compact paths
- context_injection_tier2.py: Root span ID + wall-clock start_time
- best_practices_retrieval.py: Root span ID for PreToolUse hook

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
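The sentinel distinction in this fix can be sketched as follows. The function name `resolve_parent_span_id` is hypothetical; `_UNSET` and `LANGFUSE_ROOT_SPAN_ID` are the names used in the commit message, and the actual logic in trace_buffer.py may differ.

```python
import os

# Unique marker meaning "the caller did not pass parent_span_id at all".
_UNSET = object()


def resolve_parent_span_id(parent_span_id=_UNSET):
    """Return the parent span id for an event, or None for a root span."""
    if parent_span_id is _UNSET:
        # Not provided: fall back to the root span id published by the hook.
        return os.environ.get("LANGFUSE_ROOT_SPAN_ID") or None
    # Explicit value, including an explicit None for root spans.
    return parent_span_id
```

A plain `None` default could not express this, because "no argument" and "explicitly no parent" would be indistinguishable.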
…ntext

Root cause: SpanContext with INVALID_SPAN_ID fails is_valid(), causing OTel to ignore the trace_id and create a new trace. Root spans and child spans ended up in different Langfuse traces.

Fix: Generate random valid span_id for root spans (no parent_span_id). With is_remote=True, OTel won't look for this span locally, but will inherit the trace_id ensuring all spans share the same Langfuse trace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace flat score-sorted pool with layered priority retrieval when parzival_enabled=True. Layers are processed in order by greedy fill: Layer 1 (handoff via get_recent), Layer 2 (decisions via get_recent), Layer 3 (insights via search), Layer 4 (GitHub via search). Conventions query removed from Parzival path (noise for PM oversight). Non-Parzival path unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
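The layer ordering can be illustrated with a small greedy fill over a token budget. This is a sketch under assumptions: the result shape (`id`, `tokens`), the function name, and the choice to skip an oversized item rather than stop are not taken from the PR.

```python
from typing import List, Tuple


def layered_greedy_fill(
    layers: List[Tuple[str, List[dict]]],
    token_budget: int,
) -> Tuple[List[Tuple[str, str]], int]:
    """Select results layer by layer until the token budget is used up."""
    selected = []
    remaining = token_budget
    for layer_name, results in layers:  # earlier layers get first claim on the budget
        for result in results:
            cost = result["tokens"]
            if cost <= remaining:
                selected.append((layer_name, result["id"]))
                remaining -= cost
            # Items that do not fit are skipped; later, smaller items may still fit.
    return selected, remaining


layers = [
    ("handoff", [{"id": "h1", "tokens": 300}]),
    ("decisions", [{"id": "d1", "tokens": 500}, {"id": "d2", "tokens": 400}]),
    ("insights", [{"id": "i1", "tokens": 200}]),
]
picked, left = layered_greedy_fill(layers, token_budget=1000)
# picked == [("handoff", "h1"), ("decisions", "d1"), ("insights", "i1")], left == 0
```

Unlike a flat score-sorted pool, a high-scoring insight here can never displace the handoff, because the handoff layer is consumed first.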
When parzival_enabled, the resume/compact path now uses get_recent() for deterministic timestamp-sorted retrieval of session summaries and decisions only (skipping conventions and code-patterns noise). Non-Parzival path remains exactly as before with all three vector search queries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- F-1: Score gap filter excludes deterministic score=1.0 from best_score
- F-2: Session summary get_recent() wrapped in try/except
- F-3: searcher.close() guaranteed via try/finally
- F-4: Success-path Prometheus metrics added to get_recent()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
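Fix F-1 matters because results from get_recent() carry a fixed score of 1.0, which would otherwise dominate best_score and push every vector-search result outside the gap. A minimal sketch, assuming results are dicts with a `score` key and a hypothetical `max_gap` threshold:

```python
DETERMINISTIC_SCORE = 1.0  # score assigned to timestamp-sorted (non-vector) results


def apply_score_gap_filter(results, max_gap=0.2):
    """Drop vector-search results that trail the best vector-search score by more than max_gap."""
    searched_scores = [r["score"] for r in results if r["score"] != DETERMINISTIC_SCORE]
    if not searched_scores:
        return list(results)
    best_score = max(searched_scores)  # deterministic 1.0 entries are excluded here
    return [
        r
        for r in results
        if r["score"] == DETERMINISTIC_SCORE or best_score - r["score"] <= max_gap
    ]


results = [
    {"id": "handoff", "score": 1.0},
    {"id": "a", "score": 0.62},
    {"id": "b", "score": 0.55},
    {"id": "c", "score": 0.30},
]
kept_ids = [r["id"] for r in apply_score_gap_filter(results)]
# kept_ids == ["handoff", "a", "b"]
```

If 1.0 were counted as best_score, results a and b would also fall outside a 0.2 gap and only the handoff would survive.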
BUG-197: The top-level import of async_sdk_wrapper eagerly pulls in anthropic as a transitive dependency. Environments without anthropic (e.g. embedding container) crash on `import memory`. Wrap the import in try/except ImportError so the rest of the module loads cleanly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
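The guarded import can be sketched generically. The helper `optional_import` is hypothetical and is not the code in memory/__init__.py; it only shows the try/except ImportError shape the commit describes.

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None when it (or a dependency) is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError, its subclass
        return None


# Callers check for None before using the optional feature.
json_module = optional_import("json")
missing = optional_import("module_that_is_not_installed_xyz")
```

The cost of this pattern is that every use site must handle the `None` case, so it suits features that are genuinely optional in some deployments.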
TD-219: E2E tests fail with FileNotFoundError when saving screenshots because tests/e2e/screenshots/ doesn't exist. Add a session-scoped autouse fixture that creates the directory with os.makedirs(exist_ok). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TD-225: When searching the discussions collection with a memory_type filter that includes github_code_blob, route to the code embedding model instead of the default prose model. This matches the storage-side routing in MemoryStorage._get_embedding_model (SPEC-010 Section 4.2), ensuring query embeddings use the same model as stored embeddings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
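A minimal sketch of the query-side routing rule. The model labels `"code"` and `"prose"` and the function name are placeholders; the actual model identifiers and the storage-side logic in MemoryStorage._get_embedding_model are not shown in this PR text.

```python
from typing import Iterable, Optional


def select_query_embedding_model(
    collection: str,
    memory_types: Optional[Iterable[str]] = None,
) -> str:
    """Pick the embedding model for a query so it matches how the data was stored."""
    if collection == "discussions" and memory_types and "github_code_blob" in set(memory_types):
        return "code"  # placeholder label for the code embedding model
    return "prose"  # placeholder label for the default prose model
```

Query and stored vectors are only comparable when produced by the same model, which is why the query path has to mirror the storage path.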
TD-228: The compact/resume path only had a high-level session_bootstrap trace event with no per-retrieval detail. Add memory_retrieval_* trace events for each get_recent() and retrieve_session_summaries() call:

- Parzival path: trace events for session summaries (get_recent) and decisions (get_recent) with timing and result counts
- Non-Parzival path: trace event for session summaries (scroll); the search() calls already self-trace via SPEC-021

Each event includes trigger, collection, method, result_count, and retrieval_ms for full observability in Langfuse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add strict=False to zip() calls in select_results_greedy trace block
- Replace try/except Exception: pass with contextlib.suppress in format_injection_output
- Move E402 imports (activity_log, metrics_push) to top of search.py
- Remove f-prefix from f-string without placeholders in dual_collection_search trace

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eval

Update 4 failing tests to match PM #136 changes where Parzival bootstrap switched from flat-pool vector search to layered priority retrieval:

- test_parzival_bootstrap_uses_agent_id: handoff now uses get_recent, not search
- test_parzival_bootstrap_full_qdrant_down: mock get_recent + add github_sync_enabled
- test_parzival_bootstrap_graceful_degradation: same mock fixes
- test_parzival_bootstrap_includes_github_enrichment: route handoff through get_recent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r regex, add quality gate

- P10-5: L4 GitHub enrichment now queries COLLECTION_GITHUB instead of COLLECTION_DISCUSSIONS
- P10-9: Tier 2 discussions search filters to high-value types only (decision, guideline, session, agent_insight, agent_handoff, agent_memory)
- P10-7: Rewrite detect_error_indicators() to use structured error patterns instead of bare keyword matching, preventing false positives from filenames containing 'error'
- P10-10: Add quality gate to user_prompt_store_async.py and agent_response_store_async.py to skip low-value short messages (< 4 words or known low-value phrases)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
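Two of these changes (P10-7 and P10-10) are easy to picture in code. The sketch below is illustrative only: the regexes, the phrase list, and both function names are assumptions, not the patterns actually used by detect_error_indicators() or the store hooks.

```python
import re

# Structured patterns: each requires error *structure*, not just the word "error".
ERROR_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r"^[\w.]*(?:Error|Exception): ", re.MULTILINE),
    re.compile(r"\bexit code [1-9]\d*\b", re.IGNORECASE),
]

# Hypothetical examples of phrases that carry no reusable information.
LOW_VALUE_PHRASES = {"ok", "thanks", "yes", "continue", "sounds good to me"}


def looks_like_error(output: str) -> bool:
    """True when the text matches at least one structured error pattern."""
    return any(pattern.search(output) for pattern in ERROR_PATTERNS)


def passes_quality_gate(message: str, min_words: int = 4) -> bool:
    """False for known low-value phrases and for messages shorter than min_words."""
    text = message.strip().lower()
    if text.rstrip(".!?") in LOW_VALUE_PHRASES:
        return False
    return len(text.split()) >= min_words
```

With these patterns, a line such as `modified src/error_handler.py` is not flagged, while `ValueError: invalid literal` is; a bare keyword match would flag both.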
Add dedicated `github` Qdrant collection to separate GitHub data from discussions. Updates config constants, setup-collections indexes, and all github connector modules (schema, code_sync, sync, __init__). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- F-1: Fix test import DISCUSSIONS_COLLECTION → GITHUB_COLLECTION
- F-2: Add missing error patterns to migration ERROR_INDICATORS regex
- F-3: Replace hardcoded "discussions" string with COLLECTION_DISCUSSIONS constant
- F-4: Replace hardcoded GITHUB_COLLECTION with import from memory.config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add v2.0.9 CHANGELOG section covering PLAN-010 injection quality sprint, Langfuse observability, and Parzival layered priority bootstrap. Update architecture doc from three-collection to five-collection (github + jira-data). Fix GitHub integration docs to reference github collection. Update error detection docs for structured pattern matching rewrite. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Data flow diagram: Stop hook → PreCompact hook for session summaries
- Mistake 2: align with Mistake 9 (PreCompact is correct, not Stop)
- Summary table: Stop hook → PreCompact hook
- Comparison table: mark code-patterns/conventions as conditionally searched in Tier 2 and SessionStart (non-Parzival path)
- GitHub indexes: memory_type → type (matches Qdrant field name)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
test_get_collection_stats_includes_jira_data_when_exists expected 4 collections but PLAN-010 added the github collection making it 5. Also verify github collection is present in stats. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… TD-237

- BUG-204: Remove 15 hardcoded [:300] truncation limits from 5 hook scripts. Standardize TRACE_CONTENT_MAX=10000 across all files (search.py, langfuse_stop_hook.py, classification_worker.py).
- BUG-200 (completion): Correct type="error_fix" to "error_pattern" in error_store_async.py (5 locations). Update retrieval in error_detection.py, triggers.py, and classifier config.py to handle both type names.
- BUG-201 (completion): Add post-filter in context_injection_tier2.py to exclude error_fix/error_pattern from Tier 2 code-patterns injection.
- TD-237: Add error_pattern type to classifier LLM prompt template so the classifier can distinguish automated error captures from user-reported fixes.
- TD-235: Fix install.sh log message "3 collections" to "5 collections".
- BUG-197: Confirmed fixed (lazy imports in memory/__init__.py).

3-round adversarial review (Sonnet+Opus): Round 1 fixed error_detection.py, triggers.py, classifier config.py. Round 2 fixed test_triggers.py assertion. Round 3: ZERO ISSUES, APPROVED.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CodeQL CWE-117: migrate_v209_github_collection.py logged last 3 chars of QDRANT_API_KEY in mismatch warning. Replace with key length to preserve debugging utility without exposing secret material. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
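The replacement logging approach can be sketched as a small helper that reports only whether a key is set and how long it is. The function name and message format are hypothetical; the migration script's actual wording is not shown here.

```python
from typing import Optional


def describe_secret(value: Optional[str]) -> str:
    """Summarize a secret for log output without revealing any of its characters."""
    if not value:
        return "<not set>"
    return f"<set, length={len(value)}>"


# Example: comparing two configured keys in a mismatch warning.
warning = (
    "QDRANT_API_KEY mismatch: "
    f"env={describe_secret('abcdef123')} config={describe_secret(None)}"
)
```

A length mismatch is usually enough to spot a truncated or stale key, and unlike a suffix it does not narrow the search space for the secret itself.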
update_shared_scripts() used a non-recursive glob (*.py) that only copied top-level Python files, missing scripts/memory/ (33 files) and all .sh files (6 files). Replaced with cp -r matching copy_files() pattern. Added chmod +x for executable permissions parity with copy_files(). Fixes: BUG-205 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migration script now renames any error_fix entries that survive the false-positive purge to the correct error_pattern type (BUG-200). Updated CHANGELOG with BUG-204/205, TD-237, CodeQL fix, and classifier-worker rebuild requirement in upgrade instructions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
v2.0.9 — Injection quality sprint (PLAN-010) + Langfuse observability + installer fixes.
- `github` Qdrant collection: eliminates 79.6% noise from discussions

Test Results
Commits (27)
27 commits covering: PLAN-010 injection quality, Langfuse observability, Parzival bootstrap, BUG-197/198/199/200/201/204/205, TD-237, CodeQL fix, installer recursive copy + chmod.
Upgrade Path
See CHANGELOG.md for complete upgrade instructions (Option 1 + container rebuild + migration).