feat: v2.1.0 — Langfuse V3 migration, data quality, observability by Hidden-History · Pull Request #52 · Hidden-History/ai-memory

Hidden-History · 2026-03-03T23:05:51Z

Summary

Langfuse V3 SDK migration — All instrumentation migrated from V2 to V3 SDK. Compliance review resolved 2 critical, 6 standard, and 9 warning-level issues across 18 files.
Data quality improvements — Fixed false-positive error pattern capture, added type filters to context injection tier 2, renamed error_fix → error_pattern across 36 files.
Trace observability — Standardized TRACE_CONTENT_MAX constant, propagated session_id to search traces, added atexit handlers for graceful Langfuse shutdown in Docker services.
Infrastructure — ClickHouse memory capped at 16 GiB (up from 4 GiB, down from the unlimited default), installer chmod fix for scripts subdirectories.
Testing — Fixed flaky rate limiter integration test (BUG-175) with deterministic mock approach.

Scope

75 files changed (+1,005 / -581 lines)
5 commits covering quality sprint (PM #144–148)
2,111 tests pass locally

Test Plan

CI pipeline passes (unit + integration tests)
Langfuse V3 compliance: no V2 patterns (Langfuse() constructor, start_span(), langfuse_context)
TRACE_CONTENT_MAX used consistently (no hardcoded [:300] or [:10000] literals)
Error pattern capture uses regex indicators, not substring matching
Context injection tier 2 filters by memory_type
Docker services have atexit Langfuse shutdown handlers
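
The "regex indicators, not substring matching" item can be illustrated with a minimal sketch. The patterns and function names below are hypothetical (the project's actual indicator list is not shown in this PR). The point is only that a bare substring check flags benign text that mentions errors, while anchored patterns do not.

```python
import re

# Hypothetical indicator patterns, for illustration only.
ERROR_INDICATORS = [
    # Start of a Python traceback on its own line.
    re.compile(r"^Traceback \(most recent call last\):", re.MULTILINE),
    # An exception class name followed by a message, e.g. "ValueError: ...".
    re.compile(r"\b[A-Z]\w*(?:Error|Exception): "),
    # A non-zero exit code or status.
    re.compile(r"\bexit (?:code|status) [1-9]\d*\b", re.IGNORECASE),
]


def looks_like_error(output: str) -> bool:
    """Return True only if the output matches a structured error indicator."""
    return any(pattern.search(output) for pattern in ERROR_INDICATORS)


def naive_looks_like_error(output: str) -> bool:
    """Substring check, shown for contrast: it fires on any mention of 'error'."""
    return "error" in output.lower()
```

With these assumed patterns, a line such as `Renamed error_fix to error_pattern; 0 errors found` trips the substring check but not the regex check, which is the kind of false positive the test plan item is guarding against.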

PART A: Zero claude_code_session traces (langfuse_stop_hook.py)
- Fix transcript format mismatch: add _get_entry_role()/_get_entry_content() helpers to handle the Claude Code V2.x format (type/message.content) alongside the older role/content format. _pair_turns() and the first/last text extraction now use the helpers; the previous code was silently producing empty traces.
- Increase FLUSH_TIMEOUT_SECONDS 5→15, configurable via the env var LANGFUSE_FLUSH_TIMEOUT_SECONDS. The 5s SIGALRM was firing before the flush could complete, causing all queued trace data to be discarded.
- Upgrade the credential-missing diagnostic from INFO→WARNING so it is visible without DEBUG_HOOKS; add transcript parse counts to the INFO log.

PART B: TD-241, 26/44 hook pipeline traces missing session_id
- agent_response_capture, user_prompt_capture, post_tool_capture, error_pattern_capture: set os.environ["CLAUDE_SESSION_ID"] = session_id in the hook process after reading it from hook_input, so downstream library calls using emit_trace_event() pick it up via the env fallback.
- Propagate CLAUDE_SESSION_ID into subprocess_env for background store_async subprocesses, alongside the existing LANGFUSE_TRACE_ID.

Fixes: BUG-200 (zero session traces), BUG-201 (TD-241 session_id orphans)
Refs: SPEC-022 S2, BUG-151–BUG-157
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- ClickHouse memory 4→16 GiB (Langfuse recommended).
- Fix langfuse_stop_hook V2.x transcript format (root cause of zero session traces).
- Fix post_tool_capture validate_hook_input (root cause of zero implementation patterns).
- Add agent_id to all 4 store hook payloads.
- Move Tier 2 type exclusion to Qdrant must_not filter.
- Propagate session_id to store subprocesses (fix orphaned spans).
- Fix [:500] trace limits in new_file and first_edit triggers.
- Add chmod +x for scripts subdirectories in both install paths.
- Rename error_fix → error_pattern across 36 files (TD-236/238/239).
- Merge duplicate LLM prompt type definitions.
- Deduplicate VALID_TYPES and SKIP_RECLASSIFICATION_TYPES.
- Fix cross-agent collision artifacts.

Dual adversarial review (opus + sonnet): 5 MUST-FIX resolved.
2137 tests pass, 0 failures.
Resolves: TD-236, TD-238, TD-239, TD-240, TD-241, TD-243
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
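
The session_id propagation to store subprocesses might look like the sketch below. The environment variable names CLAUDE_SESSION_ID and LANGFUSE_TRACE_ID come from the commit messages in this PR; the function name `build_subprocess_env` and its signature are hypothetical.

```python
import os
from typing import Mapping, Optional


def build_subprocess_env(
    session_id: Optional[str],
    trace_id: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Copy the environment and add tracing identifiers for a child process.

    Hypothetical helper: the parent environment is copied, never mutated,
    and identifiers are only set when they are actually known.
    """
    env = dict(os.environ if base_env is None else base_env)
    if session_id:
        env["CLAUDE_SESSION_ID"] = session_id
    if trace_id:
        env["LANGFUSE_TRACE_ID"] = trace_id
    return env
```

The resulting dict would be passed as the `env` argument of `subprocess.Popen`, so that spans emitted by the background process can be linked to the same session instead of appearing as orphans.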

…mpliance

Enforce Langfuse Python SDK V3 across the entire codebase per LANGFUSE-INTEGRATION-SPEC.md (created this session, 766 lines).

Key changes:
- langfuse_stop_hook.py: V2 start_span() → V3 start_as_current_observation() + propagate_attributes() for session/user linking
- langfuse_config.py: V2 Langfuse() constructor → V3 get_client() singleton
- github/sync.py + code_sync.py: add propagate_attributes() + flush()
- jira/sync.py: add session_id to all 3 emit_trace_event calls, graceful import fallback, try/except guards, TRACE_CONTENT_MAX constant
- 26 files: add LANGFUSE-INTEGRATION-SPEC header comments (Path A/B/Infra)
- tests: update 5 stop hook tests for V3 mock patterns (40/40 pass)

Review: 9 findings (3 MUST-FIX resolved, 3 SHOULD-FIX deferred as TD-244/245/246)
Tests: 2109 passed, 0 failures
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ARNINGS resolved

PM #146: Full Langfuse V3 SDK compliance review sprint. 77 files audited by 6 parallel review agents + 2 adversarial opus reviewers. 2 dev→review cycles converged to zero remaining issues.

CRITICAL fixes:
- trace_flush_worker.py: V2 start_span/start_generation → V3 start_observation with trace_context, observation.end() TypeError safety
- test_langfuse_config.py: V2 Langfuse() constructor mocks → V3 get_client mocks + ImportError path test

ISSUES fixed:
- decay.py, freshness.py: guarded Langfuse imports, TRACE_CONTENT_MAX slicing, session_id propagation
- injection.py: session_id propagation, contextlib.suppress → try/except
- process_classification_queue.py: TRACE_CONTENT_MAX, dict serialization, session_id
- github/sync.py: flush try/finally, atexit shutdown handler (TD-245)
- github/code_sync.py: atexit shutdown handler (TD-246)
- jira/sync.py: TRACE_CONTENT_MAX slicing

WARNINGS fixed:
- search.py, config.py: Langfuse spec header comments
- 4 classifier providers: header corrected to "Path A upstream"
- langfuse_config.py: dead module-level imports removed

Tests: 2111 pass (62/62 Langfuse-specific tests pass). TD-245 FIXED, TD-246 FIXED.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
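
The atexit shutdown handlers added for the sync services (TD-245, TD-246) could follow a pattern like this sketch. The function names are hypothetical, and `client` stands for any object exposing `shutdown()`; in the Langfuse V3 SDK that would typically be the object returned by `get_client()`. Python does not run atexit hooks when the process is killed by a signal it does not handle, so the sketch also converts SIGTERM into a normal exit. Whether the project installs such a signal handler is not stated in this PR.

```python
import atexit
import signal
import sys


def register_langfuse_shutdown(client):
    """Register an atexit hook that calls client.shutdown() and never raises."""

    def _shutdown():
        try:
            client.shutdown()
        except Exception:
            # Interpreter exit must not fail because tracing could not flush.
            pass

    atexit.register(_shutdown)
    return _shutdown


def exit_cleanly_on_sigterm():
    """Turn SIGTERM into a normal exit so atexit hooks get a chance to run."""
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
```

Registering the hook outside any optional-instrumentation try block keeps the shutdown path active even when that instrumentation fails to import.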

… BUG-175

Category A code fixes from PM #147 comprehensive review:
- search.py: Add session_id to 4 emit_trace_event() calls so search traces link to their Claude Code session in Langfuse UI. Replace 4x hardcoded [:10000] with [:TRACE_CONTENT_MAX] constant per spec §9.2.
- process_classification_queue.py: Add atexit Langfuse shutdown handler for classification worker daemon, matching sync.py TD-245 pattern. Ensures buffered OTel spans are flushed on Docker SIGTERM.
- test_async_sdk_wrapper.py: Fix BUG-175 flaky test by mocking asyncio.sleep instead of relying on real timing. Root cause: 2 req/min rate limiter needs 30s refill, conflicting with 30s pytest timeout. Remove @pytest.mark.quarantine; the test is now deterministic.

All changes reviewed (sonnet adversarial + Parzival verification). 2111 tests pass, 0 failures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
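
The BUG-175 approach, mocking `asyncio.sleep` so a 2 req/min limiter can be tested without waiting 30 seconds, can be sketched as below. `RateLimiter` and `run_two_acquires` are hypothetical stand-ins, not the classes under test in test_async_sdk_wrapper.py; only the mocking technique is the point.

```python
import asyncio
import time
from unittest.mock import AsyncMock, patch


class RateLimiter:
    """Hypothetical stand-in: allows `per_minute` calls, spaced evenly."""

    def __init__(self, per_minute: int) -> None:
        self.interval = 60.0 / per_minute
        self._next_allowed = 0.0

    async def acquire(self) -> None:
        now = time.monotonic()
        wait = self._next_allowed - now
        if wait > 0:
            # Looked up on the asyncio module at call time, so it is patchable.
            await asyncio.sleep(wait)
        self._next_allowed = max(now, self._next_allowed) + self.interval


def run_two_acquires():
    """Return (sleeps awaited, requested wait) without really sleeping."""
    fake_sleep = AsyncMock()

    async def scenario():
        limiter = RateLimiter(per_minute=2)
        await limiter.acquire()  # first call passes immediately
        await limiter.acquire()  # second call must wait about 30 seconds

    with patch("asyncio.sleep", fake_sleep):
        asyncio.run(scenario())
    return fake_sleep.await_count, fake_sleep.await_args.args[0]
```

Because the sleep is replaced by an `AsyncMock`, the test asserts on the requested delay instead of on elapsed wall-clock time, so it cannot race the pytest timeout.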

Add agent_name/agent_role to all emit_trace_event() metadata dicts (23 files, 89 occurrences) for per-agent Langfuse filtering. Defaults to main/user for non-team sessions.

Review fixes (adversarial review, 2x sonnet+opus):
- F-1: Add error_fix to must_not_types in context_injection_tier2.py
- F-2: Move atexit handler outside AnthropicInstrumentor try block
- F-3: Add error_fix to error_detection.py search filter
- F-4: Add warning log to trace_flush_worker end_time fallback
- F-5: Add 3 must_not_types test cases (2114 total tests pass)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
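
The must_not_types exclusion (F-1, and the earlier move of the Tier 2 type exclusion into a Qdrant must_not filter) could be expressed as a filter like the one sketched here. The dict follows Qdrant's JSON filter syntax (`must_not`, `key`, `match.any`). The payload field name `type` and the function name `build_tier2_filter` are assumptions, and the excluded values are illustrative.

```python
def build_tier2_filter(must_not_types: list) -> dict:
    """Build a Qdrant-style filter that excludes points of the given types.

    Assumption: each point stores its memory type in a payload field
    named "type".
    """
    if not must_not_types:
        return {}  # no exclusion requested, so no filter
    return {
        "must_not": [
            {"key": "type", "match": {"any": sorted(set(must_not_types))}}
        ]
    }


# Example: exclude both the renamed type and its legacy name.
tier2_filter = build_tier2_filter(["error_pattern", "error_fix"])
```

Pushing the exclusion into the query filter means excluded types never consume result slots, which a post-query filter in application code cannot guarantee.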

- Add agent_name/agent_role to 4 missing emit_trace_event() calls in classification_worker.py and process_classification_queue.py
- Wrap FLUSH_TIMEOUT_SECONDS int() parse in try/except (FINDING-1)
- Add v2.1.0 upgrade instructions to CHANGELOG
- Update ClickHouse resource table (1 GB → 16 GiB cap)
- Fix error_detection.py docstring to match dual-type filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
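
The guarded FLUSH_TIMEOUT_SECONDS parse (FINDING-1) might look like this sketch. The env var name and the default of 15 come from the commit messages in this PR; the function name `read_flush_timeout` and the rejection of non-positive values are assumptions.

```python
import os
from typing import Mapping, Optional

DEFAULT_FLUSH_TIMEOUT_SECONDS = 15


def read_flush_timeout(env: Optional[Mapping[str, str]] = None) -> int:
    """Read LANGFUSE_FLUSH_TIMEOUT_SECONDS, falling back to the default.

    Assumption: malformed or non-positive values fall back to the default
    instead of crashing the hook.
    """
    env = os.environ if env is None else env
    raw = env.get("LANGFUSE_FLUSH_TIMEOUT_SECONDS")
    if raw is None:
        return DEFAULT_FLUSH_TIMEOUT_SECONDS
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_FLUSH_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_FLUSH_TIMEOUT_SECONDS
```

Without the try/except, a typo in the environment variable would raise at import time and take the whole stop hook down with it.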

Run black formatter on 18 files (9 from our changes, 9 pre-existing).
Clarify CHANGELOG ClickHouse memory entry: 16 GiB cap (up from 4 GiB, down from unlimited default).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Bump pyproject.toml version from 2.0.8 to 2.1.0
- Fix test_generate_hook_config_stop: isolate LANGFUSE_ENABLED env var
- Add test_generate_hook_config_stop_with_langfuse: verify dual Stop hooks
- Fix TestLangfuseDefaults: autouse fixture clears Langfuse env vars (pydantic reads env vars even with _env_file=None)
- Fix 6 test files: replace relative fixture/sys.path with Path(__file__)-based resolution for cwd-independent execution
- All 2115 tests pass, black + ruff clean

Reviewed-by: reviewer-sonnet (PASS, 0 findings)
Reviewed-by: reviewer-opus (PASS, 2 LOW non-blocking)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
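
The Path(__file__)-based resolution mentioned for the 6 test files can be sketched as follows. The helper name `fixtures_dir` and the `fixtures` directory name are hypothetical; the idea is that the path is derived from the test module's own location instead of from the current working directory.

```python
from pathlib import Path


def fixtures_dir(test_file: str) -> Path:
    """Return the fixtures directory next to the given test module.

    Intended to be called as fixtures_dir(__file__) from inside a test
    module, so the result does not depend on where pytest was started.
    """
    return Path(test_file).resolve().parent / "fixtures"
```

A relative path such as `Path("tests/fixtures")` only works when pytest is launched from the repository root, which is the failure mode this change removes.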

WB Solutions and others added 11 commits March 2, 2026 18:20

docs: add CHANGELOG.md v2.1.0 entry

ee50321

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(lint): resolve 26 ruff lint errors for CI compliance

69fcb94

Fix I001 (import sorting), E402 (noqa for intentional lazy imports), SIM105 (contextlib.suppress), RUF100 (unused noqa), RUF003 (unicode). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Hidden-History merged commit 83035ec into main Mar 6, 2026
12 checks passed

Hidden-History merged 11 commits into main from feature/v2.1.0-quality-sprint
