feat: v2.1.0 — Langfuse V3 migration, data quality, observability #52
Merged
Hidden-History merged 11 commits into main on Mar 6, 2026
PART A — Zero claude_code_session traces (langfuse_stop_hook.py):
- Fix transcript format mismatch: add _get_entry_role()/_get_entry_content() helpers to handle Claude Code V2.x format (type/message.content) alongside older role/content format. _pair_turns() and first/last text extraction updated to use helpers — was silently producing empty traces.
- Increase FLUSH_TIMEOUT_SECONDS 5→15, configurable via env var LANGFUSE_FLUSH_TIMEOUT_SECONDS. 5s SIGALRM was firing before flush could complete, causing all queued trace data to be discarded.
- Upgrade credential-missing diagnostic from INFO→WARNING so it's visible without DEBUG_HOOKS; add transcript parse counts to INFO log.

PART B — TD-241: 26/44 hook pipeline traces missing session_id:
- agent_response_capture, user_prompt_capture, post_tool_capture, error_pattern_capture: set os.environ["CLAUDE_SESSION_ID"] = session_id in the hook process after reading from hook_input, so downstream library calls using emit_trace_event() pick it up via env fallback.
- Propagate CLAUDE_SESSION_ID into subprocess_env for background store_async subprocesses, alongside the existing LANGFUSE_TRACE_ID.

Fixes: BUG-200 (zero session traces), BUG-201 (TD-241 session_id orphans)
Refs: SPEC-022 S2, BUG-151–BUG-157
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ClickHouse memory 4→16 GiB (Langfuse recommended).
- Fix langfuse_stop_hook V2.x transcript format (root cause of zero session traces).
- Fix post_tool_capture validate_hook_input (root cause of zero implementation patterns).
- Add agent_id to all 4 store hook payloads.
- Move Tier 2 type exclusion to Qdrant must_not filter.
- Propagate session_id to store subprocesses (fix orphaned spans).
- Fix [:500] trace limits in new_file and first_edit triggers.
- Add chmod +x for scripts subdirectories in both install paths.
- Rename error_fix → error_pattern across 36 files (TD-236/238/239).
- Merge duplicate LLM prompt type definitions.
- Deduplicate VALID_TYPES and SKIP_RECLASSIFICATION_TYPES.
- Fix cross-agent collision artifacts.

Dual adversarial review (opus + sonnet): 5 MUST-FIX resolved. 2137 tests pass, 0 failures.

Resolves: TD-236, TD-238, TD-239, TD-240, TD-241, TD-243
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
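Propagating session_id to store subprocesses could look roughly like the sketch below. The function name `build_subprocess_env` and its parameters are hypothetical; only the two environment variable names, CLAUDE_SESSION_ID and LANGFUSE_TRACE_ID, appear in this pull request.

```python
import os
from typing import Dict, Optional


def build_subprocess_env(
    session_id: Optional[str],
    trace_id: Optional[str],
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Copy the environment and attach tracing identifiers (hypothetical helper)."""
    # Copy so the parent process environment is never mutated here.
    env = dict(os.environ if base_env is None else base_env)
    if session_id:
        env["CLAUDE_SESSION_ID"] = session_id
    if trace_id:
        env["LANGFUSE_TRACE_ID"] = trace_id
    return env
```

The resulting dict would be passed as `env=` to `subprocess.Popen`, so a background worker that falls back to the environment for its session can attach its spans to the right session.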
…mpliance

Enforce Langfuse Python SDK V3 across the entire codebase per LANGFUSE-INTEGRATION-SPEC.md (created this session, 766 lines).

Key changes:
- langfuse_stop_hook.py: V2 start_span() → V3 start_as_current_observation() + propagate_attributes() for session/user linking
- langfuse_config.py: V2 Langfuse() constructor → V3 get_client() singleton
- github/sync.py + code_sync.py: add propagate_attributes() + flush()
- jira/sync.py: add session_id to all 3 emit_trace_event calls, graceful import fallback, try/except guards, TRACE_CONTENT_MAX constant
- 26 files: add LANGFUSE-INTEGRATION-SPEC header comments (Path A/B/Infra)
- tests: update 5 stop hook tests for V3 mock patterns (40/40 pass)

Review: 9 findings (3 MUST-FIX resolved, 3 SHOULD-FIX deferred as TD-244/245/246)
Tests: 2109 passed, 0 failures
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ARNINGS resolved

PM #146: Full Langfuse V3 SDK compliance review sprint. 77 files audited by 6 parallel review agents + 2 adversarial opus reviewers. 2 dev→review cycles converged to zero remaining issues.

CRITICAL fixes:
- trace_flush_worker.py: V2 start_span/start_generation → V3 start_observation with trace_context, observation.end() TypeError safety
- test_langfuse_config.py: V2 Langfuse() constructor mocks → V3 get_client mocks + ImportError path test

ISSUES fixed:
- decay.py, freshness.py: guarded Langfuse imports, TRACE_CONTENT_MAX slicing, session_id propagation
- injection.py: session_id propagation, contextlib.suppress → try/except
- process_classification_queue.py: TRACE_CONTENT_MAX, dict serialization, session_id
- github/sync.py: flush try/finally, atexit shutdown handler (TD-245)
- github/code_sync.py: atexit shutdown handler (TD-246)
- jira/sync.py: TRACE_CONTENT_MAX slicing

WARNINGS fixed:
- search.py, config.py: Langfuse spec header comments
- 4 classifier providers: header corrected to "Path A upstream"
- langfuse_config.py: dead module-level imports removed

Tests: 2111 pass (62/62 Langfuse-specific tests pass). TD-245 FIXED, TD-246 FIXED.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
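The "flush try/finally" and "atexit shutdown handler" items describe a common pattern, sketched here under assumptions. `FakeClient`, `run_sync`, and `register_shutdown` are hypothetical names; the sketch only assumes the tracing client exposes `flush()` and `shutdown()` methods, and it uses a stand-in so that it runs without the Langfuse SDK installed.

```python
import atexit


class FakeClient:
    """Stand-in for a tracing client; only flush() and shutdown() are assumed."""

    def __init__(self):
        self.calls = []

    def flush(self):
        self.calls.append("flush")

    def shutdown(self):
        self.calls.append("shutdown")


def run_sync(client, work):
    """Run work() and always attempt a flush afterwards, even if work() raises."""
    try:
        return work()
    finally:
        try:
            client.flush()
        except Exception:
            pass  # a tracing failure should not mask the sync result or error


def register_shutdown(client):
    """Register a best-effort shutdown hook and return it for direct testing."""

    def _shutdown():
        try:
            client.shutdown()
        except Exception:
            pass  # never raise during interpreter exit

    atexit.register(_shutdown)
    return _shutdown
```

Note that `atexit` handlers run on normal interpreter exit; a process stopped by SIGTERM only reaches them if the signal is translated into a normal exit, for example by a signal handler that calls `sys.exit()`.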
… BUG-175

Category A code fixes from PM #147 comprehensive review:
- search.py: Add session_id to 4 emit_trace_event() calls so search traces link to their Claude Code session in Langfuse UI. Replace 4x hardcoded [:10000] with [:TRACE_CONTENT_MAX] constant per spec §9.2.
- process_classification_queue.py: Add atexit Langfuse shutdown handler for classification worker daemon, matching sync.py TD-245 pattern. Ensures buffered OTel spans are flushed on Docker SIGTERM.
- test_async_sdk_wrapper.py: Fix BUG-175 flaky test by mocking asyncio.sleep instead of relying on real timing. Root cause: 2 req/min rate limiter needs 30s refill, conflicting with 30s pytest timeout. Remove @pytest.mark.quarantine — test now deterministic.

All changes reviewed (sonnet adversarial + Parzival verification). 2111 tests pass, 0 failures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
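The BUG-175 fix replaces real waiting with a mocked `asyncio.sleep`. A minimal sketch of that testing technique follows; `SlowLimiter` and `call_twice` are hypothetical stand-ins and not the project's rate limiter, while `AsyncMock` and `patch` are from the standard library's `unittest.mock`.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class SlowLimiter:
    """Hypothetical limiter that waits a fixed interval before each call."""

    def __init__(self, interval_seconds: float):
        self.interval_seconds = interval_seconds

    async def acquire(self) -> None:
        # Looked up on the asyncio module at call time, so a patch takes effect.
        await asyncio.sleep(self.interval_seconds)


async def call_twice(limiter: SlowLimiter) -> int:
    await limiter.acquire()
    await limiter.acquire()
    return 2


def run_without_waiting():
    """Drive the limiter with asyncio.sleep replaced by an AsyncMock."""
    with patch("asyncio.sleep", new=AsyncMock()) as fake_sleep:
        result = asyncio.run(call_twice(SlowLimiter(30.0)))
    return result, fake_sleep.await_count, fake_sleep.await_args_list
```

Because the mock returns immediately, the test can assert on how often and for how long the code asked to sleep without depending on wall-clock time or the pytest timeout.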
Add agent_name/agent_role to all emit_trace_event() metadata dicts (23 files, 89 occurrences) for per-agent Langfuse filtering. Defaults to main/user for non-team sessions.

Review fixes (adversarial review, 2x sonnet+opus):
- F-1: Add error_fix to must_not_types in context_injection_tier2.py
- F-2: Move atexit handler outside AnthropicInstrumentor try block
- F-3: Add error_fix to error_detection.py search filter
- F-4: Add warning log to trace_flush_worker end_time fallback
- F-5: Add 3 must_not_types test cases (2114 total tests pass)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
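The main/user defaulting described above might be implemented along these lines. The helper name `with_agent_metadata` and its signature are hypothetical; only the `agent_name` and `agent_role` keys and the `main`/`user` defaults come from the commit message.

```python
from typing import Any, Dict, Optional

DEFAULT_AGENT_NAME = "main"  # default for non-team sessions, per the commit message
DEFAULT_AGENT_ROLE = "user"


def with_agent_metadata(
    metadata: Dict[str, Any],
    agent_name: Optional[str] = None,
    agent_role: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of metadata with agent fields filled in (hypothetical helper)."""
    enriched = dict(metadata)  # do not mutate the caller's dict
    enriched["agent_name"] = agent_name or DEFAULT_AGENT_NAME
    enriched["agent_role"] = agent_role or DEFAULT_AGENT_ROLE
    return enriched
```

Centralising the defaults in one helper is one way to avoid repeating the same two keys by hand at every call site.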
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix I001 (import sorting), E402 (noqa for intentional lazy imports), SIM105 (contextlib.suppress), RUF100 (unused noqa), RUF003 (unicode).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add agent_name/agent_role to 4 missing emit_trace_event() calls in classification_worker.py and process_classification_queue.py
- Wrap FLUSH_TIMEOUT_SECONDS int() parse in try/except (FINDING-1)
- Add v2.1.0 upgrade instructions to CHANGELOG
- Update ClickHouse resource table (1 GB → 16 GiB cap)
- Fix error_detection.py docstring to match dual-type filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
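The guarded `int()` parse for the flush timeout can be sketched as follows. The environment variable name and the default of 15 come from this pull request; the function name `read_flush_timeout` and the rejection of non-positive values are assumptions added for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_FLUSH_TIMEOUT_SECONDS = 15  # default named in the PR (raised from 5)


def read_flush_timeout(environ: Optional[Mapping[str, str]] = None) -> int:
    """Read LANGFUSE_FLUSH_TIMEOUT_SECONDS, falling back to the default on bad input."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LANGFUSE_FLUSH_TIMEOUT_SECONDS")
    if raw is None:
        return DEFAULT_FLUSH_TIMEOUT_SECONDS
    try:
        value = int(raw)
    except ValueError:
        # A malformed value should not crash the hook at import time.
        return DEFAULT_FLUSH_TIMEOUT_SECONDS
    # Assumption: zero or negative timeouts are treated as invalid.
    return value if value > 0 else DEFAULT_FLUSH_TIMEOUT_SECONDS
```

For example, `read_flush_timeout({"LANGFUSE_FLUSH_TIMEOUT_SECONDS": "abc"})` falls back to 15 instead of raising.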
Run black formatter on 18 files (9 from our changes, 9 pre-existing). Clarify CHANGELOG ClickHouse memory entry: 16 GiB cap (up from 4 GiB, down from unlimited default).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bump pyproject.toml version from 2.0.8 to 2.1.0
- Fix test_generate_hook_config_stop: isolate LANGFUSE_ENABLED env var
- Add test_generate_hook_config_stop_with_langfuse: verify dual Stop hooks
- Fix TestLangfuseDefaults: autouse fixture clears Langfuse env vars (pydantic reads env vars even with _env_file=None)
- Fix 6 test files: replace relative fixture/sys.path with Path(__file__)-based resolution for cwd-independent execution
- All 2115 tests pass, black + ruff clean

Reviewed-by: reviewer-sonnet (PASS, 0 findings)
Reviewed-by: reviewer-opus (PASS, 2 LOW non-blocking)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- error_fix → error_pattern across 36 files.
- TRACE_CONTENT_MAX constant, propagated session_id to search traces, added atexit handlers for graceful Langfuse shutdown in Docker services.
- chmod fix for subdirectories.

Scope
Test Plan
- Langfuse() constructor, start_span(), langfuse_context)
- TRACE_CONTENT_MAX used consistently (no hardcoded [:300] or [:10000] literals)
- memory_type
- atexit Langfuse shutdown handlers