Skip to content

# CCC1 — Parser test coverage: schema variants #21

@clean6378-max-it

Description

@clean6378-max-it

CCC1 — Parser test coverage: schema variants

Repo: claude-code-chat-browser
Audit ref: claude-cursor.md CCC1 — "Add direct tests for jsonl_parser.py covering schema variants, malformed entries, and exception paths."
Backlog slice: Chen May 5 — Parser test coverage: test schema variants (5 pt, High).

(After opening on GitHub, paste the issue URL here.)


Summary (from audit)

Finding jsonl_parser.py (617 LOC) is the project's structural core and processes untrusted, schema-evolving Claude Code JSONL files. It has zero direct test coverage. _parse_tool_result (~140 LOC) dispatches on key presence across 14 tool result shapes with no tests. Claude Code's schema is undocumented and changes without notice; a field rename or new tool type is currently invisible.
Fix Add a dedicated tests/test_jsonl_parser.py covering all _parse_tool_result dispatch arms, parse_session entry-type dispatch, metadata accumulation, malformed-entry resilience, and _normalize_content / _extract_text / _extract_images helpers.
Effort M
Priority High — parser is zero-tested on untrusted, schema-evolving input. Pairs with CCC2 (CI must exist to run the tests).

What already exists

tests/test_null_usage_tokens.py

Covers one narrow slice of _process_assistant: null token fields in the usage object.

  • TestProcessAssistantNullUsage — 8 cases asserting null token fields don't raise and default to 0.
  • TestParseSessionNullUsage — 2 integration cases via temp file; null cache_read_input_tokens and mixed null/valid entries.
  • TestEstimateCostNullUsage_estimate_cost from session_stats.py (out of scope for this issue).

Gap: Null-usage is the only parser path with any coverage. Every other entry type, content shape, and tool result variant is untested.


tests/test_export_exclusion_filtering.py, tests/test_export_state.py, tests/test_cli_args.py

These cover export filtering, incremental-export state, and CLI argument parity respectively. None touch jsonl_parser.py directly.


Identified gaps

Gap 1: _parse_tool_result — 14 dispatch arms, 0 tests

The function classifies a toolUseResult dict by key presence. Each arm:

Result type Key(s) used for dispatch
bash stdout or stderr
file_edit structuredPatch, or filePath + newString
file_write filePath + content (no patch)
glob filenames (list)
grep mode + numFiles
file_read file (dict with filePath, numLines, content)
web_search query + results
web_fetch url + code
task (message variant) task_id or message
task (retrieval variant) retrieval_status + task
task (completed subagent) agentId + totalDurationMs
task (async launched) agentId + isAsync
todo_write newTodos or oldTodos
user_input questions + answers
plan plan + filePath
unknown fallback

No test exercises any of these. Schema evolution (e.g., codestatusCode in web_fetch) will be silent regressions.


Gap 2: parse_session — entry-type dispatch

parse_session dispatches on entry.get("type") to four handlers. Not tested:

  • A session with only user entries (no assistant).
  • A session with only assistant entries.
  • An entry with an unknown type (silently ignored — should stay silent).
  • isSidechain: true incrementing sidechain_messages.
  • file-history-snapshot type extracting timestamp from snapshot.timestamp.
  • entry_counts accumulation across mixed entry types.
  • Wall-clock time calculation from first_timestamp / last_timestamp.
  • Empty file (zero entries) returning a valid skeleton.

Gap 3: _process_user — metadata and content extraction

  • version, cwd, gitBranch, permissionMode only captured from the first user entry (subsequent ones must not overwrite).
  • toolUseResult images extracted from nested content list.
  • Missing message key (entry.get("message", {})) — must not raise.
  • content as a plain string vs list of typed blocks.

Gap 4: _process_assistant — content shape variants

  • content as a plain string (normalized to [{type:text}]).
  • content as a list of strings (each becomes a text block).
  • Mixed content: text + thinking + tool_use in one message.
  • thinking blocks accumulated as \n\n-joined string.
  • tool_use counting: multiple calls in one message increment total_tool_calls and tool_call_counts correctly.
  • isApiErrorMessage: true increments api_errors without crashing.
  • stop_reason accumulation across multiple entries.
  • cache_creation dict with ephemeral_5m_input_tokens / ephemeral_1h_input_tokens.
  • service_tier added to service_tiers set.
  • model == "<synthetic>" must not be added to models_used.

Gap 5: _track_file_activity — file and command tracking

  • Read tool → files_read set.
  • Write tool → files_created set.
  • Edit tool → files_written set.
  • Bash tool → bash_commands list.
  • WebFetchweb_fetches list (via url key).
  • WebSearchweb_fetches list (via query key).
  • Tool with empty file_path must not add to any set.

Gap 6: _process_system — compact boundary

  • subtype == "compact_boundary" increments compactions and appends to compact_boundaries.
  • Missing compactMetadata must not raise.
  • Other subtypes append a system message without touching compaction metadata.

Gap 7: _normalize_content, _extract_text, _extract_images

  • _normalize_content: plain string, list of strings, list of dicts, mixed list, None/wrong type → empty list.
  • _extract_text: only type == "text" blocks contribute; tool_use and thinking blocks ignored.
  • _extract_images: base64 image blocks extracted; nested images inside tool_result content blocks extracted; non-image blocks skipped.

Gap 8: _infer_title and _strip_system_tags

  • _infer_title: first user message with text → truncated to 100 chars. No text messages → "Untitled Session". Sidechain-only session.
  • _strip_system_tags: each tag variant removed (system-reminder, ide_opened_file, user-prompt-submit-hook, etc.). Nested/malformed tags handled gracefully.

Gap 9: malformed / partial entries

  • Line with invalid JSON → silently skipped, parse continues.
  • Entry missing type key → counted in entry_counts only if type present; otherwise ignored.
  • Entry with type: "assistant" but missing message key → msg = {}, no crash.
  • Entry with usage as None or a non-dict → no crash.
  • toolUseResult as null_parse_tool_result returns None.
  • toolUseResult as a string → _parse_tool_result returns None.

Gap 10: quick_session_info

  • Small file (≤10 000 bytes): single-pass only, title and timestamps from first 80 lines.
  • Large file (>10 000 bytes): tail-read path finds last timestamp correctly.
  • File with no user entries → title is "Untitled Session".
  • File with only system entries → no crash, both timestamps from system lines.

Proposed test cases

All tests belong in tests/test_jsonl_parser.py. Use tempfile.NamedTemporaryFile (as in test_null_usage_tokens.py) for integration tests; call helpers directly for unit tests.

TestParseToolResult
  test_bash_with_stdout
  test_bash_with_stderr_only
  test_bash_with_exit_code_and_interrupted
  test_file_edit_with_structured_patch
  test_file_edit_with_old_new_string
  test_file_write_content
  test_glob_result
  test_glob_truncated
  test_grep_result
  test_file_read_result
  test_web_search_result
  test_web_fetch_result
  test_task_message_variant
  test_task_retrieval_variant
  test_task_completed_subagent
  test_task_async_launched
  test_todo_write_result
  test_user_input_result
  test_plan_result
  test_unknown_fallback
  test_non_dict_returns_none
  test_slug_preserved

TestNormalizeContent
  test_plain_string
  test_list_of_strings
  test_list_of_dicts
  test_mixed_string_and_dict
  test_none_returns_empty
  test_wrong_type_returns_empty

TestExtractText
  test_text_blocks_joined
  test_tool_use_blocks_ignored
  test_thinking_blocks_ignored
  test_empty_content

TestExtractImages
  test_base64_image_extracted
  test_nested_tool_result_image_extracted
  test_non_image_skipped

TestInferTitle
  test_first_user_message_used
  test_truncated_to_100_chars
  test_no_text_messages_returns_untitled
  test_sidechain_only_returns_untitled

TestStripSystemTags
  test_system_reminder_removed
  test_ide_opened_file_removed
  test_user_prompt_submit_hook_removed
  test_remaining_known_opening_closing_tags_stripped
  test_clean_text_unchanged

TestProcessUser
  test_metadata_captured_from_first_entry_only
  test_missing_message_key_no_crash
  test_tool_use_result_images_extracted

TestProcessAssistant
  test_synthetic_model_not_added
  test_thinking_blocks_joined
  test_tool_use_counts_accumulated
  test_api_error_flag_increments_api_errors
  test_stop_reason_accumulated
  test_service_tier_added
  test_ephemeral_cache_tokens_accumulated

TestTrackFileActivity
  test_read_tool_adds_to_files_read
  test_write_tool_adds_to_files_created
  test_edit_tool_adds_to_files_written
  test_bash_command_appended
  test_web_fetch_url_appended
  test_web_search_query_appended
  test_empty_file_path_not_added

TestProcessSystem
  test_compact_boundary_increments_compaction
  test_compact_boundary_missing_metadata_no_crash
  test_other_subtype_no_compaction_increment

TestParseSession (integration)
  test_empty_file_returns_skeleton
  test_unknown_entry_type_silently_ignored
  test_is_sidechain_increments_counter
  test_file_history_snapshot_timestamp
  test_entry_counts_accumulated
  test_wall_time_computed
  test_invalid_json_line_skipped
  test_missing_type_key_no_crash
  test_missing_usage_dict_no_crash

TestQuickSessionInfo
  test_small_file_title_and_timestamps
  test_large_file_last_timestamp_from_tail
  test_no_user_entries_returns_untitled

Done when

  • tests/test_jsonl_parser.py created in claude-code-chat-browser/tests/.
  • All 14 _parse_tool_result dispatch arms have at least one passing test.
  • Malformed / partial entry cases (invalid JSON, missing keys, wrong types) have at least one passing test each.
  • _normalize_content, _extract_text, _extract_images unit-tested for all input shapes.
  • parse_session integration tests cover empty file, unknown type, sidechain counter, and wall-time.
  • quick_session_info small-file and large-file paths tested.
  • All new tests pass under pytest (CCC2 CI must run green).
  • No test imports _-prefixed symbols from modules other than jsonl_parser itself (avoids CCC3 breach pattern).

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions