test(jsonl): add jsonl_parser schema coverage and harden null usage by clean6378-max-it · Pull Request #22 · cppalliance/claude-code-chat-browser

clean6378-max-it · 2026-05-05T23:36:20Z

Closes #21

Summary

Adds comprehensive pytest coverage for utils/jsonl_parser.py (CCC1) and hardens assistant parsing when message.usage is JSON null or otherwise not a plain object.

Motivation

jsonl_parser.py is the structural core for untrusted, schema-evolving Claude Code JSONL. Direct tests lock in _parse_tool_result dispatch, helper behavior, and integration paths so silent regressions from schema drift are caught early.

Changes

New: tests/test_jsonl_parser.py covering:
- _parse_tool_result for all major result shapes and fallbacks
- _normalize_content, _extract_text, _extract_images
- _infer_title, _strip_system_tags
- _process_user, _process_assistant, _process_system, _track_file_activity
- parse_session integration (empty files, unknown types, sidechains, snapshots, wall time, malformed lines)
- quick_session_info for small vs large sessions
Fix: _process_assistant treats non-dict usage as {} so usage: null does not crash.

Testing

pytest tests/test_jsonl_parser.py
pytest tests/
Notes
Underscore imports are confined to utils.jsonl_parser, consistent with existing parser tests.
CI automation for pytest is tracked separately (CCC2).

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Tests**
  * Added comprehensive test suite for JSONL parsing functionality, covering tool results, content processing, message handling, file tracking, and edge cases including malformed entries.

* **Bug Fixes**
  * Improved robustness of usage data handling in message processing to prevent type errors when non-dictionary usage data appears.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Add tests/test_jsonl_parser.py covering _parse_tool_result dispatch, content helpers, user/assistant/system paths, file activity, parse_session integration, quick_session_info, and malformed entries. Treat non-dict message.usage in _process_assistant as empty to avoid crashes on null usage.

coderabbitai · 2026-05-05T23:36:33Z

📝 Walkthrough

Walkthrough

A comprehensive test suite for the JSONL parser module is added with coverage of tool result parsing, content handling, message processing, file tracking, and session parsing. A defensive guard is added to ensure usage data is always a dict in assistant message processing.

Changes

Parser Test Coverage & Defensive Guard

Layer / File(s)	Summary
Defensive Implementation `utils/jsonl_parser.py`	`_process_assistant` guards `usage` is always a dict; resets to `{}` if non-dict before accessing token/usage fields.
Unit Tests – Parsing & Content `tests/test_jsonl_parser.py` (lines 32–312)	Test helpers `_fresh_metadata()` and `_write_jsonl()` added. Unit tests cover `_parse_tool_result` across ~14 tool types (bash, file edits, glob, grep, web, tasks), slug preservation, and null/non-dict fallbacks; `_normalize_content`, `_extract_text`, `_extract_images` with various input shapes and edge cases.
Unit Tests – Metadata & Filtering `tests/test_jsonl_parser.py` (lines 317–412)	Tests for `_infer_title` (first user message, truncation to 100 chars, "Untitled Session" default), `_strip_system_tags`, and `_process_user` (metadata capture from first entry, missing message, nested image extraction).
Unit Tests – Assistant & System Processing `tests/test_jsonl_parser.py` (lines 418–613)	`_process_assistant` tests cover content normalization, synthetic model filtering, thinking block joining, tool-use counting, API error flagging, stop reasons, service tier, and ephemeral cache token accumulation. `_process_system` tests verify compact boundary increments and non-compaction subtypes.
Unit Tests – File Activity & Integration `tests/test_jsonl_parser.py` (lines 533–707)	`_track_file_activity` tests verify read/write/edit tracking, bash commands, web fetch logging. End-to-end `parse_session` tests cover empty files, unknown types, sidechain counting, timestamps, entry accumulation, invalid JSON, missing keys, and file cleanup.
Edge Case & Quick-Info Tests `tests/test_jsonl_parser.py` (lines 734–824)	Tests for `quick_session_info` handle small/large files, missing user entries, and tail-read optimization. `TestMalformedPartialEntries` validates safe handling of missing message keys, non-dict/null usage, and invalid toolUseResult.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

#349 – The code change adds a guard ensuring usage is always a dict in _process_assistant, directly supporting the comprehensive test coverage of safe handling for missing or non-dict usage data in assistant message processing.

Poem

🐰 With whisker-twitching care, I test each seam,
From bash to web, no tool left in the gleam.
A guardian guard wraps usage up tight,
Eight hundred lines of tests—we sleep well tonight! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and accurately describes the two main changes: adding comprehensive test coverage for jsonl_parser and hardening the null usage handling in _process_assistant.
Linked Issues check	✅ Passed	The PR comprehensively fulfills all coding requirements from issue `#21`: test coverage for _parse_tool_result arms, content/title/system helpers, parse_session dispatch, processor functions, file activity tracking, malformed entry handling, and quick_session_info behavior, plus the hardening fix for non-dict usage values.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to issue `#21` requirements: the new test_jsonl_parser.py file and the targeted usage guard in _process_assistant align entirely with the PR objectives.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch test/ccc1-jsonl-parser-schema

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

utils/jsonl_parser.py (1)

140-142: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard message before dereferencing in both user/assistant processors.

If an entry contains "message": null, both processors call .get(...) on None and abort parse_session for a recoverable malformed line. Apply the same dict-guard pattern here too.

💡 Suggested fix

 def _process_user(entry: dict, messages: list, metadata: dict):
@@
-    msg = entry.get("message", {})
+    msg = entry.get("message", {})
+    if not isinstance(msg, dict):
+        msg = {}
     content = msg.get("content", [])
@@
 def _process_assistant(entry: dict, messages: list, metadata: dict):
@@
-    msg = entry.get("message", {})
+    msg = entry.get("message", {})
+    if not isinstance(msg, dict):
+        msg = {}
     model = msg.get("model", "")

Also applies to: 173-174

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/jsonl_parser.py` around lines 140 - 142, The processors in
utils/jsonl_parser.py currently assume entry["message"] is a dict and call
entry.get("message", {}) without guarding against message being None; update
both user and assistant processing blocks to first coerce message to a dict
(e.g., msg = entry.get("message") or {} ) before calling msg.get("content", [])
and passing to _extract_text, and apply the same pattern where similar code
exists (the other processor and the parse_session usage) so a null "message" no
longer raises when dereferenced.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_jsonl_parser.py`:
- Around line 718-730: Add two regression tests in tests/test_jsonl_parser.py
alongside test_missing_usage_dict_no_crash: one that passes a record with
"message": null and another that passes a record with a non-dict usage (e.g.,
"usage": "invalid" or a number) to ensure parse_session handles these malformed
shapes without crashing and still sets metadata.total_input_tokens to 0;
reference the existing test function name test_missing_usage_dict_no_crash and
the parser entrypoint parse_session to locate where to add the new tests and
mirror the try/finally cleanup pattern used there.

---

Outside diff comments:
In `@utils/jsonl_parser.py`:
- Around line 140-142: The processors in utils/jsonl_parser.py currently assume
entry["message"] is a dict and call entry.get("message", {}) without guarding
against message being None; update both user and assistant processing blocks to
first coerce message to a dict (e.g., msg = entry.get("message") or {} ) before
calling msg.get("content", []) and passing to _extract_text, and apply the same
pattern where similar code exists (the other processor and the parse_session
usage) so a null "message" no longer raises when dereferenced.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b102107-1dd1-4277-936f-9231732d0eb4

📥 Commits

Reviewing files that changed from the base of the PR and between 3b72afd and e6df529.

📒 Files selected for processing (2)

tests/test_jsonl_parser.py
utils/jsonl_parser.py

coderabbitai · 2026-05-05T23:41:32Z

+    def test_missing_usage_dict_no_crash(self):
+        path = _write_jsonl([
+            {
+                "type": "assistant",
+                "timestamp": "2026-01-01T00:00:00Z",
+                "message": {"model": "m", "content": [], "usage": None},
+            },
+        ])
+        try:
+            s = parse_session(path)
+            assert s["metadata"]["total_input_tokens"] == 0
+        finally:
+            os.unlink(path)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit message: null and non-dict usage regression tests.

Coverage currently locks in usage=None and missing message, but not "message": null or non-null non-dict usage values. Those malformed shapes are common in JSONL and should be pinned.

🧪 Suggested test additions

class TestParseSession: @@ def test_missing_usage_dict_no_crash(self): @@ s = parse_session(path) assert s["metadata"]["total_input_tokens"] == 0 finally: os.unlink(path) + + `@pytest.mark.parametrize`("bad_usage", ["oops", [], 123]) + def test_non_dict_usage_no_crash(self, bad_usage): + path = _write_jsonl([ + { + "type": "assistant", + "timestamp": "2026-01-01T00:00:00Z", + "message": {"model": "m", "content": [], "usage": bad_usage}, + }, + ]) + try: + s = parse_session(path) + assert s["metadata"]["total_input_tokens"] == 0 + finally: + os.unlink(path) @@ class TestMalformedPartialEntries: @@ def test_assistant_missing_message_key(self): @@ assert s["messages"][0]["role"] == "assistant" finally: os.unlink(path) + + def test_assistant_message_null_no_crash(self): + path = _write_jsonl([ + {"type": "assistant", "timestamp": "2026-01-01T00:00:00Z", "message": None}, + ]) + try: + s = parse_session(path) + assert len(s["messages"]) == 1 + assert s["messages"][0]["role"] == "assistant" + finally: + os.unlink(path)

Also applies to: 797-807

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_jsonl_parser.py` around lines 718 - 730, Add two regression tests in tests/test_jsonl_parser.py alongside test_missing_usage_dict_no_crash: one that passes a record with "message": null and another that passes a record with a non-dict usage (e.g., "usage": "invalid" or a number) to ensure parse_session handles these malformed shapes without crashing and still sets metadata.total_input_tokens to 0; reference the existing test function name test_missing_usage_dict_no_crash and the parser entrypoint parse_session to locate where to add the new tests and mirror the try/finally cleanup pattern used there.

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(jsonl): add jsonl_parser schema coverage and harden null usage#22

test(jsonl): add jsonl_parser schema coverage and harden null usage#22
clean6378-max-it wants to merge 1 commit intomasterfrom
test/ccc1-jsonl-parser-schema

clean6378-max-it commented May 5, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 5, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

clean6378-max-it commented May 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Changes

Testing

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

clean6378-max-it commented May 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 5, 2026 •

edited

Loading