Skip to content

test(jsonl): add jsonl_parser schema coverage and harden null usage#22

Open
clean6378-max-it wants to merge 1 commit intomasterfrom
test/ccc1-jsonl-parser-schema
Open

test(jsonl): add jsonl_parser schema coverage and harden null usage#22
clean6378-max-it wants to merge 1 commit intomasterfrom
test/ccc1-jsonl-parser-schema

Conversation

@clean6378-max-it
Copy link
Copy Markdown
Collaborator

@clean6378-max-it clean6378-max-it commented May 5, 2026

Closes #21

Summary

Adds comprehensive pytest coverage for utils/jsonl_parser.py (CCC1) and hardens assistant parsing when message.usage is JSON null or otherwise not a plain object.

Motivation

jsonl_parser.py is the structural core for untrusted, schema-evolving Claude Code JSONL. Direct tests lock in _parse_tool_result dispatch, helper behavior, and integration paths so silent regressions from schema drift are caught early.

Changes

  • New: tests/test_jsonl_parser.py covering:
    • _parse_tool_result for all major result shapes and fallbacks
    • _normalize_content, _extract_text, _extract_images
    • _infer_title, _strip_system_tags
    • _process_user, _process_assistant, _process_system, _track_file_activity
    • parse_session integration (empty files, unknown types, sidechains, snapshots, wall time, malformed lines)
    • quick_session_info for small vs large sessions
  • Fix: _process_assistant treats non-dict usage as {} so usage: null does not crash.

Testing

pytest tests/test_jsonl_parser.py
pytest tests/
Notes
Underscore imports are confined to utils.jsonl_parser, consistent with existing parser tests.
CI automation for pytest is tracked separately (CCC2).

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Tests**
  * Added comprehensive test suite for JSONL parsing functionality, covering tool results, content processing, message handling, file tracking, and edge cases including malformed entries.

* **Bug Fixes**
  * Improved robustness of usage data handling in message processing to prevent type errors when non-dictionary usage data appears.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Add tests/test_jsonl_parser.py covering _parse_tool_result dispatch,
content helpers, user/assistant/system paths, file activity, parse_session
integration, quick_session_info, and malformed entries. Treat non-dict
message.usage in _process_assistant as empty to avoid crashes on null usage.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Walkthrough

A comprehensive test suite for the JSONL parser module is added with coverage of tool result parsing, content handling, message processing, file tracking, and session parsing. A defensive guard is added to ensure usage data is always a dict in assistant message processing.

Changes

Parser Test Coverage & Defensive Guard

Layer / File(s) Summary
Defensive Implementation
utils/jsonl_parser.py
_process_assistant guards usage is always a dict; resets to {} if non-dict before accessing token/usage fields.
Unit Tests – Parsing & Content
tests/test_jsonl_parser.py (lines 32–312)
Test helpers _fresh_metadata() and _write_jsonl() added. Unit tests cover _parse_tool_result across ~14 tool types (bash, file edits, glob, grep, web, tasks), slug preservation, and null/non-dict fallbacks; _normalize_content, _extract_text, _extract_images with various input shapes and edge cases.
Unit Tests – Metadata & Filtering
tests/test_jsonl_parser.py (lines 317–412)
Tests for _infer_title (first user message, truncation to 100 chars, "Untitled Session" default), _strip_system_tags, and _process_user (metadata capture from first entry, missing message, nested image extraction).
Unit Tests – Assistant & System Processing
tests/test_jsonl_parser.py (lines 418–613)
_process_assistant tests cover content normalization, synthetic model filtering, thinking block joining, tool-use counting, API error flagging, stop reasons, service tier, and ephemeral cache token accumulation. _process_system tests verify compact boundary increments and non-compaction subtypes.
Unit Tests – File Activity & Integration
tests/test_jsonl_parser.py (lines 533–707)
_track_file_activity tests verify read/write/edit tracking, bash commands, web fetch logging. End-to-end parse_session tests cover empty files, unknown types, sidechain counting, timestamps, entry accumulation, invalid JSON, missing keys, and file cleanup.
Edge Case & Quick-Info Tests
tests/test_jsonl_parser.py (lines 734–824)
Tests for quick_session_info handle small/large files, missing user entries, and tail-read optimization. TestMalformedPartialEntries validates safe handling of missing message keys, non-dict/null usage, and invalid toolUseResult.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #349 – The code change adds a guard ensuring usage is always a dict in _process_assistant, directly supporting the comprehensive test coverage of safe handling for missing or non-dict usage data in assistant message processing.

Poem

🐰 With whisker-twitching care, I test each seam,
From bash to web, no tool left in the gleam.
A guardian guard wraps usage up tight,
Eight hundred lines of tests—we sleep well tonight! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the two main changes: adding comprehensive test coverage for jsonl_parser and hardening the null usage handling in _process_assistant.
Linked Issues check ✅ Passed The PR comprehensively fulfills all coding requirements from issue #21: test coverage for _parse_tool_result arms, content/title/system helpers, parse_session dispatch, processor functions, file activity tracking, malformed entry handling, and quick_session_info behavior, plus the hardening fix for non-dict usage values.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #21 requirements: the new test_jsonl_parser.py file and the targeted usage guard in _process_assistant align entirely with the PR objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/ccc1-jsonl-parser-schema

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
utils/jsonl_parser.py (1)

140-142: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard message before dereferencing in both user/assistant processors.

If an entry contains "message": null, both processors call .get(...) on None and abort parse_session for a recoverable malformed line. Apply the same dict-guard pattern here too.

💡 Suggested fix
 def _process_user(entry: dict, messages: list, metadata: dict):
@@
-    msg = entry.get("message", {})
+    msg = entry.get("message", {})
+    if not isinstance(msg, dict):
+        msg = {}
     content = msg.get("content", [])
@@
 def _process_assistant(entry: dict, messages: list, metadata: dict):
@@
-    msg = entry.get("message", {})
+    msg = entry.get("message", {})
+    if not isinstance(msg, dict):
+        msg = {}
     model = msg.get("model", "")

Also applies to: 173-174

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/jsonl_parser.py` around lines 140 - 142, The processors in
utils/jsonl_parser.py currently assume entry["message"] is a dict and call
entry.get("message", {}) without guarding against message being None; update
both user and assistant processing blocks to first coerce message to a dict
(e.g., msg = entry.get("message") or {} ) before calling msg.get("content", [])
and passing to _extract_text, and apply the same pattern where similar code
exists (the other processor and the parse_session usage) so a null "message" no
longer raises when dereferenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_jsonl_parser.py`:
- Around line 718-730: Add two regression tests in tests/test_jsonl_parser.py
alongside test_missing_usage_dict_no_crash: one that passes a record with
"message": null and another that passes a record with a non-dict usage (e.g.,
"usage": "invalid" or a number) to ensure parse_session handles these malformed
shapes without crashing and still sets metadata.total_input_tokens to 0;
reference the existing test function name test_missing_usage_dict_no_crash and
the parser entrypoint parse_session to locate where to add the new tests and
mirror the try/finally cleanup pattern used there.

---

Outside diff comments:
In `@utils/jsonl_parser.py`:
- Around line 140-142: The processors in utils/jsonl_parser.py currently assume
entry["message"] is a dict and call entry.get("message", {}) without guarding
against message being None; update both user and assistant processing blocks to
first coerce message to a dict (e.g., msg = entry.get("message") or {} ) before
calling msg.get("content", []) and passing to _extract_text, and apply the same
pattern where similar code exists (the other processor and the parse_session
usage) so a null "message" no longer raises when dereferenced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b102107-1dd1-4277-936f-9231732d0eb4

📥 Commits

Reviewing files that changed from the base of the PR and between 3b72afd and e6df529.

📒 Files selected for processing (2)
  • tests/test_jsonl_parser.py
  • utils/jsonl_parser.py

Comment on lines +718 to +730
def test_missing_usage_dict_no_crash(self):
path = _write_jsonl([
{
"type": "assistant",
"timestamp": "2026-01-01T00:00:00Z",
"message": {"model": "m", "content": [], "usage": None},
},
])
try:
s = parse_session(path)
assert s["metadata"]["total_input_tokens"] == 0
finally:
os.unlink(path)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit message: null and non-dict usage regression tests.

Coverage currently locks in usage=None and missing message, but not "message": null or non-null non-dict usage values. Those malformed shapes are common in JSONL and should be pinned.

🧪 Suggested test additions
 class TestParseSession:
@@
     def test_missing_usage_dict_no_crash(self):
@@
             s = parse_session(path)
             assert s["metadata"]["total_input_tokens"] == 0
         finally:
             os.unlink(path)
+
+    `@pytest.mark.parametrize`("bad_usage", ["oops", [], 123])
+    def test_non_dict_usage_no_crash(self, bad_usage):
+        path = _write_jsonl([
+            {
+                "type": "assistant",
+                "timestamp": "2026-01-01T00:00:00Z",
+                "message": {"model": "m", "content": [], "usage": bad_usage},
+            },
+        ])
+        try:
+            s = parse_session(path)
+            assert s["metadata"]["total_input_tokens"] == 0
+        finally:
+            os.unlink(path)
@@
 class TestMalformedPartialEntries:
@@
     def test_assistant_missing_message_key(self):
@@
             assert s["messages"][0]["role"] == "assistant"
         finally:
             os.unlink(path)
+
+    def test_assistant_message_null_no_crash(self):
+        path = _write_jsonl([
+            {"type": "assistant", "timestamp": "2026-01-01T00:00:00Z", "message": None},
+        ])
+        try:
+            s = parse_session(path)
+            assert len(s["messages"]) == 1
+            assert s["messages"][0]["role"] == "assistant"
+        finally:
+            os.unlink(path)

Also applies to: 797-807

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_jsonl_parser.py` around lines 718 - 730, Add two regression tests
in tests/test_jsonl_parser.py alongside test_missing_usage_dict_no_crash: one
that passes a record with "message": null and another that passes a record with
a non-dict usage (e.g., "usage": "invalid" or a number) to ensure parse_session
handles these malformed shapes without crashing and still sets
metadata.total_input_tokens to 0; reference the existing test function name
test_missing_usage_dict_no_crash and the parser entrypoint parse_session to
locate where to add the new tests and mirror the try/finally cleanup pattern
used there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

# CCC1 — Parser test coverage: schema variants

1 participant