fix: add missing commas in LLM prompt JSON formats and guard list return types #274
Open
zhoufengen wants to merge 1 commit into
…urn types

- Add trailing commas after "thinking" fields in 6 prompt reply formats (check_title_appearance, check_title_appearance_in_start, toc_detector_single_page, check_if_toc_extraction_is_complete, check_if_toc_transformation_is_complete, detect_page_index). Without the comma, LLMs that follow the format literally produce invalid JSON, causing extract_json to fail and downstream KeyError crashes (VectifyAI#257).
- Guard generate_toc_init and generate_toc_continue to return [] instead of a dict when extract_json returns a non-list on malformed LLM output. Prevents AttributeError: 'dict' object has no attribute 'extend' in process_no_toc and AttributeError: 'str' object has no attribute 'get' in meta_processor (VectifyAI#199).

Generated with [AWS Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Summary
This PR fixes two related robustness issues that cause crashes when an LLM returns malformed or unexpected JSON.
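As a rough illustration of the missing-comma failure named in the title, the sketch below parses a reply with and without the comma after `"thinking"`. The `try_parse` helper is a hypothetical stand-in that only assumes the `{}`-on-failure behaviour this PR attributes to `extract_json`; it is not the project's implementation, and the field values are made up.

```python
import json

# A reply that copies the old format literally: no comma after "thinking".
broken_reply = '''{
    "thinking": "The page lists chapter titles with page numbers"
    "toc_detected": "yes"
}'''

# The same reply with the comma restored.
fixed_reply = '''{
    "thinking": "The page lists chapter titles with page numbers",
    "toc_detected": "yes"
}'''


def try_parse(text):
    """Hypothetical stand-in: return the parsed object, or {} if decoding fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}


broken = try_parse(broken_reply)  # decoding fails, so the fallback {} is returned
fixed = try_parse(fixed_reply)    # decodes to a dict with both keys

print(fixed["toc_detected"])
try:
    print(broken["toc_detected"])
except KeyError as exc:
    # This is the kind of downstream crash the PR describes.
    print(f"KeyError: {exc}")
```

The point of the sketch is only that the parse failure is silent: the fallback `{}` looks like a valid result until a caller indexes into it.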
Changes
Fix #257: Missing commas in LLM prompt reply formats
Six prompt templates in `pageindex/page_index.py` were missing a trailing comma after the `"thinking"` field:
- `check_title_appearance` (line 34)
- `check_title_appearance_in_start` (line 62)
- `toc_detector_single_page` (line 112)
- `check_if_toc_extraction_is_complete` (line 132)
- `check_if_toc_transformation_is_complete` (line 150)
- `detect_page_index` (line 213)

When an LLM follows the format literally, it produces invalid JSON (missing comma between keys). `extract_json` then fails and returns `{}`, causing `KeyError` on `toc_detected`, `completed`, or `page_index_given_in_toc`.

Fix #199: `generate_toc_init` / `generate_toc_continue` must return a list
When `extract_json` fails on a malformed LLM response it returns `{}` (a dict). Both `generate_toc_init` and `generate_toc_continue` passed this through directly. Callers expect a list:
- `process_no_toc` calls `toc_with_page_number.extend(...)` → `AttributeError: 'dict' object has no attribute 'extend'`
- `meta_processor` calls `item.get('physical_index')` on list items → `AttributeError: 'str' object has no attribute 'get'`

Both functions now return `[]` when `extract_json` returns a non-list, matching the expected contract.

Testing
Added `tests/test_llm_response_robustness.py` with 5 tests covering:
- `"thinking"` fields in prompt reply formats have trailing commas
- `generate_toc_init` returns `[]` (not a dict) when LLM returns malformed JSON
- `generate_toc_continue` returns `[]` (not a dict) when LLM returns malformed JSON
- `generate_toc_init` passes through a valid list unchanged
- `process_no_toc` does not raise `AttributeError` when `generate_toc_init` returns `[]`

Run with:
Related Issues
Fixes #257
Fixes #199
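For reviewers of the #199 change, the list-return guard can be sketched in isolation as follows. `ensure_list` is a hypothetical helper name used only for illustration, and the `title` key in the sample items is an assumption; the actual diff may inline the check differently.

```python
def ensure_list(parsed):
    """Return `parsed` unchanged if it is a list, otherwise an empty list."""
    return parsed if isinstance(parsed, list) else []


# Caller-side pattern described for process_no_toc: extend a running list.
toc_with_page_number = [{"title": "Introduction", "physical_index": 1}]

# Malformed reply: the parser fell back to {} (a dict). The guard turns it
# into [], so extend() adds nothing instead of receiving the wrong type.
toc_with_page_number.extend(ensure_list({}))

# Well-formed reply: a valid list passes through the guard unchanged.
toc_with_page_number.extend(ensure_list([{"title": "Methods", "physical_index": 5}]))

# Every item is still a dict, so item.get(...) is safe for later consumers.
indices = [item.get("physical_index") for item in toc_with_page_number]
print(indices)
```

Returning `[]` treats a malformed reply as "no entries found", which keeps callers simple but also hides the failure; whether to log it is a separate design choice not covered here.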