fix(desktop-backend): repair truncated Gemini JSON responses causing 500s#5957
Conversation
Gemini frequently returns truncated JSON when hitting `max_output_tokens`, causing "EOF while parsing a string" errors and 500s on `/v1/conversations` for all users (500+ errors/day in prod).

Changes:
- Add `parse_or_repair_json()` that detects truncated JSON and closes open strings, brackets, and braces to recover partial results (a sketch of this approach follows below)
- Apply repair to all 7 JSON parse sites (structure, brief structure, action items, memories, requires_context, date range, knowledge graph)
- Make action items extraction non-fatal (like memories already was) so a parse failure doesn't block the entire conversation from being saved
- Add retry with doubled `max_tokens` for structure extraction on first parse failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
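For orientation, here is a rough, self-contained sketch of the scan-and-close repair described above. It is illustrative rather than the PR's exact code: the signature, error wording, and the trailing-comma trim are assumptions layered on the behavior the description and review excerpts show.

```rust
use serde::de::DeserializeOwned;

/// Illustrative sketch (not the PR's exact implementation): parse `response`,
/// and if that fails, close any open string and unbalanced brackets/braces,
/// then parse the repaired text once more.
fn parse_or_repair_json<T: DeserializeOwned>(response: &str, label: &str) -> Result<T, String> {
    if let Ok(value) = serde_json::from_str::<T>(response) {
        return Ok(value);
    }

    let trimmed = response.trim();
    let mut in_string = false;
    let mut escape_next = false;
    let mut stack: Vec<char> = Vec::new(); // closers we still owe, innermost last

    for ch in trimmed.chars() {
        if escape_next {
            escape_next = false;
            continue;
        }
        if ch == '\\' && in_string {
            escape_next = true;
            continue;
        }
        if ch == '"' {
            in_string = !in_string;
            continue;
        }
        if in_string {
            continue;
        }
        match ch {
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => { stack.pop(); }
            _ => {}
        }
    }

    let mut repaired = trimmed.to_string();
    if in_string {
        // Close the string that was cut off mid-way.
        repaired.push('"');
    } else if repaired.ends_with(',') {
        // serde_json rejects trailing commas; drop one left at the cut point.
        repaired.pop();
    }
    // Close any still-open containers, innermost first.
    while let Some(closer) = stack.pop() {
        repaired.push(closer);
    }

    serde_json::from_str::<T>(&repaired)
        .map_err(|e| format!("Failed to parse {} response after repair: {}", label, e))
}
```

A call site then deserializes into its own response type, e.g. `parse_or_repair_json::<StructureResponse>(&response, "structure")`, as the retry snippet in the review below shows.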
Greptile Summary

This PR adds a `parse_or_repair_json()` helper that detects and repairs truncated Gemini JSON responses, applies it across the LLM client's parse sites, makes action-item extraction non-fatal, and adds a doubled-token retry for structure extraction.
Minor issues to address before merge:
Confidence Score: 4/5
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM Response from Gemini] --> B{parse_or_repair_json}
    B -->|Valid JSON| C[Deserialize T ✓]
    B -->|Parse fails| D[Scan chars: track in_string / stack]
    D --> E{Truncated?}
    E -->|Yes — build suffix| F["Append close-string + close brackets"]
    F --> G{Re-parse repaired JSON}
    G -->|OK| H[log info: Repaired truncated JSON]
    H --> C
    G -->|Still fails| I[Return Err with raw response]
    E -->|Empty / no suffix needed| I
    I --> J{Which call site?}
    J -->|extract_structure| K["retry with 3000 tokens (same prompt)"]
    K --> L{parse_or_repair_json again}
    L -->|OK| C
    L -->|Fails| M[Propagate error → 500]
    J -->|extract_action_items| N[warn + return vec empty — non-fatal]
    J -->|extract_memories| N
    J -->|requires_context / date_range / knowledge_graph| M
```
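The non-fatal branch in the flowchart (warn and return an empty vec) is what keeps one failed extraction from blocking the whole conversation. A minimal sketch of that pattern, using illustrative names (`ActionItem`, `extract_action_items`) and the helper sketched earlier:

```rust
#[derive(serde::Deserialize)]
struct ActionItem {
    description: String,
}

// Parse failures are logged and degraded to an empty list instead of being
// propagated, so conversation processing continues. Memories already worked
// this way; the PR extends the same treatment to action items.
fn extract_action_items(llm_response: &str) -> Vec<ActionItem> {
    match parse_or_repair_json::<Vec<ActionItem>>(llm_response, "action_items") {
        Ok(items) => items,
        Err(e) => {
            tracing::warn!("Action items parse failed, continuing without them: {}", e);
            Vec::new()
        }
    }
}
```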
Reviews (1): Last reviewed commit: "fix(desktop-backend): repair truncated G..."
```rust
let mut last_was_string_content = false;

for ch in trimmed.chars() {
    if escape_next {
        escape_next = false;
        continue;
    }
    if ch == '\\' && in_string {
        escape_next = true;
        continue;
    }
    if ch == '"' {
        in_string = !in_string;
        last_was_string_content = false;
        continue;
    }
    if in_string {
        last_was_string_content = true;
        continue;
    }
    last_was_string_content = false;
    match ch {
        '{' => stack.push('}'),
        '[' => stack.push(']'),
        '}' | ']' => { stack.pop(); }
        _ => {}
    }
}
```
last_was_string_content is unused dead code
`last_was_string_content` is declared and written to in multiple branches, but never read in any conditional check or returned value. The Rust compiler will emit an `unused_assignments`/unused-variable warning for this, and if `#![deny(warnings)]` or `RUSTFLAGS="-D warnings"` is set in CI, that warning becomes a compile error.
Suggested change (drop the variable entirely):

```rust
let mut in_string = false;
let mut escape_next = false;
let mut stack: Vec<char> = Vec::new();
for ch in trimmed.chars() {
    if escape_next {
        escape_next = false;
        continue;
    }
    if ch == '\\' && in_string {
        escape_next = true;
        continue;
    }
    if ch == '"' {
        in_string = !in_string;
        continue;
    }
    if in_string {
        continue;
    }
    match ch {
        '{' => stack.push('}'),
        '[' => stack.push(']'),
        '}' | ']' => { stack.pop(); }
        _ => {}
    }
}
```
```rust
Err(format!(
    "Failed to parse {} response: {} - {}",
    label,
    serde_json::from_str::<serde_json::Value>(response)
        .err()
        .map(|e| e.to_string())
        .unwrap_or_else(|| "type mismatch".to_string()),
    response
))
```
Full LLM response logged in error messages
The error returned by `parse_or_repair_json` embeds the raw LLM response (potentially kilobytes of transcript data). When this is promoted to a `tracing::warn!` log — e.g., at the retry site:

```rust
tracing::warn!("Structure parse failed, retrying with 3000 tokens: {}", first_err);
```

— the complete truncated response lands in structured logs. This increases log volume significantly and may inadvertently surface sensitive transcript content in log aggregators and monitoring systems.
Consider truncating the response in the error message (e.g., limit to first 200 chars) and logging the full response only at tracing::debug! level:
```rust
Err(format!(
    "Failed to parse {} response: {} - {}…",
    label,
    serde_json::from_str::<serde_json::Value>(response)
        .err()
        .map(|e| e.to_string())
        .unwrap_or_else(|| "type mismatch".to_string()),
    // take at most 200 chars (not bytes) so a multi-byte UTF-8 char can't cause a slice panic
    response.chars().take(200).collect::<String>()
))
```
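To keep the full payload available for diagnosis without inflating warn-level logs, the same spot could also emit a debug-level event. A rough sketch, assuming the crate already depends on `tracing` (the field names are illustrative):

```rust
// Full response only at debug verbosity; the Err value above stays small.
tracing::debug!(
    label,
    response_len = response.len(),
    raw_response = %response,
    "LLM response failed to parse; full payload logged at debug level"
);
```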
```rust
// Try parsing, and if it fails (truncated JSON), retry with more tokens
let result: StructureResponse = match parse_or_repair_json(&response, "structure") {
    Ok(r) => r,
    Err(first_err) => {
        tracing::warn!("Structure parse failed, retrying with 3000 tokens: {}", first_err);
        let retry_response = self.call_with_schema(&prompt, Some(0.7), Some(3000), Some(schema)).await?;
        parse_or_repair_json(&retry_response, "structure (retry)")?
    }
};
```
Retry on parse failure may obscure non-truncation errors
The retry is triggered on any parse_or_repair_json failure, not just truncation failures. For example, if Gemini returns a structurally wrong schema (e.g., a field with an unexpected type), the retry will also fire, burning an extra LLM call and doubling latency before propagating the same error.
If you want to limit retries to genuine truncation scenarios, you could expose a separate was_truncated flag from parse_or_repair_json and only retry when that flag is set — or check the finish reason from the Gemini API response before retrying.
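A possible shape for that flag, sketched with illustrative names (`ParseFailure`, `parse_or_repair_json_flagged`); the repair pass is elided here and `serde_json::Error::is_eof()` stands in as the truncation heuristic:

```rust
use serde::de::DeserializeOwned;

// Illustrative sketch, not the PR's API: carry a truncation flag so call sites
// can decide whether a doubled-token retry is worth an extra LLM call.
struct ParseFailure {
    was_truncated: bool,
    message: String,
}

fn parse_or_repair_json_flagged<T: DeserializeOwned>(
    response: &str,
    label: &str,
) -> Result<T, ParseFailure> {
    // (repair pass from the PR elided for brevity)
    serde_json::from_str::<T>(response).map_err(|e| ParseFailure {
        // serde_json reports EOF when the input ends mid-value, which is the
        // usual signature of a max_output_tokens cutoff.
        was_truncated: e.is_eof(),
        message: format!("Failed to parse {} response: {}", label, e),
    })
}
```

The structure-extraction call site could then retry only on `Err(e) if e.was_truncated`, or skip the retry entirely when the Gemini candidate's finish reason already indicates a token-limit cutoff.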
…500s (BasedHardware#5957)

## Summary
- Gemini frequently returns truncated JSON when hitting `max_output_tokens`, causing `EOF while parsing a string` serde errors and **500s on `/v1/conversations`** for all users (~500+ errors/day in prod)
- Added `parse_or_repair_json()` helper that detects truncated JSON and closes open strings/brackets/braces to recover partial results
- Applied to all 7 JSON parse sites in the LLM client
- Made action items extraction non-fatal (like memories already was) — parse failure no longer blocks the conversation from being saved
- Added retry with doubled `max_tokens` for structure extraction on first parse failure

## Test plan
- [ ] `cargo build` passes (verified locally)
- [ ] Deploy to dev and verify 500 rate drops on `/v1/conversations/from-segments`
- [ ] Verify conversations still process correctly with valid Gemini responses
- [ ] Verify truncated responses are repaired (check logs for "Repaired truncated" messages)

🤖 Generated with [Claude Code](https://claude.com/claude-code)