
fix(desktop-backend): repair truncated Gemini JSON responses causing 500s #5957

Merged
kodjima33 merged 1 commit into main from worktree-fix-gemini-json-parsing on Mar 23, 2026


@kodjima33 (Collaborator)

Summary

  • Gemini frequently returns truncated JSON when it hits max_output_tokens, causing "EOF while parsing a string" serde errors and 500s on /v1/conversations for all users (500+ errors/day in prod)
  • Added parse_or_repair_json() helper that detects truncated JSON and closes open strings/brackets/braces to recover partial results
  • Applied to all 7 JSON parse sites in the LLM client
  • Made action items extraction non-fatal (like memories already was) — parse failure no longer blocks the conversation from being saved
  • Added retry with doubled max_tokens for structure extraction on first parse failure
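The repair step can be sketched with the standard library alone. This is a minimal sketch, not the code merged in client.rs: `closing_suffix` is a hypothetical name, and it only computes the closing suffix. The merged helper appends its suffix to the trimmed response and re-parses with serde_json, which is omitted here to stay dependency-free. Truncation right after a key or a trailing comma still yields invalid JSON, which is why the helper must fall through when the re-parse fails.

```rust
/// Hypothetical helper: returns the characters needed to close every string,
/// array, and object still open at the end of `s`, or `None` if nothing is open.
fn closing_suffix(s: &str) -> Option<String> {
    let mut in_string = false;
    let mut escape_next = false;
    // Closers we still owe, innermost last.
    let mut stack: Vec<char> = Vec::new();

    for ch in s.trim_end().chars() {
        if escape_next {
            // Character after a backslash inside a string: never structural.
            escape_next = false;
            continue;
        }
        if in_string {
            match ch {
                '\\' => escape_next = true,
                '"' => in_string = false,
                _ => {}
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => {
                stack.pop();
            }
            _ => {}
        }
    }

    let mut suffix = String::new();
    if in_string {
        // Limitation: a dangling backslash would escape this quote, so a
        // re-parse of the repaired text would still fail.
        suffix.push('"');
    }
    // Close containers innermost-first.
    suffix.extend(stack.iter().rev());
    if suffix.is_empty() {
        None
    } else {
        Some(suffix)
    }
}

fn main() {
    let truncated = r#"{"title": "Standup", "items": ["ship the fi"#;
    if let Some(suffix) = closing_suffix(truncated) {
        println!("{truncated}{suffix}");
    }
}
```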

Test plan

  • cargo build passes (verified locally)
  • Deploy to dev and verify 500 rate drops on /v1/conversations/from-segments
  • Verify conversations still process correctly with valid Gemini responses
  • Verify truncated responses are repaired (check logs for "Repaired truncated" messages)

🤖 Generated with Claude Code

…500s

Gemini frequently returns truncated JSON when hitting max_output_tokens,
causing "EOF while parsing a string" errors and 500s on /v1/conversations
for all users (500+ errors/day in prod).

Changes:
- Add parse_or_repair_json() that detects truncated JSON and closes open
  strings, brackets, and braces to recover partial results
- Apply repair to all 7 JSON parse sites (structure, brief structure,
  action items, memories, requires_context, date range, knowledge graph)
- Make action items extraction non-fatal (like memories already was) so
  a parse failure doesn't block the entire conversation from being saved
- Add retry with doubled max_tokens for structure extraction on first
  parse failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
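Making action-item extraction non-fatal amounts to downgrading the error to a warning plus an empty list. A sketch under assumed names: `ActionItem` and `non_fatal` are hypothetical, and `eprintln!` stands in for the backend's `tracing::warn!`.

```rust
/// Hypothetical item type; the real struct in client.rs may differ.
#[derive(Debug, PartialEq)]
struct ActionItem {
    description: String,
}

/// Turns an extraction failure into an empty list so the caller can still
/// save the conversation. `label` names the call site in the warning.
fn non_fatal<T>(label: &str, result: Result<Vec<T>, String>) -> Vec<T> {
    match result {
        Ok(items) => items,
        Err(err) => {
            // Stand-in for tracing::warn!: log the error instead of propagating it.
            eprintln!("{label} extraction failed, continuing without it: {err}");
            Vec::new()
        }
    }
}

fn main() {
    let parsed = non_fatal(
        "action items",
        Ok(vec![ActionItem { description: "Send meeting notes".to_string() }]),
    );
    for item in &parsed {
        println!("action item: {}", item.description);
    }

    let failed: Vec<ActionItem> =
        non_fatal("action items", Err("EOF while parsing a string".to_string()));
    println!("items after failure: {}", failed.len());
}
```

The same wrapper fits memories, which the PR notes were already non-fatal; the fatal call sites simply keep propagating with `?`.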
kodjima33 merged commit 95d6d9e into main on Mar 23, 2026
2 checks passed
kodjima33 deleted the worktree-fix-gemini-json-parsing branch on March 23, 2026 21:31
greptile-apps Bot (Contributor) commented Mar 23, 2026

Greptile Summary

This PR adds a parse_or_repair_json helper to recover from truncated Gemini responses (caused by max_output_tokens exhaustion) and applies it at all 7 JSON deserialization sites in desktop/Backend-Rust/src/llm/client.rs. It also makes action-item extraction non-fatal (matching the existing behavior for memories) and adds a retry-with-doubled-tokens path specifically for extract_structure.

Key changes:

  • parse_or_repair_json<T>: scans for unbalanced quotes/braces/brackets and appends a minimal closing suffix before re-parsing; falls through gracefully if repair fails.
  • extract_structure: on parse failure, automatically retries the Gemini call with double the token budget before propagating the error.
  • extract_action_items failure is now non-fatal — returns an empty list with a warn log instead of surfacing a 500.
  • Previous ad-hoc repair for memories (appending }}) replaced by the generalized helper.

Minor issues to address before merge:

  • last_was_string_content is written but never read anywhere in the function — dead code that will generate a Rust compiler warning (and a compile error under RUSTFLAGS=-D warnings).
  • The error value passed to tracing::warn! in the retry path includes the full raw LLM response, which can be verbose and may surface sensitive transcript content in log aggregators.
  • The retry in extract_structure fires on any parse failure, not just confirmed truncation — consider guarding it with a truncation-specific check to avoid unnecessary retries on schema-mismatch errors.

Confidence Score: 4/5

  • Safe to merge after addressing the unused variable warning; production reliability improvement is clear and well-scoped.
  • The core repair logic is sound and addresses a real, high-volume production issue. The three open concerns are all P2/style: an unused variable (compiler warning), log verbosity, and a retry over-trigger. None of these block correct behavior or introduce regressions. The non-fatal action-item change and the generalized repair function are clear improvements over the previous ad-hoc approach.
  • desktop/Backend-Rust/src/llm/client.rs is the only changed file; its parse_or_repair_json function (lines 17–90) warrants the most attention.

Important Files Changed

| Filename | Overview |
|---|---|
| desktop/Backend-Rust/src/llm/client.rs | Adds parse_or_repair_json helper to recover truncated Gemini responses; applies it at all 7 JSON parse sites; makes action-item extraction non-fatal; adds a retry-with-doubled-tokens path for structure extraction. Minor issues: unused last_was_string_content variable (compiler warning), full LLM response embedded in error/warn logs, and retry fires on any parse error rather than only on confirmed truncation. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM Response from Gemini] --> B{parse_or_repair_json}
    B -->|Valid JSON| C[Deserialize T ✓]
    B -->|Parse fails| D[Scan chars: track in_string / stack]
    D --> E{Truncated?}
    E -->|Yes — build suffix| F["Append close-string + close brackets"]
    F --> G{Re-parse repaired JSON}
    G -->|OK| H[log info: Repaired truncated JSON]
    H --> C
    G -->|Still fails| I[Return Err with raw response]
    E -->|Empty / no suffix needed| I

    I --> J{Which call site?}
    J -->|extract_structure| K["retry with 3000 tokens (same prompt)"]
    K --> L{parse_or_repair_json again}
    L -->|OK| C
    L -->|Fails| M[Propagate error → 500]

    J -->|extract_action_items| N[warn + return vec empty — non-fatal]
    J -->|extract_memories| N
    J -->|requires_context / date_range / knowledge_graph| M
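Read as code, the dispatch at the bottom of the flowchart is a per-call-site failure policy. A sketch with hypothetical `Site`, `OnFailure`, and `policy` names; the mapping is transcribed from the flowchart (brief structure, which the flowchart does not show, is omitted), not taken from client.rs.

```rust
/// Hypothetical labels for the JSON parse sites shown in the flowchart.
#[derive(Debug, Clone, Copy)]
enum Site {
    Structure,
    ActionItems,
    Memories,
    RequiresContext,
    DateRange,
    KnowledgeGraph,
}

#[derive(Debug, PartialEq)]
enum OnFailure {
    /// Re-call the model once with a larger token budget, then propagate.
    RetryWithMoreTokens,
    /// Log a warning and return an empty result.
    EmptyNonFatal,
    /// Propagate the error (surfaces as a 500).
    Propagate,
}

fn policy(site: Site) -> OnFailure {
    match site {
        Site::Structure => OnFailure::RetryWithMoreTokens,
        Site::ActionItems | Site::Memories => OnFailure::EmptyNonFatal,
        Site::RequiresContext | Site::DateRange | Site::KnowledgeGraph => OnFailure::Propagate,
    }
}

fn main() {
    let sites = [
        Site::Structure,
        Site::ActionItems,
        Site::Memories,
        Site::RequiresContext,
        Site::DateRange,
        Site::KnowledgeGraph,
    ];
    for site in sites {
        println!("{:?} -> {:?}", site, policy(site));
    }
}
```

Keeping the policy in one `match` makes it easy to see that three sites can still produce a 500 after this PR.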

Reviews (1): Last reviewed commit: "fix(desktop-backend): repair truncated G..."

Comment on lines +30 to +56

    let mut last_was_string_content = false;

    for ch in trimmed.chars() {
        if escape_next {
            escape_next = false;
            continue;
        }
        if ch == '\\' && in_string {
            escape_next = true;
            continue;
        }
        if ch == '"' {
            in_string = !in_string;
            last_was_string_content = false;
            continue;
        }
        if in_string {
            last_was_string_content = true;
            continue;
        }
        last_was_string_content = false;
        match ch {
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => { stack.pop(); }
            _ => {}
        }

P2 last_was_string_content is unused dead code

last_was_string_content is declared, written to in multiple branches, but never actually read in any conditional check or returned value. The Rust compiler will emit an unused_assignments / unused variable warning for this. If #![deny(warnings)] or RUSTFLAGS=-D warnings is set in CI, this becomes a compile error.

Suggested change

    let mut in_string = false;
    let mut escape_next = false;
    let mut stack: Vec<char> = Vec::new();
    for ch in trimmed.chars() {
        if escape_next {
            escape_next = false;
            continue;
        }
        if ch == '\\' && in_string {
            escape_next = true;
            continue;
        }
        if ch == '"' {
            in_string = !in_string;
            continue;
        }
        if in_string {
            continue;
        }
        match ch {
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => { stack.pop(); }
            _ => {}
        }
    }

Comment on lines +81 to +89

    Err(format!(
        "Failed to parse {} response: {} - {}",
        label,
        serde_json::from_str::<serde_json::Value>(response)
            .err()
            .map(|e| e.to_string())
            .unwrap_or_else(|| "type mismatch".to_string()),
        response
    ))

P2 Full LLM response logged in error messages

The error returned by parse_or_repair_json embeds the raw LLM response (potentially kilobytes of transcript data). When it is promoted to a tracing::warn! log, as at the retry site:

    tracing::warn!("Structure parse failed, retrying with 3000 tokens: {}", first_err);

the complete truncated response lands in structured logs. This increases log volume significantly and may surface sensitive transcript content in log aggregators and monitoring systems.

Consider truncating the response in the error message (e.g., limit to first 200 chars) and logging the full response only at tracing::debug! level:

    Err(format!(
        "Failed to parse {} response: {} - {}…",
        label,
        serde_json::from_str::<serde_json::Value>(response)
            .err()
            .map(|e| e.to_string())
            .unwrap_or_else(|| "type mismatch".to_string()),
        // Take whole chars: a byte slice such as &response[..200] can panic mid-UTF-8.
        response.chars().take(200).collect::<String>()
    ))
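Slicing a `&str` by byte index, as in `&response[..200]`, panics when the index falls inside a multi-byte UTF-8 character, which non-English transcripts can easily hit. A std-only sketch of a char-boundary-safe excerpt; `log_excerpt` is a hypothetical name, and appending the ellipsis only when text was actually dropped is a choice made here, not part of the suggestion.

```rust
/// Hypothetical helper: returns at most `max_chars` characters of `s`,
/// appending "…" only when something was cut. It slices only at a char
/// boundary reported by `char_indices`, so it cannot panic on multi-byte input.
fn log_excerpt(s: &str, max_chars: usize) -> String {
    match s.char_indices().nth(max_chars) {
        // `idx` is the byte offset of the first character that is dropped.
        Some((idx, _)) => format!("{}…", &s[..idx]),
        None => s.to_string(),
    }
}

fn main() {
    // Each of these characters is 3 bytes in UTF-8, so a fixed byte cut
    // such as &s[..4] would panic here.
    let transcript = "会議の要点をまとめます";
    println!("{}", log_excerpt(transcript, 4));
    println!("{}", log_excerpt("short response", 200));
}
```

The full response can still be emitted at `tracing::debug!` level, as the comment suggests.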

Comment on lines +399 to +407

    // Try parsing, and if it fails (truncated JSON), retry with more tokens
    let result: StructureResponse = match parse_or_repair_json(&response, "structure") {
        Ok(r) => r,
        Err(first_err) => {
            tracing::warn!("Structure parse failed, retrying with 3000 tokens: {}", first_err);
            let retry_response = self.call_with_schema(&prompt, Some(0.7), Some(3000), Some(schema)).await?;
            parse_or_repair_json(&retry_response, "structure (retry)")?
        }
    };

P2 Retry on parse failure may obscure non-truncation errors

The retry is triggered on any parse_or_repair_json failure, not just truncation failures. For example, if Gemini returns a structurally wrong schema (e.g., a field with an unexpected type), the retry will also fire, burning an extra LLM call and doubling latency before propagating the same error.

If you want to limit retries to genuine truncation scenarios, you could expose a separate was_truncated flag from parse_or_repair_json and only retry when that flag is set — or check the finish reason from the Gemini API response before retrying.
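A sketch of the first option, assuming the parse helper is changed to classify its failure. `ParseError`, `parse_with_retry`, and the closure parameters are hypothetical stand-ins for `parse_or_repair_json` and `call_with_schema`; the budget doubling mirrors the PR's retry. Schema mismatches return immediately, so only truncated responses cost a second LLM call.

```rust
/// Hypothetical failure classification for the parse helper.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// Input ended mid-value (serde_json reports these as EOF errors).
    Truncated(String),
    /// Input was complete JSON but did not match the expected schema.
    Schema(String),
}

/// Calls the model once; re-calls it with double the token budget only when
/// the first response was truncated. Schema errors are returned immediately.
fn parse_with_retry<T>(
    max_tokens: u32,
    mut call: impl FnMut(u32) -> Result<String, String>,
    parse: impl Fn(&str) -> Result<T, ParseError>,
) -> Result<T, String> {
    let first = call(max_tokens)?;
    match parse(&first) {
        Ok(value) => Ok(value),
        Err(ParseError::Truncated(reason)) => {
            let retry_tokens = max_tokens.saturating_mul(2);
            eprintln!("truncated response ({reason}), retrying with {retry_tokens} tokens");
            let second = call(retry_tokens)?;
            parse(&second).map_err(|e| format!("retry failed: {e:?}"))
        }
        Err(ParseError::Schema(reason)) => Err(format!("schema mismatch, not retrying: {reason}")),
    }
}

fn main() {
    let mut budgets = Vec::new();
    let result = parse_with_retry(
        1500,
        |budget| {
            budgets.push(budget);
            // Simulated model: the smaller budget cuts the JSON off mid-string.
            Ok(if budget < 3000 {
                r#"{"title": "Stand"#.to_string()
            } else {
                r#"{"title": "Standup"}"#.to_string()
            })
        },
        // Toy classifier standing in for a real JSON parser.
        |s| if s.ends_with('}') { Ok(s.len()) } else { Err(ParseError::Truncated("EOF".to_string())) },
    );
    println!("{result:?}, budgets tried: {budgets:?}");
}
```

In the real client the classification could come from serde_json's error category (`serde_json::Error::is_eof()`) or from Gemini's finish reason; the thread does not settle which is more reliable.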

Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026
…500s (BasedHardware#5957)
