Session resume via /sessions immediately fails with Bedrock ValidationException (toolResult/toolUse mismatch)

## Summary

Session resume via `cagent run /sessions` fails immediately with a Bedrock `ValidationException` indicating a `toolResult`/`toolUse` block count mismatch. This occurs regardless of conversation length or complexity — even a minimal "hello" conversation fails on resume. The error occurs when the restored conversation history is sent to the Bedrock API, suggesting the issue is related to how session state is serialized and deserialized.

## Environment

- **LLM Provider**: AWS Bedrock via litellm proxy (`localhost:4000`)
- **Model**: Claude Haiku (`haiku` model group)
- **Proxy**: litellm (OpenAI-compatible endpoint)
- **API**: Bedrock `ConverseStream` operation
- **Fallback Configuration**: None (`Available Model Group Fallbacks=None`)

## Steps to Reproduce

1. Configure cagent to use a litellm proxy pointing to AWS Bedrock (Claude Haiku)
2. Start a new cagent session
3. Have a **minimal interaction** — even just saying "hello" is sufficient
4. Exit the session
5. Run `cagent run /sessions` and select the previous session to resume
6. **Immediately fails** with `ValidationException` — no further interaction is possible

> **Note**: This reproduces on any session, regardless of length or complexity.

## Expected Behavior

cagent should be able to resume any previously saved session via `cagent run /sessions`, correctly restoring the conversation history in a format compliant with the Bedrock ConverseStream API contract (i.e., every `toolResult` block must correspond to a `toolUse` block in the preceding assistant message).

## Actual Behavior

Session resume **immediately fails** on every attempt, regardless of how short or simple the original conversation was. The following error is returned (sanitized):

```
all models failed: error receiving from stream: POST "http://localhost:4000/chat/completions": 503 Service Unavailable

{"message":"litellm.ServiceUnavailableError: litellm.MidStreamFallbackError: litellm.APIConnectionError:
APIConnectionError: OpenAIException - Stream generation failed: ProviderException: (400, \"AWS client error
encountered: <class 'botocore.errorfactory.ValidationException'>: An error occurred (ValidationException) when
calling the ConverseStream operation (reached max retries: 0): The number of toolResult blocks at
messages.128.content exceeds the number of toolUse blocks of previous turn.\", {'Error': {'Message': 'The number
of toolResult blocks at messages.128.content exceeds the number of toolUse blocks of previous turn.', 'Code':
'ValidationException'}, 'ResponseMetadata': {'RequestId': '<redacted>', 'HTTPStatusCode': 400, 'HTTPHeaders':
{'date': 'Tue, 10 Feb 2026 08:58:41 GMT', 'content-type': 'application/json', 'content-length': '124',
'connection': 'keep-alive', 'x-amzn-requestid': '<redacted>', 'x-amzn-errortype':
'ValidationException:http://internal.amazon.com/coral/com.amazon.bedrock/'}, 'MaxAttemptsReached': True,
'RetryAttempts': 0}, 'message': 'The number of toolResult blocks at messages.128.content exceeds the number of
toolUse blocks of previous turn.'}). Received Model Group=haiku\nAvailable Model Group Fallbacks=None",
"type":null,"param":null,"code":"503"}
```

## Analysis

### Observations

The following observations may help narrow down the cause:

1. **The original session worked fine** — the conversation history was valid during the live session
2. **The failure happens immediately on resume** — the very first API call with the restored history is rejected
3. **It reproduces on any session** — even trivially short ones, ruling out conversation length as a factor
4. **The error is about message structure** — the restored history has `toolResult` blocks without matching `toolUse` blocks in the preceding assistant message

### How the Error Occurs

When a session is resumed via `cagent run /sessions`, the conversation history is reconstructed and sent to the LLM provider. The reconstructed history contains `toolResult` blocks in user messages that do not have corresponding `toolUse` blocks in the preceding assistant messages — violating Bedrock's API contract.

### Possible Contributing Factors

1. **Session serialization may drop or reorder tool-related message blocks**: When saving the session state, the exact structure of assistant messages containing `toolUse` blocks may not be preserved, while the corresponding user messages with `toolResult` blocks are preserved — breaking the pairing.

2. **Deserialization may reconstruct messages incorrectly**: When loading the session state back, the conversation may be reconstructed in a way that orphans `toolResult` blocks (e.g., merging multiple assistant messages, dropping `tool_calls` from assistant messages, or reordering content blocks).

3. **Internal tool calls may not round-trip correctly**: cagent's internal tool operations (reading files, searching, directory listing, etc.) generate `toolUse`/`toolResult` pairs in the conversation history. These may not survive the save → load cycle correctly.

4. **Format translation on restore**: If session state is stored in one format (e.g., OpenAI's `tool_calls`/`tool` message roles) and reconstructed differently when sending to Bedrock (which uses `toolUse`/`toolResult` content blocks), the translation may introduce mismatches.

### Error Chain

1. User runs `cagent run /sessions` and selects a previous session to resume
2. cagent loads saved session state and reconstructs the conversation history
3. cagent sends the reconstructed conversation to litellm proxy → AWS Bedrock
4. AWS Bedrock rejects the request with HTTP 400 `ValidationException` (toolResult/toolUse block count mismatch)
5. litellm wraps the error as `MidStreamFallbackError` → `ServiceUnavailableError` (503)
6. cagent reports `all models failed` — session resume fails completely

## Impact

- **Severity**: High — session resumption via `/sessions` does not work
- **Scope**: Affects all sessions when using Bedrock as the LLM provider. Even minimal conversations cannot be resumed.
- **Workaround**: None for session resumption. Users must start new sessions, losing prior context.

## Suggested Investigation Areas

- Inspect how cagent serializes conversation history to disk (session save)
- Inspect how cagent deserializes and reconstructs conversation history on resume (session load)
- Verify that `toolUse`/`toolResult` block pairing is preserved through the save → load cycle
- Check if the session format assumes a specific LLM provider's message schema and whether Bedrock's requirements are handled correctly
- Add validation before sending restored history to ensure `toolResult`/`toolUse` pairing invariant holds

## Related Issues

- #1593 — AWS Bedrock `ValidationException`: Protocol violation when assistant message contains both `tool_calls` and content (related Bedrock validation issue with different root cause)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Session resume via /sessions immediately fails with Bedrock ValidationException (toolResult/toolUse mismatch) #1676

Summary

Environment

Steps to Reproduce

Expected Behavior

Actual Behavior

Analysis

Observations

How the Error Occurs

Possible Contributing Factors

Error Chain

Impact

Suggested Investigation Areas

Related Issues

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Session resume via /sessions immediately fails with Bedrock ValidationException (toolResult/toolUse mismatch) #1676

Description

Summary

Environment

Steps to Reproduce

Expected Behavior

Actual Behavior

Analysis

Observations

How the Error Occurs

Possible Contributing Factors

Error Chain

Impact

Suggested Investigation Areas

Related Issues

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions