Overview
This patch release makes summary generation failures non-blocking by default so transient upstream errors no longer interrupt the active chat. It also adds an explicit operator valve to preserve the old hard-failure behavior when needed.
Bug Fixes
- Graceful summary failure handling (Issue #74): Background summary LLM failures now default to a silent skip instead of re-raising into the active chat flow.
- Chat continuity preserved during transient upstream errors: Short-lived upstream provider failures such as 502s now log the summary error and continue the current chat without saving a summary for that turn.
New Features
- summary_fail_mode valve: Added a new valve with silent and raise modes so operators can choose between chat-friendly degradation and strict failure visibility.
- Regression coverage for both modes: Added tests for the default silent path and the opt-in raise path.
Migration Notes
No breaking changes. Default behavior is now summary_fail_mode="silent". Set summary_fail_mode="raise" if you need the previous hard-failure behavior for debugging.
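The two modes can be pictured with a minimal sketch. This is not the plugin's code: `run_summary_step`, its `call_summary_llm` argument, and `flaky_upstream` are hypothetical names used only for illustration, and the only behavior taken from the notes above is "silent logs and skips, raise re-raises".

```python
import logging

logger = logging.getLogger("summary_sketch")


def run_summary_step(call_summary_llm, summary_fail_mode="silent"):
    """Return the summary text, or None when a failure is skipped.

    call_summary_llm is a hypothetical zero-argument callable standing in
    for the background summary request.
    """
    if summary_fail_mode not in ("silent", "raise"):
        raise ValueError(f"unknown summary_fail_mode: {summary_fail_mode!r}")
    try:
        return call_summary_llm()
    except Exception as exc:
        if summary_fail_mode == "raise":
            # Strict mode: surface the failure to the caller.
            raise
        # Silent mode: log, save nothing for this turn, keep the chat going.
        logger.warning("summary generation failed, skipping this turn: %s", exc)
        return None


def flaky_upstream():
    """Simulates a transient upstream provider error."""
    raise RuntimeError("502 Bad Gateway")
```

With the default mode, `run_summary_step(flaky_upstream)` returns `None` and the chat proceeds; with `summary_fail_mode="raise"` the same call propagates the `RuntimeError`.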
Overview
This patch release broadens summary-response parsing so the filter can accept both classic chat-completions payloads and Responses-style output payloads. It also improves empty-summary diagnostics without persisting reasoning-only fields.
Bug Fixes
- Alternate summary payload support: _call_summary_llm() now accepts summary text from choices[].message.content, output_text content parts, and Responses-style output message items.
- Stale choices-only gate removed: The summary call path no longer rejects valid provider payloads just because they omit choices.
- Clearer empty-summary errors: When no final summary text is present, the filter now reports a compact response-shape summary instead of a misleading generic format error.
Behavior Notes
- Reasoning-only output is ignored: reasoning_content, thinking, and reasoning output items are not treated as summary text, so private chain-of-thought is not written into chat memory.
- No change to 1.6.3 fail-mode behavior: summary_fail_mode continues to control whether upstream summary-call errors are silent or raised.
Migration Notes
No breaking changes. If a provider returns only reasoning fields and no final answer text, the filter will skip saving a summary for that turn and log the response shape for debugging.
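As a rough illustration of the broadened parsing, the sketch below pulls final text from the payload shapes named above and ignores reasoning-only fields. It is an assumption-laden sketch, not the body of _call_summary_llm(): the function names `extract_summary_text` and `_output_text_parts` are hypothetical, and the exact lookup order used by the filter is not stated in these notes.

```python
def _output_text_parts(content_parts):
    """Collect text from content parts whose type is "output_text"."""
    return [
        part.get("text", "")
        for part in content_parts
        if isinstance(part, dict) and part.get("type") == "output_text"
    ]


def extract_summary_text(payload):
    """Return final summary text from a response dict, or None if absent."""
    parts = []

    # Classic chat-completions shape: choices[].message.content
    for choice in payload.get("choices") or []:
        content = (choice.get("message") or {}).get("content")
        if isinstance(content, str):
            parts.append(content)
        elif isinstance(content, list):
            parts.extend(_output_text_parts(content))

    # Responses-style shape: output[] items of type "message".
    # Items of other types (for example "reasoning") are never read.
    if not parts:
        for item in payload.get("output") or []:
            if isinstance(item, dict) and item.get("type") == "message":
                parts.extend(_output_text_parts(item.get("content") or []))

    text = "\n".join(p.strip() for p in parts if p and p.strip())
    return text or None
```

Fields such as reasoning_content and thinking are simply never consulted, so a payload that carries only those fields yields `None` and no summary is saved for that turn.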
Overview
This release adds branch-aware summary storage and reuse. Cached summaries are now validated against ordered message references and payload fingerprints before they are injected, so summaries from sibling branches or edited history are rejected instead of being reused in the wrong conversation branch.
Because the persisted summary schema changed from count-only/current-summary storage to branch-aware rows, this release is versioned as 1.7.0.
New Features
- Branch-aware summary reuse: Stored summaries carry message ids and fingerprints. The filter reuses only the newest summary that is valid for the current branch.
- Single-table summary storage: Branch-aware rows now live directly in chat_summary; all branch summaries are stored in that table.
- Safer schema upgrades: Count-only legacy chat_summary tables, or tables that still enforce one row per chat, are rebuilt so unsafe summaries are regenerated.
Branch Example
OpenWebUI chats can form a tree when a user edits or forks from an earlier message. Version 1.7.0 treats a summary as valid only for the branch whose ordered message ids and payload fingerprints it covers.
Example with a fork that does not align with a compression boundary:
- A chat first grows on the main branch: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9 -> 10.
- The first compression stores a branch-aware summary covering messages 1-5.
- More messages arrive on the same branch, and a later compression derives a new summary covering 1-10 by summarizing the previous 1-5 summary plus messages 6-10.
- The user then forks from message 7, which sits between the two compression boundaries, and creates a sibling branch: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8b -> 9b.
- On that sibling branch, the 1-10 summary is rejected because it contains live sibling refs 8-10 from the original branch. The filter can still reuse the nearest valid ancestor summary, 1-5, then keep 6 -> 7 -> 8b -> 9b as live tail context.
- When the sibling branch is compressed, it derives and stores a separate summary such as 1-9b from the old 1-5 summary plus the sibling branch's live tail. Both 1-10 and 1-9b remain available, but only the newest summary valid for the current branch is injected.
This is why the filter records exact message identity instead of only storing a compressed message count. A count-based summary for "10 messages" cannot tell whether those 10 messages belong to the current branch or a sibling branch.
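The branch check can be sketched as a prefix comparison over ordered (message id, fingerprint) pairs. This is a simplified model, not the plugin's implementation: `StoredSummary`, `fingerprint`, `refs_for`, `is_valid_for_branch`, and `pick_summary` are hypothetical names, SHA-256 is an assumed fingerprint function, and treating a summary as covering a prefix of the branch is an assumption made to keep the example small.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: str) -> str:
    """Assumed payload fingerprint: SHA-256 of the message text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def refs_for(messages):
    """Build ordered (message_id, fingerprint) refs from (id, text) pairs."""
    return tuple((mid, fingerprint(text)) for mid, text in messages)


@dataclass(frozen=True)
class StoredSummary:
    label: str
    created_at: int  # larger means newer
    refs: tuple      # ordered (message_id, fingerprint) pairs the summary covers


def is_valid_for_branch(summary, branch):
    """True if the summary's refs match the start of the current branch."""
    if len(summary.refs) > len(branch):
        return False
    return summary.refs == refs_for(branch[: len(summary.refs)])


def pick_summary(summaries, branch):
    """Return the newest summary valid for this branch, or None."""
    valid = [s for s in summaries if is_valid_for_branch(s, branch)]
    return max(valid, key=lambda s: s.created_at, default=None)


# The fork from the example: main branch 1..10, sibling branch forks at 7.
main = [(str(i), f"msg {i}") for i in range(1, 11)]
sibling = main[:7] + [("8b", "msg 8b"), ("9b", "msg 9b")]
summary_1_5 = StoredSummary("1-5", 1, refs_for(main[:5]))
summary_1_10 = StoredSummary("1-10", 2, refs_for(main))
```

Under these assumptions, `pick_summary([summary_1_5, summary_1_10], sibling)` selects the 1-5 summary, while the same call with `main` selects 1-10. Editing the text of message 3 changes its fingerprint, so neither summary would match the edited history.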
Configuration Notes
max_summary_tokens must be strictly less than 80% of the summary model input window. The reserved space lets the next compression pass send the previous summary plus new messages to the summary model; if the previous summary can occupy the whole input window, repeated compression cannot make meaningful progress. Invalid valve settings raise a configuration error instead of being silently lowered.
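The 80% rule amounts to a strict inequality, which the sketch below checks with integer arithmetic. The function name `validate_max_summary_tokens` and the parameter `summary_input_window` are hypothetical, and the use of `ValueError` is an assumption; the notes only say that a configuration error is raised.

```python
def validate_max_summary_tokens(max_summary_tokens: int, summary_input_window: int) -> int:
    """Return max_summary_tokens if it is strictly below 80% of the window.

    Uses integer math (x < 0.8 * w  <=>  5 * x < 4 * w) to avoid float rounding.
    """
    if 5 * max_summary_tokens >= 4 * summary_input_window:
        raise ValueError(
            f"max_summary_tokens={max_summary_tokens} must be strictly less than "
            f"80% of the summary model input window ({summary_input_window})"
        )
    # The value is returned unchanged; it is never silently lowered.
    return max_summary_tokens
```

For a 10,000-token window, 7,999 passes and 8,000 is rejected, since the limit is exclusive.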
Migration Notes
This release changes the summary database schema:
- Older count-only chat_summary tables are dropped and recreated; those old summaries are discarded and regenerated on future compression.
- If schema inspection fails, the plugin leaves existing tables untouched and disables summary persistence instead of running destructive DDL.
Update or reinstall the filter so OpenWebUI's stored function content includes the new schema logic and validation changes.
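The upgrade decision described above can be modeled as a small planning function. Everything named here is hypothetical: `plan_schema_action`, the `inspect_columns` callable, the action strings, and in particular the column names in `REQUIRED_COLUMNS`, which are placeholders and not the plugin's actual schema. The sketch covers only the column check, not the one-row-per-chat constraint check.

```python
# Placeholder column names for a branch-aware table (NOT the real schema).
REQUIRED_COLUMNS = frozenset({"chat_id", "message_refs", "fingerprints"})


def plan_schema_action(inspect_columns):
    """Decide what to do with chat_summary before touching the database.

    inspect_columns is a hypothetical callable that returns the set of
    existing column names, or None if the table does not exist yet.
    """
    try:
        columns = inspect_columns()
    except Exception:
        # Inspection failed: leave tables untouched, run without persistence.
        return "disable_persistence"
    if columns is None:
        return "create"
    if REQUIRED_COLUMNS.issubset(columns):
        return "keep"
    # Legacy count-only layout: drop and recreate, summaries regenerate later.
    return "rebuild"
```

The point of the ordering is that destructive DDL is reached only after inspection has succeeded and has positively identified a legacy layout.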
Version Changes
Plugin Updates
- Async Context Compression: v1.6.5 → v1.7.0
Full Changelog: async-context-compression-v1.6.5...async-context-compression-v1.7.0