
Async Context Compression v1.7.0



Released by github-actions on 26 Jun 00:03 (commit 01f3110).

Overview

This patch release makes summary generation failures non-blocking by default so transient upstream errors no longer interrupt the active chat. It also adds an explicit operator valve to preserve the old hard-failure behavior when needed.

Bug Fixes

  • Graceful summary failure handling (Issue #74): Background summary LLM failures are now skipped silently by default instead of being re-raised into the active chat flow.
  • Chat continuity preserved during transient upstream errors: Short-lived upstream provider failures such as 502s now log the summary error and continue the current chat without saving a summary for that turn.

New Features

  • summary_fail_mode valve: Added a new valve with silent and raise modes so operators can choose between chat-friendly degradation and strict failure visibility.
  • Regression coverage for both modes: Added tests for the default silent path and the opt-in raise path.
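The two summary_fail_mode values can be pictured with a minimal sketch of the code around the background summary call. The function name run_summary_step and the logger name are hypothetical. In the plugin the mode is a valve, but here it is passed as a plain argument to keep the example short.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("async_context_compression")  # hypothetical logger name


def run_summary_step(generate_summary: Callable[[], str], summary_fail_mode: str = "silent") -> Optional[str]:
    """Run one background summary call under the configured fail mode.

    Returns the summary text, or None when the call failed in "silent" mode.
    """
    if summary_fail_mode not in ("silent", "raise"):
        raise ValueError(f"unknown summary_fail_mode: {summary_fail_mode!r}")
    try:
        return generate_summary()
    except Exception:
        if summary_fail_mode == "raise":
            raise  # strict mode: surface the upstream error (e.g. a 502) to the chat
        # Silent mode (default): log the error, save no summary for this turn,
        # and let the chat continue.
        logger.exception("summary generation failed; skipping summary for this turn")
        return None
```

In the default mode, a flaky provider yields None and the chat proceeds. Passing "raise" restores the previous hard-failure behavior.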

Migration Notes

No breaking changes. Default behavior is now summary_fail_mode="silent". Set summary_fail_mode="raise" if you need the previous hard-failure behavior for debugging.

Overview

This patch release broadens summary-response parsing so the filter can accept both classic chat-completions payloads and Responses-style output payloads. It also improves empty-summary diagnostics without persisting reasoning-only fields.

Bug Fixes

  • Alternate summary payload support: _call_summary_llm() now accepts summary text from choices[].message.content, output_text content parts, and Responses-style output message items.
  • Stale choices-only gate removed: The summary call path no longer rejects valid provider payloads just because they omit choices.
  • Clearer empty-summary errors: When no final summary text is present, the filter now reports a compact response-shape summary instead of a misleading generic format error.
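A single extraction helper can handle the accepted payload shapes. This is a sketch under assumptions. extract_summary_text is a hypothetical name and is not the plugin's _call_summary_llm(). The field names follow the OpenAI chat-completions and Responses formats. The helper never reads reasoning fields or reasoning output items, so that text cannot end up in a stored summary.

```python
from typing import Any, List, Optional


def _text_from_content(content: Any) -> str:
    """Collect plain text from a string or from a list of typed content parts."""
    if isinstance(content, str):
        return content
    pieces: List[str] = []
    if isinstance(content, list):
        for part in content:
            # Accept "output_text" (Responses) and "text" (chat content parts).
            if isinstance(part, dict) and part.get("type") in ("output_text", "text"):
                text = part.get("text")
                if isinstance(text, str):
                    pieces.append(text)
    return "".join(pieces)


def extract_summary_text(payload: dict) -> Optional[str]:
    """Return final summary text from either payload style, or None if absent."""
    chunks: List[str] = []
    # Classic chat-completions: choices[].message.content
    for choice in payload.get("choices") or []:
        if isinstance(choice, dict):
            message = choice.get("message") or {}
            # message["reasoning_content"] / message["thinking"] are never read.
            chunks.append(_text_from_content(message.get("content")))
    # Responses-style: output[] items of type "message"; "reasoning" items are skipped.
    for item in payload.get("output") or []:
        if isinstance(item, dict) and item.get("type") == "message":
            chunks.append(_text_from_content(item.get("content")))
    text = "\n".join(chunk for chunk in chunks if chunk.strip()).strip()
    return text or None
```

Because the helper does not require a choices key, it avoids the choices-only rejection that this release removes.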

Behavior Notes

  • Reasoning-only output is ignored: reasoning_content, thinking, and reasoning output items are not treated as summary text, so private chain-of-thought is not written into chat memory.
  • No change to 1.6.3 fail-mode behavior: summary_fail_mode continues to control whether upstream summary-call errors are silent or raised.

Migration Notes

No breaking changes. If a provider returns only reasoning fields and no final answer text, the filter will skip saving a summary for that turn and log the response shape for debugging.
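For the empty-summary case, a compact description of the response shape that omits all values is usually enough to see which fields a provider actually returned. The helper below is hypothetical, and these notes do not document the plugin's real log format. It records only keys, value types, and list lengths, so reasoning text is never copied into the logs.

```python
from typing import Any


def describe_shape(value: Any, depth: int = 0, max_depth: int = 4) -> str:
    """Return a compact description of a JSON-like payload: keys and types only."""
    if isinstance(value, dict):
        if depth >= max_depth:
            return "{...}"
        fields = ", ".join(
            f"{key}: {describe_shape(value[key], depth + 1, max_depth)}"
            for key in sorted(value, key=str)
        )
        return "{" + fields + "}"
    if isinstance(value, list):
        if not value:
            return "[]"
        if depth >= max_depth:
            return f"[... x{len(value)}]"
        # Describe the first element only, plus the list length.
        return f"[{describe_shape(value[0], depth + 1, max_depth)} x{len(value)}]"
    return type(value).__name__


print(describe_shape({"choices": [{"message": {"content": "", "reasoning_content": "..."}}]}))
# {choices: [{message: {content: str, reasoning_content: str}} x1]}
```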

Overview

This release adds branch-aware summary storage and reuse. Cached summaries are now validated against ordered message references and payload fingerprints before they are injected, so summaries from sibling branches or edited history are rejected instead of being reused in the wrong conversation branch.

Because the persisted summary schema changed from count-only/current-summary storage to branch-aware rows, this release is versioned as 1.7.0.

New Features

  • Branch-aware summary reuse: Stored summaries carry message ids and fingerprints. The filter reuses only the newest summary that is valid for the current branch.
  • Single-table summary storage: Branch-aware rows live directly in the chat_summary table, which holds the summaries for every branch of a chat.
  • Safer schema upgrades: Count-only legacy chat_summary tables, and tables that still enforce one row per chat, are rebuilt. Their summaries are discarded and regenerated on later compression passes.
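One way to implement "ordered message references and payload fingerprints" is to pair each message id with a hash of the message fields that a summary depends on. A stored summary is then accepted only when its refs match an exact ordered prefix of the current branch. This sketch assumes that the fingerprint covers role and content and that it uses SHA-256. These notes do not specify the plugin's actual fingerprint inputs or hash, and the helper names are hypothetical.

```python
import hashlib
import json
from typing import List, Sequence, Tuple

# A message reference: (message_id, payload_fingerprint). Hypothetical shape.
Ref = Tuple[str, str]


def fingerprint(message: dict) -> str:
    """Hash the message fields a summary depends on (assumed here: role + content)."""
    payload = json.dumps(
        {"role": message.get("role"), "content": message.get("content")},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def refs_for(messages: Sequence[dict]) -> List[Ref]:
    return [(m["id"], fingerprint(m)) for m in messages]


def summary_matches_branch(summary_refs: Sequence[Ref], branch_refs: Sequence[Ref]) -> bool:
    """Reusable only if the summary covers an exact, ordered prefix of the branch."""
    n = len(summary_refs)
    return 0 < n <= len(branch_refs) and list(summary_refs) == list(branch_refs[:n])
```

An edit that keeps a message's id but changes its text also changes its fingerprint, so the summary is rejected even though the ids still line up.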

Branch Example

OpenWebUI chats can form a tree when a user edits or forks from an earlier message. Version 1.7.0 treats a summary as valid only for the branch whose ordered message ids and payload fingerprints it covers.

Example with a fork that does not align with a compression boundary:

  1. A chat first grows on the main branch: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9 -> 10.
  2. The first compression stores a branch-aware summary covering messages 1-5.
  3. More messages arrive on the same branch, and a later compression derives a new summary covering 1-10 by summarizing the previous 1-5 summary plus messages 6-10.
  4. The user then forks from message 7, which sits between the two compression boundaries, and creates a sibling branch: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8b -> 9b.
  5. On that sibling branch, the 1-10 summary is rejected because it covers messages 8-10 from the original branch, which are not part of the sibling branch. The filter can still reuse the nearest valid ancestor summary, 1-5, and keep 6 -> 7 -> 8b -> 9b as live tail context.
  6. When the sibling branch is compressed, it derives and stores a separate summary such as 1-9b from the old 1-5 summary plus the sibling branch's live tail. Both 1-10 and 1-9b remain available, but only the newest summary valid for the current branch is injected.
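A small simulation reproduces the fork walk-through. This sketch tracks message ids only; a fingerprint check would add one more equality test per ref. It uses a hypothetical StoredSummary record with a created_at counter to model "newest".

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class StoredSummary:
    label: str
    covered_ids: List[str]  # ordered message ids the summary covers
    created_at: int         # save order; a larger value means newer


def is_valid_for(summary: StoredSummary, branch_ids: Sequence[str]) -> bool:
    """Valid only if the covered ids are an exact ordered prefix of the branch."""
    n = len(summary.covered_ids)
    return n <= len(branch_ids) and list(branch_ids[:n]) == summary.covered_ids


def pick_summary(rows: Sequence[StoredSummary], branch_ids: Sequence[str]) -> Optional[StoredSummary]:
    """Return the newest stored summary that is valid for this branch, or None."""
    valid = [row for row in rows if is_valid_for(row, branch_ids)]
    return max(valid, key=lambda row: row.created_at, default=None)


main = [str(i) for i in range(1, 11)]                   # 1 -> 2 -> ... -> 10
sibling = [str(i) for i in range(1, 8)] + ["8b", "9b"]  # forked from message 7

rows = [
    StoredSummary("1-5", main[:5], created_at=1),
    StoredSummary("1-10", main, created_at=2),
]

# On the sibling branch, 1-10 is rejected; 1-5 is reused and 6..9b stay live.
chosen = pick_summary(rows, sibling)
print(chosen.label, sibling[len(chosen.covered_ids):])  # 1-5 ['6', '7', '8b', '9b']

# Compressing the sibling branch stores a separate summary; both remain stored.
rows.append(StoredSummary("1-9b", sibling, created_at=3))
print(pick_summary(rows, sibling).label)  # 1-9b
print(pick_summary(rows, main).label)     # 1-10
```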

This is why the filter records exact message identity instead of only storing a compressed message count. A count-based summary for "10 messages" cannot tell whether those 10 messages belong to the current branch or a sibling branch.

Configuration Notes

max_summary_tokens must be strictly less than 80% of the summary model input window. The reserved space lets the next compression pass send the previous summary plus new messages to the summary model; if the previous summary can occupy the whole input window, repeated compression cannot make meaningful progress. Invalid valve settings raise a configuration error instead of being silently lowered.
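The 80% rule can be checked when valves are validated. The window parameter name summary_model_input_window and the ConfigurationError class are hypothetical; the notes state only that invalid settings raise a configuration error. The sketch compares 5 * max_summary_tokens with 4 * window, which keeps the check in integer arithmetic so the boundary is exact.

```python
class ConfigurationError(ValueError):
    """Raised for invalid valve settings (hypothetical name)."""


def validate_summary_budget(max_summary_tokens: int, summary_model_input_window: int) -> None:
    """Require max_summary_tokens < 0.8 * input window; never silently lower it."""
    if max_summary_tokens <= 0 or summary_model_input_window <= 0:
        raise ConfigurationError("token limits must be positive")
    # 5 * a < 4 * b is the integer form of a < 0.8 * b.
    if 5 * max_summary_tokens >= 4 * summary_model_input_window:
        limit = (4 * summary_model_input_window - 1) // 5  # largest allowed value
        raise ConfigurationError(
            f"max_summary_tokens={max_summary_tokens} must be < 80% of the "
            f"{summary_model_input_window}-token input window (max allowed: {limit})"
        )
```

For example, with an 8000-token window the largest accepted value is 6399, and 6400 raises an error.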

Migration Notes

This release changes the summary database schema:

  • Older count-only chat_summary tables are dropped and recreated; those old summaries are discarded and regenerated on future compression.
  • If schema inspection fails, the plugin leaves existing tables untouched and disables summary persistence instead of running destructive DDL.

Update or reinstall the filter so OpenWebUI's stored function content includes the new schema logic and validation changes.
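The schema upgrade rules above reduce to three steps: inspect the table first, rebuild only legacy or one-row-per-chat tables, and never run DDL if inspection fails. This sketch uses Python's built-in sqlite3 module. The column names, including the message_refs marker column, are hypothetical, and an OpenWebUI deployment may use a different database backend.

```python
import logging
import sqlite3

logger = logging.getLogger("async_context_compression")  # hypothetical logger name

# Hypothetical branch-aware schema; message_refs marks the new layout.
CREATE_SQL = """
CREATE TABLE chat_summary (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    chat_id TEXT NOT NULL,
    message_refs TEXT NOT NULL,  -- JSON list of [message_id, fingerprint] pairs
    summary TEXT NOT NULL,
    created_at REAL NOT NULL
)
"""


def ensure_summary_schema(conn: sqlite3.Connection) -> bool:
    """Return True if summaries may be persisted, False if persistence is disabled."""
    try:
        columns = {row[1] for row in conn.execute("PRAGMA table_info(chat_summary)").fetchall()}
        one_row_per_chat = False
        for index in conn.execute("PRAGMA index_list(chat_summary)").fetchall():
            name, is_unique = index[1], index[2]
            if is_unique:
                indexed = [r[2] for r in conn.execute(f'PRAGMA index_info("{name}")').fetchall()]
                one_row_per_chat = one_row_per_chat or indexed == ["chat_id"]
    except sqlite3.Error:
        # Unknown schema: leave existing tables untouched and skip persistence.
        logger.exception("chat_summary schema inspection failed; disabling summary persistence")
        return False

    if not columns:
        conn.execute(CREATE_SQL)  # fresh install
    elif "message_refs" not in columns or one_row_per_chat:
        # Legacy count-only or one-row-per-chat table: old summaries are discarded
        # and will be regenerated by future compression passes.
        conn.execute("DROP TABLE chat_summary")
        conn.execute(CREATE_SQL)
    conn.commit()
    return True
```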

Version Changes

Plugin Updates

  • Async Context Compression: v1.6.5 → v1.7.0 | 📖 README


📚 Documentation Portal
🐛 Report Issues

Full Changelog: async-context-compression-v1.6.5...async-context-compression-v1.7.0