Skip to content

feat: implement agent caches and fix invalid prompt cache configs#1339

Merged
MODSetter merged 1 commit intomainfrom
dev
May 3, 2026
Merged

feat: implement agent caches and fix invalid prompt cache configs#1339
MODSetter merged 1 commit intomainfrom
dev

Conversation

@MODSetter
Copy link
Copy Markdown
Owner

@MODSetter MODSetter commented May 3, 2026

  • Added a new function _warm_agent_jit_caches to pre-warm agent caches at startup, reducing cold invocation costs.
  • Updated the SurfSenseContextSchema to include per-invocation fields for better state management during agent execution.
  • Introduced caching mechanisms in various tools to ensure fresh database sessions are used, improving performance and reliability.
  • Enhanced middleware to support new context features and improve error handling during connector and document type discovery.

Description

Motivation and Context

FIX #

Screenshots

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR implements a comprehensive performance optimization for the agent system through multi-phase caching and session management improvements. The core changes include: introducing a TTL-LRU compiled-agent cache to reuse graph instances across turns (reducing cold invocation from 4-5s to <50µs on cache hits), refactoring all connector tools to use per-call database sessions instead of cached closures to enable safe cache sharing, implementing a connector discovery TTL cache to reduce repeated database queries, fixing Anthropic's 4-cache-control-block limit by flattening multi-block system messages, switching prompt cache injection from role: system to index: 0 to avoid overflow, parallelizing agent build with LLM preflight checks, adding JIT warmup at startup to pre-pay compilation costs, and converting SurfSenseContextSchema to a dataclass for better runtime context management. These changes collectively improve both cold-start and warm-path performance while maintaining backward compatibility through feature flags.

⏱️ Estimated Review Time: 3+ hours

💡 Review Order Suggestion
Order File Path
1 .env.example
2 app/agents/new_chat/feature_flags.py
3 app/agents/new_chat/context.py
4 app/agents/new_chat/agent_cache.py
5 app/agents/new_chat/middleware/flatten_system.py
6 app/agents/new_chat/prompt_caching.py
7 app/services/connector_service.py
8 app/agents/new_chat/tools/registry.py
9 app/agents/new_chat/chat_deepagent.py
10 app/agents/new_chat/middleware/knowledge_search.py
11 app/agents/new_chat/tools/search_surfsense_docs.py
12 app/agents/new_chat/tools/update_memory.py
13 app/agents/new_chat/tools/connected_accounts.py
14 app/agents/new_chat/tools/notion/create_page.py
15 app/agents/new_chat/tools/notion/update_page.py
16 app/agents/new_chat/tools/notion/delete_page.py
17 app/agents/new_chat/tools/confluence/create_page.py
18 app/agents/new_chat/tools/confluence/update_page.py
19 app/agents/new_chat/tools/confluence/delete_page.py
20 app/agents/new_chat/tools/gmail/create_draft.py
21 app/agents/new_chat/tools/gmail/send_email.py
22 app/agents/new_chat/tools/gmail/trash_email.py
23 app/agents/new_chat/tools/gmail/update_draft.py
24 app/agents/new_chat/tools/gmail/read_email.py
25 app/agents/new_chat/tools/gmail/search_emails.py
26 app/agents/new_chat/tools/google_drive/create_file.py
27 app/agents/new_chat/tools/google_drive/trash_file.py
28 app/agents/new_chat/tools/dropbox/create_file.py
29 app/agents/new_chat/tools/dropbox/trash_file.py
30 app/agents/new_chat/tools/onedrive/create_file.py
31 app/agents/new_chat/tools/onedrive/trash_file.py
32 app/agents/new_chat/tools/google_calendar/create_event.py
33 app/agents/new_chat/tools/google_calendar/update_event.py
34 app/agents/new_chat/tools/google_calendar/delete_event.py
35 app/agents/new_chat/tools/google_calendar/search_events.py
36 app/agents/new_chat/tools/jira/create_issue.py
37 app/agents/new_chat/tools/jira/update_issue.py
38 app/agents/new_chat/tools/jira/delete_issue.py
39 app/agents/new_chat/tools/linear/create_issue.py
40 app/agents/new_chat/tools/linear/update_issue.py
41 app/agents/new_chat/tools/linear/delete_issue.py
42 app/agents/new_chat/tools/discord/list_channels.py
43 app/agents/new_chat/tools/discord/read_messages.py
44 app/agents/new_chat/tools/discord/send_message.py
45 app/agents/new_chat/tools/teams/list_channels.py
46 app/agents/new_chat/tools/teams/read_messages.py
47 app/agents/new_chat/tools/teams/send_message.py
48 app/agents/new_chat/tools/luma/create_event.py
49 app/agents/new_chat/tools/luma/list_events.py
50 app/agents/new_chat/tools/luma/read_event.py
51 app/agents/new_chat/middleware/__init__.py
52 app/tasks/chat/stream_new_chat.py
53 app/app.py
54 tests/unit/agents/new_chat/test_agent_cache.py
55 tests/unit/agents/new_chat/test_feature_flags.py
56 tests/unit/agents/new_chat/test_flatten_system.py
57 tests/unit/agents/new_chat/test_prompt_caching.py
58 tests/unit/middleware/test_knowledge_search.py
59 tests/unit/test_stream_new_chat_contract.py
60 surfsense_web/components/pricing/pricing-section.tsx
⚠️ Inconsistent Changes Detected
File Path Warning
surfsense_web/components/pricing/pricing-section.tsx Minor whitespace formatting change in a frontend pricing component appears unrelated to the backend agent caching and performance optimization focus of this PR

Need help? Join our Discord

Summary by CodeRabbit

  • New Features

    • Added agent caching with configurable TTL and size limits via environment variables for improved performance.
    • Added connector discovery caching to reduce database queries.
    • Introduced per-turn mentioned documents tracking for enhanced context awareness.
  • Improvements

    • Concurrent tool building for faster agent initialization.
    • Agent startup warmup routine for better first-request performance.
    • Updated prompt caching strategy for improved compatibility.
  • Bug Fixes

    • Fixed pricing section UI text formatting.

- Added a new function `_warm_agent_jit_caches` to pre-warm agent caches at startup, reducing cold invocation costs.
- Updated the `SurfSenseContextSchema` to include per-invocation fields for better state management during agent execution.
- Introduced caching mechanisms in various tools to ensure fresh database sessions are used, improving performance and reliability.
- Enhanced middleware to support new context features and improve error handling during connector and document type discovery.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
surf-sense-frontend Building Building Preview, Comment May 3, 2026 1:04pm

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e2b925c7-65b8-40cf-90b4-e4f367bdbe63

📥 Commits

Reviewing files that changed from the base of the PR and between 90a653c and a34f1fb.

📒 Files selected for processing (60)
  • surfsense_backend/.env.example
  • surfsense_backend/app/agents/new_chat/agent_cache.py
  • surfsense_backend/app/agents/new_chat/chat_deepagent.py
  • surfsense_backend/app/agents/new_chat/context.py
  • surfsense_backend/app/agents/new_chat/feature_flags.py
  • surfsense_backend/app/agents/new_chat/middleware/__init__.py
  • surfsense_backend/app/agents/new_chat/middleware/flatten_system.py
  • surfsense_backend/app/agents/new_chat/middleware/knowledge_search.py
  • surfsense_backend/app/agents/new_chat/prompt_caching.py
  • surfsense_backend/app/agents/new_chat/tools/confluence/create_page.py
  • surfsense_backend/app/agents/new_chat/tools/confluence/delete_page.py
  • surfsense_backend/app/agents/new_chat/tools/confluence/update_page.py
  • surfsense_backend/app/agents/new_chat/tools/connected_accounts.py
  • surfsense_backend/app/agents/new_chat/tools/discord/list_channels.py
  • surfsense_backend/app/agents/new_chat/tools/discord/read_messages.py
  • surfsense_backend/app/agents/new_chat/tools/discord/send_message.py
  • surfsense_backend/app/agents/new_chat/tools/dropbox/create_file.py
  • surfsense_backend/app/agents/new_chat/tools/dropbox/trash_file.py
  • surfsense_backend/app/agents/new_chat/tools/gmail/create_draft.py
  • surfsense_backend/app/agents/new_chat/tools/gmail/read_email.py
  • surfsense_backend/app/agents/new_chat/tools/gmail/search_emails.py
  • surfsense_backend/app/agents/new_chat/tools/gmail/send_email.py
  • surfsense_backend/app/agents/new_chat/tools/gmail/trash_email.py
  • surfsense_backend/app/agents/new_chat/tools/gmail/update_draft.py
  • surfsense_backend/app/agents/new_chat/tools/google_calendar/create_event.py
  • surfsense_backend/app/agents/new_chat/tools/google_calendar/delete_event.py
  • surfsense_backend/app/agents/new_chat/tools/google_calendar/search_events.py
  • surfsense_backend/app/agents/new_chat/tools/google_calendar/update_event.py
  • surfsense_backend/app/agents/new_chat/tools/google_drive/create_file.py
  • surfsense_backend/app/agents/new_chat/tools/google_drive/trash_file.py
  • surfsense_backend/app/agents/new_chat/tools/jira/create_issue.py
  • surfsense_backend/app/agents/new_chat/tools/jira/delete_issue.py
  • surfsense_backend/app/agents/new_chat/tools/jira/update_issue.py
  • surfsense_backend/app/agents/new_chat/tools/linear/create_issue.py
  • surfsense_backend/app/agents/new_chat/tools/linear/delete_issue.py
  • surfsense_backend/app/agents/new_chat/tools/linear/update_issue.py
  • surfsense_backend/app/agents/new_chat/tools/luma/create_event.py
  • surfsense_backend/app/agents/new_chat/tools/luma/list_events.py
  • surfsense_backend/app/agents/new_chat/tools/luma/read_event.py
  • surfsense_backend/app/agents/new_chat/tools/notion/create_page.py
  • surfsense_backend/app/agents/new_chat/tools/notion/delete_page.py
  • surfsense_backend/app/agents/new_chat/tools/notion/update_page.py
  • surfsense_backend/app/agents/new_chat/tools/onedrive/create_file.py
  • surfsense_backend/app/agents/new_chat/tools/onedrive/trash_file.py
  • surfsense_backend/app/agents/new_chat/tools/registry.py
  • surfsense_backend/app/agents/new_chat/tools/search_surfsense_docs.py
  • surfsense_backend/app/agents/new_chat/tools/teams/list_channels.py
  • surfsense_backend/app/agents/new_chat/tools/teams/read_messages.py
  • surfsense_backend/app/agents/new_chat/tools/teams/send_message.py
  • surfsense_backend/app/agents/new_chat/tools/update_memory.py
  • surfsense_backend/app/app.py
  • surfsense_backend/app/services/connector_service.py
  • surfsense_backend/app/tasks/chat/stream_new_chat.py
  • surfsense_backend/tests/unit/agents/new_chat/test_agent_cache.py
  • surfsense_backend/tests/unit/agents/new_chat/test_feature_flags.py
  • surfsense_backend/tests/unit/agents/new_chat/test_flatten_system.py
  • surfsense_backend/tests/unit/agents/new_chat/test_prompt_caching.py
  • surfsense_backend/tests/unit/middleware/test_knowledge_search.py
  • surfsense_backend/tests/unit/test_stream_new_chat_contract.py
  • surfsense_web/components/pricing/pricing-section.tsx

📝 Walkthrough

Walkthrough

This PR introduces compiled-agent caching with per-call session management across tools to enable safe graph reuse, adds system-message flattening middleware for provider compatibility, refactors connector discovery caching, and implements startup JIT warmup for LangChain schema compilation.

Changes

Agent Caching & Context Refactoring

Layer / File(s) Summary
Configuration & Primitives
.env.example, surfsense_backend/app/agents/new_chat/agent_cache.py, surfsense_backend/app/agents/new_chat/feature_flags.py
Environment variables for cache sizing/TTL; new AgentFeatureFlags.enable_agent_cache and enable_agent_cache_share_gp_subagent with env-var binding; core agent_cache.py module introduces stable_hash, signature functions, and _AgentCache with TTL-LRU + per-key in-flight locking.
Runtime Context
surfsense_backend/app/agents/new_chat/context.py
SurfSenseContextSchema converted from TypedDict to @dataclass with optional/nullable fields; added mentioned_document_ids: list[int] field for per-turn document mention tracking.
Compiled Agent Cache Integration
surfsense_backend/app/agents/new_chat/chat_deepagent.py, surfsense_backend/app/agents/new_chat/tools/registry.py
Deep agent builder now computes stable_hash cache key and retrieves/builds via get_cache().get_or_build(...) when enable_agent_cache is on; refactored connector/document discovery into separate async lookups; tool registry parallelize built-in + MCP loading.
Middleware Layer
surfsense_backend/app/agents/new_chat/middleware/__init__.py, surfsense_backend/app/agents/new_chat/middleware/flatten_system.py, surfsense_backend/app/agents/new_chat/middleware/knowledge_search.py, surfsense_backend/app/agents/new_chat/prompt_caching.py
New FlattenSystemMessageMiddleware collapses multi-block system messages; inserted before model call in deepagent stack; KnowledgePriorityMiddleware now reads mentioned_document_ids from runtime.context; prompt caching switched from role: system to index: 0 injection.
Per-Call Session Management
surfsense_backend/app/agents/new_chat/tools/confluence/*, surfsense_backend/app/agents/new_chat/tools/discord/*, surfsense_backend/app/agents/new_chat/tools/dropbox/*, surfsense_backend/app/agents/new_chat/tools/gmail/*, surfsense_backend/app/agents/new_chat/tools/google_calendar/*, surfsense_backend/app/agents/new_chat/tools/google_drive/*, surfsense_backend/app/agents/new_chat/tools/jira/*, surfsense_backend/app/agents/new_chat/tools/linear/*, surfsense_backend/app/agents/new_chat/tools/luma/*, surfsense_backend/app/agents/new_chat/tools/notion/*, surfsense_backend/app/agents/new_chat/tools/onedrive/*, surfsense_backend/app/agents/new_chat/tools/teams/*, surfsense_backend/app/agents/new_chat/tools/connected_accounts.py, surfsense_backend/app/agents/new_chat/tools/search_surfsense_docs.py, surfsense_backend/app/agents/new_chat/tools/update_memory.py
All tool factories now discard passed db_session and open fresh AsyncSession per invocation via async_session_maker; configuration validation requires only search_space_id/user_id (not db_session); enables safe agent graph caching across requests.
Service Caching
surfsense_backend/app/services/connector_service.py
Added TTL caching (default 30s) for get_available_connectors and get_available_document_types per search_space_id; SQLAlchemy ORM event listeners auto-invalidate on SearchSourceConnector/Document mutations.
App Startup & Stream
surfsense_backend/app/app.py, surfsense_backend/app/tasks/chat/stream_new_chat.py
New _warm_agent_jit_caches() routine precompiles LangChain schemas with bounded timeout during startup (Phase 1.7); stream_new_chat/stream_resume_chat now pass SurfSenseContextSchema instance into agent.astream_events for per-invocation context; added parallel preflight + speculative agent build with fallback settling.
Tests & Documentation
surfsense_backend/tests/unit/agents/new_chat/test_agent_cache.py, surfsense_backend/tests/unit/agents/new_chat/test_flatten_system.py, surfsense_backend/tests/unit/agents/new_chat/test_prompt_caching.py, surfsense_backend/tests/unit/agents/new_chat/test_feature_flags.py, surfsense_backend/tests/unit/middleware/test_knowledge_search.py, surfsense_backend/tests/unit/test_stream_new_chat_contract.py
New unit tests for cache primitives (determinism, hit/miss/TTL/LRU/concurrency/invalidation behavior); middleware integration tests for system-message flattening and idempotency; mention-draining semantics; speculative build settling; feature flag defaults.

UI Cleanup

Layer / File(s) Summary
Text Formatting
surfsense_web/components/pricing/pricing-section.tsx
Removed line-number artifacts from FAQ heading and reformatted paragraph text for readability.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant StreamNewChat
    participant DeepAgent as Deep Agent<br/>(Compiled)
    participant AgentCache
    participant Tools
    participant DB as Per-Call<br/>DB Session

    Client->>StreamNewChat: stream_new_chat(request)
    
    Note over StreamNewChat: Phase 1: Build context & check model
    StreamNewChat->>StreamNewChat: Create SurfSenseContextSchema<br/>(mentioned_document_ids, turn_id, etc.)
    
    Note over StreamNewChat: Phase 2: Parallel preflight & speculative agent
    par Preflight LLM
        StreamNewChat->>StreamNewChat: Preflight ping (concurrent)
    and Speculative Build
        StreamNewChat->>DeepAgent: Speculative create_surfsense_deep_agent()
        DeepAgent->>AgentCache: Compute stable_hash(config, flags, tools, ...)
        DeepAgent->>AgentCache: get_or_build(cache_key, builder)
        alt Cache Hit
            AgentCache->>DeepAgent: Return cached compiled graph
        else Cache Miss
            AgentCache->>DeepAgent: Run builder in asyncio.to_thread
            DeepAgent->>DeepAgent: Compile middleware stack<br/>(+ FlattenSystemMessageMiddleware)
            DeepAgent->>AgentCache: Store in cache with TTL
        end
    end
    
    Note over StreamNewChat: Phase 3: Stream agent events
    StreamNewChat->>DeepAgent: agent.astream_events(...,<br/>context=runtime_context)
    
    loop For each tool invocation
        DeepAgent->>Tools: Invoke tool(search_space_id, user_id, ...)
        Tools->>DB: async_session_maker() → new AsyncSession
        Tools->>DB: Query/mutate using per-call session
        DB->>Tools: Result
        Tools->>DeepAgent: Return
    end
    
    DeepAgent->>StreamNewChat: Agent events + final response
    StreamNewChat->>Client: Stream response chunks
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 Hops with glee through cache-lines clean,
Per-call sessions keep the state pristine,
System messages flatten with grace and care,
Agent graphs compiled once, reused everywhere!
Warmth at startup, locks held tight—
Concurrent dreams now compile just right! 🎉

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

@MODSetter MODSetter merged commit 3b84cf8 into main May 3, 2026
8 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant