feat: implement agent caches and fix invalid prompt cache configs by MODSetter · Pull Request #1339 · MODSetter/SurfSense

MODSetter · 2026-05-03T13:04:13Z

Added a new function `_warm_agent_jit_caches` to pre-warm agent caches at startup, reducing cold invocation costs.
Updated `SurfSenseContextSchema` to include per-invocation fields for better state management during agent execution.
Refactored tools to open a fresh database session on each call, so compiled agents can be cached and reused safely.
Enhanced middleware to support the new context fields and improved error handling during connector and document type discovery.
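
The per-call session idea can be sketched as follows. This is a minimal illustration, not the PR's code: `FakeSession`, `session_maker`, and `make_list_docs_tool` are hypothetical stand-ins for SQLAlchemy's `AsyncSession`, the project's `async_session_maker`, and a real tool factory. The point it shows is that the tool closure captures only identifiers, never a live session, so the closure can safely live inside a cached agent graph.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Stand-in for a database session (hypothetical, for illustration only)."""

    opened = 0  # counts how many sessions have been created

    def __init__(self) -> None:
        FakeSession.opened += 1
        self.closed = False

    async def fetch_titles(self, search_space_id: int) -> list:
        return [f"doc-in-space-{search_space_id}"]


@asynccontextmanager
async def session_maker():
    """Stand-in for a session factory: yields a fresh session, then closes it."""
    session = FakeSession()
    try:
        yield session
    finally:
        session.closed = True


def make_list_docs_tool(search_space_id: int):
    """Build a tool that closes over IDs only, never over a live session."""

    async def list_docs() -> list:
        async with session_maker() as session:  # fresh session on every call
            return await session.fetch_titles(search_space_id)

    return list_docs
```

Calling the returned tool twice opens two independent sessions, which is what allows one compiled agent to serve many requests without sharing database state between them.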

Description

Motivation and Context

FIX #

Screenshots

API Changes

This PR includes API changes

Change Type

Testing Performed

Tested locally
Manual/QA verification

Checklist

Follows project coding standards and conventions
Documentation updated as needed
Dependencies updated as needed
No lint/build errors or new warnings
All relevant tests are passing

High-level PR Summary

This PR optimizes agent performance through multi-phase caching and session management improvements. The core changes:
- Introduces a TTL-LRU compiled-agent cache that reuses graph instances across turns, reducing cold invocation from 4-5s to <50µs on cache hits.
- Refactors all connector tools to use per-call database sessions instead of cached closures, which enables safe cache sharing.
- Implements a connector discovery TTL cache to reduce repeated database queries.
- Fixes Anthropic's 4-cache-control-block limit by flattening multi-block system messages.
- Switches prompt cache injection from `role: system` to `index: 0` to avoid overflow.
- Parallelizes the agent build with LLM preflight checks.
- Adds JIT warmup at startup to pre-pay compilation costs.
- Converts `SurfSenseContextSchema` to a dataclass for better runtime context management.

Together, these changes improve both cold-start and warm-path performance while maintaining backward compatibility through feature flags.
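
To make the TTL-LRU compiled-agent cache in the summary concrete, here is a minimal sketch of such a cache with per-key in-flight locking. It is an illustration under assumptions, not the PR's `_AgentCache`: the class name `TTLLRUCache`, its constructor parameters, and the default sizes are hypothetical. Only the `get_or_build` method name is taken from the PR.

```python
import asyncio
import time
from collections import OrderedDict
from typing import Any, Awaitable, Callable, Optional


class TTLLRUCache:
    """Illustrative TTL + LRU cache with one lock per key for in-flight builds."""

    def __init__(self, maxsize: int = 32, ttl_seconds: float = 300.0) -> None:
        self._maxsize = maxsize
        self._ttl = ttl_seconds
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()
        self._locks: dict = {}

    def _get_fresh(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self._ttl:
            del self._entries[key]  # entry has expired
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    async def get_or_build(
        self, key: str, builder: Callable[[], Awaitable[Any]]
    ) -> Any:
        # Note: a builder that returns None is never cached in this sketch.
        value = self._get_fresh(key)
        if value is not None:
            return value
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check: another caller may have finished the build meanwhile.
            value = self._get_fresh(key)
            if value is None:
                value = await builder()
                self._entries[key] = (time.monotonic(), value)
                while len(self._entries) > self._maxsize:
                    self._entries.popitem(last=False)  # evict least recently used
        return value
```

The per-key lock means that concurrent requests for the same key wait for a single build instead of each compiling their own graph. This sketch never prunes `_locks`; a production cache would likely need to.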

⏱️ Estimated Review Time: 3+ hours

💡 Review Order Suggestion

Order	File Path
1	`.env.example`
2	`app/agents/new_chat/feature_flags.py`
3	`app/agents/new_chat/context.py`
4	`app/agents/new_chat/agent_cache.py`
5	`app/agents/new_chat/middleware/flatten_system.py`
6	`app/agents/new_chat/prompt_caching.py`
7	`app/services/connector_service.py`
8	`app/agents/new_chat/tools/registry.py`
9	`app/agents/new_chat/chat_deepagent.py`
10	`app/agents/new_chat/middleware/knowledge_search.py`
11	`app/agents/new_chat/tools/search_surfsense_docs.py`
12	`app/agents/new_chat/tools/update_memory.py`
13	`app/agents/new_chat/tools/connected_accounts.py`
14	`app/agents/new_chat/tools/notion/create_page.py`
15	`app/agents/new_chat/tools/notion/update_page.py`
16	`app/agents/new_chat/tools/notion/delete_page.py`
17	`app/agents/new_chat/tools/confluence/create_page.py`
18	`app/agents/new_chat/tools/confluence/update_page.py`
19	`app/agents/new_chat/tools/confluence/delete_page.py`
20	`app/agents/new_chat/tools/gmail/create_draft.py`
21	`app/agents/new_chat/tools/gmail/send_email.py`
22	`app/agents/new_chat/tools/gmail/trash_email.py`
23	`app/agents/new_chat/tools/gmail/update_draft.py`
24	`app/agents/new_chat/tools/gmail/read_email.py`
25	`app/agents/new_chat/tools/gmail/search_emails.py`
26	`app/agents/new_chat/tools/google_drive/create_file.py`
27	`app/agents/new_chat/tools/google_drive/trash_file.py`
28	`app/agents/new_chat/tools/dropbox/create_file.py`
29	`app/agents/new_chat/tools/dropbox/trash_file.py`
30	`app/agents/new_chat/tools/onedrive/create_file.py`
31	`app/agents/new_chat/tools/onedrive/trash_file.py`
32	`app/agents/new_chat/tools/google_calendar/create_event.py`
33	`app/agents/new_chat/tools/google_calendar/update_event.py`
34	`app/agents/new_chat/tools/google_calendar/delete_event.py`
35	`app/agents/new_chat/tools/google_calendar/search_events.py`
36	`app/agents/new_chat/tools/jira/create_issue.py`
37	`app/agents/new_chat/tools/jira/update_issue.py`
38	`app/agents/new_chat/tools/jira/delete_issue.py`
39	`app/agents/new_chat/tools/linear/create_issue.py`
40	`app/agents/new_chat/tools/linear/update_issue.py`
41	`app/agents/new_chat/tools/linear/delete_issue.py`
42	`app/agents/new_chat/tools/discord/list_channels.py`
43	`app/agents/new_chat/tools/discord/read_messages.py`
44	`app/agents/new_chat/tools/discord/send_message.py`
45	`app/agents/new_chat/tools/teams/list_channels.py`
46	`app/agents/new_chat/tools/teams/read_messages.py`
47	`app/agents/new_chat/tools/teams/send_message.py`
48	`app/agents/new_chat/tools/luma/create_event.py`
49	`app/agents/new_chat/tools/luma/list_events.py`
50	`app/agents/new_chat/tools/luma/read_event.py`
51	`app/agents/new_chat/middleware/__init__.py`
52	`app/tasks/chat/stream_new_chat.py`
53	`app/app.py`
54	`tests/unit/agents/new_chat/test_agent_cache.py`
55	`tests/unit/agents/new_chat/test_feature_flags.py`
56	`tests/unit/agents/new_chat/test_flatten_system.py`
57	`tests/unit/agents/new_chat/test_prompt_caching.py`
58	`tests/unit/middleware/test_knowledge_search.py`
59	`tests/unit/test_stream_new_chat_contract.py`
60	`surfsense_web/components/pricing/pricing-section.tsx`

⚠️ Inconsistent Changes Detected

File Path	Warning
`surfsense_web/components/pricing/pricing-section.tsx`	Minor whitespace formatting change in a frontend pricing component appears unrelated to the backend agent caching and performance optimization focus of this PR

Summary by CodeRabbit

New Features
- Added agent caching with configurable TTL and size limits via environment variables for improved performance.
- Added connector discovery caching to reduce database queries.
- Introduced per-turn mentioned documents tracking for enhanced context awareness.
Improvements
- Concurrent tool building for faster agent initialization.
- Agent startup warmup routine for better first-request performance.
- Updated prompt caching strategy for improved compatibility.
Bug Fixes
- Fixed pricing section UI text formatting.
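
The connector discovery caching listed under New Features can be pictured with a small TTL cache keyed by search space. This is a simplified, synchronous sketch: `DiscoveryCache`, its `loader` callback, and the injectable `clock` are hypothetical names, and invalidation here is a manual call rather than anything wired to the database layer.

```python
import time
from typing import Callable


class DiscoveryCache:
    """Illustrative per-search-space TTL cache for connector discovery results."""

    def __init__(
        self,
        loader: Callable[[int], list],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # hypothetical: performs the real DB query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries: dict = {}   # search_space_id -> (stored_at, value)

    def get(self, search_space_id: int) -> list:
        now = self._clock()
        entry = self._entries.get(search_space_id)
        if entry is not None and now - entry[0] < self._ttl:
            return list(entry[1])  # copy so callers cannot mutate the cache
        value = self._loader(search_space_id)
        self._entries[search_space_id] = (now, list(value))
        return list(value)

    def invalidate(self, search_space_id: int) -> None:
        """Drop the cached entry, e.g. after a connector is added or removed."""
        self._entries.pop(search_space_id, None)
```

A short TTL bounds how stale the result can be, while explicit invalidation covers the case where a mutation should be visible immediately.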


vercel · 2026-05-03T13:04:17Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
surf-sense-frontend	Building	Preview, Comment	May 3, 2026 1:04pm

coderabbitai · 2026-05-03T13:04:33Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e2b925c7-65b8-40cf-90b4-e4f367bdbe63

📥 Commits

Reviewing files that changed from the base of the PR and between 90a653c and a34f1fb.

📒 Files selected for processing (60)

surfsense_backend/.env.example
surfsense_backend/app/agents/new_chat/agent_cache.py
surfsense_backend/app/agents/new_chat/chat_deepagent.py
surfsense_backend/app/agents/new_chat/context.py
surfsense_backend/app/agents/new_chat/feature_flags.py
surfsense_backend/app/agents/new_chat/middleware/__init__.py
surfsense_backend/app/agents/new_chat/middleware/flatten_system.py
surfsense_backend/app/agents/new_chat/middleware/knowledge_search.py
surfsense_backend/app/agents/new_chat/prompt_caching.py
surfsense_backend/app/agents/new_chat/tools/confluence/create_page.py
surfsense_backend/app/agents/new_chat/tools/confluence/delete_page.py
surfsense_backend/app/agents/new_chat/tools/confluence/update_page.py
surfsense_backend/app/agents/new_chat/tools/connected_accounts.py
surfsense_backend/app/agents/new_chat/tools/discord/list_channels.py
surfsense_backend/app/agents/new_chat/tools/discord/read_messages.py
surfsense_backend/app/agents/new_chat/tools/discord/send_message.py
surfsense_backend/app/agents/new_chat/tools/dropbox/create_file.py
surfsense_backend/app/agents/new_chat/tools/dropbox/trash_file.py
surfsense_backend/app/agents/new_chat/tools/gmail/create_draft.py
surfsense_backend/app/agents/new_chat/tools/gmail/read_email.py
surfsense_backend/app/agents/new_chat/tools/gmail/search_emails.py
surfsense_backend/app/agents/new_chat/tools/gmail/send_email.py
surfsense_backend/app/agents/new_chat/tools/gmail/trash_email.py
surfsense_backend/app/agents/new_chat/tools/gmail/update_draft.py
surfsense_backend/app/agents/new_chat/tools/google_calendar/create_event.py
surfsense_backend/app/agents/new_chat/tools/google_calendar/delete_event.py
surfsense_backend/app/agents/new_chat/tools/google_calendar/search_events.py
surfsense_backend/app/agents/new_chat/tools/google_calendar/update_event.py
surfsense_backend/app/agents/new_chat/tools/google_drive/create_file.py
surfsense_backend/app/agents/new_chat/tools/google_drive/trash_file.py
surfsense_backend/app/agents/new_chat/tools/jira/create_issue.py
surfsense_backend/app/agents/new_chat/tools/jira/delete_issue.py
surfsense_backend/app/agents/new_chat/tools/jira/update_issue.py
surfsense_backend/app/agents/new_chat/tools/linear/create_issue.py
surfsense_backend/app/agents/new_chat/tools/linear/delete_issue.py
surfsense_backend/app/agents/new_chat/tools/linear/update_issue.py
surfsense_backend/app/agents/new_chat/tools/luma/create_event.py
surfsense_backend/app/agents/new_chat/tools/luma/list_events.py
surfsense_backend/app/agents/new_chat/tools/luma/read_event.py
surfsense_backend/app/agents/new_chat/tools/notion/create_page.py
surfsense_backend/app/agents/new_chat/tools/notion/delete_page.py
surfsense_backend/app/agents/new_chat/tools/notion/update_page.py
surfsense_backend/app/agents/new_chat/tools/onedrive/create_file.py
surfsense_backend/app/agents/new_chat/tools/onedrive/trash_file.py
surfsense_backend/app/agents/new_chat/tools/registry.py
surfsense_backend/app/agents/new_chat/tools/search_surfsense_docs.py
surfsense_backend/app/agents/new_chat/tools/teams/list_channels.py
surfsense_backend/app/agents/new_chat/tools/teams/read_messages.py
surfsense_backend/app/agents/new_chat/tools/teams/send_message.py
surfsense_backend/app/agents/new_chat/tools/update_memory.py
surfsense_backend/app/app.py
surfsense_backend/app/services/connector_service.py
surfsense_backend/app/tasks/chat/stream_new_chat.py
surfsense_backend/tests/unit/agents/new_chat/test_agent_cache.py
surfsense_backend/tests/unit/agents/new_chat/test_feature_flags.py
surfsense_backend/tests/unit/agents/new_chat/test_flatten_system.py
surfsense_backend/tests/unit/agents/new_chat/test_prompt_caching.py
surfsense_backend/tests/unit/middleware/test_knowledge_search.py
surfsense_backend/tests/unit/test_stream_new_chat_contract.py
surfsense_web/components/pricing/pricing-section.tsx

📝 Walkthrough

This PR introduces compiled-agent caching with per-call session management across tools to enable safe graph reuse, adds system-message flattening middleware for provider compatibility, refactors connector discovery caching, and implements startup JIT warmup for LangChain schema compilation.
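
As a rough picture of what system-message flattening means, the sketch below collapses a multi-block system message into a single string. It assumes plain dict messages and a blank-line separator, both of which are illustrative choices; the PR's `FlattenSystemMessageMiddleware` works inside the agent's middleware stack, and its exact behavior is not shown here.

```python
from typing import Any, Dict


def flatten_system_message(
    message: Dict[str, Any], separator: str = "\n\n"
) -> Dict[str, Any]:
    """Collapse a multi-block system message into a single text string.

    Non-system messages and already-flat messages are returned unchanged,
    so applying the function twice gives the same result as applying it once.
    Non-text blocks are dropped in this sketch.
    """
    content = message.get("content")
    if message.get("role") != "system" or not isinstance(content, list):
        return message
    texts = []
    for block in content:
        if isinstance(block, str):
            texts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            texts.append(block["text"])
    return {**message, "content": separator.join(texts)}
```

Because the flattened message carries one text body instead of several blocks, a later caching step has only one place in the system message to attach a cache marker.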

Changes

Agent Caching & Context Refactoring

Layer / File(s)	Summary
Configuration & Primitives `.env.example`, `surfsense_backend/app/agents/new_chat/agent_cache.py`, `surfsense_backend/app/agents/new_chat/feature_flags.py`	Environment variables for cache sizing/TTL; new `AgentFeatureFlags.enable_agent_cache` and `enable_agent_cache_share_gp_subagent` with env-var binding; core `agent_cache.py` module introduces `stable_hash`, signature functions, and `_AgentCache` with TTL-LRU + per-key in-flight locking.
Runtime Context `surfsense_backend/app/agents/new_chat/context.py`	`SurfSenseContextSchema` converted from `TypedDict` to `@dataclass` with optional/nullable fields; added `mentioned_document_ids: list[int]` field for per-turn document mention tracking.
Compiled Agent Cache Integration `surfsense_backend/app/agents/new_chat/chat_deepagent.py`, `surfsense_backend/app/agents/new_chat/tools/registry.py`	The deep agent builder now computes a `stable_hash` cache key and retrieves or builds the agent via `get_cache().get_or_build(...)` when `enable_agent_cache` is on; connector/document discovery is refactored into separate async lookups; the tool registry parallelizes built-in and MCP tool loading.
Middleware Layer `surfsense_backend/app/agents/new_chat/middleware/__init__.py`, `surfsense_backend/app/agents/new_chat/middleware/flatten_system.py`, `surfsense_backend/app/agents/new_chat/middleware/knowledge_search.py`, `surfsense_backend/app/agents/new_chat/prompt_caching.py`	New `FlattenSystemMessageMiddleware` collapses multi-block system messages; inserted before model call in deepagent stack; `KnowledgePriorityMiddleware` now reads `mentioned_document_ids` from `runtime.context`; prompt caching switched from `role: system` to `index: 0` injection.
Per-Call Session Management `surfsense_backend/app/agents/new_chat/tools/confluence/`, `surfsense_backend/app/agents/new_chat/tools/discord/`, `surfsense_backend/app/agents/new_chat/tools/dropbox/`, `surfsense_backend/app/agents/new_chat/tools/gmail/`, `surfsense_backend/app/agents/new_chat/tools/google_calendar/`, `surfsense_backend/app/agents/new_chat/tools/google_drive/`, `surfsense_backend/app/agents/new_chat/tools/jira/`, `surfsense_backend/app/agents/new_chat/tools/linear/`, `surfsense_backend/app/agents/new_chat/tools/luma/`, `surfsense_backend/app/agents/new_chat/tools/notion/`, `surfsense_backend/app/agents/new_chat/tools/onedrive/`, `surfsense_backend/app/agents/new_chat/tools/teams/`, `surfsense_backend/app/agents/new_chat/tools/connected_accounts.py`, `surfsense_backend/app/agents/new_chat/tools/search_surfsense_docs.py`, `surfsense_backend/app/agents/new_chat/tools/update_memory.py`	All tool factories now discard passed `db_session` and open fresh `AsyncSession` per invocation via `async_session_maker`; configuration validation requires only `search_space_id`/`user_id` (not `db_session`); enables safe agent graph caching across requests.
Service Caching `surfsense_backend/app/services/connector_service.py`	Added TTL caching (default 30s) for `get_available_connectors` and `get_available_document_types` per `search_space_id`; SQLAlchemy ORM event listeners auto-invalidate on `SearchSourceConnector`/`Document` mutations.
App Startup & Stream `surfsense_backend/app/app.py`, `surfsense_backend/app/tasks/chat/stream_new_chat.py`	New `_warm_agent_jit_caches()` routine precompiles LangChain schemas with bounded timeout during startup (Phase 1.7); `stream_new_chat`/`stream_resume_chat` now pass `SurfSenseContextSchema` instance into `agent.astream_events` for per-invocation context; added parallel preflight + speculative agent build with fallback settling.
Tests & Documentation `surfsense_backend/tests/unit/agents/new_chat/test_agent_cache.py`, `surfsense_backend/tests/unit/agents/new_chat/test_flatten_system.py`, `surfsense_backend/tests/unit/agents/new_chat/test_prompt_caching.py`, `surfsense_backend/tests/unit/agents/new_chat/test_feature_flags.py`, `surfsense_backend/tests/unit/middleware/test_knowledge_search.py`, `surfsense_backend/tests/unit/test_stream_new_chat_contract.py`	New unit tests for cache primitives (determinism, hit/miss/TTL/LRU/concurrency/invalidation behavior); middleware integration tests for system-message flattening and idempotency; mention-draining semantics; speculative build settling; feature flag defaults.
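
A cache key for a compiled agent has to be identical across processes for equal inputs, which rules out Python's built-in `hash()` for strings (it is salted per process). The sketch below shows one common way to get such a key. It borrows the name `stable_hash` from the table above, but the signature and the JSON-plus-SHA-256 approach are assumptions, not the PR's implementation.

```python
import hashlib
import json
from typing import Any


def stable_hash(*parts: Any) -> str:
    """Return a hex digest that is the same in every process for equal inputs.

    Dict keys are sorted so key order does not affect the digest. Values that
    JSON cannot encode fall back to str(); unordered containers such as sets
    should be sorted by the caller first.
    """
    canonical = json.dumps(
        parts, sort_keys=True, separators=(",", ":"), default=str
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

For example, `stable_hash({"model": "m"}, ["tool_a", "tool_b"])` yields the same 64-character digest on every run, while reordering the tool list produces a different one, since list order is treated as significant.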

UI Cleanup

Layer / File(s)	Summary
Text Formatting `surfsense_web/components/pricing/pricing-section.tsx`	Removed line-number artifacts from FAQ heading and reformatted paragraph text for readability.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant StreamNewChat
    participant DeepAgent as Deep Agent<br/>(Compiled)
    participant AgentCache
    participant Tools
    participant DB as Per-Call<br/>DB Session

    Client->>StreamNewChat: stream_new_chat(request)
    
    Note over StreamNewChat: Phase 1: Build context & check model
    StreamNewChat->>StreamNewChat: Create SurfSenseContextSchema<br/>(mentioned_document_ids, turn_id, etc.)
    
    Note over StreamNewChat: Phase 2: Parallel preflight & speculative agent
    par Preflight LLM
        StreamNewChat->>StreamNewChat: Preflight ping (concurrent)
    and Speculative Build
        StreamNewChat->>DeepAgent: Speculative create_surfsense_deep_agent()
        DeepAgent->>AgentCache: Compute stable_hash(config, flags, tools, ...)
        DeepAgent->>AgentCache: get_or_build(cache_key, builder)
        alt Cache Hit
            AgentCache->>DeepAgent: Return cached compiled graph
        else Cache Miss
            AgentCache->>DeepAgent: Run builder in asyncio.to_thread
            DeepAgent->>DeepAgent: Compile middleware stack<br/>(+ FlattenSystemMessageMiddleware)
            DeepAgent->>AgentCache: Store in cache with TTL
        end
    end
    
    Note over StreamNewChat: Phase 3: Stream agent events
    StreamNewChat->>DeepAgent: agent.astream_events(...,<br/>context=runtime_context)
    
    loop For each tool invocation
        DeepAgent->>Tools: Invoke tool(search_space_id, user_id, ...)
        Tools->>DB: async_session_maker() → new AsyncSession
        Tools->>DB: Query/mutate using per-call session
        DB->>Tools: Result
        Tools->>DeepAgent: Return
    end
    
    DeepAgent->>StreamNewChat: Agent events + final response
    StreamNewChat->>Client: Stream response chunks
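
Phase 2 of the diagram, where the preflight ping and the speculative agent build run concurrently, could look roughly like the sketch below. The function `preflight_and_build` and its three callbacks are hypothetical, and cancelling the speculative task on preflight failure is one possible reading of "fallback settling", not a confirmed detail of the PR.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def preflight_and_build(
    preflight: Callable[[], Awaitable[bool]],
    build_agent: Callable[[], Awaitable[Any]],
    build_fallback: Callable[[], Awaitable[Any]],
) -> Any:
    """Run the preflight ping and a speculative agent build concurrently."""
    # Start the build immediately instead of waiting for the preflight result.
    build_task = asyncio.create_task(build_agent())
    try:
        ok = await preflight()
    except Exception:
        ok = False
    if ok:
        return await build_task  # speculation paid off
    # Preflight failed: settle the speculative task so it is not left dangling,
    # then build with the fallback configuration instead.
    build_task.cancel()
    await asyncio.gather(build_task, return_exceptions=True)
    return await build_fallback()
```

When the preflight succeeds, total latency is roughly the longer of the two operations rather than their sum, which is the benefit of starting the build speculatively.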

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

[Feature] Add Google calendar connector #246: Modifies connector_service.py to add Google Calendar/Gmail search methods; shares code context with this PR's connector discovery caching layer.
[Feature] Add Gmail connector #257: Extends connector_service.py with new Gmail search functionality; overlaps with connector handling code updated in this PR.
feat: Introduce RAPTOR Search Mode #90: Adds search-mode and document retriever support to ConnectorService; related to the connector/document discovery caching added here.

Poem

🐰 Hops with glee through cache-lines clean,
Per-call sessions keep the state pristine,
System messages flatten with grace and care,
Agent graphs compiled once, reused everywhere!
Warmth at startup, locks held tight—
Concurrent dreams now compile just right! 🎉


MODSetter merged commit 3b84cf8 into main May 3, 2026
8 of 15 checks passed

vercel Bot deployed to Preview May 3, 2026 13:05 View deployment

coderabbitai Bot mentioned this pull request May 4, 2026

feat: moved chat persistance to Server Side #1341

Merged

16 tasks

MODSetter merged 1 commit into main from dev
