Skip to content

refactor(core): ♻️ Decompose agent runtime modules#136

Merged
jorben merged 12 commits into
masterfrom
refact/arch-optimization-2604
Apr 28, 2026
Merged

refactor(core): ♻️ Decompose agent runtime modules#136
jorben merged 12 commits into
masterfrom
refact/arch-optimization-2604

Conversation

@HayWolf
Copy link
Copy Markdown
Contributor

@HayWolf HayWolf commented Apr 25, 2026

Summary

  • Split large Rust runtime and extension modules into focused companion files while preserving facade-style public APIs.
  • Extract reusable workbench UI helpers/components from runtime-thread-surface.tsx and dashboard-workbench.tsx.
  • Update agent-facing project structure guidance in AGENTS.md.

Test Plan

  • cargo fmt --manifest-path src-tauri/Cargo.toml --check
  • cargo test --manifest-path src-tauri/Cargo.toml — 499 unit + 197 integration tests passed, matching the Phase 0 baseline.
  • npm run typecheck
  • npm run test:unit — 20 files / 213 tests passed.
  • npm run build:web — passed with existing Vite chunk-size warnings.

🤖 Generated with TiyCode

…implify workbench UI

Break down the monolithic agent run manager and session logic into smaller,
purpose-built modules and extract workbench UI helpers to improve
maintainability and readability.

Core Rust changes:
- Move runtime event handling into `agent_run_event_handler.rs`
- Extract run summary and title generation into `agent_run_summary.rs`
  and `agent_run_title.rs`
- Split agent session logic into `agent_session_compression.rs`,
  `agent_session_events.rs`, `agent_session_history.rs`,
  `agent_session_tools.rs`, and `agent_session_types.rs`
- Make `ActiveRun` and `AgentRunManager` fields public for better
  modularity
- Re-export new submodules from `agent_run_manager.rs` and update
  `mod.rs`
- Remove obsolete plan approval and runtime event loop code from
  `agent_run_manager.rs`
- Update `AGENTS.md` to reflect the new module organization

Extensions and UI changes:
- Add extension host modules: `config_io.rs`, `marketplace.rs`,
  `mcp.rs`, `plugins.rs`, and `skills.rs`
- Refactor extensions facade in `extensions/mod.rs`
- Extract workbench UI helpers into dedicated files:
  `dashboard-terminal-orchestrator.tsx`, `long-message-body.tsx`,
  `runtime-thread-surface-diff.tsx`, and
  `runtime-thread-surface-helpers.ts`
- Simplify `dashboard-workbench.tsx` by moving logic into orchestrators
- Minor cleanup in `task_item_repo.rs`
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 25, 2026

AI Code Review Summary

PR: #136 (refactor(core): ♻️ Decompose agent runtime modules)
Preferred language: English

Overall Assessment

Detected 22 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

  • CRITICAL (1)
    • src-tauri/src/core/agent_run_event_handler.rs:102 - agent_run_event_handler.rs has zero test coverage for the main event handler
  • HIGH (8)
    • src-tauri/src/core/agent_run_compaction.rs:118 - compact_thread_context and run_compact_background have no tests
    • src-tauri/src/core/agent_run_event_handler.rs:501 - finish_run reasoning discard logic has no test coverage
    • src-tauri/src/core/agent_session_execution.rs:62 - execute_tool_call dispatch has no test coverage for any branch
    • src-tauri/src/extensions/plugins.rs:868 - execute_hook path traversal prevention is untested
    • src-tauri/src/extensions/plugins.rs:920 - execute_plugin_tool and execute_command_ have no test coverage
    • src-tauri/src/extensions/plugins.rs:953 - Plugin tool commands execute arbitrary executables without path confinement
    • src/modules/workbench-shell/ui/runtime-thread-surface-metadata.ts:96 - No unit tests for metadata parsing functions in runtime-thread-surface-metadata.ts
    • src/modules/workbench-shell/ui/runtime-thread-surface-state.ts:484 - No unit tests added for newly-extracted pure functions in runtime-thread-surface-state.ts
  • MEDIUM (9)
    • src-tauri/src/extensions/marketplace.rs:275 - SSRF via unvalidated git clone URL in sync_marketplace_source_repo
    • src-tauri/src/extensions/marketplace.rs:311 - N+1 redundant DB query in marketplace_list_items via marketplace_items_for_source
    • src-tauri/src/extensions/marketplace.rs:554 - DefaultHasher used for persistent marketplace source IDs is not cross-version stable
    • src-tauri/src/extensions/marketplace.rs:204 - marketplace_install_item does full list scan to find a single item
    • src-tauri/src/extensions/plugins.rs:1480 - substitute_variables has no test coverage
    • src-tauri/src/extensions/plugins.rs:1488 - mask_sensitive_value and mask_url have no test coverage
    • src/modules/workbench-shell/ui/dashboard-overlays.tsx:234 - Same indentation issue in DashboardOverlays
    • src/modules/workbench-shell/ui/dashboard-sidebar.tsx:120 - Extracted JSX retains parent-level indentation, hurting readability
  • LOW (4)
    • src-tauri/Cargo.toml:57 - RC version of tiycore used as a production dependency
    • src-tauri/Cargo.toml:57 - Pre-release tiycore dependency introduces supply-chain risk
    • src-tauri/src/extensions/plugins.rs:1030 - Stdin write error silently ignored in execute_command_
    • src/modules/workbench-shell/ui/runtime-thread-surface-state.ts:123 - Double blank lines in runtime-thread-surface-state.ts

Actionable Suggestions

  • Add tool_call_repo::update_result calls in all result branches of execute_tool_call (Denied, EscalationRequired, Cancelled, Err, Executed) to prevent stale 'requested' records
  • Replace let _ = stdin.write_all(...) with error logging or propagation in execute_command_
  • Use dunce::canonicalize consistently in execute_hook to match the plugin path resolution in load_plugin_from_dir
  • Add a tracking issue or TODO comment for the tiycore RC version and plan to upgrade to stable before release
  • Consider documenting the plugin trust model (trusted vs untrusted sources) and adding permission-level validation for tool.command values
  • Add path confinement for plugin tool commands similar to the hook path-traversal check already in execute_hook
  • Validate MCP server URLs against private/internal IP ranges before auto-starting plugin-managed MCP servers
  • Canonicalize and validate plugin tool CWD to ensure it stays within the workspace or plugin directory
  • Add an environment variable denylist (PATH, LD_PRELOAD, DYLD_INSERT_LIBRARIES, etc.) for plugin tool env maps
  • Pin tiycore to a stable release before production deployment
  • Consider requiring explicit user approval before installing plugins or enabling plugin-managed MCP servers
  • Add unit tests for the 4 free functions in agent_run_event_handler.rs (build_orphaned_run_terminal_event, terminal_event_status, is_terminal_runtime_event, should_complete_reasoning_for_event) — these require no mocking

Potential Risks

  • Stale tool_call records accumulating in 'requested' status could degrade DB query performance over time and mislead the UI
  • RC tiycore dependency could introduce breaking changes or runtime bugs in agent sessions
  • Plugin tool commands with no sandboxing could be exploited if plugins are loaded from untrusted marketplace sources
  • Silent stdin write failures could cause intermittent plugin/hook failures that are hard to diagnose
  • A malicious plugin can execute arbitrary system commands via tool.command
  • A malicious plugin can probe internal network services via MCP URL SSRF
  • A malicious plugin can subvert spawned process execution via env var injection (LD_PRELOAD, PATH override)
  • A malicious plugin can set arbitrary working directories for spawned commands
  • Pre-release tiycore dependency may introduce unreviewed security changes
  • State machine regressions in agent_run_event_handler.rs will only be caught in manual end-to-end testing
  • Path traversal bypass in execute_hook could allow arbitrary command execution
  • Compaction lifecycle bugs could leave threads stuck in Running state

Test Suggestions

  • Add integration test verifying tool_call_repo status is updated to 'failed' for all error paths in execute_tool_call
  • Add Windows-specific test for hook path traversal check with dunce vs std::fs::canonicalize
  • Add test for execute_command_ with stdin write failure (mock broken pipe)
  • Add test for compact_thread_context concurrent with an active run (race condition)
  • Verify all new modules compile and pass cargo test with the RC tiycore version
  • Write security tests verifying hook path-traversal prevention works with symlinks and ../ sequences
  • Add tests that plugin tool commands cannot specify executables outside expected directories
  • Add tests that MCP URLs pointing to 127.0.0.1, 169.254.169.254, 10.x.x.x, and 192.168.x.x are rejected or require approval
  • Add tests that PATH, LD_PRELOAD, and other sensitive env var keys are denied in plugin tool env maps
  • Add tests that plugin CWD containing ../ or absolute paths outside workspace are rejected
  • Fuzz test parse_plugin_manifest and parse_plugin_command_markdown with malformed inputs
  • Unit test all pure/free functions first (lowest effort, highest coverage density)

File-Level Coverage Notes

  • src-tauri/src/core/agent_run_manager.rs: Patch content unavailable (binary/large/renamed), reviewed as file-level risk only. (cannot_inline_without_patch)
  • src-tauri/src/core/agent_session.rs: Patch content unavailable (binary/large/renamed), reviewed as file-level risk only. (cannot_inline_without_patch)
  • src-tauri/src/core/agent_session_tests.rs: Patch content unavailable (binary/large/renamed), reviewed as file-level risk only. (cannot_inline_without_patch)
  • src-tauri/src/extensions/mcp.rs: Patch content unavailable (binary/large/renamed), reviewed as file-level risk only. (cannot_inline_without_patch)
  • src-tauri/src/extensions/mod.rs: Patch content unavailable (binary/large/renamed), reviewed as file-level risk only. (cannot_inline_without_patch)
  • src-tauri/Cargo.toml: low_risk (Simple dependency version change; testing risk is in downstream modules consuming the new API.)
  • src-tauri/src/extensions/plugins.rs: high_risk (Good test coverage for manifest parsing, DTO builders, and free-function helpers. Critical gap is the I/O-heavy async methods (execute_hook, execute_plugin_tool, install/uninstall/sync) which have zero coverage.)
  • src-tauri/src/core/agent_run_event_handler.rs: critical_risk (This is the most critical testing gap in the batch. The entire event-driven state machine that governs run lifecycle has no automated verification.)
  • src-tauri/src/core/agent_session_execution.rs: high_risk (resolve_helper_tool_task has good test coverage. The main execution paths and all other sub-handlers are untested.)
  • src-tauri/src/core/subagent/orchestrator.rs: medium_risk (The refactoring of merge_payload and addition of thinking_level propagation are behaviorally significant but lightly tested.)
  • src-tauri/src/core/agent_run_compaction.rs: high_risk (persist_clear_context_reset_to_pool has a good test. The main orchestration methods are completely untested.)
  • src-tauri/src/core/agent_session_compression.rs: medium_risk (Best test coverage in the batch. persist_compression_markers_to_pool and run_auto_compression (non-LLM paths) have solid tests. The calibration and LLM-dependent paths remain gaps.)
  • src-tauri/src/extensions/marketplace.rs: Has several performance issues: N+1 DB queries, full-list scan for single-item install, sequential git sync, and per-call allocations in builtin source helpers. (The N+1 pattern in marketplace_list_items is the most impactful issue to address first.)
  • src-tauri/src/extensions/skills.rs: Lacks any caching layer; every existence check or scope lookup does a full filesystem walk and parse. Linear search in state application. (The skill_exists / lookup_skill_actual_scope full-scan pattern is the highest-impact issue here.)
  • src-tauri/src/extensions/config_io.rs: Minor I/O concern with display_config_path canonicalize. Mutex-based diagnostic recording is acceptable for small lists. Overall low risk. (Low priority; canonicalize caching is a nice-to-have.)
  • src-tauri/src/extensions/runtime_tools.rs: resolve_tool does linear scan across all plugins and MCP servers. Hook execution is sequential with interleaved DB writes. Acceptable for typical counts but no indexing. (No critical issues; consider a tool-name index if plugin count grows.)
  • src-tauri/src/core/settings_manager.rs: Only a test data addition (reasoning_content_constrained field). No performance impact.
  • src-tauri/src/core/thread_manager.rs: Adds a single DB call (discard_dangling_reasoning) to startup recovery. Low overhead; the query is a simple indexed UPDATE.
  • src-tauri/src/commands/git.rs: Test-only change adding a field. No performance impact.
  • src-tauri/src/commands/thread.rs: Test-only change adding a field. No performance impact.
  • ... and 31 more file-level entries.

Inline Downgraded Items (processed but not inline)

  • None

Coverage Status

  • Target files: 51
  • Covered files: 35
  • Uncovered files: 16
  • No-patch/binary covered as file-level: 5
  • Findings with unknown confidence (N/A): 0

Uncovered list:

  • src-tauri/src/core/agent_run_manager_tests.rs: planner_signaled_done_before_coverage
  • src-tauri/src/core/agent_run_summary.rs: planner_signaled_done_before_coverage
  • src-tauri/src/core/agent_run_title.rs: planner_signaled_done_before_coverage
  • src-tauri/src/core/agent_session_events.rs: planner_signaled_done_before_coverage
  • src-tauri/src/core/agent_session_history.rs: planner_signaled_done_before_coverage
  • src-tauri/src/core/agent_session_tools.rs: planner_signaled_done_before_coverage
  • src-tauri/src/core/agent_session_types.rs: planner_signaled_done_before_coverage
  • src-tauri/src/core/mod.rs: planner_signaled_done_before_coverage
  • src/modules/settings-center/model/run-model-plan.test.ts: planner_signaled_done_before_coverage
  • src/modules/settings-center/model/run-model-plan.ts: planner_signaled_done_before_coverage
  • src/modules/settings-center/ui/settings-center-overlay.tsx: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/dashboard-terminal-orchestrator.tsx: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/runtime-thread-surface-diff.tsx: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/runtime-thread-surface-helpers.ts: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/runtime-thread-surface-tools.tsx: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/thread-rename-input.tsx: planner_signaled_done_before_coverage

No-patch covered list:

  • src-tauri/src/core/agent_run_manager.rs: patch_unavailable_or_binary
  • src-tauri/src/core/agent_session.rs: patch_unavailable_or_binary
  • src-tauri/src/core/agent_session_tests.rs: patch_unavailable_or_binary
  • src-tauri/src/extensions/mcp.rs: patch_unavailable_or_binary
  • src-tauri/src/extensions/mod.rs: patch_unavailable_or_binary

Runtime/Budget

  • Rounds used: 1/4
  • Planned batches: 3
  • Executed batches: 3
  • Sub-agent runs: 8
  • Planner calls: 1
  • Reviewer calls: 8
  • Model calls: 9/64
  • Structured-output summary-only degradation: NO

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 40
  • Findings with unknown confidence: 0
  • Inline comments attempted: 32
  • Target files: 27
  • Covered files: 27
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

@@ -0,0 +1,593 @@
use tauri::{AppHandle, Emitter};

This comment was marked as outdated.

@@ -0,0 +1,840 @@
use tiycore::agent::AgentMessage;

This comment was marked as outdated.

let _ = event_tx.send(ThreadStreamEvent::ReasoningUpdated {
run_id: run_id.to_string(),
message_id,
reasoning: buffer.clone(),

This comment was marked as outdated.

// message in the same run whose created_at >= started_at, minus a
// small delta so the tool call lands just before it. If no match
// is found, place it at the end.
let insert_pos = msg_positions

This comment was marked as outdated.

if rendered.len() > MAX_TOOL_RESULT_SIZE {
rendered.truncate(MAX_TOOL_RESULT_SIZE);
// Ensure we don't cut in the middle of a multi-byte UTF-8 char
while !rendered.is_char_boundary(rendered.len()) {

This comment was marked as outdated.

/// scope is read from the `SkillRecordDto.scope` field, which `load_skills`
/// now derives from the discovered source label rather than the query
/// parameter.
pub(super) async fn lookup_skill_actual_scope(

This comment was marked as outdated.

}
}

pub(super) async fn skill_exists(

This comment was marked as outdated.

return Number.isFinite(parsed) ? parsed : 0;
}

export function buildDiffPreviewRows(diff: string): Array<DiffPreviewRow> {

This comment was marked as outdated.

totalToolCalls: number;
};

export function mapSnapshotHelperStatus(

This comment was marked as outdated.

setRegenerating(false);
refocusAfterRegenerate();
});
}, [threadId, modelPlan, isRegenerating, refocusAfterRegenerate, t]);

This comment was marked as outdated.

…d extensions

Extract agent run compaction, session execution, and related tests into
dedicated modules (agent_run_compaction.rs, agent_session_execution.rs,
agent_run_manager_tests.rs, agent_session_tests.rs) to reduce complexity
in agent_run_manager.rs and agent_session.rs. Simplify the extensions
facade by removing mod.rs and introducing focused modules for config_io,
plugins, mcp, skills, and marketplace. Add dashboard sidebar and overlays
components to the workbench shell UI. Update documentation and module
references to reflect the new structure.
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 26
  • Findings with unknown confidence: 0
  • Inline comments attempted: 26
  • Target files: 33
  • Covered files: 33
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

description: read_string_keys(tool, &["description"])
.unwrap_or_else(|| name.clone()),
command,
args: read_string_array_keys(tool, &["args"]),

This comment was marked as outdated.

)
}

#[cfg(test)]

This comment was marked as outdated.

.collect()
}

pub(crate) fn convert_history_messages(

This comment was marked as outdated.

}
}

pub(crate) fn validate_clarify_input(value: &serde_json::Value) -> Result<(), String> {

This comment was marked as outdated.

/// this leaves headroom for protocol overhead and JSON escaping.
const MAX_TOOL_RESULT_SIZE: usize = 8_000_000;

pub(crate) fn agent_tool_result_from_output(

This comment was marked as outdated.

"path": "/compact",
"description": "Clear history but keep a summary in context.",
"argumentHint": "[instructions=...]",
"argumentsText": instructions.clone().unwrap_or_default(),

This comment was marked as outdated.

format!(
"Create a short thread title for this conversation.\n\
Rules:\n\
- {language_rule}\n\

This comment was marked as outdated.

}
}

pub(super) async fn skill_exists(

This comment was marked as outdated.

import { ThreadRenameInput } from "@/modules/workbench-shell/ui/thread-rename-input";
import { ThreadStatusIndicator } from "@/modules/workbench-shell/ui/thread-status-indicator";

const WORKSPACE_THREAD_PAGE_SIZE = 10;

This comment was marked as outdated.

state: string;
};

function asToolDataRecord(value: unknown) {

This comment was marked as outdated.

Move ResolvedTool, ToolProviderContext, and related tool resolution,
execution, and hook methods from extensions/mod.rs into a new
runtime_tools.rs module to improve separation of concerns and reduce
the size of the facade file.

refactor(workbench): ♻️ extract dashboard logic and thread surface state

- Move dashboard orchestration helpers and constants from
  dashboard-workbench.tsx into a dedicated dashboard-workbench-logic.ts
  module for better reuse and readability
- Extract runtime thread surface state, timeline mapping, and related
  types into a new runtime-thread-surface-state.ts module to slim down
  the main surface component
- Update imports and references across workbench-shell UI modules
- Update AGENTS.md to reflect the new module structure
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 11
  • Findings with unknown confidence: 0
  • Inline comments attempted: 11
  • Target files: 36
  • Covered files: 19
  • Uncovered files: 17
    See the summary comment for detailed analysis and coverage details.

self.handle_runtime_event(run_id, terminal_event).await
}

pub(crate) async fn handle_runtime_event(

This comment was marked as outdated.

use crate::ipc::frontend_channels::ThreadStreamEvent;
use crate::model::thread::RunUsageDto;

pub(crate) fn handle_agent_event(

This comment was marked as outdated.

use super::agent_session::{standard_tool_timeout, AgentSession, CLARIFY_TOOL_NAME};

impl AgentSession {
pub(crate) async fn execute_tool_call(

This comment was marked as outdated.


pub async fn marketplace_install_item(&self, id: &str) -> Result<PluginDetailDto, AppError> {
let item = self
.marketplace_list_items()

This comment was marked as outdated.

.arg("clone")
.arg("--depth")
.arg("1")
.arg(&source.url)

This comment was marked as outdated.

/// audit. It always emits a terminal event (RunCompleted / RunFailed /
/// RunCancelled) and always clears the `ActiveRun`, even on panic-like
/// early returns, so the thread can't get stuck in Running state.
async fn run_compact_background(

This comment was marked as outdated.

/// directly rather than via the Agent runtime.
///
/// [`generate_discard_summary`]: crate::core::context_compression::generate_discard_summary
pub(crate) async fn run_auto_compression(

This comment was marked as outdated.

&cache_dir.join(".claude-plugin/marketplace.json"),
)?;
let source_name = source_manifest.name.unwrap_or_else(|| source.name.clone());
let installed = self

This comment was marked as outdated.

}

pub(super) fn marketplace_source_id(url: &str) -> String {
let mut hasher = std::collections::hash_map::DefaultHasher::new();

This comment was marked as outdated.

}

pub(super) fn is_builtin_marketplace_source_id(id: &str) -> bool {
builtin_marketplace_sources()

This comment was marked as outdated.

jorben added 3 commits April 26, 2026 08:25
…ar persistence

- Extract `persist_clear_context_reset_to_pool` to isolate context reset persistence logic
- Refactor `resolve_helper_tool_task` into a reusable function with clear error handling
- Add comprehensive unit tests for context clear persistence and helper tool task resolution
- Improve test coverage for terminal events and reasoning completion policies
…y cleanup

- Always attach the payload hook so the DeepSeek thinking normalizer runs
  even when no provider_options are present
- Add `is_deepseek_provider` detection by provider type and base URL
- Add `normalize_deepseek_thinking_payload` to sanitize assistant messages
  by filling missing reasoning_content, preventing null content, and
  stripping reasoning when thinking is disabled
- Pass `thinking_level` through helper agent orchestrator requests and
  enable thinking only when the model supports it
- Skip empty reasoning records in `convert_history_messages` to avoid
  serialization issues with providers like DeepSeek
- Clear pending thinking at tool-result boundaries to prevent orphan
  reasoning from leaking across assistant messages
- Make `merge_payload` public and reuse it from `agent_session`
- Add comprehensive tests for DeepSeek detection and payload normalization
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 30
  • Findings with unknown confidence: 0
  • Inline comments attempted: 24
  • Target files: 37
  • Covered files: 36
  • Uncovered files: 1
    See the summary comment for detailed analysis and coverage details.

.arg("clone")
.arg("--depth")
.arg("1")
.arg(&source.url)

This comment was marked as outdated.

event: &str,
payload: serde_json::Value,
) -> Result<HookOutput, AppError> {
let command_path = plugin.path.join(handler);

This comment was marked as outdated.


let output = self
.execute_command_json(
OsStr::new(&tool.command),

This comment was marked as outdated.

// mutates) will saturate the IPC queue and block thread list rendering.
export const SIDEBAR_SYNC_MIN_GAP_MS = 300;

export function buildInitialWorkspaceThreadDisplayCounts() {

This comment was marked as outdated.

return Number.isFinite(parsed) ? parsed : 0;
}

export function buildDiffPreviewRows(diff: string): Array<DiffPreviewRow> {

This comment was marked as outdated.

* a tool's state when a stale snapshot resolves after live stream events
* have already advanced the tool.
*/
function isMoreAdvancedToolState(

This comment was marked as outdated.

showOutputLabel?: boolean;
};

export function getReadToolPresentation(tool: RuntimeSurfaceToolEntry): ReadToolPresentation | null {

This comment was marked as outdated.

)
})?;

if !discovered && !plugin_dir.exists() {

This comment was marked as outdated.

</div>
</div>
) : (
sortWorkspacesWithWorktrees(workspaces as WorkspaceItem[]).map((workspace) => {

This comment was marked as outdated.

};
}


This comment was marked as outdated.

Introduce turn_index propagation across agent session events, stream
events, and persistence to enable proper response boundary tracking
and reasoning lifecycle management.

Key changes:
- Add turn_index field to ThreadStreamEvent::MessageCompleted and
  ReasoningUpdated, and propagate it from AgentEvent::MessageUpdate
  and MessageEnd through the event handling pipeline
- Persist turn_index into message metadata when messages complete
- Improve reasoning message termination logic to discard invalid or
  empty reasoning blocks without valid thinking signatures
- Expand DeepSeek payload normalizer to backfill reasoning_content
  on text-only assistant messages, not just those with tool_calls
- Add discard_dangling_reasoning cleanup for interrupted runs on
  startup recovery
- Update all related tests and frontend integration tests to cover
  turn_index propagation and new DeepSeek normalization scenarios

Closes: Phase 4 reasoning mis-allocation, DeepSeek 400 errors
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 36
  • Findings with unknown confidence: 0
  • Inline comments attempted: 32
  • Target files: 42
  • Covered files: 42
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.


use super::agent_run_manager::AgentRunManager;

pub(crate) fn build_orphaned_run_terminal_event(

This comment was marked as outdated.

self.handle_runtime_event(run_id, terminal_event).await
}

pub(crate) async fn handle_runtime_event(

This comment was marked as outdated.

});
}

match outcome.result {

This comment was marked as outdated.

}

impl AgentSession {
pub(crate) async fn execute_tool_call(

This comment was marked as outdated.

.collect()
}

pub(crate) fn convert_history_messages(

This comment was marked as outdated.

import { ThreadRenameInput } from "@/modules/workbench-shell/ui/thread-rename-input";
import { ThreadStatusIndicator } from "@/modules/workbench-shell/ui/thread-status-indicator";

const WORKSPACE_THREAD_PAGE_SIZE = 10;

This comment was marked as outdated.

// mutates) will saturate the IPC queue and block thread list rendering.
export const SIDEBAR_SYNC_MIN_GAP_MS = 300;

export function buildInitialWorkspaceThreadDisplayCounts() {

This comment was marked as outdated.

return Number.isFinite(parsed) ? parsed : 0;
}

export function buildDiffPreviewRows(diff: string): Array<DiffPreviewRow> {

This comment was marked as outdated.

showOutputLabel?: boolean;
};

export function getReadToolPresentation(tool: RuntimeSurfaceToolEntry): ReadToolPresentation | null {

This comment was marked as outdated.

input?: unknown;
name: string;
result?: unknown;
state: string;

This comment was marked as outdated.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 24
  • Findings with unknown confidence: 0
  • Inline comments attempted: 22
  • Target files: 42
  • Covered files: 29
  • Uncovered files: 13
    See the summary comment for detailed analysis and coverage details.

self.handle_runtime_event(run_id, terminal_event).await
}

pub(crate) async fn handle_runtime_event(

This comment was marked as outdated.

use crate::ipc::frontend_channels::ThreadStreamEvent;
use crate::model::thread::RunUsageDto;

pub(crate) fn handle_agent_event(

This comment was marked as outdated.

}
}

pub(crate) fn validate_clarify_input(value: &serde_json::Value) -> Result<(), String> {

This comment was marked as outdated.

/// this leaves headroom for protocol overhead and JSON escaping.
const MAX_TOOL_RESULT_SIZE: usize = 8_000_000;

pub(crate) fn agent_tool_result_from_output(

This comment was marked as outdated.

})
}

pub(super) fn read_string_array_keys(

This comment was marked as outdated.

})
}

pub(super) fn mask_sensitive_value(value: &str) -> String {

This comment was marked as outdated.

.then_with(|| left.id.cmp(&right.id))
}

pub(super) fn apply_skill_state(record: &mut SkillRecordDto, state: &SkillStateStore) {

This comment was marked as outdated.

run_id: "run-1".into(),
message_id: "msg-1".into(),
content: "Full response".into(),
turn_index: None,

This comment was marked as outdated.

.await;
}

let tool_call_storage_id = uuid::Uuid::now_v7().to_string();

This comment was marked as outdated.

)
})?;

if !discovered && !plugin_dir.exists() {

This comment was marked as outdated.

Add a new `supports_reasoning` field across the model configuration
stack to properly track and propagate reasoning capabilities:

- Introduce `supports_reasoning` to `RuntimeModelRole`, `RunModelPlanRoleDto`, and related types
- Infer reasoning support automatically from model IDs with keyword patterns (e.g. o1, r1, gpt-5)
- Allow manual overrides via `capabilityOverrides` to disable/enable reasoning per model
- Extract `inferModelCapabilities` and `getEffectiveModelCapabilities` into a shared module
- Update `resolve_runtime_model_role` to respect declared reasoning capability instead of forcing it
- Refactor `resolve_model_plan` to apply thinking level only to reasoning-capable roles via `apply_thinking_level_to_model_role`
- Adjust `configure_agent` to check both thinking level and `model.reasoning` flag
- Add comprehensive tests for reasoning capability resolution, thinking level application, and helper role behavior

BREAKING CHANGE: model reasoning is no longer forced on when thinking level is enabled; it now depends on the model's declared capability
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 39
  • Findings with unknown confidence: 0
  • Inline comments attempted: 32
  • Target files: 49
  • Covered files: 49
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

/// audit. It always emits a terminal event (RunCompleted / RunFailed /
/// RunCancelled) and always clears the `ActiveRun`, even on panic-like
/// early returns, so the thread can't get stuck in Running state.
async fn run_compact_background(

This comment was marked as outdated.

/// (errors, final output) of large tool results. When the omitted middle
/// section is very small (< 50 chars), a simple head truncation is used
/// instead to avoid a gap marker that hides barely any content.
pub(crate) fn truncate_tool_result_head_tail(text: &str, max_chars: usize) -> String {

This comment was marked as outdated.

.collect()
}

pub(crate) fn convert_history_messages(

This comment was marked as outdated.

Ok(items)
}

pub async fn marketplace_install_item(&self, id: &str) -> Result<PluginDetailDto, AppError> {

This comment was marked as outdated.

event: &str,
payload: serde_json::Value,
) -> Result<HookOutput, AppError> {
let command_path = plugin.path.join(handler);

This comment was marked as outdated.

return Number.isFinite(parsed) ? parsed : 0;
}

export function buildDiffPreviewRows(diff: string): Array<DiffPreviewRow> {

This comment was marked as outdated.

.map(([toolName, count]) => `${toolName} ${count}`);
}

export function getHelperElapsedSeconds(

This comment was marked as outdated.

input?: unknown;
name: string;
result?: unknown;
state: string;

This comment was marked as outdated.

Comment thread src-tauri/Cargo.toml Outdated
git2 = { version = "0.20", features = ["vendored-libgit2", "vendored-openssl"] }
similar = "2"
tiycore = "0.1.19"
tiycore = "0.1.21-rc.26042620"

This comment was marked as outdated.

let spawn_model_role = preview_spec.model_plan.primary.clone();
let spawn_response_language = response_language.map(str::to_owned);
let spawn_frontend_tx = frontend_tx.clone();
tokio::spawn(async move {

This comment was marked as outdated.

…to tiycore

Move DeepSeek reasoning_content normalization from application layer into tiycore,
eliminating duplicate handling across agent_session and subagent orchestrator.

- Remove is_deepseek_provider and normalize_deepseek_thinking_payload from agent_session
- Remove redundant normalize_deepseek_thinking_payload tests from agent_session_tests
- Update subagent orchestrator payload hook to rely on tiycore built-in handling
- Add reasoning_content_constrained field in settings_manager tests
- Update agent_run integration tests with supportsReasoning flag
- Bump tiycore to 0.2.1-rc.26042720 and refresh Cargo.lock
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 35
  • Findings with unknown confidence: 0
  • Inline comments attempted: 32
  • Target files: 51
  • Covered files: 51
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

let _ = event_tx.send(ThreadStreamEvent::ReasoningUpdated {
run_id: run_id.to_string(),
message_id,
reasoning: buffer.clone(),

This comment was marked as outdated.

use crate::ipc::frontend_channels::ThreadStreamEvent;
use crate::model::thread::RunUsageDto;

pub(crate) fn handle_agent_event(

This comment was marked as outdated.

);
}

self.checkpoint_requested.store(true, Ordering::SeqCst);

This comment was marked as outdated.

.collect()
}

pub(crate) fn convert_history_messages(

This comment was marked as outdated.

// candidate text message and insert_pos — a reasoning message in
// between indicates the tool call came from a different model
// response and must remain a separate assistant message.
let merge_target_pos = msg_positions[..insert_pos]

This comment was marked as outdated.

return Number.isFinite(parsed) ? parsed : 0;
}

export function buildDiffPreviewRows(diff: string): Array<DiffPreviewRow> {

This comment was marked as outdated.

input?: unknown;
name: string;
result?: unknown;
state: string;

This comment was marked as outdated.

model_name: model_id.to_string(),
provider_type: "openai".to_string(),
provider_name: "OpenAI".to_string(),
api_key: Some("test-key".to_string()),

This comment was marked as outdated.

// Merge per-model provider_options into every outgoing LLM request payload.
{
let provider_options = request.model_role.provider_options.clone();
if provider_options.is_some() {

This comment was marked as outdated.

Comment thread src-tauri/src/core/thread_manager.rs Outdated
tool_calls = tc_count,
helpers = helper_count,
reasoning = reasoning_count,
"interrupted dangling runs/tool_calls/helpers on startup"

This comment was marked as outdated.

Add path traversal prevention by canonicalizing the hook path and
checking it stays within the plugin directory. Previously, a malicious
plugin could escape its directory via crafted hook paths.

Additionally:
- Update log message in thread_manager to include "reasoning" in
  startup dangling runs/tool_calls/helpers/reasoning count.
- Import WORKSPACE_THREAD_PAGE_SIZE from shared logic instead of
  local constant in dashboard-sidebar.
- Change state property type from string to SurfaceToolState in
  RuntimeSurfaceToolEntry for better type safety.
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 10
  • Findings with unknown confidence: 0
  • Inline comments attempted: 10
  • Target files: 51
  • Covered files: 25
  • Uncovered files: 26
    See the summary comment for detailed analysis and coverage details.

description: read_string_keys(tool, &["description"])
.unwrap_or_else(|| name.clone()),
command,
args: read_string_array_keys(tool, &["args"]),

This comment was marked as outdated.

.collect::<Vec<_>>()
});

let output = self

This comment was marked as outdated.


let setup = async {
message_repo::insert(&self.pool, &user_message).await?;
message_repo::insert(&self.pool, &reset_message).await?;

This comment was marked as outdated.

}

pub(super) fn marketplace_source_id(url: &str) -> String {
let mut hasher = std::collections::hash_map::DefaultHasher::new();

This comment was marked as outdated.

.arg("clone")
.arg("--depth")
.arg("1")
.arg(&source.url)

This comment was marked as outdated.

let installed = self.load_installed_plugin_records().await?;
let mut items = Vec::with_capacity(installed.len());
for record in installed {
let mut runtime = self.load_plugin_from_dir(Path::new(&record.path), false)?;

This comment was marked as outdated.

args: Some(read_string_array_keys(spec, &["args"])),
env: read_string_map_keys(spec, &["env"]),
cwd: read_string_keys(spec, &["cwd"]),
url: read_string_keys(spec, &["url"]),

This comment was marked as outdated.

maxOutputTokens: model.maxOutputTokens ?? null,
supportsImageInput: model.capabilityOverrides.vision ?? null,
supportsImageInput: capabilities.vision,
supportsReasoning: capabilities.reasoning,

This comment was marked as outdated.

if did_update_sources {
self.save_marketplace_sources(&store)?;
}
let installed = self.load_installed_plugin_records().await?;

This comment was marked as outdated.

)
})?;

if !discovered && !plugin_dir.exists() {

This comment was marked as outdated.

Update the Homebrew cask URL to remove duplicate 'v' prefix in
version path.

Add additional application data directories to the zap uninstall
cleanup list (HTTPStorages, Logs, and WebKit) to ensure complete
removal of local application data.
@jorben jorben merged commit dd55d5c into master Apr 28, 2026
4 checks passed
@jorben jorben deleted the refact/arch-optimization-2604 branch April 28, 2026 00:39
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 22
  • Findings with unknown confidence: 0
  • Inline comments attempted: 22
  • Target files: 51
  • Covered files: 35
  • Uncovered files: 16
    See the summary comment for detailed analysis and coverage details.

self.handle_runtime_event(run_id, terminal_event).await
}

pub(crate) async fn handle_runtime_event(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CRITICAL] agent_run_event_handler.rs has zero test coverage for the main event handler

The central runtime event handler that drives all agent run state transitions has no automated tests. Race conditions in status transitions (e.g. MessageDelta → MessageCompleted, or duplicate terminal events) are entirely untested.

Suggestion: Add unit tests for each match arm of handle_runtime_event using a mock AgentRunManager or by extracting pure decision functions (terminal_event_status, should_complete_reasoning_for_event, is_terminal_runtime_event) and testing those with synthetic ThreadStreamEvent variants. The free functions at R16-R59 are ideal candidates for immediate unit testing.

Risk: State machine regressions (e.g. wrong thread status after cancellation, reasoning messages left in 'streaming' state) will only be caught manually.

Confidence: 0.95

[From SubAgent: testing]

///
/// Returns `(run_id, event_rx)` so the caller can forward events over a
/// Tauri `Channel` identical to `start_run`.
pub async fn compact_thread_context(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] compact_thread_context and run_compact_background have no tests

The /compact command entry point — including ActiveRun registration, message persistence order, broadcast channel setup, and the spawned background task — has no test coverage. The existing test only covers the simpler /clear path.

Suggestion: Add a test that: (1) calls compact_thread_context on a seeded thread; (2) verifies the user message, reset marker, and run row are persisted; (3) verifies the frontend channel receives RunStarted and ContextCompressing events. The LLM-dependent part (run_compact_background) can be tested with a mock or by verifying the final state after a short timeout.

Risk: Compaction lifecycle bugs (stuck Running state, missing reset markers, missing events) will only be caught in manual testing.

Confidence: 0.88

[From SubAgent: testing]

message_repo::update_status(&self.pool, &message_id, finalized_message_status).await?;
}
// Reasoning termination: classify by content/signature validity
if let Some(message_id) = reasoning_message_id {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] finish_run reasoning discard logic has no test coverage

The reasoning message classification in finish_run (discard if empty or missing signature for non-completed runs, keep otherwise) directly controls what thinking content users see after interrupted/cancelled runs. No tests validate any of the four combinations (empty vs non-empty × signature present vs absent).

Suggestion: Extract the discard decision into a pure function and add parameterized tests covering: (1) completed run → never discard; (2) non-completed + empty content → discard; (3) non-completed + no signature → discard; (4) non-completed + content + signature → keep as completed.

Risk: Users may lose valid thinking content on interrupted runs, or see garbage thinking content that should have been discarded.

Confidence: 0.87

[From SubAgent: testing]

}

impl AgentSession {
pub(crate) async fn execute_tool_call(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] execute_tool_call dispatch has no test coverage for any branch

The primary tool execution dispatcher has no integration tests. Branch-specific logic like approval flow event emission (ApprovalRequired → ApprovalResolved), timeout result persistence, and clarify request lifecycle are all untested.

Suggestion: At minimum, add unit tests for the event emission patterns: (1) tool approval required then approved emits both events; (2) tool timeout persists the error to tool_call_repo; (3) clarify request emits ClarifyRequired then ClarifyResolved. These can be tested with mock event_tx and tool_gateway.

Risk: Event emission ordering bugs (e.g. missing ToolRunning event, double ToolFailed) would cause frontend state desynchronization.

Confidence: 0.85

[From SubAgent: testing]

) -> Result<HookOutput, AppError> {
let command_path = plugin.path.join(handler);
// Prevent path traversal: ensure the resolved command stays within the plugin directory.
let command_path = std::fs::canonicalize(&command_path).map_err(|error| {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] execute_hook path traversal prevention is untested

The path traversal guard in execute_hook is security-critical but untested. A regression in the starts_with comparison or canonicalize behavior could silently allow execution of arbitrary binaries.

Suggestion: Add tests that: (1) a handler '../../../etc/passwd' is rejected with the hook_escape error; (2) a symlink inside the plugin dir that points outside is rejected; (3) a valid handler within the plugin dir succeeds.

Risk: A path traversal bypass would allow plugin manifests to execute arbitrary commands outside their directory.

Confidence: 0.92

[From SubAgent: testing]

});
}

export function mapRunStateToWorkbenchThreadStatus(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] No unit tests for dashboard-workbench-logic.ts pure functions

The extracted workbench logic module contains status mapping and data computation functions with no test coverage. These functions drive sidebar status indicators and context badge display.

Suggestion: Add tests for: each RunState value in mapRunStateToWorkbenchThreadStatus; each status in mapRunFinishedStatusToThreadStatus; buildThreadContextBadgeData with zero/positive context window; mergeLocalFallbackThreads with overlapping and disjoint thread sets; formatCompactTokenCount boundary values.

Risk: Incorrect sidebar thread status badges or context usage display after refactoring

Confidence: 0.88

[From SubAgent: testing]

Comment thread src-tauri/Cargo.toml
git2 = { version = "0.20", features = ["vendored-libgit2", "vendored-openssl"] }
similar = "2"
tiycore = "0.1.19"
tiycore = { version = "0.2.1-rc.26042720" }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] RC version of tiycore used as a production dependency

The tiycore dependency was bumped from a stable version (0.1.19) to a release candidate (0.2.1-rc.26042720). RC versions may have unstable APIs or bugs and shouldn't ship to production without explicit sign-off.

Suggestion: Ensure this RC is intentional and temporary. Add a comment or tracking issue reference in Cargo.toml, and plan to pin to the stable 0.2.1 once it's released before merging to main.

Risk: An RC dependency could introduce regressions, breaking changes, or instability in agent runtime behavior.

Confidence: 0.90

[From SubAgent: general]

Comment thread src-tauri/Cargo.toml
git2 = { version = "0.20", features = ["vendored-libgit2", "vendored-openssl"] }
similar = "2"
tiycore = "0.1.19"
tiycore = { version = "0.2.1-rc.26042720" }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Pre-release tiycore dependency introduces supply-chain risk

The tiycore dependency is updated from a stable version (0.1.19) to a pre-release RC version (0.2.1-rc.26042720), which may carry additional supply-chain and stability risk.

Suggestion: Pin to a stable release before shipping to production. If an RC is required for feature access, verify the upstream tag and commit signature.

Risk: RC builds may contain unreviewed changes, security regressions, or unstable behavior not present in the stable release.

Confidence: 0.90

[From SubAgent: security]


if let Some(mut stdin) = child.stdin.take() {
tokio::spawn(async move {
let _ = stdin.write_all(&payload).await;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Stdin write error silently ignored in execute_command_

The stdin write to the child process in execute_command_ silently ignores errors. If writing the JSON payload fails, the child may hang waiting for input or process empty input, producing confusing results.

Suggestion: Log the error at minimum: if let Err(e) = stdin.write_all(&payload).await { tracing::warn!(...); }. Consider propagating the error or killing the child process if the write fails.

Risk: Silent stdin write failures can cause plugins/hooks to hang or produce incorrect output without any diagnostic signal.

Confidence: 0.85

[From SubAgent: general]

status: message.status,
};
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Double blank lines in runtime-thread-surface-state.ts

Inconsistent spacing with double blank lines in the new state module.

Suggestion: Remove extra blank lines to maintain single-blank-line convention.

Risk: Minor style inconsistency; no functional impact.

Confidence: 0.90

[From SubAgent: general]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants