Merged
24 changes: 24 additions & 0 deletions src/crates/core/src/agentic/agents/prompts/deep_review_agent.md
Original file line number Diff line number Diff line change
@@ -27,6 +27,10 @@ The user request may also include a **configured team manifest** with additional

The configured manifest may also include an **execution policy** with reviewer timeout, judge timeout, a team review strategy, per-reviewer strategy overrides, preferred reviewer `model_id` values, prompt directives, and file-split parameters. Treat that policy and roster as authoritative.

The configured manifest may also include a **scope profile** with `review_depth`, `risk_focus_tags`, `max_dependency_hops`, `allow_broad_tool_exploration`, and `coverage_expectation`. Treat this as the coverage contract for the run. `high_risk_only` and `risk_expanded` are reduced-depth profiles, not full-depth coverage.

The configured manifest may also include a metadata-only **evidence pack** with changed files, diff stats, packet ids, hunk hints, and contract hints. Use it as an orientation map only. Hunk hints and contract hints may be stale; reviewers and the judge must verify any hinted claim with `GetFileDiff`, `Read`, `Grep`, or read-only `Git` before reporting it as a finding.

If the manifest includes **Review work packets**, treat them as the structured dispatch contract. Each packet defines the reviewer, assigned scope, allowed tools, timeout, required output fields, model, and prompt directive for one reviewer or judge task. Do not launch a reviewer unless it has an active packet or appears in the active reviewer manifest.
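
As an illustration of the scope profile described above, the fields could be modeled roughly as follows. This is a minimal sketch, not the project's actual types: the names `ReviewDepth` and `ScopeProfile`, the `parse` helper, and every field type are assumptions. Only the field names and the three depth values come from the prompt text.

```rust
/// Hypothetical model of `review_depth`; the variants mirror the three
/// values named in the prompt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReviewDepth {
    HighRiskOnly,
    RiskExpanded,
    FullDepth,
}

impl ReviewDepth {
    /// Parses the manifest string; unknown values yield `None`.
    pub fn parse(value: &str) -> Option<Self> {
        match value {
            "high_risk_only" => Some(Self::HighRiskOnly),
            "risk_expanded" => Some(Self::RiskExpanded),
            "full_depth" => Some(Self::FullDepth),
            _ => None,
        }
    }

    /// `high_risk_only` and `risk_expanded` are reduced-depth profiles.
    pub fn is_reduced_depth(self) -> bool {
        matches!(self, Self::HighRiskOnly | Self::RiskExpanded)
    }
}

/// Hypothetical mirror of the manifest's scope profile (field types assumed).
#[derive(Debug, Clone)]
pub struct ScopeProfile {
    pub review_depth: ReviewDepth,
    pub risk_focus_tags: Vec<String>,
    pub max_dependency_hops: u32,
    pub allow_broad_tool_exploration: bool,
    pub coverage_expectation: String,
}
```

An orchestrator could call `is_reduced_depth()` once and reuse the result when briefing reviewers, so that no downstream summary describes a reduced run as full coverage.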

### File splitting for large review targets
@@ -140,6 +144,8 @@ Each reviewer Task prompt must include:
- the exact review target (for split instances: the assigned file group only)
- any user-provided focus text
- the reviewer-specific strategy from the configured manifest (`quick`, `normal`, or `deep`) and its exact `prompt_directive`
- the scope profile fields (`review_depth`, `risk_focus_tags`, `max_dependency_hops`, and `coverage_expectation`)
- the evidence pack when present, plus an instruction that it is metadata-only orientation and hinted claims require tool confirmation
- a reminder to stay read-only
- a request for concrete findings only
- a strict output format that is easy to verify later
@@ -154,6 +160,18 @@ Strategy guidance (fallback only; the configured `prompt_directive` is the sourc
- `normal`: brief the reviewer to run the standard role-specific pass with balanced coverage and concrete evidence.
- `deep`: brief the reviewer to inspect edge cases, cross-file interactions, failure modes, and remediation tradeoffs before finalizing findings.

Scope profile guidance:

- `high_risk_only`: tell the reviewer this is reduced-depth. It should keep all assigned files visible in its summary or coverage notes, but only report directly evidenced high-risk findings.
- `risk_expanded`: tell the reviewer this is reduced-depth. It may inspect one-hop high-risk context when needed, but must not describe the run as full coverage.
- `full_depth`: tell the reviewer to use the policy-limited broad context needed for release-quality findings.

Evidence pack guidance:

- Treat `evidence_pack` as metadata orientation only. It is not source text, a full diff, model output, or provider raw data.
- Treat `hunk_hints` and `contract_hints` as stale until the reviewer confirms them with `GetFileDiff`, `Read`, `Grep`, or read-only `Git`.
- Do not let reviewers cite the evidence pack alone as proof for a finding.

Role-specific strategy amplification (append to the reviewer Task prompt when the strategy matches):

- **ReviewBusinessLogic** + `quick`: "Only trace logic paths directly changed by the diff. Do not follow call chains beyond one hop."
@@ -175,6 +193,8 @@ Role-specific strategy amplification (append to the reviewer Task prompt when th
After the reviewer batch finishes, launch `ReviewJudge` with:

- the matching judge work packet verbatim
- the scope profile fields and `coverage_expectation`
- the evidence pack when present, with the same metadata-only and tool-confirmation boundary
- the same review target
- the full reviewer outputs from every reviewer that ran, including timeout/cancel/failure notes
- if file splitting was used, include outputs from **all** same-role instances and label each by group (e.g. "Security Reviewer [group 1/3]")
@@ -195,6 +215,8 @@ The judge must explicitly call out:
- findings where the reviewer's evidence does not support their conclusion
- reviewer outputs that are missing `packet_id` or `status`; treat those as lower confidence rather than discarding the whole review
- reviewer outputs whose packet id was inferred from scheduling metadata rather than reported by the reviewer
- whether `review_depth` was reduced-depth, and whether reviewer claims stay within the declared `coverage_expectation`
- whether any surviving finding relies on an evidence pack hint without independent tool confirmation
- which findings should survive into the final report

### Phase 4: Report and wait for user approval
@@ -208,6 +230,7 @@ After the quality gate finishes:
- `context_pressure`: large target, constrained token budget, or reduced fan-out affected coverage.
- `compression_preserved`: compression or compaction preserved key facts used in the final decision.
- `partial_reviewer`: one or more reviewers timed out or were cancelled after producing useful partial evidence.
- `reduced_scope`: the scope profile was `high_risk_only` or `risk_expanded`; include the manifest `coverage_expectation` as detail when available.
- `user_decision`: an item needs user/product judgment before remediation.
Use `severity = "info" | "warning" | "action"`, include `count` when useful, and set `source = "runtime" | "manifest" | "report" | "inferred"`.
5. When enough information exists, also populate `report_sections` so the UI can present a compact, multi-dimensional report:
@@ -223,6 +246,7 @@ After the quality gate finishes:
- `remediation_groups.verification`: focused verification or follow-up review steps.
- `strength_groups`: positive observations grouped under `architecture`, `maintainability`, `tests`, `security`, `performance`, `user_experience`, or `other`.
- `coverage_notes`: confidence, timeout/cancel/failure, scope, or manual follow-up notes.
For reduced-depth scope profiles, explicitly state that the report is not full-depth coverage and preserve all skipped or reduced files in coverage notes when relevant.
6. Do **not** modify any files during the review phase.
7. Wait for explicit user approval before starting any remediation work.
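
The `reduced_scope` signal from the list above could be produced along these lines. This is a sketch under stated assumptions: the struct layout, the `kind` field name, and the choice of `warning` severity with `manifest` source are illustrative guesses. The prompt fixes only the signal name, the allowed `severity` and `source` values, and the use of `coverage_expectation` as detail.

```rust
/// Allowed `severity` values from the prompt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warning,
    Action,
}

/// Allowed `source` values from the prompt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignalSource {
    Runtime,
    Manifest,
    Report,
    Inferred,
}

/// Hypothetical shape of one reliability signal entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReliabilitySignal {
    pub kind: &'static str,
    pub severity: Severity,
    pub source: SignalSource,
    pub count: Option<usize>,
    pub detail: Option<String>,
}

/// Returns a `reduced_scope` signal only for reduced-depth profiles.
/// Severity and source are assumed defaults, not taken from the project.
pub fn reduced_scope_signal(
    review_depth: &str,
    coverage_expectation: Option<&str>,
) -> Option<ReliabilitySignal> {
    if !matches!(review_depth, "high_risk_only" | "risk_expanded") {
        return None;
    }
    Some(ReliabilitySignal {
        kind: "reduced_scope",
        severity: Severity::Warning,
        source: SignalSource::Manifest,
        count: None,
        // Carry the manifest's coverage expectation as detail when available.
        detail: coverage_expectation.map(str::to_owned),
    })
}
```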

@@ -55,6 +55,19 @@ Never modify files or git state.
- If the strategy is `normal`, check the diff's imports plus one level of dependency direction. Verify API contract consistency.
- If the strategy is `deep`, map the full dependency graph for changed modules. Check for structural anti-patterns, circular dependencies, and cross-cutting concerns.

## Scope profile rules

- If the task prompt includes `review_depth` and `coverage_expectation`, follow them as the coverage contract.
- If `review_depth` is `high_risk_only`, treat this as reduced-depth: report only directly evidenced high-risk architecture or boundary issues and do not claim full architecture coverage.
- If `review_depth` is `risk_expanded`, inspect changed files plus at most the provided high-risk dependency context; record any confidence limits in the reviewer summary.
- Keep all assigned files visible in the reviewer summary or coverage notes if you could not inspect them fully.

## Evidence pack rules

- If the task prompt includes an `evidence_pack`, use it only as metadata orientation for changed files, packets, hunk hints, and contract hints.
- Treat `hunk_hints` and `contract_hints` as stale until you confirm them with `GetFileDiff`, `Read`, `Grep`, or read-only `Git`.
- Do not cite the evidence pack alone as proof for an architecture finding.

## Output format

Return markdown only, using this exact structure:
@@ -54,6 +54,19 @@ Never modify files or git state.
- If the strategy is `normal`, trace each changed function's direct callers and callees to verify business rules and state transitions. Stop investigating a path once you have enough evidence.
- If the strategy is `deep`, map the full call chain for each changed function to verify business rules and state transitions. Check rollback and error-recovery paths, and test edge cases in data shape and lifecycle assumptions. Prioritize findings by user-facing impact. Do not evaluate whether a call chain respects layer boundaries.

## Scope profile rules

- If the task prompt includes `review_depth` and `coverage_expectation`, follow them as the coverage contract.
- If `review_depth` is `high_risk_only`, treat this as reduced-depth: report only directly evidenced high-risk issues and do not claim full business-logic coverage.
- If `review_depth` is `risk_expanded`, inspect changed files plus at most the provided high-risk dependency context; record any confidence limits in the reviewer summary.
- Keep all assigned files visible in the reviewer summary or coverage notes if you could not inspect them fully.

## Evidence pack rules

- If the task prompt includes an `evidence_pack`, use it only as metadata orientation for changed files, packets, hunk hints, and contract hints.
- Treat `hunk_hints` and `contract_hints` as stale until you confirm them with `GetFileDiff`, `Read`, `Grep`, or read-only `Git`.
- Do not cite the evidence pack alone as proof for a business-logic finding.

## Output format

Return markdown only, using this exact structure:
@@ -60,6 +60,19 @@ Never modify files or git state.
- If the strategy is `normal`, check i18n, React performance patterns, and accessibility in changed components. Verify frontend-backend API contract alignment.
- If the strategy is `deep`, perform a thorough React analysis (effect dependencies, memoization, virtualization), a full accessibility audit, a state management pattern review, and cross-layer contract verification.

## Scope profile rules

- If the task prompt includes `review_depth` and `coverage_expectation`, follow them as the coverage contract.
- If `review_depth` is `high_risk_only`, treat this as reduced-depth: report only directly evidenced high-risk frontend issues and do not claim full frontend coverage.
- If `review_depth` is `risk_expanded`, inspect changed files plus at most the provided high-risk dependency context; record any confidence limits in the reviewer summary.
- Keep all assigned files visible in the reviewer summary or coverage notes if you could not inspect them fully.

## Evidence pack rules

- If the task prompt includes an `evidence_pack`, use it only as metadata orientation for changed files, packets, hunk hints, and contract hints.
- Treat `hunk_hints` and `contract_hints` as stale until you confirm them with `GetFileDiff`, `Read`, `Grep`, or read-only `Git`.
- Do not cite the evidence pack alone as proof for a frontend finding.

## Output format

Return markdown only, using this exact structure:
@@ -55,6 +55,19 @@ Never modify files or git state.
- If the strategy is `normal`, inspect the diff for anti-patterns, then read surrounding code to confirm impact on hot paths. Report only issues likely to matter at realistic scale.
- If the strategy is `deep`, in addition to the normal pass, check whether the change creates latent scaling risks — e.g. data structures that degrade at volume, or algorithms that are correct but unnecessarily expensive. Only report if you can quantify or estimate the impact. Do not speculate about edge cases or failure modes unrelated to performance.

## Scope profile rules

- If the task prompt includes `review_depth` and `coverage_expectation`, follow them as the coverage contract.
- If `review_depth` is `high_risk_only`, treat this as reduced-depth: report only directly evidenced high-risk performance regressions and do not claim full performance coverage.
- If `review_depth` is `risk_expanded`, inspect changed files plus at most the provided high-risk dependency context; record any confidence limits in the reviewer summary.
- Keep all assigned files visible in the reviewer summary or coverage notes if you could not inspect them fully.

## Evidence pack rules

- If the task prompt includes an `evidence_pack`, use it only as metadata orientation for changed files, packets, hunk hints, and contract hints.
- Treat `hunk_hints` and `contract_hints` as stale until you confirm them with `GetFileDiff`, `Read`, `Grep`, or read-only `Git`.
- Do not cite the evidence pack alone as proof for a performance finding.

## Output format

Return markdown only, using this exact structure:
@@ -10,6 +10,8 @@ You will receive:

- the original review target
- the user focus, if any
- the scope profile (`review_depth`, `coverage_expectation`, and related limits), if provided
- the metadata-only evidence pack, if provided
- the outputs from the Business Logic Reviewer, Performance Reviewer, Security Reviewer, Architecture Reviewer, and Frontend Reviewer (if present)
- if file splitting was used, outputs from **multiple same-role instances** (e.g. "Security Reviewer [group 1/3]", "Security Reviewer [group 2/3]")

@@ -44,6 +46,19 @@ Be especially skeptical of:
- If the team strategy was `normal`, validate each finding's logical consistency and evidence quality. Spot-check code only when a claim needs verification.
- If the team strategy was `deep`, cross-validate findings across reviewers for consistency. For each finding, verify the evidence supports the conclusion and the suggested fix is safe. Pay extra attention to findings that overlap across reviewers or across same-role instances from file splitting.

## Scope profile rules

- If `review_depth` is `high_risk_only` or `risk_expanded`, treat the review as reduced-depth and do not validate any summary that claims full-depth coverage.
- Preserve `coverage_expectation` in your decision summary or coverage notes when it limits confidence.
- Reject or downgrade findings that require broader exploration than the declared scope profile allows unless a reviewer supplied direct evidence.
- Keep skipped, reduced, or not-fully-inspected files visible in coverage notes instead of hiding them.

## Evidence pack rules

- Use `evidence_pack` only as metadata orientation for changed files, packets, hunk hints, and contract hints.
- Treat `hunk_hints` and `contract_hints` as stale until a reviewer report or your own targeted spot-check confirms them with `GetFileDiff`, `Read`, `Grep`, or read-only `Git`.
- Reject or downgrade findings that rely on the evidence pack alone.
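
One way to read the judge's scope and evidence rules together is the decision function below. It is a hypothetical sketch: the names `FindingEvidence`, `JudgeAction`, and `judge_action` are invented for illustration. The prompt says only "reject or downgrade" for unconfirmed findings, so the split between `Downgrade` and `Reject` is an assumed policy.

```rust
/// Possible outcomes for a reviewer finding (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JudgeAction {
    Keep,
    Downgrade,
    Reject,
}

/// Hypothetical summary of how a finding is supported.
pub struct FindingEvidence {
    /// Confirmed with `GetFileDiff`, `Read`, `Grep`, or read-only `Git`.
    pub tool_confirmed: bool,
    /// The finding stays within the declared scope profile.
    pub within_declared_scope: bool,
}

/// Assumed policy: direct tool evidence keeps a finding even when it reaches
/// beyond the declared scope; hint-only findings are downgraded in scope and
/// rejected out of scope.
pub fn judge_action(evidence: &FindingEvidence) -> JudgeAction {
    match (evidence.tool_confirmed, evidence.within_declared_scope) {
        (true, _) => JudgeAction::Keep,
        (false, true) => JudgeAction::Downgrade,
        (false, false) => JudgeAction::Reject,
    }
}
```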

## Cross-reviewer overlap handling

When multiple reviewers report findings about the same code location:
@@ -55,6 +55,19 @@ Never modify files or git state.
- If the strategy is `normal`, trace each changed input path from entry point to usage. Check trust boundaries, auth assumptions, and data sanitization. Report only issues with a realistic threat narrative.
- If the strategy is `deep`, in addition to the normal pass, trace data flows across trust boundaries end-to-end. Check for privilege escalation chains, indirect injection vectors, and failure modes that expose sensitive data. Report only issues with a complete threat narrative.

## Scope profile rules

- If the task prompt includes `review_depth` and `coverage_expectation`, follow them as the coverage contract.
- If `review_depth` is `high_risk_only`, treat this as reduced-depth: report only directly evidenced high-risk security issues and do not claim full security coverage.
- If `review_depth` is `risk_expanded`, inspect changed files plus at most the provided high-risk dependency context; record any confidence limits in the reviewer summary.
- Keep all assigned files visible in the reviewer summary or coverage notes if you could not inspect them fully.

## Evidence pack rules

- If the task prompt includes an `evidence_pack`, use it only as metadata orientation for changed files, packets, hunk hints, and contract hints.
- Treat `hunk_hints` and `contract_hints` as stale until you confirm them with `GetFileDiff`, `Read`, `Grep`, or read-only `Git`.
- Do not cite the evidence pack alone as proof for a security finding.

## Output format

Return markdown only, using this exact structure:
6 changes: 6 additions & 0 deletions src/crates/core/src/agentic/deep_review/budget.rs
@@ -1,4 +1,10 @@
//! Deep Review reviewer budget, retry admission, and runtime accounting.
//!
//! This tracker is deliberately Deep Review-specific. It combines per-turn
//! reviewer/judge budgets, retry budgets, active reviewer counts, effective
//! concurrency learning, capacity diagnostics, and shared-context measurement.
//! Do not move it wholesale to `subagent_runtime`: only isolated mechanics with
//! no Deep Review policy, report, or diagnostic semantics should become generic.

use super::concurrency_policy::{
    DeepReviewEffectiveConcurrencySnapshot, DeepReviewEffectiveConcurrencyState,
5 changes: 5 additions & 0 deletions src/crates/core/src/agentic/deep_review/concurrency_policy.rs
@@ -1,4 +1,9 @@
//! Deep Review concurrency limits and effective capacity learning.
//!
//! The policy here is product-specific: it learns an effective reviewer cap for
//! Deep Review sessions and stores the Review Team capacity preferences. Shared
//! queue timing or future generic admission primitives belong in
//! `agentic::subagent_runtime` once they are proven independent of Deep Review.

use super::execution_policy::{
    clamp_u64, clamp_usize, reviewer_agent_type_count, DeepReviewExecutionPolicy,
7 changes: 7 additions & 0 deletions src/crates/core/src/agentic/deep_review/diagnostics.rs
@@ -1,4 +1,8 @@
//! Content-free Deep Review runtime diagnostics counters.
//!
//! These counters are safe to surface in reports and logs because they record
//! aggregate counts, durations, and reason labels only. They must not store
//! source text, diffs, reviewer output, provider raw bodies, or full file paths.

use serde::Serialize;
use std::collections::BTreeMap;
@@ -26,6 +30,7 @@ pub struct DeepReviewRuntimeDiagnostics {
    pub shared_context_total_calls: usize,
    pub shared_context_duplicate_calls: usize,
    pub shared_context_duplicate_context_count: usize,
    pub shared_context_duplicate_savings_candidate_count: usize,
}

impl DeepReviewRuntimeDiagnostics {
@@ -52,6 +57,7 @@ impl DeepReviewRuntimeDiagnostics {
            && self.shared_context_total_calls == 0
            && self.shared_context_duplicate_calls == 0
            && self.shared_context_duplicate_context_count == 0
            && self.shared_context_duplicate_savings_candidate_count == 0
    }

    pub(crate) fn observe_effective_parallel(&mut self, effective_parallel_instances: usize) {
@@ -73,5 +79,6 @@ impl DeepReviewRuntimeDiagnostics {
        self.shared_context_total_calls = total_calls;
        self.shared_context_duplicate_calls = duplicate_calls;
        self.shared_context_duplicate_context_count = duplicate_context_count;
        self.shared_context_duplicate_savings_candidate_count = duplicate_calls;
    }
}
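
The effect of the new counter can be seen in isolation with a reduced stand-in for the shared-context fields. The struct name `SharedContextCounters` and the method name `record` are hypothetical, since the real method's name and signature are not visible in this hunk. The assignment of `duplicate_calls` to the savings-candidate counter mirrors the diff.

```rust
/// Reduced, hypothetical stand-in for the shared-context counters.
/// Content-free: aggregate counts only, no paths or source text.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SharedContextCounters {
    pub total_calls: usize,
    pub duplicate_calls: usize,
    pub duplicate_context_count: usize,
    pub duplicate_savings_candidate_count: usize,
}

impl SharedContextCounters {
    /// True when nothing has been recorded yet.
    pub fn is_empty(&self) -> bool {
        self.total_calls == 0
            && self.duplicate_calls == 0
            && self.duplicate_context_count == 0
            && self.duplicate_savings_candidate_count == 0
    }

    /// Overwrites the counters with the latest measurement.
    pub fn record(
        &mut self,
        total_calls: usize,
        duplicate_calls: usize,
        duplicate_context_count: usize,
    ) {
        self.total_calls = total_calls;
        self.duplicate_calls = duplicate_calls;
        self.duplicate_context_count = duplicate_context_count;
        // As in the diff: each duplicate call counts as one savings candidate.
        self.duplicate_savings_candidate_count = duplicate_calls;
    }
}
```

In this sketch the savings-candidate count always equals the duplicate-call count, so the new field is a separately named view of an existing measurement rather than an independent one.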
5 changes: 5 additions & 0 deletions src/crates/core/src/agentic/deep_review/execution_policy.rs
@@ -1,4 +1,9 @@
//! Deep Review execution policy parsing and strategy helpers.
//!
//! This module translates launch strategy metadata into runtime guardrails such
//! as reviewer timeouts, file-splitting thresholds, same-role caps, and retry
//! limits. Strategy scoring remains advisory unless a separate product decision
//! approves backend-owned strategy selection.

use super::constants::{
    CONDITIONAL_REVIEWER_AGENT_TYPES, CORE_REVIEWER_AGENT_TYPES, DEEP_REVIEW_AGENT_TYPE,
4 changes: 4 additions & 0 deletions src/crates/core/src/agentic/deep_review/incremental_cache.rs
@@ -1,4 +1,8 @@
//! Per-session Deep Review packet cache model and serialization.
//!
//! This cache is scoped to a Deep Review session fingerprint. It is not a
//! project-level cache and does not define retention, invalidation, or deletion
//! policy across sessions.

use serde_json::{json, Value};
use std::collections::HashMap;