
feat: add stall-recovery cold-start fallback for automated op recovery#330

Merged
l50 merged 4 commits into main from fix/stall-detection-cold-start
May 17, 2026

l50 commented May 17, 2026

Key Changes:

  • Implemented a new cold-start fallback for automated stall recovery when no
    users or credentials have been discovered but DCs are known
  • Added deduplication logic for cold-start tasks across recovery attempts
  • Enhanced logging and dispatch tracking for stall recovery operations

Added:

  • Cold-start deduplication key and selection logic to build unique keys per
    domain and recovery attempt, ensuring idempotent dispatch of fallback tasks
  • select_stall_cold_start_work function to select eligible cold-start work
    items when the op is stalled with known DCs but no users/creds
  • New fallback branch in auto_stall_detection to submit AS-REP roast-based
    user enumeration when previous strategies yield no results, gated by the
    asrep_roast strategy allowlist
  • Unit tests for cold-start dedup key construction, work selection logic, and
    deduplication across attempts
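A minimal sketch of how a per-domain, per-attempt key makes the fallback idempotent. The key format `<domain>:<attempt>` and the use of a plain `HashSet` are assumptions; the PR states only that keys are built per domain and recovery attempt and that the domain is lowercased.

```rust
use std::collections::HashSet;

/// Hypothetical key format "<lowercased domain>:<attempt>".
fn stall_cold_start_dedup_key(domain: &str, attempt: u32) -> String {
    format!("{}:{}", domain.to_lowercase(), attempt)
}

fn main() {
    let mut dispatched: HashSet<String> = HashSet::new();

    // First cold-start dispatch for CORP.local in attempt 1 is accepted.
    assert!(dispatched.insert(stall_cold_start_dedup_key("CORP.local", 1)));
    // Same domain in a different case, same attempt: rejected as a duplicate.
    assert!(!dispatched.insert(stall_cold_start_dedup_key("corp.LOCAL", 1)));
    // A later recovery attempt gets a fresh key, so the fallback may fire again.
    assert!(dispatched.insert(stall_cold_start_dedup_key("corp.local", 2)));

    println!("{} keys recorded", dispatched.len());
}
```

Lowercasing inside the key builder means callers cannot accidentally bypass dedup by passing the same domain in a different case.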

Changed:

  • Enhanced the stall detection logic to track and log the number of fallback
    tasks dispatched per recovery attempt, improving observability
  • Updated deduplication constant lists and test coverage in state management
    modules to include the new stall_cold_start dedup set
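The registration step can be pictured as one list of dedup-set names that state initialisation iterates over, plus a test asserting the new set is present. Only `DEDUP_STALL_COLD_START` comes from the PR, and its string value `stall_cold_start` is assumed from the set name above; `DEDUP_EXAMPLE_EXISTING`, `ALL_DEDUP_SETS`, and `init_dedup_sets` are hypothetical stand-ins.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for the dedup sets that existed before this PR.
const DEDUP_EXAMPLE_EXISTING: &str = "example_existing";
// Constant name from the PR; the string value is an assumption.
const DEDUP_STALL_COLD_START: &str = "stall_cold_start";

/// Every dedup set the state layer knows about; a name missing here is never initialised.
const ALL_DEDUP_SETS: &[&str] = &[DEDUP_EXAMPLE_EXISTING, DEDUP_STALL_COLD_START];

fn init_dedup_sets() -> HashMap<&'static str, HashSet<String>> {
    ALL_DEDUP_SETS
        .iter()
        .map(|&name| (name, HashSet::new()))
        .collect()
}

fn main() {
    let sets = init_dedup_sets();
    // The kind of assertion a state test would make after registering a new set.
    assert!(sets.contains_key(DEDUP_STALL_COLD_START));
    // Equal lengths also catch an accidentally duplicated name in the list.
    assert_eq!(sets.len(), ALL_DEDUP_SETS.len());
    println!("{} dedup sets initialised", sets.len());
}
```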

l50 added 2 commits May 17, 2026 10:55
…enarios

**Added:**

- Introduced cold-start stall recovery branch that triggers user enumeration
  against known domain controllers when no users or credentials have been
  discovered but DCs are known, falling back to AS-REP roast via seclists and
  kerbrute if the technique is allowed
- Added `stall_cold_start_dedup_key` function to build deduplication keys for
  cold-start recovery attempts, with tests verifying key construction and
  lowercasing
- Implemented `select_stall_cold_start_work` to choose DCs for cold-start
  enumeration, respecting deduplication and domain domination, with comprehensive
  unit tests for edge cases
- Registered `DEDUP_STALL_COLD_START` in deduplication set constants and
  relevant deduplication tracking infrastructure
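A sketch of the selection rule described above, under stated assumptions: DCs are plain domain/hostname records, "domain domination" is modelled as a set of already-dominated domain names to skip, at most one DC is chosen per domain, and the key format is `<domain>:<attempt>`. `DomainController` and `ColdStartWork` are hypothetical types, and the real function's signature is not shown in this PR.

```rust
use std::collections::HashSet;

/// Hypothetical DC record.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DomainController {
    domain: String,
    hostname: String,
}

/// Hypothetical work item: the chosen DC plus the key to mark once dispatched.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ColdStartWork {
    dc: DomainController,
    dedup_key: String,
}

fn select_stall_cold_start_work(
    dcs: &[DomainController],
    has_users: bool,
    has_creds: bool,
    dominated: &HashSet<String>,
    already_dispatched: &HashSet<String>,
    attempt: u32,
) -> Vec<ColdStartWork> {
    // Cold start only: any known user or credential means this fallback is not needed.
    if has_users || has_creds {
        return Vec::new();
    }
    let mut picked = HashSet::new();
    let mut work = Vec::new();
    for dc in dcs {
        let domain = dc.domain.to_lowercase();
        // Skip domains that are already dominated.
        if dominated.contains(&domain) {
            continue;
        }
        let dedup_key = format!("{domain}:{attempt}");
        // Skip keys dispatched earlier and extra DCs for a domain already picked.
        if already_dispatched.contains(&dedup_key) || !picked.insert(domain) {
            continue;
        }
        work.push(ColdStartWork { dc: dc.clone(), dedup_key });
    }
    work
}

fn main() {
    let dc = |domain: &str, host: &str| DomainController {
        domain: domain.into(),
        hostname: host.into(),
    };
    let dcs = vec![dc("CORP.local", "dc01"), dc("corp.LOCAL", "dc02"), dc("dev.local", "dc03")];
    let dominated = HashSet::from(["dev.local".to_string()]);
    let work = select_stall_cold_start_work(&dcs, false, false, &dominated, &HashSet::new(), 1);
    for w in &work {
        println!("{} via {}", w.dedup_key, w.dc.hostname);
    }
}
```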

**Changed:**

- Modified stall detection logic to dispatch and log cold-start recovery actions,
  tracking the number of dispatched actions and improving logging granularity for
  fallback actions
- Updated tests and deduplication set assertions to include new cold-start
  deduplication set
…ection

**Changed:**

- Reformatted the call to `build_asrep_payload` to use a single-line style,
  improving code readability and consistency in the `auto_stall_detection` function

codecov Bot commented May 17, 2026

Codecov Report

❌ Patch coverage is 93.65352% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.96%. Comparing base (bedcd99) to head (df75a4a).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...cli/src/orchestrator/automation/stall_detection.rs | 93.64% | 37 Missing ⚠️ |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
+ Coverage   78.85%   78.96%   +0.11%     
==========================================
  Files         438      438              
  Lines      125614   126123     +509     
==========================================
+ Hits        99050    99597     +547     
+ Misses      26564    26526      -38     
```
| Files with missing lines | Coverage Δ |
|---|---|
| ares-cli/src/orchestrator/state/inner.rs | 92.83% <100.00%> (+0.01%) ⬆️ |
| ares-cli/src/orchestrator/state/mod.rs | 97.82% <ø> (ø) |
| ...cli/src/orchestrator/automation/stall_detection.rs | 91.81% <93.64%> (+20.72%) ⬆️ |
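The Coverage Diff figures can be cross-checked with integer arithmetic. This sketch computes coverage as hits over tracked lines in hundredths of a percent, truncated; that reproduces the reported 78.85% and 78.96% (+0.11%), whereas rounding the head value (78.968...%) would give 78.97%. That Codecov truncates in general is an assumption inferred from these two figures only.

```rust
/// Coverage in hundredths of a percent, truncated toward zero.
fn coverage_bp(hits: u64, lines: u64) -> u64 {
    hits * 10_000 / lines
}

/// Formats hundredths of a percent as "NN.NN%".
fn fmt_pct(bp: u64) -> String {
    format!("{}.{:02}%", bp / 100, bp % 100)
}

fn main() {
    // (hits, lines, misses) from the Coverage Diff: base = main, head = #330.
    let (base_hits, base_lines, base_misses) = (99_050u64, 125_614u64, 26_564u64);
    let (head_hits, head_lines, head_misses) = (99_597u64, 126_123u64, 26_526u64);

    // Hits and misses partition the tracked lines on both sides.
    assert_eq!(base_hits + base_misses, base_lines);
    assert_eq!(head_hits + head_misses, head_lines);
    // +509 lines = +547 hits - 38 misses.
    assert_eq!(head_lines - base_lines, (head_hits - base_hits) - (base_misses - head_misses));

    let base = coverage_bp(base_hits, base_lines);
    let head = coverage_bp(head_hits, head_lines);
    // Prints "78.85% -> 78.96% (+0.11%)".
    println!("{} -> {} (+{})", fmt_pct(base), fmt_pct(head), fmt_pct(head - base));
}
```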

l50 added 2 commits May 17, 2026 14:11
**Added:**

- Introduced `ActionKind` enum and `RecoveryAction` struct to represent and track
  stall recovery actions in a structured way
- Added `StallContext` struct to encapsulate relevant state for planning recovery
  actions
- Implemented `plan_stall_recovery` function for generating prioritized fallback
  actions based on current operation state and gating flags
- Added `StallTracker` struct to encapsulate progress/cooldown bookkeeping and
  provide testable logic for stall detection and cooldown handling
- Defined `StallRecoveryAdapter` trait to abstract dispatcher operations for
  recovery actions, supporting both production and test adapters
- Added `execute_recovery_actions` async function to execute a list of
  `RecoveryAction`s and handle deduplication marking and dispatch outcomes
- Implemented production adapter (`DispatcherStallAdapter`) for wiring
  stall detection to the real dispatcher
- Added extensive unit tests for the new modular functions and types, including
  fake adapters for testing async recovery dispatch logic
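A minimal planner in the shape this commit describes: a pure function from a `StallContext` snapshot to a prioritized `Vec<RecoveryAction>`. The field names, the single `ColdStartAsrepEnum` variant, and the allowlist as a `HashSet<String>` are assumptions; the real `ActionKind` carries other fallback kinds that are not listed in the PR. Because the planner does no I/O, its gating (no users, no credentials, `asrep_roast` allowed) can be unit-tested directly.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActionKind {
    /// AS-REP-roast-based user enumeration against a known DC (other variants elided).
    ColdStartAsrepEnum,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct RecoveryAction {
    kind: ActionKind,
    dedup_key: String,
    domain: String,
}

/// Hypothetical fields: the slice of op state the planner needs.
struct StallContext {
    attempt: u32,
    user_count: usize,
    cred_count: usize,
    dc_domains: Vec<String>,
    allowed_strategies: HashSet<String>,
}

fn plan_stall_recovery(ctx: &StallContext) -> Vec<RecoveryAction> {
    let mut actions = Vec::new();
    // Higher-priority fallbacks would be pushed here first (elided).
    let cold = ctx.user_count == 0 && ctx.cred_count == 0;
    if cold && ctx.allowed_strategies.contains("asrep_roast") {
        for domain in &ctx.dc_domains {
            let domain = domain.to_lowercase();
            actions.push(RecoveryAction {
                kind: ActionKind::ColdStartAsrepEnum,
                dedup_key: format!("{domain}:{}", ctx.attempt),
                domain,
            });
        }
    }
    actions
}

fn main() {
    let ctx = StallContext {
        attempt: 1,
        user_count: 0,
        cred_count: 0,
        dc_domains: vec!["CORP.local".to_string()],
        allowed_strategies: HashSet::from(["asrep_roast".to_string()]),
    };
    for action in plan_stall_recovery(&ctx) {
        println!("{:?} {} ({})", action.kind, action.domain, action.dedup_key);
    }
}
```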

**Changed:**

- Refactored the main `auto_stall_detection` loop to use new modular recovery
  planning and execution logic, replacing inlined logic with calls to
  `StallTracker`, `plan_stall_recovery`, and `execute_recovery_actions`
- Updated documentation and comments to reflect new fallback actions and clarify
  cold-start logic
- Changed the fallback action order and gating: cold-start AS-REP enumeration
  now only fires when both users and credentials are absent
- Adjusted test coverage to assert on new modular behaviors, including dedup key
  logic, gating, and state transitions
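The `StallTracker` bookkeeping might look like the following sketch. It assumes progress is a single counter and takes `now` as a parameter so stall and cooldown transitions can be tested without sleeping; the field and method names are hypothetical. Attempt numbers are never reset here, so each recovery round produces a fresh dedup key.

```rust
use std::time::{Duration, Instant};

struct StallTracker {
    last_progress: usize,
    last_progress_at: Instant,
    last_recovery_at: Option<Instant>,
    stall_after: Duration,
    cooldown: Duration,
    attempts: u32,
}

impl StallTracker {
    fn new(now: Instant, stall_after: Duration, cooldown: Duration) -> Self {
        Self {
            last_progress: 0,
            last_progress_at: now,
            last_recovery_at: None,
            stall_after,
            cooldown,
            attempts: 0,
        }
    }

    /// Feed the current progress counter; any change restarts the stall clock.
    fn observe(&mut self, progress: usize, now: Instant) {
        if progress != self.last_progress {
            self.last_progress = progress;
            self.last_progress_at = now;
        }
    }

    /// Returns the new attempt number when stalled and outside the cooldown window.
    fn should_recover(&mut self, now: Instant) -> Option<u32> {
        let cooldown = self.cooldown;
        let stalled = now.duration_since(self.last_progress_at) >= self.stall_after;
        let cooled = self
            .last_recovery_at
            .map_or(true, |t| now.duration_since(t) >= cooldown);
        if !(stalled && cooled) {
            return None;
        }
        self.attempts += 1;
        self.last_recovery_at = Some(now);
        Some(self.attempts)
    }
}

fn main() {
    let t0 = Instant::now();
    let secs = Duration::from_secs;
    let mut tracker = StallTracker::new(t0, secs(60), secs(120));
    tracker.observe(5, t0);
    println!("{:?}", tracker.should_recover(t0 + secs(61)));
}
```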

**Removed:**

- Eliminated duplicated and inlined logic for stall detection, fallback planning,
  and recovery attempt tracking from the main detection loop in favor of modular,
  testable components
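A synchronous sketch of the adapter split: `execute_recovery_actions` talks only to a `StallRecoveryAdapter`, so tests can swap in a fake. The PR's version is async and is wired to the real dispatcher through `DispatcherStallAdapter`; the trait methods below, and the choice to mark a key only after a successful dispatch, are assumptions.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq)]
struct RecoveryAction {
    target: String,
    dedup_key: String,
}

/// Hypothetical method set for the adapter trait.
trait StallRecoveryAdapter {
    fn already_done(&self, dedup_key: &str) -> bool;
    fn mark_done(&mut self, dedup_key: &str);
    fn dispatch(&mut self, action: &RecoveryAction) -> Result<(), String>;
}

/// Dispatches each not-yet-done action; returns how many were actually dispatched.
fn execute_recovery_actions<A: StallRecoveryAdapter>(adapter: &mut A, actions: &[RecoveryAction]) -> usize {
    let mut dispatched = 0;
    for action in actions {
        if adapter.already_done(&action.dedup_key) {
            continue;
        }
        match adapter.dispatch(action) {
            // Mark only after success so a failed dispatch can be retried later.
            Ok(()) => {
                adapter.mark_done(&action.dedup_key);
                dispatched += 1;
            }
            Err(err) => eprintln!("stall recovery: dispatch to {} failed: {err}", action.target),
        }
    }
    dispatched
}

/// Test double: records sent targets and fails for targets in `failing`.
#[derive(Default)]
struct FakeAdapter {
    done: HashSet<String>,
    failing: HashSet<String>,
    sent: Vec<String>,
}

impl StallRecoveryAdapter for FakeAdapter {
    fn already_done(&self, key: &str) -> bool {
        self.done.contains(key)
    }
    fn mark_done(&mut self, key: &str) {
        self.done.insert(key.to_string());
    }
    fn dispatch(&mut self, action: &RecoveryAction) -> Result<(), String> {
        if self.failing.contains(&action.target) {
            return Err("simulated failure".into());
        }
        self.sent.push(action.target.clone());
        Ok(())
    }
}

fn action(target: &str, key: &str) -> RecoveryAction {
    RecoveryAction { target: target.into(), dedup_key: key.into() }
}

fn main() {
    let mut fake = FakeAdapter::default();
    fake.failing.insert("dc02".into());
    let actions = vec![action("dc01", "corp.local:1"), action("dc02", "dev.local:1")];
    let n = execute_recovery_actions(&mut fake, &actions);
    println!("dispatched {n}, sent {:?}", fake.sent);
}
```

Marking after success means a failed dispatch is retried on the next stall cycle instead of being silently suppressed; marking before dispatch would trade that retry for a stricter at-most-once guarantee.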
l50 merged commit e1ee8a6 into main May 17, 2026
12 checks passed
l50 deleted the fix/stall-detection-cold-start branch May 17, 2026 20:46

Labels

None yet

Projects

None yet

1 participant