Skip to content

fix: Windows worktree creation falls back gracefully instead of using temp dirs#440

Open
PureWeen wants to merge 1 commit intomainfrom
fix/windows-worktree-fallback
Open

fix: Windows worktree creation falls back gracefully instead of using temp dirs#440
PureWeen wants to merge 1 commit intomainfrom
fix/windows-worktree-fallback

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

Problem

On Windows, "Implement & Challenge" multi-agent groups end up with sessions pointing to temp directories instead of proper git worktrees. This works fine on Mac.

Root Cause

CreateGroupFromPresetAsync has try/catch blocks around worktree creation that silently swallow exceptions (only Debug log). When worktree creation fails on Windows (long paths, file locking), orchWorkDir stays null, CreateSessionAsync gets null workingDirectory, and creates a temp dir.

Additionally, GroupShared strategy was falling into the wrong code path, creating -orchestrator- branches instead of -shared- branches.

Fix

  1. RepoManager.cs: Set core.longpaths on new worktrees for Windows
  2. CopilotService.Organization.cs:
    • Handle GroupShared strategy explicitly with -shared- branch naming
    • Fall back to existing worktree (TryGetExistingWorktreePath) when creation fails
    • Better error logging with exception type
  3. WorktreeStrategyTests.cs: 4 new regression tests for fallback behavior

Testing

  • All 38 WorktreeStrategy tests pass (including 4 new ones)
  • Tests cover: GroupShared failure fallback, fallback with existing workingDirectory, FullyIsolated failure fallback, and shared branch naming

…tly using temp dirs

- Set core.longpaths on new worktrees for Windows (RepoManager)
- Handle GroupShared strategy explicitly with -shared- branch naming
- Fall back to existing worktree when creation fails instead of temp dir
- Add 4 regression tests for worktree failure fallback scenarios

Fixes: sessions in Implement & Challenge groups on Windows end up with
WorkingDirectory pointing to %TEMP%\polypilot-sessions\{guid} instead
of proper git worktrees.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen force-pushed the fix/windows-worktree-fallback branch from 931d56b to 68aa1e9 Compare March 26, 2026 20:47
@PureWeen
Copy link
Copy Markdown
Owner Author

PR #440 Review — R1

Tests: ✅ 2970/2970 passed
Verdict: ⚠️ Request Changes

The core approach is correct — GroupShared now properly creates -shared- branches and falls back gracefully instead of using temp dirs. However, there are two substantive issues and a few improvements worth making before merge.


🔴 Major: TryGetExistingWorktreePath picks an arbitrary worktree

FirstOrDefault(w => w.RepoId == repoId) returns the first worktree in insertion order with no preference logic. If a user has multiple worktrees for the same repo (e.g., one from a previous group, one from manual worktree creation), this could return a worktree that is:

  • Already being used by another active group (shared mutation risk)
  • On a stale or dirty branch inappropriate for new work
  • At a path that no longer exists on disk (the path is stored in state but WorktreeInfo.Path isn't validated)
private string? TryGetExistingWorktreePath(string repoId, ref string? worktreeId, SessionGroup group)
{
    var existing = _repoManager.Worktrees.FirstOrDefault(w => w.RepoId == repoId); // arbitrary

Recommended fix: At minimum, validate that existing.Path exists on disk before returning it. Consider adding a comment that this is intentionally read-only (no CreatedWorktreeIds.Add), and/or prefer worktrees not already claimed by an active group.


🔴 Major: Corrupted bare repo deletion failure leads to fetch on corrupted data

In EnsureRepoCloneInCurrentRootAsync, the corrupted repo is deleted with a silent catch { }. On Windows, file locks (antivirus, git index readers) can cause this delete to fail silently. The code then falls through to Directory.Exists(targetBarePath) — which is still true — and calls git fetch on the corrupted directory, which will fail with a confusing git error rather than a diagnostic about the lock:

if (Directory.Exists(targetBarePath) && !IsValidBareRepository(targetBarePath))
{
    Console.WriteLine($"[RepoManager] Bare clone corrupted (missing HEAD/refs), removing: {targetBarePath}");
    try { Directory.Delete(targetBarePath, recursive: true); } catch { }  // ← silent failure
}

if (Directory.Exists(targetBarePath))
{
    // ← still reaches here with corrupt data on Windows file-lock failure
    onProgress?.Invoke($"Fetching {repo.Id}…");

Recommended fix: After the delete, check !IsValidBareRepository(targetBarePath) again and throw with a clear message if the directory is still corrupted/locked:

catch (Exception ex)
{
    Console.WriteLine($"[RepoManager] Cannot remove corrupted bare clone at {targetBarePath}: {ex.Message}");
    throw new InvalidOperationException(
        $"Bare clone at {targetBarePath} is corrupted and locked. Close any tools accessing it and retry.", ex);
}

🟡 Minor: core.longpaths gap for existing bare clones

The core.longpaths fix is applied:

  • ✅ To new bare clones (after git clone --bare)
  • ✅ To worktrees (after git worktree add, line ~763)
  • Missing for existing bare clones that are reused (the Directory.Exists(targetBarePath) branch just fetches)

Since git worktree add uses the bare clone's config, users with a pre-existing bare clone will still get long-path failures on worktree add. The worktree-level config set after add helps subsequent operations but not the add itself.

Recommended fix: Apply core.longpaths unconditionally when the bare clone is reused, not only at clone time:

if (Directory.Exists(targetBarePath))
{
    onProgress?.Invoke($"Fetching {repo.Id}…");
    await RunGitWithProgressAsync(...);
    if (OperatingSystem.IsWindows())
        try { await RunGitAsync(targetBarePath, ct, "config", "core.longpaths", "true"); } catch { }
}

🟡 Minor: GroupShared success path silently overrides provided workingDirectory

The block condition is string.IsNullOrEmpty(worktreeId) — it doesn't check workingDirectory. So if a caller passes workingDirectory="/some/path" + no worktreeId, worktree creation succeeds, and orchWorkDir is overwritten with the new worktree path, silently discarding the caller's directory. The failure path correctly preserves workingDirectory (due to if (orchWorkDir == null) guard). This asymmetry is intentional per the comment, but there's no test for the success-with-workingDirectory case, and the behavior could surprise callers.

Recommended fix: Add a test for the workingDirectory + creation-succeeds path and clarify in the comment that GroupShared always creates its own branch regardless of workingDirectory.


🟡 Minor: Worker worktree failure has no explicit fallback in logs

When a worker's CreateWorktreeAsync throws (the FullyIsolated worker loop), the catch only logs "Failed to create worker-N worktree" without indicating the actual fallback path. Workers implicitly fall back to orchWorkDir at line ~3410, which is correct, but the log makes it look like a complete failure:

catch (Exception ex)
{
    Debug($"[WorktreeStrategy] Failed to create worker-{i + 1} worktree: {ex.GetType().Name}: {ex.Message}");
    // no mention of orchWorkDir fallback
}

Recommended: Add Debug($"[WorktreeStrategy] Worker-{i+1} will fall back to orchestrator dir: {orchWorkDir}") so diagnostics are clear.


🟡 Minor: IsValidBareRepository is a heuristic, not a full validity check

Checking HEAD + refs/ misses objects/ — a repo missing its object store would pass validation but fail on any git operation. Also, a partially-cloned repo with HEAD + refs/ but no objects/ or config would be falsely accepted.

Recommended: Add Directory.Exists(Path.Combine(barePath, "objects")) to the check, and/or add a comment that this is intentionally a lightweight heuristic for the specific corruption pattern (only objects/ survived).


🟡 Minor: Reflection in test stubs is fragile and triplicated

FailingRepoManager, FakeRepoManager, and FailingRepoManagerWithExistingWorktree all set private _state/_loaded via reflection. If these fields are renamed, the failure is at runtime (NullReferenceException) not compile time.

Recommended: Add an internal constructor or internal static RepoManager CreateForTesting(RepositoryState state) factory with [assembly: InternalsVisibleTo("PolyPilot.Tests")] to eliminate the reflection pattern. Minor improvement but worth doing given how many tests depend on it.


Nit: Missing blank line in RepoManagerTests.cs

    private static void CreateFakeBareRepoSkeleton(string bareDir)
    {
        ...
    }
    [Theory]   // ← needs a blank line before [Theory]

The GroupShared strategy fix and -shared- branch naming correction are exactly right. The main blockers are the corrupted-delete silent failure path and the arbitrary-worktree selection in the fallback. The core.longpaths gap for existing users also seems worth addressing since that's the primary Windows use case.

@PureWeen
Copy link
Copy Markdown
Owner Author

🔄 Re-Review Round 2

No new commits since R1 (still 1 commit, last updated 2026-03-26). The R1 findings remain open.

R1 Findings Status

Finding Status
🔴 C1 — TryGetExistingWorktreePath returns arbitrary worktree (no path validation, no preference logic) ⏳ Still present
🔴 C2 — Silent delete failure on locked corrupted bare clone → fetch on corrupt data ⏳ Still present
🟡 M1 — core.longpaths not applied to existing bare clones on reuse ⏳ Still present
🟡 M2 — GroupShared silently overrides workingDirectory on creation success (no test) ⏳ Still present
🟡 M3 — Worker fallback-to-orchWorkDir not logged ⏳ Still present
🟡 M4 — IsValidBareRepository missing objects/ check ⏳ Still present
🟡 M5 — Reflection in test stubs is fragile and triplicated ⏳ Still present

⚠️ Still requesting changes

When the two 🔴 issues (C1 + C2) are addressed, happy to re-review. The core feature (GroupShared with -shared- branch naming + graceful fallback) is solid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant