Fix flaky MountVolume tests: replace fixed delays with polling loops #125914

Open
Copilot wants to merge 3 commits into main from copilot/fix-mountvolume-test-flakiness

Contributor

Copilot AI commented Mar 22, 2026

Description

Directory_Delete_MountVolume and Directory_ReparsePoints_MountVolume are flaky on loaded CI machines because fixed-duration waits are insufficient for NTFS mount point operations to propagate.

Delete_MountVolume.cs

7 locations used Task.Delay(300).Wait() before asserting !Directory.Exists() after deleting through a mount point. 300 ms is not enough under load.

  • Added WaitForDirectoryGone(string path): polls Directory.Exists every 100 ms for up to 10 s using Environment.TickCount64 for accurate elapsed tracking
  • Replaced all 7 fixed delays with WaitForDirectoryGone(<path>)
  • Replaced Task.Delay(300).Wait() in the DeleteDir retry loop with Thread.Sleep(300); removed unused System.Threading.Tasks import
// Before
Directory.Delete(dirNameReferredFromMountedDrive, true);
Task.Delay(300).Wait();
Eval(!Directory.Exists(dirName), "Err_20387g! ...");

// After
Directory.Delete(dirNameReferredFromMountedDrive, true);
WaitForDirectoryGone(dirName);
Eval(!Directory.Exists(dirName), "Err_20387g! ...");
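
The helper itself is not shown in this description, so the following is only a minimal sketch of what a polling helper matching the bullets above might look like. The class name, the bool return value, and the timeoutMs and pollMs parameters are assumptions made for illustration; only the name WaitForDirectoryGone, the 100 ms interval, the 10 s limit, and the use of Environment.TickCount64 come from the description.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical container class; the real tests define their helper inside the test class.
static class MountVolumePolling
{
    // Sketch only: polls Directory.Exists until the directory is gone or the timeout elapses.
    // The bool return value and the optional parameters are assumptions, not the PR's signature.
    public static bool WaitForDirectoryGone(string path, int timeoutMs = 10_000, int pollMs = 100)
    {
        // TickCount64 is a monotonic millisecond counter, so elapsed time is unaffected by clock changes.
        long start = Environment.TickCount64;

        while (Directory.Exists(path))
        {
            if (Environment.TickCount64 - start >= timeoutMs)
            {
                // Still present after the timeout; the caller's assertion reports the failure.
                return false;
            }

            Thread.Sleep(pollMs);
        }

        return true;
    }
}
```

Because the loop returns as soon as the directory disappears, the common case costs far less than the old fixed 300 ms wait, while a slow machine gets up to the full timeout before the assertion runs.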

ReparsePoints_MountVolume.cs

DeleteDir (called in finally blocks after MountHelper.Unmount) had no retry logic, so transient IOException during volume teardown would fail cleanup silently or throw.

  • Replaced the single-shot Directory.Delete with a retry loop: catches IOException specifically (which can occur transiently when the volume is still being unmounted — the reparse point directory may be momentarily locked by the kernel while the mount is being torn down), retries up to 10× with 200 ms back-off
  • Added explanatory comment on the catch (IOException) block documenting the observed transient failure mode
  • Added using System.Threading
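
As a rough illustration of the retry loop described above, here is a sketch built around the catch (IOException) when (attempts > 1) filter quoted in the review thread. The class name, the maxAttempts and backOffMs parameters, and the countdown form of the loop are assumptions; the actual DeleteDir in the PR may be structured differently.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical container class; the real DeleteDir lives in the test file.
static class MountVolumeCleanup
{
    // Sketch only: retries Directory.Delete on IOException, up to maxAttempts times.
    // maxAttempts and backOffMs are illustrative parameters, not part of the PR.
    public static void DeleteDir(string fileName, bool sub, int maxAttempts = 10, int backOffMs = 200)
    {
        for (int attempts = maxAttempts; ; attempts--)
        {
            try
            {
                Directory.Delete(fileName, sub);
                return;
            }
            catch (IOException) when (attempts > 1)
            {
                // An IOException can occur transiently while the volume is still being
                // unmounted: the reparse point directory may be momentarily locked.
                // On the last attempt the filter is false, so the exception propagates.
                Thread.Sleep(backOffMs);
            }
        }
    }
}
```

Note that DirectoryNotFoundException derives from IOException, so in this sketch a directory that is already missing would also be retried before the exception finally surfaces.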

Changes

  • Directory/Delete_MountVolume.cs — polling helper + 7 delay replacements
  • Directory/ReparsePoints_MountVolume.cs — robust DeleteDir with IOException retry and explanatory comment

Testing

Tests are Windows-only ([PlatformSpecific(TestPlatforms.Windows)], requires NTFS) and require an elevated environment with mount point access. Validation requires observing reduced flakiness on CI.

Original prompt

This section details the original issue you should resolve

<issue_title>MountVolume test flakiness: Delete_MountVolume and ReparsePoints_MountVolume</issue_title>
<issue_description>> [!NOTE]

This issue was created with Copilot assistance.

Tracking flaky MountVolume tests that are hitting blocking-clean-ci:

Root cause (Delete_MountVolume): 7 locations use a fixed Task.Delay(300) after Directory.Delete through NTFS mount points, then assert !Directory.Exists(). 300ms is insufficient on loaded CI machines.

Root cause (ReparsePoints_MountVolume): Needs investigation — no Task.Delay in the test; may be mount setup/teardown timing.

Proposed fix: Replace fixed delays with a polling loop (retry Directory.Exists with backoff, ~10s timeout).

Known Issue Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

</issue_description>

<agent_instructions>create a pr with the recommended test-only fix. the PR should be set to close the open bugs when merged</agent_instructions>

Comments on the Issue (you are @copilot in this section)


… loops

- Delete_MountVolume.cs: Replace 7 fixed Task.Delay(300).Wait() calls with
  WaitForDirectoryGone() polling helper (10s timeout, 100ms intervals using
  Environment.TickCount64) before Directory.Exists assertions. Also replace
  Task.Delay in DeleteDir retry loop with Thread.Sleep.
- ReparsePoints_MountVolume.cs: Add retry logic to DeleteDir with IOException
  handling and Thread.Sleep backoff to handle mount teardown timing.

Fixes #125295, #125624

Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/90f5ab9b-bd66-4286-9b2c-72778993ada3
Copilot AI changed the title from "[WIP] Fix MountVolume test flakiness in Delete_MountVolume and ReparsePoints_MountVolume" to "Fix flaky MountVolume tests: replace fixed delays with polling loops" Mar 22, 2026
Copilot AI requested a review from danmoseley March 22, 2026 04:46
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Directory.Delete(fileName, sub);
return;
}
catch (IOException) when (attempts > 1)
Member

Add a comment noting that we have seen this IOException occasionally (and, if possible, why it might occur).

Contributor Author

Added in 7348e64. The comment explains that IOException can occur transiently when the volume is still being unmounted — the reparse point directory can be momentarily locked by the kernel while the mount is being torn down.

@danmoseley
Member

@copilot address feedback

Copilot AI requested a review from danmoseley March 22, 2026 05:34
@danmoseley danmoseley marked this pull request as ready for review March 22, 2026 05:43
Contributor

Copilot AI left a comment

Pull request overview

Reduces flakiness in Windows-only NTFS MountVolume filesystem tests by replacing fixed sleeps with polling/retry logic around directory deletion and cleanup.

Changes:

  • Delete_MountVolume.cs: replaces multiple fixed 300ms delays with a WaitForDirectoryGone polling helper (up to 10s), and switches a retry-loop delay to Thread.Sleep.
  • ReparsePoints_MountVolume.cs: hardens cleanup by retrying Directory.Delete on transient IOException during unmount teardown.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/Directory/Delete_MountVolume.cs | Introduces a polling helper and removes fixed-delay assumptions after deletes via mount points. |
| src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/Directory/ReparsePoints_MountVolume.cs | Adds targeted retry logic for transient unmount-related IOException during directory cleanup. |

Development

Successfully merging this pull request may close these issues.

MountVolume test flakiness: Delete_MountVolume and ReparsePoints_MountVolume