Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SymLinksMayExistIndependentlyOfTarget failure #65119

Open
danmoseley opened this issue Feb 10, 2022 · 18 comments
Open

SymLinksMayExistIndependentlyOfTarget failure #65119

danmoseley opened this issue Feb 10, 2022 · 18 comments
Milestone

Comments

@danmoseley
Copy link
Member

danmoseley commented Feb 10, 2022

Frequency:

Day Run Notes
4/1-7/22
7/12 Official run Win10
7/6 PR #71263 Win10
6/3 PR #69885 Win10
let failedTests = (methodName : string, includePR : bool, messageSubstr: string, includePassedOnRerun : bool) {
cluster('engsrvprod.kusto.windows.net').database('engineeringdata').AzureDevOpsTests
    //| where TestName startswith "System.Net" 
    | where TestName contains methodName
    | where includePassedOnRerun or (Outcome == 'Failed')
    | extend startOfTestName = indexof_regex(TestName, @"[^.]+$")
    | extend Method = substring(TestName, startOfTestName)
    | extend Type = substring(TestName, 0, startOfTestName - 1)
    | project-away startOfTestName
    | where Method == methodName
    | where Message contains messageSubstr
    | distinct JobId, WorkItemId, Message, StackTrace, Method, Type, Arguments, Outcome
    | join kind=inner (cluster('engsrvprod.kusto.windows.net').database('engineeringdata').Jobs
        | where ((Branch == 'refs/heads/main') or (Branch == 'refs/heads/master') or (includePR and (Source startswith "pr/")))
        | where Type startswith "test/functional/cli/"
            and not(Properties contains "runtime-staging")
        | summarize arg_max(Finished, Properties, Type, Branch, Source, Started, QueueName) by JobId
        | project-rename JobType = Type) on JobId
    | extend PropertiesJson = parse_json(Properties)
    | extend OS = replace_regex(tostring(PropertiesJson.operatingSystem), @'\((.*)\).*|([^\(].*)', @'\1\2')
    | extend Runtime = iif(PropertiesJson.runtimeFlavor == "mono", "Mono", iif(PropertiesJson.DefinitionName contains "coreclr", "CoreCLR", ""))
    | extend TargetBranch = extractjson("$.['System.PullRequest.TargetBranch']", Properties)
    | extend Architecture = PropertiesJson.architecture
    | extend Scenario = iif(isempty(PropertiesJson.scenario), "--", PropertiesJson.scenario)
    //| extend DefinitionName = PropertiesJson.DefinitionName
    | project-away PropertiesJson
};
failedTests('SymLinksMayExistIndependentlyOfTarget', true, '', true);

Runfo query for last 60 days (incl. PRs) with less hits than Kusto (from unknown reason)

ystem.IO.Tests.File_Exists.SymLinksMayExistIndependentlyOfTarget [FAIL]
      linkPath should no longer exist
      Expected: False
      Actual:   True
      Stack Trace:
        /_/src/libraries/System.IO.FileSystem/tests/File/Exists.cs(138,0): at System.IO.Tests.File_Exists.SymLinksMayExistIndependentlyOfTarget()

'System.IO.FileSystem.Tests' from job 3683fd05-6451-4dba-848b-64f9d459eac3 workitem 46af4fdd-dcb6-4ead-9081-44d83202df31 (windows.10.amd64.open.rt) executed on machine a009Z5S

hit in #64677

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 10, 2022
@ghost
Copy link

ghost commented Feb 10, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details
ystem.IO.Tests.File_Exists.SymLinksMayExistIndependentlyOfTarget [FAIL]
      linkPath should no longer exist
      Expected: False
      Actual:   True
      Stack Trace:
        /_/src/libraries/System.IO.FileSystem/tests/File/Exists.cs(138,0): at System.IO.Tests.File_Exists.SymLinksMayExistIndependentlyOfTarget()

'System.IO.FileSystem.Tests' from job 3683fd05-6451-4dba-848b-64f9d459eac3 workitem 46af4fdd-dcb6-4ead-9081-44d83202df31 (windows.10.amd64.open.rt) executed on machine a009Z5S

hit in #64677

Author: danmoseley
Assignees: -
Labels:

area-System.IO

Milestone: -

@danmoseley
Copy link
Member Author

            // Now delete the symlink.
            File.Delete(linkPath);
            Assert.False(File.Exists(linkPath), "linkPath should no longer exist");

odd

@jozkee
Copy link
Member

jozkee commented Feb 10, 2022

I was thinking on name collision but that can't be the case for this test, this is how path and linkPath look like:

  path: C:\Users\david\AppData\Local\Temp\File_Exists_rvyyngfh.y1s\SymLinksMayExistIndependentlyOfTarget_121_9d65b352
  linkPath: C:\Users\david\AppData\Local\Temp\File_Exists_rvyyngfh.y1s\GetRandomLinkName_84_57004724.link

  path: C:\Users\david\AppData\Local\Temp\PathFile_Exists_vlbp50lx.sjy\SymLinksMayExistIndependentlyOfTarget_121_fe1a95b8
  linkPath: C:\Users\david\AppData\Local\Temp\PathFile_Exists_vlbp50lx.sjy\GetRandomLinkName_84_6cbea16a.link

Even though GetRandomLinkName_84 persist, we also use GetType() + Path.GetRandomFileName() and that should avoid the collision.

@jozkee jozkee added this to the Future milestone Feb 10, 2022
@jozkee jozkee added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Feb 10, 2022
@danmoseley
Copy link
Member Author

The way Windows handles file deletes has been a bit unintuitive to me in the past, and I believe changed at one point. @JeremyKuhne may have thoughts.

@danmoseley
Copy link
Member Author

Another possibility perhaps is that the delete failed for some reason and we ate the error somehow

@jozkee
Copy link
Member

jozkee commented Feb 10, 2022

FWIW the only error we eat on Delete is File Not Found.

bool r = Interop.Kernel32.DeleteFile(fullPath);
if (!r)
{
int errorCode = Marshal.GetLastWin32Error();
if (errorCode == Interop.Errors.ERROR_FILE_NOT_FOUND)
return;
else
throw Win32Marshal.GetExceptionForWin32Error(errorCode, fullPath);

@JeremyKuhne
Copy link
Member

The way Windows handles file deletes has been a bit unintuitive to me in the past, and I believe changed at one point

For NTFS the name would not be freed for use until all file handles were closed. It changed in 1903- we have tests that check this behavior:

if (PlatformDetection.IsWindows10Version1903OrGreater)
{
// On 1903 the filename is immediately released after delete is called
Assert.Throws<FileNotFoundException>(() => CreateFileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Delete | FileShare.ReadWrite));
}

@danmoseley
Copy link
Member Author

Maybe this is pre-1903 and it was a virus checker or something.

@jozkee
Copy link
Member

jozkee commented Feb 10, 2022

For NTFS the name would not be freed for use until all file handles were closed.

@JeremyKuhne does this imply that File.Exists(path) will return true if not all handles to that file are yet closed? Cause that's what is happening here, File.Exists() is returning true right after the file was deleted.

@danmoseley CI says windows.10.amd64.open.rt, matching that to the list in https://helix.dot.net/ (ctrl+f) shows:

Operating System
Windows Server 2016-Datacenter
Architecture
AMD64

I don't know how to check if that is pre-1903 cc @dotnet/area-infrastructure-libraries

@JeremyKuhne
Copy link
Member

will return true if not all handles to that file are yet closed?

I believe so. And 2016 is way before 1903 (2019 March).

@jozkee
Copy link
Member

jozkee commented Feb 10, 2022

@JeremyKuhne, thanks. Will close this issue as it is non actionable for us IMO.

@jozkee jozkee closed this as completed Feb 10, 2022
@danmoseley
Copy link
Member Author

@MattGal does anything like a virus checker run on these Server 2016 machines? Something that could grab random handles and potentially defeat deleting things.

@MattGal
Copy link
Member

MattGal commented Feb 11, 2022

@MattGal does anything like a virus checker run on these Server 2016 machines? Something that could grab random handles and potentially defeat deleting things.

Any SKU that's effectively Windows 8 or newer has anti-virus stuff that you can't defeat even if it's a throwaway machine only used for testing. More recently, we've started allowing (because we absolutely must and it's unfortunately not up for debate... would that it were) Azure Security Pack and Geneva Monitoring onto the machines. These are services that can try to reconfigure antivirus as well as perform their own scans on the machine. When you combine this with enough executions, the behavior you're suspecting is pretty darn likely to be the case. See https://github.com/dotnet/core-eng/issues/15161 for more details.

One thing to note though, is that people who would run this test locally are also likely to have some form of antivirus turned on, and hardening the tests to both understand and tolerate this is likely the only long term solution that keeps us off automatic-scanning radar.

@danmoseley
Copy link
Member Author

Thanks that makes it clear. OK if this happens again I'll put some retry around it. Right now there's zero evidence that this line is more likely to need a retry than any of the other places.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 13, 2022
@jtschuster
Copy link
Member

Hit again in rolling build 1873896 on net7.0-windows-Release-x86-CoreCLR_checked-Windows.10.Amd64.Open
Log

@jtschuster jtschuster reopened this Jul 12, 2022
@jtschuster jtschuster added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Jul 12, 2022
@danmoseley
Copy link
Member Author

@jozkee what do you recommend here? this is blocking-clean-ci so we either need to fix, workaround or disable.

@jozkee
Copy link
Member

jozkee commented Jul 21, 2022

A retry should be the best. This kind of failure is not exclusive to this unit test. The same happens on this other one that exercises the same assert:

https://dataexplorer.azure.com/clusters/engsrvprod/databases/engineeringdata?query=H4sIAAAAAAAAA22PsU7DYAyE9zzFqUth6guQraoUBAtUYv6JT4lJaqPfLgWJhydqJMTAZN3d9w0+MvKJcZ4z0HzjzdUwqUmrZqx48Tp1yVPA7Td08h96769XarmdLPtlZCUOahojBXctzC83t81uV0zwyIgyEL1bFrXAhh+0hKgsXML7/lxRAvx8Z5+UTbNqObr8sfacmWrD89fpQW3aO8PyWvJY6sBcvfVFtC22h6Lz9gfcOQWS9wAAAA==

JobId WorkItemId EventId JobName WorkItemName WorkItemFriendlyName Type Method ArgumentHash Arguments Result Duration Exception Message StackTrace Traits Reason Attempt AzurePipelinesTestRunId AzurePipelinesTestResultId JobId1 WorkItemId1 JobName1 Name FriendlyName Status PassCount FailCount SkipCount WarnCount Queued Started Finished ExitCode ConsoleUri MachineName WorkItemType Uri PassOnRetryCount Attempt1 QueueName LogUploadSeconds LogSizeBytes JobId2 Name1 Source Type1 Build Creator System IsExternal PullRequestId Queued1 Started1 Finished1 TestsPass TestsFail TestsSkip ItemsPass ItemsWarning ItemsBadExit ItemsFail ItemsError ItemsTimeout InitialItems TotalItems JobList Properties QueueName1 ItemsNotRun TestsPassedOnRetry ItemsPassedOnRetry Attempt2 ItemsDeadLetter QueueAlias DockerTag TeamProject Repository Branch
19906508 974759625 de30fe19-f93c-4d00-9023-25a9e2979c01 0070981a-3424-4773-b1c7-2675326f385c System.IO.FileSystem.Tests System.IO.Tests.DirectoryInfo_Delete DeletingSymLinkDoesntDeleteTarget Fail 0.077 linkPath should no longer exist Expected: False Actual: True at System.IO.Tests.Directory_Delete_str.DeletingSymLinkDoesntDeleteTarget() in //src/libraries/System.IO.FileSystem/tests/Directory/Delete.cs:line 121 at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor) at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr) in //src/libraries/System.Private.CoreLib/src/System/Reflection/MethodInvoker.cs:line 81 48061842 171875 19906508 974759625 de30fe19-f93c-4d00-9023-25a9e2979c01 0070981a-3424-4773-b1c7-2675326f385c System.IO.FileSystem.Tests Pass 0 0 0 0 2022-06-02T23:24:30.597Z 2022-06-02T23:41:31.987Z 2022-06-02T23:44:00.819Z 0 https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-70180-merge-de30fe19f93c4d0090/System.IO.FileSystem.Tests/1/console.77514c80.log?helixlogtype=result a0005EE other 0 1 windows.10.amd64.open.rt 0 0 19906508 de30fe19-f93c-4d00-9023-25a9e2979c01 pr/public/dotnet/runtime/refs/pull/70180/merge test/functional/cli/innerloop/ 27 true 2022-06-02T23:24:30.488Z 2022-06-02T23:24:30.519Z 2022-06-02T23:49:23.744Z 96092 1 1259 278 0 0 0 0 0 277 278 https://helixde107v0xdeko0k025g8.blob.core.windows.net/helix-job-2efdb516-920f-4186-8036-0b6aeb199ff777aff45fee040748a/job-list-ed8ed7fa-1ef2-483f-9d39-55ac6b530110.json?helixlogtype=result {"operatingSystem":"Windows.10.Amd64.Open","System.JobId":"215a953c-1e4b-5cc0-95e2-5484aabbe659","configuration":"Debug","runtimeFlavor":"coreclr","DefinitionId":"686","System.JobName":"__default","architecture":"x86","DefinitionName":"runtime","System.StageAttempt":"1","BuildNumber":"20220602.105","System.PullRequest.TargetBranch":"main","Reason":"PullRequest","BuildId":"1803799","System.PhaseAttempt":"1","AzurePipelinesTestRunId":"48061842","Project":"public","System.JobAttempt":"1","CollectionUri":"https://dev.azure.com/dnceng/","System.PhaseName":"libraries_test_run_checked_coreclr_windows_x86_Debug","System.StageName":"__default"} windows.10.amd64.open.rt 0 0 0 0 windows.10.amd64.open public dotnet/runtime refs/pull/70180/merge

@jozkee jozkee removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jul 21, 2022
@karelz
Copy link
Member

karelz commented Jul 22, 2022

Only 1 hit in last 60 days (as of 7/22).
Only 3 hits since 4/1 in Kusto -- see frequency in top post.
Removing blocking-clean-ci label.

@karelz karelz removed the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Jul 22, 2022
@dotnet dotnet unlocked this conversation Jul 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants