buildfarm-worker on Windows Server 2022 fails to clean up operation files #1744

Closed
mikalailapko opened this issue May 20, 2024 · 3 comments

Comments

@mikalailapko

buildfarm-worker with a simple config (below) throws Java errors when it fails to delete temporary operation files. In my example, a simple hello-world C++ project queues 16 operations on a remote build. The first two run and clean up fine. The later ones also run fine, but the worker is unable to delete their temporary files:

[SEVERE ] build.buildfarm.worker.ReportResultStage after - error destroying exec dir \tmp\worker\shard\operations\75362898-d9d2-4854-a91d-cc3b9cde09c2
java.nio.file.AccessDeniedException: \tmp\worker\shard\operations\75362898-d9d2-4854-a91d-cc3b9cde09c2\external\clang_toolchain-win64~~clang_toolchain_win64_files_ext~clang_toolchain-win64_files\win64\bin\clang++.exe
        at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89)
        at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
        at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108)
        at java.base/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:275)
        at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
        at java.base/java.nio.file.Files.delete(Files.java:1152)
        at build.buildfarm.common.io.Directories$1.visitFile(Directories.java:126)
        at build.buildfarm.common.io.Directories$1.visitFile(Directories.java:114)
        at java.base/java.nio.file.Files.walkFileTree(Files.java:2811)
        at java.base/java.nio.file.Files.walkFileTree(Files.java:2882)
        at build.buildfarm.common.io.Directories.remove(Directories.java:112)
        at build.buildfarm.worker.shard.CFCExecFileSystem.destroyExecDir(CFCExecFileSystem.java:531)
        at build.buildfarm.worker.shard.ShardWorkerContext.destroyExecDir(ShardWorkerContext.java:803)
        at build.buildfarm.worker.ReportResultStage.after(ReportResultStage.java:203)
        at build.buildfarm.worker.PipelineStage.iterate(PipelineStage.java:160)
        at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:51)
        at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:64)
        at java.base/java.lang.Thread.run(Thread.java:833)

As pointed out by @werkt, this might be because none of the links to an inode seem to be deletable while any process has the file open (for execute, in this case).
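To see which files are actually locked, the delete pattern visible in the trace (a `Files.walkFileTree` visitor calling `Files.delete` on each file) can be reproduced outside the worker. The following is only a minimal sketch under that assumption, not Buildfarm's `Directories.remove`: the class and method names are hypothetical, and unlike the trace, it records each `AccessDeniedException` and keeps walking instead of aborting on the first one.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Buildfarm.
final class ExecDirCleanupSketch {

  /** Recursively deletes root and returns the files whose deletion was denied. */
  static List<Path> removeCollectingDenied(Path root) throws IOException {
    List<Path> denied = new ArrayList<>();
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        try {
          Files.delete(file);
        } catch (AccessDeniedException e) {
          // The failure from the trace: remember the file and keep going.
          denied.add(file);
        }
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        try {
          Files.delete(dir);
        } catch (DirectoryNotEmptyException e) {
          // Expected when the directory still holds a file we could not delete.
        }
        return FileVisitResult.CONTINUE;
      }
    });
    return denied;
  }

  public static void main(String[] args) throws IOException {
    // Build a throwaway tree that stands in for an operation's exec dir.
    Path root = Files.createTempDirectory("exec-dir-sketch");
    Files.createDirectories(root.resolve("external/bin"));
    Files.writeString(root.resolve("external/bin/tool.txt"), "placeholder");

    List<Path> denied = removeCollectingDenied(root);
    System.out.println("files that could not be deleted: " + denied);
  }
}
```

Pointing this at a leftover operation directory while the worker is idle would show whether `clang++.exe` is the only file that stays locked, or whether other inputs are affected too.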
Worker config just in case:

backplane:
  redisUri: "reachableredis"
  queues:
    - name: "linux_x86_64"
      allowUnmatched: false
      properties:
        - name: "platform"
          value: "linux_x86_64"
    - name: "windows_x86_64"
      allowUnmatched: false
      properties:
        - name: "platform"
          value: "windows_x86_64"
    - name: "mac_arm64"
      allowUnmatched: false
      properties:
        - name: "platform"
          value: "mac_arm64"
server:
  publicName: "grpcs://reachablebuildfarmserver"
  port: 443
worker:
  linkInputDirectories: false
  execOwner: "Administrator"
  publicName: "reachableworker"
  port: 8981
  dequeueMatchSettings:
    allowUnmatched: false
    properties:
      - name: "platform"
        value: "windows_x86_64"

Setting linkInputDirectories: false was suggested by @werkt, but it didn't help. Pretty much the same config works fine on Linux workers for the same project.

Attached is a procmon log file that shows Windows file operations filtered by clang++.exe, from start (before queuing the build) to finish (when the server's Windows task queue reaches 0). No other process is accessing the files, and the worker runs as Administrator.
Logfile.CSV

@mikalailapko changed the title from "buildfarm-worker on Windows Server 2022 fails to ckean up operation files" to "buildfarm-worker on Windows Server 2022 fails to clean up operation files" on May 20, 2024
@werkt
Collaborator

werkt commented May 20, 2024

Please give the branch https://github.com/werkt/bazel-buildfarm/tree/copy-exec-fs a try, with the following config:

worker:
  linkExecFileSystem: false

If this works, I'll flesh it out into a mergeable PR and make it available, with recommendations for Windows.
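For readers unfamiliar with the distinction: judging by the branch name (copy-exec-fs) and the hard-link hypothesis earlier in the thread, the option presumably makes the worker copy cached inputs into the exec dir instead of hard-linking them. That reading is an assumption, and the sketch below is not Buildfarm code. The class name, the `materialize` helper, and the file names are all hypothetical. It only illustrates the difference between the two strategies using `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of link vs. copy; not part of Buildfarm.
final class ExecInputSketch {

  /** Places a cached file into an exec dir, either as a hard link or as a copy. */
  static void materialize(Path cached, Path execTarget, boolean link) throws IOException {
    Files.createDirectories(execTarget.getParent());
    if (link) {
      // Hard link: both paths name the same underlying file, so a handle
      // opened through one path also affects the other.
      Files.createLink(execTarget, cached);
    } else {
      // Copy: an independent file that can be deleted regardless of who
      // holds the cached original open.
      Files.copy(cached, execTarget, StandardCopyOption.COPY_ATTRIBUTES);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("exec-input-sketch");
    Path cached = Files.writeString(dir.resolve("cached-tool"), "binary");
    Path linked = dir.resolve("exec-linked/tool");
    Path copied = dir.resolve("exec-copied/tool");

    materialize(cached, linked, true);
    materialize(cached, copied, false);

    System.out.println("linked shares file: " + Files.isSameFile(cached, linked));
    System.out.println("copied shares file: " + Files.isSameFile(cached, copied));
  }
}
```

The trade-off is the usual one: copying costs extra disk space and I/O per operation, but each exec dir then owns its files outright.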

@mikalailapko
Author

Thanks a lot @werkt, this option did help! I'm no longer seeing any file deletion errors with it.

@werkt
Collaborator

werkt commented Jun 15, 2024

Closing this, as we're leaving linkExecFileSystem: false as the mitigation on Windows (and it should probably be the default there).

@werkt closed this as completed on Jun 15, 2024