5.0rc1: Remote cache read failure "java.lang.RuntimeException: java.io.IOException: file.tmp (Not a directory)" #14228

Closed
brentleyjones opened this issue Nov 4, 2021 · 8 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@brentleyjones
Contributor

Description of the problem / feature request:

We encountered an error where Bazel fails to read from the remote cache because an exception is thrown locally. It seems to come from this code path: cff2ea5#diff-c0c72bba110eca25447686392b1f265feabe77c8a53ed7387f70f82bcdaba441R112.

WARNING: Reading from Remote Cache:
--
  | com.google.devtools.build.lib.remote.common.BulkTransferException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
  | at com.google.devtools.build.lib.remote.RemoteCache.waitForBulkTransfer(RemoteCache.java:195)
  | at com.google.devtools.build.lib.remote.RemoteExecutionService.downloadOutputs(RemoteExecutionService.java:971)
  | at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:124)
  | at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:141)
  | at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:108)
  | at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
  | at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:68)
  | at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:328)
  | at com.google.devtools.build.lib.actions.Action.execute(Action.java:134)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:909)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1078)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1033)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:496)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
  | at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
  | at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
  | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  | at java.base/java.lang.Thread.run(Unknown Source)
  | Suppressed: java.io.IOException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
  | at com.google.devtools.build.lib.remote.GrpcCacheClient.lambda$downloadBlob$13(GrpcCacheClient.java:320)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:192)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:179)
  | at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:124)
  | at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
  | at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
  | at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
  | at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:814)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:203)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:179)
  | at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:133)
  | at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
  | at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
  | at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
  | at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:814)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:203)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:179)
  | at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:133)
  | at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
  | at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
  | at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
  | at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
  | at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:53)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onError(GrpcCacheClient.java:376)
  | at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
  | at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
  | at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
  | at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
  | at com.google.devtools.build.lib.remote.NetworkTimeInterceptor$NetworkTimeCall$1.onClose(NetworkTimeInterceptor.java:81)
  | at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
  | at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
  | at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
  | at com.google.devtools.build.lib.remote.ReferenceCountedChannel$ConnectionCleanupCall$1.onClose(ReferenceCountedChannel.java:90)
  | at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
  | at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
  | at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
  | at com.google.devtools.build.lib.remote.logging.LoggingInterceptor$LoggingForwardingCall$1.onClose(LoggingInterceptor.java:157)
  | at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:557)
  | at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:69)
  | at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:738)
  | at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:717)
  | at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
  | at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
  | ... 3 more
  | Suppressed: java.io.IOException: /private/var/tmp/_bazel_iosci/7c81b0540945e51fddc7bd01ee8d0abb/execroot/lyftios/bazel-out/darwin-fastbuild/bin/tools/CodeAnalysis/Tests/LinuxMain.swift.tmp (Not a directory)
  | at com.google.devtools.build.lib.unix.NativePosixFiles.openWrite(Native Method)
  | at com.google.devtools.build.lib.unix.UnixFileSystem.createFileOutputStream(UnixFileSystem.java:493)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:174)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:188)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:425)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:413)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.ensureOpen(LazyFileOutputStream.java:66)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.close(LazyFileOutputStream.java:60)
  | at com.google.devtools.build.lib.remote.RemoteCache$ReportingOutputStream.close(RemoteCache.java:527)
  | at com.google.devtools.build.lib.remote.RemoteCache$3.onFailure(RemoteCache.java:400)
  | at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1066)
  | at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
  | at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
  | at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
  | at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:814)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:203)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:179)
  | at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:133)
  | ... 42 more
  | Caused by: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
  | at io.grpc.Status.asRuntimeException(Status.java:535)
  | ... 22 more
  | Caused by: java.lang.RuntimeException: java.io.IOException: /private/var/tmp/_bazel_iosci/7c81b0540945e51fddc7bd01ee8d0abb/execroot/lyftios/bazel-out/darwin-fastbuild/bin/tools/CodeAnalysis/Tests/LinuxMain.swift.tmp (Not a directory)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:356)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:347)
  | at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:465)
  | at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
  | at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
  | at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
  | at com.google.devtools.build.lib.remote.logging.LoggingInterceptor$LoggingForwardingCall$1.onMessage(LoggingInterceptor.java:138)
  | at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:656)
  | at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:641)
  | ... 5 more
  | Caused by: java.io.IOException: /private/var/tmp/_bazel_iosci/7c81b0540945e51fddc7bd01ee8d0abb/execroot/lyftios/bazel-out/darwin-fastbuild/bin/tools/CodeAnalysis/Tests/LinuxMain.swift.tmp (Not a directory)
  | at com.google.devtools.build.lib.unix.NativePosixFiles.openWrite(Native Method)
  | at com.google.devtools.build.lib.unix.UnixFileSystem.createFileOutputStream(UnixFileSystem.java:493)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:174)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:188)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:425)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:413)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.ensureOpen(LazyFileOutputStream.java:66)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.write(LazyFileOutputStream.java:42)
  | at com.google.devtools.build.lib.remote.RemoteCache$ReportingOutputStream.write(RemoteCache.java:510)
  | at com.google.devtools.build.lib.remote.util.DigestOutputStream.write(DigestOutputStream.java:58)
  | at java.base/java.io.FilterOutputStream.write(Unknown Source)
  | at com.google.protobuf.ByteString$LiteralByteString.writeTo(ByteString.java:1381)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:352)
  | ... 13 more

LinuxMain.swift is a single-file output of a genrule.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I don't know of one.

What operating system are you running Bazel on?

macOS 11.6.1

What's the output of bazel info release?

release 5.0.0rc1

@Wyverald Wyverald added this to the Bazel 5.0 Release Blockers milestone Nov 4, 2021
@coeuvre
Member

coeuvre commented Nov 5, 2021

According to the stack trace:

 java.io.IOException: /private/var/tmp/_bazel_iosci/7c81b0540945e51fddc7bd01ee8d0abb/execroot/lyftios/bazel-out/darwin-fastbuild/bin/tools/CodeAnalysis/Tests/LinuxMain.swift.tmp (Not a directory)
  | at com.google.devtools.build.lib.unix.NativePosixFiles.openWrite(Native Method)
  | at com.google.devtools.build.lib.unix.UnixFileSystem.createFileOutputStream(UnixFileSystem.java:493)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:174)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:188)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:425)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:413)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.ensureOpen(LazyFileOutputStream.java:66)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.write(LazyFileOutputStream.java:42)
  | at com.google.devtools.build.lib.remote.RemoteCache$ReportingOutputStream.write(RemoteCache.java:510)
  | at com.google.devtools.build.lib.remote.util.DigestOutputStream.write(DigestOutputStream.java:58)
  | at java.base/java.io.FilterOutputStream.write(Unknown Source)
  | at com.google.protobuf.ByteString$LiteralByteString.writeTo(ByteString.java:1381)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:352)
  | ... 13 more

it calls into NativePosixFiles.openWrite(), which calls open, which fails with "Not a directory" (ENOTDIR).

According to the doc, ENOTDIR means "A component of the path prefix is not a directory". So I am guessing the issue is that the output directory is not prepared properly for the action.
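
To illustrate what ENOTDIR means here, the following is a minimal sketch, not Bazel code. It uses the JDK's java.io.FileOutputStream instead of Bazel's native NativePosixFiles.openWrite(), and the class name EnotdirDemo and its method are hypothetical. It assumes a POSIX system (Linux or macOS), where opening a file whose parent path component is a regular file should fail with a message ending in "(Not a directory)", the same shape as the message in the stack trace.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EnotdirDemo {
    /**
     * Tries to open a file for writing underneath a regular file and returns
     * the resulting error message, or null if the open unexpectedly succeeds.
     */
    static String openUnderRegularFile() throws IOException {
        Path root = Files.createTempDirectory("enotdir-demo");
        // "Tests" is deliberately created as a regular file, not a directory.
        Path notADir = Files.createFile(root.resolve("Tests"));
        // The path prefix of this target now contains a non-directory component.
        File target = notADir.resolve("LinuxMain.swift.tmp").toFile();
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(0);
            return null; // not expected on POSIX systems
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(openUnderRegularFile());
    }
}
```

Note that a parent directory that simply does not exist would be reported as ENOENT ("No such file or directory") instead, so ENOTDIR points at a path component that exists but is not a directory.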

@coeuvre coeuvre added team-Local-Exec Issues and PRs for the Execution (Local) team type: bug untriaged labels Nov 5, 2021
@coeuvre
Member

coeuvre commented Nov 5, 2021

Might be related to #6393.

@coeuvre coeuvre added team-Remote-Exec Issues and PRs for the Execution (Remote) team P1 I'll work on this now. (Assignee required) and removed team-Local-Exec Issues and PRs for the Execution (Local) team untriaged labels Nov 5, 2021
@coeuvre coeuvre self-assigned this Nov 5, 2021
@coeuvre
Member

coeuvre commented Nov 5, 2021

@brentleyjones Is it possible to create a repro?

@coeuvre
Member

coeuvre commented Nov 5, 2021

It's weird, the parent directories are actually created before the download.
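
One way to check the state of the parent directories on an affected machine would be to walk the path prefix and report the first component that is missing or is not a directory. This is a minimal sketch under that assumption, not part of Bazel; the class name PathPrefixCheck and the method firstBadComponent are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class PathPrefixCheck {
    /**
     * Walks the ancestors of target from the filesystem root downwards and
     * describes the first one that is missing or is not a directory.
     * Returns "ok" if every ancestor is a directory.
     */
    static String firstBadComponent(Path target) {
        Path abs = target.toAbsolutePath();
        Path parent = abs.getParent();
        if (parent == null) {
            return "ok"; // target is the root itself
        }
        Path current = abs.getRoot();
        // Iterating a Path yields its name elements, excluding the root.
        for (Path name : parent) {
            current = current.resolve(name);
            if (!Files.exists(current)) {
                return "missing: " + current;         // open would report ENOENT
            }
            if (!Files.isDirectory(current)) {
                return "not a directory: " + current; // open would report ENOTDIR
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Pass the path from the error message, e.g. the ".tmp" output path.
        System.out.println(firstBadComponent(Path.of(args[0])));
    }
}
```

Running it against the path from the error message right after a failure would show whether a component was never created or was replaced by a regular file.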

@brentleyjones
Contributor Author

It was on CI. All I can think of is that the directories failed to be created because of low disk space.

@coeuvre coeuvre added more data needed P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Nov 8, 2021
@coeuvre coeuvre removed this from the Bazel 5.0 Release Blockers milestone Nov 8, 2021
@keith
Member

keith commented Apr 19, 2022

We're seeing what I think is the same issue with bazel 5.1.1:

[1,147 / 1,149] Compiling test/extensions/network/dns_resolver/apple/apple_dns_impl_test.cc; 1s remote-cache
ERROR: /Users/runner/work/1/s/test/extensions/network/dns_resolver/apple/BUILD:11:14: Compiling test/extensions/network/dns_resolver/apple/apple_dns_impl_test.cc failed: Exec failed due to IOException: 41 errors during bulk transfer:
java.io.IOException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
java.io.IOException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.

Unfortunately I think we're hiding the full stack trace https://dev.azure.com/cncf/envoy/_build/results?buildId=106443&view=logs&jobId=fa3d3e18-6969-5713-c3e7-3581195704fd&j=fa3d3e18-6969-5713-c3e7-3581195704fd&t=ba370bef-ca49-52f5-0c93-7c0c0f27c465 (this log will probably rotate)

@mrmeku

mrmeku commented Aug 4, 2022

Hit this bug today when trying to use

build --strategy=GoLink=linux-sandbox,remote
build --strategy=GoCompile=linux-sandbox,remote

so that I could do GoLinking locally while doing the rest of my build on RBE.

Not sure if that information is helpful

@coeuvre
Copy link
Member

coeuvre commented Oct 20, 2023

A lot of things have changed since 5. Please reopen if it's still happening.

@coeuvre coeuvre closed this as completed Oct 20, 2023