//src/test/shell/bazel/android:android_ndk_integration_test doesn't work with remote caching #4663

Open
buchgr opened this issue Feb 20, 2018 · 31 comments
Labels
not stale Issues or PRs that are inactive but not considered stale P2 We'll consider working on this in future. (Assignee optional) team-Android Issues for Android team type: bug

Comments

@buchgr
Contributor

buchgr commented Feb 20, 2018

INFO: From Executing genrule //src/test/java/com/google/devtools/build/android/desugar:desugar_testdata_from_directory_to_directory:
--
  | /tmp/tmp.aw3wZdZTdO /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/bazel-sandbox/1716082235553702946/execroot/io_bazel
  | [2018-02-20T10:29:55Z] /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/bazel-sandbox/1716082235553702946/execroot/io_bazel
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_1_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_4_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_5_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_6_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_2_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_3_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
@buchgr buchgr self-assigned this Feb 20, 2018
@buchgr buchgr changed the title from "//src/test/shell/bazel/android:android_ndk_integration_test doesn" to "//src/test/shell/bazel/android:android_ndk_integration_test doesn't work with remote caching" Feb 20, 2018
@buchgr buchgr added the P1 I'll work on this now. (Assignee required) label Feb 20, 2018
buchgr added a commit to bazelbuild/continuous-integration that referenced this issue Feb 20, 2018
@jin
Member

jin commented Feb 23, 2018

What is the error here? Now that we're using BuildKite, we should re-enable this as soon as possible.

@jin
Member

jin commented Feb 26, 2018

Ping, can we disable remote caching just for this test?

@buchgr
Contributor Author

buchgr commented Feb 27, 2018

The error is Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build

We unfortunately can't disable remote caching just for one test, but we can completely disable remote caching until it's fixed? @philwo ?

@aj-michael
Contributor

Perhaps this is a problem with Mac case sensitivity? The runfiles of this test should contain androidndk/BUILD.bazel as well as androidndk/ndk/build/. Perhaps Bazel is trying to read the latter as if it were a BUILD file?

@jin
Member

jin commented Feb 28, 2018

@aj-michael this looks like it's running on Ubuntu (buildkite-ubuntu1604-6z37).

@jin
Member

jin commented Feb 28, 2018

@buchgr can you please provide some steps on how we can reproduce this with a remote caching setup? Is there an existing remote cache server that we can use to reproduce this?

@jin
Member

jin commented Mar 6, 2018

@buchgr I ran this test and it passes with --remote_http_cache=url-to-buildkite-cache on our Ubuntu CI test machine.

Managed to repro:

ERROR: /home/jingwen/bazel/src/test/shell/bazel/android/BUILD:71:1: failed: Unexpected IO error.: Input is a directory: /home/jingwen/.cache/bazel/_bazel_jingwen/bb5fb952b6ed390892e90207ad3205d9/execroot/io_bazel/external/androidndk/ndk/build
Target //src/test/shell/bazel/android:android_ndk_integration_test up-to-date:
  bazel-bin/src/test/shell/bazel/android/android_ndk_integration_test
INFO: Elapsed time: 0.606s, Critical Path: 0.30s
FAILED: Build did NOT complete successfully
//src/test/shell/bazel/android:android_ndk_integration_test           NO STATUS

@jin
Member

jin commented Mar 7, 2018

I renamed the directory from build to _build and the issue persists:

ERROR: /home/jingwen/bazel/src/test/shell/bazel/android/BUILD:71:1:  failed: Unexpected IO error.: Input is a directory: /home/jingwen/.cache/bazel/_bazel_jingwen/bb5fb952b6ed390892e90207ad3205d9/execroot/io_bazel/external/androidndk/ndk/_build

Deleted the build directory and the error shows up with another directory:

ERROR: /home/jingwen/bazel/src/test/shell/bazel/android/BUILD:71:1:  failed: Unexpected IO error.: Input is a directory: /home/jingwen/.cache/bazel/_bazel_jingwen/bb5fb952b6ed390892e90207ad3205d9/execroot/io_bazel/external/androidndk/ndk/meta

@jin
Member

jin commented Mar 13, 2018

@buchgr ping, is this a problem with external repositories? The NDK external repository is a directory of symlinks to a local path.

This test has been disabled for 3 weeks now and I'd like to help debug to get it back up again.

@aj-michael
Contributor

Perhaps this is due to the fact that the test has an input that is a filegroup that contains a directory:

https://source.bazel.build/bazel/+/master:src/main/java/com/google/devtools/build/lib/bazel/rules/android/android_ndk_build_file_template.txt;l=22?q=android_ndk

@philwo
Member

philwo commented Mar 13, 2018

@aj-michael Yes, I remember that directory inputs are very very problematic. They cannot be cached, the sandbox may fall over its own feet, Bazel has no clue when stuff inside that directory changes...

Is there any way we could get rid of that directory filegroup? Maybe replace it with a glob?
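
To illustrate the failure mode discussed above, here is a minimal Python sketch, not Bazel's actual code. It assumes a hasher that only accepts regular files, so a directory passed as an input fails with an error shaped like the one in this thread, while expanding the directory into its files first succeeds. The function names and the `ndk/build/core.mk` file are hypothetical.

```python
import hashlib
import os
import tempfile


def digest_input(path):
    """Hash one declared input; refuse directories, as a file-only hasher would."""
    if os.path.isdir(path):
        # Mirrors the shape of the error in this thread; not Bazel's real code path.
        raise IOError("Input is a directory: " + path)
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def expand_directory(path):
    """Expand a directory input into a sorted list of the regular files under it."""
    files = []
    for root, _dirs, names in os.walk(path):
        for name in names:
            files.append(os.path.join(root, name))
    return sorted(files)


def demo():
    """Return (directory_input_failed, number_of_files_hashed_after_expansion)."""
    with tempfile.TemporaryDirectory() as tmp:
        ndk_build = os.path.join(tmp, "ndk", "build")  # hypothetical layout
        os.makedirs(ndk_build)
        with open(os.path.join(ndk_build, "core.mk"), "w") as f:
            f.write("all:\n")
        try:
            digest_input(ndk_build)
            failed = False
        except IOError:
            failed = True
        digests = [digest_input(p) for p in expand_directory(ndk_build)]
        return failed, len(digests)
```

Expanding the directory up front is roughly what a glob does at loading time, which is why replacing the directory filegroup with a glob is one way to avoid the error.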

@jin
Member

jin commented Mar 13, 2018

Thanks @aj-michael @philwo, let me try switching it to a glob.

@aj-michael
Contributor

@philwo - my understanding is that directory inputs are something that is (unfortunately) supported in Bazel for historical reasons. Does remote caching not work with it at all? If so, I suspect there will be other users who are broken.

Also, I wonder why our SDK integration tests do not have this problem. They have a glob with exclude_directories = 0:

https://source.bazel.build/bazel/+/master:tools/android/android_sdk_repository_template.bzl;l=63?q=android_sdk_repository

There are two reasons that we do not use a glob here:

  1. Performance - there are a lot of files, and globbing them can take 20-30 seconds
  2. Filenames contain characters that are invalid in the label syntax

Perhaps we can find a solution for 1, but I can't think of a solution for 2.

@jin
Member

jin commented Mar 13, 2018

> Filenames contain characters that are invalid in the label syntax

For example, the NDK contains many paths with the string c++, and a few with !.
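
A rough way to see how many files would block a glob is to scan the file list for the characters mentioned above. This Python sketch takes its character set (`+` and `!`) from this thread only, not from Bazel's label grammar; the function name and example paths are hypothetical.

```python
# Characters this thread reports as problematic in labels at the time.
# Taken from the comments here, not from Bazel's label grammar.
REPORTED_PROBLEM_CHARS = frozenset("+!")


def paths_blocking_glob(paths, problem_chars=REPORTED_PROBLEM_CHARS):
    """Return (path, offending characters) pairs for paths containing problem characters."""
    blocked = []
    for path in paths:
        bad = sorted(set(path) & problem_chars)
        if bad:
            blocked.append((path, "".join(bad)))
    return blocked
```

Running this over a listing of the NDK directory would give the set of paths that a glob-based filegroup would need to exclude or rename.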

@philwo
Member

philwo commented Mar 13, 2018

@ulfjack for advice. Not sure how to proceed then :(

@buchgr Can we mark individual tests as uncachable?

@jin
Member

jin commented Mar 13, 2018

There's the no-cache tag: https://docs.bazel.build/versions/master/remote-caching.html#exclude-specific-targets-from-using-the-remote-cache

However, I tried marking the integration test with this tag and it's still failing with the same error. Maybe it's not doing what I think it's doing?

@ulfjack
Contributor

ulfjack commented Mar 14, 2018

I have a partial change to properly support directory inputs, but haven't had time to finish it.

Unfortunately, due to a change by @ola-rozenfeld, actions tagged no-cache are still uploaded to the remote cache. @buchgr has a proposal for fixing that, but hasn't had time to work on it.

@ulfjack
Contributor

ulfjack commented Mar 14, 2018

It's change https://bazel-review.googlesource.com/c/bazel/+/29872 - it's incomplete because it's not returning the correct metadata and it's not adding the metadata to the MetadataHandler.

That said, even with that change, I think it'll still require a patch to RemoteSpawnCache / RemoteSpawnRunner.

@ola-rozenfeld
Contributor

> @buchgr has a proposal for fixing that, but hasn't had time to work on it.

Jakob told me yesterday he is actually working on this right now.

@buchgr
Contributor Author

buchgr commented Mar 14, 2018

There's a workaround that will let us enable it. Instead of using --experimental_remote_spawn_strategy, we can use --spawn_strategy=remote --genrule_strategy=remote --strategy=Javac=remote --strategy=Closure=remote and then tag the test with no-remote. I just tested it, and it works.

buchgr added a commit to bazelbuild/continuous-integration that referenced this issue Mar 14, 2018
@philwo
Member

philwo commented Mar 14, 2018

@buchgr I think your change accidentally disabled sandboxing on all platforms, as the RemoteSpawnRunner uses the normal LocalSpawnRunner as its fallback.

buchgr added a commit to bazelbuild/continuous-integration that referenced this issue Sep 5, 2018
@meisterT
Member

meisterT commented Feb 3, 2019

@meisterT meisterT reopened this Feb 3, 2019
buchgr added a commit to buchgr/bazel that referenced this issue Feb 5, 2019
The TreeNodeRepository served us well, but has become a bit dated over
time and it's increasingly hard to understand and to fix bugs. Additionally,
it was written with the initial intent of incrementally updating the merkle
tree between actions; however, internal performance analysis has shown
that doing so is 1) hard to implement reliably and 2) would increase
memory consumption multifold.

This is a rewrite of the code that also fixes bugs like bazelbuild#4663. The
merkle tree construction is split into two:
 - InputTree is an intermediate tree representation of a (sorted) list
   of inputs.
 - MerkleTree implements a visitor pattern over an InputTree and is
   responsible for serializing the InputTree to the protobuf merkle
   tree that's used by remote caching / execution.

Benchmarks on my local machine have shown this implementation to
consume between 2-10x less CPU time than the current implementation.
buchgr added a commit to buchgr/bazel that referenced this issue Feb 28, 2019
The TreeNodeRepository served us well, but has become a bit dated over
time and it's increasingly hard to understand and to fix bugs. Additionally,
it was written with the initial intent of incrementally updating the merkle
tree between actions; however, internal performance analysis has shown
that doing so is 1) hard to implement reliably and 2) would increase
memory consumption multifold.

This is a rewrite of the code that also fixes bugs like bazelbuild#4663. The
merkle tree construction is split into two:
 - InputTree is an intermediate tree representation of a (sorted) list
   of inputs.
 - MerkleTree implements a visitor pattern over an InputTree and is
   responsible for serializing the InputTree to the protobuf merkle
   tree that's used by remote caching / execution.

Benchmarks on my local machine have shown this implementation to
consume between 2-10x less CPU time than the current implementation.
Tests by users have shown equally encouraging speedups [1].

[1] https://groups.google.com/d/msg/bazel-discuss/wPHrqm2z8lU/mzoRF236GQAJ
buchgr added a commit to buchgr/bazel that referenced this issue Mar 4, 2019
bazel-io pushed a commit that referenced this issue Mar 8, 2019
The TreeNodeRepository served us well, but has become a bit dated over
time and it's increasingly hard to understand and to fix bugs. Additionally,
it was written with the initial intent of incrementally updating the merkle
tree between actions; however, internal performance analysis has shown
that doing so is 1) hard to implement reliably and 2) would increase
memory consumption multifold.

This is a rewrite of the code that also fixes bugs like #4663. MerkleTree
implements a visitor pattern over an InputTree and is responsible for
serializing the InputTree to the protobuf merkle tree that's used by
remote caching / execution.

Benchmarks on my local machine have shown this implementation to
consume between 2-10x less wall time than the current implementation.
Tests by users have shown equally encouraging speedups [1].

[1] https://groups.google.com/d/msg/bazel-discuss/wPHrqm2z8lU/mzoRF236GQAJ

Closes #7583.

PiperOrigin-RevId: 237421145
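
The two-stage construction described in the commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the Bazel implementation: the intermediate tree is a plain nested dict rather than an InputTree class, and the digest encoding is ad hoc rather than the protobuf format used by remote caching / execution. The function names and example paths are hypothetical.

```python
import hashlib


def build_input_tree(inputs):
    """Build a nested dict from a mapping of path -> file bytes.

    Directories are dicts and files are bytes. Paths are visited in sorted
    order, so the result does not depend on the order of the input mapping.
    """
    root = {}
    for path in sorted(inputs):
        parts = path.split("/")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = inputs[path]
    return root


def merkle_digest(node):
    """Return a hex digest for a file (bytes) or a directory (dict), bottom-up."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file\0" + node).hexdigest()
    h = hashlib.sha256(b"dir\0")
    for name in sorted(node):
        # A directory's digest covers each child's name and the child's digest.
        h.update(name.encode("utf-8") + b"\0")
        h.update(merkle_digest(node[name]).encode("ascii") + b"\0")
    return h.hexdigest()
```

Because each directory digest depends on the digests of its children, changing any file changes the root digest, and a directory input is handled by recursing into it instead of treating it as a file.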
joeleba pushed a commit to joeleba/continuous-integration that referenced this issue Jun 17, 2019
joeleba pushed a commit to joeleba/continuous-integration that referenced this issue Jun 17, 2019
joeleba pushed a commit to joeleba/continuous-integration that referenced this issue Jun 17, 2019
@buchgr buchgr removed their assignment Jan 9, 2020
@jin jin added team-Android Issues for Android team P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) team-Android Issues for Android team labels Apr 17, 2020
@github-actions

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2.5 years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Mar 16, 2023
@github-actions

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 31, 2023
@meteorcloudy meteorcloudy reopened this Apr 6, 2023
@meteorcloudy meteorcloudy added not stale Issues or PRs that are inactive but not considered stale and removed stale Issues or PRs that are stale (no activity for 30 days) labels Apr 6, 2023
@Shahid2687

INFO: From Executing genrule //src/test/java/com/google/devtools/build/android/desugar:desugar_testdata_from_directory_to_directory:
--
  | /tmp/tmp.aw3wZdZTdO /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/bazel-sandbox/1716082235553702946/execroot/io_bazel
  | [2018-02-20T10:29:55Z] /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/bazel-sandbox/1716082235553702946/execroot/io_bazel
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_1_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_4_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_5_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_6_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_2_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build
  | ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-6z37/bazel/bazel/src/test/shell/bazel/android/BUILD:71:1: Couldn't build file src/test/shell/bazel/android/android_ndk_integration_test/shard_3_of_6/test.log:  failed: Unexpected IO error.: Input is a directory: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/02e53aad341ca65ced1335c03162fe6d/execroot/io_bazel/external/androidndk/ndk/build

9 participants