Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add directory support for remote caching and execution #4011

Closed
wants to merge 5 commits into from
Closed

add directory support for remote caching and execution #4011

wants to merge 5 commits into from

Conversation

hchauvin
Copy link
Contributor

@hchauvin hchauvin commented Nov 2, 2017

Add support for directory trees as artifacts

@bazel-io
Copy link
Member

bazel-io commented Nov 2, 2017

Can one of the admins verify this patch?

@hchauvin
Copy link
Contributor Author

hchauvin commented Nov 2, 2017

#3891

This commit makes artifact directories usable in a remote context both for caching and execution. I realise this is quite an unwieldy commit, but it resolves all the TODOs I found in the code revolving around directories in com.google.devtools.build.lib.remote. I also took this as an opportunity to add some unit tests for the SimpleBlobStoreActionCache, which this commit modifies.

Basically, this is simply an implementation hinted at by @buchgr augmented by some changes in TreeNodeRepository.

One thing I am not happy with if the necessity of stating all the input files to check whether they are directories or not, this information, to my knowledge, not being provided anywhere else (however, some TODOs algorithms point at some improvements to the diffing algorithm, but this is out of the scope of this commit). I also don't have enough knowledge of the code base to know whether everything is optimized as it could be.

@katre katre added category: service APIs type: feature request P2 We'll consider working on this in future. (Assignee optional) labels Nov 2, 2017
@buchgr buchgr self-requested a review November 3, 2017 22:43
@buchgr
Copy link
Contributor

buchgr commented Nov 3, 2017

wooohoooo 🎆 ... let me review this 😀

@buchgr
Copy link
Contributor

buchgr commented Nov 3, 2017

OK to test

@buchgr buchgr self-assigned this Nov 3, 2017
@hchauvin
Copy link
Contributor Author

hchauvin commented Nov 4, 2017

Test //src/test/shell/bazel:empty_package_test failed on Darwin because: "Xcode version must be specified to use an Apple CROSSTOOL". Test //scripts/release:release_test timed out. Both pass locally.

}
} else {
downloadBlob(digest, path);
System.err.println("DOWNLOADED path " + path + ": " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was testing this because I need it too :) I think you want to pull this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks I completely forgot this one!

@rahul-malik
Copy link
Contributor

@buchgr @hchauvin - Whats the current status of this PR?

@kamalmarhubi
Copy link
Contributor

Also curious. It'd be great if this could land in 0.9 or 0.10!

bazel-io pushed a commit that referenced this pull request Nov 29, 2017
@ulfjack @buchgr  - I'm resubmitting #3984 on behalf of @mterring to get past CLA issues that are holding it up from merging.

This is a temporary fix for the issue #3891 while we wait for #4011 to be reviewed and tested

Closes #4188.

PiperOrigin-RevId: 177276751
ochafik pushed a commit to ochafik/bazel that referenced this pull request Nov 29, 2017
@ulfjack @buchgr  - I'm resubmitting bazelbuild#3984 on behalf of @mterring to get past CLA issues that are holding it up from merging.

This is a temporary fix for the issue bazelbuild#3891 while we wait for bazelbuild#4011 to be reviewed and tested

Closes bazelbuild#4188.

PiperOrigin-RevId: 177276751
@buchgr
Copy link
Contributor

buchgr commented Dec 1, 2017

I apologize I lost track of it. I ll review it Monday.

@rahul-malik @kamalmarhubi did you test it on your iOS project with remote caching? Does it work?

@hchauvin
Copy link
Contributor Author

hchauvin commented Dec 2, 2017

Hi, thanks for the update @buchgr.

@rahul-malik @kamalmarhubi thanks for the interest. Currently we're using it in production for a workspace that uses lots of target directories, both for remote caching and remote execution. Using directories is the only sane way we found for dealing with external npm dependencies (JavaScript) for our frontend (whether it is a good idea or not can be discussed, but it is another subject :)), so we cache whole node_modules folders without problem.

I followed Jakob's blueprint on how this should be implemented using Merkle trees and the remote API. I took this as an opportunity to increase test coverage and to deduplicate a bit the code for remote caching and execution in order to not have the logic for dealing with directories in multiple places.

Things that might need some input:

  • Symlinks. Currently, I don't follow them when caching directories. I isolated the policy in a constant, TreeNodeRepository.symlinkPolicy, to make that clear. In practice, this is not likely to be a problem because if your project has symlinks going out of target directories, you should do something concerning the way the project is organized. I don't know anyway whether doing something other than Symlinks.NOFOLLOW makes sense for directory caching. It's also worth pointing out that if the symlinks are "local" to the cached directory, nothing is broken. If the symlinks exit the cached directory... it depends :)
  • Speed. Testing whether each input file is a directory or not has a cost. I don't see any noticeable slowdown however, and when doing remote caching checking the stats of local files is probably not your biggest concern.

@hchauvin
Copy link
Contributor Author

hchauvin commented Dec 2, 2017

Thinking about it the change might be "big" enough to warrant writing an integration test in src/test/shell/bazel/remote_execution_test.sh, what do you think @buchgr ?

@buchgr
Copy link
Contributor

buchgr commented Dec 3, 2017

@hchauvin Yes please ... an integration test would be great :-)!

Copy link
Contributor

@buchgr buchgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! This is great work @hchauvin! Thank you.

* and {@link GrpcRemoteCache}.
*/
@ThreadSafety.ThreadSafe
public abstract class RemoteActionCacheBase implements RemoteActionCache {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that you could just remove the RemoteActionCache interface and rename RemoteActionCacheBase to AbstractRemoteActionCache (it's a naming convention used throughout Bazel).

Having an abstract class with abstract upload/download methods seems like a good abstraction, that renders the interface unnecessary.

@@ -188,7 +188,14 @@ private ActionResult execute(Action action, Path execRoot)
FileSystemUtils.createDirectoryAndParents(file.getParentDirectory());
outputs.add(file);
}
// TODO(olaola): support output directories.
for (String output : action.getOutputDirectoriesList()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An integration testing both the client and remote worker implementation would be great!

if (digest == null) {
// If the artifact does not have a digest, it is because it is a directory.
// We get the digest from the set of Merkle hashes computed in this TreeNodeRepository.
return Preconditions.checkNotNull(inputDirectoryDigestCache.get(input));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a message stating what went wrong. makes errors easier to debug.

@@ -105,15 +122,22 @@ public int hashCode() {
}

// Should only be called by the TreeNodeRepository.
private TreeNode(Iterable<ChildEntry> childEntries) {
this.actionInput = null;
private TreeNode(Iterable<ChildEntry> childEntries, ActionInput actionInput) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: @Nullable

List<Dirent> sortedDirent = new ArrayList<>(path.readdir(symlinkPolicy));
sortedDirent.sort(Comparator.comparing(Dirent::getName));

List<TreeNode.ChildEntry> entries = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: initialize with sortedDirent.size().

@hchauvin
Copy link
Contributor Author

hchauvin commented Dec 5, 2017

@buchgr I took your comments, added 3 integration tests and rebased.

function test_directory_artifact() {
set_directory_artifact_testfixtures

bazel --host_jvm_args=-Dbazel.DigestFunction=SHA1 build \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should no longer be needed after you rebase.

@hchauvin
Copy link
Contributor Author

hchauvin commented Dec 6, 2017

@buchgr got it. Rebased again and removed digest function.

@buchgr
Copy link
Contributor

buchgr commented Dec 8, 2017

Thanks again! It's in the internal review! Should be exported soon!

@kamalmarhubi
Copy link
Contributor

@hchauvin this is awesome!

@buchgr any chance this could be cherry picked into 0.9? Would make a great holiday gift... :-)

@rahul-malik
Copy link
Contributor

@buchgr - Did this pass internal review?

@bazel-io bazel-io closed this in 3d0a04d Dec 20, 2017
@buchgr
Copy link
Contributor

buchgr commented Dec 20, 2017

Sorry for the delay. It's merged finally! Will be in 0.10.0

werkt pushed a commit to werkt/bazel that referenced this pull request Mar 2, 2018
Add support for directory trees as artifacts. Closes bazelbuild#4011.

PiperOrigin-RevId: 179691001
werkt pushed a commit to werkt/bazel that referenced this pull request Mar 2, 2018
Add support for directory trees as artifacts. Closes bazelbuild#4011.

PiperOrigin-RevId: 179691001
luca-digrazia pushed a commit to luca-digrazia/DatasetCommitsDiffSearch that referenced this pull request Sep 4, 2022
    @ulfjack @buchgr  - I'm resubmitting bazelbuild/bazel#3984 on behalf of @mterring to get past CLA issues that are holding it up from merging.

    This is a temporary fix for the issue bazelbuild/bazel#3891 while we wait for bazelbuild/bazel#4011 to be reviewed and tested

    Closes #4188.

    PiperOrigin-RevId: 177276751
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes P2 We'll consider working on this in future. (Assignee optional) type: feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants