Add support for symbolic links #18

juergbi · 2018-07-02T08:22:27Z

We want to make the Remote Execution API useful for building arbitrary packages using their existing build systems. Symbolic links are commonly used on POSIX systems for various purposes. Package build systems may rely on them and packages may create symlinks in their build output directory.

See also https://docs.google.com/document/d/1gnOYszitgrLVet3sQk-TKGqIcpkkDsc6aw-izoo-d64/edit#

googlebot · 2018-07-02T08:22:29Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

illicitonion · 2018-07-03T23:55:21Z

build/bazel/remote/execution/v2/remote_execution.proto

+  // it can be an absolute path starting with `/`. Absolute paths are
+  // interpreted relative to the input root directory, i.e., they are not
+  // allowed to escape the input space. The canonical form forbids the
+  // substrings `/./` and `//` in the target path. `..` components are allowed


Can we restrict `..` to only be allowed to be leading components, and only for relative symlinks?

I'm looking to avoid two things:

`foo/../bar` which is redundant and non-canonical - this feels trivial for callers to avoid, so should be forbidden

`/../foo` which is not allowed because it would escape the input root directory - explicitly saying both "they are not allowed to escape the input space" and "`..` components are allowed anywhere" seems to open up an ambiguity that we can avoid.

The text I would suggest is "The canonical form forbids the substrings `/./` and `//` in the target path. `..` components are allowed only in relative paths, and only as leading components in the target path, i.e. any substring `../` may only be preceded by 0 or more `../` literals."
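A minimal sketch of how a client or server might check the wording suggested above, assuming POSIX-style targets. The function name `is_canonical_symlink_target` is hypothetical, and rejecting an empty target is an assumption that the suggested text does not cover.

```python
def is_canonical_symlink_target(target: str) -> bool:
    """Check a POSIX-style symlink target against the wording suggested above."""
    if not target:
        # Assumption: an empty target is never canonical.
        return False
    # The canonical form forbids these substrings anywhere in the target.
    if "/./" in target or "//" in target:
        return False
    components = target.split("/")
    if target.startswith("/"):
        # Under this proposal, absolute targets may not contain `..` at all.
        return ".." not in components
    # Relative targets: every `..` must come before any other component.
    seen_other = False
    for component in components:
        if component == "..":
            if seen_other:
                return False
        else:
            seen_other = True
    return True
```

With this rule, `../../foo` and `/a/foo` pass, while `foo/../bar` and `/../foo` are rejected.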

juergbi · 2018-07-04

> `/../foo` which is not allowed because it would escape the input root directory - explicitly saying both "they are not allowed to escape the input space" and "`..` components are allowed anywhere" seems to open up an ambiguity that we can avoid.

Following POSIX semantics, `/../foo` would be identical to `/foo` and thus also not escape the input root directory. However, as this is trivial to canonicalize, I agree that we should not allow symlink targets starting with `/../`.

> `foo/../bar` which is redundant and non-canonical - this feels trivial for callers to avoid, so should be forbidden

This is not as simple as it may seem. If `foo` is a symlink to a directory, `foo/../bar` is not guaranteed to be equivalent to `bar` (`..` must be resolved after following `foo`). And `foo` might even be a dangling symlink that becomes valid as part of the build process. To be on the safe side with regard to compatibility, I prefer allowing `..` components in symlink targets.
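The difference between lexical normalization and real path resolution can be reproduced on a POSIX system. The sketch below is illustrative only: the directory layout (`d/sub`, `d/bar`, `bar`) and the helper name `resolve_both_ways` are made up for the demonstration.

```python
import os
import posixpath
import tempfile


def resolve_both_ways():
    """Read `foo/../bar` via the filesystem and via lexical normalization."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "d", "sub"))
        with open(os.path.join(root, "d", "bar"), "w") as f:
            f.write("inner")
        with open(os.path.join(root, "bar"), "w") as f:
            f.write("outer")
        # foo is a symlink to the directory d/sub.
        os.symlink(os.path.join("d", "sub"), os.path.join(root, "foo"))

        # The filesystem follows foo first, so `..` means d, not root.
        with open(os.path.join(root, "foo", "..", "bar")) as f:
            via_filesystem = f.read()

        # Lexical normalization collapses foo/.. without looking at the disk.
        lexical = posixpath.normpath("foo/../bar")  # "bar"
        with open(os.path.join(root, lexical)) as f:
            via_normpath = f.read()

        return via_filesystem, via_normpath
```

Here the two reads return different files, which is why rewriting `foo/../bar` to `bar` is not a safe canonicalization. By contrast, `posixpath.normpath("/../foo")` gives `/foo`, matching the POSIX behavior described for a leading `/../`.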

ola-rozenfeld · 2018-07-07

OK, I re-read the whole discussion in the document comments about absolute paths now, and recall the problems. I'm still not sure whether to allow absolute paths in general, and only block them on a per-platform or per-server basis. I'm not a big fan of treating absolute paths as relative to the input root, because that is very confusing, and I'm not sure what problem it will solve -- packages that create absolute symlinks would still need to be heavily amended, right?
Suggestion: how about we simply disallow absolute paths for now, and add them later (it is a non-breaking change) when particular use-cases arise?

juergbi · 2018-07-12

> packages that create absolute symlinks would still need to be heavily amended, right?

BuildStream will use workers that build in sandboxes where the input root is the filesystem root and thus, packages can use absolute symlinks without any modifications. I understand that this doesn't apply to other platforms, which is why we propose to leave this decision to each platform.

If this is not currently acceptable, we could definitely follow your suggestion and disallow absolute paths for now and add them later after more discussion.

ola-rozenfeld · 2018-07-12

Bleh. Okay, how about the server returning something like:

// Describes how the server treats absolute symlink targets.
enum SymlinkAbsolutePathStrategy {
  UNKNOWN = 0;
  // Server will return an INVALID_ARGUMENT on symlinks with absolute targets.
  DISALLOWED = 1;
  // Server will allow symlink targets to escape the input root tree, possibly resulting in
  // non-hermetic builds.
  ALLOWED = 2;
  // Server will treat absolute symlink targets as relative to the input root, i.e. for "/a/foo"
  // an absolute symlink "/worker123/.../input_rootXYZ/a/foo" will be created.
  INPUT_ROOT = 3;
}
message CacheCapabilities {
  ...
  SymlinkAbsolutePathStrategy symlink_absolute_path_strategy = 5;
}

And then you can just refer to that enum in the comments.

Although, wait, that affects the client behavior as well, for symlinks returned from the server. In the ALLOWED case, the client should just symlink the exact same absolute target non-hermetically; for INPUT_ROOT, the client should do the same translation? But bleh, that is just so non-intuitive...
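For illustration, the translation that the `INPUT_ROOT` value would imply can be sketched as follows. This is only a sketch of the idea under discussion, not an agreed behavior; the function name `rebase_absolute_target` and the example input root path are hypothetical.

```python
import posixpath


def rebase_absolute_target(target: str, input_root: str) -> str:
    """Reinterpret an absolute symlink target as relative to the input root."""
    if not target.startswith("/"):
        # Relative targets are left untouched.
        return target
    # Drop the leading slashes so that join() does not discard input_root.
    return posixpath.join(input_root, target.lstrip("/"))
```

For example, `rebase_absolute_target("/a/foo", "/work/input_root")` gives `/work/input_root/a/foo`. A client that materializes such a symlink from an action result would have to apply the same translation against its own root, which is the non-intuitive part noted above.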

Wait, we already allow setting the working directory in the API. What if we restrict the enum to just allow/disallow for now, and BuildStream just sets the working directory to "/" for its actions? In that case, allowing absolute paths is the same as treating them as input root, right?

alercah · 2018-07-12

I'm going to chime in: I think that `ALLOWED` and `DISALLOWED` should be the only behaviours allowed; `INPUT_ROOT`, I think, has too much ambiguity to it. If we require the server to reinterpret symlinks in any way, the behaviour could end up depending on whether the server creates the symlink as an absolute or relative path (because the symlinks `/input_root/foo` and `../foo` are semantically different, even if they both appear in `/input_root/bar`; and if `/input_root` is a symlink to `/real_root`, is the server expected to make a link to `/real_root/foo`?). I worry that trying to specify exactly how they should work is going to get confusing, especially on Windows, where path handling is very complicated.

I see concerns about symlinks escaping the input root, but I can't say that I really understand them. There's plenty of other ways that commands can escape the input root, and I don't think a command that runs /usr/bin/gcc is any less dependent on the worker environment than one that runs ./gcc which is a symlink to /usr/bin/gcc, say.

As for canonicalization, realistically, it is more strongly a set of best practices than hard-and-fast rules. The advantage to encoding them into the API, rather than letting the client do what is best, is merely to let the server detect and reject non-canonical actions. If the server does not do this, then the rule makes no difference (and by Hyrum's law, we really shouldn't have some but not all servers rejecting non-canonical paths); it just becomes something that the client can manage. We could even imagine a "canonical enforcement proxy" which proxies requests, after verifying that they meet the canonicalization requirements.

I personally do not feel at all comfortable trying to specify canonicalization requirements for Windows. For POSIX, I think that we could get away with the rules about // and /./. It may be easier to explicitly frame the requirements here as guidelines, rather than hard rules for the API. And honestly, that should be considered for all canonicalization requirements unless all the implementers are enforcing them or prepared to take that on as a priority.

juergbi · 2018-07-12

The working directory does not influence the resolution of absolute symlinks. However, as the workers for BuildStream will set the filesystem root to the input root, allowing absolute paths is the same as treating them as relative to the input root. I.e., restricting the enum to just allow/disallow works for us. Let me know if I shall add this enum to the commit.

ola-rozenfeld · 2018-07-12

> Let me know if I shall add this enum to the commit.

Yes, please! Even though we are effectively making it be a boolean, an enum allows us to revisit this decision later on.

juergbi · 2018-07-16

I've updated the branch based on the comments above. I've prefixed the enum values with `ABSOLUTE_SYMLINK_` to avoid a conflict with `UNKNOWN` in `DigestFunction`, as reported by protoc. Let me know if I should use a different name as prefix or if the conflict should be resolved in a different way.

googlebot · 2018-07-04T17:21:40Z

CLAs look good, thanks!

ola-rozenfeld · 2018-07-12T15:18:47Z

Ping!

ola-rozenfeld · 2018-07-16T13:42:05Z

build/bazel/remote/execution/v2/remote_execution.proto

@@ -1231,6 +1249,18 @@ message PriorityCapabilities {
  repeated PriorityRange priorities = 1;
 }

+// Describes how the server treats absolute symlink targets.
+enum SymlinkAbsolutePathStrategy {
+  ABSOLUTE_SYMLINK_UNKNOWN = 0;


Grrr, C++ and its stupid scoping rules! How about we move the enum under CacheCapabilities message instead? Our current enums ignore the C++ use case in their naming, and I'd like to be consistent.

juergbi · 2018-07-16

Fine by me. Update pushed.

ola-rozenfeld · 2018-07-16T13:46:19Z

build/bazel/remote/execution/v2/remote_execution.proto

+enum SymlinkAbsolutePathStrategy {
+  ABSOLUTE_SYMLINK_UNKNOWN = 0;
+
+  // Server will return an INVALID_ARGUMENT on symlinks with absolute targets.


Actually, how about:

// Server will return an INVALID_ARGUMENT on input symlinks with absolute targets.
// If an action tries to create an output symlink with an absolute target, a
// FAILED_PRECONDITION will be returned.

juergbi · 2018-07-16

This clarification makes sense. Applied.
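A rough sketch of the behavior described by the clarified comment, using the unprefixed value names from earlier in the thread. The Python enum only mirrors those names, `check_symlink_target` is a hypothetical helper, and treating `UNKNOWN` like `DISALLOWED` is an assumption rather than something the thread specifies.

```python
from enum import Enum


class SymlinkAbsolutePathStrategy(Enum):
    UNKNOWN = 0
    DISALLOWED = 1
    ALLOWED = 2


def check_symlink_target(target: str,
                         strategy: SymlinkAbsolutePathStrategy,
                         is_output: bool) -> str:
    """Return the status code name a server might use for this symlink."""
    if not target.startswith("/"):
        # Relative targets are not affected by the strategy.
        return "OK"
    if strategy is SymlinkAbsolutePathStrategy.ALLOWED:
        return "OK"
    # Assumption: UNKNOWN is handled conservatively, like DISALLOWED.
    # Input symlinks are rejected up front; output symlinks fail the action.
    return "FAILED_PRECONDITION" if is_output else "INVALID_ARGUMENT"
```

The two error codes separate a malformed request (an input symlink the client should not have sent) from an action that ran but produced an output the server does not accept.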

ola-rozenfeld · 2018-07-16T14:11:01Z

@alercah @illicitonion PTAL, thank you!
After this goes in, I will make the official 2.0 release.

illicitonion · 2018-07-17

Looks great :) Thanks!

ola-rozenfeld · 2018-09-18T18:08:54Z

A couple of questions (better late than never):

In the design doc you wrote "Due to this purely symbolic nature of symlinks, it doesn't make sense to include a digest of the target file or directory in the SymlinkNode". But doesn't it? What if the symlink pointed outside of the working directory, and the contents of the file changed, for example symlinks were used to point to some heavyweight tools installed in some system location, and those tools were updated? Isn't it really important to invalidate our cache key as a result? I guess what I'm suggesting is to include the Digest, and use it as a precondition check -- error out (with FAILED_PRECONDITION) if the actual Digest doesn't match the specified one, both on server and client. WDYT?

Secondly, I just discovered that apparently some people generate symlink outputs in their genrules. And yes, I don't mean directories, I mean actual files that are symlinks. We probably want to add an OutputSymlink in addition to OutputFile and OutputDirectory -- either that or add a field is_symlink in both these messages. WDYT?

juergbi · 2018-09-18T18:58:01Z

> What if the symlink pointed outside of the working directory, and the contents of the file changed, for example symlinks were used to point to some heavyweight tools installed in some system location, and those tools were updated?

That's equivalent to specifying the path to a tool outside the input root as a command or in a script. If the exact version of a tool is significant, it should either be part of the input space or covered by platform constraints, in my opinion. Using tools outside the input root is possible with and without symlinks (if the platform allows escaping the input root at all). I don't see why we should treat a path in a symlink target differently from a path in other places.

Am I overlooking a use case where this would be required?

> I guess what I'm suggesting is to include the Digest, and use it as a precondition check -- error out (with FAILED_PRECONDITION) if the actual Digest doesn't match the specified one, both on server and client.

I think we should avoid significantly deviating from the common symlink semantics. Otherwise we introduce incompatibilities. E.g., symlinks may point to a parent directory, which would make it impossible to calculate a digest for that directory (cyclic dependency). Symlinks are also allowed to be dangling. And this new requirement could slow down some things as the server would be forced to check/update digests of symlinks in the output tree even in subtrees where there were no real changes.
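The cyclic-dependency problem can be illustrated with a naive digest that follows symlinks. This is a sketch only: the hashing scheme and the names `digest_following_symlinks` and `CycleError` are made up for the example and are not how the API computes digests.

```python
import hashlib
import os
import tempfile


class CycleError(Exception):
    """Raised when a directory would need its own digest to be computed."""


def digest_following_symlinks(path, _active=None):
    """Naively hash a tree, following symlinks into their targets."""
    active = _active if _active is not None else set()
    real = os.path.realpath(path)
    if os.path.isdir(real):
        if real in active:
            # The digest of this directory depends on itself.
            raise CycleError(real)
        active.add(real)
        h = hashlib.sha256()
        for name in sorted(os.listdir(real)):
            h.update(name.encode())
            child = digest_following_symlinks(os.path.join(real, name), active)
            h.update(child.encode())
        active.remove(real)
        return h.hexdigest()
    # A dangling symlink would fail here with FileNotFoundError.
    with open(real, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo_cycle():
    """Build a tree whose symlink points at its parent, then try to hash it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "pkg"))
        os.symlink("..", os.path.join(root, "pkg", "up"))
        try:
            digest_following_symlinks(root)
        except CycleError:
            return True
        return False
```

Storing only the target string in the symlink node avoids both failure modes, since nothing about the target has to exist or be hashed.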

> Secondly, I just discovered that apparently some people generate symlink outputs in their genrules. And yes, I don't mean directories, I mean actual files that are symlinks. We probably want to add an OutputSymlink in addition to OutputFile and OutputDirectory -- either that or add a field is_symlink in both these messages. WDYT?

Yes, OutputSymlink sounds useful. In BuildStream we always use whole directories as output, however, other clients might indeed want that.

ola-rozenfeld · 2018-09-19T19:21:39Z

Good point on symlinks being misused in this way, let me find out some more about that particular use case, and if it can be addressed differently.

Created #28 for the OutputSymlink. I can't add you as a reviewer for some reason, but please take a look! Thank you!

googlebot added the cla: no label Jul 2, 2018

illicitonion suggested changes Jul 3, 2018

juergbi force-pushed the symlinks branch 2 times, most recently from 7a70a77 to 9b73828 Compare July 4, 2018 17:21

googlebot added the cla: yes label and removed the cla: no label Jul 4, 2018

juergbi force-pushed the symlinks branch from 9b73828 to ce0a7e8 Compare July 16, 2018 13:14

ola-rozenfeld reviewed Jul 16, 2018

Add support for symbolic links

52fa3b8

juergbi force-pushed the symlinks branch from ce0a7e8 to 52fa3b8 Compare July 16, 2018 13:57

ola-rozenfeld approved these changes Jul 16, 2018

ola-rozenfeld mentioned this pull request Jul 16, 2018

Importing remote-apis repository into Bazel for Remote API v2. bazelbuild/bazel#5605

Closed

illicitonion approved these changes Jul 17, 2018

ola-rozenfeld merged commit c1c1ad2 into bazelbuild:master Jul 17, 2018

santigl pushed a commit to santigl/remote-apis that referenced this pull request Aug 26, 2020

Adding DownloadActionResult function to intermediate layer. (bazelbui…

91c93c1

…ld#18) * Adding DownloadActionResult function to intermediate layer. * Addressing comments. * Fix breakage.
