New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for symbolic links #18
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here (e.g. What to do if you already signed the CLAIndividual signers
Corporate signers
|
1 similar comment
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here (e.g. What to do if you already signed the CLAIndividual signers
Corporate signers
|
// it can be an absolute path starting with `/`. Absolute paths are | ||
// interpreted relative to the input root directory, i.e., they are not | ||
// allowed to escape the input space. The canonical form forbids the | ||
// substrings `/./` and `//` in the target path. `..` components are allowed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we restrict ..
to only be allowed to be leading components, and only for relative symlinks?
I'm looking to avoid two things:
foo/../bar
which is redundant and non-canonical - this feels trivial for callers to avoid, so should be forbidden/../foo
which is not allowed because it would escape the input root directory - explicitly saying both "they are not allowed to escape the input space" and "..
components are allowed anywhere" seems to open up an ambiguity that we can avoid.
The text I would suggest is "The canonical form forbids the substrings /./
and //
in the target path. ..
components are allowed only in relative paths, and only as leading components in the target path, i.e. any substring ../
may only be preceded by 0 or more ../
literals."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/../foo
which is not allowed because it would escape the input root directory - explicitly saying both "they are not allowed to escape the input space" and ".. components are allowed anywhere" seems to open up an ambiguity that we can avoid.
Following POSIX semantics, /../foo
would be identical to /foo
and thus also not escape the input root directory. However, as this is trivial to canonicalize, I agree that we should not allow symlink targets starting with /../
.
foo/../bar
which is redundant and non-canonical - this feels trivial for callers to avoid, so should be forbidden
This is not as simple as it may seem. If foo
is a symlink to a directory, foo/../bar
is not guaranteed to be equivalent to bar
(..
must be resolved after following foo
). And foo
might even be a dangling symlink that becomes valid as part of the build process. To be on the safe side with regards to compatibility I prefer allowing ..
components in symlink targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I re-read the whole discussion in the document comments about absolute paths now, and recall the problems. I'm still not sure whether to allow absolute paths in general, and only block them on a per-platform or per-server basis. I'm not a big fan of treating absolute paths as relative to the input root, because that is very confusing, and I'm not sure what problem will it solve -- packages that create absolute symlinks would still need to be heavily amended, right?
Suggestion: how about we simply disallow absolute paths for now, and will add them later (it is a non-breaking change) when particular use-cases arise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
packages that create absolute symlinks would still need to be heavily amended, right?
BuildStream will use workers that build in sandboxes where the input root is the filesystem root and thus, packages can use absolute symlinks without any modifications. I understand that this doesn't apply to other platforms, which is why we propose to leave this decision to each platform.
If this is not currently acceptable, we could definitely follow your suggestion and disallow absolute paths for now and add them later after more discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bleh. Okay, how about the server returning something like:
// Describes how the server treats absolute symlink targets.
enum SymlinkAbsolutePathStrategy {
UNKNOWN = 0;
// Server will return an INVALID_ARGUMENT on symlinks with absolute targets.
DISALLOWED = 1;
// Server will allow symlink targets to escape the input root tree, possibly resulting in
// non-hermetic builds.
ALLOWED = 2;
// Server will treat absolute symlink targets as relative to the input root, i.e. for "/a/foo"
// an absolute symlink "/worker123/.../input_rootXYZ/a/foo" will be created.
INPUT_ROOT = 3;
}
message CacheCapabilities {
...
SymlinkAbsolutePathStrategy symlink_absolute_path_strategy = 5;
}
And then you can just refer to that enum in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, wait, that affects the client behavior as well, for symlinks returned from the server. In the ALLOWED case, the client should just symlink the exact same absolute target non-hermetically; for INPUT_ROOT, the client should do the same translation? But bleh, that is just so non-intuitive...
Wait, we already allow setting the working directory in the API. What if we restrict the enum to just allow/disallow for now, and BuildStream just sets the working directory to "/" for its actions? In that case, allowing absolute paths is the same as treating them as input root, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to chime in and I think that ALLOWED and DISALLOWED should be the only behaviours allowed; INPUT_ROOT I think has too much ambiguity to it. If we require the server to reinterpret symlinks in any way, the behaviour could end up depending on whether the server creates the symlink as an absolute or relative path (because the symlink /input_root/foo
and ../foo
are semantically different, even if they both appear in /input_root/bar
, and if /input_root
is a symlink to /real_root
, is the server expected to make a link to /real_root/foo
?). I worry that trying to specify exactly how they should work is going to get confusing especially on Windows where path handling is very complicated.
I see concerns about symlinks escaping the input root, but I can't say that I really understand them. There's plenty of other ways that commands can escape the input root, and I don't think a command that runs /usr/bin/gcc
is any less dependent on the worker environment than one that runs ./gcc
which is a symlink to /usr/bin/gcc
, say.
As for canonicalization, realistically, it is more strongly a set of best practices than hard-and-fast rules. The advantage to encoding them into the API, rather than letting the client do what is best, is merely to let the server detect and reject non-canonical actions. If the server does not do this, then the rule makes no difference (and by Hyrum's law, we really shouldn't have some but not all servers rejecting non-canonical paths); it just becomes something that the client can manage. We could even imagine a "canonical enforcement proxy" which proxies requests, after verifying that they meet the canonicalization requirements.
I personally do not feel at all comfortable trying to specify canonicalization requirements for Windows. For POSIX, I think that we could get away with the rules about //
and /./
. It may be easier to explicitly frame the requirements here as guidelines, rather than hard rules for the API. And honestly, that should be considered for all canonicalization requirements unless all the implementers are enforcing them or prepared to take that on as a priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The working directory does not influence the resolution of absolute symlinks. However, as the workers for BuildStream will set the filesystem root to the input root, allowing absolute paths is the same as treating them as relative to the input root. I.e., restricting the enum to just allow/disallow works for us. Let me know if I shall add this enum to the commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if I shall add this enum to the commit.
Yes, please! Even though we are effectively making it be a boolean, an enum allows us to revisit this decision later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the branch based on the comments above. I've prefixed the enum values with ABSOLUTE_SYMLINK_
to avoid conflict with UNKNOWN
in DigestFunction
as reported by protoc
. Let me know if I should use a different name as prefix or if the conflict should be resolved in a different way.
7a70a77
to
9b73828
Compare
CLAs look good, thanks! |
1 similar comment
CLAs look good, thanks! |
Ping! |
@@ -1231,6 +1249,18 @@ message PriorityCapabilities { | |||
repeated PriorityRange priorities = 1; | |||
} | |||
|
|||
// Describes how the server treats absolute symlink targets. | |||
enum SymlinkAbsolutePathStrategy { | |||
ABSOLUTE_SYMLINK_UNKNOWN = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grrr, C++ and its stupid scoping rules! How about we move the enum under CacheCapabilities message instead? Our current enums ignore the C++ use case in their naming, and I'd like to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me. Update pushed.
enum SymlinkAbsolutePathStrategy { | ||
ABSOLUTE_SYMLINK_UNKNOWN = 0; | ||
|
||
// Server will return an INVALID_ARGUMENT on symlinks with absolute targets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, how about:
// Server will return an INVALID_ARGUMENT on input symlinks with absolute targets.
// If an action tries to create an output symlink with an absolute target, a
// FAILED_PRECONDITION will be returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This clarification makes sense. Applied.
@alercah @illicitonion PTAL, thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great :) Thanks!
A couple of questions (better late than never): In the design doc you wrote "Due to this purely symbolic nature of symlinks, it doesn't make sense to include a digest of the target file or directory in the Secondly, I just discovered that apparently some people generate symlink outputs in their genrules. And yes, I don't mean directories, I mean actual files that are symlinks. We probably want to add an OutputSymlink in addition to OutputFile and OutputDirectory -- either that or add a field is_symlink in both these messages. WDYT? |
That's equivalent to specifying the path to a tool outside the input root as a command or in a script. If the exact version of a tool is significant, it should either be part of the input space or covered by platform constraints, in my opinion. Using tools outside the input root is possible with and without symlinks (if the platform allows escaping the input root at all). I don't see why we should treat a path in a symlink target differently from a path in other places. Am I overlooking a use case where this would be required?
I think we should avoid significantly deviating from the common symlink semantics. Otherwise we introduce incompatibilities. E.g., symlinks may point to a parent directory, which would make it impossible to calculate a digest for that directory (cyclic dependency). Symlinks are also allowed to be dangling. And this new requirement could slow down some things as the server would be forced to check/update digests of symlinks in the output tree even in subtrees where there were no real changes.
Yes, OutputSymlink sounds useful. In BuildStream we always use whole directories as output, however, other clients might indeed want that. |
Good point on symlinks being misused in this way, let me find out some more about that particular use case, and if it can be addressed differently. Created #28 for the OutputSymlink. I can't add you as a reviewer for some reason, but please take a look! Thank you! |
…ld#18) * Adding DownloadActionResult function to intermediate layer. * Addressing comments. * Fix breakage.
We want to make the Remote Execution API useful for building arbitrary packages using their existing build systems. Symbolic links are commonly used on POSIX systems for various purposes. Package build systems may rely on them and packages may create symlinks in their build output directory.
See also https://docs.google.com/document/d/1gnOYszitgrLVet3sQk-TKGqIcpkkDsc6aw-izoo-d64/edit#