Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SymlinkNode in ComputeMerkleTree()? #146

Open
k-ye opened this issue Jul 3, 2020 · 9 comments
Open

Support SymlinkNode in ComputeMerkleTree()? #146

k-ye opened this issue Jul 3, 2020 · 9 comments
Assignees

Comments

@k-ye
Copy link
Contributor

k-ye commented Jul 3, 2020

IIUC, currently the file tree produced by ComputeMerkeTree()

func ComputeMerkleTree(execRoot string, is *command.InputSpec, chunkSize int, cache filemetadata.Cache) (root digest.Digest, inputs []*chunker.Chunker, stats *Stats, err error) {
could only contain FileNode or DirectoryNode, but not SymlinkNode.

This could be very useful for Swarming's isolate support, since the .isolate file could contain symlinks. It could even contain invalid symlinks for testing purpose (@atetubou correct me if I'm wrong). As an example, I wrote this small case:

		{
			desc: "File invalid symlink included",
			input: []*inputPath{
				{path: "foo", isSymlink: true, symlinkTarget: "fooDir/foo"},  // invalid
			},
			spec: &command.InputSpec{
				Inputs: []string{"foo"},
			},
			rootDir: &repb.Directory{
				Symlinks: []*repb.SymlinkNode{{Name: "foo", Target: "fooDir/foo"}},
			},
			additionalBlobs: [][]byte{},
                         // ignore the cache stats
		},

But I got an empty repb.Directory{} instead.

From what I can see, the symlink support needs to be added at several places:

  1. filemeta.Metadata needs a symlink flag:
    type Metadata struct {
    Digest digest.Digest
    IsExecutable bool
    Err error
    }
  2. tree implementation needs to distinguish symlinks from actual files:
    type treeNode struct {
    Files map[string]*fileNode
    Dirs map[string]*treeNode
    }
    type fileNode struct {
    Chunker *chunker.Chunker
    IsExecutable bool
    EmptyDirectoryMarker bool
    }
    .
  3. VirtualInput might also need this flag:
    type VirtualInput struct {
    // The path for the input file to be staged at, relative to the ExecRoot.
    Path string
    // The byte contents of the file to be staged.
    Contents []byte
    // Whether the file should be staged as executable.
    IsExecutable bool
    // Whether the file is actually an empty directory. This is used to provide
    // empty directory inputs. When this is set, Contents and IsExecutable are
    // ignored.
    IsEmptyDirectory bool
    }

My biggest concern is around the filemeta cache. Right now it just treats invalid symlinks as an error:

if s, err := isSymlink(filename); err == nil && s {
fe.IsInvalidSymlink = true
}
. I wonder if we could instead add this invalidity info to Metadata, and let the upper level module decide how to deal with it.

Happy to contribute this feature, but I'd like to hear your opinions first. Thanks!

@k-ye
Copy link
Contributor Author

k-ye commented Jul 3, 2020

I guess this feature request somehow works against #134 ..

@ola-rozenfeld
Copy link
Contributor

We could add this support, but it would have to be optional (additional Client option that configures how symlinks should be handled). The options are: pretend that symlinks are regular files/directories (current); pretend that absolute symlinks are regular files/directories, but use a SymlinkNode for relative symlinks; use SymlinkNode for everything. The later distinction can be made on the basis of the server Capabilities; alternatively, it can be configuration, but the client will error out of server Capabilities disallow it. Also, what to do about invalid symlinks will need to be configured as well (ignore, error, or create a dangling SymlinkNode).

Yes, you're right that this would have to go all the way through to FileMetadata.
All in all, not a simple feature -- I supported it in Bazel at one point, and it was a bit annoying to get in.

Can you please expand on your current usage of the SDK and how this will help you, so that we could better prioritize? Thank you!

@k-ye
Copy link
Contributor Author

k-ye commented Jul 6, 2020

pretend that absolute symlinks are regular files/directories, but use a SymlinkNode for relative symlinks; use SymlinkNode for everything.

For Chromium Swarming's .isolate file, all the file paths are relative, so this two options are essentially the same for our usage. Besides, we'd also like to support dangling symlink nodes :-)

Can you please expand on your current usage of the SDK.

Sure. Basically, an .isolate defines a list of files we'd like to upload to RBE-CAS. So in our prototype impl, it will probably be as simple as:

rootDg, chunkers, stats, err := ComputeMerkleTree(execRoot, isFromIsolate, chunkSize, cache)
if err == nil {
    client.UploadIfMissing(ctx, chunkers...)
}
// |rootDg| should contain file, directory and *symlink* nodes.

@atetubou has made #143 so that Directory of execRoot is also included inside the chunkers.

Thanks!

@peterebden
Copy link
Contributor

So we ran across this using more or less the same code as above (hence why I sent #188). From my perspective the current behaviour is undesirable; it silently converts symlinks to regular files which can break consumers that care about the difference (which is how we found it).

To me the obvious thing I would expect is to get SymlinkNodes back that describe the original symlink (the third option above); I'm not sure that this needs to directly concern itself with whether the links are dangling or not (that seems like a concern for the author of the rules; a dangling link is potentially valid in some contexts). Noted that the rex API has this setting for whether absolute symlinks are allowed, but the current behaviour doesn't honour that (it converts to the regular file that's pointed to regardless of the destination).

@ola-rozenfeld
Copy link
Contributor

ola-rozenfeld commented Oct 1, 2020 via email

@atetubou
Copy link
Collaborator

This is now a problem in our usage in https://crbug.com/1143567.
We are handling mac test files containing
https://en.wikipedia.org/wiki/Bundle_(macOS)
and test fails if we send regular file instead of symlink.

@peterebden do you still plan to work for this? If not, I will work for this issue.

@atetubou
Copy link
Collaborator

If we won't get response from @peterebden before Nov 4th (JST), I'll ask @k-ye to work for this.

@k-ye
Copy link
Contributor Author

k-ye commented Nov 5, 2020

Let me take this over.

@peterebden
Copy link
Contributor

Apologies for the delay - I haven't had any time to look into this further (and probably won't soon unfortunately). Thanks for picking it up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants