Asset API: Use of Directory instead of Tree for PushDirectory #165

Qinusty · 2020-08-27T14:45:44Z

The use of the Directory instead of Tree for Push/Fetch Directory leads to additional overhead on each Fetch which could instead be performed once by the client when pushing the Directories to the CAS, or once by the Fetch Service when performing an optional remote fetch.

This overhead is made clear when attempting to ensure that the full directory Tree under the root_directory_digest is present in the CAS before responding to a FetchDirectory request.

Whilst verifying the existence of these files is not a requirement within the spec, I have encountered issues with clients falling over following a Fetch request which returned a digest to an item which isn't in the CAS. Is this something clients should handle, or should servers ensure that the digests within ActionResults exist within the CAS?

It appears that a similar change occurred during the early days of the RE spec. f85ddf0#diff-23d388eca6738e626128af49c6c8f0e5R745-R750

sstriker · 2020-08-28T23:21:40Z

Thanks for engaging @Qinusty.

> The use of the Directory instead of Tree for Push/Fetch Directory leads to additional overhead on each Fetch which could instead be performed once by the client when pushing the Directories to the CAS, or once by the Fetch Service when performing an optional remote fetch.

The Remote Asset API Push/FetchDirectory calls are deliberately operating on directory digests. The reasoning here is that the API is very likely used for inputs (read: source trees). Source trees tend to leave large portions of a tree untouched when observed through time, which could mean quite a bit of duplication if all were stored as a Tree, rather than a hierarchy of Directories. This is in line with the input_root_digest in Action.
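To illustrate the deduplication argument, here is a minimal Python sketch. It is not the real protobuf encoding: directories are modelled as plain dicts, digests are SHA-256 hashes of their JSON form, and the helper names `store_as_directories` and `store_as_tree` are hypothetical. It stores two revisions of a small source tree, first as a hierarchy of Directory-like blobs and then as one Tree-like blob per revision.

```python
import hashlib
import json


def digest(obj) -> str:
    """Content-address a JSON-serialisable object (stand-in for a protobuf digest)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def store_as_directories(cas: dict, tree: dict) -> str:
    """Store each directory level as its own blob; children are referenced by digest."""
    node = {"files": {}, "directories": {}}
    for name, value in sorted(tree.items()):
        if isinstance(value, dict):
            node["directories"][name] = store_as_directories(cas, value)
        else:
            node["files"][name] = digest(value)
    d = digest(node)
    cas[d] = node  # identical subtrees hash to the same key and are stored once
    return d


def store_as_tree(cas: dict, tree: dict) -> str:
    """Store the whole hierarchy as a single blob (simplified stand-in for a Tree)."""
    d = digest(tree)
    cas[d] = tree
    return d


# Two revisions of a source tree; only docs/README changes between them.
rev1 = {"src": {"a.c": "int a;", "b.c": "int b;"}, "docs": {"README": "v1"}}
rev2 = {"src": {"a.c": "int a;", "b.c": "int b;"}, "docs": {"README": "v2"}}

dir_cas, tree_cas = {}, {}
root1 = store_as_directories(dir_cas, rev1)
root2 = store_as_directories(dir_cas, rev2)
store_as_tree(tree_cas, rev1)
store_as_tree(tree_cas, rev2)

print("directory blobs:", len(dir_cas))
print("tree blobs:", len(tree_cas))
```

In this toy model the unchanged `src` directory is stored once and shared by both roots, so the second revision only adds a new `docs` blob and a new root. Each Tree-like blob, by contrast, carries its own full copy of `src`.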

> This overhead is made clear when attempting to ensure that the full directory Tree under the root_directory_digest is present in the CAS before responding to a FetchDirectory request.

Implementations can choose to optimize the verification path under the hood. Going from directory to tree is but a CAS.GetTree() call away, and an implementation may choose to cache that.
One could argue that a FindMissingBlobsInTree(root_directory_digest) call would be a useful alternative for implementations.
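As a rough sketch of what such a completeness check could look like, the Python below walks a Directory hierarchy and reports every referenced digest that is absent. Everything here is an assumption made for illustration: the CAS is an in-memory dict, directories are plain dicts with `files` and `directories` maps, digests are short strings, and `find_missing_blobs_in_tree` is a hypothetical helper, not an existing RPC.

```python
from collections import deque


def find_missing_blobs_in_tree(cas: dict, root_digest: str) -> set:
    """Hypothetical helper: return every digest reachable from the root that the CAS lacks."""
    missing = set()
    seen = set()
    queue = deque([root_digest])
    while queue:
        d = queue.popleft()
        if d in seen:
            continue  # shared subtrees only need to be checked once
        seen.add(d)
        directory = cas.get(d)
        if directory is None:
            missing.add(d)  # the directory blob itself is absent; cannot descend
            continue
        for file_digest in directory["files"].values():
            if file_digest not in cas:
                missing.add(file_digest)
        queue.extend(directory["directories"].values())
    return missing


# Toy CAS: the file "f3" and the directory "d2" are referenced but not stored.
cas = {
    "root": {"files": {"BUILD": "f1"}, "directories": {"src": "d1", "docs": "d2"}},
    "d1": {"files": {"a.c": "f2", "b.c": "f3"}, "directories": {}},
    "f1": b"build file",
    "f2": b"int a;",
}

missing = find_missing_blobs_in_tree(cas, "root")
print(sorted(missing))
```

A server could run a check like this before answering a FetchDirectory request and cache the result per root digest. The sketch does not cover cache invalidation when blobs expire.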

> Whilst verifying the existence of these files is not a requirement within the spec, I have encountered issues with clients falling over following a Fetch request which returned a digest to an item which isn't in the CAS. Is this something clients should handle, or should servers ensure that the digests within ActionResults exist within the CAS?

Clients need to be able to deal with cases where blobs disappear from CAS. I would say that not doing so leads to a poor user experience. That said, Servers SHOULD ensure that referenced files are present in the CAS at the time of the response.

For what it's worth, I would welcome a suggestion to inline the Blob or Tree in the FetchBlob / FetchDirectory responses based on a to-be-defined request flag. The downside is that it would require more complexity in every service implementation, rather than leaving that complexity optional.

I think that #144 and #159 are tangentially related to this issue.

EdSchouten · 2020-09-04T19:34:16Z

> The Remote Asset API Push/FetchDirectory calls are deliberately operating on directory digests. The reasoning here is that the API is very likely used for inputs (read: source trees). Source trees tend to leave large portions of a tree untouched when observed through time, which could mean quite a bit of duplication if all were stored as a Tree, rather than a hierarchy of Directories.

How is that different for directory outputs that use Trees? To make my point clearer, let me take the sentence above and replace 'Source trees' with 'Directory outputs of build actions'.

Directory outputs of build actions tend to leave large portions of a tree untouched when observed through time, which could mean quite a bit of duplication if all were stored as a Tree, rather than a hierarchy of Directories.

> This is in line with the input_root_digest in Action.

But it is not in line with tree_digest in OutputDirectory. In the case of REv2:

For all data that travels from client -> storage, Directory is used.
For all data that travels from storage -> client, Tree is used.

The reasoning behind this is somewhat obvious:

If ActionResult were to refer to a hierarchy of Directory objects, GetActionResult() would require a crazy amount of work to check for completeness. Or you would require a data store that has a garbage collector that potentially needs to scan very deep graphs.
Similarly, a client would need lots of roundtrips to download the data.
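The roundtrip point can be made concrete with a small Python sketch. It assumes a toy in-memory CAS keyed by made-up digest strings and a client that can batch all reads for one depth level into a single request. The function name `roundtrips_for_directories` is hypothetical. Child digests are only known once the parent has been read, so the number of requests grows with the depth of the hierarchy, whereas a single Tree blob would be one read.

```python
def roundtrips_for_directories(cas: dict, root_digest: str) -> int:
    """Count the batched reads needed to discover a Directory hierarchy level by level."""
    roundtrips = 0
    level = [root_digest]
    while level:
        roundtrips += 1  # one batched read covers every directory at this depth
        next_level = []
        for d in level:
            # Child digests only become known after the parent has been fetched.
            next_level.extend(cas[d]["directories"].values())
        level = next_level
    return roundtrips


# Toy CAS holding the chain root -> a -> b -> c (four levels deep).
cas = {
    "root": {"directories": {"a": "d1"}},
    "d1": {"directories": {"b": "d2"}},
    "d2": {"directories": {"c": "d3"}},
    "d3": {"directories": {}},
}

print("reads for Directory hierarchy:", roundtrips_for_directories(cas, "root"))
print("reads for a single Tree blob: 1")
```

Here the Directory hierarchy needs four sequential reads. A server-side expansion such as GetTree moves that walk to the server, but the work itself remains.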

The Remote Asset API is not in line with REv2 in that sense. And that's a pity, because it means there can be less reuse of code. If there were concerns about performance/handling of large subtrees, it would have been better to discuss that separately. For example, this was already discussed to a certain extent in #140. Note how various people in that discussion were in agreement that GetTree should likely be removed.

Qinusty mentioned this issue Sep 4, 2020

Implement GetTree buildbarn/bb-storage#95

Closed

Qinusty mentioned this issue Sep 8, 2020

Add CompletenessChecking for Fetch/Push services buildbarn/bb-remote-asset#11

Open

juergbi mentioned this issue Nov 12, 2020

REv3 idea: CAS.ExpandTree()? #140

Open

bergsieker assigned sstriker Feb 9, 2021
