
L Server endpoints config #153

Merged — 14 commits, merged Aug 22, 2018
Conversation

@werkt (Collaborator) commented Jun 10, 2018:

Addresses limitations indicated in #142

@werkt changed the title from "Server endpoints config" to "L Server endpoints config" on Jun 10, 2018
@werkt werkt requested a review from philwo June 11, 2018 02:11
George Gensure and others added 5 commits June 16, 2018 10:14
Provide configuration for the CAS and ActionCache for memory instances.
Support a grpc endpoint for both, with mixed mode support for delegation
and reorganization of CAS and AC into their own packages.
@ola-rozenfeld left a comment:

First pass comments -- I feel I need a little more context for the overall plan and use cases for this to make it easier to review. Thank you!

}

@Override
public boolean contains(Digest digest) {


Why do you use queryWriteStatus for contains instead of getMissingBlobs? I feel that this API method is more for ongoing write operations, rather than keeping persistent metadata. I think getMissing is more appropriate. But, if you are using queryWriteStatus (and will implement it to essentially encompass getMissing), then why are you expiring the item on false? What if there's an ongoing write that's just not finished yet?

@werkt (Collaborator, Author) replied:

I don't know what getMissing is, it doesn't appear in the bytestream API. Do you mean findMissingBlobs from the cas? The presumption for me was that we could use a bytestream only implementation for this cas, instead of requiring the (less common) ContentAddressableStorage implementation from remote_execution.

I'm expiring it so that on any access (since I don't have a callback mechanism for bytestream, though this might be a good case for a watcher), either get or contains, we trigger the onExpiration for a key. poor man's event system.

Note the comparison to the committed size: only a completed upload with a committed size matching the digest size is considered 'contained' - we cannot return true for contains when the committed size has not reached the digest-indicated size. This check was actually missing from get; I will add it.
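The rule werkt describes can be sketched as follows. This is an illustrative sketch, not buildfarm's actual API: `WriteStatus` and `ContainsCheck` are hypothetical names standing in for the result of a `queryWriteStatus` call and the `contains` logic.

```java
// Hypothetical sketch: a blob only counts as contained when its write has
// completed and the committed size matches the size the digest declares.
class WriteStatus {
  final long committedSize;
  final boolean complete;

  WriteStatus(long committedSize, boolean complete) {
    this.committedSize = committedSize;
    this.complete = complete;
  }
}

class ContainsCheck {
  // digestSize comes from the Digest; status from a queryWriteStatus call.
  static boolean contains(WriteStatus status, long digestSize) {
    // An in-progress or truncated write must not report as contained.
    return status.complete && status.committedSize == digestSize;
  }
}
```

An ongoing write with a smaller committed size, or a completed write of the wrong length, both report false.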

return contains;
}

private synchronized void addOnExpiration(Digest digest, Runnable onExpiration) {


Say, how are onExpiration runnables used? I couldn't find where you use them. It's a bit weird to have them on a proxy cache that doesn't manage the actual items -- seems like you can't really guarantee to run them as needed.

@werkt (Collaborator, Author) replied:

I use them to back a map of CompletedOperations and the ActionCache - build.buildfarm.instance.DelegateCASMap does the heavy lifting there - I've found them to be fairly reliable in the memory instance, allowing the backing store to be contributed to by ActionResult and Operation messages for a single overall byte limit (if a bit naive). As noted above, something like Watcher would be required to do this reliably with GrpcCAS.
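The onExpiration mechanism described here can be sketched as a map whose entries live in a backing store: putting an entry hands back a Runnable for the store to invoke on eviction, so the key disappears with the blob. Illustrative names only, not buildfarm's DelegateCASMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an expiration-callback-backed map.
class ExpiringMap<K, V> {
  private final Map<K, V> entries = new ConcurrentHashMap<>();

  // Store the entry and hand back the expiration hook for the CAS to run
  // when it evicts the backing blob.
  Runnable put(K key, V value) {
    entries.put(key, value);
    return () -> entries.remove(key);
  }

  V get(K key) {
    return entries.get(key);
  }
}
```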

}
});

public InputStream newStreamInput(String name) {


What if it throws a NOT_FOUND?

@werkt (Collaborator, Author) replied:

For streaming responses, grpc throws on hasNext() for the iterated replies, the first time the iterator is accessed. ByteStringIteratorInputStream handles this by transforming NOT_FOUND into an IOException with it as a cause.
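The lazy error described here can be sketched as below. `NotFoundException` stands in for `io.grpc.StatusRuntimeException`, and `LazyStreamRead` for the stream wrapper; these names are illustrative, not buildfarm's actual classes.

```java
import java.io.IOException;
import java.util.Iterator;

// Stand-in for a gRPC NOT_FOUND status exception.
class NotFoundException extends RuntimeException {}

class LazyStreamRead {
  // The reply iterator of a streaming read surfaces NOT_FOUND lazily, on
  // the first hasNext(); convert it into an IOException with the status
  // exception as its cause.
  static byte[] firstChunk(Iterator<byte[]> replies) throws IOException {
    try {
      return replies.hasNext() ? replies.next() : new byte[0];
    } catch (NotFoundException e) {
      throw new IOException(e);
    }
  }
}
```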


@Override
public ActionResult get(ActionKey actionKey) {
return actionCacheBlockingStub.get().getActionResult(GetActionResultRequest.newBuilder()


Don't you need to transform NOT_FOUND status into null here?
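The translation ola-rozenfeld suggests can be sketched as below: a NOT_FOUND status from the backend means a cache miss, which the caller expects as null. `NotFoundException` and `Stub` are illustrative stand-ins, not buildfarm's or gRPC's actual types.

```java
// Stand-in for a gRPC NOT_FOUND status exception.
class NotFoundException extends RuntimeException {}

class ActionCacheClient {
  interface Stub {
    String getActionResult(String actionKey);
  }

  private final Stub stub;

  ActionCacheClient(Stub stub) {
    this.stub = stub;
  }

  // Map a NOT_FOUND status to a null result (a cache miss) instead of
  // letting the exception escape to the caller.
  String get(String actionKey) {
    try {
      return stub.getActionResult(actionKey);
    } catch (NotFoundException e) {
      return null;
    }
  }
}
```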

}

@Override
public void put(ActionKey actionKey, ActionResult actionResult) {


I'm guessing retries for all these methods are coming later? Because now only the upload has them, IIUC.
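The retry behavior being asked about could look something like the sketch below: a bounded retry loop around a call. This is illustrative only, not buildfarm's actual retrier, which would also need to retry only on retryable status codes and back off between attempts.

```java
import java.util.function.Supplier;

class Retrier {
  // Retry a call up to maxAttempts times, rethrowing the last failure.
  static <T> T retry(Supplier<T> call, int maxAttempts) {
    RuntimeException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return call.get();
      } catch (RuntimeException e) {
        last = e; // possibly transient; try again
      }
    }
    throw last;
  }
}
```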

import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;

class GrpcCAS implements ContentAddressableStorage {


Note, not for this PR, but for the future: you might want to push the get/putAllBlobs interface down to ContentAddressableStorage, to benefit from batching to reduce latency. Alexis did some performance analysis (results to be published soon) which discovered that the BatchUpdateBlobs method is extremely beneficial for average builds (and, by extension, BatchDownloadBlobs method for the worker). And this was with just using the most trivial bin-packing algorithm.
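The "most trivial bin-packing" mentioned here can be sketched as a greedy first-fit pass over blob sizes, filling each BatchUpdateBlobs request up to a per-request byte limit. Illustrative names only; a real batcher would pack actual blobs and respect the server's max batch size.

```java
import java.util.ArrayList;
import java.util.List;

class BlobBatcher {
  // Greedily pack blob sizes into batches under maxBatchBytes. A single
  // blob larger than the limit still gets its own batch (and would need
  // the bytestream path instead in practice).
  static List<List<Long>> pack(List<Long> blobSizes, long maxBatchBytes) {
    List<List<Long>> batches = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0;
    for (long size : blobSizes) {
      if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(size);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```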

super(
name,
digestUtil,
contentAddressableStorage,
-        /*actionCache=*/ new DelegateCASMap<ActionKey, ActionResult>(contentAddressableStorage, ActionResult.parser(), digestUtil),
+        /*actionCache=*/ MemoryInstance.createActionCache(config.getActionCacheConfig(), contentAddressableStorage, digestUtil),


Now the code is self-explanatory; no need for comments on parameter names, I think.

@@ -14,11 +14,17 @@

package build.buildfarm.instance.memory;


Wait, so now the MemoryInstance can potentially use a gRPC backend? What scenarios do you want to use this in? Do you want to enable only CAS to be gRPC-based, and AC to be memory-based? (The other way around won't work, I think -- you can't proxy the AC but store the blobs in memory -- will that be a problem?)

Maybe this is a different kind of instance, then? I mean, do I understand correctly that a StubInstance is something that gRPC proxies everything, while MemoryInstance after this change could gRPC proxy some things, but not others? Sorry, it's just the distinction between the instance types became more blurry to me at this change.

@werkt (Collaborator, Author) replied:

Yes, this does change the lines a bit. I fought with this logically and came up with the following:

The memory instance is responsible for retaining all ephemeral state in memory for the purpose of execution. It is using well-known interfaces for access to the ContentAddressableStorage and ActionCache, which were quite literally set apart in the remote_execution service definition. It is incapable of continuing a watch for a currently executing Operation through termination, and workers which attempt to report back via the OperationQueueService through a termination will be met with failure. Since it is not assuming responsibility for the CAS or AC that it is being passed, and not dropping those other memory-based limitations, it's not a convention break.
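The split described here can be sketched as dependency injection: the memory instance keeps execution state itself but takes its CAS (and ActionCache) as constructed dependencies chosen from config, so either may be memory-backed or a gRPC proxy. Names below are illustrative, not buildfarm's actual config classes.

```java
interface ContentAddressableStorage {}

class MemoryCAS implements ContentAddressableStorage {}

class GrpcProxyCAS implements ContentAddressableStorage {}

class CasFactory {
  enum Backend { MEMORY, GRPC }

  // Pick the backend once from config; the instance consuming the
  // interface never inspects which implementation it was handed.
  static ContentAddressableStorage create(Backend backend) {
    switch (backend) {
      case GRPC:
        return new GrpcProxyCAS();
      default:
        return new MemoryCAS();
    }
  }
}
```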

@werkt (Collaborator, Author) left a comment:

Updated summary to include a reference to the issue that I wanted to add this for.

@werkt werkt merged commit 9e85388 into bazelbuild:master Aug 22, 2018
@werkt werkt deleted the server-endpoints-config branch December 27, 2018 14:54