RFC: Bazel Query Service #29

alexeagle · 2020-12-09T18:47:02Z

In the BazelCon talk that inspired this repo, at this timestamp:
https://youtu.be/9Dk7mtIm7_A?t=1875
Benjamin talks about how Dropbox operationalized the bazel-diff tool by hosting it as a service. This issue proposes that we implement such a thing in this repo.

Language: Java, since that's what's already used in this repo

Storage: for the cache behavior, we need to store the hashes.json file for each Git SHA. It should persist across server shutdowns, since cache misses introduce a lot of latency. We can make this configurable, but AWS S3 seems like the obvious choice most users would want.

Hosting: Ben points out in the talk that a custom load balancer can be needed for this service. So it's not enough to just ship a Docker (OCI) image that has a runnable service with networking. We probably need a k8s manifest that also describes how to run a few instances of the query service, health/load checks, and a load balancer that finds an available instance to send requests to. Maybe even dynamic scaling to adjust the number of instances.

Getting the code: we'd have to use a git client (probably assume one is on the $PATH and call it as a subprocess). Then we have to check out the workspace at various SHAs. When a server comes up, it should do an initial fetch of the repo before reporting healthy and accepting requests. We need to let users configure how to reach their git server (auth keys, etc.). We also have to deal with bad git state (maybe just detect it and lame-duck the server rather than try to repair it).
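The startup and lame-duck behavior described above could be tracked with a small state holder that a health-check endpoint reads. This is only a sketch: `ServerHealth` and its method names are hypothetical and do not exist in this repo, and the actual git fetch and bad-state detection are left out.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical lifecycle tracker for one query-service instance. */
final class ServerHealth {
    enum State { STARTING, HEALTHY, LAME_DUCK }

    // A fresh server starts out unhealthy until the initial fetch finishes.
    private final AtomicReference<State> state = new AtomicReference<>(State.STARTING);

    /** Call once the initial fetch of the repo has completed. */
    void markInitialFetchDone() {
        // Only STARTING -> HEALTHY is allowed; a lame-duck server stays lame.
        state.compareAndSet(State.STARTING, State.HEALTHY);
    }

    /** Call when bad git state is detected; no repair is attempted. */
    void markLameDuck() {
        state.set(State.LAME_DUCK);
    }

    /** What a health check would report to the load balancer. */
    boolean isAcceptingRequests() {
        return state.get() == State.HEALTHY;
    }

    State current() {
        return state.get();
    }
}
```

A load balancer polling `isAcceptingRequests()` would skip an instance both while it is still fetching and after it has been lame-ducked, so that a replacement instance can take over.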

Prior art:

Google has a service "skyframe" that basically gives you this "Bazel query at scale", partly based on bazelbuild/bazel#11194 ("Feature request: store/load analysis cache on disk") and then a bunch of Google-internal mechanics around it. I think it's safe to say that no one at Google has time or motivation to refactor that into an open-source shape. Also our scope here would be smaller, not supporting arbitrary bazel queries but only the affectedness calculation.
Dropbox has the implementation Benjamin describes in the talk. Maybe worth discussing with them if they can justify spending time to make that available.


achew22 · 2020-12-09T19:20:06Z

> Getting the code: we'd have to use a git client (probably assume one is on the $PATH and call it as a subprocess)

As a datum, I had a hallway-con conversation with Benjamin and Armoo (I hope I'm spelling that correctly, I never had to write it down before) about this exact behavior. They said that they had a lot of issues with using the Git CLI to get a local checkout of files: inconsistent checkouts, missing files, and files left on disk that had been deleted in a commit. You might consider using something like JGit, which can expose any single commit's tree in Java without writing to disk and dealing with those issues.

alexeagle · 2020-12-09T19:25:55Z

Thanks for the tip @achew22 !

zoidyzoidzoid · 2021-07-27T10:03:31Z

We have some rough hopes of building something similar but much simpler in the future too.

The basic idea we had was, whenever calculating hashes for a revision, to try to fetch them from S3, or else generate them and store them in S3.
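The fetch-or-generate idea above is a read-through cache keyed by revision. A minimal sketch follows, assuming a hypothetical `HashStore` interface in place of a real S3 client and a caller-supplied function in place of an actual bazel-diff run. None of these names come from bazel-diff.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical storage abstraction; an S3-backed version would implement this. */
interface HashStore {
    Optional<String> get(String gitSha);
    void put(String gitSha, String hashesJson);
}

/** In-memory stand-in so the sketch runs without any cloud dependency. */
final class InMemoryHashStore implements HashStore {
    private final Map<String, String> blobs = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String gitSha) {
        return Optional.ofNullable(blobs.get(gitSha));
    }

    @Override
    public void put(String gitSha, String hashesJson) {
        blobs.put(gitSha, hashesJson);
    }
}

/** Returns stored hashes for a revision, generating and storing them on a miss. */
final class ReadThroughHashes {
    private final HashStore store;
    // Assumption: this wraps whatever produces hashes.json for a SHA.
    private final Function<String, String> generator;

    ReadThroughHashes(HashStore store, Function<String, String> generator) {
        this.store = store;
        this.generator = generator;
    }

    String hashesFor(String gitSha) {
        Optional<String> cached = store.get(gitSha);
        if (cached.isPresent()) {
            return cached.get();
        }
        // Miss: pay the generation cost once, then persist for later callers.
        String generated = generator.apply(gitSha);
        store.put(gitSha, generated);
        return generated;
    }
}
```

Two concurrent misses for the same SHA would both generate in this sketch. Since the hashes for a given revision should be identical either way, that costs duplicated work rather than correctness.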

arrdem · 2024-04-17T23:31:37Z

We've achieved a "good enough" implementation of this by independently doing what @zoidyzoidzoid suggested. We have a two-level local + S3 cache for the hash blobs produced by bazel-diff. A real analysis server à la skyframe would be amazing, but leveraging bazel-diff this way has already been a significant win in both performance and change detection correctness.
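A two-level cache like the one described here might look roughly as follows. This is not the implementation mentioned above, whose details are not given in this thread. `BlobTier`, `MapTier`, and `TwoLevelCache` are hypothetical names, and both tiers are plain maps standing in for local disk and S3.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interface for one cache tier (local disk, S3, ...). */
interface BlobTier {
    Optional<String> read(String gitSha);
    void write(String gitSha, String blob);
}

/** Map-backed tier used as a stand-in for both levels in this sketch. */
final class MapTier implements BlobTier {
    private final Map<String, String> blobs = new ConcurrentHashMap<>();

    @Override
    public Optional<String> read(String gitSha) {
        return Optional.ofNullable(blobs.get(gitSha));
    }

    @Override
    public void write(String gitSha, String blob) {
        blobs.put(gitSha, blob);
    }
}

/** Checks the fast local tier first, then the remote tier, backfilling local on a remote hit. */
final class TwoLevelCache implements BlobTier {
    private final BlobTier local;
    private final BlobTier remote;

    TwoLevelCache(BlobTier local, BlobTier remote) {
        this.local = local;
        this.remote = remote;
    }

    @Override
    public Optional<String> read(String gitSha) {
        Optional<String> hit = local.read(gitSha);
        if (hit.isPresent()) {
            return hit;
        }
        Optional<String> remoteHit = remote.read(gitSha);
        // Copy remote hits into the local tier so the next read is cheap.
        remoteHit.ifPresent(blob -> local.write(gitSha, blob));
        return remoteHit;
    }

    @Override
    public void write(String gitSha, String blob) {
        // Write to both tiers so other machines can find the blob remotely.
        local.write(gitSha, blob);
        remote.write(gitSha, blob);
    }
}
```

Because `TwoLevelCache` implements the same interface as its tiers, callers would not need to know how many levels sit behind it.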

tinder-maxwellelliott mentioned this issue Dec 15, 2020

Migrate to JGit instead of using Git via terminal actions #35

Closed
