Support read-only mode #59

Closed
Monnoroch opened this issue Nov 23, 2018 · 14 comments

@Monnoroch

Bazel remote caching has issues when files are updated during a build. To avoid them, I would like to disable writing to the remote cache from desktop machines and allow only the CI server to write. There is a Bazel flag, build --remote_upload_local_results=false, that helps, but developers can still write to the remote cache accidentally. Hence it would be best to run two instances of the cache over the same data directory: a read-only one for developers and a read-write one for the CI server, which caches the "master" branch.

I propose adding a -read-only flag that will allow starting a read-only cache instance that will discard PUT requests.

@buchgr
Owner

buchgr commented Nov 26, 2018

Sounds like a great idea! Would you like to work on it @Monnoroch?

> I propose adding a -read-only flag that will allow starting a read-only cache instance that will discard PUT requests.

I suppose it should fail with an error instead of silently discarding the bytes or is that what you meant?

@Monnoroch
Author

I'm not quite sure about the details. The key thing is that it should not cause bazel to crash or abort the build.

Right now I have implemented this by mounting the data volume as read-only. I didn't expect it, but it just works, despite the errors from os.Open calls. Good job on robustness! However, this means that the flag is more of a convenience, and is thus deprioritised for me. Maybe I'll get to it some day though.

@Monnoroch
Author

Monnoroch commented Nov 28, 2018

Upd: unfortunately, this setup doesn't work, and the failure is not easy to discover. It turns out that the server stores the list of cache entries in memory, so it's not possible to write to one instance and read from another. This is very unfortunate and also prevents this server from being scaled horizontally.

@buchgr do you have any ideas on what would be the best fix for that?

Monnoroch added a commit to Monnoroch/bazel-remote that referenced this issue Nov 28, 2018
See buchgr#59 for more context.
@buchgr
Owner

buchgr commented Nov 29, 2018

@Monnoroch that's just an optimization. On startup the server reads metadata from disk and keeps it in memory, but all the information should be stored on disk too.

@nicolov
Collaborator

nicolov commented Nov 29, 2018

Yep IIRC, the server looks up a key in the LRU cache before serving a request, so even if the file exists on disk (because it's been uploaded through a different instance), the file can't be served. The server only indexes the disk on startup (and even that gets quite slow). I guess you can add some code that does an os.Stat if the key is not found in the LRU. If the file exists, add it to the front of the LRU, and serve it.

@nicolov
Collaborator

nicolov commented Nov 29, 2018

We should also create subdirectories by the first two letters of the shasum, since filesystems don't like having so many files in a single directory.

@Monnoroch
Author

I don't think the in-memory cache is a good idea for RO instances. I can see that it's important for writing, as it enforces the maximum cache size, but for reading it's not crucial. It may even be harmful: os.Stat makes a kernel call that will probably hit the page cache anyway. In the commit referenced above I hacked together a --read_only flag that discards writes and disables the in-memory cache completely. I'm not sending a PR as the code is a bit dirty, but so far it works great.

@buchgr
Owner

buchgr commented Nov 29, 2018

@Monnoroch but you will clean it up and send us a PR right? 😛

@Monnoroch
Author

@buchgr I'd love to, but I'm afraid I am missing some corner cases. Could you first take a look at my commit and give high-level feedback on my approach and help me identify possible pitfalls I just didn't think about?

@buchgr buchgr added this to the 1.0 milestone Jan 12, 2019
@uri-canva

Related: #76

@mostynb
Collaborator

mostynb commented Jun 28, 2021

bazel-remote has an --allow_unauthenticated_reads flag which I think covers this use case now. You can configure CI to authenticate with bazel-remote via basic auth or mTLS, and let untrusted users have read-only access.

@mostynb mostynb closed this as completed Jun 28, 2021
@jheaff1

jheaff1 commented Jul 13, 2022

Is it possible to require authentication for all users to read from the cache and only allow specific users (e.g. CI) to write to the cache? I’d rather not expose my remote cache containing build artefacts to unauthenticated access, yet I don’t want all authenticated users to be able to write to the cache.

@mostynb
Collaborator

mostynb commented Jul 14, 2022

@jheaff1: I don't think bazel-remote has a good solution for this at the moment. Would you like to open a new issue with a description of this feature request?

In the meantime, you might be able to set up a small proxy that allows authenticated read-only access. There might be some off-the-shelf HTTP solutions that you could use (with some configuration), or you could implement a fairly small gRPC proxy yourself.

@jheaff1

jheaff1 commented Jul 15, 2022

@mostynb Thanks for your reply. My use case is actually satisfied by the use of the bazel flag build --remote_upload_local_results=false so I won’t log a feature request 😁
