Have a generic, easy-to-setup distributed caching #904

Closed
damienmg opened this Issue Feb 15, 2016 · 35 comments

Contributor

damienmg commented Feb 15, 2016

Build on Alpha Lam's initial version to produce a version that is viable for all users of Bazel.

@philwo is coordinating all the efforts toward that goal.


tfarina Mar 22, 2016

Contributor

@hhclam fyi


ittaiz May 17, 2016

Member

Hi,
What's the status of this issue?


philwo May 17, 2016

Member

@ittaiz We have prototype implementations of distributed caching and remote execution in Bazel now:

Distributed caching: 79adf59
Remote execution: a1a79cb

Here is some documentation: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/README.md

We will continue to improve this and welcome feedback :) Please file bugs and assign to @hhclam if you encounter issues with these features.


ittaiz May 17, 2016

Member

Got it.
A couple of questions:

  1. Are there any current plans for improvements, or are you waiting for feedback? I ask because I saw some ideas in the Google Group discussion.
  2. Can the README's contents be incorporated into the bazel.io site? I looked for it there and couldn't find it. Also, maybe update the roadmap to show that this is done. People just skimming the roadmap might not know it's there (I only knew something was there from digging deep into the code).
  3. If I have thoughts, should they be written here? Or should they start in bazel-discuss and "graduate" here?

Thanks!


damienmg May 17, 2016

Contributor

It is still experimental, so I think we need to at least discuss the future of the Bazel interface before documenting it (if we are going to change it a lot, it is better not to write too much documentation for now).


ittaiz May 17, 2016

Member

OK. I think when people evaluate Bazel vs. Pants vs. Buck, distributed caching and execution play a big part.
A note saying it's already there as a prototype and can be used by early adopters who are willing to have things break could encourage people. I'm such a user.


damienmg May 17, 2016

Contributor

Yeah, that sounds reasonable. We should at least do a blog post with basic
documentation of both features. But we need to sync with Alpha, who did
almost all the work.



ittaiz May 17, 2016

Member

Will the discussion about where this feature should go next take place on bazel-dev?


damienmg May 17, 2016

Contributor

We are planning to have a live discussion, and then we will follow up on
bazel-dev.



ittaiz May 17, 2016

Member

looking forward to it


LavaScornedOven Nov 10, 2016

Just as feedback from a random C++ developer: the missing distributed cache is a deal breaker for me, especially considering that I have already gone through the pain of learning make, QMake, CMake, Gradle, and whatnot. Doing it all over again just to get another way to build locally... It just doesn't justify the effort.


@damienmg damienmg modified the milestones: 0.6, 0.5 Dec 19, 2016


softprops Jan 5, 2017

Any progress on this?


damienmg Jan 9, 2017

Contributor

Not really, documentation is still missing...


ittaiz Jan 9, 2017

Member

@damienmg not to belittle documentation but does that mean the code itself is ready in your opinion?


damienmg Jan 9, 2017

Contributor

For remote caching, yes, but there is still no server side. Hazelcast can be used, but I am not sure about its stability.

I also prototyped a local disk cache yesterday, see https://cr.bazel.build/8133


damienmg Jan 9, 2017

Contributor

(side-note: the protocol is still not stable, so we might change stuff in the future)


ittaiz Jan 9, 2017

Member

Thanks for the clarification. Do you expect distributed actions to be supported any time soon?


damienmg Jan 9, 2017

Contributor

It is in the same state as caching: no server side, no documentation, and still subject to change.
As I said on IRC to @abergmeier, the protocol itself should work.


dabrahams Jan 29, 2017

Hi,

The FAQ that links here says: "The open source Bazel code runs build operations locally. We believe that this is fast enough for most of our users, but work is underway to provide distributed caching."

I think that's going to lead to a lot of disappointment. People that are considering adopting Google's build tool are highly likely to already be frustrated by long build times because they have (smaller-but-still-)very large projects to build. That's certainly how I arrived here, and having cluster-based operation actually supported is what could make it worth an investment to evaluate bazel in-depth.

So, please consider this encouragement to push the feature over the line to completion. Thanks!


philwo Feb 22, 2017

Member

@dabrahams The feature (remote caching, remote execution and a combination of both) is definitely coming and there's a lot of work related to it going on. We still need some time, though.


softprops Apr 13, 2017

Update: we're using remote caching in CI at Meetup, using a mix of the nginx recipe

https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/README.md#nginx-with-webdav-module

and a custom proxy pass that handles HTTP PUTs and returns 200 responses (I think master is fixed to handle nginx's 201).

Things seem to be working really well.


asa Apr 13, 2017

Can you post your nginx config for this? (handling the 201 with proxy_pass)


@ulfjack ulfjack assigned ulfjack and unassigned philwo Apr 25, 2017


ulfjack Apr 25, 2017

Contributor

I just wanted to update this. All of my fixes should be in the future 0.5.0 release, except the fix for #2843, though not everyone will be affected by that. There are some bugs in 0.4.5 that may silently fall back on errors or miss certain outputs (though I don't think that you'll get corrupt outputs).


ulfjack May 19, 2017

Contributor

The fix for #2843 actually made it in.


brunobowden May 19, 2017


ulfjack May 19, 2017

Contributor

I think you're referring to #2964. I haven't made any progress on that. I'm currently in NYC, and will only be able to work on it the week after next.


softprops May 20, 2017

@ulfjack looking forward to that. Our temporary workaround is a wrapper script that curls the cache endpoint and checks for a 200 as an is-it-up test before prepending the Bazel cache args. A first-class fallback would be much more robust!
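A rough sketch of that kind of wrapper, for illustration only. It is written in Python rather than as a curl script, the function names are hypothetical, and `--rest_cache_url` is the only Bazel flag used because it is the one named in this thread; a real setup may need further flags depending on the Bazel version, and an endpoint behind basic auth would need credentials for the probe.

```python
import urllib.error
import urllib.request


def cache_is_up(url, timeout=2.0):
    """Return True if a GET on the cache endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, HTTP error status, timeout, or malformed URL.
        return False


def bazel_command(user_args, cache_url, probe=cache_is_up):
    """Build the bazel argv, adding the cache flag only if the probe succeeds.

    user_args is everything after "bazel", e.g. ["build", "//foo:bar"].
    The cache flag is placed right after the command name, ahead of the
    user's own arguments.
    """
    if not user_args:
        return ["bazel"]
    command, rest = user_args[0], list(user_args[1:])
    cache_flags = ["--rest_cache_url=" + cache_url] if probe(cache_url) else []
    return ["bazel", command] + cache_flags + rest
```

A wrapper script could then pass `bazel_command(sys.argv[1:], cache_url)` to `subprocess.call`, so the build falls back to a plain local build whenever the probe fails.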


softprops May 20, 2017

@asa

Can you post your nginx config for this? (handling the 201 with proxy_pass)

Sorry for the super long delay. Our remote cache implementation is closed source, but I can share our nginx config.

We run our cache in k8s as a pod with nginx and a little Rust app that handles the uploads (the proxy_pass http://localhost:1337; bit is the Rust app).

Our nginx config looks something like this:

    # Access-log format that also records request and upstream timings.
    log_format timed_combined '$remote_addr - $remote_user [$time_local] '
      '"$request" $status $body_bytes_sent '
      '"$http_referer" "$http_user_agent" '
      '$request_time $upstream_response_time $pipe';
    error_log stderr;
    access_log /dev/stdout timed_combined;
    # Redirect plain HTTP to HTTPS.
    server {
      listen 80;
      return 301 https://$host$request_uri;
    }
    server {
      listen 443 ssl;
      ssl_certificate /etc/nginx/certs/tls.crt;
      ssl_certificate_key /etc/nginx/certs/tls.key;
      ssl_protocols       TLSv1 TLSv1.1 TLSv1.2;
      ssl_ciphers         HIGH:!aNULL:!MD5;
      # Allow large build outputs to be uploaded.
      client_max_body_size 5000M;
      location / {
        # Require basic auth, then hand the request to the upload app.
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/auth/htpasswd;
        proxy_pass http://localhost:1337;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
      }
      # Not reachable by clients directly; only via X-Accel-Redirect from the app.
      location /protected {
        internal;
        root  /cache;
      }
    }

The Rust app handles uploads but lets nginx serve the blobs by returning X-Accel-Redirect headers for the content. We force clients through basic auth over HTTPS for now because that's about as much security as you can pack into the --rest_cache_url flag. The proxy pass app also returns 200 on PUTs because Bazel expects it in 0.4.5. I believe the list of accepted response codes for PUTs was expanded in unreleased versions.
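For readers who want to picture the upload app: the actual implementation is closed source, so the following is only a guess at its general shape, written in Python rather than Rust. The file naming scheme, the `make_handler` and `start_server` names, and the `/protected` prefix parameter are assumptions made for illustration. It stores PUT bodies on disk, answers 200, and on GET returns an empty response carrying an `X-Accel-Redirect` header so that nginx serves the file itself.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(store_dir, internal_prefix="/protected"):
    """Build a handler class that stores blobs as files under store_dir."""

    class CacheHandler(BaseHTTPRequestHandler):
        def _blob_name(self):
            # Flatten the request path into a single file name so a request
            # cannot escape store_dir (hypothetical naming scheme).
            return self.path.strip("/").replace("/", "_")

        def _reply(self, status, headers=()):
            self.send_response(status)
            for key, value in headers:
                self.send_header(key, value)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def do_PUT(self):
            name = self._blob_name()
            if name in ("", ".", ".."):
                self._reply(400)
                return
            length = int(self.headers.get("Content-Length", 0))
            data = self.rfile.read(length)
            with open(os.path.join(store_dir, name), "wb") as f:
                f.write(data)
            # Per the thread, Bazel 0.4.5 expects 200 here rather than 201.
            self._reply(200)

        def do_GET(self):
            name = self._blob_name()
            if not name or not os.path.isfile(os.path.join(store_dir, name)):
                self._reply(404)
                return
            # Hand the transfer to nginx: it serves the file from its
            # internal location when it sees this header.
            self._reply(200, [("X-Accel-Redirect", internal_prefix + "/" + name)])

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return CacheHandler


def start_server(store_dir, port=0):
    """Start the app on localhost in a background thread; return the server."""
    server = HTTPServer(("127.0.0.1", port), make_handler(store_dir))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the config above, `root /cache` under `location /protected` should make nginx look for `/cache/protected/<name>`, so in this sketch `store_dir` would be `/cache/protected` and the port would be 1337.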


bazel-io pushed a commit that referenced this issue May 23, 2017

Update remote caching / execution docs
Progress on #904.

PiperOrigin-RevId: 156862823

dabrahams May 27, 2017

I'm sure the answers seem obvious to some, but AFAICT there's no definition of “remote caching” or “remote execution” anywhere. For example, what exactly is being cached, and what difference does that make to builds? I could (roughly) guess what “remote execution” means, but the one page of documentation mentions it only in the title. There's no description of how to set it up or test it as far as I can tell. Currently there's so little explicit information available about either of these features that it's hard to tell whether I should be more than mildly intrigued.

One thing I'm wondering, in particular, is whether I need to set up the same build environment on all the machines in the cluster (like distcc) or whether something like icecream's chroot with archived build environment strategy is being used.


ulfjack May 27, 2017

Contributor

The preliminary docs are here:
https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/README.md

They are still a work in progress.

Remote caching allows Bazel to reuse results from previous builds, both your own and your coworkers' (with the caveat that local toolchains need to be the same), for ~10x faster builds in our experience. Remote execution is another ~10x on top of that if you have enough machines. YMMV.

If you check your toolchains into your repository, then Bazel uploads the files to the remote machine. If you use local toolchains, you need to match the local environment exactly (more like distcc), but we're also looking at Docker containers for remote Linux builds (more like icecream). We're looking at providing Docker images / hermetic toolchains for at least a few languages to get everyone started quickly. macOS and Windows support are both still works in progress, and may additionally have legal restrictions.

(Uploading to the remote machine only happens if the machine doesn't have the files cached already.)
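To make the idea concrete, here is a toy model of the behavior described above. It is a conceptual sketch, not Bazel's implementation or protocol: the key derivation, the use of SHA-256, and all names (`action_key`, `RemoteCache`, `build`) are assumptions for illustration. A build step's result is keyed on everything that can influence it, a step is skipped when the key is already cached, and files are uploaded only when the remote side does not already hold them.

```python
import hashlib


def digest(data):
    """Content digest used as a storage key (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def action_key(command, input_files, toolchain_id):
    """Hash everything that can influence the result of a build step."""
    h = hashlib.sha256()
    h.update(toolchain_id.encode())
    for arg in command:
        h.update(b"\0arg\0" + arg.encode())
    for name in sorted(input_files):
        h.update(b"\0in\0" + name.encode() + b"\0" + digest(input_files[name]).encode())
    return h.hexdigest()


class RemoteCache:
    """In-memory stand-in for a shared cache server."""

    def __init__(self):
        self.results = {}  # action key -> output bytes
        self.blobs = {}    # content digest -> file bytes
        self.uploads = 0   # number of files actually transferred

    def upload_missing(self, input_files):
        # Only send files the remote side does not already hold.
        for content in input_files.values():
            key = digest(content)
            if key not in self.blobs:
                self.blobs[key] = content
                self.uploads += 1


def build(cache, command, input_files, toolchain_id, run):
    """Return (output, cache_hit); run the step only on a cache miss."""
    key = action_key(command, input_files, toolchain_id)
    if key in cache.results:
        return cache.results[key], True
    cache.upload_missing(input_files)
    output = run(command, input_files)
    cache.results[key] = output
    return output, False
```

In this model a coworker with the same sources and the same `toolchain_id` gets a hit, while a different toolchain changes the key and forces a rebuild. That mirrors the caveat that local toolchains need to match.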


@buchgr buchgr assigned buchgr and unassigned ulfjack Dec 22, 2017


apobbati Feb 8, 2018

Is this issue still alive? Any work being done on it actively?

Contributor

buchgr commented Feb 8, 2018


hhclam Feb 8, 2018

Contributor

Nicely written!


jgavris Feb 8, 2018

Contributor

I concur, very nice work. Also +1, Google Cloud Storage is relatively easy to set up, and saves you the hassle of having to run a service of your own!


apobbati Feb 8, 2018

Awesome! Thanks for sharing


@buchgr buchgr closed this Mar 21, 2018
