build: override grpc dependencies with our dependencies
Don't get transitive grpc dependencies, use the ones from our `maven_install(...)`

chore(deps): bump protobuf runtime to 3.19.1

chore(deps): add transitive dependencies

feat: add Proto reflection service to shard worker

To aid connection troubleshooting

Bug: Fix Blocked thread in WriteStreamObserver Caused by CASFile Write (bazelbuild#1486)

* Add unit test
* Signal Write on complete

Pin the Java toolchain to `remotejdk_17` (bazelbuild#1509)

Closes bazelbuild#1508

Cleanups:
- remove the unused `ubuntu-bionic` base image
- replace `ubuntu-jammy:jammy-java11-gcc` with `ubuntu-mantic:mantic-java17-gcc`
- replace `amazoncorretto:19` with `ubuntu-mantic:mantic-java17-gcc`
- swap inverted log file names in a file

docs: add markdown language specifiers for code blocks

Support OutputPaths in OutputDirectory

Specifying any number of OutputPaths will ignore OutputFiles (consistent
with uploads). Where an OutputPath specifies an output directory, the
action must be able to create the directory itself.

Permit Absolute Symlink Targets with configuration

Partial specification of the absolute symlink response per REAPI.
Remaining work will be in output identification.

chore: update bazel to 6.4.0 (bazelbuild#1513)

Trying to get more info on the Lombok stamping issue on Windows CI.
See also bazelbuild/bazel#10363 and
bazelbuild/bazel#18185

Rename instance types (bazelbuild#1514)

Create SymlinkNode outputs during upload (bazelbuild#1515)

Default disabled, available with createSymlinkOutputs option in Worker
config.

feat: Implement CAS lease extension (bazelbuild#1455)

Problem

    Enabling the findMissingBlobsViaBackplane flag in BuildfarmServer eliminates the need for the BuildfarmWorker's fmb (FindMissingBlobs) API call. This BuildfarmWorker:fmb call was also responsible for tracking CAS entry access. As a result, our CAS cache eviction strategy shifted from LRU to FIFO.
    When the findMissingBlobsViaBackplane flag is enabled, the buildfarm relies on the backplane as the definitive source for CAS availability. Since we don't update CAS expiry on each access, the backplane will independently expire CAS entries based on the specified cas_expire duration, even if they are actively being read.

Solution

Updated bfServer:fmb call to perform non-blocking fmb calls to workers, allowing these workers to record access for the relevant CAS entries.

Extended expiry duration for available CAS entries in the backplane on each fmb call.

With these changes, we can utilize Bazel's experimental_remote_cache_lease_extension and experimental_remote_cache_ttl flags for incremental builds.

Closes bazelbuild#1428
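
The lease extension described above can be illustrated with a minimal sketch. This is not Buildfarm's implementation: `ToyBackplane`, `simulate`, and the injected clock are hypothetical names, and the real backplane is Redis-backed. The sketch only shows why refreshing expiry on each fmb call keeps actively used entries alive.

```python
class ToyBackplane:
    """Hypothetical in-memory stand-in for the backplane's CAS index."""

    def __init__(self, cas_expire_seconds, clock):
        self.cas_expire_seconds = cas_expire_seconds
        self.clock = clock      # callable returning the current time in seconds
        self.expires_at = {}    # digest -> absolute expiry time

    def add(self, digest):
        self.expires_at[digest] = self.clock() + self.cas_expire_seconds

    def _evict_expired(self):
        now = self.clock()
        for digest in [d for d, t in self.expires_at.items() if t <= now]:
            del self.expires_at[digest]

    def find_missing_blobs(self, digests, extend_lease=True):
        """Return the digests that are absent; optionally refresh present ones."""
        self._evict_expired()
        missing = []
        for digest in digests:
            if digest in self.expires_at:
                if extend_lease:
                    # Lease extension: each fmb hit pushes the expiry forward.
                    self.expires_at[digest] = self.clock() + self.cas_expire_seconds
            else:
                missing.append(digest)
        return missing


def simulate(extend_lease):
    """Two blobs are stored at t=0; only "hot" is queried again at t=6."""
    now = [0]
    backplane = ToyBackplane(cas_expire_seconds=10, clock=lambda: now[0])
    backplane.add("hot")
    backplane.add("cold")
    now[0] = 6
    backplane.find_missing_blobs(["hot"], extend_lease=extend_lease)
    now[0] = 12
    return backplane.find_missing_blobs(["hot", "cold"], extend_lease=extend_lease)
```

With `extend_lease=True` only `"cold"` is reported missing at t=12; with `extend_lease=False` both entries have expired, which is the FIFO-like behavior the problem statement describes.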

Bump org.json:json from 20230227 to 20231013 in /admin/main (bazelbuild#1516)

Bumps [org.json:json](https://github.com/douglascrockford/JSON-java) from 20230227 to 20231013.
- [Release notes](https://github.com/douglascrockford/JSON-java/releases)
- [Changelog](https://github.com/stleary/JSON-java/blob/master/docs/RELEASES.md)
- [Commits](https://github.com/douglascrockford/JSON-java/commits)

---
updated-dependencies:
- dependency-name: org.json:json
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Re-add missing graceful shutdown functionality (bazelbuild#1520)

Technically correct to unwrap EE on lock failure

Bump rules_oss_audit and patch for py3.11

Prevent healthStatusManager NPE on start failure

Consistent check for publicName presence

Read through external with query THROUGH=true

Specifying a correlated invocation id with a uri containing a
THROUGH=true query param will cause the CFC to read a blob through an
external input stream, populating locally along the way. This permits
client-based replication of blobs, and can enable N+1 replication and
traffic balancing for reads.
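
A minimal sketch of the read-through idea, under simplifying assumptions: `read_through`, `local_store`, and the chunk size are hypothetical names chosen for illustration, and the local copy is published only after the external stream is fully consumed, which may differ from how the CFC populates its entry.

```python
import io


def read_through(external, local_store, key, chunk_size=4):
    """Yield chunks from an external stream while building a local copy.

    `external` is any object with a read(n) method; `local_store` is a dict
    standing in for the local cache (hypothetical, for illustration only).
    """
    buffer = io.BytesIO()
    while True:
        chunk = external.read(chunk_size)
        if not chunk:
            break
        buffer.write(chunk)   # accumulate the local copy
        yield chunk           # serve the reader without waiting for the full blob
    # Publish only after a complete read so a partial blob is never exposed.
    local_store[key] = buffer.getvalue()


local = {}
served = b"".join(read_through(io.BytesIO(b"hello blob"), local, "abc"))
```

After the generator is exhausted, `served` holds the full blob and `local["abc"]` holds the same bytes, so a later read for that key can be answered locally.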

Add --port option to worker

Option to run the worker with a cmdline specification for its gRPC
server port.

Restore worker --root cmdline specification

Root cmdline specification has been broken since the config change of
v2.

Make bf-executor small blob names consistent

Remove the size identification for small blobs when uploading with
bf-executor.

feat: Hot CAS Entries - Update read counts in Redis
jasonschroeder-sfdc authored and amishra-u committed Nov 8, 2023
1 parent 1244187 commit 8ca62bd
Showing 51 changed files with 777 additions and 213 deletions.
4 changes: 2 additions & 2 deletions .bazelci/run_server_test.sh
@@ -8,11 +8,11 @@ bazel build //src/main/java/build/buildfarm:buildfarm-shard-worker
bazel build //src/main/java/build/buildfarm:buildfarm-server

# Start a single worker
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker $(pwd)/examples/config.minimal.yml > server.log 2>&1 &
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker $(pwd)/examples/config.minimal.yml > worker.log 2>&1 &
echo "Started buildfarm-shard-worker..."

# Start a single server
bazel run //src/main/java/build/buildfarm:buildfarm-server $(pwd)/examples/config.minimal.yml > worker.log 2>&1 &
bazel run //src/main/java/build/buildfarm:buildfarm-server $(pwd)/examples/config.minimal.yml > server.log 2>&1 &
echo "Started buildfarm-server..."

echo "Wait for startup to finish..."
6 changes: 6 additions & 0 deletions .bazelrc
@@ -1,3 +1,9 @@
build --java_language_version=17
build --java_runtime_version=remotejdk_17

build --tool_java_language_version=17
build --tool_java_runtime_version=remotejdk_17

common --enable_platform_specific_config

build:fuse --define=fuse=true
2 changes: 1 addition & 1 deletion .bazelversion
@@ -1 +1 @@
6.2.0
6.4.0
6 changes: 3 additions & 3 deletions BUILD
@@ -120,7 +120,7 @@ sh_binary(
java_image(
name = "buildfarm-server",
args = ["/app/build_buildfarm/examples/config.minimal.yml"],
base = "@amazon_corretto_java_image_base//image",
base = "@ubuntu-mantic//image",
classpath_resources = [
"//src/main/java/build/buildfarm:configs",
],
@@ -148,14 +148,14 @@ oss_audit(
# Download cgroup-tools so that the worker is able to restrict actions via control groups.
download_pkgs(
name = "worker_pkgs",
image_tar = "@ubuntu-jammy//image",
image_tar = "@ubuntu-mantic//image",
packages = ["cgroup-tools"],
tags = ["container"],
)

install_pkgs(
name = "worker_pkgs_image",
image_tar = "@ubuntu-jammy//image",
image_tar = "@ubuntu-mantic//image",
installables_tar = ":worker_pkgs.tar",
installation_cleanup_commands = "rm -rf /var/lib/apt/lists/*",
output_image_name = "worker_pkgs_image",
28 changes: 14 additions & 14 deletions README.md
@@ -19,17 +19,17 @@ All commandline options override corresponding config settings.

Run via

```
docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:5.0.9
```shell
$ docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:5.0.9
redis-cli config set stop-writes-on-bgsave-error no
```

### Bazel Buildfarm Server

Run via

```
bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>
```shell
$ bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```
@@ -40,8 +40,8 @@ Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=

Run via

```
bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>
```shell
$ bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml

@@ -53,9 +53,9 @@ Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm

To use the example configured buildfarm with bazel (version 1.0 or higher), you can configure your `.bazelrc` as follows:

```
```shell
$ cat .bazelrc
build --remote_executor=grpc://localhost:8980
$ build --remote_executor=grpc://localhost:8980
```

Then run your build as you would normally do.
@@ -67,20 +67,20 @@ Buildfarm uses [Java's Logging framework](https://docs.oracle.com/javase/10/core
You can use typical Java logging configuration to filter these results and observe the flow of executions through your running services.
An example `logging.properties` file has been provided at [examples/logging.properties](examples/logging.properties) for use as follows:

```
bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```

and

```
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
``` shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```

To attach a remote debugger, run the executable with the `--debug=<PORT>` flag. For example:

```
bazel run //src/main/java/build/buildfarm:buildfarm-server -- --debug=5005 $PWD/examples/config.minimal.yml
```shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --debug=5005 $PWD/examples/config.minimal.yml
```


48 changes: 25 additions & 23 deletions _site/docs/configuration/configuration.md
@@ -7,7 +7,7 @@ has_children: true

Minimal required:

```
```yaml
backplane:
redisUri: "redis://localhost:6379"
queues:
@@ -28,17 +28,18 @@ For an example configuration containing all of the configuration values, see `ex

### Common

| Configuration | Accepted and _Default_ Values | Command Line Argument | Description |
|----------------------|-------------------------------|-----------------------|---------------------------------------------------|
| digestFunction | _SHA256_, SHA1 | | Digest function for this implementation |
| defaultActionTimeout | Integer, _600_ | | Default timeout value for an action (seconds) |
| maximumActionTimeout | Integer, _3600_ | | Maximum allowed action timeout (seconds) |
| maxEntrySizeBytes | Long, _2147483648_ | | Maximum size of a single blob accepted (bytes) |
| prometheusPort | Integer, _9090_ | --prometheus_port | Listening port of the Prometheus metrics endpoint |
| Configuration | Accepted and _Default_ Values | Command Line Argument | Description |
|------------------------------|-------------------------------|-----------------------|--------------------------------------------------------------|
| digestFunction | _SHA256_, SHA1 | | Digest function for this implementation |
| defaultActionTimeout | Integer, _600_ | | Default timeout value for an action (seconds) |
| maximumActionTimeout | Integer, _3600_ | | Maximum allowed action timeout (seconds) |
| maxEntrySizeBytes | Long, _2147483648_ | | Maximum size of a single blob accepted (bytes) |
| prometheusPort | Integer, _9090_ | --prometheus_port | Listening port of the Prometheus metrics endpoint |
| allowSymlinkTargetAbsolute | boolean, _false_ | | Permit inputs to contain symlinks with absolute path targets |

Example:

```
```yaml
digestFunction: SHA1
defaultActionTimeout: 1800
maximumActionTimeout: 1800
@@ -79,7 +80,7 @@ worker:

Example:

```
```yaml
server:
instanceType: SHARD
name: shard
@@ -96,7 +97,7 @@ server:

Example:

```
```yaml
server:
grpcMetrics:
enabled: false
@@ -114,7 +115,7 @@ server:

Example:

```
```yaml
server:
caches:
directoryCacheMaxEntries: 10000
@@ -132,7 +133,7 @@ server:

Example:

```
```yaml
server:
admin:
deploymentEnvironment: AWS
@@ -151,14 +152,14 @@ server:

Example:

```
```yaml
server:
metrics:
publisher: log
logLevel: INFO
```

```
```yaml
server:
metrics:
publisher: aws
@@ -207,7 +208,7 @@ server:

Example:

```
```yaml
backplane:
type: SHARD
redisUri: "redis://localhost:6379"
@@ -224,7 +225,7 @@ backplane:

Example:

```
```yaml
backplane:
type: SHARD
redisUri: "redis://localhost:6379"
@@ -261,8 +262,9 @@ backplane:
| errorOperationRemainingResources | boolean, _false_ | | |
| realInputDirectories | List of Strings, _external_ | | A list of paths that will not be subject to the effects of linkInputDirectories setting, may also be used to provide writable directories as input roots for actions which expect to be able to write to an input location and will fail if they cannot |
| gracefulShutdownSeconds | Integer, 0 | | Time in seconds to allow for operations in flight to finish when shutdown signal is received |
| createSymlinkOutputs | boolean, _false_ | | Creates SymlinkNodes for symbolic links discovered in output paths for actions. No verification of the symlink target path occurs. Buildstream, for example, requires this. |

```
```yaml
worker:
port: 8981
publicName: "localhost:8981"
@@ -279,7 +281,7 @@ worker:

Example:

```
```yaml
worker:
capabilities:
cas: true
@@ -296,7 +298,7 @@ worker:

Example:

```
```yaml
worker:
sandboxSettings:
alwaysUse: true
@@ -313,7 +315,7 @@ worker:

Example:

```
```yaml
worker:
dequeueMatchSettings:
acceptEverything: true
@@ -333,7 +335,7 @@ worker:

Example:

```
```yaml
worker:
storages:
- type: FILESYSTEM
@@ -361,7 +363,7 @@ worker:

Example:

```
```yaml
worker:
executionPolicies:
- name: test
26 changes: 17 additions & 9 deletions _site/docs/execution/execution_policies.md
@@ -17,7 +17,7 @@ This policy type specifies that a worker should prepend a single path, and a num

This example will use the buildfarm-provided executable `as-nobody`, which will upon execution demote itself to a `nobody` effective process owner uid, and perform an `execvp(2)` with the remaining provided program arguments, which will subsequently execute as a user that no longer matches the worker process.

```
```yaml
# default wrapper policy application
worker:
executionPolicies:
Expand Down Expand Up @@ -50,32 +50,37 @@ These wrappers are used for detecting actions that rely on time. Below is a dem
This addresses two problems in regards to an action's dependence on time. The 1st problem is when an action takes longer than it should because it's sleeping unnecessarily. The 2nd problem is when an action relies on time which causes it to eventually be broken on master despite the code not changing. Both problems are expressed below as unit tests. We demonstrate a time-spoofing mechanism (the re-writing of syscalls) which allows us to detect these problems generically over any action. The objective is to analyze builds for performance inefficiency and discover future instabilities before they occur.

### Issue 1 (slow test)
```

```bash
#!/bin/bash
set -euo pipefail

echo -n "testing... "
sleep 10;
echo "done"
```

The test takes 10 seconds to run on average.
```
bazel test --runs_per_test=10 --config=remote //cloud/buildfarm:sleep_test

```shell
$ bazel test --runs_per_test=10 --config=remote //cloud/buildfarm:sleep_test
//cloud/buildfarm:sleep_test PASSED in 10.2s
Stats over 10 runs: max = 10.2s, min = 10.1s, avg = 10.2s, dev = 0.0s
```

We can check for performance improvements by using the `skip-sleep` option.
```
bazel test --runs_per_test=10 --config=remote --remote_default_exec_properties='skip-sleep=true' //cloud/buildfarm:sleep_test

```shell
$ bazel test --runs_per_test=10 --config=remote --remote_default_exec_properties='skip-sleep=true' //cloud/buildfarm:sleep_test
//cloud/buildfarm:sleep_test PASSED in 1.0s
Stats over 10 runs: max = 1.0s, min = 0.9s, avg = 1.0s, dev = 0.0s
```

Now the test is 10x faster. If skipping sleep makes an action perform significantly faster without affecting its success rate, that would warrant further investigation into the action's implementation.

### Issue 2 (future failing test)
```

```bash
#!/bin/bash
set -euo pipefail

@@ -89,12 +94,15 @@ echo "Times change."
date
exit -1;
```

The test passes today, but will it pass tomorrow? Will it pass a year from now? We can find out by using the `time-shift` option.
```
bazel test --test_output=streamed --remote_default_exec_properties='time-shift=31556952' --config=remote //cloud/buildfarm:future_fail

```shell
$ bazel test --test_output=streamed --remote_default_exec_properties='time-shift=31556952' --config=remote //cloud/buildfarm:future_fail
INFO: Found 1 test target...
Times change.
Mon Sep 25 18:31:09 UTC 2023
//cloud/buildfarm:future_fail FAILED in 18.0s
```

Time is shifted to the year 2023 and the test now fails. We can fix the problem before others see it.
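
The `time-shift` value used above, 31556952 seconds, is one average Gregorian year (365.2425 days). As a rough check, assuming the shift is a plain offset added to the wall clock (an assumption about the wrapper, not something the output confirms), the shifted timestamp can be mapped back to the real run time:

```python
from datetime import datetime, timedelta, timezone

# Value passed via --remote_default_exec_properties='time-shift=31556952'.
TIME_SHIFT_SECONDS = 31556952

shift = timedelta(seconds=TIME_SHIFT_SECONDS)

# Timestamp printed by `date` inside the shifted action.
shifted = datetime(2023, 9, 25, 18, 31, 9, tzinfo=timezone.utc)

# Under the plain-offset assumption, the real wall-clock time of the run.
original = shifted - shift

print(shift.days, shift.seconds)  # 365 whole days plus 20952 seconds
print(original.isoformat())       # 2022-09-25T12:41:57+00:00
```

So the action saw a clock roughly one year ahead of the actual run in September 2022.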