parent 90439ca
author Anshuman Mishra <a.mishra@uber.com> 1696277984 -0700
committer Anshuman Mishra <a.mishra@uber.com> 1700707781 -0800

feat: Hot CAS Entries - Implement CAS access metrics recorder

Log on write errors

Use integer ids for Sqlite bidirectional index

The size cost of a single-table bidirectional index is vast compared
to using 3NF integer keys. Experimental estimates show a decrease in
file size of roughly 90%.

Update graceful shutdown functionality to better handle worker terminations (buildfarm#1462)

Manipulate worker set directly in RSB

Avoid dependency on subscriber to update state changes when removing
workers. This prevents an NPE which will occur invariably when workers
are allowably configured with subscribeToBackplane: false.

Remove publishTtlMetric option

The individual metric controls for ttl are not necessary either for
performance or feature support. Use Files' attributes acquisition
mechanism for modified time.

Config-compatible behavior for publishTtlMetric

Correct logging advisements for current Java

Java logging definitions must now match java.util.logging.config.file,
update these specifications in our README.md

Rename GracefulShutdownTest

Remove WebController

Interrupt+Join operationQueuer/dispatchMonitor

Use interrupt to halt the OperationQueuer.
Join on both operationQueuer and dispatchMonitor before instance stop
return.

Present operationNames by stage

Include Match and Report Result stages in output
Record the active operationName occupying slots in each of the stages
and present them with WorkerProfile
Avoid several unnecessary casts with interfaces for operation slot
stages.

Remove subscribeToBackplane, adjust failsafe op

A shard server is impractical without operation subscription; partition
subscription confirmation between servers and workers.
The failsafe execution is configuration that is likely not desired on
workers. This change removes the failsafe behavior from workers via
backplane config, and relegates the setting of failsafe boolean to
server config. If the option is restored for workers, it can be added to
worker configs so that configs may continue to be shared between workers
and servers and retain independent addressability.

Removing AWS/GCP Metrics and Admin controls

Internally driven metrics and scaling controls have low, if any, usage
rates. Prometheus has largely succeeded independent publication of
metrics, and externally driven scaling is the norm. These modules have
been incomplete between cloud providers, and for the functional side of
AWS, bind us to springboot. Removing them for the sake of reduced
dependencies and complexity.

Remove unused setOnCancelHandler

Remove this unused OperationQueue feature which provides no invocations
on any use.

Update BWoB docs for ensureOutputsPresent

Improve unit test

Disable Bzlmod explicitly in .bazelrc

Log write errors with worker address

Revert "Use integer ids for Sqlite bidirectional index"

This reverts commit f651cdb.

Common String.format for PipelineStage

Cleanup matched logic in SWC listener

Continue the loop while we have *not* matched successfully and avoid a
confusing inversion in getMatched()

Refactor SWC matcher and clarify Nullable

Distinguish the valid/unique/propagating methods of entry listening.

Interrupt matchStage to induce prepare shutdown

The only signal to a waiting match that will halt its current listen
loop for a valid unique operation is an interrupt.

Specify example config with grpc target

Distinguish target param with GRPC type storage from FILESYSTEM
definition

Remove SpringBoot usage

Reinstate prior usage of LoggingMain for safe shutdown, with added
release mechanism for interrupted processes. All invoked shutdowns are
graceful, with vastly improved shutdown speed for empty workers waiting
for pipeline stages.

Enable graceful shutdown for server (buildfarm#1490)

refactor: code cleanup

Tiny code cleanup

Log paths created on putDirectory

Will include operation root and inform directory cache effectiveness.

Permit regex realInputDirectories

Selecting realInputDirectories by regex permits flexible patterns that
can yield drastic improvements in directory reuse for specialized
deployments. runfiles in particular are hazardous expansions of
nearly-execroot in the case of bazel.

Care must be taken to match directories exclusively.
The entire input tree is traversed for matches against expanded paths
under the root, to allow for nested selection.
Each match thus costs the number of input directories.
Counterintuitively, OutputFiles are augmented to avoid the recursive
check for OutputDirectories which only applies to actual reported
results, resulting in a path match when creating the exec root.
Regex style is java.util.Pattern, and must match the full input
directory.
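The full-match requirement can be sketched as follows (a minimal illustration with hypothetical class and method names, not buildfarm's actual code): selection uses `Matcher.matches()` semantics, so a pattern must cover the entire input directory path, and every configured pattern is tested against every expanded directory under the root.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: realInputDirectories patterns use java.util.regex.Pattern
// and must match the FULL input directory path (matches(), not find()).
class RealInputDirectoryMatcher {
  static List<String> selectDirectories(List<String> patterns, List<String> inputDirs) {
    List<Pattern> compiled =
        patterns.stream().map(Pattern::compile).collect(Collectors.toList());
    // Each input directory is tested against each pattern, so every configured
    // pattern costs a pass over the whole input tree.
    return inputDirs.stream()
        .filter(dir -> compiled.stream().anyMatch(p -> p.matcher(dir).matches()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> dirs = List.of("bazel-out/k8-fastbuild/bin/foo.runfiles", "src/main/java");
    // ".*\\.runfiles" full-matches the first path; a bare "runfiles" would match nothing.
    System.out.println(selectDirectories(List.of(".*\\.runfiles"), dirs));
  }
}
```

Note that a partial pattern such as `runfiles` selects nothing, which is the "care must be taken to match directories exclusively" caveat above.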

Log execPath rather than the cache dir path

This will include the path to the missed directory and the operation
which required it.

Shore up OutputDirectory for silence on duplicates

Prevent adding duplicate realInputDirectories matches

Trigger realInputDirectories to have empty files

Ensure that the last leg of the execution presents a directory, rather
than the parent, per OutputDirectory's stamping.

Switch to positive check for linkInputDirectories

docs(configuration): document --prometheus_port CLI argument

docs(configuration): readability and typos

style(configuration.md): table formatting

feat: support --redis_uri command line option

Support a `--redis_uri` command line option for start-up.

docs(configuration): document the --redis_uri command line options

also fixed some spelling typos.

Example should use `container_image` instead of `java_image`

chore: bump rules_jvm_external

Bumping 4.2 -> 5.3

chore: bump rules_cc

Bump from 0.0.6 -> 0.0.9

Implement local resources for workers (buildfarm#1282)

Suppress unused warning

Bump bazel version; otherwise some tests fail with System::setSecurityManager

Revert bazel upgrade

New line at end of file

feat: Hot CAS Entries - Update read counts in Redis

feat: Hot CAS Entries - Final Integration

build: override grpc dependencies with our dependencies

Don't get transitive grpc dependencies, use the ones from our `maven_install(...)`

chore(deps): bump protobuf runtime to 3.19.1

chore(deps) add transitive dependencies

feat: add Proto reflection service to shard worker

To aid connection troubleshooting

Bug: Fix Blocked thread in WriteStreamObserver Caused by CASFile Write (buildfarm#1486)

* Add unit test
* Signal Write on complete

Pin the Java toolchain to `remotejdk_17` (buildfarm#1509)

Closes buildfarm#1508

Cleanups:
- remove the unused `ubuntu-bionic` base image
- replace `ubuntu-jammy:jammy-java11-gcc` with `ubuntu-mantic:mantic-java17-gcc`
- replace `amazoncorretto:19` with `ubuntu-mantic:mantic-java17-gcc`
- swap inverted log file names in a file

docs: add markdown language specifiers for code blocks

Support OutputPaths in OutputDirectory

Specifying any number of OutputPaths will ignore OutputFiles (consistent
with uploads). Where an OutputPath specifies an output directory, the
action must be able to create the directory itself.

Permit Absolute Symlink Targets with configuration

Partial specification of the absolute symlink response per REAPI.
Remaining work will be in output identification.

chore: update bazel to 6.4.0 (buildfarm#1513)

Trying to get more info on the Lombok stamping issue on Windows CI.
See also bazelbuild/bazel#10363 and
bazelbuild/bazel#18185

Rename instance types (buildfarm#1514)

Create SymlinkNode outputs during upload (buildfarm#1515)

Default disabled, available with createSymlinkOutputs option in Worker
config.

feat: Implement CAS lease extension (buildfarm#1455)

Problem

    Enabling the findMissingBlobsViaBackplane flag in BuildfarmServer eliminates the need for the BuildfarmWorker's fmb API call. This BuildfarmWorker:fmb call was also responsible for tracking CAS entry access. As a result, our CAS cache eviction strategy shifted from LRU to FIFO.
    When the findMissingBlobsViaBackplane flag is enabled, the buildfarm relies on the backplane as the definitive source for CAS availability. Since we don't update CAS expiry on each access, the backplane will independently expire CAS entries based on the specified cas_expire duration, even if they are actively being read.

Solution

Updated bfServer:fmb call to perform non-blocking fmb calls to workers, allowing these workers to record access for the relevant CAS entries.

Extended expiry duration for available CAS entries in the backplane on each fmb call.

With these changes, we can utilize Bazel's experimental_remote_cache_lease_extension and experimental_remote_cache_ttl flags for incremental builds.

Closes buildfarm#1428
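The lease-extension idea can be sketched in miniature (all names here are illustrative, not buildfarm's API): answering findMissingBlobs from a local index while, as a side effect, extending the TTL of each entry found present and firing a non-blocking access notification toward the owning worker.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch of CAS lease extension, assuming a simple digest -> expiry map
// stands in for the backplane index.
class CasLeaseExtender {
  final Map<String, Long> expiryMillis = new ConcurrentHashMap<>();
  final long casExpireMillis;

  CasLeaseExtender(long casExpireMillis) {
    this.casExpireMillis = casExpireMillis;
  }

  void put(String digest, long nowMillis) {
    expiryMillis.put(digest, nowMillis + casExpireMillis);
  }

  Set<String> findMissingBlobs(Set<String> digests, long nowMillis) {
    Set<String> missing = new java.util.HashSet<>();
    for (String d : digests) {
      Long expiry = expiryMillis.get(d);
      if (expiry == null || expiry <= nowMillis) {
        missing.add(d);
      } else {
        // Extend the lease for every entry found present on this fmb call.
        expiryMillis.put(d, nowMillis + casExpireMillis);
        // Fire-and-forget: let the owning worker record the access without
        // blocking the fmb response (worker.recordAccess is hypothetical).
        CompletableFuture.runAsync(() -> { /* worker.recordAccess(d) */ });
      }
    }
    return missing;
  }
}
```

Under this sketch, an entry read halfway through its TTL gets a fresh full TTL, which is what restores LRU-like behavior over FIFO.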

Bump org.json:json from 20230227 to 20231013 in /admin/main (buildfarm#1516)

Bumps [org.json:json](https://github.com/douglascrockford/JSON-java) from 20230227 to 20231013.
- [Release notes](https://github.com/douglascrockford/JSON-java/releases)
- [Changelog](https://github.com/stleary/JSON-java/blob/master/docs/RELEASES.md)
- [Commits](https://github.com/douglascrockford/JSON-java/commits)

---
updated-dependencies:
- dependency-name: org.json:json
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Re-add missing graceful shutdown functionality (buildfarm#1520)

Technically correct to unwrap EE on lock failure

Bump rules_oss_audit and patch for py3.11

Prevent healthStatusManager NPE on start failure

Consistent check for publicName presence

Read through external with query THROUGH=true

Specifying a correlated invocation id with a uri containing a
THROUGH=true query param will cause the CFC to read a blob through an
external input stream, populating locally along the way. This permits
client-based replication of blobs, and can enable N+1 replication and
traffic balancing for reads.
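A minimal sketch of detecting the query parameter (illustrative only; buildfarm's actual parsing may differ): the correlated invocations id is treated as a URI, and a `THROUGH=true` query parameter flags the read-through path.

```java
import java.net.URI;

// Hypothetical detector: a correlated invocations id carrying a THROUGH=true
// query parameter asks the CFC to read the blob through an external input
// stream, populating the local CAS along the way.
class ReadThroughDetector {
  static boolean isReadThrough(String correlatedInvocationsId) {
    try {
      String query = URI.create(correlatedInvocationsId).getQuery();
      if (query == null) {
        return false;
      }
      for (String param : query.split("&")) {
        if (param.equalsIgnoreCase("THROUGH=true")) {
          return true;
        }
      }
    } catch (IllegalArgumentException e) {
      return false; // not a URI; treat as an opaque id
    }
    return false;
  }
}
```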

Add --port option to worker

Option to run the worker with a cmdline specification for its gRPC
server port.

Restore worker --root cmdline specification

Root cmdline specification has been broken since the config change of
v2.

Make bf-executor small blob names consistent

Remove the size identification for small blobs when uploading with
bf-executor.

Configured output size operation failure

Permit installations to control the failure process for operations which
produce outputs larger than the maxEntrySizeBytes. A default value
of false retains the existing behavior which appears transient and
blacklists the executed action key. When enabled, the action will fail
under an invalid violation that indicates user error.

Restore abbrev port as -p

Update zstd-jni for latest version

There have been a few releases by now and this pulls the latest. For
buildfarm, notable changes include performance enhancements during
decompression.

See:
https://github.com/facebook/zstd/releases/tag/v1.5.5

Attempt to resolve windows stamping

Bug: Fix workerSet update logic for RemoteCasWriter

Detail storage requirements

Update for further docs related to storage+type functionality
Remove outdated Operation Queue worker definitions

Fix worker execution env title

Add storage example descriptions

Check for context cancelled before responding to error (buildfarm#1526)

When a write fails because it was already cancelled earlier (for example, due to deadline exceeded), we get an UNKNOWN error. When the resulting exception reaches errorResponse(), only the CANCELLED status code is checked; since the status code here is UNKNOWN, we also need to check whether the context is cancelled to prevent the responseObserver from being invoked.

The change adds a check for a cancelled context and a unit test covering an exception raised with the context cancelled.
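The decision logic can be shown with a dependency-free sketch (illustrative names; buildfarm's real code uses io.grpc's Status and Context): the observer is suppressed when either the status code is CANCELLED or the call context has been cancelled.

```java
// Hedged sketch of the fix: before delivering an error to the response
// observer, consult both the status code and whether the surrounding call
// context was cancelled (after DEADLINE_EXCEEDED the write may fail with
// UNKNOWN while the context is already cancelled).
class ErrorResponder {
  enum Code { OK, CANCELLED, UNKNOWN }

  /** Returns true if the error was delivered to the observer. */
  static boolean errorResponse(Code code, boolean contextCancelled, Runnable responseObserver) {
    if (code == Code.CANCELLED || contextCancelled) {
      return false; // the client is gone; the observer must not fire
    }
    responseObserver.run();
    return true;
  }
}
```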

chore(deps): bump com.google.errorprone:error-prone

Release notes: https://github.com/google/error-prone/releases/tag/v2.22.0

Write logs and cleanup

Run formatter

Fix main merge

remove cleanup

Minor updates

Worker name execution properties matching

updates

Update ShardWorkerContext.java

Release resources when not keeping an operation (buildfarm#1535)

Update queues.md

Refer to new camelized DMS fields.
Expand predefined dynamic execution property name matches.

Implement custom label header support for Grpc metrics interceptor (buildfarm#1530)

Add an option to provide a list of custom label headers to add to metrics.
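The essence of the option can be sketched without gRPC machinery (hypothetical names; the real interceptor reads io.grpc Metadata): a configured list of header names is copied from each call's metadata into the labels attached to the request metric, with absent headers mapped to an empty value so the label set stays stable.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of custom label header extraction for metrics.
class MetricLabelExtractor {
  static Map<String, String> extractLabels(
      List<String> configuredHeaders, Map<String, String> callMetadata) {
    Map<String, String> labels = new LinkedHashMap<>();
    for (String header : configuredHeaders) {
      String value = callMetadata.get(header);
      // Absent headers become empty labels so every metric carries the same label keys.
      labels.put(header, value == null ? "" : value);
    }
    return labels;
  }
}
```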

Specify direct guava dependency usage (buildfarm#1538)

Testing with bazel HEAD using jdk21 compilation has revealed new
direct dependencies on guava.

Update lombok dependency for jdk21 (buildfarm#1540)

Annotations under lombok were fixed for jdk21 in 1.18.28, update to
current.

Reorganize DequeueMatchEvaluator (buildfarm#1537)

Remove acceptEverything DequeueMatchSetting
Place worker name in workerProvisions
Only enable allowUnmatched effects on key mismatch
Only acquire resources after asserting compatibility
Update documentation to match changes

Upgrade com_google_protobuf for jvm compatibility (buildfarm#1539)

Correct deprecated AccessController usage warning
Requires a bazel newer than 6.4.0 on macOS to choose the unix toolchain with the C++ std=c++14 specification for the protobuf->absl dependency.

Create buildfarm-worker-base-build-and-deploy.yml (buildfarm#1534)

Create a github workflow to build base buildfarm worker image.

Add base image generation scripts (buildfarm#1532)

Fix buildfarm-worker-base-build-and-deploy.yml (buildfarm#1541)

Add public buildfarm image generation actions (buildfarm#1542)

Update base image building action (buildfarm#1544)

Add release image generation action (buildfarm#1545)

Limit workflow to canonical repository (buildfarm#1547)

Check for "cores" exec property as min-cores match (buildfarm#1548)

The execution platform property "cores" is detailed in documentation as
specifying "min-cores" and "max-cores". Match this definition and
prevent "cores" from being evaluated as a strict match with the worker
provision properties (with likely rejection).
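A sketch of the rule, assuming a simplified worker model (names are illustrative, not DequeueMatchEvaluator's actual signature): "cores" is interpreted as a minimum-core requirement against the worker's capacity instead of a literal key/value comparison with provision properties.

```java
import java.util.Map;

// Hedged sketch: treat the "cores" platform property as min-cores rather
// than requiring an exact provision-property match.
class CoresMatcher {
  static boolean matches(Map<String, String> platformProperties, int workerCores) {
    String cores = platformProperties.get("cores");
    if (cores == null) {
      return true; // nothing requested
    }
    return Integer.parseInt(cores) <= workerCores; // min-cores semantics
  }
}
```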

Consider output_* as relative to WD (buildfarm#1550)

Per the REAPI spec:

`The paths are relative to the working directory of the action
execution.`

Prefix the WorkingDirectory to paths used as OutputDirectory parameters,
and verify that these are present in the layout of the directory for
use.
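The prefixing step amounts to a path resolution like the following (a minimal sketch with assumed names, not the actual change):

```java
import java.nio.file.Path;

// Illustrative of the REAPI rule quoted above: output paths are relative to
// the action's working directory, so the working directory is prefixed
// before the output is located under the exec root.
class OutputPathResolver {
  static Path resolveOutput(Path execRoot, String workingDirectory, String outputPath) {
    return execRoot.resolve(workingDirectory).resolve(outputPath).normalize();
  }
}
```

An empty working directory degenerates to resolving directly under the exec root, which matches the pre-change behavior for actions run at the root.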

Implement Persistent Workers as an execution path (buildfarm#1260)

Followup to buildfarm#1195

Add a new execution pathway in worker/Executor.java to use persistent workers via PersistentExecutor, like DockerExecutor.

Mostly unchanged from the form we used to experiment back at Twitter, but now with tests.

Co-authored-by: Shane Delmore shane@delmore.io

Locate Output Paths relative to WorkingDirectory (buildfarm#1553)

* Locate Output Paths relative to WorkingDirectory

Required as a corollary to OutputDirectory changes to consider outputs
as relative to working directory.

* Windows builds emit relativize paths with native separators

Remove incorrect external resolve of WD on upload (buildfarm#1554)

Previous patch included a change in actionRoot parameter, expecting it
to prefer the working directory rooted path to discover outputs. Might
want to reapply this later, but for now leave the resolution in
uploadOutputs.

empty
amishra-u committed Nov 23, 2023
1 parent 90439ca commit a110fe8
Showing 168 changed files with 5,726 additions and 2,264 deletions.
8 changes: 8 additions & 0 deletions .bazelci/presubmit.yml
@@ -41,6 +41,8 @@ tasks:
name: "Unit Tests"
build_targets:
- "..."
build_flags:
- "--build_tag_filters=-container"
test_flags:
- "--test_tag_filters=-integration,-redis"
test_targets:
@@ -49,13 +51,18 @@ tasks:
name: "Unit Tests"
build_targets:
- "..."
build_flags:
- "--build_tag_filters=-container"
test_flags:
- "--test_tag_filters=-integration,-redis"
test_targets:
- "..."
macos:
name: "Unit Tests"
environment:
USE_BAZEL_VERSION: 17be878292730359c9c90efdceabed26126df7ae
build_flags:
- "--cxxopt=-std=c++14"
- "--build_tag_filters=-container"
build_targets:
- "..."
@@ -70,6 +77,7 @@ tasks:
build_targets:
- "..."
test_flags:
- "--@rules_jvm_external//settings:stamp_manifest=False"
- "--test_tag_filters=-integration,-redis"
test_targets:
- "..."
4 changes: 2 additions & 2 deletions .bazelci/run_server_test.sh
@@ -8,11 +8,11 @@ bazel build //src/main/java/build/buildfarm:buildfarm-shard-worker
bazel build //src/main/java/build/buildfarm:buildfarm-server

# Start a single worker
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker $(pwd)/examples/config.minimal.yml > server.log 2>&1 &
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker $(pwd)/examples/config.minimal.yml > worker.log 2>&1 &
echo "Started buildfarm-shard-worker..."

# Start a single server
bazel run //src/main/java/build/buildfarm:buildfarm-server $(pwd)/examples/config.minimal.yml > worker.log 2>&1 &
bazel run //src/main/java/build/buildfarm:buildfarm-server $(pwd)/examples/config.minimal.yml > server.log 2>&1 &
echo "Started buildfarm-server..."

echo "Wait for startup to finish..."
10 changes: 10 additions & 0 deletions .bazelrc
@@ -1,3 +1,9 @@
build --java_language_version=17
build --java_runtime_version=remotejdk_17

build --tool_java_language_version=17
build --tool_java_runtime_version=remotejdk_17

common --enable_platform_specific_config

build:fuse --define=fuse=true
@@ -14,3 +20,7 @@ test --test_tag_filters=-redis,-integration
# Ensure buildfarm is compatible with future versions of bazel.
# https://buildkite.com/bazel/bazelisk-plus-incompatible-flags
common --incompatible_disallow_empty_glob

# TODO: migrate all dependencies from WORKSPACE to MODULE.bazel
# https://github.com/bazelbuild/bazel-buildfarm/issues/1479
common --noenable_bzlmod
2 changes: 1 addition & 1 deletion .bazelversion
@@ -1 +1 @@
6.1.2
6.4.0
31 changes: 31 additions & 0 deletions .github/workflows/buildfarm-images-build-and-deploy.yml
@@ -0,0 +1,31 @@
name: Build and Push Latest Buildfarm Images

on:
push:
branches:
- main

jobs:
build:
if: github.repository == 'bazelbuild/bazel-buildfarm'
name: Build Buildfarm Images
runs-on: ubuntu-latest
steps:
- uses: bazelbuild/setup-bazelisk@v2

- name: Checkout
uses: actions/checkout@v3

- name: Login to Bazelbuild Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.BAZELBUILD_DOCKERHUB_USERNAME }}
password: ${{ secrets.BAZELBUILD_DOCKERHUB_TOKEN }}

- name: Build Server Image
id: buildAndPushServerImage
run: bazel run public_push_buildfarm-server --define release_version=latest

- name: Build Worker Image
id: buildAndPushWorkerImage
run: bazel run public_push_buildfarm-worker --define release_version=latest
30 changes: 30 additions & 0 deletions .github/workflows/buildfarm-release-build-and-deploy.yml
@@ -0,0 +1,30 @@
name: Build and Push Buildfarm Releases

on:
release:
types: [published]

jobs:
build:
if: github.repository == 'bazelbuild/bazel-buildfarm'
name: Build Buildfarm Images
runs-on: ubuntu-latest
steps:
- uses: bazelbuild/setup-bazelisk@v2

- name: Checkout
uses: actions/checkout@v3

- name: Login to Bazelbuild Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.BAZELBUILD_DOCKERHUB_USERNAME }}
password: ${{ secrets.BAZELBUILD_DOCKERHUB_TOKEN }}

- name: Build Server Image
id: buildAndPushServerImage
run: bazel run public_push_buildfarm-server --define release_version=${{ github.event.release.tag_name }}

- name: Build Worker Image
id: buildAndPushWorkerImage
run: bazel run public_push_buildfarm-worker --define release_version=${{ github.event.release.tag_name }}
39 changes: 39 additions & 0 deletions .github/workflows/buildfarm-worker-base-build-and-deploy.yml
@@ -0,0 +1,39 @@
name: Build and Push Base Buildfarm Worker Images

on:
push:
branches:
- main
paths:
- ci/base-worker-image/jammy/Dockerfile
- ci/base-worker-image/mantic/Dockerfile
jobs:
build:
if: github.repository == 'bazelbuild/bazel-buildfarm'
name: Build Base Buildfarm Worker Image
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3

- name: Login to Bazelbuild Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.BAZELBUILD_DOCKERHUB_USERNAME }}
password: ${{ secrets.BAZELBUILD_DOCKERHUB_TOKEN }}

- name: Build Jammy Docker image
uses: docker/build-push-action@3b5e8027fcad23fda98b2e3ac259d8d67585f671
with:
context: .
file: ./ci/base-worker-image/jammy/Dockerfile
push: true
tags: bazelbuild/buildfarm-worker-base:jammy

- name: Build Mantic Docker image
uses: docker/build-push-action@3b5e8027fcad23fda98b2e3ac259d8d67585f671
with:
context: .
file: ./ci/base-worker-image/mantic/Dockerfile
push: true
tags: bazelbuild/buildfarm-worker-base:mantic
28 changes: 25 additions & 3 deletions BUILD
@@ -2,7 +2,7 @@ load("@com_github_bazelbuild_buildtools//buildifier:def.bzl", "buildifier")
load("@io_bazel_rules_docker//java:image.bzl", "java_image")
load("@io_bazel_rules_docker//docker/package_managers:download_pkgs.bzl", "download_pkgs")
load("@io_bazel_rules_docker//docker/package_managers:install_pkgs.bzl", "install_pkgs")
load("@io_bazel_rules_docker//container:container.bzl", "container_image")
load("@io_bazel_rules_docker//container:container.bzl", "container_image", "container_push")
load("@rules_oss_audit//oss_audit:java/oss_audit.bzl", "oss_audit")
load("//:jvm_flags.bzl", "server_jvm_flags", "worker_jvm_flags")

@@ -148,14 +148,14 @@ oss_audit(
# Download cgroup-tools so that the worker is able to restrict actions via control groups.
download_pkgs(
name = "worker_pkgs",
image_tar = "@ubuntu-jammy//image",
image_tar = "@ubuntu-mantic//image",
packages = ["cgroup-tools"],
tags = ["container"],
)

install_pkgs(
name = "worker_pkgs_image",
image_tar = "@ubuntu-jammy//image",
image_tar = "@ubuntu-mantic//image",
installables_tar = ":worker_pkgs.tar",
installation_cleanup_commands = "rm -rf /var/lib/apt/lists/*",
output_image_name = "worker_pkgs_image",
@@ -195,3 +195,25 @@ oss_audit(
src = "//src/main/java/build/buildfarm:buildfarm-shard-worker",
tags = ["audit"],
)

# Below targets push public docker images to bazelbuild dockerhub.

container_push(
name = "public_push_buildfarm-server",
format = "Docker",
image = ":buildfarm-server",
registry = "index.docker.io",
repository = "bazelbuild/buildfarm-server",
tag = "$(release_version)",
tags = ["container"],
)

container_push(
name = "public_push_buildfarm-worker",
format = "Docker",
image = ":buildfarm-shard-worker",
registry = "index.docker.io",
repository = "bazelbuild/buildfarm-worker",
tag = "$(release_version)",
tags = ["container"],
)
2 changes: 2 additions & 0 deletions MODULE.bazel
@@ -0,0 +1,2 @@
# TODO: migrate all dependencies from WORKSPACE to MODULE.bazel
# https://github.com/bazelbuild/bazel-buildfarm/issues/1479
32 changes: 16 additions & 16 deletions README.md
@@ -19,19 +19,19 @@ All commandline options override corresponding config settings.

Run via

```
docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:5.0.9
```shell
$ docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:5.0.9
redis-cli config set stop-writes-on-bgsave-error no
```

### Bazel Buildfarm Server

Run via

```
bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>
```shell
$ bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Dlogging.config=file:$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```
**`logfile`** has to be in the [standard java util logging format](https://docs.oracle.com/cd/E57471_01/bigData.100/data_processing_bdd/src/rdp_logging_config.html) and passed as a --jvm_flag=-Dlogging.config=file:
**`configfile`** has to be in [yaml format](https://bazelbuild.github.io/bazel-buildfarm/docs/configuration).
@@ -40,10 +40,10 @@ Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=

Run via

```
bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>
```shell
$ bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Dlogging.config=file:$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml

```
**`logfile`** has to be in the [standard java util logging format](https://docs.oracle.com/cd/E57471_01/bigData.100/data_processing_bdd/src/rdp_logging_config.html) and passed as a --jvm_flag=-Dlogging.config=file:
@@ -53,9 +53,9 @@ Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm

To use the example configured buildfarm with bazel (version 1.0 or higher), you can configure your `.bazelrc` as follows:

```
```shell
$ cat .bazelrc
build --remote_executor=grpc://localhost:8980
$ build --remote_executor=grpc://localhost:8980
```

Then run your build as you would normally do.
@@ -67,20 +67,20 @@ Buildfarm uses [Java's Logging framework](https://docs.oracle.com/javase/10/core
You can use typical Java logging configuration to filter these results and observe the flow of executions through your running services.
An example `logging.properties` file has been provided at [examples/logging.properties](examples/logging.properties) for use as follows:

```
bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Dlogging.config=file:$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```

and

```
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Dlogging.config=file:$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
``` shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```

To attach a remote debugger, run the executable with the `--debug=<PORT>` flag. For example:

```shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --debug=5005 $PWD/examples/config.minimal.yml
```


**`_site/docs/architecture/queues.md`**
If your configuration file does not specify any provisioned queues, buildfarm will provide a single default queue that is eligible for all operations.
This will ensure the expected behavior for the paradigm in which all work is put on the same queue.

### Matching Algorithm
The matching algorithm is performed by the operation queue when the server or worker is requesting to push or pop elements, respectively.
The matching algorithm is designed to find the appropriate queue to perform these actions on.
On the scheduler side, the action's platform properties are used for matching.
On the worker side, the `dequeue_match_settings` are used.
![Operation Queue Matching]({{site.url}}{{site.baseurl}}/assets/images/Operation-Queue-Matching1.png)

The matching algorithm works as follows:
Each provision queue is checked in the order that it is configured.
The first provision queue that is deemed eligible is chosen and used.
When deciding if an action is eligible for the provision queue, each platform property is checked individually.
By default, there must be a perfect match on each key/value.
Wildcards ("*") can be used to avoid the need for a perfect match.
Additionally, if the action contains any platform property that is not mentioned by the provision queue, it will be deemed ineligible.
Setting `allowUnmatched: true` allows a superset of action properties, as long as a subset matches the provision queue.
If no provision queues can be matched, the operation queue will provide an analysis on why none of the queues were eligible.
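
To make the eligibility rules above concrete, here is a minimal Python sketch. This is not buildfarm's actual implementation; the queue layout, names, and helper functions are illustrative only:

```python
# Illustrative sketch of provision-queue eligibility -- not buildfarm's code.
# Queue order matters: the first eligible queue wins.
QUEUES = [
    {"name": "gpu", "properties": {"gpu": "1"}, "allow_unmatched": False},
    {"name": "cpu", "properties": {}, "allow_unmatched": True},
]

def eligible(queue, action_properties):
    for key, value in action_properties.items():
        if key not in queue["properties"]:
            # A property the queue does not mention makes the action
            # ineligible unless allowUnmatched is set on the queue.
            if not queue["allow_unmatched"]:
                return False
        elif queue["properties"][key] not in ("*", value):
            # By default every key/value must match exactly; "*" is a wildcard.
            return False
    return True

def route(action_properties):
    for queue in QUEUES:
        if eligible(queue, action_properties):
            return queue["name"]
    return None  # buildfarm would instead report why no queue was eligible

print(route({"gpu": "1"}))     # -> gpu
print(route({"os": "linux"}))  # -> cpu (extra property tolerated via allow_unmatched)
```

Note how ordering matters: a permissive `allow_unmatched` queue placed first would absorb work intended for more specific queues.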

When taking elements off of the operation queue, the matching algorithm behaves in a similar way.
The worker's `DequeueMatchSettings` also has an `allowUnmatched` property.
Workers also have the ability to reject an operation after matching with a provision queue and dequeuing a value.
To avoid any of these rejections by the worker, you can use `accept_everything: true`.

A worker will dequeue operations from matching queues and determine whether to keep and execute each one according to the following procedure:

For each property key-value in the operation's platform, the operation is REJECTED if:
- The key is `min-cores` and the integer value is greater than the number of cores on the worker.
- The key is `min-mem` and the integer value is greater than the number of bytes of RAM on the worker.
- The key exists in the `DequeueMatchSettings` platform with neither the value nor a `*` among the corresponding platform key's values.
- The key is otherwise unmatched and the `allowUnmatched` setting is `false`.

For each resource requested in the operation's platform with the `resource:` prefix, the operation is rejected if:
- The resource amount cannot currently be satisfied with the associated resource capacity count.
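
The rejection rules can be sketched as a simplified Python model. The worker attributes and the shape of the `DequeueMatchSettings` platform here are assumptions for illustration, not buildfarm's API:

```python
# Simplified model of the worker-side rejection procedure -- illustrative only.
def rejected(platform, dms_platform, allow_unmatched, worker_cores, worker_ram_bytes):
    """platform: operation's key -> value; dms_platform: key -> accepted values."""
    for key, value in platform.items():
        if key == "min-cores":
            if int(value) > worker_cores:
                return True  # asks for more cores than the worker has
        elif key == "min-mem":
            if int(value) > worker_ram_bytes:
                return True  # asks for more RAM than the worker has
        elif key in dms_platform:
            values = dms_platform[key]
            if value not in values and "*" not in values:
                return True  # DMS knows the key but accepts neither this value nor "*"
        elif not allow_unmatched:
            return True      # unknown key and allowUnmatched is false
    return False

dms = {"os": ["linux"]}
print(rejected({"min-cores": "64"}, dms, True, 16, 2**34))                # -> True
print(rejected({"os": "linux", "min-cores": "8"}, dms, True, 16, 2**34))  # -> False
```

A resource-capacity check with the `resource:` prefix would sit alongside these property checks but is omitted here for brevity.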

There are special predefined execution property names which resolve to dynamic configuration for the worker to match against:
- `Worker`: the worker's `publicName`
- `min-cores`: less than or equal to the `executeStageWidth`
- `process-wrapper`: the set of named `process-wrappers` present in configuration
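
For illustration, resolving these predefined names against a worker's configuration might look like the following sketch. The config values and the resolver function are hypothetical, not part of buildfarm:

```python
# Hypothetical resolver for the predefined execution property names above.
WORKER_CONFIG = {
    "publicName": "worker-1:8981",    # assumed example value
    "executeStageWidth": 8,
    "process-wrappers": {"as-nobody", "linux-sandbox"},
}

def matches_predefined(key, value, config=WORKER_CONFIG):
    if key == "Worker":
        return value == config["publicName"]
    if key == "min-cores":
        return int(value) <= config["executeStageWidth"]
    if key == "process-wrapper":
        return value in config["process-wrappers"]
    return None  # not one of the predefined names

print(matches_predefined("min-cores", "4"))           # -> True
print(matches_predefined("Worker", "worker-2:8981"))  # -> False
```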

### Server Example

**`_site/docs/architecture/worker-execution-environment.md`**
---
layout: default
title: Worker Execution Environment
parent: Architecture
nav_order: 3
---

And now that this is in place, we can use the following to build the container and make it available to our local docker daemon:

`bazel run :buildfarm-shard-worker-ubuntu20-java14`