Add notes to testing/buildbot/README.md about adding tests to the CQ.
Ideally we can point folks who are trying to add tests to the CQ here.

Also fix some links to very old code or since-removed builders.

Bug: None
Change-Id: If68f8c4ccad744a40bf98a9621c28cb089e05b2a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3089625
Reviewed-by: Sven Zheng <svenzheng@chromium.org>
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Reviewed-by: Yuly Novikov <ynovikov@chromium.org>
Reviewed-by: Brian Sheedy <bsheedy@chromium.org>
Reviewed-by: Erik Staab <estaab@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1097023}
bpastene authored and Chromium LUCI CQ committed Jan 25, 2023
1 parent 090f700 commit 74fc6cb
Showing 1 changed file with 62 additions and 35 deletions.
97 changes: 62 additions & 35 deletions testing/buildbot/README.md
@@ -17,9 +17,9 @@ components should be in components_unittests.

## A tour of the directory

* <master_name\>.json -- buildbot configuration json files. These are used to
* <builder_group\>.json -- test configuration json files. These are used to
configure what tests are run on what builders, in addition to specifying
builder-specific arguments and parameters. They are now autogenerated, mainly
builder-specific arguments and parameters. They are autogenerated, mainly
using the generate_buildbot_json tool in this directory.
* [generate_buildbot_json.py](./generate_buildbot_json.py) -- generates most of
the buildbot json files in this directory, based on data contained in the
@@ -53,34 +53,32 @@ a standardized format.
### Buildbot configuration json
Logic in the
[Chromium recipe](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipes/chromium.py)
looks up each builder for each master and test generators in
[chromium_tests/steps.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/steps.py)
parse the data. For example, as of
[a6e11220](https://chromium.googlesource.com/chromium/tools/build/+/a6e11220d97d578d6ba091abd68beba28a004722)
[generate_gtest](https://chromium.googlesource.com/chromium/tools/build/+/a6e11220d97d578d6ba091abd68beba28a004722/scripts/slave/recipe_modules/chromium_tests/steps.py#416)
parses any entry in a builder's
['gtest_tests'](https://chromium.googlesource.com/chromium/src/+/5750756522296b2a9a08009d8d2cc90db3b88f56/testing/buildbot/chromium.android.json#1243)
entry.
looks up each builder for each builder group, and the test generators in
[chromium_tests/generators.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/generators.py)
parse the data into structures defined in
[chromium_tests/steps.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/steps.py).

## Making changes

All of the JSON files in this directory are autogenerated. The "how to use"
section below describes the main tool, `generate_buildbot_json.py`, which
manages most of the waterfalls. It's no longer possible to hand-edit the JSON
manages most of the waterfalls. It's not possible to hand-edit the JSON
files; presubmit checks forbid doing so.

Note that trybots mirror regular waterfall bots, with the mapping defined in
Note that trybots mirror regular waterfall bots, with the mapping defined either
in
[trybots.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/trybots.py).
or in the bots' `mirrors = ` attribute in their //infra/config/ definitions.
This means that, as of
[81fcc4bc](https://chromium.googlesource.com/chromium/src/+/81fcc4bc6123ace8dd37db74fd2592e3e15ea46a/testing/buildbot/),
[5af7340b](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/testing/buildbot/),
if you want to edit
[linux_android_rel_ng](https://chromium.googlesource.com/chromium/tools/build/+/59a2653d5f143213f4f166714657808b0c646bd7/scripts/slave/recipe_modules/chromium_tests/trybots.py#142),
[linux-wayland-rel](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/infra/config/subprojects/chromium/try/tryserver.chromium.linux.star#280),
you actually need to edit
[Android Tests](https://chromium.googlesource.com/chromium/src/+/81fcc4bc6123ace8dd37db74fd2592e3e15ea46a/testing/buildbot/chromium.linux.json#23).
[Linux Tests (Wayland)](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/testing/buildbot/waterfalls.pyl#4895).
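
The mirroring relationship can be pictured as a lookup from a trybot to the waterfall builder whose test configuration it reuses. The sketch below is illustrative only: `TRYBOT_MIRRORS` and `builders_to_edit` are hypothetical names, and the single entry is copied by hand from the example above. The real mapping lives in trybots.py or in the bots' `mirrors` attribute under //infra/config/.

```python
# Hypothetical, hand-written mapping for illustration only. The real data
# lives in trybots.py or in each bot's `mirrors` attribute in //infra/config/.
TRYBOT_MIRRORS = {
    "linux-wayland-rel": ["Linux Tests (Wayland)"],
}


def builders_to_edit(trybot):
    """Return the waterfall builders whose test config the trybot mirrors."""
    try:
        return TRYBOT_MIRRORS[trybot]
    except KeyError:
        raise KeyError(f"no mirror recorded for trybot {trybot!r}") from None


print(builders_to_edit("linux-wayland-rel"))
```

The point of the lookup is that a test change is made on the mirrored waterfall builder's entry, not on the trybot itself.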

### Trying the changes on trybots
You should be able to try build changes that affect the trybots directly (for
example, adding a test to linux_android_rel_ng should show up immediately in
example, adding a test to linux-rel should show up immediately in
your tryjob). Non-trybot changes have to be landed manually :(.

## Capacity considerations when editing the configuration files
@@ -89,25 +87,54 @@ infrastructure has capacity to handle the extra load. This is especially true
for the established
[Chromium CQ builders](https://chromium.googlesource.com/chromium/src/+/HEAD/infra/config/generated/cq-builders.md),
as they operate under strict execution requirements. Make sure to get a resource
owner or a member of Chrome Browser Core EngProd to sign off that there is both
builder and swarmed test shard capacity available.

In particular, pay attention to the capacity of the builder which compiles and
then triggers and collects swarming task shards. If you're adding a new test
suite to a bot, and know that the test suite adds one hour of testing time to
the swarming shards, and know that you have enough swarmed capacity to handle
that one hour of testing, that's a good start. But if that test *also* happens
to run in shards which take 10 minutes longer than any other shards on that
current bot, that means that the top-level builder will also take 10 minutes
longer to run -- or 20 minutes longer if there are failures and retries. Ensure
that the builder pool has enough capacity to handle that increase as well.

Additionally, if your change is expected to increase utilization in the testing
pools by any more than 5 VMs or 50 CPU cores, it will need to be approved via
a resource request. (Consult anyone in //infra/OWNERS if you need help
calculating the resource usage of a test change.) See http://go/i-need-hw
for the steps involved in getting the approval. See [go/estimating-bot-capacity](https://goto.google.com/estimating-bot-capacity)
for guidance on how many hosts to request.
owner or a member of Chrome Browser Infra to sign off that there is both builder
and swarmed test shard capacity available. The suggested process for adding new
test suites to the CQ builders is to:
1. File a bug if one isn't already on-file for the addition of the tests, assign
it to yourself and apply the `Infra>Client>Chrome` component.
2. Add the tests in post-submit-only mode, meaning the tests run on
post-submit bots but not on pre-submit bots (a.k.a. CQ bots). This can be
achieved by adding the `'ci_only': True` line to the test's definition in
the pyl files here.
([Example](https://chromium.googlesource.com/chromium/src/+/79ed7956/testing/buildbot/test_suite_exceptions.pyl#934))
3. After a sufficient amount of time (2 weeks is suggested), examine the
results of the tests on the affected post-submit builders to determine the
number of regressions they're catching. Note: unless the new tests provide
unique info/artifacts (e.g. stack traces, log files) that pre-existing tests
lack, exclude any regressions that _other_ tests also caught. We're only
interested in the regressions that these new tests catch alone.
4. If the new tests aren't excessively flaky (use
[this dashboard](http://shortn/_gP9pAC2IS3) to verify) and if they catch a
sufficient number of regressions over that trial period, then they can be
promoted to the CQ. To do so, see the steps below.
   **Note:** The precise number of regressions that need to be caught depends on
   the runtime of the tests. A large suite like browser_tests would need to
   catch multiple per week, while a much smaller one need not catch as many. If
   you're unsure if your tests meet the cutoff, proceed with the following steps
   and specify how many regressions were caught in the justification of the
   resource request. Depending on resources, the resource owners may not approve
   the request; in that case, see step 5.
   1. Calculate the amount of machine resources needed for the tests. Googlers
      can use [this dashboard](http://shortn/_nyyTPgDJtF) to determine the
      number of bots required by comparing it to a similar suite on the same
      builder. Do this for each CQ builder and each suite that's being added.
   2. File a [resource request](http://go/file-chrome-resource-bug) for the
      required number of machines. Make sure to specify the correct type of bots
      needed (Linux, Windows, Android emulator, Android device, etc.).
   3. If/when the request is approved and the resources have been deployed, you
      can remove the `'ci_only': True` line from the definitions here to start
      running the tests on the CQ.
5. If the new tests _don't_ catch regressions sufficiently frequently, then they
don't provide a high-enough signal to warrant running on the CQ.
Consequently, they should remain in post-submit only with a comment
explaining why. This can be revisited if things change.
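
To keep track of which tests are still in the post-submit-only trial, it can help to list every entry that carries `'ci_only': True`. The following is a minimal sketch, assuming a pyl file is a plain Python literal with a suite → `'modifications'` → builder nesting. The suite names, builder name, and the `ci_only_tests` helper are made up for illustration; only the `'ci_only': True` key comes from the steps above.

```python
import ast

# Hypothetical excerpt in the style of a .pyl file (a Python literal).
# Suite and builder names are invented; the nesting is an assumption.
EXAMPLE_PYL = """
{
  'example_unittests': {
    'modifications': {
      'Example Builder': {
        'ci_only': True,
      },
    },
  },
  'other_unittests': {
    'modifications': {
      'Example Builder': {
        'args': ['--example-flag'],
      },
    },
  },
}
"""


def ci_only_tests(pyl_text):
    """Return sorted (suite, builder) pairs marked post-submit only."""
    data = ast.literal_eval(pyl_text.strip())
    pairs = []
    for suite, config in data.items():
        for builder, mods in config.get('modifications', {}).items():
            if mods.get('ci_only') is True:
                pairs.append((suite, builder))
    return sorted(pairs)


print(ci_only_tests(EXAMPLE_PYL))
```

Promoting a suite to the CQ then corresponds to removing its `'ci_only': True` line, after which it no longer appears in this list.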

If your change doesn't affect the CQ but is expected to increase utilization in
the testing pools by any more than 5 VMs or 50 CPU cores, it will still need to
be approved via a resource request. Consult the
[dashboard](http://shortn/_nyyTPgDJtF) linked above to calculate the resource
usage of a test change. See http://go/i-need-hw for the steps involved in
getting the approval.
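
The threshold in the paragraph above amounts to a simple check. This sketch assumes the increase has already been estimated (for example from the dashboard); `needs_resource_request` and its parameters are hypothetical names, and only the 5 VM and 50 CPU core limits come from the text.

```python
# Limits quoted in the text above: "any more than 5 VMs or 50 CPU cores".
VM_THRESHOLD = 5
CPU_CORE_THRESHOLD = 50


def needs_resource_request(extra_vms, extra_cpu_cores):
    """Return True if the estimated increase exceeds either limit."""
    return extra_vms > VM_THRESHOLD or extra_cpu_cores > CPU_CORE_THRESHOLD


# A change estimated at 6 extra VMs crosses the VM limit.
print(needs_resource_request(extra_vms=6, extra_cpu_cores=0))
```

Note that the comparison is strict: an increase of exactly 5 VMs or exactly 50 cores does not cross the threshold as worded.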

## How to use the generate_buildbot_json tool
### Test suites