Add notes to testing/buildbot/README.md about adding tests to the CQ.

Ideally we can point folks who are trying to add tests to the CQ here. Also fix some links to very old code or since-removed builders.

Bug: None
Change-Id: If68f8c4ccad744a40bf98a9621c28cb089e05b2a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3089625
Reviewed-by: Sven Zheng <svenzheng@chromium.org>
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Reviewed-by: Yuly Novikov <ynovikov@chromium.org>
Reviewed-by: Brian Sheedy <bsheedy@chromium.org>
Reviewed-by: Erik Staab <estaab@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1097023}
chromium · Jan 25, 2023 · 74fc6cb
1 parent 090f700
Showing 1 changed file with 62 additions and 35 deletions.
diff --git a/testing/buildbot/README.md b/testing/buildbot/README.md
@@ -17,9 +17,9 @@ components should be in components_unittests.
 
 ## A tour of the directory
 
-* <master_name\>.json -- buildbot configuration json files. These are used to
+* <builder_group\>.json -- test configuration json files. These are used to
 configure what tests are run on what builders, in addition to specifying
-builder-specific arguments and parameters. They are now autogenerated, mainly
+builder-specific arguments and parameters. They are autogenerated, mainly
 using the generate_buildbot_json tool in this directory.
 * [generate_buildbot_json.py](./generate_buildbot_json.py) -- generates most of
 the buildbot json files in this directory, based on data contained in the
@@ -53,34 +53,32 @@ a standardized format.
 ### Buildbot configuration json
 Logic in the
 [Chromium recipe](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipes/chromium.py)
-looks up each builder for each master and test generators in
-[chromium_tests/steps.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/steps.py)
-parse the data. For example, as of
-[a6e11220](https://chromium.googlesource.com/chromium/tools/build/+/a6e11220d97d578d6ba091abd68beba28a004722)
-[generate_gtest](https://chromium.googlesource.com/chromium/tools/build/+/a6e11220d97d578d6ba091abd68beba28a004722/scripts/slave/recipe_modules/chromium_tests/steps.py#416)
-parses any entry in a builder's
-['gtest_tests'](https://chromium.googlesource.com/chromium/src/+/5750756522296b2a9a08009d8d2cc90db3b88f56/testing/buildbot/chromium.android.json#1243)
-entry.
+looks up each builder for each builder group, and the test generators in
+[chromium_tests/generators.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/generators.py)
+parse the data into structures defined in
+[chromium_tests/steps.py.](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/steps.py)
 
 ## Making changes
 
 All of the JSON files in this directory are autogenerated. The "how to use"
 section below describes the main tool, `generate_buildbot_json.py`, which
-manages most of the waterfalls. It's no longer possible to hand-edit the JSON
+manages most of the waterfalls. It's not possible to hand-edit the JSON
 files; presubmit checks forbid doing so.
 
-Note that trybots mirror regular waterfall bots, with the mapping defined in
+Note that trybots mirror regular waterfall bots, with the mapping defined either
+in
 [trybots.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/trybots.py).
+or in the bots' `mirrors = ` attribute in their //infra/config/ definitions.
 This means that, as of
-[81fcc4bc](https://chromium.googlesource.com/chromium/src/+/81fcc4bc6123ace8dd37db74fd2592e3e15ea46a/testing/buildbot/),
+[5af7340b](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/testing/buildbot/),
 if you want to edit
-[linux_android_rel_ng](https://chromium.googlesource.com/chromium/tools/build/+/59a2653d5f143213f4f166714657808b0c646bd7/scripts/slave/recipe_modules/chromium_tests/trybots.py#142),
+[linux-wayland-rel](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/infra/config/subprojects/chromium/try/tryserver.chromium.linux.star#280),
 you actually need to edit
-[Android Tests](https://chromium.googlesource.com/chromium/src/+/81fcc4bc6123ace8dd37db74fd2592e3e15ea46a/testing/buildbot/chromium.linux.json#23).
+[Linux Tests (Wayland)](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/testing/buildbot/waterfalls.pyl#4895).
 
 ### Trying the changes on trybots
 You should be able to try build changes that affect the trybots directly (for
-example, adding a test to linux_android_rel_ng should show up immediately in
+example, adding a test to linux-rel should show up immediately in
 your tryjob). Non-trybot changes have to be landed manually :(.
 
 ## Capacity considerations when editing the configuration files
@@ -89,25 +87,54 @@ infrastructure has capacity to handle the extra load.  This is especially true
 for the established
 [Chromium CQ builders](https://chromium.googlesource.com/chromium/src/+/HEAD/infra/config/generated/cq-builders.md),
 as they operate under strict execution requirements. Make sure to get a resource
-owner or a member of Chrome Browser Core EngProd to sign off that there is both
-builder and swarmed test shard capacity available.
-
-In particular, pay attention to the capacity of the builder which compiles and
-then triggers and collects swarming task shards. If you're adding a new test
-suite to a bot, and know that the test suite adds one hour of testing time to
-the swarming shards, and know that you have enough swarmed capacity to handle
-that one hour of testing, that's a good start. But if that test *also* happens
-to run in shards which take 10 minutes longer than any other shards on that
-current bot, that means that the top-level builder will also take 10 minutes
-longer to run -- or 20 minutes longer if there are failures and retries. Ensure
-that the builder pool has enough capacity to handle that increase as well.
-
-Additionally, if your change is expected to increase utilization in the testing
-pools by any more than 5 VMs or 50 CPU cores, it will need to be approved via
-a resource request. (Consult anyone in //infra/OWNERS if you need help
-calculating the resource usage of a test change.) See http://go/i-need-hw
-for the steps involved in getting the approval. See [go/estimating-bot-capacity](https://goto.google.com/estimating-bot-capacity)
-for guidance on how many hosts to request.
+owner or a member of Chrome Browser Infra to sign off that there is both builder
+and swarmed test shard capacity available. The suggested process for adding new
+test suites to the CQ builders is to:
+1. File a bug if one isn't already on-file for the addition of the tests, assign
+   it to yourself and apply the `Infra>Client>Chrome` component.
+2. Add the tests in post-submit only mode, meaning the test would run on
+   post-submit bots, but not in pre-submit bots (a.k.a CQ bots). This can be
+   achieved by adding the `'ci_only': True` line to the test's definition in
+   the pyl files here.
+   ([Example](https://chromium.googlesource.com/chromium/src/+/79ed7956/testing/buildbot/test_suite_exceptions.pyl#934))
+3. After a sufficient amount of time (suggest 2 weeks), examine the results of
+   the test on the affected post-submit builders to determine the amount of
+   regressions they're catching. Note: unless the new test is providing unique
+   info/artifacts (e.g. stack traces, log files) that pre-existing tests lack,
+   exclude any regressions that _other_ tests also caught. We're only interested
+   in the regressions that these new tests catch alone.
+4. If the new tests aren't excessively flaky (use
+   [this dashboard](http://shortn/_gP9pAC2IS3) to verify) and if they catch a
+   sufficient number of regressions over that trial period, then they can be
+   promoted to the CQ. To do so, see the steps below.
+   **Note:** The precise number of regressions that need to be caught depends on
+   the runtime of the tests. A large suite like browser_tests would need to
+   catch multiple per week, while a much smaller one need not catch as many. If
+   you're unsure if your tests meet the cutoff, proceed with the following steps
+   and specify how many regressions were caught in the justification of the
+   resource request. Depending on resources, the resource owners may not approve
+   of the request. In which case, see step #5.
+   1. Calculate the amount of machine resources needed for the tests. Googlers
+      can use [this dashboard](http://shortn/_nyyTPgDJtF) to determine the
+      amount of bots required by comparing it to a similar suite on the same
+      builder. Do this for each CQ builder and each suite that's being added.
+   2. File a [resource request](http://go/file-chrome-resource-bug) for the
+      required amount of machines. Make sure to specify the correct type of bots
+      needed (Linux, Windows, Android emulator, Android device, etc).
+   3. If/when the request is approved and the resources have been deployed, you
+      can remove the `'ci_only': True` line for the definitions here to start
+      running the tests on the CQ.
+5. If the new tests _don't_ catch regressions sufficiently frequently, then they
+   don't provide a high-enough signal to warrant running on the CQ.
+   Consequently, they should remain in post-submit only with a comment
+   explaining why. This can be revisited if things change.
+
+If your change doesn't affect the CQ but is expected to increase utilization in
+the testing pools by any more than 5 VMs or 50 CPU cores, it will still need to
+be approved via a resource request. Consult the
+[dashboard](http://shortn/_nyyTPgDJtF) linked above to calculate the resource
+usage of a test change. See http://go/i-need-hw for the steps involved in
+getting the approval.
 
 ## How to use the generate_buildbot_json tool
 ### Test suites
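
Note on step 2 of the CQ process added in this change: the README text says a test becomes post-submit only by adding `'ci_only': True` to its definition in the pyl files. The sketch below is one way to picture that, not the project's tooling. It assumes an exceptions-style layout of suite name, then `'modifications'`, then builder name, and the suite name `hypothetical_unittests` and builder name `Hypothetical Linux Tests` are invented for illustration. Since `.pyl` files are Python literals, the standard library's `ast.literal_eval` can read such a fragment:

```python
import ast

# Hypothetical pyl fragment. The suite and builder names are made up, and the
# suite -> 'modifications' -> builder nesting is an assumption about the layout.
PYL_FRAGMENT = """
{
  'hypothetical_unittests': {
    'modifications': {
      'Hypothetical Linux Tests': {
        'ci_only': True,
      },
    },
  },
}
"""


def ci_only_builders(pyl_text):
    """Map each suite to the sorted builders where it is marked 'ci_only'."""
    exceptions = ast.literal_eval(pyl_text.strip())
    result = {}
    for suite, entry in exceptions.items():
        modifications = entry.get('modifications', {})
        builders = [
            builder
            for builder, mods in modifications.items()
            if mods.get('ci_only') is True
        ]
        if builders:
            result[suite] = sorted(builders)
    return result


print(ci_only_builders(PYL_FRAGMENT))
```

Under the same assumptions, promoting the suite to the CQ (step 4.3 in the diff) would amount to deleting the `'ci_only': True` line, after which the helper would no longer list that suite.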