diff --git a/testing/buildbot/README.md b/testing/buildbot/README.md
index fc8fefd6eef4c..28ce834b57e46 100644
--- a/testing/buildbot/README.md
+++ b/testing/buildbot/README.md
@@ -17,9 +17,9 @@ components should be in components_unittests.
 
 ## A tour of the directory
 
-* .json -- buildbot configuration json files. These are used to
+* .json -- test configuration json files. These are used to
 configure what tests are run on what builders, in addition to specifying
-builder-specific arguments and parameters. They are now autogenerated, mainly
+builder-specific arguments and parameters. They are autogenerated, mainly
 using the generate_buildbot_json tool in this directory.
 * [generate_buildbot_json.py](./generate_buildbot_json.py) -- generates most of
 the buildbot json files in this directory, based on data contained in the
@@ -53,34 +53,32 @@ a standardized format.
 ### Buildbot configuration json
 Logic in the
 [Chromium recipe](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipes/chromium.py)
-looks up each builder for each master and test generators in
-[chromium_tests/steps.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/steps.py)
-parse the data. For example, as of
-[a6e11220](https://chromium.googlesource.com/chromium/tools/build/+/a6e11220d97d578d6ba091abd68beba28a004722)
-[generate_gtest](https://chromium.googlesource.com/chromium/tools/build/+/a6e11220d97d578d6ba091abd68beba28a004722/scripts/slave/recipe_modules/chromium_tests/steps.py#416)
-parses any entry in a builder's
-['gtest_tests'](https://chromium.googlesource.com/chromium/src/+/5750756522296b2a9a08009d8d2cc90db3b88f56/testing/buildbot/chromium.android.json#1243)
-entry.
+looks up each builder for each builder group, and the test generators in
+[chromium_tests/generators.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/generators.py)
+parse the data into structures defined in
+[chromium_tests/steps.py.](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/steps.py)
 
 ## Making changes
 All of the JSON files in this directory are autogenerated. The "how to use"
 section below describes the main tool, `generate_buildbot_json.py`, which
-manages most of the waterfalls. It's no longer possible to hand-edit the JSON
+manages most of the waterfalls. It's not possible to hand-edit the JSON
 files; presubmit checks forbid doing so.
 
-Note that trybots mirror regular waterfall bots, with the mapping defined in
+Note that trybots mirror regular waterfall bots, with the mapping defined either
+in
 [trybots.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/trybots.py).
+or in the bots' `mirrors = ` attribute in their //infra/config/ definitions.
 This means that, as of
-[81fcc4bc](https://chromium.googlesource.com/chromium/src/+/81fcc4bc6123ace8dd37db74fd2592e3e15ea46a/testing/buildbot/),
+[5af7340b](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/testing/buildbot/),
 if you want to edit
-[linux_android_rel_ng](https://chromium.googlesource.com/chromium/tools/build/+/59a2653d5f143213f4f166714657808b0c646bd7/scripts/slave/recipe_modules/chromium_tests/trybots.py#142),
+[linux-wayland-rel](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/infra/config/subprojects/chromium/try/tryserver.chromium.linux.star#280),
 you actually need to edit
-[Android Tests](https://chromium.googlesource.com/chromium/src/+/81fcc4bc6123ace8dd37db74fd2592e3e15ea46a/testing/buildbot/chromium.linux.json#23).
+[Linux Tests (Wayland)](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/testing/buildbot/waterfalls.pyl#4895).
 
 ### Trying the changes on trybots
 You should be able to try build changes that affect the trybots directly (for
-example, adding a test to linux_android_rel_ng should show up immediately in
+example, adding a test to linux-rel should show up immediately in
 your tryjob). Non-trybot changes have to be landed manually :(.
 
 ## Capacity considerations when editing the configuration files
 
@@ -89,25 +87,54 @@ infrastructure has capacity to handle the extra load. This is especially true
 for the established
 [Chromium CQ builders](https://chromium.googlesource.com/chromium/src/+/HEAD/infra/config/generated/cq-builders.md),
 as they operate under strict execution requirements. Make sure to get a resource
-owner or a member of Chrome Browser Core EngProd to sign off that there is both
-builder and swarmed test shard capacity available.
-
-In particular, pay attention to the capacity of the builder which compiles and
-then triggers and collects swarming task shards. If you're adding a new test
-suite to a bot, and know that the test suite adds one hour of testing time to
-the swarming shards, and know that you have enough swarmed capacity to handle
-that one hour of testing, that's a good start. But if that test *also* happens
-to run in shards which take 10 minutes longer than any other shards on that
-current bot, that means that the top-level builder will also take 10 minutes
-longer to run -- or 20 minutes longer if there are failures and retries. Ensure
-that the builder pool has enough capacity to handle that increase as well.
-
-Additionally, if your change is expected to increase utilization in the testing
-pools by any more than 5 VMs or 50 CPU cores, it will need to be approved via
-a resource request. (Consult anyone in //infra/OWNERS if you need help
-calculating the resource usage of a test change.) See http://go/i-need-hw
-for the steps involved in getting the approval. See [go/estimating-bot-capacity](https://goto.google.com/estimating-bot-capacity)
-for guidance on how many hosts to request.
+owner or a member of Chrome Browser Infra to sign off that there is both builder
+and swarmed test shard capacity available. The suggested process for adding new
+test suites to the CQ builders is to:
+1. File a bug if one isn't already on-file for the addition of the tests, assign
+   it to yourself and apply the `Infra>Client>Chrome` component.
+2. Add the tests in post-submit only mode, meaning the test would run on
+   post-submit bots, but not in pre-submit bots (a.k.a CQ bots). This can be
+   achieved by adding the `'ci_only': True` line to the test's definition in
+   the pyl files here.
+   ([Example](https://chromium.googlesource.com/chromium/src/+/79ed7956/testing/buildbot/test_suite_exceptions.pyl#934))
+3. After a sufficient amount of time (suggest 2 weeks), examine the results of
+   the test on the affected post-submit builders to determine the amount of
+   regressions they're catching. Note: unless the new test is providing unique
+   info/artifacts (e.g. stack traces, log files) that pre-existing tests lack,
+   exclude any regressions that _other_ tests also caught. We're only interested
+   in the regressions that these new tests catch alone.
+4. If the new tests aren't excessively flaky (use
+   [this dashboard](http://shortn/_gP9pAC2IS3) to verify) and if they catch a
+   sufficient number of regressions over that trial period, then they can be
+   promoted to the CQ. To do so, see the steps below.
+   **Note:** The precise number of regressions that need to be caught depends on
+   the runtime of the tests. A large suite like browser_tests would need to
+   catch multiple per week, while a much smaller one need not catch as many. If
+   you're unsure if your tests meet the cutoff, proceed with the following steps
+   and specify how many regressions were caught in the justification of the
+   resource request. Depending on resources, the resource owners may not approve
+   of the request. In which case, see step #5.
+   1. Calculate the amount of machine resources needed for the tests. Googlers
+      can use [this dashboard](http://shortn/_nyyTPgDJtF) to determine the
+      amount of bots required by comparing it to a similar suite on the same
+      builder. Do this for each CQ builder and each suite that's being added.
+   2. File a [resource request](http://go/file-chrome-resource-bug) for the
+      required amount of machines. Make sure to specify the correct type of bots
+      needed (Linux, Windows, Android emulator, Android device, etc).
+   3. If/when the request is approved and the resources have been deployed, you
+      can remove the `'ci_only': True` line for the definitions here to start
+      running the tests on the CQ.
+5. If the new tests _don't_ catch regressions sufficiently frequently, then they
+   don't provide a high-enough signal to warrant running on the CQ.
+   Consequently, they should remain in post-submit only with a comment
+   explaining why. This can be revisited if things change.
+
+If your change doesn't affect the CQ but is expected to increase utilization in
+the testing pools by any more than 5 VMs or 50 CPU cores, it will still need to
+be approved via a resource request. Consult the
+[dashboard](http://shortn/_nyyTPgDJtF) linked above to calculate the resource
+usage of a test change. See http://go/i-need-hw for the steps involved in
+getting the approval.
 
 ## How to use the generate_buildbot_json tool
 ### Test suites