Skip to content

Conversation

philsc
Copy link
Contributor

@philsc philsc commented Dec 7, 2023

This patch deletes the old copy of bazel_integration_test that we
had vendored into the repo. It's replaced with the maintained
rules_bazel_integration_test.

Summary of changes:

  • Increases minimum tested bazel version to 6.4.0.
    rules_bazel_integration_test depends on some features in 6.4
  • Moves several CI jobs to be BIT tests. This free ups
    about 10 CI slots.
  • Runs these tests under a separate CI job. This is so that feedback
    from the regular test jobs is faster.

Notes about these bazel-in-bazel integration tests:

  • The tests are very heavy and easily overwhelm a system. Unfortunately,
    there doesn't appear to be a way to cap their parallelism; only
    disable it entirely using the exclusive tag. Some light
    testing shows there is some speedup to be gained on CI
    if we can, in the future, limit them to 2 or 4 concurrent
    executions.
  • A special version named "self" is created that re-uses
    whatever the outer Bazel program is. This is mainly so that
    Bazel's "at head" testing pipelines (the one that runs tests
    with Bazel built from head) are able to affect the integration tests.
    It also comes in handy when locally testing a custom Bazel build.
  • The globbing of child workspace files can be somewhat prone to
    following bazel-* symlinks, so its important the .bazelignore
    and deleted packages configs are up-to-date. Otherwise the
    globbing can turn into 30,000+ files and consume a system-freezing
    level of memory and CPU.

Fixes #1209

This patch deletes the old copy of `bazel_integration_test` that we
had vendored into the repo. It's replaced with the maintained
`rules_bazel_integration_test`.

I tried to enable the tests with Bazel 5.4.0, but we have at least one
Bazel 6+-only feature that we make use of in `rules_python`.

    DEBUG: /home/users/phil/.cache/bazel/_bazel_phil/30d461f9383039e65fc400eede879d16/execroot/rules_python/_tmp/b12274e4f3e9a093f0136650955cdf80/_bazel_phil/93e1b72b6c0a666bcd5ccbe6a9bbbd7b/external/rules_python.override/python/repositories.bzl:550:22: WARNING: ignoring register_coverage_tool=True when registering @python_3_9: Bazel 6+ required, got 5.4.0
    ERROR: Traceback (most recent call last):
            File "/home/users/phil/.cache/bazel/_bazel_phil/30d461f9383039e65fc400eede879d16/execroot/rules_python/_tmp/b12274e4f3e9a093f0136650955cdf80/_bazel_phil/93e1b72b6c0a666bcd5ccbe6a9bbbd7b/external/rules_python.override/python/private/bzlmod/python.bzl", line 132, column 57, in _python_impl
                    toolchain_info = _python_register_toolchains(
            File "/home/users/phil/.cache/bazel/_bazel_phil/30d461f9383039e65fc400eede879d16/execroot/rules_python/_tmp/b12274e4f3e9a093f0136650955cdf80/_bazel_phil/93e1b72b6c0a666bcd5ccbe6a9bbbd7b/external/rules_python.override/python/private/bzlmod/python.bzl", line 49, column 31, in _python_register_toolchains
                    python_register_toolchains(
            File "/home/users/phil/.cache/bazel/_bazel_phil/30d461f9383039e65fc400eede879d16/execroot/rules_python/_tmp/b12274e4f3e9a093f0136650955cdf80/_bazel_phil/93e1b72b6c0a666bcd5ccbe6a9bbbd7b/external/rules_python.override/python/repositories.bzl", line 603, column 39, in python_register_toolchains
                    native.register_toolchains("@{toolchain_repo_name}//:{platform}_toolchain".format(
    Error in register_toolchains: The native module can be accessed only from a BUILD thread. Wrap the function in a macro and call it from a BUILD file
    ERROR: error evaluating module extension python in @rules_python.override//python/extensions:python.bzl

So we only test against 6.2.0 for now.
Copy link
Collaborator

@rickeylev rickeylev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the CI failures, most errors mention gazelle. Probably the gazelle files aren't inputs to the test? IIRC, gazelle is a separate distribution.

tags = (tags or []) + [
# Upstream normally runs the tests with the `exclusive` tag. That
# tag also implies `local` by default. We don't really need the
# exclusion feature, but we do need the test to break the sandbox.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need it to break the sandbox?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't have it break the sandbox, then the tests will fail:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INFO: Repository bazel_features~1.1.1 instantiated at:
  <builtin>: in <toplevel>
Repository rule http_archive defined at:
  /tmp/bazel-working-directory/rules_python/_tmp/2d3bceb7df44f5c60fb62c99b332cef6/_bazel_phil/99580b3b5ce42da1b00cf311bc78b2b8/external/bazel_tools/tools/build_defs/repo/http.bzl:384:31: in <toplevel>
ERROR: An error occurred during the fetch of repository 'bazel_features~1.1.1':
   Traceback (most recent call last):
        File "/tmp/bazel-working-directory/rules_python/_tmp/2d3bceb7df44f5c60fb62c99b332cef6/_bazel_phil/99580b3b5ce42da1b00cf311bc78b2b8/external/bazel_tools/tools/build_defs/repo/http.bzl", line 145, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: error copying file: couldn't delete destination: /home/users/phil/.cache/bazel_repository/content_addressable/sha256/62c26e427e5cbc751024446927622e398a9dcdf32c64325238815709d11c11a8/tmp-44772af6-b733-44fd-b1a2-ea49aa2d0bcf (Read-only file system)
ERROR: <builtin>: fetching http_archive rule //:bazel_features~1.1.1: Traceback (most recent call last):
        File "/tmp/bazel-working-directory/rules_python/_tmp/2d3bceb7df44f5c60fb62c99b332cef6/_bazel_phil/99580b3b5ce42da1b00cf311bc78b2b8/external/bazel_tools/tools/build_defs/repo/http.bzl", line 145, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: error copying file: couldn't delete destination: /home/users/phil/.cache/bazel_repository/content_addressable/sha256/62c26e427e5cbc751024446927622e398a9dcdf32c64325238815709d11c11a8/tmp-44772af6-b733-44fd-b1a2-ea49aa2d0bcf (Read-only file system)

It uses the user's home directory by default. Not sure if there's a way to fix that. But that's outside the scope of this PR I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the impression the inner bazel is trying to download bazel_features? Strictly speaking, it shouldn't need to because the outer bazel would have (or could have) already done that, but the inner bazel probably isn't aware of where the outer bazel downloaded things to. In any case, yeah, I agree its out of scope for this PR.

Bazel's own CI does something to optimize this, so we can ask them how they make it work (I recall something about they have a special target to eagerly fetch things, then somehow wire the inner bazel to read from there).

@rickeylev
Copy link
Collaborator

FYI: I found that all the integration tests in the examples directory were broken. And CI was skipping them (as part of the main repo tests; it ran them as separate jobs in separate workspace directories).

You might want to check if the other existing integration tests were working at head or not. If not, just delete them. We've been going awhile without them, so I don't think they were adding anything. You might be able to start with a clean slate and not have to worry about trying to convert any of the existing ones over.

philsc and others added 16 commits December 16, 2023 20:14
…tion_test-b

gist of change:
 - revert changes to presubmit (upstream moved tests to
   tests/integration)
 - partial revert of .bazelrc. Kept the comments, deleted the
   deleted packages; cna rerun the command to get new values,
   presumably.
 - deleted the move to tests/integration_tests; upstream moved
   things to tests/integration
 - deleted the old usages of bazel integration tests; upstream deleted
   all usages.

Mergedly yours,
rickeylev
* remove old references to tests/integration_tests
* remove missed conflict marker
Upstream made tests/integratin the directory for all the integration
testing related stuff.
@rickeylev
Copy link
Collaborator

@philsc i rebased this to head and did some cleanup and pushed it to your branch. they were merges, so it should pull just fine for you. hope so anyways! never done such extensive changes to another person's branch

* bump CI to use 6.4; necessary for rules_bazel_integration_test
* Ignore some additional directories
* Get bazel version from .baelversion file
* Add use_repo() calls; these don't appear necessary, but avoid a
  warning being printed
This should make it possible for the bazel-in-bazel tests to use
the same version of bazel as the outer invocation. This is important
so that Bazel CI's downstream testing pipeline is able to run tests
with unreleased versions of bazel automatically.

Along the way, fix an issue that blows up memory and cpu usage:
symlinks in the child workspace were being followed, which caused
the set of sources passed to the bazel-in-bazel test to become
30,000+ entries long. To avoid that, cap the size of the glob and
add those symlinks to the ignore files.

Also add a small optimization for how the workspace files are collected:
the bazel_integration_tests macro creates a separate filegroup target
for the list of files passed into it for each version of bazel it
tests, which duplicates that same list. To avoid that duplication,
manually create a filegroup once and pass that single list into
the bazel_integration_tests macro.
integration tests are slow, so run them as a separate job so that
the regular tests give feedback sooner.

Adds the integration-test tag back, which should make CI ignore them
for the other jobs, too.

Also default their size to large.
The underlying tests are no longer tagged manual.

* Default bzlmod=True
* Fix ordering of loads
* Add missing copyright header
* Switch back to using exclusive tag for now. Limiting to 2 concurrent
  tests netted a small improvmement (17m -> 13m), but it's not clear
  how to limit them when doing local development. I'm impressed with how
  thoroughly these tests gobble up a machine's resources and render
  it borderline unresponsive.
* Default size to enormous.
* Remove all the bazel-in-bazel tests for the examples. We have
  sufficient coverage through the CI jobs, so don't need to run
  them for every combination; we can re-add or create smaller tests
  for regressions as we find them
* Remove CI jobs for tests that the bazel-in-bazel tests now cover.
  This frees up some CI slots we can use later.
Frees up a few more slots.

Also makes the pip_repository_entry_points test be a workspace test
(it doesn't support bzlmod)
@rickeylev rickeylev marked this pull request as ready for review January 6, 2024 22:59
@rickeylev rickeylev requested a review from f0rmiga as a code owner January 6, 2024 22:59
@rickeylev rickeylev added this pull request to the merge queue Jan 6, 2024
examples/bzlmod/bazel-bzlmod
examples/bzlmod/bazel-out
examples/bzlmod/bazel-testlogs
examples/bzlmod/other_module/bazel-bin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not expect the bazel symlinks to be included here. Or at leastthey should be included for all examples.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've been confused by this. It's not very clear to me why the symlinks get picked up sometimes and not others. Sometimes adding to this ignore list helps, sometimes I have to delete the symlinks. It seems like some interaction between the outer bazel, deleted packages, the ignore file, and some sort of caching.

@rickeylev
Copy link
Collaborator

Ok, I'm going to merge this. It sets up the basics we need to have better test coverage of stuff that happens during earlier phases (repo rules, modules, etc). I think we'll need some custom test runners to fully support those cases, but we at least have the basics to start.

Merged via the queue into bazel-contrib:main with commit 8aaaf93 Jan 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

buildkite pipeline file limitation - error

3 participants