Bazel Examples configuration is failing with Bazel@HEAD #18771

Closed
sgowroji opened this issue Jun 26, 2023 · 14 comments
Labels
breakage team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: bug untriaged

Comments

@sgowroji
Member

https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/3123#0188f574-6d64-41f8-9bc1-f1128df8d993

Platforms: macOS, LTS, Windows

Logs :

FATAL: bazel ran out of memory and crashed. Printing stack trace:
java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.Arrays.copyOf(Unknown Source)
	at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
	at java.base/java.lang.AbstractStringBuilder.append(Unknown Source)
	at java.base/java.lang.StringBuilder.append(Unknown Source)
	at com.google.devtools.common.options.OptionsBase.mapToCacheKey(OptionsBase.java:115)
	at com.google.devtools.build.lib.analysis.config.BuildOptions.checksum(BuildOptions.java:178)
	at com.google.devtools.build.lib.analysis.config.BuildOptions.hashCode(BuildOptions.java:256)
	at java.base/java.util.HashMap.hash(Unknown Source)
	at java.base/java.util.HashMap.put(Unknown Source)
	at java.base/java.util.HashSet.add(Unknown Source)
	at com.google.devtools.build.lib.analysis.starlark.StarlarkTransition.validate(StarlarkTransition.java:233)
	at com.google.devtools.build.lib.analysis.config.StarlarkTransitionCache.computeIfAbsent(StarlarkTransitionCache.java:93)
	at com.google.devtools.build.lib.analysis.producers.TransitionApplier.applyStarlarkTransition(TransitionApplier.java:137)
	at com.google.devtools.build.lib.analysis.producers.TransitionApplier$$Lambda$788/0x0000000800720250.step(Unknown Source)
	at com.google.devtools.build.skyframe.state.TaskTreeNode.run(TaskTreeNode.java:95)
	at com.google.devtools.build.skyframe.state.Driver.drive(Driver.java:89)
	at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.computeTargetAndConfiguration(ConfiguredTargetFunction.java:489)
	at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.compute(ConfiguredTargetFunction.java:247)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:506)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:399)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
bazel build failed with exit code 33
🚨 Error: The command exited with status 1

Steps :

git clone -v https://github.com/bazelbuild/examples.git
cd examples
git reset --hard ec23fe3421c86724e58bd1c0f02c089888abab4e
export USE_BAZEL_VERSION=7f7017a44ec40a2a6b22f1c52654e81906e50e4c
cd configurations
bazel build -- ... -//read_attr_in_transition:will-break -//cc_binary_selectable_copts:app_forgets_to_set_features -//cc_binary_selectable_copts:lib -//cc_binary_selectable_copts:app_forgets_to_set_features_native_binary -//cc_binary_selectable_copts:app_with_feature1_native_binary -//cc_binary_selectable_copts:app_with_feature2_native_binary -//cc_test/... -//cc_test/...

Bisect result:

 bazel --bisect=8fd5b04c1b6158000c7eced06d8b8695830c2efc..d097b5d6cd3bc9fdb725b379b6cf3ef247126008 build -- ... -//read_attr_in_transition:will-break -//cc_binary_selectable_copts:app_forgets_to_set_features -//cc_binary_selectable_copts:lib -//cc_binary_selectable_copts:app_forgets_to_set_features_native_binary -//cc_binary_selectable_copts:app_with_feature1_native_binary -//cc_binary_selectable_copts:app_with_feature2_native_binary -//cc_test/... -//cc_test/...

Culprit : 52dbdc7

CC Greenteam @comius

@sgowroji sgowroji added type: bug breakage untriaged team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website labels Jun 26, 2023
@meteorcloudy meteorcloudy added team-Core Skyframe, bazel query, BEP, options parsing, bazelrc and removed team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website labels Jun 27, 2023
@meteorcloudy
Member

meteorcloudy commented Jun 27, 2023

/cc @aoeui @justinhorvitz

@greatfilter

Writing from somewhere in the Atlantic Ocean over spotty satellite internet, so I can't fix this right away.

This is most likely a known consequence of a recent change. A non-convergent transition, that is, a transition that keeps generating novel configurations when applied repeatedly, leads to an infinitely expanding graph.

Ideally, there would be a clear error message for this condition.

The preferred solution is to make the transition idempotent.

Anything that gets cleaned up in this way can't regress because anything exhibiting this behavior leads to OOMs.
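
To make the distinction concrete, here is a minimal sketch in plain Python (not Starlark, and not Bazel code). It models a transition as a function from a settings dict to a new settings dict and counts how many distinct configurations repeated application produces. The setting name `some-string` and both transition functions are hypothetical, chosen only for illustration.

```python
def append_suffix_transition(settings):
    """Non-convergent: every application yields a value never seen before."""
    return {"some-string": settings["some-string"] + "-transitioned"}


def set_constant_transition(settings):
    """Idempotent: applying it a second time changes nothing."""
    return {"some-string": "transitioned"}


def count_distinct_configs(transition, settings, max_steps=5):
    """Apply `transition` repeatedly and count the distinct configurations seen.

    Stops early once an application returns a configuration already seen.
    """
    seen = {tuple(sorted(settings.items()))}
    for _ in range(max_steps):
        settings = transition(settings)
        key = tuple(sorted(settings.items()))
        if key in seen:
            break  # converged: no new configuration was produced
        seen.add(key)
    return len(seen)


start = {"some-string": "base"}
print(count_distinct_configs(append_suffix_transition, start))  # 6
print(count_distinct_configs(set_constant_transition, start))   # 2
```

With the appending transition the count grows with `max_steps` and never levels off, which is the shape of the unbounded growth described above. The constant transition settles after one application.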

@justinhorvitz
Contributor

Let me know if you want to roll back 52dbdc7 until Shahan is back and can give this some attention, including more helpful failure messages and cleanup of violations. I don't know where to find the configuration transition that is leading to infinite config expansion (presumably by adding a repeated flag), but if there's just one place, perhaps that can be changed.

@meteorcloudy
Member

meteorcloudy commented Jun 27, 2023

I think Bazel Examples is the only broken downstream project, so it's not that urgent.

The configurations example is located at https://github.com/bazelbuild/examples/tree/main/configurations

@justinhorvitz
Contributor

https://github.com/bazelbuild/examples/blob/cc8af817b441a5aca9eb215cee51ca12cb141e3a/configurations/read_attr_in_transition/defs.bzl#L8 is definitely a culprit (not sure if there are others). I believe Shahan is planning to fail fast on a non-convergent transition like that, and if so the documentation and examples should of course be updated. Until then you could try changing that example to not append if the string already ends with -transitioned or something like that.
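
A rough sketch of that workaround, written in plain Python rather than Starlark so it can be run standalone. It assumes the transition appends a suffix to a string-valued setting; the setting name `some-string` and the function name are hypothetical and are not copied from the linked `defs.bzl`.

```python
SUFFIX = "-transitioned"


def guarded_transition(settings):
    """Append SUFFIX only if the value does not already end with it."""
    value = settings["some-string"]
    if value.endswith(SUFFIX):
        # Already transitioned: return the settings unchanged.
        return dict(settings)
    return {**settings, "some-string": value + SUFFIX}


once = guarded_transition({"some-string": "base"})
twice = guarded_transition(once)
print(once)           # {'some-string': 'base-transitioned'}
print(twice == once)  # True
```

Because the second application returns the same settings as the first, repeated application stops producing new configurations.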

@meteorcloudy
Member

/cc @gregestren

@fmeum
Collaborator

fmeum commented Jun 27, 2023

I'm not sure I can completely follow this discussion, but is it correct that all transitions are expected to be idempotent? Assuming a non-idempotent transition is only applied at the particular edge it is attached to, how would it end up creating an infinite number of configurations?

If non-idempotent transitions are expected to fail in one way or another, then this would break rulesets such as rules_fuzzing (https://github.com/bazelbuild/rules_fuzzing/blob/9865504b549e86ccfb4713afcc1914c982567f05/fuzzing/private/binary.bzl#L58, where merge just concatenates) and most transitions on copt I've seen so far.

@justinhorvitz
Contributor

I believe the requirement would only be limited to "self transitions" like the example I linked to.

@gregestren
Contributor

@fmeum
Collaborator

fmeum commented Jun 27, 2023

@gregestren @justinhorvitz Thanks, I missed that part. It's still slightly surprising to me as I understood "self transitions" to essentially be "incoming edge transitions", but all examples I have seen have been outgoing edge transitions and thus aren't affected (including rules_fuzzing).

@meteorcloudy
Member

> I think Bazel Examples is the only broken downstream project, so it's not that urgent.

Sorry, I was wrong. There are some other downstream projects with hangs or OOM errors, so I believe it's better to roll back the change first.

@justinhorvitz
Contributor

Let's roll back and then consider whether we should do a more thorough cleanup of external projects (this approach seemed to work within Google) or pivot to an alternative approach (there is another that I think will work).

copybara-service bot pushed a commit that referenced this issue Jun 28, 2023
*** Reason for rollback ***

#18771

PiperOrigin-RevId: 544025702
Change-Id: I5c036cda4536f86088f259391cdb7c58ef04df6d
@seaurching

I have a physical server with 64 cores and 128 GB of memory. I want to know how to use multiple cores to build Bazel from the bazel-dist-version.zip source package.

copybara-service bot pushed a commit that referenced this issue Jul 5, 2023
*** Reason for rollback ***

Roll forward with fix.

The original CL caused infinite expansion of rule transitions when they
were non-convergent.

The fix adds idempotency checking to transitions when they are
non-noop and flags them to not apply a rule transition via the new
ConfiguredTargetKey.shouldApplyRuleTransition property. This is discussed
more thoroughly in the code comments of ConfiguredTargetKey and
IdempotencyChecker.

*** Original change description ***

Automated rollback of commit 52dbdc7.

*** Reason for rollback ***

#18771

PiperOrigin-RevId: 545664983
Change-Id: Id1e46b61506edb861e76ff5f3858e3c95aaaa407
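
The idea in the roll-forward message above can be loosely modeled as follows. This is a sketch in plain Python based only on the commit description, not Bazel's actual implementation. The function name and the returned flag are hypothetical stand-ins for the behavior the message attributes to `IdempotencyChecker` and `ConfiguredTargetKey.shouldApplyRuleTransition`.

```python
def apply_rule_transition(transition, settings):
    """Apply `transition` once and decide whether it may be applied again.

    Returns (new_settings, should_apply_rule_transition).
    """
    once = transition(settings)
    if once == settings:
        # No-op transition: nothing changed, so no idempotency check is needed.
        return once, True
    # Non-noop: apply a second time and compare with the first result.
    twice = transition(once)
    idempotent = twice == once
    # A non-idempotent transition is flagged so that it is not applied again
    # to the configuration it just produced.
    return once, idempotent


def append_suffix(settings):
    # Hypothetical non-idempotent transition.
    return {"some-string": settings["some-string"] + "-transitioned"}


def set_constant(settings):
    # Hypothetical idempotent transition.
    return {"some-string": "transitioned"}


print(apply_rule_transition(append_suffix, {"some-string": "base"}))
# ({'some-string': 'base-transitioned'}, False)
print(apply_rule_transition(set_constant, {"some-string": "base"}))
# ({'some-string': 'transitioned'}, True)
```

In this model the non-idempotent transition still takes effect once, but the flag prevents the repeated application that would otherwise expand the graph without bound.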
fweikert pushed a commit that referenced this issue Jul 11, 2023
*** Reason for rollback ***

#18771

PiperOrigin-RevId: 544025702
Change-Id: I5c036cda4536f86088f259391cdb7c58ef04df6d