Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BES: make uploader retry attempts configurable #16305

Closed

Conversation

sluongng
Copy link
Contributor

Depends on different Build Event Service setup, there could be different failure modes that may tolerate less or more failures for Build Events uploading.

Allow users to tweak the number without having to use a custom JVM args or shipping a fork of Bazel with these number tweaked.

@sluongng sluongng force-pushed the sluongng/uploader-retry-config branch from 7202e2b to d6562e4 Compare September 20, 2022 11:03
@sluongng sluongng marked this pull request as ready for review September 20, 2022 12:11
@sluongng
Copy link
Contributor Author

Not to sure how to test this just yet. I want to do this in Java but it might need to be a Bash test instead 🤔.

Open for suggestion.

@ShreeM01 ShreeM01 added team-Remote-Exec Issues and PRs for the Execution (Remote) team awaiting-review PR is awaiting review from an assigned reviewer labels Sep 21, 2022
@coeuvre
Copy link
Member

coeuvre commented Sep 21, 2022

The changes LGTM but I will let @michaeledgar to make the decision.

@sluongng sluongng force-pushed the sluongng/uploader-retry-config branch from d6562e4 to b536c75 Compare September 21, 2022 13:35
@sluongng
Copy link
Contributor Author

I gave the flag names and docs slight tweak to better reflect their usages.

I am looking into src/test/java/com/google/devtools/build/lib/buildeventservice/BazelBuildEventServiceModuleTest.java but was unable to run the test locally on an M1 laptop.

java.lang.UnsatisfiedLinkError: /private/var/tmp/_bazel_sngoc/3438d92a73240559fc14d42b354b0b3f/sandbox/darwin-sandbox/2373/execroot/io_bazel/_tmp/96e4189e467fdee3386978ad0a337362/bazel-jni.11079868827068369907/libunix_jni.dylib: dlopen(/private/var/tmp/_bazel_sngoc/3438d92a73240559fc14d42b354b0b3f/sandbox/darwin-sandbox/2373/execroot/io_bazel/_tmp/96e4189e467fdee3386978ad0a337362/bazel-jni.11079868827068369907/libunix_jni.dylib, 0x0001): tried: '/private/var/tmp/_bazel_sngoc/3438d92a73240559fc14d42b354b0b3f/sandbox/darwin-sandbox/2373/execroot/io_bazel/_tmp/96e4189e467fdee3386978ad0a337362/bazel-jni.11079868827068369907/libunix_jni.dylib' (mach-o file, but is an incompatible architecture (have (arm64), need (x86_64)))

Exception in thread "Thread-0" java.lang.NoClassDefFoundError: Could not initialize class com.google.devtools.build.lib.unix.NativePosixFiles
	at java.base/java.lang.Thread.run(Thread.java:829)
EEE.EEE.EEE.EEE

Depends on different Build Event Service setup, there could be different
failure modes that may tolerate less or more failures for Build Events
uploading.

Allow users to tweak the number without having to use a custom JVM args
or shipping a fork of Bazel with these number tweaked.
@sluongng sluongng force-pushed the sluongng/uploader-retry-config branch from b536c75 to 4b0f209 Compare September 28, 2022 11:21
@sluongng
Copy link
Contributor Author

I added a small test for the retry attempt count.

Not quite sure how to test the delay value though, I would need to be able to assert the info log somehow but I could not find a good way to do that.
If anyone could point me to a precedented approach to replicate, it would be very helpful.

@sluongng
Copy link
Contributor Author

sluongng commented Oct 5, 2022

@michaeledgar I would appreciate it if you could take a look at this PR.

Just want to know if you folks are ok with the direction.

@@ -227,6 +227,17 @@ public void testCreatesStreamerForBesTransport() throws Exception {
.isInstanceOf(BuildEventServiceTransport.class);
}

@Test
public void testRetryCount() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for adding a test! It is pretty suspicious that we don't have any other test that exercised the now-defunct environment variable.

Copy link
Contributor

@michaeledgar michaeledgar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this shortly. Thanks for sending this out, and persisting despite my unavailability!

public int buildEventUploadAttempt;

@Option(
name = "experimental_build_event_upload_retry_attempt_delay",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should switch this to a Duration.

Since this sat in my inbox for so long, I'll save us the round-trip and do it myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants