Skip to content

Conversation

@bvolpato
Copy link
Contributor

@bvolpato bvolpato commented Mar 13, 2023

A race condition can be easily introduced and exhaust all default withMaxRetries(4) with 2 job submissions / threads staging the same JAR at the same time, especially for large JARs:

2023-03-13 18:19:40 INFO TemplateTestBase:35 - 2023-03-13 18:19:40 WARN RetryHttpRequestInitializer:165 - Request failed with code 412, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://storage.googleapis.com/upload/storage/v1/b/{bucket}/o?ifGenerationMatch=1678731567510887&name=2023-03-13-18-11_IT/staging/hbase-shaded-server-1.4.12-{hash}.jar&uploadType=resumable&upload_id={id}.

2023-03-13 18:19:40 INFO TemplateTestBase:35 - 2023-03-13 18:19:40 ERROR PackageUtil:191 - Upload failed, will NOT retry staging of package: /home/{user}/.m2/repository/org/apache/hbase/hbase-shaded-server/1.4.12/hbase-shaded-server-1.4.12.jar

2023-03-13 18:19:40 INFO TemplateTestBase:35 - java.io.IOException: Upload failed for 'gs://{bucket}/2023-03-13-18-11_IT/staging/hbase-shaded-server-1.4.12-{hash}.jar'

Even in scenarios in which a concurrent submission goes through, the artifacts are re-uploaded, making the cache useless under racing.

This change will check if the file was already staged before each try, instead of before entering the retry loop.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@Abacn
Copy link
Contributor

Abacn commented Mar 14, 2023

Looks like the failed unit tests are relevant?

org.apache.beam.runners.dataflow.util.PackageUtilTest.testPackageUploadEventuallySucceeds
org.apache.beam.runners.dataflow.util.PackageUtilTest.testPackageUploadFailsWhenIOExceptionThrown

@bvolpato bvolpato force-pushed the backoff-racing-return-cached branch from d8defa0 to c03d972 Compare March 14, 2023 14:07
@bvolpato
Copy link
Contributor Author

Thanks @Abacn. Fixed it now. I had tested the functionality and didn't notice that test class.

Change was just around Mockito's verify as it now calls getObjects n times for n tries.

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM

just run './gradlew :runners:google-cloud-dataflow-java:spotlessApply' to fix code format

@bvolpato bvolpato force-pushed the backoff-racing-return-cached branch from c03d972 to 2b36d03 Compare March 14, 2023 15:41
@bvolpato
Copy link
Contributor Author

My bad, done -- feel free to merge.

@Abacn Abacn merged commit 6b18305 into apache:master Mar 14, 2023
@bvolpato bvolpato deleted the backoff-racing-return-cached branch March 14, 2023 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants