
[CDAP-20954] Fix the precondition while uploading file to GCS in case of retries #15539

Merged
itsankit-google merged 1 commit into develop from CDAP-20954 on Feb 8, 2024
Conversation

@itsankit-google (Member) commented Feb 7, 2024

JIRA: CDAP-20954

Fixes the issue:

io.cdap.cdap.runtime.spi.provisioner.dataproc.DataprocRuntimeException: Error while launching job default_DataFusionQuickStart_DataPipelineWorkflow_xxxxxx on cluster xxxxxxx.
        at io.cdap.cdap.runtime.spi.runtimejob.DataprocRuntimeJobManager.launch(DataprocRuntimeJobManager.java:380)
        at io.cdap.cdap.internal.provision.ProvisioningService$RuntimeJobManagerCallWrapper.launch(ProvisioningService.java:1031)
        at io.cdap.cdap.internal.app.runtime.distributed.remote.RuntimeJobTwillPreparer.launch(RuntimeJobTwillPreparer.java:185)
        at io.cdap.cdap.internal.app.runtime.distributed.remote.AbstractRuntimeTwillPreparer.lambda$start$1(AbstractRuntimeTwillPreparer.java:472)
        at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillRunnerService$ControllerFactory.lambda$create$0(RemoteExecutionTwillRunnerService.java:613)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.concurrent.ExecutionException: com.google.cloud.storage.StorageException: 304 Not Modified
        |> PUT https://storage.googleapis.com/upload/storage/v1/b/xxxxxxxxxxxx/o?ifGenerationNotMatch=1707230919533328&name=cdap-job/6ebe5a14-c4fe-11ee-ba20-4eef2f2241be/hConf.xml&uploadType=resumable&upload_id=ABPtcPq6pUMUx3NcWP6wdEX8obidqLWNTyeWENm75bGeR4haaFBqIFAgWs2dW7iaaPnIbGneMGteQLiS2wBCs018xKte7W5euuZseJAhLANBqcDn
        |> content-range: bytes 0-172145/172146
        |  
        |< HTTP/1.1 304 Not Modified
        |< content-length: 0
        |< content-type: application/json
        |< x-guploader-uploadid: XXXXXXXXXX
        |  
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at io.cdap.cdap.runtime.spi.runtimejob.DataprocRuntimeJobManager.launch(DataprocRuntimeJobManager.java:338)
        ... 11 common frames omitted

Context:

uploadToGcsUtil(localFile, storage, targetFilePath, blobInfo,
          Storage.BlobWriteOption.doesNotExist());

The upload above failed with 412, which means an object with the specified name already exists. To overwrite it, we fetch the blobInfo from GCS and re-upload. The retry was using the GenerationNotMatch precondition, which returns 304 because the generation number matched; this change fixes the precondition used for the retry.

GCS Docs reference: https://cloud.google.com/storage/docs/request-preconditions#precondition_criteria
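As a sketch of the failure mode described above (not the actual DataprocRuntimeJobManager code; the class and method names here are hypothetical, and GCS precondition semantics are modeled with a simple in-memory store):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory model of GCS precondition semantics, to illustrate why a
// retry with ifGenerationNotMatch fails with 304 while ifGenerationMatch
// overwrites successfully. All names here are hypothetical.
public class PreconditionSketch {

  static class FakeGcs {
    // object name -> current generation number
    final Map<String, Long> generations = new HashMap<>();

    // Upload with a doesNotExist precondition: 412 if the object is present.
    int uploadIfNotExists(String name) {
      if (generations.containsKey(name)) {
        return 412; // Precondition Failed: object already exists
      }
      generations.put(name, 1L);
      return 200;
    }

    // Upload with ifGenerationNotMatch: 304 if the generation matches.
    int uploadIfGenerationNotMatch(String name, long generation) {
      Long current = generations.get(name);
      if (current != null && current == generation) {
        return 304; // Not Modified: generation matched, upload rejected
      }
      generations.put(name, current == null ? 1L : current + 1);
      return 200;
    }

    // Upload with ifGenerationMatch: succeeds only if the generation matches.
    int uploadIfGenerationMatch(String name, long generation) {
      Long current = generations.get(name);
      if (current == null || current != generation) {
        return 412; // Precondition Failed: object changed under us
      }
      generations.put(name, current + 1);
      return 200;
    }
  }

  public static void main(String[] args) {
    FakeGcs gcs = new FakeGcs();
    String name = "cdap-job/.../hConf.xml";

    // First attempt succeeds; a retried attempt hits 412.
    System.out.println(gcs.uploadIfNotExists(name)); // 200
    System.out.println(gcs.uploadIfNotExists(name)); // 412

    // Fetch the blobInfo to learn the current generation.
    long generation = gcs.generations.get(name);

    // Buggy retry: generationNotMatch against the fetched generation -> 304.
    System.out.println(gcs.uploadIfGenerationNotMatch(name, generation)); // 304

    // Fixed retry: generationMatch overwrites the existing object -> 200.
    System.out.println(gcs.uploadIfGenerationMatch(name, generation)); // 200
  }
}
```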

itsankit-google added the "build" (Triggers github actions build) label on Feb 7, 2024
itsankit-google force-pushed the CDAP-20954 branch 2 times, most recently from c3a55c2 to 4e7ae68 on February 8, 2024 15:00
@masoud-io (Contributor) left a comment


One comment; otherwise LGTM.

Followup work: One main concern is that this code assumes only one thread uploads the file. This is a strong assumption and cannot be guaranteed 100% with system workers (e.g., a system worker gets stuck, appfabric times out and sends the request to another system worker; now two system workers are uploading the same artifacts). So it would be great to think of a way to make sure the logic stays correct even when multiple threads upload the same file.
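One possible direction for that follow-up (a sketch only, not part of this PR): loop on fetch-generation plus an ifGenerationMatch upload, treating 412 as "another uploader won" and re-fetching. Below, GCS's generation-match precondition is modeled with compareAndSet on an AtomicLong; all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a retry loop that stays correct with concurrent uploaders.
// The object's generation is modeled as an AtomicLong (0 = "does not exist");
// compareAndSet stands in for an upload with ifGenerationMatch.
public class ConcurrentUploadSketch {

  static final AtomicLong generation = new AtomicLong(0);

  // Loops until our precondition wins; returns the generation we wrote.
  static long uploadWithRetry() {
    while (true) {
      long seen = generation.get();      // fetch the blobInfo generation
      long next = seen + 1;
      // ifGenerationMatch(seen): succeeds only if nobody raced us.
      if (generation.compareAndSet(seen, next)) {
        return next;
      }
      // 412 Precondition Failed: another uploader won; re-fetch and retry.
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      workers[i] = new Thread(ConcurrentUploadSketch::uploadWithRetry);
      workers[i].start();
    }
    for (Thread w : workers) {
      w.join();
    }
    // Each of the 4 uploads bumped the generation exactly once.
    System.out.println(generation.get()); // 4
  }
}
```

Every uploader eventually succeeds exactly once; the last writer's content wins, which is acceptable when all workers upload identical artifacts.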

