
AWS SDK for Java v2 fails to upload large files to an S3 bucket, even with multipart and timeout configuration #5966

@Nikkeii

Description


Describe the bug

The Scala application uploads large files (including directories with subdirectories) to Amazon S3 using the AWS SDK v2 S3AsyncClient and S3TransferManager. However, even with multipart upload enabled and MultipartConfiguration configured correctly, large uploads consistently fail: the upload times out before it completes.
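As a rough model of what MultipartConfiguration is expected to do: a file smaller than thresholdInBytes is sent as a single PutObject, and a larger file is split into parts of minimumPartSizeInBytes, with the part size raised only when needed to stay within S3's limit of 10,000 parts per upload. This is a simplified sketch under those assumptions, not the SDK's actual splitting logic; MultipartPlan, PartPlan, and planParts are hypothetical names.

```scala
// Hypothetical helper: a simplified model of multipart splitting, not the SDK's code.
object MultipartPlan {
  val MiB: Long = 1024L * 1024L
  val MaxParts: Long = 10000L // S3 allows at most 10,000 parts per multipart upload

  /** partCount == 1 means the file would be sent as a single PutObject. */
  final case class PartPlan(partSizeBytes: Long, partCount: Long)

  private def ceilDiv(a: Long, b: Long): Long = (a + b - 1) / b

  def planParts(fileSize: Long, thresholdBytes: Long, minPartSize: Long): PartPlan = {
    require(fileSize >= 0 && thresholdBytes > 0 && minPartSize > 0)
    if (fileSize < thresholdBytes) PartPlan(fileSize, 1L)
    else {
      // Grow the part size only if minPartSize would require more than MaxParts parts.
      val partSize = math.max(minPartSize, ceilDiv(fileSize, MaxParts))
      PartPlan(partSize, ceilDiv(fileSize, partSize))
    }
  }

  def main(args: Array[String]): Unit = {
    // Values from the reproduction script: 50 MiB threshold and part size, about 20 GiB file.
    val plan = planParts(20L * 1024L * MiB, 50L * MiB, 50L * MiB)
    println(s"${plan.partCount} parts of ${plan.partSizeBytes / MiB} MiB") // 410 parts of 50 MiB
  }
}
```

Under this model a 20 GiB file with the script's 50 MiB settings should produce 410 parts, so an upload that stays a single request would suggest the configuration is not taking effect.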


Expected Behavior

Files are uploaded to S3 successfully.

Current Behavior

Observed Errors:

  1. Timeouts persist: even with apiCallTimeout, connectionTimeout, and writeTimeout raised to 60+ minutes, uploads of large files still time out.
  2. Multipart configuration ignored: S3AsyncClient does not appear to respect the multipart upload configuration.
    The threshold and part-size settings do not seem to split large files into multiple parts.
  3. Request time skew: the upload fails with "The difference between the request time and the current time is too large." (S3 error code RequestTimeTooSkewed).
  4. Connection acquisition timeout: ERROR - Failed to upload directory [file] to bucket [bucket]. Error message: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
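Error 3 is the RequestTimeTooSkewed response S3 sends when a signed request's timestamp is more than 15 minutes away from S3's own clock. One way to rule out a plain clock problem on the client machine is to compare the local clock with the Date header of any S3 HTTP response. The sketch below assumes you already have that header string; it only does the comparison, and ClockSkewCheck and its methods are hypothetical names.

```scala
import java.time.{Duration, Instant, ZonedDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical helper: measures the gap between the local clock and a server's HTTP Date header.
object ClockSkewCheck {
  // S3 rejects signed requests whose timestamp differs from its clock by more than 15 minutes.
  val MaxAllowedSkew: Duration = Duration.ofMinutes(15)

  /** Parses an RFC 1123 Date header, e.g. "Tue, 15 Nov 1994 08:12:31 GMT". */
  def parseHttpDate(header: String): Instant =
    ZonedDateTime.parse(header, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant

  /** Absolute difference between the server's time and the local time. */
  def skew(serverDateHeader: String, localNow: Instant): Duration =
    Duration.between(parseHttpDate(serverDateHeader), localNow).abs()

  def withinLimit(serverDateHeader: String, localNow: Instant): Boolean =
    skew(serverDateHeader, localNow).compareTo(MaxAllowedSkew) <= 0

  def main(args: Array[String]): Unit = {
    val header = "Tue, 15 Nov 1994 08:12:31 GMT"      // example server Date header
    val local = Instant.parse("1994-11-15T08:30:00Z") // example local clock reading
    println(s"skew = ${skew(header, local)}, within limit = ${withinLimit(header, local)}")
  }
}
```

If the local clock is in sync, a skew error that appears only on long uploads points at requests that were signed well before they were actually sent, rather than at the machine's clock.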

Reproduction Steps

Try to upload a file of roughly 19-20 GB with this script:
```scala
import org.slf4j.{Logger, LoggerFactory}
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration
import software.amazon.awssdk.core.retry.RetryMode
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.{PutObjectRequest, ServerSideEncryption}
import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.transfer.s3.model.{CompletedFileUpload, UploadFileRequest}
import software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener

import java.io.File
import java.nio.file.{Files, Path}
import java.time.Duration
import scala.jdk.CollectionConverters.*
import scala.util.Using

object Main {
  private val logger: Logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    val bucketName = "bucket_name"
    val keyPrefix = "obj_key"
    val dirPath = "file"
    val includeSubDir = true
    val MB: Long = 1024L * 1024L // Long, so sizes convert to the java.lang.Long parameters below

    logger.info(s"Starting S3 upload from directory: $dirPath")

    // Validate the input before creating any clients, so nothing is left unclosed.
    val dir = new File(dirPath)
    if (!dir.exists() || !dir.isDirectory) {
      logger.error(s"Directory does not exist or is not a directory: $dirPath")
      return
    }

    val httpClient = NettyNioAsyncHttpClient.builder()
      .maxConcurrency(20)                                   // Max open connections in the pool
      .connectionAcquisitionTimeout(Duration.ofMinutes(60)) // Max wait for a pooled connection
      .readTimeout(Duration.ofMinutes(60))
      .writeTimeout(Duration.ofMinutes(70))
      .tcpKeepAlive(true)
      .connectionTimeout(Duration.ofMinutes(70))            // Max time to establish a connection

    val overrideConfig = ClientOverrideConfiguration.builder()
      .retryStrategy(RetryMode.STANDARD)             // AWS SDK standard retry strategy
      .apiCallTimeout(Duration.ofMinutes(30))        // Whole API call, including retries
      .apiCallAttemptTimeout(Duration.ofMinutes(30)) // Each individual attempt
      .build()

    val s3AsyncClient = S3AsyncClient.builder()
      .region(Region.US_EAST_1)                                          // Set your AWS region
      .credentialsProvider(ProfileCredentialsProvider.create("default")) // Set your AWS profile
      .overrideConfiguration(overrideConfig)
      .httpClientBuilder(httpClient)
      .multipartEnabled(true)
      .multipartConfiguration(
        MultipartConfiguration.builder()
          .thresholdInBytes(50 * MB)
          .minimumPartSizeInBytes(50 * MB)
          .apiCallBufferSizeInBytes(50 * MB)
          .build())
      .build()

    val transferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build()

    try {
      // Using.resource closes the directory stream once the list is built.
      val filesToUpload: List[Path] =
        Using.resource(if (includeSubDir) Files.walk(dir.toPath) else Files.list(dir.toPath)) { stream =>
          stream.iterator().asScala.filter(Files.isRegularFile(_)).toList
        }

      if (filesToUpload.isEmpty) {
        logger.warn("No files found to upload.")
        return
      }

      filesToUpload.foreach { filePath =>
        val relativePath = dir.toPath.relativize(filePath).toString.replace("\\", "/")
        val s3Key = keyPrefix + relativePath // No "/" is inserted between prefix and path

        logger.info(s"Uploading file: ${filePath.toAbsolutePath} -> S3 ($bucketName/$s3Key)")

        val uploadFileRequest = UploadFileRequest.builder()
          .source(filePath)
          .addTransferListener(LoggingTransferListener.create()) // Log transfer progress
          .putObjectRequest(
            PutObjectRequest.builder()
              .bucket(bucketName)
              .key(s3Key)
              .serverSideEncryption(ServerSideEncryption.AES256)
              .build())
          .build()

        // Files are uploaded one at a time: block until this upload finishes.
        val result: CompletedFileUpload =
          transferManager.uploadFile(uploadFileRequest).completionFuture().join()

        logger.info(s"Successfully uploaded: $s3Key (ETag ${result.response().eTag()})")
      }
    } catch {
      case e: Exception =>
        logger.error("Error during upload", e)
    } finally {
      transferManager.close()
      s3AsyncClient.close() // Closing the transfer manager does not close a client passed to s3Client(...)
    }
  }
}
```
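A back-of-envelope model can show how a pool of maxConcurrency(20) connections might lead to the "Acquire operation took longer than the configured maximum time" error. It assumes, as a simplification that may not match the SDK, that all part requests of one file are queued at once, that each part takes about the same time, and that the pool serves them in waves of maxConcurrency; the last part then waits roughly (waves - 1) times the per-part time for a connection. PoolWaitModel and its methods are hypothetical, and the per-part time is an illustrative input, not a measurement from this issue.

```scala
import java.time.Duration

// Hypothetical back-of-envelope model, not SDK behavior: all parts queued at once,
// served in waves of `maxConcurrency`, each part taking `perPartTime`.
object PoolWaitModel {
  private def ceilDiv(a: Long, b: Long): Long = (a + b - 1) / b

  /** Approximate time the last queued part waits before it acquires a connection. */
  def lastPartWait(partCount: Long, maxConcurrency: Int, perPartTime: Duration): Duration = {
    require(partCount > 0 && maxConcurrency > 0)
    val waves = ceilDiv(partCount, maxConcurrency.toLong)
    perPartTime.multipliedBy(waves - 1)
  }

  def exceeds(wait: Duration, acquisitionTimeout: Duration): Boolean =
    wait.compareTo(acquisitionTimeout) > 0

  def main(args: Array[String]): Unit = {
    // 410 parts (about 20 GiB in 50 MiB parts) and maxConcurrency(20) from the script;
    // the 4-minute per-part time is an illustrative guess, not a measurement.
    val wait = lastPartWait(410L, 20, Duration.ofMinutes(4))
    println(s"last part waits ~$wait; exceeds 60 min acquisition timeout: ${exceeds(wait, Duration.ofMinutes(60))}")
  }
}
```

Because the script joins on each upload before starting the next, only one file's parts compete for the pool at a time, so in this model the wait grows with the part count of the largest file and shrinks as per-part throughput improves.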

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.30.38

JDK version used

21

Operating System and version

Windows 11

Metadata


Labels: bug, closed-for-staleness, p2, response-requested
