Description
Describe the bug
The Scala application is designed to upload large files (including directories with subdirectories) to Amazon S3 using the AWS SDK v2 S3AsyncClient and S3TransferManager. However, despite enabling multipart upload and configuring MultipartConfiguration, the application consistently fails to upload large files: the upload times out before completing.
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
Files upload to S3 successfully.
Current Behavior
Observed Errors:
- Despite increasing apiCallTimeout, connectionTimeout, and writeTimeout to 60+ minutes, uploads of large files still time out.
- Multipart configuration ignored: the threshold and part-size settings do not appear to cause S3AsyncClient to split large files into multiple parts.
- Clock-skew error: "The difference between the request time and the current time is too large."
- ERROR - Failed to upload directory [file] to bucket [bucket]. Error message: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
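A back-of-envelope check on the part math may help explain the pool-acquisition error. The numbers below mirror the reproduction script (50 MB parts, a ~20 GB file, maxConcurrency 20); they are approximations, not measurements:

```scala
// Back-of-envelope part count for one file, using the repro's settings.
object PartMath extends App {
  val MB = 1024L * 1024L
  val GB = 1024L * MB
  val fileSize = 20L * GB   // approximate size of the failing upload
  val partSize = 50L * MB   // minimumPartSizeInBytes in the repro
  val parts = math.ceil(fileSize.toDouble / partSize).toLong
  println(s"parts = $parts") // parts = 410
  // With maxConcurrency(20) on the Netty client, at most 20 part
  // uploads run at a time; the other ~390 queue for a pooled
  // connection, which can exhaust connectionAcquisitionTimeout.
}
```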
Reproduction Steps
Try to upload a ~19-20 GB file with the following script:
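To reproduce without real data, a large test file can be generated quickly (the file name here is arbitrary; depending on the filesystem, the file may be stored sparsely and occupy little actual disk space):

```scala
import java.io.RandomAccessFile

// Generates a ~20 GB test file by setting its logical length; on
// filesystems that support sparse files, little data is written.
object MakeTestFile extends App {
  val raf = new RandomAccessFile("big-test-file.bin", "rw")
  try raf.setLength(20L * 1024 * 1024 * 1024) // 20 GB logical size
  finally raf.close()
}
```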
import org.slf4j.{Logger, LoggerFactory}
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration
import software.amazon.awssdk.core.retry.RetryMode
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.{PutObjectRequest, ServerSideEncryption}
import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.transfer.s3.model.{CompletedFileUpload, UploadFileRequest}
import software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener

import scala.jdk.CollectionConverters.*

import java.io.File
import java.nio.file.{Files, Path}
import java.time.Duration

object Main {
  private val logger: Logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    val bucketName = "bucket_name"
    val keyPrefix = "obj_key"
    val dirPath = "file"
    val includeSubDir = true
    val MB = 1024 * 1024

    logger.info(s"Starting S3 upload from directory: $dirPath")

    val httpClient = NettyNioAsyncHttpClient.builder()
      .maxConcurrency(20)                                   // Increase max connections
      .connectionAcquisitionTimeout(Duration.ofMinutes(60))
      .readTimeout(Duration.ofMinutes(60))                  // Increase read timeout
      .writeTimeout(Duration.ofMinutes(70))                 // Increase write timeout
      .tcpKeepAlive(true)
      .connectionTimeout(Duration.ofMinutes(70))            // Increase connection timeout

    val overrideConfig = ClientOverrideConfiguration.builder()
      .retryStrategy(RetryMode.STANDARD)             // Use the SDK's standard retry strategy
      .apiCallTimeout(Duration.ofMinutes(30))        // Timeout for the whole API call
      .apiCallAttemptTimeout(Duration.ofMinutes(30)) // Timeout per retry attempt
      .build()

    val s3AsyncClient = S3AsyncClient.builder()
      .region(Region.US_EAST_1)                                          // Set your AWS region
      .credentialsProvider(ProfileCredentialsProvider.create("default")) // Set your AWS profile
      .overrideConfiguration(overrideConfig)
      .httpClientBuilder(httpClient)
      .multipartEnabled(true)
      .multipartConfiguration(MultipartConfiguration.builder()
        .thresholdInBytes(50 * MB)
        .minimumPartSizeInBytes(50 * MB)
        .apiCallBufferSizeInBytes(50 * MB)
        .build())
      .build()

    val dir = new File(dirPath)
    if (!dir.exists() || !dir.isDirectory) {
      logger.error(s"Directory does not exist or is not a directory: $dirPath")
      return
    }

    val transferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build()
    try {
      val filesToUpload: List[Path] = if (includeSubDir) {
        Files.walk(dir.toPath).iterator().asScala.filter(Files.isRegularFile(_)).toList
      } else {
        Files.list(dir.toPath).iterator().asScala.filter(Files.isRegularFile(_)).toList
      }

      if (filesToUpload.isEmpty) {
        logger.warn("No files found to upload.")
        return
      }

      filesToUpload.foreach { filePath =>
        val relativePath = dir.toPath.relativize(filePath).toString.replace("\\", "/")
        val s3Key = keyPrefix + relativePath
        logger.info(s"Uploading file: ${filePath.toAbsolutePath} -> S3 ($bucketName/$s3Key)")

        val uploadFileRequest = UploadFileRequest.builder()
          .source(filePath)
          .addTransferListener(LoggingTransferListener.create()) // Log transfer progress
          .putObjectRequest(PutObjectRequest.builder()
            .bucket(bucketName)
            .key(s3Key)
            .serverSideEncryption(ServerSideEncryption.AES256)
            .build())
          .build()

        val upload = transferManager.uploadFile(uploadFileRequest)
        val result: CompletedFileUpload = upload.completionFuture().join()
        logger.info(s"Successfully uploaded: $s3Key")
      }
    } catch {
      case e: Exception =>
        logger.error("Error during upload", e)
    } finally {
      transferManager.close()
    }
  }
}
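For comparison, the SDK documentation generally recommends the AWS CRT-based S3 client for use with S3TransferManager. A minimal sketch of that alternative, using the same placeholder profile and part size as the script above (the targetThroughputInGbps value is an assumed tuning knob, not taken from the repro, and this variant has not been verified against the failing upload):

```scala
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.transfer.s3.S3TransferManager

// Sketch only: CRT-based async client with the same 50 MB part size.
val crtClient = S3AsyncClient.crtBuilder()
  .region(Region.US_EAST_1)
  .credentialsProvider(ProfileCredentialsProvider.create("default"))
  .minimumPartSizeInBytes(50L * 1024 * 1024)
  .targetThroughputInGbps(5.0) // assumed value; tune for available bandwidth
  .build()

val crtTransferManager = S3TransferManager.builder().s3Client(crtClient).build()
```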
Possible Solution
No response
Additional Information/Context
No response
AWS Java SDK version used
2.30.38
JDK version used
21
Operating System and version
Windows 11