feat: consolidate S3 client creation and enable ARN role for MSQ export#19317
Conversation
|
The changes LGTM, no correctness issues found. |
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 1 |
| P3 | 0 |
| Total | 1 |
This is an automated review by Codex GPT-5
| return StsAssumeRoleCredentialsProvider.builder() | ||
| .stsClient(stsBuilder.build()) | ||
| .refreshRequest(assumeRoleRequestBuilder.build()) | ||
| .asyncCredentialUpdateEnabled(true) |
There was a problem hiding this comment.
[P2] Avoid leaking assume-role refresh resources
When an MSQ export specifies assumeRoleArn, S3ExportStorageProvider.createStorageConnector builds a fresh ServerSideEncryptingAmazonS3 for each connector creation. That path creates an StsAssumeRoleCredentialsProvider with asyncCredentialUpdateEnabled(true) plus a new STS client, but neither the provider, STS client, nor S3 clients are lifecycle-managed or closed after the export connector is done. Since exports create connectors for the empty-location check, worker writes, and manifest writing, each ARN export can leave background refresh/client resources behind. Reuse a lifecycle-managed role-specific client/provider or make the connector own and close the resources it creates.
| } | ||
| return new S3StorageConnector( | ||
| s3OutputConfig, | ||
| ServerSideEncryptingAmazonS3.builder( |
There was a problem hiding this comment.
Frank's automated review makes a good point about this here: https://github.com/apache/druid/pull/19317/files#r3161338202
Each of these ServerSideEncryptingAmazonS3 is going to leak some resources: thread pools, connection pools, and the like. This was a pre-existing problem with S3InputSource when a S3InputDataConfig is provided (via the properties field). I think because we have an S3InputSource per file (due to splitting), we'd potentially even in master be creating a lot of ServerSideEncryptingAmazonS3 when properties is set.
This PR isn't making it a ton worse, but it also isn't making it much better. I suppose it would be OK to merge it given that it's not a ton worse, but, could you please memoize the client in the constructor of S3ExportStorageProvider (similar to what S3InputSource does). That will at least allow it to be shared across calls to createStorageConnector.
Could you please also put some comments about the problem in ServerSideEncryptingAmazonS3#builder. Ultimately fixing it would I think involve making ServerSideEncryptingAmazonS3 closeable, and arranging for close to actually be called, which would be a larger change.
There was a problem hiding this comment.
Frank's automated review makes a good point about this here: https://github.com/apache/druid/pull/19317/files#r3161338202
Each of these
ServerSideEncryptingAmazonS3is going to leak some resources: thread pools, connection pools, and the like. This was a pre-existing problem withS3InputSourcewhen aS3InputDataConfigis provided (via thepropertiesfield). I think because we have anS3InputSourceper file (due to splitting), we'd potentially even inmasterbe creating a lot ofServerSideEncryptingAmazonS3whenpropertiesis set.This PR isn't making it a ton worse, but it also isn't making it much better. I suppose it would be OK to merge it given that it's not a ton worse, but, could you please memoize the client in the constructor of
S3ExportStorageProvider(similar to whatS3InputSourcedoes). That will at least allow it to be shared across calls tocreateStorageConnector.Could you please also put some comments about the problem in
ServerSideEncryptingAmazonS3#builder. Ultimately fixing it would I think involve makingServerSideEncryptingAmazonS3closeable, and arranging forcloseto actually be called, which would be a larger change.
Updated to use Suppliers.memoize and added javadoc for ServerSideEncryptingAmazonS3.Builder.build(). PTAL!
gianm
left a comment
There was a problem hiding this comment.
Main code looks good to me, just some docs are missing.
| INSERT INTO | ||
| EXTERN( | ||
| s3(bucket => 'your_bucket', prefix => 'prefix/to/files')) | ||
| s3(bucket => 'your_bucket', prefix => 'prefix/to/files', assumeRoleArn => 'arn:aws:iam::some-role')) |
There was a problem hiding this comment.
Please also update docs/multi-stage-query/reference.md. The table of s3 options should include assumeRoleArn and assumeRoleExternalId.
FrankChen021
left a comment
There was a problem hiding this comment.
I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.
Reviewed 7 of 7 changed files.
This is an automated review by Codex GPT-5.5
Description
Consolidates the creation of
ServerSideEncryptingAmazonS3into a single static builder method to reduce code duplication and improve maintainability.Release Note
This PR has: