Skip to content

Remove maxResultsSize config property from S3OutputConfig#14101

Merged
cryptoe merged 6 commits intoapache:masterfrom
LakshSingla:s3-better-chunking-error
Apr 18, 2023
Merged

Remove maxResultsSize config property from S3OutputConfig#14101
cryptoe merged 6 commits intoapache:masterfrom
LakshSingla:s3-better-chunking-error

Conversation

@LakshSingla
Copy link
Contributor

@LakshSingla LakshSingla commented Apr 17, 2023

Description

This PR removes the property maxResultsSize from the S3OutputConfig. That property was used to limit the max size that can be fetched from the S3 (and used primarily in the durable storage of MSQ).
However, since we now chunk downloads from S3, we donot need such a guardrail and the property can be removed from the config.
Moreover, a default chunk size of 100MiB has been introduced (earlier it was calculated based on the value passed in the maxResultsSize). Removing maxResultsSize also clears up the hurdles associated with calculating the appropriate chunkSize.

Release note

"maxResultsSize" has been removed from the S3OutputConfig and a default "chunkSize" of 100MiB is now present. This change primarily affects users who wish to use durable storage for MSQ jobs.


Key changed/added classes in this PR
  • S3OutputConfig
  • RetryableS3OutputStream

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@LakshSingla LakshSingla requested a review from cryptoe April 17, 2023 05:02
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.


@JsonProperty
private HumanReadableBytes maxResultsSize = new HumanReadableBytes("100MiB");
private HumanReadableBytes chunkSize = new HumanReadableBytes("100MiB");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need :

 public static final long S3_MULTIPART_UPLOAD_MIN_OBJECT_SIZE_BYTES = 5L * 1024*1024;
 public static final long S3_MULTIPART_UPLOAD_MAX_OBJECT_SIZE_BYTES = 5L * 1024 * 1024 * 1024 * 1024;


@Nullable
@JsonProperty
private HumanReadableBytes chunkSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the documentation also.
The release notes should mention that the defaults have changed and the value is removed.

@cryptoe cryptoe added this to the 26.0 milestone Apr 17, 2023
LakshSingla and others added 2 commits April 17, 2023 23:40
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just waiting for a green build.

@cryptoe cryptoe merged commit 8eb854c into apache:master Apr 18, 2023
cryptoe pushed a commit that referenced this pull request Apr 18, 2023
* "maxResultsSize" has been removed from the S3OutputConfig and a default "chunkSize" of 100MiB is now present. This change primarily affects users who wish to use durable storage for MSQ jobs.

(cherry picked from commit 8eb854c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants