Skip to content

Various performance improvements.#3178

Merged
millems merged 2 commits into
masterfrom
millem/perf
May 10, 2022
Merged

Various performance improvements.#3178
millems merged 2 commits into
masterfrom
millem/perf

Conversation

@millems
Copy link
Copy Markdown
Contributor

@millems millems commented May 5, 2022

  1. Replace uses of String.replace with StringUtils.replace and StringUtils.replaceEach to avoid the overhead of regex parsing the search term.
  2. Remove uses of SdkHttpRequest and SdkHttpResponse's headers and rawQueryParameters calls in favor of more specific methods that do not require data copying. Added spotbugs rules to prevent uses of these methods in the future.
  3. Update SdkHttpRequest and SdkHttpResponse to not copy data in headers and rawQueryParameters when round-tripped to/from builder unless it is actually modified.
  4. Reduce string copies performed during message signing.
  5. Remove uses of String.format that were performance-impacting on profiler output.
  6. Remove construction of log strings within the netty client unless the logging is enabled.
  7. Move most user-agent construction to client creation time instead of at request time.
  8. Initialize collection/map capacities to prevent resizes that will always occur.
  9. Prevent creation of RateLimitingTokenBucket unless adaptive retries are enabled.

@millems millems requested a review from a team as a code owner May 5, 2022 19:57
1. Replace uses of String.replace with StringUtils.replace and StringUtils.replaceEach to avoid the overhead of regex parsing the search term.
2. Remove uses of SdkHttpRequest and SdkHttpResponse's headers and rawQueryParameters calls in favor of more specific methods that do not require data copying. Added spotbugs rules to prevent uses of these methods in the future.
3. Update SdkHttpRequest and SdkHttpResponse to not copy data in headers and rawQueryParameters when round-tripped to/from builder unless it is actually modified.
4. Reduce string copies performed during message signing.
5. Remove uses of String.format that were performance-impacting on profiler output.
6. Remove construction of log strings within the netty client unless the logging is enabled.
7. Move most user-agent construction to client creation time instead of at request time.
8. Initialize collection/map capacities to prevent resizes that will always occur.
9. Prevent creation of RateLimitingTokenBucket unless adaptive retries are enabled.
this.dependencies = dependencies;
this.requestPipeline = requestPipeline;
this.rateLimitingTokenBucket = new RateLimitingTokenBucket();
this.rateLimitingTokenBucket = null;
Copy link
Copy Markdown
Contributor Author

@millems millems May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're reviewing, @dagnir: Isn't the RetryableStage created with each request, so this rate limiting token bucket is never reused between requests, making it essentially do nothing?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't fix that here. I can create a backlog item to fix it...

this.dependencies = dependencies;
this.requestPipeline = requestPipeline;
this.rateLimitingTokenBucket = new RateLimitingTokenBucket();
this.rateLimitingTokenBucket = null;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch!

@dagnir
Copy link
Copy Markdown
Contributor

dagnir commented May 10, 2022

Awesome stuff!

@millems millems enabled auto-merge (squash) May 10, 2022 22:19
@millems millems disabled auto-merge May 10, 2022 22:53
@millems millems enabled auto-merge (squash) May 10, 2022 22:55
@sonarqubecloud
Copy link
Copy Markdown

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 24 Code Smells

61.1% 61.1% Coverage
11.3% 11.3% Duplication

@millems millems merged commit 1eaba50 into master May 10, 2022
chibenwa added a commit to chibenwa/james-project that referenced this pull request May 13, 2022
S3 team had been, partly on our reports [1] working
on numerous performance enhancements [2] [3] that
Apache James IMAP workload would definitely benefit
from.

[1] aws/aws-sdk-java-v2#3162

[2] aws/aws-sdk-java-v2#3175

[3] aws/aws-sdk-java-v2#3178

Upgrade

SdkHttpFullRequest.Builder builder = doSign(request, requestParams, signingParams,
new ContentChecksum(digestHex, sdkChecksum));
new ContentChecksum(digestHex, sdkChecksum));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the indent change here intended? Or an IDE fail?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not intended, but more of an IDE success. The indent is an improvement, imo.

* @return true if the character is white space, false otherwise.
*/
private boolean isWhiteSpace(final char ch) {
return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\u000b' || ch == '\r' || ch == '\f';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guava's charmatcher allows denser checks, with a binary search and binary sets, which might be handy with 6 + chars to check for.... Especially if in the future you decide to add more.

Copy link
Copy Markdown
Contributor Author

@millems millems May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered doing something like a 256 size boolean array lookup table for chars being whitespace (with some special handling for unicode), but this method didn't show up very high on a profiler so it felt like premature optimization. It would be nice if we could have a Guava dependency, but since we have to implement it ourselves (or deal with importing it), I wasn't inclined.

Comment on lines 381 to +382
byte[] kSecret = ("AWS4" + credentials.secretAccessKey())
.getBytes(Charset.forName("UTF-8"));
.getBytes(StandardCharsets.UTF_8);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this needs to be recomputed all of the time? Can't the credentials holds kSecret?

Copy link
Copy Markdown
Contributor Author

@millems millems May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a cool idea. We'd probably want to use a concurrent hash map with weak reference keys from credential to cached values so that we don't leak that information out of the signer, but that could definitely save some computation for reused credentials at the cost of some volatile reads.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought credentials was immutable when commenting. If mutability is in the game it would ruin potentially ruin gains...

Copy link
Copy Markdown
Contributor Author

@millems millems May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Credentials is immutable, but they might be used for other signing algorithms, so we don't really want to pre-calculate it within the credentials when it's created.

.stream()
.collect(Collectors.toMap(Map.Entry::getKey, e -> HeaderValue.fromString(
firstIfPresent(e.getValue()))));
Map<String, HeaderValue> headers = new LinkedHashMap<>();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything preventing us to pre-size this hash map?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, good catch!

chibenwa added a commit to apache/james-project that referenced this pull request May 24, 2022
S3 team had been, partly on our reports [1] working
on numerous performance enhancements [2] [3] that
Apache James IMAP workload would definitely benefit
from.

[1] aws/aws-sdk-java-v2#3162

[2] aws/aws-sdk-java-v2#3175

[3] aws/aws-sdk-java-v2#3178
@millems millems deleted the millem/perf branch October 19, 2022 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants