Add flexible checksum support and update perf tests #3376

Merged
zoewangg merged 3 commits into feature/master/transfermanager-GA from zoewang/tm-perfTests
Aug 26, 2022

Conversation

Contributor

@zoewangg zoewangg commented Aug 24, 2022

Motivation and Context

Add flexible checksum support and update perf tests

Testing

Added tests

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the CONTRIBUTING document
  • Local run of mvn install succeeds
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed
  • I have added a changelog entry. Adding a new entry must be accomplished by running the scripts/new-change script and following the instructions. Commit the new file created by the script in .changes/next-release with your changes.
  • My change is to implement 1.11 parity feature and I have updated LaunchChangelog

License

  • I confirm that this pull request can be released under the Apache 2 license


ResponseBytes<GetObjectResponse> getObjectResponseResponseBytes =
s3Crt.getObject(r -> r.bucket(TEST_BUCKET).key(TEST_KEY), AsyncResponseTransformer.toBytes()).join();
String getObjectChecksum = getObjectResponseResponseBytes.response().checksumSHA1();
Contributor

It's not strictly part of this PR, but I'm wondering how customers who do a get without having done the put can know which checksum to read from the response. Do you think the assumption is that they either already have a checksum from somewhere, and so know which one to use, or they don't have one and so won't need it?

Contributor Author

In most cases, customers do not need to worry about validating the checksum, because the SDK learns the algorithm from the response headers and validates it automatically. The SDK also maintains a list of response algorithms that decides which one to use when multiple checksums are available in the response headers.

I think users can make a headObject call to find out whether flexible checksum is enabled.
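
To make the header-based selection described above concrete, here is a minimal sketch, not the SDK's actual code. It assumes checksums arrive in `x-amz-checksum-<algorithm>` response headers and that the preference order is CRC32C, CRC32, SHA1, SHA256; that order and the `ResponseChecksumPicker` class are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the SDK or this PR. */
final class ResponseChecksumPicker {

    // Assumed preference order; the list the SDK really uses may differ.
    private static final List<String> PRIORITY = List.of("CRC32C", "CRC32", "SHA1", "SHA256");

    private ResponseChecksumPicker() {
    }

    /** Returns the first algorithm in PRIORITY that has a checksum header in the response. */
    static Optional<String> pick(Map<String, String> responseHeaders) {
        // HTTP header names are case-insensitive, so normalize them before the lookup.
        Map<String, String> normalized = new HashMap<>();
        responseHeaders.forEach((name, value) -> normalized.put(name.toLowerCase(Locale.ROOT), value));

        for (String algorithm : PRIORITY) {
            String headerName = "x-amz-checksum-" + algorithm.toLowerCase(Locale.ROOT);
            if (normalized.containsKey(headerName)) {
                return Optional.of(algorithm);
            }
        }
        // No checksum header present: there is nothing to validate against.
        return Optional.empty();
    }
}
```

With this sketch, a caller who never did the put does not have to guess the algorithm: whichever checksum header is present in the response determines what gets validated, and an empty result means no flexible checksum was stored with the object.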

Contributor

ok!

SdkHttpExecutionAttributes.builder()
.put(OPERATION_NAME,
executionAttributes.getAttribute(SdkExecutionAttribute.OPERATION_NAME))
.build();
.put(HTTP_CHECKSUM, executionAttributes.getAttribute(SdkInternalExecutionAttribute.HTTP_CHECKSUM));
Contributor

I can't find where the SdkInternalExecutionAttribute.HTTP_CHECKSUM is set in the PR?

Contributor Author

It's set in the generated service code based on the service model.

}

// TODO: revisit default checksum
Algorithm algorithm = httpChecksum.requestAlgorithm() == null ? Algorithm.CRC32 :
Contributor

What are the results of running different algorithms?

Contributor Author

From my previous tests, I did not notice much difference, but that may be because the machine I use has SHA acceleration. I'd need to run more tests.
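
For anyone who wants to repeat the comparison locally, here is a rough sketch using only the JDK. It is not the perf test from this PR: `ChecksumComparisonSketch` is a hypothetical class, the 64 MiB payload size is arbitrary, and a single timed pass without warm-up is only a coarse indicator. It assumes checksum values are Base64 encoded, with CRC32 encoded from its 4-byte big-endian form.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.zip.CRC32;

/** Hypothetical comparison helper for illustration; not SDK or PR code. */
final class ChecksumComparisonSketch {

    private ChecksumComparisonSketch() {
    }

    /** CRC32 of the data, Base64 encoded from the 4-byte big-endian value. */
    static String crc32Base64(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        // ByteBuffer is big-endian by default; the CRC fits in the low 32 bits.
        byte[] bytes = ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** SHA-1 digest of the data, Base64 encoded. */
    static String sha1Base64(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            return Base64.getEncoder().encodeToString(digest.digest(data));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[64 * 1024 * 1024]; // arbitrary size, all zeros

        long start = System.nanoTime();
        String crc = crc32Base64(payload);
        long afterCrc = System.nanoTime();
        String sha = sha1Base64(payload);
        long afterSha = System.nanoTime();

        System.out.printf("CRC32 %s took %d ms%n", crc, (afterCrc - start) / 1_000_000);
        System.out.printf("SHA-1 %s took %d ms%n", sha, (afterSha - afterCrc) / 1_000_000);
    }
}
```

Results will depend on the JVM and on whether the CPU has CRC or SHA instructions, which is the caveat raised in the reply above; a harness such as JMH would give more trustworthy numbers.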


/**
* Checksum algorithm is not applicable to the following situations:
* 1. checksum validation is disabled OR
Contributor

This is helpful information for posterity

sonarcloud bot commented Aug 26, 2022

SonarCloud Quality Gate failed.

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 56 Code Smells

71.0% Coverage
8.3% Duplication

@zoewangg zoewangg merged commit 42f47f5 into feature/master/transfermanager-GA Aug 26, 2022
@zoewangg zoewangg deleted the zoewang/tm-perfTests branch August 26, 2022 16:58
zoewangg added a commit that referenced this pull request Dec 14, 2022
* Replacing S3TransferManager interfaces that allowed builder methods of S3ClientConfiguration with builder methods of S3AsyncClient (#3247)

* Added customization in codegen to generate additional builder methods (#3252)

* S3Object based DownloadFilter and removing DownloadFileContext as destination based filter is removed (#3258)

* Moved tm POJO classes to model package and tm config classes to config package. Added integration tests for s3 select using S3CrtAsyncClient (#3289)

* Fix broken integ test (#3301)

* S3 Transfer manager renamings based on feedback: (#3297)

1. Rename destinationDirectory to destination.
2. Move DownloadDirectoryRequest.prefix and delimiter to just rely on modifying the list requests.
3. Remove upload directory recursive option in favor of using maxDepth(1).
4. Rename UploadDirectoryRequest's prefix and delimiter to s3Prefix and s3Delimiter.
5. Rename ResumableFileDownload's to* and writeTo* methods to serializeTo*. Remove charsets from write/read methods, and just use UTF-8.
6. Do not base64 encode when writing ResumableFileDownload to disk.

* Allow pausing a resumed download even when the download hasn't already started. (#3300)

* Add POJO classes for upload pause/resume (#3337)

* Refactoring of Transfer manager APIs (#3374)

* Refactoring of Transfer manager APIs

* Merging the integ test failure PR 2119 from staging branch

* Add flexible checksum support and update perf tests (#3376)

* Fix flexiblechecksum implementation (#3391)

* [TM upload pause/resume Part 2] Implement pause and resume for uploadFile (#3357)

* Implement pause and resume for uploadFile

* Update Javadocs

* address feedback

* Implement automatic multipart copy functionality in S3 CRT async client (#3403)

* Implement automatic multipart copy functionality in S3 CRT async client

* Add more tests

* fix cancellation logic

* Refactor CopyRequestProvider, fix request conversion and add more tests

* Fix checkstyle

* Transfer Manager tests refactoring (#3420)

* Remove use of Junit4, clean up and consolidate tests in tm module

* Ignoring the test if unicode can't be used as directory name

* Add serialization and deserialization support for ResumableFileUpload (#3432)

* Support serialization and deserialization of ResumableFileUpload

* Address feedback

* Empty json should be unmarshalled to empty map

* Errors should not be wrapped - S3 Transfer Manager (#3433)

* Errors should not be wrapped

* update handleException()

* Changelog entry

* Resolve comments
Update changelog description, refactor handleException(), add test

* Add failed message to SdkException

* Refactor handleException() and format changelog (#3461)

* Fixed an issue where SSEC params were not correctly passed in copy operation (#3464)

* Replace inline snippets with external compilable snippets (#3465)

* Replace inline snippets with external compilable snippets

* Fix build and address feedback

* Fix build

* Only enable CRT checksum for getObject and putObject (#3477)

* Only use CRT flexible checksum for getObject and putObject

* Fix build

* Fix integ tests set up and tear down steps (#3485)

* Enable backpressure in TM (#3533)

* integrate with crt s3 flow control

* Update benchmark code

* Add backpressure config

* Change window size

* Update initial window size

* Change initial window size

* Use heap max memory for initial window size

* Give some buffer

* change window size

* Make read buffer size configurable

* Log result to a file

* Various updates

* Various updates

* Add CRT benchmark

* Various updates

* Fix checkstyle errors and tests

* Fix flaky test

* Fix checkstyle errors

* Add validation

* Add tests

* For copy operation, always forward multipart copy exception from one … (#3549)

* For copy operation, always forward multipart copy exception from one request to other multipart copy requests

* Minor refactoring in CopyObjectHelper (#3552)

* Add benchmarks for copy, uploadDirectory and downloadDirectory (#3551)

* Add benchmarks for copy, uploadDirectory and downloadDirectory

* Update sample code and fix snippet path (#3567)

* Update sample code and fix snippet path

* Fix link

* Integrate with CRT checksum fix (#3566)

* Integrate with CRT checksum fix

* Rename sourceDirectory to source and add S3AsyncClient#crtCreate (#3572)

* Rename sourceDirectory to source and add S3AsyncClient#crtCreate

* Use ByteBufferStoringSubscriber (#3581)

* Use ByteBufferStoringSubscriber

* Add a comment

* Create constant for bytes buffered

* Increase chunk size for file upload (#3583)

* Rename S3TransferManager.build().maxDepth to uploadDirectoryMaxDepth, rename S3TransferManager.builder().s3AsyncClient to .s3Client (#3584)

* Fixed an issue where sdkResponse is not present in the ProgressSnapshot for upload and copy (#3585)

* Throw UnsupportedOperationException if a user tries to pause an upload… (#3586)

* Throw UnsupportedOperationException if a user tries to pause an upload with non CRT-based S3 client

* Use SimplePublisher (#3594)

* Update documentation for Transfer Manager (#3592)

* Update javadoc

* Integrate with latest CRT pause/resume fix (#3588)

* Integrate with latest CRT pause/resume fix
* Bump CRT version

* Fixed an issue that could result in uncompletable future when headObject request threw exception in copy (#3609)

* Make crt dependency optional in transfer manager module (#3613)

* Make aws-crt an optional dependency in s3-transfer-manager module.

* Update README

* Fix category for changelog entries

Co-authored-by: John Viegas <70235430+joviegas@users.noreply.github.com>
Co-authored-by: Matthew Miller <millem@amazon.com>
Co-authored-by: David Ho <70000000+davidh44@users.noreply.github.com>