-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-4440. HDDS-4440-s3-performance branch Merge to Master PR - draft #3297
Conversation
…rt (apache#2485) * Commit for create specific OmTransportFactory initialized with META-INF/services api. Modifications made for green build targeted to s3 grpc incrementental PRs. Integration tests are now independent of s3g implementation (through pom file changes). Smoketests s3g specific are temp not run, delayed until further implementation submitted to support functionality.
…he#2580) * Initial commit with s3 gRPC client and associated unit tests. * Added configurable OZONE_OM_GRPC_MAXIMUM_RESPONSE_LENGTH to config keys for s3g om OmRequest OmResponses over gRPC. * Added suggested changes to unit tests from review. Including updating test timeout to appropriate test timeout value from 5min to 30s and using custom values (to reflect server response) in OMResponse to validate gRPC transport of Ozone Manager Protocol. * Added configuration setting in ozone-default for ozone.om.grpc.maximum.response.length.
…tion transport (apache#2655) * Modifications made as per reviewers comments. Clean up code, remove unnecessary code (not used or required). Changed OzoneManagerProtocol OmRequest from using multiple optional single variable entries for s3 auth to using an optional S3Authentication structure containing s3 auth info. Added default enabling of s3 Grpc on om through configuration setting in compose ozone cluster and re-enabled s3 smoke tests for that cluster.
Co-authored-by: Neil Joshi <neil2021.gres@gmail.com>
…ster branch with HDDS-5881 changes INTO HDDS-4400-s3-performance feature branch. Merge allows s3 gateway gRPC to work with s3 gateway persistent connection changes made in HDDS-5881 found in master branch.
…r to satisfy CI findbugs.
…ch green builds. Merging with Master branch impacts HDDS-4440 PRs in progress. This merge also requires partial changes that were initially in PR for HDDS-5545 to be included. Namely, the patch to allow switching between OmTransports, GrpcOmTransport and Hadoop3OmTransport through ozone configuration in the OmConfigKeys.
…nsport for s3 gateway - TestOzoneConfigurationFields (added config key not in xml).
…ansport support (apache#3074) * HDDS-6149. Remove unused keytabs (apache#2960) * HDDS-6094. Some unit tests are skipped due to using JUnit4 runner (apache#2909) * HDDS-6075. OzoneConfiguration constructor overrides input configuration keys. (apache#2921) * HDDS-4177. SCM Container DB bootstrap on Recon startup (apache#2942) * HDDS-6086. Compute MD5MD5CRC file checksum using chunk checksums from DataNodes (apache#2919) * HDDS-6148. Validate ContainerBalancerConfiguration before start ContainerBalancer (apache#2957) * HDDS-6161. SCM StateMachine failing to reinitialize doesn't terminate the process. (apache#2971) * HDDS-6134. Move replication-specific config to ReplicationServer (apache#2943) * HDDS-4010. S3G startup fails when multiple service ids are configured. (apache#2976) * HDDS-6170. Add metrics to replication manager to track container health states (apache#2975) * HDDS-3231. Cleanup KeyManagerImpl (apache#2961) * HDDS-5927. Improve defaults in ContainerBalancerConfiguration (apache#2892) * HDDS-6157. More consistent synchronization in InputStreams (apache#2965) * HDDS-4743. [FSO] Add FSO variant of ITestOzoneContractDistcp. (apache#2980) * HDDS-6114. Intermittent error due to Failed to init RocksDB (apache#2947) * HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient. (apache#2981) * HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient. * HDDS-5740. Enable ratis by default for SCM. (apache#2637) * HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM. (apache#2991) * HDDS-4190. Intermittent failure in TestOMVolumeSetOwnerRequest and TestOMVolumeSetQuotaRequest. (apache#2982) * HDDS-6120. Compute block checksum using chunk checksums (apache#2930) * HDDS-6147. Add ability in OM to get limited delta updates (apache#2956) * HDDS-6195. Remove unused jmh-core dependency. (apache#2997) * HDDS-6167. Update ozone-runner version to 20211202-1 (apache#2969) * HDDS-6171. Create an API to change Bucket Owner (apache#2988) * HDDS-6163. Fix PATH in docker image (apache#2967) * HDDS-6202. Avoid using jmh-generator-annprocess since it is GPL2.0. (apache#2998) * HDDS-6135. SCM Container DB bootstrap on Recon startup for SCM HA. (apache#2972) * HDDS-6109. Preserve the underlying exception raised in client lib. (apache#2989) * HDDS-3408. Rename ChunkLayOutVersion to ContainerLayoutVersion. (apache#2983) * HDDS-6203. CleanUp incomplete gz files during Container move (apache#3000) * HDDS-6216. Move OMOpenKeysDeleteRequest to package om.request.key (apache#3011) * HDDS-6191. Intermittent failure in TestDeleteWithSlowFollower (apache#3015) * HDDS-6128. CLI tool that downloads all the block replicas and creates a manifest file (apache#2987) * HDDS-6177. Extend container info command to include replica details (apache#2995) * HDDS-6211. [Docs] Image styling on deployed site does not replicate local builds. (apache#3007) * HDDS-6219. Switch to RATIS ReplicationType from STAND_ALONE in our tests. (apache#3014) * HDDS-6192. feature/Observability.md translated to Chinese (apache#2994) * HDDS-6205. Add CLI command to display the latest Replication Manager report (apache#3013) * HDDS-6227. Test helpers should observe naming conditions (apache#3020) * HDDS-6239. ozonesecure-mr failing with No URLs in mirrorlist (apache#3029) * HDDS-6201. Fix NPE for DataScanner with scanned container deleted by others. (apache#3005) * HDDS-5529. For any IOexception from @REPLicated method we should throw it (apache#2788) * HDDS-6181. Change SCMHAInvocationHandler#invokeRatis() logging to TRACE (apache#2992) * HDDS-6206. Application errors must not flood system log (apache#3001) * HDDS-6245. Add BucketLayout logging to Audit Logs (apache#3040) * HDDS-6238 Reduce memory requirements for list keys. (apache#3032) * HDDS-2919. Intermittent failure in TestRDBStore (apache#3028) * HDDS-6253. Unnecessary duplicate smoketest after defaulting to FSO (apache#3036) * HDDS-6204. Cleanup handling malformed authorization header (apache#2999) * HDDS-6169. Selective checks: skip junit tests on ozone-runner image update (apache#2974) * HDDS-6270. Use a dedicated file instead of /etc/passwd for xcompat acceptance test (apache#3050) * HDDS-6273. Amend doc SecuringTDE.md (apache#3047) * HDDS-6140. Selective checks: skip unit check for integration-test changes (apache#2948) * HDDS-6215. Recon get limited delta updates from OM (apache#3009) * HDDS-6125. Recon get limited delta updates from OM * HDDS-6215. Fix unit test * trigger new CI check * HDDS-6215. Fix typo * trigger new CI check Co-authored-by: Symious <yiyang0203@gmail.com> * HDDS-6226. Run tests for selective CI checks in CI (apache#3019) * HDDS-6247. Avoid logging stack trace for user input problems (apache#3039) * HDDS-6208. New checkstyle: WhitespaceAround (apache#3003) * HDDS-6289. Upgrade acceptance test log flooded with parse error (apache#3063) Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com> * Trigger Build * Fix integration test for added configuation field for selecting OmTransport for s3 gateway - TestOzoneConfigurationFields (added config key not in xml). Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com> Co-authored-by: Lokesh Jain <ljain@apache.org> Co-authored-by: Aswin Shakil Balasubramanian <aswinshakilbalu@gmail.com> Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org> Co-authored-by: Symious <yiyang0203@foxmail.com> Co-authored-by: Bharat Viswanadham <bharat@apache.org> Co-authored-by: Stephen O'Donnell <stephen.odonnell@gmail.com> Co-authored-by: GeorgeJahad <github@blackbirdsystems.net> Co-authored-by: Siddhant Sangwan <siddhantsangwan027@gmail.com> Co-authored-by: Jyotinder Singh <jyotindrsingh@gmail.com> Co-authored-by: Shashikant Banerjee <shashikant@apache.org> Co-authored-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org> Co-authored-by: Kiyoshi Mizumaru <kiyoshi.mizumaru@gmail.com> Co-authored-by: Ritesh H Shukla <kerneltime@gmail.com> Co-authored-by: Nibiru <axcsd3692@qq.com> Co-authored-by: Kaijie Chen <chen@kaijie.org> Co-authored-by: Zita Dombi <50611074+dombizita@users.noreply.github.com> Co-authored-by: Istvan Fajth <pifta@cloudera.com> Co-authored-by: steinsgateted <71066027+steinsgateted@users.noreply.github.com> Co-authored-by: Gui Hecheng <markgui@tencent.com> Co-authored-by: Jackson Yao <jacksonyao@tencent.com> Co-authored-by: Keyi Song <72794035+sky76093016@users.noreply.github.com> Co-authored-by: UENISHI Kota <kuenishi@users.noreply.github.com> Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com> Co-authored-by: Symious <yiyang0203@gmail.com>
…eature branch (apache#3257)" This reverts commit 3f585ab.
…ache#2945)" This reverts commit 0b0c586.
…hrpcOmTransport support (apache#3074)" This reverts commit b88a45e.
…440-s3-performance
… current master. Resolved conflicts affecting feature versioning PR#3155, propagating s3 exception stack trace HDDS-6257 PR#3066 and performFailover PR#3160.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @neils-dev and everyone who worked on this feature.
I have a few questions about the code change.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java
Show resolved
Hide resolved
hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
Outdated
Show resolved
Hide resolved
...one/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
Outdated
Show resolved
Hide resolved
...one/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
Show resolved
Hide resolved
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java
Show resolved
Hide resolved
...ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestGrpcOzoneManagerServer.java
Outdated
Show resolved
Hide resolved
...op-ozone/s3gateway/src/test/java/org/apache/hadoop/ozone/protocolPB/TestGrpcOmTransport.java
Outdated
Show resolved
Hide resolved
…ed constant for max len messages. Changes made to integration test for configuration test removing redundent statement. Small change to s3g Grpc unit tests for client and server to not only print out exception stack on error but to also assert on error ensuring test fails on error.'
…rpcOzoneManagerServer from definition in ConfigGroup for GrpcOmTransport.
|
||
<property> | ||
<name>ozone.om.grpc.maximum.response.length</name> | ||
<value>134217728</value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we validated that list keys will never violate this max length (if I understand this correctly)?
How is this value selected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kerneltime for reviewing this and for your comments.
This config defines the maximum message size limit that we send / receive over the gRPC channel.
The value was selected based on the max size defined for Hadoop RPC messages :ipc.maximum.response.length
with default size 128 * 1024 * 1024.
If a list keys message exceeds this maximum limit for the channel, then it will not be processed and the RPC will fail with RESOURCE_EXHAUSTED
. The exception should be raised from the channel and propagated to the user client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this enforced at the client side or server side. As in will the server hit this when returning more than the max number of bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is enforced by the both the client and server as we define the limit, max message size. In the case of bucket or key listing, the server will feed the payload within the OMResponse
and the length of the message is enforced by the client in this case. The client raises a RESOURCE_EXHAUSTED - message length exceed
io.grpc exception. This occurs when the message passed from the server to the client exceeds the incoming max message length.
Thanks @kerneltime, in looking into handling this exception I found that it needs to be isolated by the failover proxy provider retryPolicy
to fail (not retry) on calling the shouldFailover
method. The failover provider and retryPolicy shouldFailover
is currently refactored and will be included in HDDS-6433 that will be commited to the Master after the feature branch merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @neils-dev for updating the patch. Seems like there is a conflict due to HDDS-6346.
...ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestGrpcOzoneManagerServer.java
Outdated
Show resolved
Hide resolved
...op-ozone/s3gateway/src/test/java/org/apache/hadoop/ozone/protocolPB/TestGrpcOmTransport.java
Outdated
Show resolved
Hide resolved
…rvice. Resync with master to resolve merge conflicts.
Yes, I added that to the master so that the patch was not lost. It was for the master branch. |
… OmClientProtocol for S3Authentication messages changing required fields to optional within message.
Thanks @neils-dev for addressing my comments and updating the branch. I'm fine with the change, let's see if @kerneltime (or anyone else) has further comments. |
Why is this marked as a draft? Also, please rebase. |
Latest commit to re-sync with master branch resolving conflicts. Thanks @adoroszlai and @kerneltime for your comments and help! |
Because we want to avoid a merge via this PR, as that would squash commits. The PR is only for discussion about code changes. Feature branch should be merged via command-line. CI checks are run in Neil's fork. |
HDDS-4440 feature branch was merged, this can be closed. |
What changes were proposed in this pull request?
Draft PR for feature branch merge to master.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4440
How was this patch tested?
CI workflow.