New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-5927. Improve defaults in ContainerBalancerConfiguration #2892
Conversation
Hey @lokeshj1703 @JacksonYao287
Proposed changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really a review, I just skimmed the PR and wanted to make sure the proto.lock files aren't modified.
@@ -1334,7 +1334,7 @@ | |||
}, | |||
{ | |||
"id": 3, | |||
"name": "idleiterations", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't EVER change proto lock files! These are automatically updated by protolock only on each major or minor release to ensure compatibility. This is an easy mistake to make when proto-backwards-compatability-check fails though, and it would definitely be helpful if our CI could help the devs out here : ) @adoroszlai is there an easy check we can put in CI to fail PRs that that modify anything called proto.lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This seems to have sneaked in while I was refactoring this field's name across the project. CI got skipped since this is still a draft PR.
BTW, from what I understand, it's compatible to change the name of a proto field as long as the ID and type remain the same, and JSON is not involved. The hadoop policy that's linked here also says it's compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the rename in the proto itself is safe, lets just make sure the proto lock stays in its original state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing only the proto file fails the backwards compatibility check here:
[INFO] CONFLICT: "StartContainerBalancerRequestProto" field: "idleiterations" has been removed, but is not reserved [ScmAdminProtocol.proto]
[INFO] CONFLICT: "StartContainerBalancerRequestProto" field: "iterations" ID: 3 has an updated name, previously "idleiterations" [ScmAdminProtocol.proto]
With the proto lock file in its original state.
# Conflicts: # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancerConfiguration.java
…tainerBalancer constructor. This change is required for ContainerBalancer#toString() to be able to call ContainerBalancerConfiguration#toString().
A summary of the changes:
|
@lokeshj1703 @JacksonYao287 @errose28 please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @siddhantsangwan for this work , the changes overall looks good ! i will take a deep review after #2893 is merged
optional int32 idleiterations = 3 [deprecated = true]; | ||
optional double maxDatanodesRatioToInvolvePerIteration = 4 [deprecated = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just replace them with the new. Compatibility may be not considered for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The backwards compatibility goal fails during build if I just replace them.
" of the entire cluster) no more than the threshold value.") | ||
private String threshold = "0.1"; | ||
" of the entire cluster) no more than the threshold.") | ||
private String threshold = "10"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also make threshold int not String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's what I wanted to do. But we'll need to deprecate threshold in the proto message and add a new field with the int32 type. Should I go ahead with that?
# Conflicts: # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancerConfiguration.java # hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestContainerBalancer.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siddhantsangwan Thanks for working on this! The changes look good to me. Just a minor comment inline.
if(request.hasIdleiterations()) { | ||
idleiterations = Optional.of(request.getIdleiterations()); | ||
|
||
if (request.hasIterations()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also handle idleIterations here?
# Conflicts: # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java
@lokeshj1703 Thanks for the review! I've updated the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @siddhantsangwan for this work! LGTM +1
Thanks for the review @JacksonYao287 ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siddhantsangwan Thanks for updating the PR! The changes look good to me. +1.
@siddhantsangwan Thanks for the contribution! @errose28 @JacksonYao287 Thanks for the reviews! I have merged the PR to master branch. |
…ansport support (#3074) * HDDS-6149. Remove unused keytabs (#2960) * HDDS-6094. Some unit tests are skipped due to using JUnit4 runner (#2909) * HDDS-6075. OzoneConfiguration constructor overrides input configuration keys. (#2921) * HDDS-4177. SCM Container DB bootstrap on Recon startup (#2942) * HDDS-6086. Compute MD5MD5CRC file checksum using chunk checksums from DataNodes (#2919) * HDDS-6148. Validate ContainerBalancerConfiguration before start ContainerBalancer (#2957) * HDDS-6161. SCM StateMachine failing to reinitialize doesn't terminate the process. (#2971) * HDDS-6134. Move replication-specific config to ReplicationServer (#2943) * HDDS-4010. S3G startup fails when multiple service ids are configured. (#2976) * HDDS-6170. Add metrics to replication manager to track container health states (#2975) * HDDS-3231. Cleanup KeyManagerImpl (#2961) * HDDS-5927. Improve defaults in ContainerBalancerConfiguration (#2892) * HDDS-6157. More consistent synchronization in InputStreams (#2965) * HDDS-4743. [FSO] Add FSO variant of ITestOzoneContractDistcp. (#2980) * HDDS-6114. Intermittent error due to Failed to init RocksDB (#2947) * HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient. (#2981) * HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient. * HDDS-5740. Enable ratis by default for SCM. (#2637) * HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM. (#2991) * HDDS-4190. Intermittent failure in TestOMVolumeSetOwnerRequest and TestOMVolumeSetQuotaRequest. (#2982) * HDDS-6120. Compute block checksum using chunk checksums (#2930) * HDDS-6147. Add ability in OM to get limited delta updates (#2956) * HDDS-6195. Remove unused jmh-core dependency. (#2997) * HDDS-6167. Update ozone-runner version to 20211202-1 (#2969) * HDDS-6171. Create an API to change Bucket Owner (#2988) * HDDS-6163. Fix PATH in docker image (#2967) * HDDS-6202. Avoid using jmh-generator-annprocess since it is GPL2.0. (#2998) * HDDS-6135. SCM Container DB bootstrap on Recon startup for SCM HA. (#2972) * HDDS-6109. Preserve the underlying exception raised in client lib. (#2989) * HDDS-3408. Rename ChunkLayOutVersion to ContainerLayoutVersion. (#2983) * HDDS-6203. CleanUp incomplete gz files during Container move (#3000) * HDDS-6216. Move OMOpenKeysDeleteRequest to package om.request.key (#3011) * HDDS-6191. Intermittent failure in TestDeleteWithSlowFollower (#3015) * HDDS-6128. CLI tool that downloads all the block replicas and creates a manifest file (#2987) * HDDS-6177. Extend container info command to include replica details (#2995) * HDDS-6211. [Docs] Image styling on deployed site does not replicate local builds. (#3007) * HDDS-6219. Switch to RATIS ReplicationType from STAND_ALONE in our tests. (#3014) * HDDS-6192. feature/Observability.md translated to Chinese (#2994) * HDDS-6205. Add CLI command to display the latest Replication Manager report (#3013) * HDDS-6227. Test helpers should observe naming conditions (#3020) * HDDS-6239. ozonesecure-mr failing with No URLs in mirrorlist (#3029) * HDDS-6201. Fix NPE for DataScanner with scanned container deleted by others. (#3005) * HDDS-5529. For any IOexception from @REPLicated method we should throw it (#2788) * HDDS-6181. Change SCMHAInvocationHandler#invokeRatis() logging to TRACE (#2992) * HDDS-6206. Application errors must not flood system log (#3001) * HDDS-6245. Add BucketLayout logging to Audit Logs (#3040) * HDDS-6238 Reduce memory requirements for list keys. (#3032) * HDDS-2919. Intermittent failure in TestRDBStore (#3028) * HDDS-6253. Unnecessary duplicate smoketest after defaulting to FSO (#3036) * HDDS-6204. Cleanup handling malformed authorization header (#2999) * HDDS-6169. Selective checks: skip junit tests on ozone-runner image update (#2974) * HDDS-6270. Use a dedicated file instead of /etc/passwd for xcompat acceptance test (#3050) * HDDS-6273. Amend doc SecuringTDE.md (#3047) * HDDS-6140. Selective checks: skip unit check for integration-test changes (#2948) * HDDS-6215. Recon get limited delta updates from OM (#3009) * HDDS-6125. Recon get limited delta updates from OM * HDDS-6215. Fix unit test * trigger new CI check * HDDS-6215. Fix typo * trigger new CI check Co-authored-by: Symious <yiyang0203@gmail.com> * HDDS-6226. Run tests for selective CI checks in CI (#3019) * HDDS-6247. Avoid logging stack trace for user input problems (#3039) * HDDS-6208. New checkstyle: WhitespaceAround (#3003) * HDDS-6289. Upgrade acceptance test log flooded with parse error (#3063) Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com> * Trigger Build * Fix integration test for added configuation field for selecting OmTransport for s3 gateway - TestOzoneConfigurationFields (added config key not in xml). Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com> Co-authored-by: Lokesh Jain <ljain@apache.org> Co-authored-by: Aswin Shakil Balasubramanian <aswinshakilbalu@gmail.com> Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org> Co-authored-by: Symious <yiyang0203@foxmail.com> Co-authored-by: Bharat Viswanadham <bharat@apache.org> Co-authored-by: Stephen O'Donnell <stephen.odonnell@gmail.com> Co-authored-by: GeorgeJahad <github@blackbirdsystems.net> Co-authored-by: Siddhant Sangwan <siddhantsangwan027@gmail.com> Co-authored-by: Jyotinder Singh <jyotindrsingh@gmail.com> Co-authored-by: Shashikant Banerjee <shashikant@apache.org> Co-authored-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org> Co-authored-by: Kiyoshi Mizumaru <kiyoshi.mizumaru@gmail.com> Co-authored-by: Ritesh H Shukla <kerneltime@gmail.com> Co-authored-by: Nibiru <axcsd3692@qq.com> Co-authored-by: Kaijie Chen <chen@kaijie.org> Co-authored-by: Zita Dombi <50611074+dombizita@users.noreply.github.com> Co-authored-by: Istvan Fajth <pifta@cloudera.com> Co-authored-by: steinsgateted <71066027+steinsgateted@users.noreply.github.com> Co-authored-by: Gui Hecheng <markgui@tencent.com> Co-authored-by: Jackson Yao <jacksonyao@tencent.com> Co-authored-by: Keyi Song <72794035+sky76093016@users.noreply.github.com> Co-authored-by: UENISHI Kota <kuenishi@users.noreply.github.com> Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com> Co-authored-by: Symious <yiyang0203@gmail.com>
What changes were proposed in this pull request?
Making the defaults smarter and more readable, such as expressing threshold in percentage, making
maxSizeEnteringTarget
equal the size of five containers etc.What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5927
How was this patch tested?
TestContainerBalancer
unit tests.