Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-6215. Recon get limited delta updates from OM #3009

Merged
merged 5 commits into from Feb 9, 2022

Conversation

symious
Copy link
Contributor

@symious symious commented Jan 24, 2022

What changes were proposed in this pull request?

In HDDS-6147, a new API was added to get limited delta updates from OM.

This ticket will let Recon use this new API to call OM to get limited updates. By default, Recon will fetch at least 2000 delta updates 10 times each time trying to sync from OM.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6215

How was this patch tested?

unit test

@symious symious changed the title HDDS-6125. Recon get limited delta updates from OM HDDS-6215. Recon get limited delta updates from OM Jan 24, 2022
@symious
Copy link
Contributor Author

symious commented Jan 24, 2022

@bharatviswa504 @ferhui @avijayanhwx Could you help to review this PR?

@ferhui
Copy link
Contributor

ferhui commented Jan 29, 2022

@JacksonYao287 Could you please help to review this?

metrics.getAverageNumUpdatesInDeltaRequest().value(), 0.0);
assertEquals(3, metrics.getNumNonZeroDeltaRequests().value());

// In this method, we have to assert the "GET" part and the "APPLY" path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"GET" part or path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the typo, thanks for the review.

@ferhui
Copy link
Contributor

ferhui commented Feb 7, 2022

If no other comments, plan to merge this in 2 days.

@avijayanhwx
Copy link
Contributor

@symious Thanks for working on this. I understand that this is fixing the need to throttle between the OM and Recon. I was wondering how we can monitor (and present to the user) the lag between Recon seq number and OM seq number. Right now, if we assume Recon picks up 20,000 every 10mins, there could easily be a chance that the lag grows higher and higher.

Maybe when the lag is too much, Recon may decide to do a full snapshot automatically or when a user requests it.

@symious
Copy link
Contributor Author

symious commented Feb 8, 2022

@avijayanhwx Thanks for the review.
For the lag, I think there are some scenarios:

  1. A spike, the lag should recover after the spike finishes.
  2. A slowly increasing lag, for example, recon consumes 20,000 records per 10 min, om produce 20,500 seconds per 10 min. in this case, the lag will always exist, it's better for the user to increase the threshold, and we need to give some numeric clue to users for the increase.
  3. A fast increasing lag, in this case, when recon requests updates from OM with an outdated sequenceNumber, the exception of SequenceNumberNotFoundException would be thrown and recon will ask for a full snapshot request. Also In this case, I think we should keep the threshold low, in case the huge updates cause the OOM issue of OM.

For the lag monitoring, an option would be to add a new field in DBUpdate to indicate the lag or the latestSequenceNumber so that recon can use this information to notify the users. Those changes might be not that related to this ticket, I can create another ticket for this request if this option is ok.

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @symious for the work! LGTM +1.
thanks @ferhui and @avijayanhwx for the review!

@JacksonYao287 JacksonYao287 merged commit 11f4c9b into apache:master Feb 9, 2022
arp7 pushed a commit that referenced this pull request Feb 17, 2022
…ansport support (#3074)

* HDDS-6149. Remove unused keytabs (#2960)

* HDDS-6094. Some unit tests are skipped due to using JUnit4 runner (#2909)

* HDDS-6075. OzoneConfiguration constructor overrides input configuration keys. (#2921)

* HDDS-4177. SCM Container DB bootstrap on Recon startup (#2942)

* HDDS-6086. Compute MD5MD5CRC file checksum using chunk checksums from DataNodes (#2919)

* HDDS-6148. Validate ContainerBalancerConfiguration before start ContainerBalancer (#2957)

* HDDS-6161. SCM StateMachine failing to reinitialize doesn't terminate the process. (#2971)

* HDDS-6134. Move replication-specific config to ReplicationServer (#2943)

* HDDS-4010. S3G startup fails when multiple service ids are configured. (#2976)

* HDDS-6170. Add metrics to replication manager to track container health states (#2975)

* HDDS-3231. Cleanup KeyManagerImpl (#2961)

* HDDS-5927. Improve defaults in ContainerBalancerConfiguration (#2892)

* HDDS-6157. More consistent synchronization in InputStreams (#2965)

* HDDS-4743. [FSO] Add FSO variant of ITestOzoneContractDistcp. (#2980)

* HDDS-6114. Intermittent error due to Failed to init RocksDB (#2947)

* HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient. (#2981)

* HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient.

* HDDS-5740. Enable ratis by default for SCM. (#2637)

* HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM. (#2991)

* HDDS-4190. Intermittent failure in TestOMVolumeSetOwnerRequest and TestOMVolumeSetQuotaRequest. (#2982)

* HDDS-6120. Compute block checksum using chunk checksums (#2930)

* HDDS-6147. Add ability in OM to get limited delta updates (#2956)

* HDDS-6195. Remove unused jmh-core dependency. (#2997)

* HDDS-6167. Update ozone-runner version to 20211202-1 (#2969)

* HDDS-6171. Create an API to change Bucket Owner (#2988)

* HDDS-6163. Fix PATH in docker image (#2967)

* HDDS-6202. Avoid using jmh-generator-annprocess since it is GPL2.0. (#2998)

* HDDS-6135. SCM Container DB bootstrap on Recon startup for SCM HA. (#2972)

* HDDS-6109. Preserve the underlying exception raised in client lib. (#2989)

* HDDS-3408. Rename ChunkLayOutVersion to ContainerLayoutVersion. (#2983)

* HDDS-6203. CleanUp incomplete gz files during Container move (#3000)

* HDDS-6216. Move OMOpenKeysDeleteRequest to package om.request.key (#3011)

* HDDS-6191. Intermittent failure in TestDeleteWithSlowFollower (#3015)

* HDDS-6128. CLI tool that downloads all the block replicas and creates a manifest file (#2987)

* HDDS-6177. Extend container info command to include replica details  (#2995)

* HDDS-6211. [Docs] Image styling on deployed site does not replicate local builds. (#3007)

* HDDS-6219. Switch to RATIS ReplicationType from STAND_ALONE in our tests. (#3014)

* HDDS-6192. feature/Observability.md translated to Chinese (#2994)

* HDDS-6205. Add CLI command to display the latest Replication Manager report (#3013)

* HDDS-6227. Test helpers should observe naming conditions (#3020)

* HDDS-6239. ozonesecure-mr failing with No URLs in mirrorlist (#3029)

* HDDS-6201. Fix NPE for DataScanner with scanned container deleted by others. (#3005)

* HDDS-5529. For any IOexception from @REPLicated method we should throw it (#2788)

* HDDS-6181. Change SCMHAInvocationHandler#invokeRatis() logging to TRACE (#2992)

* HDDS-6206. Application errors must not flood system log (#3001)

* HDDS-6245. Add BucketLayout logging to Audit Logs (#3040)

* HDDS-6238 Reduce memory requirements for list keys. (#3032)

* HDDS-2919. Intermittent failure in TestRDBStore (#3028)

* HDDS-6253. Unnecessary duplicate smoketest after defaulting to FSO (#3036)

* HDDS-6204. Cleanup handling malformed authorization header (#2999)

* HDDS-6169. Selective checks: skip junit tests on ozone-runner image update (#2974)

* HDDS-6270. Use a dedicated file instead of /etc/passwd for xcompat acceptance test (#3050)

* HDDS-6273. Amend doc SecuringTDE.md (#3047)

* HDDS-6140. Selective checks: skip unit check for integration-test changes (#2948)

* HDDS-6215. Recon get limited delta updates from OM (#3009)

* HDDS-6125. Recon get limited delta updates from OM

* HDDS-6215. Fix unit test

* trigger new CI check

* HDDS-6215. Fix typo

* trigger new CI check

Co-authored-by: Symious <yiyang0203@gmail.com>

* HDDS-6226. Run tests for selective CI checks in CI (#3019)

* HDDS-6247. Avoid logging stack trace for user input problems (#3039)

* HDDS-6208. New checkstyle: WhitespaceAround (#3003)

* HDDS-6289. Upgrade acceptance test log flooded with parse error (#3063)

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>

* Trigger Build

* Fix integration test for added configuation field for selecting OmTransport for s3 gateway - TestOzoneConfigurationFields (added config key not in xml).

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
Co-authored-by: Lokesh Jain <ljain@apache.org>
Co-authored-by: Aswin Shakil Balasubramanian <aswinshakilbalu@gmail.com>
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
Co-authored-by: Symious <yiyang0203@foxmail.com>
Co-authored-by: Bharat Viswanadham <bharat@apache.org>
Co-authored-by: Stephen O'Donnell <stephen.odonnell@gmail.com>
Co-authored-by: GeorgeJahad <github@blackbirdsystems.net>
Co-authored-by: Siddhant Sangwan <siddhantsangwan027@gmail.com>
Co-authored-by: Jyotinder Singh <jyotindrsingh@gmail.com>
Co-authored-by: Shashikant Banerjee <shashikant@apache.org>
Co-authored-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org>
Co-authored-by: Kiyoshi Mizumaru <kiyoshi.mizumaru@gmail.com>
Co-authored-by: Ritesh H Shukla <kerneltime@gmail.com>
Co-authored-by: Nibiru <axcsd3692@qq.com>
Co-authored-by: Kaijie Chen <chen@kaijie.org>
Co-authored-by: Zita Dombi <50611074+dombizita@users.noreply.github.com>
Co-authored-by: Istvan Fajth <pifta@cloudera.com>
Co-authored-by: steinsgateted <71066027+steinsgateted@users.noreply.github.com>
Co-authored-by: Gui Hecheng <markgui@tencent.com>
Co-authored-by: Jackson Yao <jacksonyao@tencent.com>
Co-authored-by: Keyi Song <72794035+sky76093016@users.noreply.github.com>
Co-authored-by: UENISHI Kota <kuenishi@users.noreply.github.com>
Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
Co-authored-by: Symious <yiyang0203@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants