
[Improvement] historical fast restart by lazy load columns metadata(20X faster) #6988

Merged (9 commits) on Dec 3, 2019

Conversation

@pzhdfy (Contributor) commented Feb 2, 2019

We have a large amount of data in Druid; a historical node (12 * 2TB SATA HDD) can hold over 100k segments and 10TB of data.
When we want to restart a historical to change configuration or after an OOM, it takes 40 minutes, which is too slow.

We profiled the restart process and made a flame graph.
[flame graph of the restart before this patch]
We can see that io.druid.segment.IndexIO$V9IndexLoader.deserializeColumn costs most of the time.
This is because an HDD has only about 100 IOPS, and a segment often has over 100 columns, so loading generates a lot of random I/O and disk utilization reaches 100%.

So we can lazy-load the column metadata, deferring it until a column is first used.

After this optimization, a historical restart takes only 2 minutes (20X faster).

The flame graph after the optimization is below; we can see that loading the metadata takes little time.
[flame graph of the restart after this patch]

We add a new config, druid.segmentCache.lazyLoadOnStart (default is false), to control whether this optimization is applied.
We suggest enabling it on HDD historicals; SSD historicals are fast enough without it.
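
For readers skimming the thread, here is a minimal sketch of the idea (illustrative only, not the actual patch; the class and method names are made up): each column's expensive deserialization is wrapped in a memoized supplier, so the disk read happens on first access instead of at startup.

```java
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of lazy column-metadata loading; names are hypothetical.
class LazyColumnIndex
{
  private final Map<String, Supplier<Object>> columns = new HashMap<>();

  void register(String columnName, boolean lazyLoadOnStart)
  {
    Supplier<Object> loader = () -> deserializeColumn(columnName); // the expensive random read
    if (lazyLoadOnStart) {
      // Defer the read until the column is first queried; the result is then cached.
      columns.put(columnName, Suppliers.memoize(loader));
    } else {
      // Existing behavior: pay the cost eagerly while the segment is loaded at startup.
      Object loaded = loader.get();
      columns.put(columnName, () -> loaded);
    }
  }

  Object getColumn(String columnName)
  {
    return columns.get(columnName).get(); // first call may hit disk when lazy; later calls do not
  }

  private Object deserializeColumn(String columnName)
  {
    return new Object(); // stand-in for reading column metadata from the segment files
  }
}
```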

@pzhdfy changed the title from "[Improvement] historical fast restart by lazy load columns metadata" to "[Improvement] historical fast restart by lazy load columns metadata(20X faster)" on Feb 2, 2019
@jihoonson (Contributor) commented:

Hi @pzhdfy, have you had a chance to test the druid.segmentCache.numBootstrapThreads option (http://druid.io/docs/latest/configuration/index.html#storing-segments)? It sets the number of segments to load concurrently from local storage at startup. I wonder how different its effect is from the lazy loading of column metadata.

@clintropolis (Member) commented:

Interesting idea, I too have felt the pain of multi-day rollouts for tiers of densely loaded historical servers.

I've only scanned the changes so far; I'll try to do a full review later. In the meantime, if possible, could you run query benchmarks before and after this patch with lazyLoadOnStart = false, as a sanity check to make sure that there are not any odd performance effects from introducing the memoized suppliers? I don't really expect any noticeable effect, but it would make buy-in to this change easier. (No worries if you don't have a chance to get to it, I'll try to do this myself once I get around to a full review.)

I would also be interested in the performance cost for queries that do have to eat the deserialization when lazyLoadOnStart = true, and whether it opens up scenarios where the memoizing supplier becomes a sort of choke point for the processing pool if the cost is high.
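
On the choke-point question: Guava's memoized supplier synchronizes the first computation, so all query threads that hit a not-yet-loaded column block until one of them finishes the deserialization; afterwards the call is essentially free. A small standalone illustration (hypothetical timings, not Druid code):

```java
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;

public class MemoizeContention
{
  public static void main(String[] args) throws InterruptedException
  {
    // Stand-in for a slow first-time column deserialization on a busy HDD.
    Supplier<String> column = Suppliers.memoize(() -> {
      try {
        Thread.sleep(2000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "column metadata";
    });

    // Two "processing threads" request the same column at the same time;
    // both block until the single first load completes, then reuse the cached value.
    Runnable query = () -> {
      long start = System.currentTimeMillis();
      column.get();
      System.out.println(Thread.currentThread().getName() + " waited "
          + (System.currentTimeMillis() - start) + " ms");
    };
    Thread t1 = new Thread(query, "query-1");
    Thread t2 = new Thread(query, "query-2");
    t1.start();
    t2.start();
    t1.join();
    t2.join();
  }
}
```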

@gianm (Contributor) commented Feb 6, 2019

IMO a lazy loading option is nice for very dense historical nodes. So I am supportive of this idea. It should be off by default, though, since it defers work from startup to query time, and for a medium/low-density historical node you'd prefer to do that work at startup to keep queries fast.

@pzhdfy (Contributor, Author) commented Feb 14, 2019

Historical size: 100k segments and 10TB of data.
Each time, we drop the page cache before restarting.

1. Without this patch

1) druid.segmentCache.numBootstrapThreads = 1: 40 min
2) druid.segmentCache.numBootstrapThreads = 10: also 40 min. With numBootstrapThreads = 1, reading all of the column metadata already drives disk utilization to 100%, so setting numBootstrapThreads to a higher number makes no difference.

2. With this patch and lazyLoadOnStart = false

The results are very similar to scenario 1; there are no odd performance effects, because we don't use memoized suppliers when lazyLoadOnStart = false.

3. With this patch and lazyLoadOnStart = true

1) druid.segmentCache.numBootstrapThreads = 1: 8 min
2) druid.segmentCache.numBootstrapThreads = 10: 2 min, 4 times faster than numBootstrapThreads = 1. With lazyLoadOnStart = true we don't read all of the column metadata, so disk utilization stays below 100% and setting numBootstrapThreads to a higher number does help.
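
For reference, the fastest combination above corresponds to a historical runtime.properties along these lines (illustrative snippet; druid.segmentCache.numBootstrapThreads is an existing property, and druid.segmentCache.lazyLoadOnStart is the one this PR proposes to add):

```properties
# historical runtime.properties (benchmark configuration)
druid.segmentCache.lazyLoadOnStart=true
druid.segmentCache.numBootstrapThreads=10
```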

@kaijianding (Contributor) commented Feb 14, 2019

@pzhdfy I did a similar lazy-load thing in my code base, but I found there is a consequence. If a historical node is force-killed (kill -9) while it is downloading a new segment, the unzip very likely fails and leaves a corrupted segment folder. When lazy load = false, this segment can be ignored during historical startup (loading will fail), but when lazy load = true, the corruption is only discovered when a query comes in. Currently there is no interface to tell SegmentLoaderLocalCacheManager to unload or reload this segment, so queries against it always fail. The only solution today is to delete the segment folder and restart the historical node so the segment is ignored (because the folder is gone).

It would be better to introduce an unload/reload interface together in this PR.
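
A rough sketch of what such an interface could look like (purely hypothetical method names, not part of this PR or of the existing SegmentLoader API):

```java
import org.apache.druid.timeline.DataSegment;

// Hypothetical extension of the local segment cache so that a corrupted,
// lazily-loaded segment discovered at query time could be dropped or
// re-fetched without restarting the historical.
interface ReloadableSegmentCache
{
  // Delete the local files and bookkeeping for a segment whose lazy load failed.
  void unload(DataSegment segment);

  // Re-download the segment from deep storage and make it queryable again.
  void reload(DataSegment segment);
}
```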

stale bot commented Apr 15, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale bot added the stale label on Apr 15, 2019
@bputt commented Apr 15, 2019

Commenting, as I'm not sure whether we want this to go stale or not.

stale bot removed the stale label on Apr 15, 2019
@jihoonson (Contributor) commented:

@bputt thanks for commenting! I think this PR is worthwhile. Would you please fix the conflicts?

@himanshug (Contributor) left a comment:

Thanks, the feature sounds useful.
I think, in your case, the historicals don't have enough memory to keep all segments in the page cache; otherwise, not much disk activity should be expected on process restart, because the segment data would already be in the page cache. If you do have enough memory, then the behavior described in the PR description looks suspicious and maybe there is something else going on.

@JackyYangPassion commented May 14, 2019

Recently, I wanted to apply this PR to 0.12.1, but there are many conflicts:

```
error: indexing-service/src/test/java/io/druid/indexing/common/task/CompactionTaskTest.java: does not match index
error: patch failed: indexing-service/src/test/java/io/druid/indexing/common/task/SameIntervalMergeTaskTest.java:230
error: indexing-service/src/test/java/io/druid/indexing/common/task/SameIntervalMergeTaskTest.java: patch does not apply
error: processing/src/main/java/io/druid/segment/IndexIO.java: does not match index
error: processing/src/main/java/io/druid/segment/SimpleQueryableIndex.java: does not match index
error: patch failed: processing/src/main/java/io/druid/segment/loading/MMappedQueryableSegmentizerFactory.java:42
error: processing/src/main/java/io/druid/segment/loading/MMappedQueryableSegmentizerFactory.java: patch does not apply
error: processing/src/main/java/io/druid/segment/loading/SegmentizerFactory.java: does not match index
error: patch failed: server/src/main/java/io/druid/segment/loading/SegmentLoader.java:29
error: server/src/main/java/io/druid/segment/loading/SegmentLoader.java: patch does not apply
error: server/src/main/java/io/druid/segment/loading/SegmentLoaderConfig.java: does not match index
error: server/src/main/java/io/druid/segment/loading/SegmentLoaderLocalCacheManager.java: does not match index
error: server/src/main/java/io/druid/server/SegmentManager.java: does not match index
error: server/src/main/java/io/druid/server/coordination/SegmentLoadDropHandler.java: does not match index
error: patch failed: server/src/test/java/io/druid/segment/loading/CacheTestSegmentLoader.java:48
error: server/src/test/java/io/druid/segment/loading/CacheTestSegmentLoader.java: patch does not apply
error: server/src/test/java/io/druid/server/SegmentManagerTest.java: does not match index
error: server/src/test/java/io/druid/server/coordination/ServerManagerTest.java: does not match index
Patch failed at 0001 historical fast restart by lazy load columns metadata
Use 'git am --show-current-patch' to see the failed patch
```

I want to know which point the historical_fast_restart branch was forked from.
I have looked at the git log of the historical_fast_restart branch:

| * 6f964e148 (HEAD -> historical_fast_restart, origin/historical_fast_restart) historical fast restart by lazy load columns metadata |/
It looks like the base branch is druid-hll 0.13.0-incubating.

@pzhdfy (Contributor, Author) commented May 15, 2019

@JackyYangPassion
Yes, this is for 0.13+, with the package name 'org.apache.druid'.
You can modify the code in 0.12 manually.

@JackyYangPassion commented:

@pzhdfy yes! I have manually applied this PR to 0.12.1.

@JackyYangPassion commented:

I applied this PR to 0.12.1:
https://github.com/JackyYangPassion/incubator-druid/tree/0.12.1-fast-restart-historical

@RofiYao commented Jul 15, 2019

I applied it to 0.12.3, but it seems to have no effect. I only applied it and restarted one historical node of the cluster.
Do I need to apply it on all nodes and restart?

@pzhdfy (Contributor, Author) commented Jul 17, 2019

I applied it to 0.12.3, but it seems to have no effect. I only applied it and restarted one historical node of the cluster.
Do I need to apply it on all nodes and restart?

Just on the historicals.

@capistrant (Contributor) commented:

@pzhdfy does this change push the work to query time only for the first time a segment is queried? Or is this metadata loading pushed to query time for every query run against segments, instead of only happening at historical startup?

@pzhdfy (Contributor, Author) commented Sep 4, 2019

@pzhdfy does this change push the work to query time only for the first time a segment is queried? Or is this metadata loading pushed to query time for every query run against segments, instead of only happening at historical startup?

It just pushes the work to query time for the first time a segment is queried.

@capistrant (Contributor) commented:

@clintropolis @gianm @pzhdfy We love this patch. It makes our lives much easier when rolling out changes and running through upgrades.

@pzhdfy I did a similar lazy-load thing in my code base, but I found there is a consequence. If a historical node is force-killed (kill -9) while it is downloading a new segment, the unzip very likely fails and leaves a corrupted segment folder. When lazy load = false, this segment can be ignored during historical startup (loading will fail), but when lazy load = true, the corruption is only discovered when a query comes in. Currently there is no interface to tell SegmentLoaderLocalCacheManager to unload or reload this segment, so queries against it always fail. The only solution today is to delete the segment folder and restart the historical node so the segment is ignored (because the folder is gone).

It would be better to introduce an unload/reload interface together in this PR.

I am nervous about this though. As of now, we have accepted the risk and are willing to intervene when needed. Do you have any plans to address this comment in this PR, or would this be an additional PR? I do think it would get more support if we analyzed this and solved it if it is indeed a problem.

Interesting idea, I too have felt the pain of multi-day rollouts for tiers of densely loaded historical servers.

I've only scanned the changes so far; I'll try to do a full review later. In the meantime, if possible, could you run query benchmarks before and after this patch with lazyLoadOnStart = false, as a sanity check to make sure that there are not any odd performance effects from introducing the memoized suppliers? I don't really expect any noticeable effect, but it would make buy-in to this change easier. (No worries if you don't have a chance to get to it, I'll try to do this myself once I get around to a full review.)

I would also be interested in the performance cost for queries that do have to eat the deserialization when lazyLoadOnStart = true, and whether it opens up scenarios where the memoizing supplier becomes a sort of choke point for the processing pool if the cost is high.

Regarding Clint's comment on performance: we have been running it for nearly a week on a cluster with hundreds of thousands of segments and hundreds of thousands of queries a day, and our metrics collection shows a negligible change in performance (but this is versus Druid 11; if Druid 15 had large performance gains across the board before this patch, there could be a bigger change). We added this as part of our upgrade to Druid 15, though, so we have not measured performance with it off in Druid 15, only in Druid 11.

Regardless, we are eager to get this reviewed and accepted upstream. Our experience with it so far has been great and we'd love to help in whatever way possible to get it merged.

@capistrant (Contributor) commented:

Throwing another comment at this to make sure it doesn't get marked as stale. We continue to run it in production with no apparent issues so far (a cluster with 100s of thousands of segments). It allowed us to complete a rolling prod upgrade of 70+ historical nodes in a single day, which was not possible for us before. If there is anything that @mohammadjkhan or I can do to help get this moving forward again, let us know!

@clintropolis (Member) commented:

Hi @pzhdfy, my apologies for totally forgetting about this PR. Could you please fix the conflicts and address the comments @himanshug had, at least about adding this to the documentation (I think here would be most appropriate), so we can try to get this merged? Overall lgtm 👍

@pzhdfy (Contributor, Author) commented Nov 22, 2019

Hi @pzhdfy, my apologies for totally forgetting about this PR. Could you please fix the conflicts and address the comments @himanshug had, at least about adding this to the documentation (I think here would be most appropriate), so we can try to get this merged? Overall lgtm 👍

OK, I will fix the conflicts and add documentation later.

@pzhdfy (Contributor, Author) commented Nov 23, 2019

@clintropolis done

@clintropolis (Member) commented:

I did some testing with the latest version of this PR and everything is still working for me. I also noticed that the segment metadata queries performed by the broker to collect schema information for Druid SQL will likely do a fair bit of the work of loading the segments after the historical initializes, as a lucky side effect. Though I suspect that will not cover all replicas, it should definitely lessen the impact of using lazy loading.

@clintropolis (Member) left a comment:

lgtm, thanks 👍

@himanshug (Contributor) commented:

@pzhdfy thanks for the updates, I should be able to review it early next week. However, please don't force-push changes once a PR review has started, or else it is difficult for me, as a reviewer, to see what changed since I last reviewed.

@zhanglistar commented:

great!!!

@zhangyue19921010 (Contributor) commented:

Hi there. I have made PR #10688. Maybe it can solve the small shortcoming when a historical node restarts lazily. Please let me know if you have any questions!
