add async or sync for :entry location index's rocksdb write #3103

Open · wants to merge 9 commits into base: master
Conversation

@StevenLuMT StevenLuMT commented Mar 13, 2022

Descriptions of the changes in this PR:

Motivation

A bookie's timed flush latency is mainly composed of three parts:
1. flush-entrylog: the latency of flushing the entry log.
2. flush-locations-index: the latency of flushing the entry location index, which uses sync mode to flush.
3. flush-ledger-index: the latency of flushing the ledger metadata index (LedgerMetadataIndex), which uses async mode to flush.

Reducing the entry location index flush latency reduces the bookie's overall flush latency, so this PR adds an async mode for the entry location index's writes:
1. sync mode (default): the original logic is unchanged.
2. async mode: must be enabled via configuration; it speeds up writing and uses the same approach as the bookie's other index, LedgerMetadataIndex.

How sync differs from async:

  • sync mode:
    1. create a batch;
    2. add the updates to the batch;
    3. call batch.flush to flush the batch.

  • async mode:
    1. just call locationsDb.sync to write the data;
    2. RocksDB then flushes the data to disk periodically in the background.
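The two write paths above can be sketched as follows. This is an illustrative sketch over an in-memory map, not the actual BookKeeper KeyValueStorage or RocksDB API; the class and method names (EntryLocationIndexSketch, flushSync, writeAsync) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of the two write paths for the entry location index.
class EntryLocationIndexSketch {
    private final Map<String, Long> db = new ConcurrentHashMap<>();

    // Sync mode: stage all pending location updates in a batch,
    // then flush the whole batch at once (mirrors the batch.flush step).
    void flushSync(Map<String, Long> pendingLocations) {
        Map<String, Long> batch = new HashMap<>();   // 1. create a batch
        batch.putAll(pendingLocations);              // 2. add the updates to the batch
        db.putAll(batch);                            // 3. flush the batch
    }

    // Async mode: write each update directly; durability is deferred
    // to the store's own periodic background flush.
    void writeAsync(String entryKey, long offset) {
        db.put(entryKey, offset);
    }

    Long get(String entryKey) {
        return db.get(entryKey);
    }
}
```

The trade-off the reviewers discuss below follows directly from this shape: the async path returns before the data is durably flushed, so an unclean shutdown can lose recent location updates.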

Changes

  1. add async and sync modes for the entry location index's RocksDB writes;
  2. add a switch to enable this feature; it is off by default, so the existing sync behavior remains the default.
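For illustration, enabling the feature might look like the bookie configuration fragment below. The option name here is hypothetical (check the PR diff for the actual key); the default keeps the original sync behavior.

```properties
# Hypothetical option name, for illustration only.
# false (default): keep the original sync batch-and-flush write path.
# true: write entry locations asynchronously, like LedgerMetadataIndex.
entryLocationIndexAsyncWrite=false
```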


StevenLuMT commented Mar 13, 2022

@dlg99 @eolivelli @pkumar-singh @zymap @hangc0276 @lordcheng10 @merlimat
If you have time, please help review it. Thank you.

@dlg99 dlg99 requested a review from merlimat March 15, 2022 23:34

dlg99 commented Mar 15, 2022

@hangc0276 @merlimat please take a look

@lordcheng10 (Contributor) left a comment


LGTM

@eolivelli (Contributor) left a comment


You are on your way!
I left a suggestion for the configuration entry.

We must also add comprehensive tests.

@StevenLuMT (Contributor, Author)

You are on your way! I left a suggestion for the configuration entry.

We must also add comprehensive tests.

OK, thanks for the review. I will add test cases for this feature.

@StevenLuMT (Contributor, Author)

You are on your way! I left a suggestion for the configuration entry.

We must also add comprehensive tests.

Yeah, I have added two test cases: EntryLocationIndexAsyncTest and EntryLocationIndexSyncTest. @eolivelli

@eolivelli (Contributor) left a comment


LGTM

@merlimat (Contributor)

I'm not sure I understand why we need to have 2 different modes here. Do both methods provide the same guarantees or not?

@StevenLuMT (Contributor, Author)

I'm not sure I understand why we need to have 2 different modes here. Do both methods provide the same guarantees or not?

In the case of multiple replicas, when the user prioritizes write speed and does not need to guarantee the success of every replica write, an asynchronous write path like the one used by LedgerMetadataIndex gives the user more choice. @merlimat

@merlimat (Contributor)

I still cannot see a case where flushing the entry location index (ledgerId, entryId, offset) becomes the bottleneck, compared to:

  1. Journal
  2. Entry log flush (where all the data has to be put on disk)

If it is not the bottleneck, then why reduce the guarantees? (e.g. if we miss the async update, we have effectively lost data in the bookie).

@StevenLuMT (Contributor, Author)

I still cannot see a case where flushing the entry location index (ledgerId, entryId, offset) becomes the bottleneck, compared to:

  1. Journal
  2. Entry log flush (where all the data has to be put on disk)

If it is not the bottleneck, then why reduce the guarantees? (e.g. if we miss the async update, we have effectively lost data in the bookie).

Yes, it is in the entry log flush step.
To speed up the flush, we only need to ensure that the EntryLocationIndex is written successfully under normal circumstances, without guaranteeing against machine crashes.
LedgerMetadataIndex likewise uses an asynchronous method to write to RocksDB. @merlimat

@merlimat (Contributor)

In order to speed up the flush, we only need to ensure that the EntryLocationIndex is successfully written under normal circumstances, regardless of machine downtime.

I'm not sure I follow here.

My point is that if flushing entry logs takes 90% of the time and RocksDB takes 10% (I made up these numbers), then adding the risk of losing data to shave time off the 10% portion doesn't make much sense.

@StevenLuMT (Contributor, Author)

OK, I understand what you mean. Let me collect the time and proportion of each of the three parts of the flush. @merlimat

@StevenLuMT force-pushed the master_locationASyncV2 branch 2 times, most recently from 74b825b to 4ba94b7 on August 16, 2022 at 12:51
@eolivelli (Contributor) left a comment


What happens in case of an unclean shutdown?
Will we lose some entries in the index?

@StevenLuMT (Contributor, Author)

rerun failure checks

5 participants