add async or sync for :entry location index's rocksdb write #3103

Open · wants to merge 9 commits into base: master
Conversation

@StevenLuMT StevenLuMT commented Mar 13, 2022

Descriptions of the changes in this PR:

Motivation

A bookie's timed flush latency is mainly composed of three parts:
1. flush-entrylog: the latency of flushing the entry log.
2. flush-locations-index: the latency of flushing the entry location index, which uses sync mode to flush.
3. flush-ledger-index: the latency of flushing the ledger metadata index (LedgerMetadataIndex), which uses async mode to flush.

Reducing the entry location index flush latency reduces the bookie's overall flush latency, so this PR adds an async mode for the entry location index's writes:
1. sync mode (default): the original logic is unchanged.
2. async mode: must be enabled via configuration; it speeds up writing and uses the same approach as the bookie's other index, LedgerMetadataIndex.

How sync differs from async:

  • sync mode:
    1. create a batch;
    2. add the updates to the batch;
    3. call batch.flush to flush the batch.

  • async mode:
    1. just call locationsDb.sync to write the data;
    2. RocksDB then flushes the data to disk periodically in the background.
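The two write paths above can be sketched as follows. This is an illustrative sketch over an in-memory map, not the actual BookKeeper KeyValueStorage or RocksDB API; the class and method names (EntryLocationIndexSketch, flushSync, writeAsync) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of the two write paths for the entry location index.
class EntryLocationIndexSketch {
    private final Map<String, Long> db = new ConcurrentHashMap<>();

    // Sync mode: stage all pending location updates in a batch,
    // then flush the whole batch at once (mirrors the batch.flush step).
    void flushSync(Map<String, Long> pendingLocations) {
        Map<String, Long> batch = new HashMap<>();   // 1. create a batch
        batch.putAll(pendingLocations);              // 2. add the updates to the batch
        db.putAll(batch);                            // 3. flush the batch
    }

    // Async mode: write each update directly; durability is deferred
    // to the store's own periodic background flush.
    void writeAsync(String entryKey, long offset) {
        db.put(entryKey, offset);
    }

    Long get(String entryKey) {
        return db.get(entryKey);
    }
}
```

The trade-off the reviewers discuss below follows directly from this shape: the async path returns before the data is durably flushed, so an unclean shutdown can lose recent location updates.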

Changes

  1. add async and sync modes for the entry location index's RocksDB writes;
  2. add a switch to enable this feature; it is off by default, so the existing sync behavior remains the default.
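For illustration, enabling the feature might look like the bookie configuration fragment below. The option name here is hypothetical (check the PR diff for the actual key); the default keeps the original sync behavior.

```properties
# Hypothetical option name, for illustration only.
# false (default): keep the original sync batch-and-flush write path.
# true: write entry locations asynchronously, like LedgerMetadataIndex.
entryLocationIndexAsyncWrite=false
```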


StevenLuMT commented Mar 13, 2022

@dlg99 @eolivelli @pkumar-singh @zymap @hangc0276 @lordcheng10 @merlimat
If you have time, please help review it. Thank you.

@dlg99 dlg99 requested a review from merlimat March 15, 2022 23:34

dlg99 commented Mar 15, 2022

@hangc0276 @merlimat please take a look

@lordcheng10 (Contributor) left a comment


LGTM

@eolivelli (Contributor) left a comment


You are on your way!
I left a suggestion for the configuration entry.

We must also add comprehensive tests.

@StevenLuMT (Contributor, Author)

You are on your way! I left a suggestion for the configuration entry.

We must also add comprehensive tests.

OK, thanks for the review. I will add test cases for this feature.

@StevenLuMT (Contributor, Author)

You are on your way! I left a suggestion for the configuration entry.

We must also add comprehensive tests.

Yeah, I have added two test cases: EntryLocationIndexAsyncTest and EntryLocationIndexSyncTest. @eolivelli

@eolivelli (Contributor) left a comment


LGTM

@merlimat (Contributor)

I'm not sure I understand why we need to have 2 different modes here. Do both methods provide the same guarantees or not?

@StevenLuMT (Contributor, Author)

I'm not sure I understand why we need to have 2 different modes here. Do both methods provide the same guarantees or not?

In the case of multiple replicas, when the user prioritizes write speed and does not need to guarantee the success of every replica write, an asynchronous write path like the one used by LedgerMetadataIndex gives the user more choice. @merlimat

@merlimat (Contributor)

I still cannot see a case where flushing the entry location index (ledgerId, entryId, offset) becomes the bottleneck, compared to:

  1. Journal
  2. Entry log flush (where all the data has to be put on disk)

If it is not the bottleneck, then why reduce the guarantees? (e.g. if we miss the async update, we have effectively lost data in the bookie).

@StevenLuMT (Contributor, Author)

I still cannot see a case where flushing the entry location index (ledgerId, entryId, offset) becomes the bottleneck, compared to:

  1. Journal
  2. Entry log flush (where all the data has to be put on disk)

If it is not the bottleneck, then why reduce the guarantees? (e.g. if we miss the async update, we have effectively lost data in the bookie).

Yes, it is in the entry log flush step.
To speed up the flush, we only need to ensure that the EntryLocationIndex is written successfully under normal circumstances, without guaranteeing against machine crashes.
LedgerMetadataIndex likewise uses an asynchronous method to write to RocksDB. @merlimat

@merlimat (Contributor)

In order to speed up the flush, we only need to ensure that the EntryLocationIndex is successfully written under normal circumstances, regardless of machine downtime.

I'm not sure I follow here.

My point is that if flushing entry logs takes 90% of the time and RocksDB takes 10% (I made up these numbers), then adding the risk of losing data to shave time off the 10% portion doesn't make much sense.

@StevenLuMT (Contributor, Author)

OK, I understand what you mean. Let me collect the time and proportion of each of the three parts of the flush. @merlimat

@StevenLuMT force-pushed the master_locationASyncV2 branch 2 times, most recently from 74b825b to 4ba94b7 on August 16, 2022 at 12:51
@eolivelli (Contributor) left a comment


What happens in case of an unclean shutdown?
Will we lose some entries in the index?

@StevenLuMT (Contributor, Author)

rerun failure checks

5 participants