[RFC] osd: ec partial stripe reads #23138

thinkercui · 2018-07-20T01:40:39Z

If a read is confined to a single stripe and all of the original data
chunks are available, we read only the chunks that contain the
range <off, len>, which reduces read latency and bandwidth.

If the read is a fast read, or not all of the original data chunks are
available, we do a normal read.
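
As a rough illustration of the chunk selection described above, the following sketch maps a logical range `<off, len>` to the data chunks that hold it. It assumes a systematic code whose data is laid out linearly across `k` data chunks of `chunk_size` bytes per stripe. The function name `chunks_for_range` and its parameters are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical helper, not code from this PR. Assumes a systematic layout
// in which logical byte i of a stripe lives in data chunk i / chunk_size.
// Returns the data chunk indices covering [off, off + len), or an empty set
// when the range is empty or crosses a stripe boundary (in which case the
// caller would fall back to a normal read).
std::set<int> chunks_for_range(uint64_t off, uint64_t len,
                               uint64_t chunk_size, int k) {
  assert(chunk_size > 0 && k > 0);
  std::set<int> chunks;
  if (len == 0) {
    return chunks;
  }
  const uint64_t stripe_width = chunk_size * static_cast<uint64_t>(k);
  const uint64_t last = off + len - 1;  // last byte of the range, inclusive
  if (off / stripe_width != last / stripe_width) {
    return chunks;  // the range spans more than one stripe
  }
  const uint64_t first_chunk = (off % stripe_width) / chunk_size;
  const uint64_t last_chunk = (last % stripe_width) / chunk_size;
  for (uint64_t c = first_chunk; c <= last_chunk; ++c) {
    chunks.insert(static_cast<int>(c));
  }
  return chunks;
}
```

Under these assumptions, with `k = 4` and `chunk_size = 4096`, a 4 KiB read at offset 8192 touches only data chunk 2, so one shard read could replace four.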

This idea comes from http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/23872.

Signed-off-by: Xiaofei Cui <cuixiaofei@sangfor.com.cn>

thinkercui · 2018-07-20T09:30:06Z

retest this please

gregsfortytwo · 2018-07-23T23:46:49Z

This doesn't seem right:

* it's open-coding the striping strategy without any reference to the erasure code plugin in use?
* don't we already read only the necessary shards by asking the EC plugin what the minimum-to-decode is?

Maybe we still do a full-shard read and I've made up that we're efficient about that, but it seems like extending the interfaces around minimum-to-decode is the way to go here.

thinkercui · 2018-07-24T12:00:47Z

Thanks for your review. @gregsfortytwo

This applies to all erasure code plugins, not to one in particular.
We need to decide which shards we want before we ask the EC plugin what the minimum-to-decode is.
The fewer shards we want, the fewer shards we need to read. The main point of this commit is to reduce
the set of shards we want to read.

Did I understand your point correctly? Thanks.
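
To illustrate why a smaller want set leads to fewer reads, here is a simplified model of a minimum-to-decode computation for an MDS code with `k` data chunks. The function `minimum_to_decode_sketch` is a hypothetical stand-in and not the plugin interface itself. The assumption is that if every wanted shard is available, only those shards are read; otherwise any `k` available shards are needed for decoding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>

// Simplified, hypothetical model. Real plugins may answer differently
// (for example, codes with local parity can repair from fewer shards).
// Returns false when the wanted shards cannot be obtained at all.
bool minimum_to_decode_sketch(const std::set<int>& want,
                              const std::set<int>& available,
                              int k,
                              std::set<int>* minimum) {
  assert(k > 0 && minimum != nullptr);
  minimum->clear();
  // Fast path: every wanted shard can be read directly, so no decode.
  if (std::includes(available.begin(), available.end(),
                    want.begin(), want.end())) {
    *minimum = want;
    return true;
  }
  // Otherwise a decode is required, which takes k shards for an MDS code.
  if (available.size() < static_cast<std::size_t>(k)) {
    return false;
  }
  for (int shard : available) {
    if (minimum->size() == static_cast<std::size_t>(k)) {
      break;
    }
    minimum->insert(shard);
  }
  return true;
}
```

In this model, wanting one shard out of a healthy 4+2 stripe yields a single read, while wanting all four data shards yields four, which is the reduction the comment above is describing.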

gregsfortytwo · 2018-07-24T17:41:47Z

Maybe you'd better write some docs explaining the strategy then. When I skim this, it looks like you're trying to map from logical data offsets to which chunks to read, in the OSD code, without reference to the EC plugin in use. So I think you're just assuming that the EC plugin's data placement is systematic and is a simple linear stream across the objects? That is frequently not the case for more exotic EC codes (some of which we have in-tree).

So it's not that we need to tell the EC plugin what shards we want to read; we have to tell the EC plugin what data offsets we want to read, and let it tell us the minimum number of reads to get that data.
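
One way to read this suggestion is sketched below: the caller passes data offsets, and the codec, which alone knows its layout, answers with the shards to read. Everything here is hypothetical. `RangeAwareCodec`, `shards_for_range`, and `MappedSystematicCodec` are illustrative names, not existing Ceph interfaces, and offsets are taken relative to the start of one stripe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical interface: the caller supplies logical offsets within one
// stripe and the codec decides which shards must be read.
struct RangeAwareCodec {
  virtual ~RangeAwareCodec() = default;
  virtual std::set<int> shards_for_range(uint64_t off, uint64_t len) const = 0;
};

// One possible codec: systematic, with a chunk-to-shard mapping. The layout
// knowledge stays inside the codec instead of being open-coded by the caller.
class MappedSystematicCodec : public RangeAwareCodec {
 public:
  MappedSystematicCodec(uint64_t chunk_size, std::vector<int> mapping)
      : chunk_size_(chunk_size), mapping_(std::move(mapping)) {
    assert(chunk_size_ > 0 && !mapping_.empty());
  }

  std::set<int> shards_for_range(uint64_t off, uint64_t len) const override {
    std::set<int> shards;
    const uint64_t stripe_width = chunk_size_ * mapping_.size();
    // An empty result means the range is empty or not inside this stripe.
    if (len == 0 || off >= stripe_width || len > stripe_width - off) {
      return shards;
    }
    const uint64_t first = off / chunk_size_;
    const uint64_t last = (off + len - 1) / chunk_size_;
    for (uint64_t c = first; c <= last; ++c) {
      shards.insert(mapping_[static_cast<std::size_t>(c)]);
    }
    return shards;
  }

 private:
  uint64_t chunk_size_;
  std::vector<int> mapping_;  // mapping_[i] is the shard holding data chunk i
};
```

A non-systematic or non-linear code would implement the same interface differently, which is the point of routing the question through the plugin rather than computing chunk indices in the OSD.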

jdurgin · 2018-07-24T18:05:54Z

src/osd/ECBackend.cc

@@ -2295,11 +2323,51 @@ void ECBackend::objects_read_and_reconstruct(
  }

  map<hobject_t, set<int>> obj_want_to_read;
-  set<int> want_to_read;
-  get_want_to_read_shards(&want_to_read);


this seems like a much larger change than necessary... here's all it took to make recovery read the minimum data required: 468ad4b

thinkercui · 2018-07-25 (edited)

This modification only affects client reads. The recovery read path is left as it is, so there is no need to worry about 468ad4b. Also, rop.want_to_read originates here, so we can't do it the way 468ad4b does.

jdurgin · 2018-10-19

I'm suggesting that the client read path can be as simple as the recovery read path. There should not be redundancies with the EC plugin or the recovery code.

thinkercui · 2018-07-25T09:50:09Z

You are right! I assume that the EC plugin's data placement is systematic and is a simple linear stream across the objects. First, all the EC plugins in Ceph are systematic right now. Second, I think the chunk_mapping can ensure that the data is a simple linear stream across the objects. In addition, to make the code more general, I have made another commit that restricts partial reads to systematic EC plugins.

We really could tell the EC plugin what data offsets we want to read and let it tell us the minimum number of reads to get that data. But I think an EC plugin should only deal with the stripe, not the whole object; leaving that work to the OSD may be better.

Are there any other suggestions? Thanks. @gregsfortytwo

stale · 2018-10-17T21:54:22Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow-up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

stale · 2018-12-19T00:27:08Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow-up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

jdurgin · 2019-02-08T22:45:01Z

Closing due to unaddressed comments, please reopen if you're still working on this

This is a re-implementation of PR ceph#23138 rebased on main with a couple of nitpicky changes to make the code a little more clear (to me at least). Credit goes to Xiaofei Cui [cuixiaofei@sangfor.com.cn](mailto:cuixiaofei@sangfor.com.cn) for the original implementation.

Looking at the original PR's review, it does not appear that we can use the same technique as in ceph@468ad4b. We don't have the ReadOp yet. I'm not sure if @gregsfortytwo's idea to query the plugin works, but it's clear we are not doing the efficient thing from the get-go here.

The performance and efficiency benefits for small random reads appear to be quite substantial, especially for large stripe widths.

Signed-off-by: Mark Nelson <mark.nelson@clyso.com>

This is a re-implementation of PR ceph#23138 rebased on main with a couple of nitpicky changes to make the code a little more clear (to me at least). Credit goes to Xiaofei Cui [cuixiaofei@sangfor.com.cn](mailto:cuixiaofei@sangfor.com.cn) for the original implementation.

Looking at the original PR's review, it does not appear that we can use the same technique as in 468ad4b. We don't have the ReadOp yet. I'm not sure if @gregsfortytwo's idea to query the plugin works, but it's clear we are not doing the efficient thing from the get-go here.

The performance and efficiency benefits for small random reads appear to be quite substantial, especially for large stripe widths.

---

This commit is a further resurrection, this time of Mark Nelson's work in ceph#52746. It brings that work on top of the recent rework of `ECBackend` and addresses review comments.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

This commit is a further resurrection of the EC partial reads concept, this time of Mark Nelson's work sent as PR ceph#52746. The modifications in this commit are mostly about settling Mark's work on top of the recent rework of `ECBackend`, which shared the EC codebase with crimson-osd.

As the original description says, Mark's work is based on an earlier attempt by Xiaofei Cui. Therefore credit goes to:

* Mark Nelson (Clyso),
* Xiaofei Cui (cuixiaofei@sangfor.com.cn).

The original commit description is preserved below:

> This is a re-implementation of PR ceph#23138 rebased on main with a couple of nitpicky changes to make the code a little more clear (to me at least). Credit goes to Xiaofei Cui [cuixiaofei@sangfor.com.cn](mailto:cuixiaofei@sangfor.com.cn) for the original implementation.
>
> Looking at the original PR's review, it does not appear that we can use the same technique as in 468ad4b. We don't have the ReadOp yet. I'm not sure if @gregsfortytwo's idea to query the plugin works, but it's clear we are not doing the efficient thing from the get-go here.
>
> The performance and efficiency benefits for small random reads appear to be quite substantial, especially for large stripe widths.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

thinkercui changed the title ~~osd: only read needed chunks for EC read~~ [RFC] osd: ec partial stripe reads Jul 20, 2018

thinkercui force-pushed the feature branch from d7ed65e to c8d1cc2 Compare July 20, 2018 08:27

tchaikov added core performance labels Jul 20, 2018

thinkercui force-pushed the feature branch from c8d1cc2 to 4338cdd Compare July 24, 2018 11:50

jdurgin reviewed Jul 24, 2018

osd: restrict the partial read to systematic EC plugin

b3185d9

The EC plugin reports whether it is systematic or not. Partial reads are available only when the ec_impl is systematic. Signed-off-by: Xiaofei Cui <cuixiaofei@sangfor.com.cn>

stale bot added stale and removed stale labels Oct 17, 2018

stale bot added the stale label Dec 19, 2018

jdurgin closed this Feb 8, 2019

markhpc pushed a commit to markhpc/ceph that referenced this pull request Aug 2, 2023

osd: EC Partial Stripe Reads (Retry of ceph#23138)

d1b7b58

Signed-off-by: Mark Nelson <mnelson@redhat.com>

markhpc pushed a commit to markhpc/ceph that referenced this pull request Aug 2, 2023

osd: EC Partial Stripe Reads (Retry of ceph#23138)

dd135b5

Signed-off-by: Mark Nelson <mark.nelson@clyso.com>

markhpc mentioned this pull request Aug 2, 2023

[RFC] osd: EC Partial Stripe Reads (Retry of #23138) #52746

Closed

14 tasks
