osd: introduce sub-chunks to erasure code plugin interface #15193
jenkins test this please
retest this please
this needs to be rebased on latest master, please!
The main things are:
- i think a vector will work better than a list for these since it'll avoid hammering on the allocator
- whitespace cleanup
- rebase!
Thanks!
src/erasure-code/ErasureCode.cc
Outdated
@@ -60,6 +60,28 @@ int ErasureCode::minimum_to_decode(const set<int> &want_to_read,
return 0;
}

int ErasureCode::minimum_to_decode2(const set<int> &want_to_read,
const set<int> &available_chunks,
map<int, list<pair<int, int>>> *minimum)
can we replace the list<pair<int,int>> with vector<pair<int,int>>? it'll be a zillion times faster to avoid the memory allocations.
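A minimal sketch of the suggested signature, assuming the default behaviour is "read every sub-chunk of each needed shard". `std::vector` keeps its elements in one contiguous buffer, while `std::list` allocates a node per element. `kSubChunkCount`, `SubChunkRanges` and `whole_shard_ranges` are hypothetical names used for illustration only, not the PR's code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the plugin's get_sub_chunk_count().
constexpr int kSubChunkCount = 4;

// (offset, count) ranges of sub-chunks to read from one shard.
using SubChunkRanges = std::vector<std::pair<int, int>>;

// Simplified sketch: request every sub-chunk of each shard in `shards`.
// The caller passes the already-computed minimal shard set.
int whole_shard_ranges(const std::set<int> &shards,
                       std::map<int, SubChunkRanges> *minimum)
{
  assert(minimum != nullptr);
  // A single range covering sub-chunks [0, kSubChunkCount).
  const SubChunkRanges default_subchunks{{0, kSubChunkCount}};
  for (auto id : shards) {
    (*minimum)[id] = default_subchunks;
  }
  return 0;
}
```

Each map value here is a one-element vector, so the per-shard cost is a single small allocation.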
src/erasure-code/ErasureCode.cc
Outdated
default_subchunks.push_back(make_pair(0,get_sub_chunk_count()));

for(set<int>::iterator i=minimum_shard_ids.begin();
for (auto id : minimum_shard_ids) {
src/erasure-code/ErasureCode.cc
Outdated
if ( r != 0) return r;

list<pair<int, int>> default_subchunks;
vector
src/erasure-code/ErasureCode.cc
Outdated
set<int> minimum_shard_ids;
int r = minimum_to_decode(want_to_read, available_chunks, &minimum_shard_ids);

if ( r != 0) return r;
please follow CodingStyle... in this case, no blank line after 'int r = ...', no space after 'if (', braces and newlines around 'return r'
src/erasure-code/ErasureCode.h
Outdated
virtual int minimum_to_decode2(const set<int> &want_to_read,
const set<int> &available,
map<int, list<pair<int,int>>> *minimum);
list -> vector throughout this patch...
src/osd/ECBackend.cc
Outdated
j->get<1>(),
bl, j->get<2>(),
true); // Allow EIO return
if((op.subchunks.find(i->first)->second.size() == 1) && (op.subchunks.find(i->first)->second.front().second == ec_impl->get_sub_chunk_count())) {
please clean up the whitespace here (space after if, 80 chars per line), etc.
also use the fancy for loop syntax 'for (auto myvar : thecontainer)' where it helps readability
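One way to shorten that condition is to pull it into a small helper, so the call site fits in 80 columns and a range-based loop reads cleanly. This is a sketch only: `reads_whole_chunk` and `count_whole_chunk_reads` are hypothetical names, and the map is keyed by `int` for brevity where the real code is assumed to key by object id.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using SubChunkRanges = std::vector<std::pair<int, int>>;

// True when `ranges` holds exactly one range whose count equals the
// plugin's sub-chunk count. Like the condition in the hunk, this checks
// only the count, not the offset.
bool reads_whole_chunk(const SubChunkRanges &ranges, int sub_chunk_count)
{
  return ranges.size() == 1 &&
    ranges.front().second == sub_chunk_count;
}

// Count how many entries in `subchunks` ask for a full-chunk read.
int count_whole_chunk_reads(const std::map<int, SubChunkRanges> &subchunks,
                            int sub_chunk_count)
{
  int n = 0;
  for (const auto &entry : subchunks) {
    if (reads_whole_chunk(entry.second, sub_chunk_count)) {
      ++n;
    }
  }
  return n;
}
```

Passing the looked-up value to the helper also avoids repeating `find()` inside the condition.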
src/osd/ECMsgTypes.cc
Outdated
::encode(from, bl);
::encode(tid, bl);
::encode(to_read, bl);
::encode(attrs_to_read, bl);
::encode(subchunks,bl);
space between , and bl
src/osd/ECMsgTypes.cc
Outdated
@@ -210,10 +212,17 @@ void ECSubRead::decode(bufferlist::iterator &bl)
}
to_read[m->first] = tlist;
}
} else {
} else{
keep space between else and {
src/osd/ECMsgTypes.cc
Outdated
::decode(to_read, bl);
}
::decode(attrs_to_read, bl);
if((struct_v > 2) && (struct_v > struct_compat) ){
space after if, inner parens are unnecessary
src/osd/ECMsgTypes.cc
Outdated
::decode(subchunks, bl);
} else {
for(set<hobject_t>::iterator i = attrs_to_read.begin(); i != attrs_to_read.end(); ++i) {
subchunks[*i].push_back(make_pair(0,1));
the 0,1 means a single subchunk covering the whole thing?
Yes
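The fallback confirmed above can be sketched as follows: when a message comes from a sender that predates the `subchunks` field, every object gets the pair `(0, 1)`, read as offset 0 and count 1, a single sub-chunk spanning the whole chunk. `default_subchunks_for` is a hypothetical helper, and objects are keyed by `std::string` here instead of `hobject_t`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using SubChunkRanges = std::vector<std::pair<int, int>>;

// Build the default sub-chunk map for a message that carries no
// sub-chunk information.
std::map<std::string, SubChunkRanges>
default_subchunks_for(const std::set<std::string> &objects)
{
  std::map<std::string, SubChunkRanges> subchunks;
  for (const auto &oid : objects) {
    // (offset 0, count 1): one sub-chunk that covers the whole chunk.
    subchunks[oid].push_back(std::make_pair(0, 1));
  }
  return subchunks;
}
```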
aa321cf to 8726993
Hi Sage, I updated the changes requested in the review. Thanks,
see rebased version at #17428
649ae79 to 7c96329
Hi Sage,
I updated the code in this current PR to fix the crash seen there. I don't
have push permissions to the branch of new PR (#17428). Is there a way I
can trigger the tests or run them locally?
Thanks
Myna.
On Sat, Sep 2, 2017 at 2:50 AM, Sage Weil wrote:
> see rebased version at #17428
I'll queue up another test run, thanks!
@markhpc can you do an a/b test on this vs master with an ec workload (smallish writes preferably)?
With HDDs this PR worked fine and may have provided a slight performance advantage (1-2%). When testing with NVMe, fio regularly locked up. Eventually I was able to get a partial trace from the fio executable:

home/perf/src/markhpc/ceph/src/common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f51b7fff700 time 2017-09-20 15:53:49.073369
/home/perf/src/markhpc/ceph/src/common/Mutex.cc: 110: FAILED assert(r == 0)
ceph version 10.0.4-29492-ga72ed6e (a72ed6e642521735b3faeeeccf56835a57e3d43f) mimic (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f51e27e5ee0]
2: (Mutex::Lock(bool)+0x1a4) [0x7f51e27b86e4]
3: (()+0x11f550) [0x7f51ec73e550]
4: /home/ubuntu/src/fio/fio() [0x463c54]
5: (()+0x11eb06) [0x7f51ec73db06]
6: (()+0x11fbbf) [0x7f51ec73ebbf]
7: (()+0x136dbd) [0x7f51ec755dbd]
8: (()+0x12224d) [0x7f51ec74124d]
9: (()+0x5cc39) [0x7f51ec67bc39]
10: (()+0x132b57) [0x7f51ec751b57]
11: (librados::C_AioComplete::finish(int)+0x22) [0x7f51ec363672]
12: (Context::complete(int)+0x9) [0x7f51ec341079]
13: (Finisher::finisher_thread_entry()+0x198) [0x7f51e27e3538]
14: (()+0x7dc5) [0x7f51eb787dc5]
15: (clone()+0x6d) [0x7f51eb2b273d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

I will attempt to reproduce with master as it may not be the fault of this PR.

Edit: I can confirm that the above issue is happening with master, so not this PR's fault. This will block NVMe testing until we fix though.
Hi Mark, is the NVMe issue fixed in master now?
with new decode, minimum_to_decode in ErasureCodeInterface. Updated ECBackend, ECUtil to use the new functions. Fixed the test cases to use the new functions. Fixed the review comments. Authors: Myna, Elita. Signed-off-by: Myna Vajha <mynaramana@gmail.com>
…efore it asserts for non-zeros size. Authors: Myna, Elita. Signed-off-by: Myna Vajha <mynaramana@gmail.com>
src/osd/ECUtil.cc
Outdated
@@ -71,28 +68,57 @@ int ECUtil::decode(
need.insert(i->first);
}

for (uint64_t i = 0; i < total_data_size; i += sinfo.get_chunk_size()) {
set<int> avail;
for (auto i = to_decode.begin();
nit, use range-based for loop if you please.
@mynaramana could you address this comment also?
src/osd/ECUtil.cc
Outdated
int repair_data_per_chunk;
int subchunk_size = sinfo.get_chunk_size()/ec_impl->get_sub_chunk_count();

for(auto i=to_decode.begin();
add a space after `for`.
src/osd/ECUtil.cc
Outdated
for(auto i=to_decode.begin();
i != to_decode.end();
++i) {
if(min.find(i->first) == min.end()) continue;
add a space after `if`.
src/osd/ECUtil.cc
Outdated
repair_subchunk_count += j->second;
}
repair_data_per_chunk = repair_subchunk_count*subchunk_size;
chunks_count = (int) i->second.length() / repair_data_per_chunk;
remove the space after `(int)`.
src/osd/ECUtil.cc
Outdated
j != min[i->first].end(); ++j) {
repair_subchunk_count += j->second;
}
repair_data_per_chunk = repair_subchunk_count*subchunk_size;
add spaces around `*`.
Hi Kefu, I updated the code according to the review comments.
src/osd/ECMsgTypes.cc
Outdated
@@ -214,6 +216,13 @@ void ECSubRead::decode(bufferlist::iterator &bl)
::decode(to_read, bl);
}
::decode(attrs_to_read, bl);
if (struct_v > 2 && struct_v > struct_compat) {
::decode(subchunks, bl);
wrong indent.
src/osd/ECMsgTypes.cc
Outdated
if (struct_v > 2 && struct_v > struct_compat) {
::decode(subchunks, bl);
} else {
for (auto &&i:attrs_to_read) {
add spaces around `:`.
src/osd/ECMsgTypes.cc
Outdated
::decode(subchunks, bl);
} else {
for (auto &&i:attrs_to_read) {
subchunks[i].push_back(make_pair(0,1));
add a space after `,`.
src/osd/ECUtil.cc
Outdated
if (min.find(i.first) == min.end()) continue;
else {
int repair_subchunk_count = 0;
for (auto j = min[i.first].begin();
could use range-based loop here also.
src/osd/ECUtil.cc
Outdated
i != out.end();
++i) {
assert(i->second->length() == total_data_size);
for (auto i = out.begin(); i != out.end(); ++i) {
could you switch to a range-based for loop, since you are here already?
src/osd/ECUtil.cc
Outdated
}

map<int, vector<pair<int, int>>> min;
assert(ec_impl->minimum_to_decode(need, avail, &min) == 0);
@mynaramana we should not rely on the side effect in `assert()`; in the future, `assert()` in Ceph could be optimized out if `NDEBUG` is defined.
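A minimal sketch of the safer shape: make the call unconditionally, keep the result, and assert on the stored value. `fake_minimum_to_decode` is a hypothetical stand-in for `ec_impl->minimum_to_decode()` that counts how often it runs.

```cpp
#include <cassert>

// Hypothetical stand-in for ec_impl->minimum_to_decode(): returns 0 on
// success and records that it was called.
int fake_minimum_to_decode(int *calls)
{
  ++*calls;
  return 0;
}

int decode_step(int *calls)
{
  // The call sits outside assert(), so it still runs if assert()
  // expands to nothing under NDEBUG.
  int r = fake_minimum_to_decode(calls);
  assert(r == 0);
  return r;
}
```

With the call inside `assert()`, a build that defines `NDEBUG` would skip it and leave the output map empty.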
src/erasure-code/ErasureCode.cc
Outdated
}
vector<pair<int, int>> default_subchunks;
default_subchunks.push_back(make_pair(0, get_sub_chunk_count()));
for(auto &&id:minimum_shard_ids){
add spaces around :
src/osd/ECUtil.cc
Outdated
int subchunk_size = sinfo.get_chunk_size()/ec_impl->get_sub_chunk_count();

for (auto &&i : to_decode) {
if (min.find(i.first) == min.end()) continue;
nit, this is an anti-pattern:
- find and throw away the returned iterator,
- use `operator[]` to locate the element again.
also, `j` deserves a better name. could you change it to something like:
  auto found = min.find(i.first);
  if (found != min.end()) {
    int repair_subchunk_count = 0;
    for (auto& subchunks : found->second) {
      repair_subchunk_count += subchunks.second;
    }
    break;
  }
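A self-contained sketch of the single-lookup pattern, assuming the map values are vectors of `(offset, count)` pairs as elsewhere in this patch. `repair_subchunk_count` as a free function is hypothetical; the PR computes this inline.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using SubChunkRanges = std::vector<std::pair<int, int>>;

// Sum the sub-chunk counts requested from `shard`, or return 0 when the
// shard is not in the minimum set. One find(), no second lookup.
int repair_subchunk_count(const std::map<int, SubChunkRanges> &min, int shard)
{
  auto found = min.find(shard);
  if (found == min.end()) {
    return 0;
  }
  int count = 0;
  for (const auto &range : found->second) {
    count += range.second;  // .second is the count of sub-chunks in the range
  }
  return count;
}
```

Reusing the iterator also lets the map stay `const`, which `operator[]` would not allow.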
@mynaramana the rados run looks good. most of my comments are just formatting and code style related issues, but the
src/osd/ECUtil.cc
Outdated
}

map<int, vector<pair<int, int>>> min;
assert(ec_impl->minimum_to_decode(need, avail, &min) == 0);
int r = ec_impl->minimum_to_decode(need, avail, &min) == 0;
`r` is a boolean cast to int, and it should always be `1`.
updated this!
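For reference, a small sketch of why the outdated line stores `1`: `==` binds tighter than `=`, so the comparison result is what gets assigned. `returns_zero_on_success` is a hypothetical function following the 0-on-success convention; the actual fix in the PR is not shown on this page.

```cpp
#include <cassert>

// Hypothetical call that returns 0 on success.
int returns_zero_on_success() { return 0; }

// Shape of the outdated line: r receives the result of the comparison
// (1 when the call succeeds), not the return code.
int comparison_result()
{
  int r = returns_zero_on_success() == 0;
  return r;
}

// Store the return code first, then check it.
int return_code()
{
  int r = returns_zero_on_success();
  assert(r == 0);
  return r;
}
```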
@mynaramana looks great! could you squash the changes addressing review comments into the related commits respectively?
…data than needed. Made all the helper information uniform across all helper nodes. Authors: Myna, Elita. Signed-off-by: Myna Vajha <mynaramana@gmail.com>
the failed test is unrelated, i will take a look at it tmr.
Introducing sub-chunks. Added functions decode2, minimum_to_decode2 to the ErasureCodeInterface.
Updated ECBackend, ECUtil to use the new functions.
Authors: Myna, Elita
Related PR: 14300.
Fixes: http://tracker.ceph.com/issues/19278
Signed-off-by: Myna Vajha <mynaramana@gmail.com>