Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: introduce sub-chunks to erasure code plugin interface #15193

Merged
merged 3 commits into from Nov 1, 2017

Conversation

mynaramana
Copy link
Contributor

@mynaramana mynaramana commented May 22, 2017

Introducing sub-chunks. Added functions decode2, minimum_to_decode2 to the ErasureCodeInterface.
Updated ECBackend, ECUtil to use the new functions.
Authors: Myna, Elita
Related PR: 14300.

Fixes: http://tracker.ceph.com/issues/19278

Signed-off-by: Myna Vajha mynaramana@gmail.com

@joscollin joscollin changed the title Array Codes (Fractional read during repair) osd: Array Codes (Fractional read during repair) May 22, 2017
@liewegas
Copy link
Member

jenkins test this please

@liewegas
Copy link
Member

retest this please

@liewegas
Copy link
Member

this needs to be rebased on latest master, please!

@liewegas liewegas added core and removed needs-qa labels May 23, 2017
@liewegas liewegas changed the title osd: Array Codes (Fractional read during repair) osd: introduce sub-chunks to erasure code plugin interface May 23, 2017
Copy link
Member

@liewegas liewegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two main things are:

  • i think a vector will work better than list for these since it'll avoid hammer on the allocator
  • whitespace cleanup
  • rebase!

Thanks!

@@ -60,6 +60,28 @@ int ErasureCode::minimum_to_decode(const set<int> &want_to_read,
return 0;
}

int ErasureCode::minimum_to_decode2(const set<int> &want_to_read,
const set<int> &available_chunks,
map<int, list<pair<int, int>>> *minimum)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we replace the list<pair<int,int>> with vector<pair<int,int>>? it'll be a zillion times faster to avoid the memory allocations.

default_subchunks.push_back(make_pair(0,get_sub_chunk_count()));


for(set<int>::iterator i=minimum_shard_ids.begin();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (auto id : minimum_shard_ids) {


if ( r != 0) return r;

list<pair<int, int>> default_subchunks;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vector

set<int> minimum_shard_ids;
int r = minimum_to_decode(want_to_read, available_chunks, &minimum_shard_ids);

if ( r != 0) return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please follow CodingStyle... in this case, no blank like after 'int r = ...', no space after 'if (', braces and newlines around 'return r'


virtual int minimum_to_decode2(const set<int> &want_to_read,
const set<int> &available,
map<int, list<pair<int,int>>> *minimum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list -> vector throughout this patch...

j->get<1>(),
bl, j->get<2>(),
true); // Allow EIO return
if((op.subchunks.find(i->first)->second.size() == 1) && (op.subchunks.find(i->first)->second.front().second == ec_impl->get_sub_chunk_count())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please clean up the whitespace here (space after if, 80 chars per line), etc.

also use the fancy for loop syntax 'for (auto myvar : thecontainer)' where it helps readability

::encode(from, bl);
::encode(tid, bl);
::encode(to_read, bl);
::encode(attrs_to_read, bl);
::encode(subchunks,bl);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between , and bl

@@ -210,10 +212,17 @@ void ECSubRead::decode(bufferlist::iterator &bl)
}
to_read[m->first] = tlist;
}
} else {
} else{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep space between else and {

::decode(to_read, bl);
}
::decode(attrs_to_read, bl);
if((struct_v > 2) && (struct_v > struct_compat) ){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after if, inner parens are unnecessary

::decode(subchunks, bl);
} else {
for(set<hobject_t>::iterator i = attrs_to_read.begin(); i != attrs_to_read.end(); ++i) {
subchunks[*i].push_back(make_pair(0,1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 0,1 means a single subchunk covering the whole thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@mynaramana mynaramana force-pushed the arraycode branch 2 times, most recently from aa321cf to 8726993 Compare May 25, 2017 06:19
@mynaramana
Copy link
Contributor Author

Hi Sage,

I updated the changes requested in the review.

Thanks,
Myna.

@liewegas
Copy link
Member

liewegas commented Sep 1, 2017

see rebased version at #17428

@mynaramana mynaramana force-pushed the arraycode branch 3 times, most recently from 649ae79 to 7c96329 Compare September 2, 2017 10:02
@mynaramana
Copy link
Contributor Author

mynaramana commented Sep 2, 2017 via email

@liewegas
Copy link
Member

liewegas commented Sep 2, 2017

I'll queue up another test run, thanks!

@liewegas
Copy link
Member

liewegas commented Sep 6, 2017

@markhpc can you do an a/b test on this vs master with an ec workload (smallish writes preferably)?

@markhpc markhpc self-assigned this Sep 18, 2017
@markhpc
Copy link
Member

markhpc commented Sep 20, 2017

With HDDs this PR worked fine and may have provided a slight performance advantage (1-2%). When testing with NVMe, fio regularly locked up. Eventually I was able to get a partial trace from the fio executable:

home/perf/src/markhpc/ceph/src/common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f51b7fff700 time 2017-09-20 15:53:49.073369
/home/perf/src/markhpc/ceph/src/common/Mutex.cc: 110: FAILED assert(r == 0)
 ceph version 10.0.4-29492-ga72ed6e (a72ed6e642521735b3faeeeccf56835a57e3d43f) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f51e27e5ee0]
 2: (Mutex::Lock(bool)+0x1a4) [0x7f51e27b86e4]
 3: (()+0x11f550) [0x7f51ec73e550]
 4: /home/ubuntu/src/fio/fio() [0x463c54]
 5: (()+0x11eb06) [0x7f51ec73db06]
 6: (()+0x11fbbf) [0x7f51ec73ebbf]
 7: (()+0x136dbd) [0x7f51ec755dbd]
 8: (()+0x12224d) [0x7f51ec74124d]
 9: (()+0x5cc39) [0x7f51ec67bc39]
 10: (()+0x132b57) [0x7f51ec751b57]
 11: (librados::C_AioComplete::finish(int)+0x22) [0x7f51ec363672]
 12: (Context::complete(int)+0x9) [0x7f51ec341079]
 13: (Finisher::finisher_thread_entry()+0x198) [0x7f51e27e3538]
 14: (()+0x7dc5) [0x7f51eb787dc5]
 15: (clone()+0x6d) [0x7f51eb2b273d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

I will attempt to reproduce with master as it may not be the fault of this PR.

Edit: I can confirm that the above issue is happening with master, so not this PR's fault. This will block NVMe testing until we fix though.

@mynaramana
Copy link
Contributor Author

Hi Mark, Is the NVMe issue fixed in master now ?

with new decode, minimum_to_decode in ErasureCodeInterface.
Updated ECBackend, ECUtil to use the new functions.Fixed the test cases
to use the new functions.
Fixed the review comments.

Authors: Myna, Elita.

Signed-off-by: Myna Vajha <mynaramana@gmail.com>
…efore

it asserts for non-zeros size.

Authors: Myna, Elita.

Signed-off-by: Myna Vajha <mynaramana@gmail.com>
@@ -71,28 +68,57 @@ int ECUtil::decode(
need.insert(i->first);
}

for (uint64_t i = 0; i < total_data_size; i += sinfo.get_chunk_size()) {
set<int> avail;
for (auto i = to_decode.begin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, use range-based for loop if you please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mynaramana could you address this comment also?

int repair_data_per_chunk;
int subchunk_size = sinfo.get_chunk_size()/ec_impl->get_sub_chunk_count();

for(auto i=to_decode.begin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space after for.

for(auto i=to_decode.begin();
i != to_decode.end();
++i) {
if(min.find(i->first) == min.end()) continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space after if.

repair_subchunk_count += j->second;
}
repair_data_per_chunk = repair_subchunk_count*subchunk_size;
chunks_count = (int) i->second.length() / repair_data_per_chunk;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the space after (int).

j != min[i->first].end(); ++j) {
repair_subchunk_count += j->second;
}
repair_data_per_chunk = repair_subchunk_count*subchunk_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add spaces around *.

@mynaramana
Copy link
Contributor Author

Hi Kefu, I updated the code according to the review comments.
Thanks
Myna.

@@ -71,28 +68,57 @@ int ECUtil::decode(
need.insert(i->first);
}

for (uint64_t i = 0; i < total_data_size; i += sinfo.get_chunk_size()) {
set<int> avail;
for (auto i = to_decode.begin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mynaramana could you address this comment also?

@@ -214,6 +216,13 @@ void ECSubRead::decode(bufferlist::iterator &bl)
::decode(to_read, bl);
}
::decode(attrs_to_read, bl);
if (struct_v > 2 && struct_v > struct_compat) {
::decode(subchunks, bl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong indent.

if (struct_v > 2 && struct_v > struct_compat) {
::decode(subchunks, bl);
} else {
for (auto &&i:attrs_to_read) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add spaces around :.

::decode(subchunks, bl);
} else {
for (auto &&i:attrs_to_read) {
subchunks[i].push_back(make_pair(0,1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space before ,.

if (min.find(i.first) == min.end()) continue;
else {
int repair_subchunk_count = 0;
for (auto j = min[i.first].begin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use range-based loop here also.

i != out.end();
++i) {
assert(i->second->length() == total_data_size);
for (auto i = out.begin(); i != out.end(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you switch to range-based for loop, since you are at here.

}

map<int, vector<pair<int, int>>> min;
assert(ec_impl->minimum_to_decode(need, avail, &min) == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mynaramana we should not rely on the side effect in assert(), in future, assert() in Ceph could be optimized out if NDEBUG is defined.

}
vector<pair<int, int>> default_subchunks;
default_subchunks.push_back(make_pair(0, get_sub_chunk_count()));
for(auto &&id:minimum_shard_ids){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add spaces around :

int subchunk_size = sinfo.get_chunk_size()/ec_impl->get_sub_chunk_count();

for (auto &&i : to_decode) {
if (min.find(i.first) == min.end()) continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, this is an anti-pattern:

  1. find and throw away the returned iterator,
  2. use operator[] to locate the element to find again.

also, j deserves a better name. could you change it to something like:

auto found = min.find(i.first);
if (found != min.end()) {
  int repair_subchunk_count = 0;
  for (auto& subchunks : found) {
    repair_subchunk_count += subchunks.second;
  }
  break;
}

@tchaikov
Copy link
Contributor

tchaikov commented Oct 31, 2017

@mynaramana the rados run looks good. most of my comments are just formatting and code style related issues , but the assert() one is a more important one.

}

map<int, vector<pair<int, int>>> min;
assert(ec_impl->minimum_to_decode(need, avail, &min) == 0);
int r = ec_impl->minimum_to_decode(need, avail, &min) == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r is a boolean casted to int, and it should always be 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated this!

@tchaikov
Copy link
Contributor

@mynaramana looks great! could you squash the changes addressing review comments into the related commits respectively?

…data than needed.

Made all the helper information uniform across all helper nodes.
Authors: Myna, Elita.
Signed-off-by: Myna Vajha <mynaramana@gmail.com>
@tchaikov tchaikov merged commit 01a65f0 into ceph:master Nov 1, 2017
@tchaikov
Copy link
Contributor

tchaikov commented Nov 1, 2017

the failed test is unrelated, i will take a look at it tmr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants