os/bluestore: add new garbage collector #12144
Conversation
Force-pushed from 4cc0768 to 588e942 (Compare)
Force-pushed from f67dd94 to 810776f (Compare)
Some nits here, but I think we can do a bit better if we move this logic into _wctx_finish, after we put_ref. That way we can look at the final ref_map instead of making a full copy inside the GC class.
(Also, I think this will change somewhat once we have a different/more compact representation of the ref_map...)
src/os/bluestore/BlueStore.h
Outdated
extent1 <loffs = 100, boffs = 100, len = 10> -> blob1<compressed, len_on_disk=4096, llen=8192>
extent2 <loffs = 200, boffs = 100, len = 10> -> blob2<raw, len_on_disk=4096, llen=4096>
extent3 <loffs = 300, boffs = 100, len = 10> -> blob1<compressed, len_on_disk=4096, llen=8192>
extent4 <loffs = 4096, boffs = 100, len = 10> -> blob3<raw, len_on_disk=4096, llen=4096>
I assume for these 4, len = 100, not 10?
and for extent3, boffs = 300?
Extent lengths don't matter much in this example; both 10 and 100 result in the same behavior.
 * per all blobs to enable compressed blobs garbage collection
 *
 */
OPTION(bluestore_gc_enable_total_threshold, OPT_INT, 0)
Do we really want defaults at 0? Doesn't that mean aggressively collect even if there is only a tiny benefit?
My considerations on that:
- There is no need to store data compressed if it provides no storage saving.
- This way the data layout stays more consistent with the original write handling: no saving due to compression means no compressed blobs written.
Any other suggestions?
src/os/bluestore/BlueStore.h
Outdated
class GarbageCollector
{
public:
  typedef vector<AllocExtent> SimpleAllocExtentsVector;
IMO this typedef obscures more than it helps
BTW, a few style nits for this PR and others:
// double-slash comments have a space, //not like this
// comparison operators have spaces, like so: if (a >= b); not if (a >=b);
// same with braces: void func(int foo) const { not void func(int foo) const{
// etc.
// 80 columns please
Force-pushed from 810776f to 20e62bb (Compare)
@liewegas - resolved and rebased. Please take a look.
W.r.t. doing GC at _wctx_finish: I refactored the code to avoid ref_map copying, but left GC at the original location to avoid needing a duplicate do_write_data/do_alloc_write/_wctx_finish call sequence for garbage-collected data. Still, any objections to that?
src/os/bluestore/bluestore_types.h
Outdated
    ref_map.begin()->first == offset &&
    ref_map.begin()->second.length == length &&
    ref_map.begin()->second.refs == 1;
}
this hunk is obsolete
src/os/bluestore/BlueStore.cc
Outdated
           << " unref 0x" << std::hex << o << "~" << l
           << std::dec << dendl;
  BlobInfo& bi = affected_blobs.emplace(b, BlobInfo(ref_bytes)).first->second;
  bi.referenced_bytes -= l;
do the subtraction up front?
affected_blobs.emplace(b, BlobInfo(ref_bytes - l));
emplace() returns the previously existing entry for the key instead of inserting a new one. Hence we need to decrement the counter afterwards, to cover both the new-entry and existing-entry cases.
src/os/bluestore/BlueStore.cc
Outdated
  // by the write into account too
  auto b_it =
    affected_blobs.emplace(b, BlobInfo(b->get_referenced_bytes())).first;
  BlobInfo& bi = b_it->second;
could skip the b_it intermediary and assign bi to ...emplace(...).first->second
src/os/bluestore/BlueStore.cc
Outdated
      blob_info_counted = &bi;
    }
    used_alloc_unit = i;
  }
I think this loop can be replaced with

bi.expected_allocations += (alloc_unit_end - alloc_unit_start);
if (used_alloc_unit && used_alloc_unit >= alloc_unit_start &&
    used_alloc_unit < alloc_unit_end) {
  --bi.expected_allocations;
}
blob_info_counted = &bi;
used_alloc_unit = alloc_unit_end;

or similar, right? That avoids looping over AUs in a potentially large blob.
yep
src/os/bluestore/BlueStore.cc
Outdated
  // when fake_ref_map is empty since subsequent extents might
  // decrement its expected_allocation.
  // Hence need to enumerate all the extents first.
  bi.collect_candidates.emplace_back(it->logical_offset, it->length);
I wonder if it's better to skip building collection_candidates, and instead, if we decide this blob is toast, just iterate over the extent_map for the blob range. It'll be hot in the CPU cache, and we'll avoid the allocations for the map. Most of the time there won't be extents in that range to other blobs anyway, so we won't even be enumerating more elements than are in collect_candidates, I think?
yeah, that makes sense.
Force-pushed from 20e62bb to c3160a3 (Compare)
@liewegas - resolved your comments.
can you rebase on master please?
Already rebased
ah, i thought ref_map had already merged, that's why
Force-pushed from c3160a3 to 37f76b5 (Compare)
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
…on from ref_map to ref counting properly. Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
…e_test.OnodeSizeTracking Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
Force-pushed from 37f76b5 to 3df2444 (Compare)
Per @liewegas: removed from the testing batch for now; seeing some errors, trying to separate tests. http://pulpito.ceph.com/yuriw-2017-02-09_04:55:44-rados-wip-yuri-testing2_2017_2_9---basic-smithi/
Made a brief analysis of @yuriw's report. I don't see any bluestore/GC-related issues there.
[ RUN ] ObjectStore/StoreTest.Synthetic/1
2017-02-09T03:37:59.502 INFO:teuthology.run:Summary data:
2017-02-09T02:59:00.225 INFO:teuthology.orchestra.run.smithi139.stderr:rm: cannot remove '/var/lib/ceph': No such file or directory
2017-02-09T02:59:00.383 DEBUG:teuthology.report:Pushing job info to http://paddles.front.sepia.ceph.com/
2017-02-09T02:31:18.714 INFO:teuthology.run:Summary data:
2017-02-09T02:31:18.714 DEBUG:teuthology.report:Pushing job info to http://paddles.front.sepia.ceph.com/
2017-02-09T02:25:55.661 INFO:tasks.workunit.client.0.smithi168.stdout:[ RUN ] EnvLibradosMutipoolTest.DBBasics
2017-02-09T06:43:45.521 INFO:teuthology.run:Summary data:
2017-02-09T06:43:45.522 DEBUG:teuthology.report:Pushing job info to http://paddles.front.sepia.ceph.com/
[ RUN ] EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder
[ RUN ] ObjectStore/StoreTest.Synthetic/1
2017-02-09T02:54:13.211 INFO:teuthology.run:Summary data:
2017-02-09T02:54:13.212 DEBUG:teuthology.report:Pushing job info to http://paddles.front.sepia.ceph.com/
This is the final version, which estimates how many allocation units one can save by uncompressing overlapped blob(s) and storing them in raw format.
Rebased on top of PR #12904