osd: fine-grained statistics of logical object space usage #15199
580be10 to 5b76b55
@liewegas Any comments?
Interesting! This is surprisingly simple. In order to rely on this we probably need to update (deep?) scrub to verify the accuracy of the extents field, presumably by reading everything outside the recorded extents and asserting it is all zeros.
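The scrub check suggested above could look roughly like the following. This is only a sketch under stated assumptions: `Extents` is a hypothetical stand-in (start offset mapped to length) for the extents field, `holes_are_zero` is a made-up helper name, and the object data is assumed to be already read into memory. It is not the actual scrub code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the extents field: start offset -> length,
// non-overlapping and sorted by start offset.
using Extents = std::map<uint64_t, uint64_t>;

// Returns true if every byte of `data` outside the recorded extents is zero.
bool holes_are_zero(const std::vector<uint8_t>& data, const Extents& extents) {
  uint64_t pos = 0;  // first byte that has not been accounted for yet
  for (const auto& e : extents) {
    assert(e.second > 0);  // an empty extent should never be recorded
    // Bytes in [pos, e.first) are a hole and must be zero.
    for (uint64_t i = pos; i < e.first && i < data.size(); ++i) {
      if (data[i] != 0) return false;
    }
    if (e.first + e.second > pos) pos = e.first + e.second;
  }
  // Everything after the last extent is a hole as well.
  for (uint64_t i = pos; i < data.size(); ++i) {
    if (data[i] != 0) return false;
  }
  return true;
}
```

A non-zero byte found in a hole would mean the extents field under-reports what was written, which is the inconsistency scrub would want to flag.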
@jdurgin what do you think?
I like the idea, but I'm afraid it will end up a little more complex. E.g., what about rbd clusters that use set_alloc_hint to fully allocate space for an object with the first write? This would also need to handle other write ops, e.g. truncate, zero, rollback, delete, etc. write_update_size_and_usage() is only called for write, writefull, and ops that are implemented in terms of those.
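To make the concern about the other write ops concrete, here is a rough sketch of how per-op updates of the extents field might look. Everything in it is an assumption for illustration: `Extents`, `add_range`, `remove_range`, `on_write`, `on_zero` and `on_truncate` are hypothetical names, and the choice that zero removes a range while truncate-up adds one is a guess at the intended semantics, not what the PR necessarily does.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for the extents field: start offset -> length.
using Extents = std::map<uint64_t, uint64_t>;

// Add [off, off+len), merging with overlapping or adjacent intervals.
void add_range(Extents& e, uint64_t off, uint64_t len) {
  if (len == 0) return;
  uint64_t end = off + len;
  auto it = e.lower_bound(off);
  if (it != e.begin()) {
    auto prev = std::prev(it);
    if (prev->first + prev->second >= off) it = prev;
  }
  while (it != e.end() && it->first <= end) {
    off = std::min(off, it->first);
    end = std::max(end, it->first + it->second);
    it = e.erase(it);
  }
  e[off] = end - off;
}

// Remove [off, off+len), splitting intervals that straddle the range.
void remove_range(Extents& e, uint64_t off, uint64_t len) {
  if (len == 0) return;
  uint64_t end = off + len;
  Extents out;
  for (const auto& iv : e) {
    uint64_t s = iv.first, t = iv.first + iv.second;
    if (s < off) out[s] = std::min(t, off) - s;  // piece left of the range
    if (t > end) {                               // piece right of the range
      uint64_t b = std::max(s, end);
      out[b] = t - b;
    }
  }
  e.swap(out);
}

void on_write(Extents& e, uint64_t off, uint64_t len) { add_range(e, off, len); }
void on_zero(Extents& e, uint64_t off, uint64_t len) { remove_range(e, off, len); }

// Truncate down drops everything past the new size; truncate up marks the
// newly exposed range as used (assumed here, mirroring the "trunc up" hunk).
void on_truncate(Extents& e, uint64_t& size, uint64_t new_size) {
  if (new_size < size) remove_range(e, new_size, size - new_size);
  else add_range(e, size, new_size - size);
  size = new_size;
}
```

Rollback and delete would then reduce to copying another object's extents and clearing the set, respectively.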
Yeah, I like the idea. If we can capture the other ops with similarly concise updates I'm all for this. We'd need to make the compat decode path assume the full range is allocated in order to handle upgrades, and make sure we trigger the sub-extent updates only if require_osd_release >= MIMIC. And scrub.
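The compat decode rule mentioned above can be sketched as follows. This is a minimal illustration, assuming a simplified object-info struct; `ObjectInfoSketch`, `finish_decode` and the version parameters are hypothetical and do not correspond to the real object_info_t encoding versions.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the extents field: start offset -> length.
using Extents = std::map<uint64_t, uint64_t>;

// Simplified, hypothetical view of the fields that matter here.
struct ObjectInfoSketch {
  uint64_t size = 0;
  Extents extents;
};

// Called after the common fields have been decoded. If the encoding
// predates the extents field, assume the whole object [0, size) is used.
void finish_decode(ObjectInfoSketch& oi, int struct_v,
                   int first_version_with_extents) {
  if (struct_v < first_version_with_extents) {
    oi.extents.clear();
    if (oi.size > 0) oi.extents[0] = oi.size;
  }
}
```

Assuming the full range keeps pre-upgrade objects reporting the same usage as before, so statistics never shrink just because an object was decoded from an old encoding.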
fa6e957 to 0e2628a
5ff30cf to 5e91a99
This passed several rounds of QA tests. @liewegas I believe this is ready for review!
@liewegas ping
src/osd/PrimaryLogPG.cc
Outdated
// trunc up
interval_set<uint64_t> to_add;
to_add.insert(oi.size, truncate_size - oi.size);
oi.extents.union_of(to_add);
Shouldn't a simple oi.extents.insert(oi.size, truncate_size - oi.size) be sufficient? There is no need for to_add and union_of.
src/osd/PrimaryLogPG.cc
Outdated
rollback_to->obs.oi.extents.begin();
p != rollback_to->obs.oi.extents.end(); ++p) {
obs.oi.extents.insert(p.get_start(), p.get_len());
}
Wouldn't obs.oi.extents = rollback_to->obs.oi.extents be sufficient?
src/osd/osd_types.cc
Outdated
@@ -5091,6 +5100,7 @@ void object_info_t::dump(Formatter *f) const
f->dump_unsigned("expected_write_size", expected_write_size);
f->dump_unsigned("alloc_hint_flags", alloc_hint_flags);
f->dump_object("manifest", manifest);
f->dump_stream("logical_extents") << extents;
May as well make this structured: open_array_section("extents"), open_object_section("extent"), dump_unsigned("offset", ...), ...
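For illustration, the structured output that this comment asks for might have the shape produced below. This sketch deliberately does not use the Ceph Formatter; it builds the JSON fragment by hand so that it stays self-contained. `Extents` and `dump_extents_json` are hypothetical names, and the "length" key is an assumption, since the comment only names "offset" explicitly.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the extents field: start offset -> length.
using Extents = std::map<uint64_t, uint64_t>;

// Emits an "extents" array with one {offset, length} object per interval,
// instead of a single opaque string such as "[5~10,20~5]".
std::string dump_extents_json(const Extents& extents) {
  std::ostringstream out;
  out << "\"extents\":[";
  bool first = true;
  for (const auto& e : extents) {
    if (!first) out << ",";
    first = false;
    out << "{\"offset\":" << e.first << ",\"length\":" << e.second << "}";
  }
  out << "]";
  return out.str();
}
```

Compared with dump_stream, a structured array lets tools consuming the dump read each offset and length directly instead of parsing the `start~len` notation.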
src/osdc/Objecter.h
Outdated
copy_reply.extents.begin();
p != copy_reply.extents.end(); ++p) {
out_extents->insert(p.get_start(), p.get_len());
}
again, here, why not use operator=?
281be43 to 30055e0
retest this please
@liewegas All comments addressed. QA: http://pulpito.ceph.com/xxg-2017-09-27_07:46:11-rados-wip-object-logic-size-distro-basic-smithi/
30055e0 to de8dad5
changeset:
7349b26 to b4b5c54
There are suspicious scrub errors; this needs more testing and investigation.
b4b5c54 to c11ca2d
E.g.:

    subset_of([5~10,20~5], 0, 100) -> [5~10,20~5]
    subset_of([5~10,20~5], 5, 25) -> [5~10,20~5]
    subset_of([5~10,20~5], 1, 10) -> [5~5]
    subset_of([5~10,20~5], 8, 24) -> [8~7, 20~4]

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
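The four examples in this commit message are consistent with clipping every interval to the half-open range [start, end). A minimal sketch of that behaviour, assuming a plain map in place of interval_set and a free function rather than the actual member function signature (both are illustrative choices, not the Ceph implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for interval_set<uint64_t>: start -> length,
// so the notation 5~10 corresponds to the entry {5, 10}.
using Extents = std::map<uint64_t, uint64_t>;

// Returns the part of `other` that falls inside [start, end).
Extents subset_of(const Extents& other, uint64_t start, uint64_t end) {
  assert(start <= end);
  Extents out;
  for (const auto& iv : other) {
    uint64_t s = std::max(iv.first, start);            // clipped start
    uint64_t t = std::min(iv.first + iv.second, end);  // clipped end
    if (s < t) out[s] = t - s;  // keep only non-empty pieces
  }
  return out;
}
```

With the input [5~10,20~5], calling `subset_of(in, 8, 24)` clips the first interval to 8~7 and the second to 20~4, matching the last example above.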
To test this change, we create an image of 5GB and do an rbd bench write of 100MB:

    ./bin/rbd create bar -s 5120 && ./bin/rbd bench --io-type write --io-size 32K --io-total 100M --io-pattern rand rbd/bar

Below is the test result.

Was:

    GLOBAL:
        SIZE       AVAIL      RAW USED     %RAW USED
        30911M     27052M     3859M        12.49
    POOLS:
        NAME                  ID     USED      %USED     MAX AVAIL     OBJECTS
        rbd                   0      3191M     26.36     8914M         1174
        cephfs_data_a         1      0         0         8914M         0
        cephfs_metadata_a     2      2246      0         8914M         21

Now:

    GLOBAL:
        SIZE       AVAIL      RAW USED     %RAW USED
        30911M     27050M     3861M        12.49
    POOLS:
        NAME                  ID     USED        %USED     MAX AVAIL     OBJECTS
        rbd                   0      101216k     1.10      8913M         1178
        cephfs_data_a         1      0           0         8913M         0
        cephfs_metadata_a     2      892         0         8913M         21

E.g., this change can make "osd pool set-quota max_bytes" work nicely.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Normal reads support a trimmed read length, and so should checksums!

This fixes occasional failures of the rados/thrash test scripts, e.g.:

(1) an object is created using WriteOp with a randomly generated length
(2) normal writes might be accompanied by a TruncOp with a randomly chosen truncate_size
(3) for ReadOp, a random 'length' to read is picked, and a checksum is computed at the same time over the same range ([0, 'length'])

Since the 'length' for reading is chosen at random, it might exceed the current object size, causing an EOVERFLOW error.

Related issues:
http://qa-proxy.ceph.com/teuthology/xxg-2017-09-22_01:52:47-rados-wip-object-logic-size-distro-basic-smithi/1657337
http://qa-proxy.ceph.com/teuthology/xxg-2017-09-22_14:14:19-rados-wip-object-logic-size-distro-basic-smithi/1658015

Fix the above problems by keeping pace with normal reads.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
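The trimming described in this commit message can be sketched as below. This is an illustration only: `Range` and `trim_to_object` are hypothetical names, and the sketch ignores any special meaning a zero length may carry in the real read path.

```cpp
#include <cstdint>

// Hypothetical description of a byte range within an object.
struct Range {
  uint64_t offset;
  uint64_t length;
};

// Clamp a requested range to the current object size, so that a request
// reaching past the end is shortened instead of rejected.
Range trim_to_object(uint64_t offset, uint64_t length, uint64_t object_size) {
  if (offset >= object_size) return {offset, 0};  // nothing left to cover
  uint64_t available = object_size - offset;      // cannot underflow here
  if (length > available) length = available;
  return {offset, length};
}
```

Applying the same clamp to the read and to the checksum keeps the two covering an identical range, which is what the thrash test relies on.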
c11ca2d to 1e4263f
retest this please
…_info_t Introduced-by: ceph#15199 Fixes: http://tracker.ceph.com/issues/21618 Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
It was originally introduced in ceph#15199, aiming to improve the pool-based **du** stats. For performance reasons, ceph#19616 did an incomplete revert of that PR, hence the clean-up job... Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
It was originally introduced in ceph#15199, aiming to improve the pool-based **du** stats. For performance reasons, ceph#19616 did an incomplete revert of that PR, hence the following clean-up job... Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
To test this change, we create an image of 5GB and do an rbd bench write of 100MB:
./bin/rbd create bar -s 5120 && ./bin/rbd bench --io-type write --io-size 32K --io-total 100M --io-pattern rand rbd/bar
Below is the test result.
Was (3191MB):
Now (about 100MB):
E.g., this change can make "osd pool set-quota max_bytes" work nicely.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
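The gap between the two figures in the test above can be illustrated with a small sketch. It assumes that usage was previously derived from the object size and is now derived from the sum of the recorded extent lengths; `Extents` and `logical_used` are hypothetical names, not the actual stats code.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the extents field: start offset -> length.
using Extents = std::map<uint64_t, uint64_t>;

// Logical usage of one object: the sum of its extent lengths.
uint64_t logical_used(const Extents& extents) {
  uint64_t total = 0;
  for (const auto& e : extents) total += e.second;
  return total;
}
```

For example, a single 32K write at offset 3M leaves an object whose size is 3M + 32K, while its only extent is 32K long. With random 32K writes spread over a 5GB image, counting by size inflates the pool usage far beyond the 100MB actually written.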