osd: fine-grained statistics of logical object space usage #15199

Merged
merged 3 commits into from Sep 30, 2017

3 participants
@xiexingguo
Member

commented May 22, 2017

To test this change, we create an image of 5GB and do rbd bench write of 100MB:
./bin/rbd create bar -s 5120 && ./bin/rbd bench --io-type write --io-size 32K --io-total 100M --io-pattern rand rbd/bar

Below is the test result.

Was (3191MB):

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27052M        3859M         12.49
POOLS:
    NAME                  ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                   0      3191M     26.36         8914M        1174
    cephfs_data_a         1          0         0         8914M           0
    cephfs_metadata_a     2       2246         0         8914M          21

Now (about 100MB):

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27050M        3861M         12.49
POOLS:
    NAME                  ID     USED        %USED     MAX AVAIL     OBJECTS
    rbd                   0      101216k      1.10         8913M        1178
    cephfs_data_a         1            0         0         8913M           0
    cephfs_metadata_a     2          892         0         8913M          21

E.g., this change can make "osd pool set-quota max_bytes" work nicely.
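
A rough illustration of why USED can drop this much, assuming the old figure followed each object's logical size (highest written offset) while the new one counts only the extents actually written. `SparseObject` and its `std::map` of extents are hypothetical stand-ins for illustration, not Ceph's `object_info_t` or `interval_set`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for one object: offset -> length of written extents,
// kept sorted and non-overlapping. Not Ceph code.
struct SparseObject {
  std::map<uint64_t, uint64_t> extents;
  uint64_t size = 0;  // logical size: end of the highest write so far

  void write(uint64_t off, uint64_t len) {
    assert(len > 0);
    uint64_t end = off + len;
    // Start from the first extent that could touch [off, end).
    auto it = extents.lower_bound(off);
    if (it != extents.begin()) {
      auto prev = std::prev(it);
      if (prev->first + prev->second >= off) it = prev;
    }
    // Absorb every extent that overlaps or abuts the new range.
    while (it != extents.end() && it->first <= end) {
      off = std::min(off, it->first);
      end = std::max(end, it->first + it->second);
      it = extents.erase(it);
    }
    extents[off] = end - off;
    size = std::max(size, end);
  }

  // Extent-based accounting: sum of the written ranges only.
  uint64_t bytes_written() const {
    uint64_t total = 0;
    for (const auto& e : extents) total += e.second;
    return total;
  }
};
```

Two 32K writes at offsets 0 and 3M leave `size` just above 3M but `bytes_written()` at 64K, which is the kind of gap the two `ceph df` outputs above show at pool scale.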

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-object-logic-size branch from 580be10 to 5b76b55 Jun 14, 2017

@xiexingguo

Member Author

commented Jun 14, 2017

@liewegas Any comments?

@liewegas liewegas requested a review from jdurgin Jun 14, 2017

@liewegas

Member

commented Jun 14, 2017

Interesting! This is surprisingly simple.

In order to rely on this we probably need to update (deep?) scrub to verify the accuracy of the extents field. Presumably by reading everything outside the range and asserting it is zeros...
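
A minimal sketch of the check suggested here: walk the gaps between recorded extents and confirm they hold only zeros. The function name and the `std::map` representation (offset -> length) are assumptions made for illustration; this is not the scrub code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Returns true if every byte of `data` NOT covered by `extents`
// (offset -> length, sorted and non-overlapping) is zero.
bool outside_extents_is_zero(const std::vector<uint8_t>& data,
                             const std::map<uint64_t, uint64_t>& extents) {
  uint64_t pos = 0;  // first byte not yet known to be inside an extent
  for (const auto& e : extents) {
    assert(e.first >= pos);  // extents must be sorted and disjoint
    uint64_t gap_end = std::min<uint64_t>(e.first, data.size());
    for (uint64_t i = pos; i < gap_end; ++i)
      if (data[i] != 0) return false;
    pos = e.first + e.second;
  }
  // Tail after the last extent.
  for (uint64_t i = pos; i < data.size(); ++i)
    if (data[i] != 0) return false;
  return true;
}
```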

@liewegas

Member

commented Jun 14, 2017

@jdurgin what do you think?

@jdurgin

Member

commented Jun 14, 2017

I like the idea, but I'm afraid it will end up a little more complex. E.g., what about rbd clusters that use set_alloc_hint to fully allocate space for an object with the first write?

This would also need to handle other write ops, e.g. truncate, zero, rollback, delete, etc.

write_update_size_and_usage() is only called for write, writefull, and ops that are implemented in terms of those.

@liewegas

Member

commented Sep 1, 2017

Yeah, I like the idea. If we can capture the other ops with similarly concise updates I'm all for this.

We'd need to make the compat decode path assume the full range is allocated in order to handle upgrades, and make sure we trigger the sub-extent updates only if require_osd_release >= MIMIC.
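
One way to picture the compat decode rule: when the encoded struct predates the extents field, treat the whole object `[0, size)` as allocated. Everything here (`ObjectInfo`, `decode_compat`, the version constant) is hypothetical and only illustrates the fallback, not the real `object_info_t::decode`.

```cpp
#include <cstdint>
#include <map>

using Extents = std::map<uint64_t, uint64_t>;  // offset -> length

// Hypothetical struct version at which the extents field first appears.
constexpr int kFirstVersionWithExtents = 2;

struct ObjectInfo {
  uint64_t size;
  Extents extents;
};

// Decode helper: older encodings carry no extents, so assume the full
// range is allocated; newer encodings use what was encoded.
ObjectInfo decode_compat(int struct_v, uint64_t size, const Extents& encoded) {
  ObjectInfo oi{size, {}};
  if (struct_v >= kFirstVersionWithExtents) {
    oi.extents = encoded;
  } else if (size > 0) {
    oi.extents[0] = size;  // whole object counts as used
  }
  return oi;
}
```

With this fallback an upgraded cluster keeps reporting the old, size-based usage for objects written before the upgrade.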

And scrub.

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-object-logic-size branch 6 times, most recently from fa6e957 to 0e2628a Sep 11, 2017

@xiexingguo xiexingguo added the DNM label Sep 18, 2017

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-object-logic-size branch 6 times, most recently from 5ff30cf to 5e91a99 Sep 18, 2017

@xiexingguo xiexingguo removed the DNM label Sep 24, 2017

@xiexingguo

Member Author

commented Sep 25, 2017

@liewegas ping

// trunc up
interval_set<uint64_t> to_add;
to_add.insert(oi.size, truncate_size - oi.size);
oi.extents.union_of(to_add);

@liewegas

liewegas Sep 25, 2017

Member

Shouldn't a simple oi.extents.insert(oi.size, truncate_size - oi.size) be sufficient? There is no need for to_add and union_of.

rollback_to->obs.oi.extents.begin();
p != rollback_to->obs.oi.extents.end(); ++p) {
obs.oi.extents.insert(p.get_start(), p.get_len());
}

@liewegas

liewegas Sep 25, 2017

Member

Wouldn't obs.oi.extents = rollback_to->obs.oi.extents be sufficient?

@@ -5091,6 +5100,7 @@ void object_info_t::dump(Formatter *f) const
f->dump_unsigned("expected_write_size", expected_write_size);
f->dump_unsigned("alloc_hint_flags", alloc_hint_flags);
f->dump_object("manifest", manifest);
f->dump_stream("logical_extents") << extents;

@liewegas

liewegas Sep 25, 2017

Member

May as well make this structured: open_array_section("extents"), open_object_section("extent"), dump_unsigned("offset", ...), ...

copy_reply.extents.begin();
p != copy_reply.extents.end(); ++p) {
out_extents->insert(p.get_start(), p.get_len());
}

@liewegas

liewegas Sep 25, 2017

Member

Again, here, why not use operator=?

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-object-logic-size branch 2 times, most recently from 281be43 to 30055e0 Sep 26, 2017

@xiexingguo

Member Author

commented Sep 26, 2017

retest this please

@xiexingguo

Member Author

commented Sep 27, 2017

@liewegas liewegas added the needs-qa label Sep 27, 2017

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-object-logic-size branch from 30055e0 to de8dad5 Sep 27, 2017

@xiexingguo

Member Author

commented Sep 27, 2017

changeset:
fix a typo in comment

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-object-logic-size branch 3 times, most recently from 7349b26 to b4b5c54 Sep 28, 2017

@xiexingguo

Member Author

commented Sep 28, 2017

There are suspicious scrub errors; this needs more testing and investigation.

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-object-logic-size branch from b4b5c54 to c11ca2d Sep 29, 2017

xiexingguo added some commits Sep 21, 2017

common/interval_set: override subset_of for given range
E.g.:
subset_of([5~10,20~5], 0, 100)  -> [5~10,20~5]
subset_of([5~10,20~5], 5, 25)   -> [5~10,20~5]
subset_of([5~10,20~5], 1, 10)   -> [5~5]
subset_of([5~10,20~5], 8, 24)   -> [8~7, 20~4]

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
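
The four examples in this commit message can be reproduced with a small sketch. It assumes `a~b` means offset `a`, length `b`, and that the range arguments form a half-open `[start, end)`; the `std::map` is a stand-in for illustration, not the real `interval_set` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

using Extents = std::map<uint64_t, uint64_t>;  // offset -> length

// Return the parts of `in` that fall inside the half-open range [start, end).
Extents subset_of(const Extents& in, uint64_t start, uint64_t end) {
  assert(start <= end);
  Extents out;
  for (const auto& e : in) {
    uint64_t lo = std::max(e.first, start);           // clip left edge
    uint64_t hi = std::min(e.first + e.second, end);  // clip right edge
    if (lo < hi) out[lo] = hi - lo;                   // keep non-empty pieces
  }
  return out;
}
```

For `[5~10,20~5]` clipped to `(8, 24)` this yields `[8~7,20~4]`, matching the last example above.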

osd: fine-grained statistics of logical object space usage
To test this change, we create an image of 5GB and do rbd bench write of 100MB:
./bin/rbd create bar -s 5120 && ./bin/rbd bench --io-type write --io-size 32K --io-total 100M --io-pattern rand  rbd/bar

Below is the test result.

Was:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27052M        3859M         12.49
POOLS:
    NAME                  ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                   0      3191M     26.36         8914M        1174
    cephfs_data_a         1          0         0         8914M           0
    cephfs_metadata_a     2       2246         0         8914M          21

Now:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27050M        3861M         12.49
POOLS:
    NAME                  ID     USED        %USED     MAX AVAIL     OBJECTS
    rbd                   0      101216k      1.10         8913M        1178
    cephfs_data_a         1            0         0         8913M           0
    cephfs_metadata_a     2          892         0         8913M          21

E.g., this change can make "osd pool set-quota max_bytes" work nicely.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

osd/PrimaryLogPG: allow trimmed read for OP_CHECKSUM
Normal reads support trimmed read length, and so shall checksums!

This fixes occasional failures of the rados/thrash test scripts, e.g.:
(1) create an object using WriteOp with a randomly generated length
(2) normal writes may be accompanied by a TruncOp with a randomly chosen truncate_size
(3) for ReadOp, pick a random 'length' to read, and simultaneously compute a checksum
    over the same range ([0, 'length']).

Since the 'length' for reading is chosen randomly, it might
exceed the current object size and hence cause an EOVERFLOW error.

Related issues:
http://qa-proxy.ceph.com/teuthology/xxg-2017-09-22_01:52:47-rados-wip-object-logic-size-distro-basic-smithi/1657337
http://qa-proxy.ceph.com/teuthology/xxg-2017-09-22_14:14:19-rados-wip-object-logic-size-distro-basic-smithi/1658015

Fix the above problems by keeping pace with normal reads.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
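
The trimming described in this commit message amounts to clamping the requested range to the current object size instead of rejecting it. A minimal sketch, with a hypothetical helper name that does not come from PrimaryLogPG:

```cpp
#include <algorithm>
#include <cstdint>

// Clamp a requested (offset, length) to the object's current size, the way
// a short read does, instead of failing when the range runs past the end.
uint64_t trimmed_length(uint64_t offset, uint64_t length, uint64_t object_size) {
  if (offset >= object_size) return 0;  // nothing to read at or past the end
  return std::min(length, object_size - offset);
}
```

A checksum request for 100 bytes on a 40-byte object would then cover 40 bytes, the same span a normal read of that range returns.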

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-object-logic-size branch from c11ca2d to 1e4263f Sep 30, 2017

@xiexingguo

Member Author

commented Sep 30, 2017

retest this please

@xiexingguo xiexingguo merged commit cd6b983 into ceph:master Sep 30, 2017

5 checks passed

Docs: build check: OK - docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded
make check (arm64): make check succeeded

@xiexingguo xiexingguo deleted the xiexingguo:wip-object-logic-size branch Sep 30, 2017

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Oct 3, 2017

qa/standalone/scrub/osd-scrub-repair.sh: add extents flag into object_info_t

Introduced-by: ceph#15199
Fixes: http://tracker.ceph.com/issues/21618
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Mar 29, 2018

interval_set: kill subset_of
It is originally introduced in ceph#15199
aiming at improving the pool-based **du** stats.
For performance concerns, ceph#19616 did
an incomplete revert of that PR and hence comes the clean-up job...

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Mar 29, 2018

interval_set: kill subset_of
It is originally introduced in ceph#15199
aiming at improving the pool-based **du** stats.
For performance concerns, ceph#19616 did
an incomplete revert of that PR and hence comes the following clean-up job...

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>