osd: fine-grained statistics of logical object space usage #15199

Merged
merged 3 commits into from Sep 30, 2017

@xiexingguo xiexingguo commented May 22, 2017

To test this change, we create a 5GB image and run an rbd bench write of 100MB:
./bin/rbd create bar -s 5120 && ./bin/rbd bench --io-type write --io-size 32K --io-total 100M --io-pattern rand rbd/bar

Below is the test result.

Was(3191MB):

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27052M        3859M         12.49
POOLS:
    NAME                  ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                   0      3191M     26.36         8914M        1174
    cephfs_data_a         1          0         0         8914M           0
    cephfs_metadata_a     2       2246         0         8914M          21

Now(about 100MB):

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27050M        3861M         12.49
POOLS:
    NAME                  ID     USED        %USED     MAX AVAIL     OBJECTS
    rbd                   0      101216k      1.10         8913M        1178
    cephfs_data_a         1            0         0         8913M           0
    cephfs_metadata_a     2          892         0         8913M          21

E.g., this change can make "osd pool set-quota max_bytes" work nicely.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

@xiexingguo
Member Author

@liewegas Any comments?

@liewegas liewegas requested a review from jdurgin June 14, 2017 13:59
@liewegas
Member

Interesting! This is surprisingly simple.

In order to rely on this we probably need to update (deep?) scrub to verify the accuracy of the extents field. Presumably by reading everything outside the range and asserting it is zeros...
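The scrub check suggested here could be sketched roughly as follows. This is only an illustration under stated assumptions: the extents are modeled as a sorted, non-overlapping offset-to-length map, and `ExtentMap` and `outside_extents_is_zero` are hypothetical names, not Ceph code.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the extents field: start offset -> length,
// sorted by offset and non-overlapping.
using ExtentMap = std::map<uint64_t, uint64_t>;

// Returns true if every byte of `data` NOT covered by `extents` is zero.
bool outside_extents_is_zero(const std::vector<uint8_t>& data,
                             const ExtentMap& extents) {
  const uint64_t size = data.size();
  uint64_t pos = 0;  // first byte not yet known to be covered or checked
  for (const auto& e : extents) {
    // Check the hole between the previous extent and this one.
    for (uint64_t i = pos; i < e.first && i < size; ++i) {
      if (data[i] != 0) return false;
    }
    if (e.first + e.second > pos) pos = e.first + e.second;
  }
  // Check the tail after the last extent.
  for (uint64_t i = pos; i < size; ++i) {
    if (data[i] != 0) return false;
  }
  return true;
}
```

A real deep scrub would read the object in chunks rather than hold it in one buffer; the sketch only shows the invariant being verified.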

@liewegas
Member

@jdurgin what do you think?

@jdurgin
Member

jdurgin commented Jun 14, 2017

I like the idea, but I'm afraid it will end up a little more complex. For example, what about rbd clusters that use set_alloc_hint to fully allocate space for an object with the first write?

This would also need to handle other write ops, e.g. truncate, zero, rollback, delete, etc.

write_update_size_and_usage() is only called for write, writefull, and ops that are implemented in terms of those.

@liewegas
Member

liewegas commented Sep 1, 2017

Yeah, I like the idea. If we can capture the other ops with similarly concise updates I'm all for this.

We'd need to make the compat decode path assume the full range is allocated in order to handle upgrades, and make sure we trigger the sub-extent updates only if require_osd_release >= MIMIC.

And scrub.
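The upgrade handling described in this comment might look roughly like the sketch below: when an older encoding carries no extents field, the whole object range [0, size) is treated as allocated. The function name, the `has_extents_field` flag, and the map type are all assumptions for illustration; the actual object_info_t decode code is not shown in this thread.

```cpp
#include <cstdint>
#include <map>

using ExtentMap = std::map<uint64_t, uint64_t>;  // start offset -> length

// Hypothetical helper: pick the extents to use after decoding an object's
// metadata. Old encodings (no extents field) are assumed fully allocated.
ExtentMap extents_after_decode(bool has_extents_field,
                               const ExtentMap& decoded_extents,
                               uint64_t object_size) {
  if (has_extents_field) {
    return decoded_extents;
  }
  ExtentMap full;
  if (object_size > 0) {
    full[0] = object_size;  // conservative: count every byte as used
  }
  return full;
}
```

Assuming the full range is the conservative choice: usage can only be over-reported for objects written before the upgrade, never under-reported.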

@xiexingguo xiexingguo force-pushed the wip-object-logic-size branch 6 times, most recently from fa6e957 to 0e2628a Compare September 18, 2017 05:38
@xiexingguo xiexingguo added the DNM label Sep 18, 2017
@xiexingguo xiexingguo force-pushed the wip-object-logic-size branch 6 times, most recently from 5ff30cf to 5e91a99 Compare September 24, 2017 11:44
@xiexingguo xiexingguo removed the DNM label Sep 24, 2017
@xiexingguo
Member Author

@liewegas ping

// trunc up
interval_set<uint64_t> to_add;
to_add.insert(oi.size, truncate_size - oi.size);
oi.extents.union_of(to_add);

Shouldn't a simple oi.extents.insert(oi.size, truncate_size - oi.size) be sufficient? There's no need for to_add and union_of.
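To illustrate why a direct insert can be enough: the range added by a truncate-up starts at the old object size, so (assuming no existing extent reaches past the old size) it cannot overlap anything already recorded. The sketch below uses `MiniIntervalSet`, a simplified stand-in written for this example and not Ceph's interval_set; `truncate_up` is likewise a hypothetical helper.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Simplified stand-in for an interval set; NOT Ceph's interval_set.
struct MiniIntervalSet {
  std::map<uint64_t, uint64_t> m;  // start -> length, disjoint and sorted

  // Insert [off, off+len), assuming it overlaps no existing interval.
  // Intervals that touch the new one are merged with it.
  void insert(uint64_t off, uint64_t len) {
    if (len == 0) return;
    uint64_t start = off;
    uint64_t end = off + len;
    auto next = m.lower_bound(off);
    if (next != m.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == start) {  // touches on the left
        start = prev->first;
        m.erase(prev);  // erasing prev leaves `next` valid
      }
    }
    if (next != m.end() && next->first == end) {  // touches on the right
      end += next->second;
      m.erase(next);
    }
    m[start] = end - start;
  }
};

// Hypothetical helper mirroring the "trunc up" branch quoted above.
void truncate_up(MiniIntervalSet& extents, uint64_t old_size,
                 uint64_t new_size) {
  if (new_size > old_size) {
    extents.insert(old_size, new_size - old_size);
  }
}
```

With a fully written 10-byte object, truncating up to 25 yields a single interval 0~25; with only the first 4 bytes written, it yields 0~4 and 10~15.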

rollback_to->obs.oi.extents.begin();
p != rollback_to->obs.oi.extents.end(); ++p) {
obs.oi.extents.insert(p.get_start(), p.get_len());
}

Wouldn't obs.oi.extents = rollback_to->obs.oi.extents be sufficient?

@@ -5091,6 +5100,7 @@ void object_info_t::dump(Formatter *f) const
f->dump_unsigned("expected_write_size", expected_write_size);
f->dump_unsigned("alloc_hint_flags", alloc_hint_flags);
f->dump_object("manifest", manifest);
f->dump_stream("logical_extents") << extents;

May as well make this structured: open_array_section("extents"), open_object_section("extent"), dump_unsigned("offset", ...), ...
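The structured dump proposed here could follow the pattern below. `ToyFormatter` is a toy recorder written for this example that only borrows the method names from the comment; it is not Ceph's Formatter class. The `"length"` key is an assumption, since the comment elides the fields after `"offset"`.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy recorder with the method names mentioned in the review comment.
// It only logs the calls it receives; it is NOT ceph::Formatter.
struct ToyFormatter {
  std::vector<std::string> calls;
  void open_array_section(const std::string& name) {
    calls.push_back("[" + name);
  }
  void open_object_section(const std::string& name) {
    calls.push_back("{" + name);
  }
  void dump_unsigned(const std::string& name, uint64_t v) {
    calls.push_back(name + "=" + std::to_string(v));
  }
  void close_section() { calls.push_back("close"); }
};

// Emit one object per extent inside an "extents" array.
void dump_extents(ToyFormatter& f,
                  const std::map<uint64_t, uint64_t>& extents) {
  f.open_array_section("extents");
  for (const auto& e : extents) {
    f.open_object_section("extent");
    f.dump_unsigned("offset", e.first);
    f.dump_unsigned("length", e.second);  // key name is an assumption
    f.close_section();  // extent
  }
  f.close_section();  // extents
}
```

Compared with streaming the whole set into one string, this gives consumers of the JSON or XML output individually addressable offset and length fields.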

copy_reply.extents.begin();
p != copy_reply.extents.end(); ++p) {
out_extents->insert(p.get_start(), p.get_len());
}

Again, why not use operator= here?

@xiexingguo xiexingguo force-pushed the wip-object-logic-size branch 2 times, most recently from 281be43 to 30055e0 Compare September 26, 2017 07:07
@xiexingguo
Member Author

retest this please

@xiexingguo
Member Author

xiexingguo commented Sep 27, 2017

@xiexingguo
Member Author

changeset:
fix a typo in comment

@xiexingguo xiexingguo force-pushed the wip-object-logic-size branch 3 times, most recently from 7349b26 to b4b5c54 Compare September 28, 2017 10:22
@xiexingguo
Member Author

There are suspicious scrub errors; this needs more testing and investigation.

E.g.:
subset_of([5~10,20~5], 0, 100)  -> [5~10,20~5]
subset_of([5~10,20~5], 5, 25)   -> [5~10,20~5]
subset_of([5~10,20~5], 1, 10)   -> [5~5]
subset_of([5~10,20~5], 8, 24)   -> [8~7,20~4]

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
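The four subset_of examples above can be reproduced with a small sketch. It assumes, based only on those examples, that the second and third arguments are a half-open range [start, end) and that `a~b` means offset `a`, length `b`. The map-based `ExtentMap` is a stand-in for illustration, not Ceph's interval_set.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

using ExtentMap = std::map<uint64_t, uint64_t>;  // start offset -> length

// Keep only the parts of `in` that fall inside [start, end).
ExtentMap subset_of(const ExtentMap& in, uint64_t start, uint64_t end) {
  ExtentMap out;
  for (const auto& e : in) {
    const uint64_t lo = std::max(e.first, start);
    const uint64_t hi = std::min(e.first + e.second, end);
    if (lo < hi) {
      out[lo] = hi - lo;  // clipped interval is non-empty
    }
  }
  return out;
}
```

For instance, clipping [5~10,20~5] to [8, 24) trims 5~10 (bytes 5..14) down to 8~7 and 20~5 (bytes 20..24) down to 20~4, matching the last example.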
To test this change, we create a 5GB image and run an rbd bench write of 100MB:
./bin/rbd create bar -s 5120 && ./bin/rbd bench --io-type write --io-size 32K --io-total 100M --io-pattern rand  rbd/bar

Below is the test result.

Was:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27052M        3859M         12.49
POOLS:
    NAME                  ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                   0      3191M     26.36         8914M        1174
    cephfs_data_a         1          0         0         8914M           0
    cephfs_metadata_a     2       2246         0         8914M          21

Now:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27050M        3861M         12.49
POOLS:
    NAME                  ID     USED        %USED     MAX AVAIL     OBJECTS
    rbd                   0      101216k      1.10         8913M        1178
    cephfs_data_a         1            0         0         8913M           0
    cephfs_metadata_a     2          892         0         8913M          21

E.g., this change can make "osd pool set-quota max_bytes" work nicely.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Normal reads support trimmed read length, and so shall checksums!

This fixes occasional failures of the rados/thrash test scripts, e.g.:
(1) create an object using WriteOp with a randomly generated length
(2) normal writes may be accompanied by a TruncOp with a randomly chosen truncate_size
(3) for ReadOp, pick a random 'length' to read, and simultaneously checksum
    the same range ([0, 'length']).

Since the 'length' for reading is randomly chosen, it might
exceed the current object size, causing an EOVERFLOW error.

Related issues:
http://qa-proxy.ceph.com/teuthology/xxg-2017-09-22_01:52:47-rados-wip-object-logic-size-distro-basic-smithi/1657337
http://qa-proxy.ceph.com/teuthology/xxg-2017-09-22_14:14:19-rados-wip-object-logic-size-distro-basic-smithi/1658015

Fix the above problems by trimming the checksum length the same way normal reads do.
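The trimming this commit message refers to can be pictured as clamping the requested range to the object size before checksumming, instead of failing with EOVERFLOW. The sketch below is an assumption about the shape of that logic; `trimmed_length` is a hypothetical name and the actual OSD code is not shown here.

```cpp
#include <cstdint>

// Clamp a requested [offset, offset + length) range to the object size.
// Returns the number of bytes that can actually be read or checksummed.
uint64_t trimmed_length(uint64_t object_size, uint64_t offset,
                        uint64_t length) {
  if (offset >= object_size) {
    return 0;  // range starts at or past the end of the object
  }
  const uint64_t available = object_size - offset;
  return length < available ? length : available;
}
```

With a 100-byte object, a request for 200 bytes at offset 0 is trimmed to 100 bytes, and a request for 20 bytes at offset 90 is trimmed to 10.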

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
@xiexingguo
Member Author

retest this please

@xiexingguo xiexingguo merged commit cd6b983 into ceph:master Sep 30, 2017
@xiexingguo xiexingguo deleted the wip-object-logic-size branch September 30, 2017 06:50
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Oct 3, 2017
…_info_t

Introduced-by: ceph#15199
Fixes: http://tracker.ceph.com/issues/21618
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Mar 29, 2018
It was originally introduced in ceph#15199,
aiming at improving the pool-based **du** stats.
For performance reasons, ceph#19616 did
an incomplete revert of that PR, hence this clean-up job...

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Mar 29, 2018
It was originally introduced in ceph#15199,
aiming at improving the pool-based **du** stats.
For performance reasons, ceph#19616 did
an incomplete revert of that PR, hence the following clean-up job...

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>