New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: add writesame API #12645

Merged
merged 14 commits into from Feb 28, 2017

Conversation

Projects
None yet
2 participants
@guihecheng
Contributor

guihecheng commented Dec 23, 2016

The writesame support for librbd base on LiumxNL's pr: #10019

I rebased the code on the current master branch and drive it to work.
Changes:

  1. drop writesame2 as suggested
  2. keep "writesame" semantics on image request level
    (use librados::write instead of librados::writesame, I think normal write is just fine since librados::writesame also has the limit that write_len % data_len == 0 for an object request, which may not be statisfied when we split our image request into object requests.)
  3. make it workable under the current code base, because there are many changes after the original pr.

ToDo:

  1. add more tests: fsx etc.
  2. make writesame passthrough the cache, as suggested by jason
@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Dec 27, 2016

Oh, I think I've made a mistake on point 2.
If we do not use librados::writesame, librbd::writesame will lost the "offload" benifit completely which is right just the core value of writesame operation. :(

The main difficulty in using the librados::writesame is how to handle the inevitable data_buf across two adjacent objectextents. split into 2 object requests? or offload it to librados::writesame by passing an extra parameter? I shall figure it out, anyone has suggestions or other comments?

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 3, 2017

@guihecheng I think any writesame object extents that are not properly aligned should be submitted as standard writes. In reality, I suspect most users are going to send small (512byte or 4Kbyte) writesame requests to initialize an image, so splitting up the unaligned sections and sending them off is not a big deal. Of course, it would be really nice to detect a writesame of all zeros and just issue a discard instead since ESX issues writesame requests to zero out new VM images (and why waste the space and send zeroes when we don't have to do that).

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Jan 4, 2017

@dillaman Oh, I got it, writesame is mostly (if not only) needed by iscsi targets, and the data block passed along should be expected to be small.(Splitting object extents makes things a bit complex as I tried :( )

Detecting zeroing is definite, and I think it shall be detected very early in the writesame path.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 10, 2017

@guihecheng Sorry -- just noticed you had posted an update (I don't get automatic notifications when an update is pushed). I'll review ASAP.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 10, 2017

Can you update "test/librbd/fsx.cc" to exercise the new writesame API method? Just have it select the writesame op for same random small percent of ops.

bool write_full = (m_object_off == 0 && m_object_len == m_ictx->get_object_size());
ldout(m_ictx->cct, 20) << "send_write " << this << " " << m_oid << " "
<< m_object_off << "~" << m_object_len
<< " object exist " << m_object_exist

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Nit: no need to write m_object_exist to debug log since it won't impact the logic of this function

This comment has been minimized.

@guihecheng

guihecheng Jan 11, 2017

Contributor

yes

@@ -212,6 +212,14 @@ void AioImageRequest<I>::aio_write(I *ictx, AioCompletion *c,
}
template <typename I>
void AioImageRequest<I>::aio_writesame(I *ictx, AioCompletion *c,
uint64_t off, uint64_t len, const char *buf,

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Nit: spacing

journal::EventEntry event_entry(journal::AioWriteSameEvent(offset, length,
bl));
tid = image_ctx.journal->append_io_event(std::move(event_entry), requests,
offset, length, synchronous);

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Nit: spacing

class AioImageWriteSame : public AbstractAioImageWrite<ImageCtxT> {
public:
AioImageWriteSame(ImageCtxT &image_ctx, AioCompletion *aio_comp, uint64_t off,
uint64_t len, const char *buf, uint64_t data_len, int op_flags)

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Nit: spacing

AioImageWriteSame(ImageCtxT &image_ctx, AioCompletion *aio_comp, uint64_t off,
uint64_t len, const char *buf, uint64_t data_len, int op_flags)
: AbstractAioImageWrite<ImageCtxT>(image_ctx, aio_comp, {{off, len}}),
m_buf(buf), m_data_len(data_len), m_off(0), offset(off), length(len),

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Nit: spacing

@@ -68,6 +68,31 @@ ssize_t AioImageRequestWQ::write(uint64_t off, uint64_t len, const char *buf,
return len;
}
ssize_t AioImageRequestWQ::writesame(uint64_t off, uint64_t len, const char *buf,
uint64_t data_len, int op_flags) {

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Nit: spacing

@@ -157,6 +182,33 @@ void AioImageRequestWQ::aio_write(AioCompletion *c, uint64_t off, uint64_t len,
}
}
void AioImageRequestWQ::aio_writesame(AioCompletion *c, uint64_t off, uint64_t len,
const char *buf, uint64_t data_len, int op_flags,

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Nit: spacing

@@ -613,6 +613,10 @@ CEPH_RBD_API ssize_t rbd_write(rbd_image_t image, uint64_t ofs, size_t len,
*/
CEPH_RBD_API ssize_t rbd_write2(rbd_image_t image, uint64_t ofs, size_t len,
const char *buf, int op_flags);
CEPH_RBD_API ssize_t rbd_writesame(rbd_image_t image, uint64_t off, size_t write_len,
const char *buf, size_t data_len, int op_flags);

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Nit: spacing

if (len == 0) {
ictx->snap_lock.get_read();
len = ictx->get_image_size(ictx->snap_id) - ofs;

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

Why would we want to default the write length to the (remaining) image size instead of letting the client tell us the length?

This comment has been minimized.

@guihecheng

guihecheng Jan 11, 2017

Contributor

yes, it's not reasonable to use image size when 'len' is given a zero exactly by caller.

@@ -174,6 +174,10 @@ void mirror_image_status_cpp_to_c(const librbd::mirror_image_status_t &cpp_statu
c_status->up = cpp_status.up;
}
bool all_zero(const char *buf, uint64_t len) {

This comment has been minimized.

@dillaman

dillaman Jan 10, 2017

Contributor

See mem_is_zero in "include/inline_memory.h"

This comment has been minimized.

@guihecheng

guihecheng Jan 11, 2017

Contributor

thanks for your reminding.

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Jan 11, 2017

@dillaman thanks for your comments and sorry for the spacing problems.
I'll fix problems and add a fsx case for writesame in the next round.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 11, 2017

@guihecheng No problem -- appreciate the contribution

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Jan 20, 2017

@dillaman hi, jason, I've updated things:

  1. handle stripping for writesame, the handle is different from before. I use an extra bufferlist m_cache_bl to hold the whole writesame buffer, so we could get rid of the m_data_offset and directly use buffer_extent's offset and length to get the right bl. Then we check each object_extent to see if we could do librados::writesame or not.
    For stripping, now we only do writesame under 'su % m_data_len == 0', if odd stripping is encountered(I think it should be a rare case), just do a normal write. I think callers of librbd should wisely choose a 'friendly' stripping strategy when writesame is to be done.
  2. add a fsx test for writesame(a new option -G is introduced to specify stripe_unit, so we can test writesame on stripping).
  3. other fixes: handle perfcounter for writesame separately; fix spacings.

feedbacks appreciated :)

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 20, 2017

@guihecheng Can you add a new teuthology test case for running fsx w/ writesame enabled? Just create a new "fsx_writesame.yaml" file here [1]. You will need to tweak the fsx task runner [2] to add support for the new command-line option.

[1] https://github.com/ceph/ceph/tree/master/qa/suites/rbd/librbd/workloads
[2] https://github.com/ceph/ceph/blob/master/qa/tasks/rbd_fsx.py#L71

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Jan 22, 2017

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Jan 23, 2017

@dillaman I think it shall be enough to add the new '-G' option to the fsx task runner. The '-G' option just gives a way to specify stripe_unit & stripe_count to test writesame on stripped images, not a switch to turn on/off writesame for fsx.
Now writesame is just a normal operation that may be selected randomly for execution.
So when we run 'rbd_fsx' task, writesame is already included. If we want to test writesame on stripped images, just add 'given_stripping: "64K:16"' to the fsx.yaml.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 23, 2017

@guihecheng I don't think the -G option is that useful since fsx already randomly tests various striping settings. Locking in to a fixed stripe setting would reduce the test coverage. As for writesame being randomly selected, just ensure that it isn't randomly selected when in krbd / nbd mode (haven't reviewed the code yet, might already be handled).

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 23, 2017

retest this please

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 23, 2017

@guihecheng FYI - recent merged changes to the IO path have resulted in a build failure

if (randomoplen)
size = get_random() % (maxoplen + 1);
if (randomoplen) {
data_size = get_random() % (maxoplen / wsf);

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

This change would affect all other ops and align them based on writesame limits. I would suggest following the example of writebdy and tweak the size directly within the dowritesame method.

This comment has been minimized.

@guihecheng

guihecheng Jan 24, 2017

Contributor

ok. (actually I am trying best to make writesame operations aligned.)

This comment has been minimized.

@dillaman

dillaman Jan 24, 2017

Contributor

Definitely good to verify that unaligned writesames are functional

@@ -47,6 +47,17 @@ struct AioImageRequest<MockReplayImageCtx> {
s_instance->aio_flush(c);
}
MOCK_METHOD6(aio_writesame, void(AioCompletion *c, uint64_t off,
uint64_t len, const char *buf, uint64_t data_len,

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

Nit: spacing

@@ -30,6 +30,8 @@ struct MockImageCache {
MOCK_METHOD3(aio_discard, void(uint64_t, uint64_t, Context *));
MOCK_METHOD1(aio_flush, void(Context *));
MOCK_METHOD6(aio_writesame, void(uint64_t, uint64_t, const char*, uint64_t,
int, Context *));

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

Nit: spacing

AioCompletion *aio_comp = AioCompletion::create_and_start(
&aio_comp_ctx, ictx, AIO_TYPE_WRITESAME);
MockAioImageWriteSame mock_aio_image_writesame(mock_image_ctx, aio_comp,
0, 1, "1", 1, 0);

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

Nit: spacing

m_buf(buf), m_data_len(data_len), m_op_flags(op_flags),
may_writesame(false) {
for (int i = 0; i < len / data_len; i++)
m_cache_bl.append(buf, data_len);

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

Shouldn't need to allocate space if the cache isn't in use (e.g. tcmu-runner will have the cache disabled and is the primary use-case). Also, if I writesame 512bytes over 4TB of image, this is going to be very bad.

This comment has been minimized.

@guihecheng

guihecheng Jan 24, 2017

Contributor

yes, a huge bl will it be on this case. I shall eliminate the badness.

template <typename I>
bool AioImageWriteSame<I>::check_may_writesame(
const ObjectExtents &object_extents) {

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

Nit: spacing

uint64_t m_data_len;
int m_op_flags;
bufferlist m_cache_bl;
bool may_writesame;

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

Member variables should be prefixed w/ m_

This comment has been minimized.

@guihecheng

guihecheng Jan 24, 2017

Contributor

ok

}
assemble_write_extent(object_extent, &bl);
req = AioObjectRequest<I>::create_write(&image_ctx,
object_extent.oid.name, object_extent.objectno,

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

Nit: spacing

if (may_writesame) {
if (assemble_writesame_extent(object_extent, &bl)) {
req = AioObjectRequest<I>::create_writesame(&image_ctx,
object_extent.oid.name, object_extent.objectno,

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

Nit: spacing

return;
}
if (size % data_size != 0) {

This comment has been minimized.

@dillaman

dillaman Jan 23, 2017

Contributor

If you generated the data_size here (tweaking the size if needed) you could eliminate this condition.

This comment has been minimized.

@guihecheng

guihecheng Jan 24, 2017

Contributor

only if the data_size generated is 'good' enough, I shall tweak it.

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Jan 24, 2017

@dillaman I checked the building details above, but I don't think it is directly related to my changes. I will do a rebase in next round and check the patch spacings again.
If we do not specify stripe_unit & stripe_count with '-G' option, then we may never hit the writesamable code path(su % data_len == 0, as explained in my comment above) on stripped images even after thousands of random rounds of operations. Actually, I am just right considering test coverage on code paths.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 24, 2017

@guihecheng There are other ways to exercise that code path besides hard-coding specific stripe settings. Of course, I'd still like to avoid that check_may_writesame logic

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Jan 26, 2017

@dillaman OK, I've got a better idea to check object writesame without the use of the check_may_writesame. I'll cook it out in the next round.

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Feb 7, 2017

@dillaman hi, jason, I've refactored the object extent writesame logic with the use of the offset & len of a 'buffer_extent'. And the '-G' option is kicked out since now we do not have to check specifically for stripe_unit. (somewhat delayed to fix a bug when testing with fsx:#13287)

@dillaman

This comment has been minimized.

Contributor

dillaman commented Feb 13, 2017

retest this please

unsigned good = 1;
unsigned curr = good;
srandom(time(NULL));

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

random has already been initialized -- no need to keep re-initializing it. In fact, for test repeatability we rely on the fact that we can pass the seed value and recreate a test run exactly.

This comment has been minimized.

@guihecheng

guihecheng Feb 14, 2017

Contributor

@dillaman oh, yes, I shall use the random_generator instead.

return true;
}
uint64_t sub_off, sub_len, extent_left;

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

Nit: use tightest variable scope as possible

This comment has been minimized.

@guihecheng

guihecheng Feb 14, 2017

Contributor

Yes, Inside the loop.

sub_len = m_data_len;
}
}
if (extent_left)

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

Nit: braces around one-line if statements

bufferlist bl;
if (assemble_writesame_extent(object_extent, &bl)) {
for (int i = 0; i < object_extent.length / m_data_len - 1; i++)

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

Nit: braces around single-line for loop

bufferlist bl;
if (assemble_writesame_extent(object_extent, &bl)) {
for (int i = 0; i < object_extent.length / m_data_len - 1; i++)

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

Nit: size_t or uint32_t instead of int

bufferlist bl;
if (assemble_writesame_extent(object_extent, &bl)) {
for (int i = 0; i < object_extent.length / m_data_len - 1; i++)

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

If object extent == data length, it won't append any data to the bufferlist

This comment has been minimized.

@guihecheng

guihecheng Feb 14, 2017

Contributor

@dillaman Actually if assemble_writesame_extent returns true, bl will have exactly one copy of m_buf for writesame, no need to append anything on object_extent.length == m_data_len, so there is a '-1'.

This comment has been minimized.

@dillaman

dillaman Feb 14, 2017

Contributor

Ack -- missed that the bl is passed into the assemble_writesame_extent method. A short comment would be helpful.

This comment has been minimized.

@guihecheng

guihecheng Feb 14, 2017

Contributor

@dillaman then, what about:
// fill cache bl with (N -1) copies that is optimized out for writesame.
it goes right above the for loop.

This comment has been minimized.

@dillaman

dillaman Feb 14, 2017

Contributor

or -- perhaps the cleaner approach is to add a new "bool force_write" param to assemble_writesame_extent that disables the writesame for this cache case?

This comment has been minimized.

@guihecheng

guihecheng Feb 14, 2017

Contributor

@dillaman understood, it's workable and cleaner, thanks.

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Feb 14, 2017

@dillaman 'force_write' adopted and retested under several random seeds.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Feb 14, 2017

retest this please

@dillaman

This comment has been minimized.

Contributor

dillaman commented Feb 14, 2017

The ceph_test_librbd application is crashing during teuthology test runs [1]. Also -- the fsx test run via the nbd driver crashed (probably because it doesn't support writesame).

[1] http://pulpito.ceph.com/jdillaman-2017-02-14_10:15:35-rbd-wip-jd-testing-distro-basic-smithi/

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Feb 15, 2017

@dillaman after take a look at the logs, I found 3 problems in total:

  1. ceph_test_librbd_fsx segfault -- due to nbd writesame not implemented as you said, will fix it
  2. ceph_test_librbd failed on buffer all zero case -- because we are doing discard actually, so read returns random data while all zero expected, so we modify the expected result of the test or just drop the discard logic?
  3. ceph_test_librbd segfault -- I think it is not related to this series because it segfaults on the current master as well, I am bisecting the commits, hoping to find the cause soon.
@dillaman

This comment has been minimized.

Contributor

dillaman commented Feb 15, 2017

@guihecheng

ceph_test_librbd_fsx segfault -- due to nbd writesame not implemented as you said, will fix it

The krbd test case would have the same issue.

ceph_test_librbd failed on buffer all zero case -- because we are doing discard actually, so read returns random data while all zero expected, so we modify the expected result of the test or just drop the discard logic?

I assume that failure was hit when "skip_partial_discard" was enabled? If so, I would modify the internal discard IO operation to take an extra param if partial discards should be skipped. In the normal librbd discard API methods, it would pass the value in ImageCtx::skip_partial_discard whereas writesame would always pass false.

ceph_test_librbd segfault -- I think it is not related to this series because it segfaults on the current master as well, I am bisecting the commits, hoping to find the cause soon.

The daily teuthology test runs against master are not showing any current seg faults on the master branch.

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Feb 20, 2017

@dillaman hi, jason, I add a skip case for fsx and a pass-in param to make writesame get all zeros with rbd_skip_partial_discard = true. The segfault is due to this option as well, easy to reproduce.

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Feb 21, 2017

sorry, conflicting, I'm rebasing.

@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Feb 22, 2017

@dillaman hi, jason, patches have been rebased, using 2 extra commits just to make the changes more clear for reviewing.

@@ -511,7 +522,7 @@ template <typename I>
void ImageDiscardRequest<I>::prune_object_extents(ObjectExtents &object_extents) {
I &image_ctx = this->m_image_ctx;
CephContext *cct = image_ctx.cct;
if (!cct->_conf->rbd_skip_partial_discard) {
if (!cct->_conf->rbd_skip_partial_discard || this->m_force_discard) {

This comment has been minimized.

@dillaman

dillaman Feb 22, 2017

Contributor

Sorry -- thinking about this a little more, I think that the ImageDiscardRequest should just take a "skip_partial_discard" boolean instead of "force" discard.

There is already a tracker ticket for recording the state of "skip partial discard" in the journal event [1], so seeing your change to record the "force discard" in the journal event reminded me of the issue.

The "writesame" API methods would always pass "false" -- other callers to discard would pass the contents of "ImageCtx::skip_partial_discard" (which would need to be added like other config params are stored in ImageCtx).

[1] http://tracker.ceph.com/issues/16571

This comment has been minimized.

@guihecheng

guihecheng Feb 23, 2017

Contributor

oh, yes, I didn't find the ImageCtx::skip_partial_discard which is missing, will update to add it.

@@ -28,17 +28,21 @@ class ImageRequestWQ : protected ThreadPool::PointerWQ<ImageRequest<ImageCtx> >
ssize_t read(uint64_t off, uint64_t len, ReadResult &&read_result,
int op_flags);
ssize_t write(uint64_t off, uint64_t len, bufferlist &&bl, int op_flags);
int discard(uint64_t off, uint64_t len);
int discard(uint64_t off, uint64_t len, bool force_discard=false);

This comment has been minimized.

@dillaman

dillaman Feb 22, 2017

Contributor

Nit: avoid defaulting the param and instead just fix the compile errors. Especially important given the comment re: changing the logic to just pass "skip partial discard" bool around instead of "force"

void aio_read(AioCompletion *c, uint64_t off, uint64_t len,
ReadResult &&read_result, int op_flags, bool native_async=true);
void aio_write(AioCompletion *c, uint64_t off, uint64_t len,
bufferlist &&bl, int op_flags, bool native_async=true);
void aio_discard(AioCompletion *c, uint64_t off, uint64_t len,
bool native_async=true);
bool native_async=true, bool force_discard=false);

This comment has been minimized.

@dillaman

dillaman Feb 22, 2017

Contributor

Nit: same comment here re: defaulted force discard bool

@@ -71,16 +71,19 @@ class DumpVisitor : public boost::static_visitor<void> {
void AioDiscardEvent::encode(bufferlist& bl) const {
::encode(offset, bl);
::encode(length, bl);
::encode(force_discard, bl);

This comment has been minimized.

@dillaman

dillaman Feb 22, 2017

Contributor

You need to increment the encode/decode versions when changing structs.

[1] https://github.com/ceph/ceph/blob/master/src/librbd/journal/Types.cc#L321

}
void AioDiscardEvent::decode(__u8 version, bufferlist::iterator& it) {
::decode(offset, it);
::decode(length, it);
::decode(force_discard, it);

This comment has been minimized.

@dillaman

dillaman Feb 22, 2017

Contributor

Only decode this param if version >= 5 (the new struct version)

Gui Hecheng added some commits Feb 21, 2017

Gui Hecheng
librbd: add writesame AioObjectRequest
Based on pr: #10019.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
Gui Hecheng
librbd/journal: handle writesame event
Based on pr: #10019.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
Gui Hecheng
librbd: add writesame AioImageRequest
Based on pr: #10019.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
Gui Hecheng
librbd: handle writesame request in AioImageRequestWQ
Based on pr: #10019.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
Gui Hecheng
librbd: implement writesame API
Based on pr: #10019.
This drops the original writesame2 api.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
Gui Hecheng
librbd: writesame do discard on an all-zero buffer
Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Gui Hecheng
librbd: handle perfcounter for writesame separately
Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Gui Hecheng
test/librbd: handle writesame op in librados_test_stub, required
Based on pr: #10019.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
Gui Hecheng
test/librbd: handle writesame op in test_mock_Replay
Based on pr: #10019.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
Gui Hecheng
test/librbd: add tests for rbd writesame
Based on pr: #10019.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
Gui Hecheng
test/librbd: add writesame test for fsx
Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Gui Hecheng
librbd: make writesame get all zeros with rbd_skip_partial_discard on
Now writesame tends to do a discard on all zero buffer, but
if rbd_skip_partial_discard is turned on, discard will not do any
zeroing work on partial discard, so we read random data while expecting
all zeros.
To fix it, we introduce the ImageCtx::skip_partial_discard and
pass an extra parameter to discard. Writesame will always pass false
and other discards will pass the ImageCtx::skip_partial_discard
which is set by the rbd_skip_partial_discard option.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Gui Hecheng
librbd: adopt override for writesame related header files
Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
Gui Hecheng
test/librbd: skip expect object request send when skip_partial_discar…
…d on

With skip_partial_discard on, there may be no object request send calls,
because they are skipped.

Signed-off-by: Gui Hecheng <guihecheng@cmss.chinamobile.com>
@guihecheng

This comment has been minimized.

Contributor

guihecheng commented Feb 24, 2017

@dillaman hi, jason, the skip_partial_discard is adopted, thanks.

@dillaman

lgtm

@dillaman dillaman changed the title from Wip rbd writesame api to librbd: add writesame API Feb 27, 2017

@dillaman dillaman merged commit 4756ce6 into ceph:master Feb 28, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@guihecheng guihecheng deleted the guihecheng:wip-rbd-writesame-api branch Feb 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment