New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: import real thin-provision image #12883

Merged
merged 4 commits into from Apr 4, 2017

Conversation

Projects
None yet
5 participants
@mslovy
Contributor

mslovy commented Jan 11, 2017

currently, import check whether object is zero based on the image order (object size)
therefore, holes inside object will be filled with zero and hence waste spaces for the cluster.

@mslovy

This comment has been minimized.

Contributor

mslovy commented Jan 11, 2017

@ceph-jenkins retest this please

@@ -81,6 +82,7 @@ static int do_import(librbd::RBD &rbd, librados::IoCtx& io_ctx,
// try to fill whole imgblklen blocks for sparsification
uint64_t image_pos = 0;
size_t imgblklen = 1 << order;
size_t pagesize = CEPH_PAGE_SIZE;

This comment has been minimized.

@dillaman

dillaman Jan 12, 2017

Contributor

I think this should be configurable via a cli optional (i.e. -S / --sparse-size) -- I'm fine w/ defaulting to 4k, but don't use the page size since that changes on different architectures

size_t p_offset = 0;
size_t p_shift = 0;
while (b_end < blklen) {
if (b_end + pagesize > blklen)

This comment has been minimized.

@dillaman

dillaman Jan 12, 2017

Contributor

Nit: braces around one line if/else clauses

bufferptr ptr(blk_ptr, p_offset, p_shift);
p_offset += p_shift;
b_end += p_shift;
if (!ptr.is_zero())

This comment has been minimized.

@dillaman

dillaman Jan 12, 2017

Contributor

Nit: braces

b_end += p_shift;
if (!ptr.is_zero())
b_length += p_shift;
if ((ptr.is_zero() && b_length != 0) || b_end == blklen) {

This comment has been minimized.

@dillaman

dillaman Jan 12, 2017

Contributor

Nit: cache result of is_zero (since it isn't constant time op) or restructure the if blocks.

size_t b_length = 0;
size_t b_end = 0;
size_t p_offset = 0;
size_t p_shift = 0;

This comment has been minimized.

@dillaman

dillaman Jan 12, 2017

Contributor

Nit: move to local variable within the while block

@mslovy

This comment has been minimized.

Contributor

mslovy commented Jan 19, 2017

@dillaman , refine based on the comment. Also add it to ImportDiff.
I think it may also be needed for rbd cp, do you think so?

So we add --sparse_size option into image_options or image_specs_options? But we also need to pass the sparse_size params into librbd::copy so that it becomes librbd::copy( .... , size_t sparse_size = 0). is it reasonable?

@@ -815,6 +815,31 @@ int snap_set(librbd::Image &image, const std::string &snap_name) {
return 0;
}
bool calc_sparse_extent(const bufferptr &bp, size_t sparse_size, uint64_t length,
size_t &write_offset, size_t &write_length,

This comment has been minimized.

@yangdongsheng

yangdongsheng Jan 19, 2017

Member

I found the write_offset is not user in this function at all. do we really need this parameter?

This comment has been minimized.

@mslovy

mslovy Jan 23, 2017

Contributor

Oh, I must miss something when move the calc logic into the function. Thanks

bool calc_sparse_extent(const bufferptr &bp, size_t sparse_size, uint64_t length,
size_t &write_offset, size_t &write_length,
size_t &pos, size_t &offset) {
size_t shift;

This comment has been minimized.

@yangdongsheng

yangdongsheng Jan 19, 2017

Member

do you mean a size actually? I mean, when I see the variable of shift, I guess you are going to do a 'shift', such as ">>" or "<<". but actually you mean a size. I think you can make it more clear.

This comment has been minimized.

@mslovy

mslovy Jan 23, 2017

Contributor

urh, so any suggestions? shift_size?

This comment has been minimized.

@yangdongsheng

yangdongsheng Jan 23, 2017

Member

maybe extent_size?

write_length += shift;
}
if ((page_is_zero && write_length != 0) || pos == length) {

This comment has been minimized.

@yangdongsheng

yangdongsheng Jan 19, 2017

Member

how can we get a (page_is_zero && write_length != 0) ? I think when the page_is_zero is true, the write_length is always not 0. In addition why we return True when (pos == length)? maybe a comment here is great.

This comment has been minimized.

@mslovy

mslovy Jan 20, 2017

Contributor

No,when data ---> hole, we need a write. when reaching the end of the bl, we need a write any way. Please rethink it.

This comment has been minimized.

@yangdongsheng

yangdongsheng Jan 21, 2017

Member

Sorry for the typo in my last comment, I mean "if the page_is_zero is false the write_length is always not 0". Then it should be that:

/* Do a write for this extent when it is not zero or it is the last extent. */
if (!page_is_zero || pos == length) {
  return true;
}

Am I understanding it correctly? thanx

This comment has been minimized.

@yangdongsheng

yangdongsheng Jan 21, 2017

Member

BTW, can we use extent to replace page ?

This comment has been minimized.

@mslovy

mslovy Jan 23, 2017

Contributor

"extent" may make sense for me, thanks.

if continue holes or holes at the beginning, write_length is possible to be zero, right? Any other considerations?

This comment has been minimized.

@yangdongsheng

yangdongsheng Jan 23, 2017

Member

Yes, write_length is possible to be zero. but in that case, the page_is_zero is true, we will return a false. Is that correct behaviour of it? Let me try another question: is this logical correctly from your POV:

/* Do a write for this extent when it is not zero or it is the last extent. */
if (!page_is_zero || pos == length) {
  return true;
}

This comment has been minimized.

@mslovy

mslovy Jan 23, 2017

Contributor

absolutely, we don't want to submit holes...

@yangdongsheng

This comment has been minimized.

Member

yangdongsheng commented Jan 19, 2017

@mslovy please modify the help.t accordingly, and could you add a test case in qa/workunits/rbd/import_export.sh for the new feature? thanx

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 19, 2017

@mslovy There is an existing tracker ticket for sparsifying the copy operation (http://tracker.ceph.com/issues/15648). If you work on it as part of this PR, please add the "Fixes: " to that commit. As for updating the API, unfortunately you cannot update existing API methods. You would need to add *copy4 / *copy_with_progress4 methods.

@@ -293,6 +293,12 @@ void add_size_option(boost::program_options::options_description *opt) {
"image size (in M/G/T)");
}
void add_sparse_size_option(boost::program_options::options_description *opt) {
opt->add_options()
(IMAGE_SPARSE_SIZE.c_str(), po::value<ImageSparseSize>(),

This comment has been minimized.

@dillaman

dillaman Jan 19, 2017

Contributor

I would just vote to re-use ImageObjectSize here so that you don't need to create another validator that does the same thing.

@@ -815,6 +815,31 @@ int snap_set(librbd::Image &image, const std::string &snap_name) {
return 0;
}
bool calc_sparse_extent(const bufferptr &bp, size_t sparse_size, uint64_t length,
size_t &write_offset, size_t &write_length,

This comment has been minimized.

@dillaman

dillaman Jan 19, 2017

Contributor

In/out params should use pointers, not references

@@ -254,6 +270,11 @@ int execute(const po::variables_map &vm) {
deprecated_image_name = path.substr(path.find_last_of("/") + 1);
}
size_t sparse_size = 4096;

This comment has been minimized.

@dillaman

dillaman Jan 19, 2017

Contributor

Nit: move 4096 to a constant that can be used here and in import-diff

@mslovy

This comment has been minimized.

Contributor

mslovy commented Feb 12, 2017

@yangdongsheng , do you think what we should add in qa/workunits/rbd/import_export.sh since default sparse_size is 4k so that it is automatically cover the case?

@mslovy

This comment has been minimized.

Contributor

mslovy commented Feb 12, 2017

@dillaman , updated, can you take a look, thanks.

@@ -640,6 +641,7 @@
--path arg import file (or '-' for stdin)
-p [ --pool ] arg pool name
--image arg image name
--sparse-size arg sparse size in B/K/M [defaut: 4K]

This comment has been minimized.

@yangdongsheng

yangdongsheng Feb 13, 2017

Member

please squash this commit to the original ones.

@yangdongsheng

This comment has been minimized.

Member

yangdongsheng commented Feb 13, 2017

@mslovy You added a new option for import/import-diff here, but we need to make sure this option works well in different values, such as 4M or 8k.

@dillaman

Need some unit tests to verify that the sparse function actually works -- i.e. modify one or more of the tests under "qa/workunits/rbd" to exercise sparse import, import-diff, and copy functions.

void add_sparse_size_option(boost::program_options::options_description *opt) {
opt->add_options()
(IMAGE_SPARSE_SIZE.c_str(), po::value<ImageObjectSize>(),
"sparse size in B/K/M [defaut: 4K]");

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

Nit: "defaut" --> "default"

@@ -2078,7 +2082,66 @@ void filter_out_mirror_watchers(ImageCtx *ictx,
bufferlist *m_bl;
};
int copy(ImageCtx *src, ImageCtx *dest, ProgressContext &prog_ctx)
class C_CopyReadSparse : public Context {

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

Nit: I'd prefer to see C_CopyRead and C_CopyReadSparse combined -- update calc_sparse_extent to handle the "sparse_size == 0" case

@@ -151,8 +152,8 @@ namespace librbd {
int snap_is_protected(ImageCtx *ictx, const char *snap_name,
bool *is_protected);
int copy(ImageCtx *ictx, IoCtx& dest_md_ctx, const char *destname,
ImageOptions& opts, ProgressContext &prog_ctx);
int copy(ImageCtx *src, ImageCtx *dest, ProgressContext &prog_ctx);
ImageOptions& opts, ProgressContext &prog_ctx, size_t sparse_size = 0);

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

Nit: avoid defaulted params here and below -- just update librbd.cc

@@ -15,6 +15,7 @@ namespace rbd {
namespace utils {
static const std::string RBD_DIFF_BANNER ("rbd diff v1\n");
static const size_t RBD_DEFAULT_PAGESIZE = 4096;

This comment has been minimized.

@dillaman

dillaman Feb 13, 2017

Contributor

Nit: rename to RBD_DEFAULT_SPARSE_SIZE

@mslovy

This comment has been minimized.

Contributor

mslovy commented Feb 25, 2017

@dillaman , updated as comment. can you take a look again, thanks.

@dillaman

I would like to see some test cases that actually demonstrate that the sparse size is being respected. For example, if you create an image with "--image-feature layering" and run "rbd disk-usage", it will give you an accurate usage. You could also compare the output from "rbd diff" to see that the diffs exactly match the non-zero extents.

@@ -358,6 +358,8 @@ CEPH_RBD_API int rbd_copy(rbd_image_t image, rados_ioctx_t dest_io_ctx,
CEPH_RBD_API int rbd_copy2(rbd_image_t src, rbd_image_t dest);
CEPH_RBD_API int rbd_copy3(rbd_image_t src, rados_ioctx_t dest_io_ctx,
const char *destname, rbd_image_options_t dest_opts);
CEPH_RBD_API int rbd_copy4(rbd_image_t src, rados_ioctx_t dest_io_ctx,
const char *destname, rbd_image_options_t dest_opts, size_t sparse_size);

This comment has been minimized.

@dillaman

dillaman Feb 27, 2017

Contributor

Nit: move size_t sparse_size down to the next line since >80 character line length

rados_ioctx_t dest_p,
const char *destname,
rbd_image_options_t dest_opts,
librbd_progress_fn_t cb, void *cbdata, size_t sparse_size);

This comment has been minimized.

@dillaman

dillaman Feb 27, 2017

Contributor

Nit: move size_t sparse_size down to the next line since >80 character line length

@@ -253,11 +253,14 @@ class CEPH_RBD_API Image
int copy(IoCtx& dest_io_ctx, const char *destname);
int copy2(Image& dest);
int copy3(IoCtx& dest_io_ctx, const char *destname, ImageOptions& opts);
int copy4(IoCtx& dest_io_ctx, const char *destname, ImageOptions& opts, size_t sparse_size);

This comment has been minimized.

@dillaman

dillaman Feb 27, 2017

Contributor

Nit: move size_t sparse_size down to the next line since >80 character line length

int copy_with_progress(IoCtx& dest_io_ctx, const char *destname,
ProgressContext &prog_ctx);
int copy_with_progress2(Image& dest, ProgressContext &prog_ctx);
int copy_with_progress3(IoCtx& dest_io_ctx, const char *destname,
ImageOptions& opts, ProgressContext &prog_ctx);
int copy_with_progress4(IoCtx& dest_io_ctx, const char *destname,
ImageOptions& opts, ProgressContext &prog_ctx, size_t sparse_size);

This comment has been minimized.

@dillaman

dillaman Feb 27, 2017

Contributor

Nit: move size_t sparse_size down to the next line since >80 character line length

bufferlist *bl)
: m_throttle(throttle), m_dest(dest), m_offset(offset), m_bl(bl) {
bufferlist *bl, size_t sparse_size)
: m_throttle(throttle), m_dest(dest), m_offset(offset), m_bl(bl), m_sparse_size(sparse_size) {

This comment has been minimized.

@dillaman

dillaman Feb 27, 2017

Contributor

Nit: move size_t sparse_size down to the next line since >80 character line length

std::move(*m_bl),
LIBRADOS_OP_FLAG_FADVISE_DONTNEED);
} else {
m_throttle->end_op(r);

This comment has been minimized.

@dillaman

dillaman Feb 27, 2017

Contributor

Don't end the op here -- that would result in too many concurrent operations. You can use a C_Gather context to associate multiple aio_write calls with a single, joined callback to this object.

This comment has been minimized.

@mslovy

mslovy Mar 16, 2017

Contributor

it seems C_Gather in Context.h is not exactly suitable here. C_CopyWrite cannot become C_GatherSub roles in C_GatherBase.

Here, actually, we need to do completions for each sub_write, right? While in C_GatherBase, sub_finish() just --sub_existing_count, and does not call actual finisher callback here.

This comment has been minimized.

@dillaman

dillaman Mar 16, 2017

Contributor

@mslovy The purpose of C_Gather is to associate the completion of multiple callbacks with the firing of a single callback. The C_Gather::new_sub() method should be to create new child subscriptions.

This comment has been minimized.

@mslovy

mslovy Mar 20, 2017

Contributor

sorry, I'm not catch up with you. Can you explain more specifically.

I know the purpose is to gather multipule C_CopyWrite callbacks as a single xxxx callback, and we want to call end_op() to release the throttle in the single callbacks.

But still, C_CopyWrite is need to call in order to release the corresponding bufferlist, in this case, as well as to decrease the count in C_Gather. Until count == 0 in C_Gather, automatically call the C_Gather callback to end_op(), am I right?

If though, current C_GatherBase in Context.h can do all those thing? or we need to use a new class inherit from C_GatherBase and add something new, such as override new_sub() to do this.

This comment has been minimized.

@dillaman

dillaman Mar 20, 2017

Contributor

@mslovy The easiest way to approach it would be to use lambdas (or inherit from C_Gather). If you use lambdas, you would do something like the following:

auto *end_op_ctx = new FunctionContext([m_throttle](int r) {
    m_throttle->end_op(r);
  });
auto gather_ctx = new C_Gather(m_dest->cct, end_op_ctx);

for (each chunk to write) {
  ... snip ...
  auto ctx = new C_CopyWrite(m_bl, gather_ctx->new_sub());
  auto comp = io::AioCompletion::create(ctx);
  ... snip ...
}

This allows multiple writes to be issued within the same "throttled" op slot, prevents starvation, prevents unbounded memory growth, etc.

bufferptr write_ptr(m_ptr, write_offset, write_length);
bufferlist *write_bl = new bufferlist();
write_bl->push_back(write_ptr);
m_throttle->start_op();

This comment has been minimized.

@dillaman

dillaman Feb 27, 2017

Contributor

Once you switch to a C_Gather and move the end_op down you shouldn't need this anymore

@@ -1850,29 +1851,65 @@ int mirror_image_disable_internal(ImageCtx *ictx, bool force,
}
assert(m_bl->length() == (size_t)r);
if (m_bl->is_zero()) {
if (!m_sparse_size) {

This comment has been minimized.

@dillaman

dillaman Feb 27, 2017

Contributor

Nit: for clarity, when m_sparse_size is zero, just assume it's set to the object size (1 << ictx->order) so that you can use the same code path and logic for both cases.

@mslovy

This comment has been minimized.

Contributor

mslovy commented Mar 24, 2017

updated

@dillaman

This comment has been minimized.

Contributor

dillaman commented Mar 27, 2017

@mslovy Nit: please update the subject line of "Fixes: fix issue #15648" to something along the lines of "librbd: add sparse copy support"

auto *end_op_ctx = new FunctionContext([this](int r) {
this->m_throttle->end_op(r);
});
C_GatherBuilder gather(m_dest->cct, end_op_ctx);

This comment has been minimized.

@dillaman

dillaman Mar 27, 2017

Contributor

auto gather_ctx = new C_Gather(m_dest->cct, end_op_ctx)

This comment has been minimized.

@mslovy

mslovy Mar 28, 2017

Contributor

why not use C_GatherBuilder to create C_Gather? for Nit or other some other purposes?

This comment has been minimized.

@dillaman

dillaman Mar 28, 2017

Contributor

For consistency, librbd already uses the mentioned approach vs using an intermediate.

}
}
delete m_bl;
assert(gather.has_subs());

This comment has been minimized.

@dillaman

dillaman Mar 27, 2017

Contributor

Might not have any subs if it was 100% sparse, correct?

This comment has been minimized.

@mslovy

mslovy Mar 28, 2017

Contributor

yeah, but if 100% sparse,
it will return at
if (m_bl->is_zero()) {
delete m_bl;
m_throttle->end_op(r);
return;
}

So I make this assertion here

This comment has been minimized.

@dillaman

dillaman Mar 28, 2017

Contributor

OK -- overlooked that part

if (!extent_is_zero) {
*write_length += extent_size;
}
if (extent_is_zero && *write_length == 0) {

This comment has been minimized.

@dillaman

dillaman Mar 27, 2017

Contributor

Nit: extra space before *write_length

@mslovy

This comment has been minimized.

Contributor

mslovy commented Mar 28, 2017

update

@mslovy mslovy closed this Mar 28, 2017

@dillaman dillaman reopened this Mar 28, 2017

@dillaman

lgtm

}
auto *end_op_ctx = new FunctionContext([this](int r) {
this->m_throttle->end_op(r);

This comment has been minimized.

@dillaman

dillaman Mar 30, 2017

Contributor

Cannot access this pointer since this instance most likely will have already been deleted by the time the C_CopyWrite contexts complete.

This comment has been minimized.

@mslovy

mslovy Mar 31, 2017

Contributor

C_CopyWrite complete will not lead to destroy C_CopyRead, C_CopyRead will be automatically destroyed after throttle.wait_for_ret() in copy()。

In my environment, it passes the tests when run the tests unittest_librbd individually, when run all test by using make check,it may be failed unexpectedly。

This comment has been minimized.

@dillaman

dillaman Mar 31, 2017

Contributor

This instance will be automatically deleted by the time it leaves this function -- that means the FunctionContext here will have a pointer to a deleted object, which it will try to dereference and it will fail. You need to make a copy of the m_throttle pointer and pass the copy of the pointer into the lambda.

@mslovy

This comment has been minimized.

Contributor

mslovy commented Mar 31, 2017

false positive now for cephtool-test-mon.sh?

@mslovy

This comment has been minimized.

Contributor

mslovy commented Mar 31, 2017

retest this please

@dillaman

The invalid memory access needs to be fixed

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Mar 31, 2017

the failure of cephtool-test-mon.sh:

/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:550: test_auth_profiles:  ceph -n client.xx-profile-ro -k client.xx.keyring pg dump

is addressed by #14266.

@mslovy

This comment has been minimized.

Contributor

mslovy commented Mar 31, 2017

oh, got it. after finish, we delete this in complete(), fix it.

@mslovy

This comment has been minimized.

Contributor

mslovy commented Mar 31, 2017

rebase to master in order to avoid issue #14266

@mslovy

This comment has been minimized.

Contributor

mslovy commented Apr 1, 2017

@tchaikov still false positive exists.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Apr 3, 2017

retest this please

m_dest->io_work_queue->aio_write(comp, m_offset, m_bl->length(),
std::move(*m_bl),
LIBRADOS_OP_FLAG_FADVISE_DONTNEED);
auto *m_throttle = this->m_throttle;

This comment has been minimized.

@dillaman

dillaman Apr 3, 2017

Contributor

Nit: m_throttle -> throttle since the "m_" prefix is only for member variables.

This comment has been minimized.

@mslovy

mslovy Apr 4, 2017

Contributor

ok, refined Nit.

yaoning added some commits Jan 19, 2017

yaoning
rbd: add sparse size support in rbd
add --sparse_size in rbd command options
add calc_sparse_extent in rbd utils

Signed-off-by: yaoning <yaoning@unitedstack.com>
yaoning
rbd: import/import-diff real thin-provision image
currently, import/import-diff check whether object is zero based on the whole object
therefore, holes inside object will be filled with zero

Signed-off-by: yaoning <yaoning@unitedstack.com>
yaoning
librbd: add sparse copy support
Fixes: fix issue #15648
http://tracker.ceph.com/issues/15648

Signed-off-by: yaoning <yaoning@unitedstack.com>
Ning Yao
test: add test for rbd import,import-diff,copy with sparse-size option
Signed-off-by: Ning Yao <yaoning@unitedstack.com>

@dillaman dillaman merged commit 6be45f8 into ceph:master Apr 4, 2017

2 of 3 checks passed

default Build finished.
Details
Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment