Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd, test: reworks for manifest dedup test cases #39216

Merged
merged 52 commits into from Apr 9, 2021

Conversation

myoungwon
Copy link
Member

@myoungwon myoungwon commented Feb 2, 2021

This PR is what we've planed here (https://docs.ceph.com/en/latest/dev/osd_internals/manifest/).
The purpose of this PR is to rework and add manifest ops tests based on current
snapshot and flush scheme.

Signed-off-by: Myoungwon Oh myoungwon.oh@samsung.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@myoungwon myoungwon added tests and removed tests labels Feb 2, 2021
@myoungwon myoungwon force-pushed the wip-manifest-dedup-test branch 3 times, most recently from 06ab533 to 2459366 Compare February 8, 2021 03:02
@sebastian-philipp
Copy link
Contributor

jenkins test make check

@myoungwon myoungwon changed the title WIP: test: reworks for manifest dedup test cases test: reworks for manifest dedup test cases Feb 10, 2021
@myoungwon
Copy link
Member Author

@athanatos Base on https://docs.ceph.com/en/latest/dev/osd_internals/manifest/, this PR probably covers the scope of a test we have planned excepts for RBD, RGW. Please take a look if you have time to review it, and let me know what I missed.

@athanatos
Copy link
Contributor

I think the teuthology radosmodel tests need a portion at the end to run and validate that the references in the target pool are valid. We won't be able to catch refcount mistakes without that.

@athanatos
Copy link
Contributor

Have you run the full suite on this yet?

@myoungwon
Copy link
Member Author

@athanatos This PR is only tested on my local machine. If you agree to this, I'll run this PR through rados suite test. Also, I added a commit that checks the refcount is correct at the end of the test.

@athanatos
Copy link
Contributor

Go ahead and run at least your added test cases. I'll have a closer look once you've got it passing consistently.

{}

void get_rand_off_len(uint64_t &rand_offset, uint32_t &rand_length, uint32_t max_len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return a pair.

if (r == 0) {
// ok
} else if (r == -EINVAL) {
// probably this is not manifest object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have enough information to validate this return value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an explanation.

} else if (r == -EINVAL) {
// probably this is not manifest object
} else if (r == -ENOENT) {
// may have raced with a remove?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, don't we have state that should exclude racing removes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a specific condition to check ENOENT.

} else if (r == -EBUSY) {
// could fail if snap is not oldest
} else if (r == -ENOENT) {
// could fail if obj is removed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a specific condition to check ENOENT.

if (snap == -1) {
ChunkDesc info {tgt_offset, length, oid_tgt};
context->update_object_chunk_target(oid, offset, info);
context->update_object_version(oid, comp->get_version64());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually only valid for HEAD? Wouldn't we also need to update chunk_target for a clone if we call SET_CHUNK on a clone?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: how does chunk_target relate to FLUSH? We don't appear to populate it in that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChunkDesc is only for SetChunkReadOp to check whether chunk entries in chunk_map can be read correctly. Also, SetChunkReadOp is only used in case the object is head (/qa/suites/rados/thrash/workloads/set-chunks-read.yaml).
Probably, deleting SetCunkReadOp is the right way to address your comment?

rollback_to);
if (!rollback_to->obs.oi.has_manifest()) {
// rollback_to is not manifest object
tier_mode_result = cache_result_t::NOOP;
Copy link
Contributor

@athanatos athanatos Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be cleaner to move this check into maybe_handle_manifest_detail. The only other caller (do_op via maybe_handle_manifest) does the same check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@athanatos athanatos Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't update the call site in do_op.

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
current inc_refcount_by_set only supports a case
where a single entry is updated via SET_CHUNK.
This commit will make existing inc_refcount_by_set to handle
multiple entries.

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Upon rollback, we should update new chunk_map in head.
To do so, all entries in the chunk_map need to be updated
via inc_refcount_by_set().

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
@myoungwon myoungwon changed the title test: reworks for manifest dedup test cases osd, test: reworks for manifest dedup test cases Mar 29, 2021
This commit prevent updating wrong state, which happens when
TierFlush receives error values.

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
@myoungwon myoungwon force-pushed the wip-manifest-dedup-test branch 2 times, most recently from 605956a to 9a8c72f Compare April 1, 2021 03:16
The object updated by the Ops should be set unflushed.

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
If the head is deleted in rollback(), manifest info also needs to be clear.

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
@myoungwon
Copy link
Member Author

@athanatos Rebase is done. Also, I added the last seven commits to fix issues during QA and cleanup code. Here is the test result. Can you take a look?

https://pulpito.ceph.com/myoungwon-2021-04-01_11:20:37-rados-wip-manifest-dedup-test-distro-basic-smithi/
https://pulpito.ceph.com/myoungwon-2021-04-01_14:00:55-rados-wip-manifest-dedup-test-distro-basic-smithi/

@myoungwon
Copy link
Member Author

@athanatos Ping. Please take a look when you are available.

@athanatos
Copy link
Contributor

The commit 'osd: move handling ref. counting to finish_ctx()' doesn't actually add anything to finish_ctx.

@myoungwon
Copy link
Member Author

myoungwon commented Apr 8, 2021

There are update_chunk_map_by_dirty() and dec_refcount_by_diry() in finish_ctx().
The purpose of the commit is to move ref count calculation to there by removing ctx->new_obs.oi.size != 0

@athanatos
Copy link
Contributor

In that case, change the commit message to "osd: remove unnecessary ref handling in _delete_oid" and fix the commit message body.

Let's consider the following case when handling a delete op.
1. Delete --> whiteouted
2. Make clone

In this case, current code clears chunk_map and calls dec_all_manifest_refcount()
in _delete_oid() even if the clone still has the references.

To fix this, This commit remove unnecessary ref handling in _delete_oid, and
makes finish_ctx() to handle ref handling, which can aware of whether the
clone is created or not.

Also, remove oi.size == 0 condition in finish_ctx() to handle ref. counting
upon a delete op with whitedouted clone.

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Upon rollback, we should handle ENOENT case,
so what we should do here is to return NOOP.

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
@myoungwon
Copy link
Member Author

Fixed.

@athanatos athanatos self-requested a review April 9, 2021 19:41
@athanatos athanatos merged commit 055ebe3 into ceph:master Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants