New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd: fix to allow inc manifest leaked #42302
Conversation
I'm not sure I understand the scenario. client->osd op re submission will detect and drop duplicates. |
I misunderstood why increment operation is applied twice. Please look at the following scenario.
The message sent at 2 and the message sent at 6 have different tid, so OSD can not recognize they are the same. As a result, To resolve this issue, I suppose that there are two options. First is to use update_log_only to detect dup ops, but set_chunk is not write_operations, so I am not sure the following code is a right way.
Second is to make reference count operation idempotent as described in the commits. What do you think? |
In step 5, why doesn't it see the in progress op? |
To check whether it is in progress op, the op should be recorded in either projected_log or recovery state. But, in this case, the first set_chunk op returns EINPROGRESS without updating log. So, check_in_progess_op returns false. Am I wrong? |
Ah, right, so the previous op is in progress (returned -EINPROGRESS and will be completed later), but the repop hasn't been submitted yet and therefore it's not in the projected log. I think the fix is to add a registry of requests in that state and check them in check_in_progress_op as well. |
@athanatos Sorry, I missed one thing---step 4 is triggered by requeue_op(), not issued by the user. So, the scenario is as follows.
As a result, I think adding registry (something like unorded_map<osd_reqid_t, version>) in check_in_progress_op() is not appropriate because the registry should be dropped when on_change is called. What do you think? |
I'll take a closer look at this next week. |
@athanatos ping |
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
src/cls/cas/cls_cas_client.h
Outdated
@@ -17,17 +17,20 @@ void cls_cas_chunk_create_or_get_ref( | |||
librados::ObjectWriteOperation& op, | |||
const hobject_t& soid, | |||
const bufferlist& data, | |||
bool verify=false); | |||
bool verify=false, | |||
int count = -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server side asserts that count isn't -1. When is it valid to leave this as the default?
I'm not sure about this approach. The refcount design in general has always explicitly permitted the recorded count in the base pool to be >= the real refcount. I see two ways a replayed operation can go:
Neither case to me obviously breaks the contract. Specifying the source object refcount on submission doesn't fix 2) at all, and so also won't work correctly on 1) either as the recorded base refcount can still be an overestimate. Specifying the absolute refcount for a given source object also requires that the base pool actually track them, which probably won't scale in cases where a base pool object has a genuinely large number of incoming references. As far as I can tell, this behavior is one case of the general refcount leak behavior we've specifically chosen to allow. The remedy is meant to be a background refcount scrubbing process to correct leaked references. Otherwise, we'd need a real intent log, and that would be expensive. Am I missing something? |
Current dedup allow to contain multiple same sources using multiset, which results in inconsistent situation as follow (during set_chunk, but not confined in set_chunk). 1. User issues set_chunk 2. OSD receives the set_chunk, and sends increment message to an object in the low tier (INPROGRESS). 3. OSD map is changed (841 → 843) 3.5. on_change() is called 4. the set_chunk op is reenqueued by requeue_op() 5. OSD handles the duplicated set_chunk, but it is not able to know the set_chunk is duplicated because it does not log on the disk yet. 6. OSD issues increment message again to the object in the low tier. (increment operation is executed twice) To fix this, this commit allows >= the real refcount in test cases fixes: https://tracker.ceph.com/issues/51000 Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
@athanatos Yeah, right, it also requires the overhead to track the reference and the mismatch can be fixed via the background refcount scrubbing process. The previous commits just intend to avoid reference mismatch as possible to save space. Anyway, if so, updated commit can fix to avoid a false alarm. Can you take a look? |
Current dedup allows to contain multiple same sources using
multiset, which results in inconsistent situation as follow
(during set_chunk, but not confined in set_chunk).
Due to 4, inc. op is applied one more time. At this time,
current inc op---this is similar to ++ operation--- increases
the ref. count once again.
To prevent this, inc/dec op needs to be idempotent using
absolute value.
fixes: https://tracker.ceph.com/issues/51000
Signed-off-by: Myoungwon Oh myoungwon.oh@samsung.com
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox