Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: refcount for manifest object (redirect, chunked) #19935

Merged
merged 8 commits into from Apr 15, 2018

Conversation

myoungwon
Copy link
Member

@myoungwon myoungwon commented Jan 12, 2018

This PR introduces refcount operations for redirect and chunked object.
(please refer to #19294, #15482)

Signed-off-by: Myoungwon Oh omwmw@sk.com

@myoungwon myoungwon force-pushed the wip-manifest-ref-count branch 2 times, most recently from d5e5e56 to 6a36122 Compare January 15, 2018 11:29
@myoungwon myoungwon force-pushed the wip-manifest-ref-count branch 3 times, most recently from e18f958 to 6be622a Compare February 22, 2018 16:19
@myoungwon
Copy link
Member Author

@liewegas In the previous discussion (#15482), we has discussed the following a issue.

  1. how to handle ref cleanup (e.g., when we promote or when we delete the object with the manifest)

This PR shows the concept of refcount for manifest object.
The target object have chuck_refcount{source_oid, pool_id, source_offset}
and chunk_refcount is handled as the following example.

  1. set_chunk or set_redirect is sent
  2. increase target object's refcount (add chunk_refcount)
  3. a. send set_chunk or set_redirect to the target object
  4. b. (if old reference exists) decrease target object's refcount first and then, send set_chunk or set_redirect to the target object

The target object which is referenced by other objects will be deleted if its refcount is 0.
Also, the message for decreasing refcount will be sent before the redirected or chunked object is removed,

Can you take a look? Please let me know if I missed something.

@liewegas liewegas self-requested a review February 22, 2018 17:50

struct chunk_obj_refcount {
map<chunk_refcount, bool> refs;
set<chunk_refcount> retired_refs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't seem like retired_refs is serving any purpose?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my mistake. I will fix this.

@@ -125,5 +124,149 @@ struct cls_refcount_read_ret {
};
WRITE_CLASS_ENCODER(cls_refcount_read_ret)

struct chunk_refcount {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would drop this type and just use hobject_t. i don't think we need to know which part of the referring object is pointing at this chunk, and we're missing the hash value here, which would be useful for a space-efficient coarse-grained ref map.

#define CHUNK_REFCOUNT_ATTR "chunk_refcount"

struct chunk_obj_refcount {
map<chunk_refcount, bool> refs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/chunk_refcount/hobject_t/

RefCountCallback *fin = new RefCountCallback(
this, ctx, osd_op, get_last_peering_reset());
refcount_manifest(ctx->obc, target_oloc, target, SnapContext(),
"chunk_get", fin, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really want to set ref counts for a vanilla redirect?

object_locator_t target_oloc(oi.manifest.redirect_target);
refcount_manifest(ctx->obc, target_oloc, oi.manifest.redirect_target,
SnapContext(), "chunk_put", NULL, 0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has to happen in a context after the local operation commits. otherwise we might crash, failing to commit locally, but still decrement the ref on the chunk.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I will fix this as your comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this shouldn't be based on need_reference but instead on just (oi.flags & object_info_t::FLAG_REDIRECT_HAS_REFERENCE)

@myoungwon
Copy link
Member Author

@liewegas We need to discuss two issues.

  1. chunk_refcount or hobject_t ?
    At this point, hobject_t is reasonable. But, what about future direction? I think this depends on the policy. If we need to consider that multiple objects in different pools indicate a single chunk, pool_id is needed. Because the different pools can have a same object id. Also, in the case of source_offset, different offset in a single object can have the same contents. For example, offset 0 and offset 1024 in the object A can have a same chunk. So, we might need source_offset (but, this is only for dedup tier).

  2. Do we really want to set ref counts for a vanilla redirect?
    please look at following example.
    object A -(redirect)-> object X
    object B -(redirect)-> object X
    Do we allow this? In this case, if object A receives a delete message, the message is forwarded to object X. So, I think we need to prevent deleting object X by ref count.

@liewegas
Copy link
Member

I think we might have a more general question to answer: how/when do we know from a given object_info_t manifest that we own references to the target object that we need to increment/decrement. I see a few options:

  1. Decide that chunk manifests always own refs, redirects always do not, or similar. This seems limiting because you might want to dedup on entire objects where a redirect is sufficient. Or, in many/most cases, you might use a redirect without wanting to refcount at all (e.g., maybe the target pool is read-only).
  2. Have a flag in the object_info_t indicating whether we own a ref. This would work for redirect, but for the chunk map, we might own refs for some of them and not all? I forget. This reminds me of my confusion when we had an hobject_t for the target but the target was possibly stale or out of date. Perhaps we want a per-chunk flag in the manifest for whether we own a ref?
  3. Have refcounting (yes or no) be a property of the target pool somehow.

I lean toward 2...

@liewegas
Copy link
Member

Note that hobject_t includes the pool id: https://github.com/ceph/ceph/blob/master/src/common/hobject.h#L49

@myoungwon
Copy link
Member Author

@liewegas I think there are two types of flag.
First, enable/disable counting reference. Second, indicate whether it own a ref (but, refcouting is enabled)
Which one is you intend for?
Anyway, a flag (a per-chunk flag in the case of chunked object) in the object_info_t is reasonable to me.

  1. chunk_refcount or hobject_t ?
    As you mentioned, pool_id already is included in hobject_t. So it doesn't need to be included. What do you think about source_offset?
    struct refcount {hobject_t; uint64_t source_offset} or hobject_t?

@liewegas
Copy link
Member

I think just hobject_t is sufficient

@myoungwon myoungwon force-pushed the wip-manifest-ref-count branch 2 times, most recently from 184d6bd to 2c6011c Compare February 28, 2018 03:04
@myoungwon
Copy link
Member Author

@liewegas I reworked existing commits as your comments. please review it again.

@myoungwon myoungwon requested a review from liewegas March 1, 2018 15:01
#define CHUNK_REFCOUNT_ATTR "chunk_refcount"

struct chunk_obj_refcount {
map<hobject_t, bool> refs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set<hobject_t> ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -4658,6 +4662,7 @@ struct object_info_t {
FLAG_CACHE_PIN = 1<<6, // pin the object in cache tier
FLAG_MANIFEST = 1<<7, // has manifest
FLAG_USES_TMAP = 1<<8, // deprecated; no longer used
FLAG_HAS_REFERENCE = 1<<9, // has reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FLAG_REDIRECT_HAS_REFERENCE ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -3303,6 +3304,34 @@ void PrimaryLogPG::do_proxy_chunked_op(OpRequestRef op, const hobject_t& missing
}
}

void PrimaryLogPG::refcount_manifest(ObjectContextRef obc, object_locator_t oloc, hobject_t soid,
SnapContext snapc, string method, Context *cb, uint64_t offset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/string method/bool get/ so we can avoid a string comparison here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
} else {
cls_chunk_refcount_read_ret read_ret;
int ref = read_refcount_manifest(ctx, read_ret);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think we should interfere with delete here. If someone goes and sends delete ops to a rados pool with refcounted objects that's their mistake.

If we're worried about it, perhaps a pool property that disallows delete ops, or caps that only allow read operations and the refcount cls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Fixed.

@myoungwon myoungwon force-pushed the wip-manifest-ref-count branch 2 times, most recently from b372439 to a5517af Compare March 6, 2018 14:03
@myoungwon
Copy link
Member Author

@liewegas I reworked this PR based on your recent comments. Any other comments?

@liewegas
Copy link
Member

(Also, needs rebase!)

@myoungwon myoungwon force-pushed the wip-manifest-ref-count branch 3 times, most recently from f1a0b9b to 2b0f376 Compare March 19, 2018 09:39
@myoungwon
Copy link
Member Author

@liewegas Your suggestion is make sense to me. I address all your comments. Please review again.
And I will make a new PR regarding to unset operation for manifest object.

@myoungwon myoungwon changed the title WIP: osd: refcount for manifest object (redirect, chunked) osd: refcount for manifest object (redirect, chunked) Mar 19, 2018
@myoungwon myoungwon requested a review from liewegas April 2, 2018 11:20
@myoungwon
Copy link
Member Author

@liewegas Ping.

Copy link
Member

@liewegas liewegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@myoungwon
Copy link
Member Author

Rebase is done (I just resolved a conflict).
Test result ( http://pulpito.ceph.com/myoungwon-2018-04-13_04:45:30-rados:thrash-wip-manifest-ref-count-distro-basic-smithi/ ).
Do I need more rados test results for this PR?

@liewegas
Copy link
Member

hmm looks like there's a new conflict? that test result shoudl be fine!

refcount() for chunked object is added (based on source id,
pool id and offset)

Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
…bject is referenced

Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
@myoungwon
Copy link
Member Author

@liewegas A conflict is resolved.

@liewegas liewegas added this to the mimic milestone Apr 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants