New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crimson/osd: put snapmapper's key-value pairs into dedicated objs #52267
Conversation
src/crimson/osd/pg.cc
Outdated
assert(coll_ref); | ||
t.touch(coll_ref->get_cid(), make_snapmapper_oid()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this section not part of OSD::start()
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also check for existence and add logs if possible.
// make sure snap mapper object exists
if (!store->exists(coll_ref, make_snapmapper_oid())) {
logger().debug("init creating/touching snapmapper object");
..
t.touch(coll_ref->get_cid(), make_snapmapper_oid());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, in crimson, we previously store snap mapper in pgmeta objects which are pg specific, I made the snap mapper objects pg specific, too. I think you are suggesting that we make the snapmapper object shared across the whole OSD like what classic OSD is doing. Am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. The reason classic places all the snap trimmer information in a single object is that otherwise we'd need to read and split/merge the objects during pg split/merge. This would require IO linear in the number of snapshots, and split/merge aren't otherwise dependent on the size of the PGs (for good reason -- they must occur during the map transition and block IO/peering on those PGs). Crimson will have different problems for split/merge -- specifically that seastore will want to maintain per-shard metadata trees, so we won't be able to merge PGs easily on different shards or split into a different shard. For now, I'd actually suggest a single OSD-wide object maintained by shard 0 (pgs on other shards will have to proxy IO to that shard). If this becomes a bottleneck, we can try something else later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. The reason classic places all the snap trimmer information in a single object is that otherwise we'd need to read and split/merge the objects during pg split/merge. This would require IO linear in the number of snapshots, and split/merge aren't otherwise dependent on the size of the PGs (for good reason -- they must occur during the map transition and block IO/peering on those PGs). Crimson will have different problems for split/merge -- specifically that seastore will want to maintain per-shard metadata trees, so we won't be able to merge PGs easily on different shards or split into a different shard. For now, I'd actually suggest a single OSD-wide object maintained by shard 0 (pgs on other shards will have to proxy IO to that shard). If this becomes a bottleneck, we can try something else later.
@athanatos Hi, sam. I think the modification to the OSD-wide object and other modifications have to be in the same transaction, am I right? If so, it looks to me that we need to synchronize seastore transactions across multiple shards if they all modify the OSD-wide object, am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I forgot that the sharded objectstore abstraction doesn't allow transactions that span shards. A per-pg object seems fine, we'll figure out pg split/merge later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I forgot that the sharded objectstore abstraction doesn't allow transactions that span shards. A per-pg object seems fine, we'll figure out pg split/merge later.
Done:-)
src/crimson/osd/pg.h
Outdated
ghobject_t make_snapmapper_oid() { | ||
return ghobject_t(hobject_t( | ||
sobject_t( | ||
object_t("snapmapper"), | ||
0))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this to crimson/osd/osd_meta.cc
where we already create the osdmap/superblock/final_pool_info oids.
This will require introducing void OSDMeta::create_snapmapper
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are no longer using OSD-wide snapmapper object, this may not be necessary, am I right?
Can you please open a tracker so we could easily backport this? |
Yeah, will do. |
666ec48
to
3deab19
Compare
jenkins test make check |
1 similar comment
jenkins test make check |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this PR is breaking our tests:
With this PR:
https://pulpito.ceph.com/matan-2023-08-24_09:25:19-crimson-rados-wip-matanb-crimson-testing-24.8-v2-distro-crimson-smithi/
Looking into it:-) |
@Matan-B It seems that qa-proxy.ceph.com doesn't work anymore, is there any other way to get the log? Thanks:-) |
It's back :) The traceback from osds:
|
I got "403 Forbidden" when accessing qa-proxy.ceph.com; what should I do? Thanks:-) |
Same for me, you can join the slack sepia channel for updates. |
3deab19
to
f5f718e
Compare
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
@Matan-B @athanatos Could you take a final look at this PR, please? Thansk:-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, two nits from earlier discussions:
@Matan-B Hi, I've just fixed our testbed, and it seems that there's still problem with this PR, I'm looking into it. Sorry for this inconvenience. |
@Matan-B @athanatos this PR has passed our teuthology test, here's the results: |
Pushed with wip-sjust-crimson-testing-2024-02-19 |
Please ignore this, I was testing the wrong build. |
@xxhdx1985126 Gotcha, the test run above may nonetheless be sufficient to merge once it completes. |
Tests generally look ok although this looks odd:
Haven't seen that before, I'll test main for comparison. @xxhdx1985126 Can you run these unit tests locally if you have this build? Thanks! https://pulpito.ceph.com/matan-2024-02-20_08:26:01-crimson-rados-main-distro-crimson-smithi/ |
Yeah, will try. |
main looks clean, what were the local results? |
I have the same error, will look into it. |
It seems that the |
@xxhdx1985126 Right, librados users should only ever see librados objects, OSD internal objects shouldn't be exposed. |
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
a91c509
to
3c808f3
Compare
@Matan-B @athanatos |
3c808f3
to
32408e9
Compare
src/crimson/osd/ops_executer.cc
Outdated
-> PG::interruptible_future<std::tuple<std::vector<hobject_t>, hobject_t>> { | ||
auto& [objects, next] = ret; | ||
auto is_snapmapper = [&pg](const hobject_t &obj) { | ||
if (obj == pg.make_snapmapper_oid().hobj) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding pg ref parameter we can introduce static bool PG::is_snapmapper_oid()
.
static inline bool is_snapmapper_oid(const hobject_t &obj) {
return (obj.hobj.oid.name == "snapmapper");
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will schedule test after make check passes
Otherwise, PG::read_log_and_missing() will meet those key-values and won't know what to do with them. This is modeling what the classic osd is doing Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
81c45ac
to
d0cfc58
Compare
src/crimson/osd/ops_executer.cc
Outdated
@@ -1443,7 +1459,8 @@ static PG::interruptible_future<> do_pgls_filtered( | |||
lower_bound, | |||
nspace, | |||
osd_op.op.pgls.count, | |||
filter.get()) | |||
filter.get(), | |||
pg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, sorry, I thought it wouldn't need a recompilation...
src/crimson/osd/ops_executer.cc
Outdated
@@ -1399,7 +1414,8 @@ static PG::interruptible_future<> do_pgls( | |||
lower_bound, | |||
nspace, | |||
osd_op.op.pgls.count, | |||
nullptr /* no filter */) | |||
nullptr /* no filter */, | |||
pg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
d0cfc58
to
8f7921e
Compare
Otherwise, PG::read_log_and_missing() will meet those key-values and won't know what to do with them. This is modeling what the classic osd is doing
Fixes: https://tracker.ceph.com/issues/61875
Signed-off-by: Xuehan Xu xuxuehan@qianxin.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows