New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rgw multisite: replicate metadata for iam roles #43597
rgw multisite: replicate metadata for iam roles #43597
Conversation
@cbodley , @adamemerson @mattbenjamin : I have moved the RGWRoleMetdataHandler class to rgw_sal_rados.cc (look at the last commit by me), and it now invokes RadosRole interfaces. There are lots of things missing, but before I proceed further, I wanted feedback on the following:
Or, if we do not want RoleMetadataHandler to directly invoke RadosRoleinterfaces, then we can also do one of the following:
What is the preferred way forward, the one that I doing now or one of the two above? |
@pritha-srivastava ideally, we would want the |
@cbodley : I will look into it. |
ab7211b
to
605ad3e
Compare
@cbodley : I moved RGWRoleMetadataHandler back to rgw_role.cc and rgw_role.h and it uses a pointer to Store to make calls to RGWRole methods. The Store pointer is passed in to RGWRados::init_ctl, which is needed for instantiation of RGWRoleMetadatahandler pointer. I am planning to remove RGWRoleMetadataHandler:: init() method, as RadosRole already is aware of zone information. I also plan to do away with RGWRoleMetadataHandler::do_start() and RGWSI_Role_Module from rgw_role.cc as they are related to rgw services interfaces. I have wired up RGWRoleMetadataHandler in RGWCtl::init() in rgw_service.cc and calls the attach() method to register itself with the metadata manager. Does all of this look reasonable? Please let me know. P.S.: I still have work to do related to object tracker and other things in RGWRoleMetadataHandler methods, and other clean up work. Why do we want RGWRoleMetadataHandler to make calls to RGWRole? Are we making RGWRoleMetadataHandler generic for other backends too? |
wonderful, yes!
exactly right. i'm pretty sure that was the original intent of all that 'metadata backend' stuff. i would be thrilled to see zipper's store abstraction replace all of that eventually |
605ad3e
to
2cff0d6
Compare
I have added code for handling version tracker, attrs and mtime. It looks complete to me (may need another round of refactoring). I need to test it now.
Even with RGWRoleMetadataHandler, only the Store* type matters - for other store types, if the appropriate pointer is passed in, I believe everything will work as is (provided what I have done now works!) |
2cff0d6
to
6edb113
Compare
The first problem that I saw with these changes is that while bringing up the clusters using MON=1 OSD=1 MDS=0 MGR=0 ../src/test/rgw/test-rgw-multisite.sh 2, the log file in c1 contains a segmentation fault, at: When I check the code of RGWMetadataHandler_GenericMetaBE::list_keys_init, it uses RGWSI_MetaBackend_Handler, I will have to restore the init method (atleast) that I had removed and then re-try. |
6edb113
to
a3ae354
Compare
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
@cbodley : At this stage, the role metadata is getting replicated. I have tested using the following: ./bin/radosgw-admin role get --role-name=S3Access1 -c run/c2/ceph.conf I had to re-wire svc_rados_role back, although I have stripped it off methods related to role creation, deletion and retrieval. It only has methods related to getting be_handler() etc. I have also combined svc_role_rados and svc_role into one since there isn't a need for the two to exist separately. I still need to check for completeness of code and some cleanup work. But if you could just take a look at what I have done so far and give me feedback, it will be helpful. Also for comparison of call stack between user create and role create (with these changes): user create: role create: |
3f0eb01
to
eca6322
Compare
src/rgw/rgw_role.h
Outdated
RGWRole(std::string id) : id(std::move(id)) {} | ||
|
||
virtual ~RGWRole() = default; | ||
virtual ~RGWRoleInfo() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like we've removed all the other virtual functions from RGWRoleInfo
, so its destructor shouldn't be virtual
src/rgw/rgw_role.h
Outdated
RGWRole(std::string id); | ||
|
||
RGWRole(RGWRoleInfo& info) : info(info) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single-argument constructors should be made explicit. we especially don't want strings to implicitly convert to RGWRole
src/rgw/rgw_service.cc
Outdated
@@ -34,6 +35,8 @@ | |||
#include "rgw_metadata.h" | |||
#include "rgw_otp.h" | |||
#include "rgw_user.h" | |||
#include "rgw_role.h" | |||
#include "rgw_sal_rados.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does rgw_service.cc
really depend on rgw_sal_rados.h
? i don't see where
src/rgw/rgw_service.h
Outdated
@@ -75,6 +75,8 @@ class RGWSI_SysObj_Cache; | |||
class RGWSI_User; | |||
class RGWSI_User_RADOS; | |||
class RGWDataChangesLog; | |||
class RGWSI_Role; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RGWSI_Role
no longer exists, correct?
src/rgw/services/svc_role_rados.cc
Outdated
#if 0 | ||
class PutRole | ||
{ | ||
RGWSI_Role_RADOS* svc_role; | ||
RGWSI_Role_RADOS::Svc *svc; | ||
RGWSI_MetaBackend::Context *ctx; | ||
rgw::sal::RGWRole& info; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of dead code in this file. is the intent to remove this #if 0
block before merge?
If I understand what I'm looking at, this is a HUGE win. Matt |
the admin rest APIs in for testing, one of the metadata sync test cases in can you think of an easy way to test that AssumeRole works on a secondary zone, based on a role that was synced from the primary? |
Ok, I will look into this and come up with a way to test AssumeRole. |
@cbodley : why is zonegroup id (self zonegroup from secondary) sent to master in every forward_request_to_master() and why is uid sent as a system parameter, and whether role id/ name also needs to be sent as a system parameter? |
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
jenkins retest this please |
@cbodley : ceph API tests and ceph Windows test are failing for no known reason. The PR checklist is also failing despite the tracker issue being referenced and the test added checkbox being ticked. Otherwise the PR is ready to be merged. |
jenkins test api |
cleared needs-qa as it has passed in teuthology |
we don't see any reason why the api tests are failing, and, they are failing with "no error"; |
@mattbenjamin I see it's passing now. I checked just out of curiosity and the error was (which I assume is unrelated to this code):
My impression is that in some Jenkins job we are installing Ceph python libraries (python-common aka ceph, ...) to site-packages dir (instead of using those from the Ceph repo path), and not cleaning afterwards, and that's causing conflicts with different versions (this is another example). @djgalloway does this ring a bell? |
@cbodley : thanks for merging this! |
Fixes: https://tracker.ceph.com/issues/51068
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox