rgw multisite: replicate metadata for iam roles #43597

Merged
merged 55 commits into from Jun 13, 2022


pritha-srivastava
Contributor

@pritha-srivastava pritha-srivastava commented Oct 20, 2021

Fixes: https://tracker.ceph.com/issues/51068

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@github-actions github-actions bot added the rgw label Oct 20, 2021
@pritha-srivastava pritha-srivastava marked this pull request as draft October 20, 2021 07:21
@pritha-srivastava
Contributor Author

pritha-srivastava commented Oct 20, 2021

@cbodley, @adamemerson, @mattbenjamin : I have moved the RGWRoleMetadataHandler class to rgw_sal_rados.cc (look at the last commit by me), and it now invokes RadosRole interfaces. There are lots of things missing, but before I proceed further, I wanted feedback on the following:

  1. whether this is what we intend to do - MetadataHandler directly invoking RadosRole interfaces.
  2. While trying to wire up RGWRoleMetadataHandler further up (to rgw_service), I figured that we need a RadosStore instance for this. Is it ok if RGWServices now has an instance of RadosStore in it, so that it can be passed down to RGWRoleMetadataHandler? The RadosStore reference can be passed from RGWRados::init_svc to the init calls of the RGWServices class.

Or, if we do not want RoleMetadataHandler to directly invoke RadosRole interfaces, then we can also do one of the following:

  1. Collapse RGWSI_Role and RGWSI_Role_Rados into one (probably RGWSI_Role_Rados), do away with RGWRoleCtl and then have both RadosRole and RGWRoleMetadataHandler invoke RGWSI_Role_Rados interfaces.
  2. or, Collapse RGWSI_Role and RGWSI_Role_Rados into one (RGWSI_Role_Rados), do away with RGWRoleCtl and then have RGWRoleMetadataHandler invoke RGWSI_Role_Rados interfaces, and leave RadosRole as is.

What is the preferred way forward: the one that I am doing now, or one of the two above?

@cbodley
Contributor

cbodley commented Oct 20, 2021

@pritha-srivastava ideally, we would want the RoleMetadataHandler to be using the generic RGWRole type instead of the rados-specific RadosRole. is that possible to do directly?

@pritha-srivastava
Contributor Author

@pritha-srivastava ideally, we would want the RoleMetadataHandler to be using the generic RGWRole type instead of the rados-specific RadosRole. is that possible to do directly?

@cbodley : I will look into it.

@pritha-srivastava
Contributor Author

pritha-srivastava commented Oct 26, 2021

@pritha-srivastava ideally, we would want the RoleMetadataHandler to be using the generic RGWRole type instead of the rados-specific RadosRole. is that possible to do directly?

@cbodley : I moved RGWRoleMetadataHandler back to rgw_role.cc and rgw_role.h, and it now uses a pointer to Store to make calls to RGWRole methods. The Store pointer is passed in to RGWRados::init_ctl, where it is needed to instantiate the RGWRoleMetadataHandler. I am planning to remove the RGWRoleMetadataHandler::init() method, as RadosRole is already aware of zone information. I also plan to do away with RGWRoleMetadataHandler::do_start() and RGWSI_Role_Module from rgw_role.cc, as they are related to the rgw services interfaces. I have wired up RGWRoleMetadataHandler in RGWCtl::init() in rgw_service.cc, where it calls the attach() method to register itself with the metadata manager. Does all of this look reasonable? Please let me know.

P.S.: I still have work to do related to object tracker and other things in RGWRoleMetadataHandler methods, and other clean up work.

Why do we want RGWRoleMetadataHandler to make calls to RGWRole? Are we making RGWRoleMetadataHandler generic for other backends too?

@cbodley
Contributor

cbodley commented Oct 27, 2021

Does all of this look reasonable? Please let me know.

wonderful, yes!

Why do we want RGWRoleMetadataHandler to make calls to RGWRole? Are we making RGWRoleMetadataHandler generic for other backends too?

exactly right. i'm pretty sure that was the original intent of all that 'metadata backend' stuff. i would be thrilled to see zipper's store abstraction replace all of that eventually

@pritha-srivastava
Contributor Author

Does all of this look reasonable? Please let me know.

wonderful, yes!

I have added code for handling version tracker, attrs and mtime. It looks complete to me (may need another round of refactoring). I need to test it now.

Why do we want RGWRoleMetadataHandler to make calls to RGWRole? Are we making RGWRoleMetadataHandler generic for other backends too?

exactly right. i'm pretty sure that was the original intent of all that 'metadata backend' stuff. i would be thrilled to see zipper's store abstraction replace all of that eventually

Even with RGWRoleMetadataHandler, only the Store* type matters - for other store types, if the appropriate pointer is passed in, I believe everything will work as is (provided what I have done now works!)
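
To make the "only the Store* type matters" point concrete, here is a minimal sketch of a handler that depends on nothing but an abstract store pointer. Every name in it (the `sketch` namespace, `Store`, `Role`, `RoleMetadataHandler`, `MemStore`, `MemRole`) is a hypothetical stand-in, not the real rgw::sal interface, and the in-memory backend exists only to show that a second backend can be swapped in without touching the handler.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical abstract role: each backend supplies its own store_info().
struct Role {
  virtual ~Role() = default;
  virtual int store_info(bool exclusive) = 0;
};

// Hypothetical abstract store that hands out backend-specific roles.
struct Store {
  virtual ~Store() = default;
  virtual std::unique_ptr<Role> get_role(const std::string& id) = 0;
};

// The handler only knows about Store* and Role, never a concrete backend.
class RoleMetadataHandler {
  Store* store;
 public:
  explicit RoleMetadataHandler(Store* s) : store(s) {}

  // Write the role identified by `id` through whatever backend was given.
  int put(const std::string& id) {
    std::unique_ptr<Role> role = store->get_role(id);
    return role->store_info(false);
  }
};

// Toy in-memory backend that counts writes per role id.
struct MemStore : Store {
  std::map<std::string, int> writes;
  std::unique_ptr<Role> get_role(const std::string& id) override;
};

struct MemRole : Role {
  MemStore& owner;
  std::string id;
  MemRole(MemStore& o, std::string i) : owner(o), id(std::move(i)) {}
  int store_info(bool /*exclusive*/) override {
    ++owner.writes[id];
    return 0;
  }
};

inline std::unique_ptr<Role> MemStore::get_role(const std::string& id) {
  return std::make_unique<MemRole>(*this, id);
}

}  // namespace sketch
```

Constructing `sketch::RoleMetadataHandler` with a pointer to any other `Store` implementation would reuse `put()` unchanged, which is the property being discussed here.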

@pritha-srivastava
Contributor Author

The first problem that I saw with these changes is that while bringing up the clusters using MON=1 OSD=1 MDS=0 MGR=0 ../src/test/rgw/test-rgw-multisite.sh 2, the log file in c1 contains a segmentation fault, at:
RGWMetadataManager::list_keys_init(DoutPrefixProvider const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, void**)+0x7a) [0x7fa6b60d3d04]
5: (rgw::sal::RadosStore::meta_list_keys_init(DoutPrefixProvider const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, void**)+0x14) [0x7fa6b641a5a6]
6: (RGWOp_Metadata_List::execute(optional_yield)+0xf82) [0x7fa6b5e7853c]

When I check the code of RGWMetadataHandler_GenericMetaBE::list_keys_init, it uses RGWSI_MetaBackend_Handler, so I will have to restore (at least) the init method that I had removed and then retry.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@pritha-srivastava
Contributor Author

@cbodley : At this stage, the role metadata is getting replicated. I have tested using the following:
./bin/radosgw-admin role create --role-name=S3Access1 --path=/application_abc/component_xyz/ --assume-role-policy-doc='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":["sts:AssumeRole"]}]}' -c run/c1/ceph.conf

./bin/radosgw-admin role get --role-name=S3Access1 -c run/c2/ceph.conf
{
    "RoleId": "a09ab108-db7a-4f66-a3bb-24723c293392",
    "RoleName": "S3Access1",
    "Path": "/application_abc/component_xyz/",
    "Arn": "arn:aws:iam:::role/application_abc/component_xyz/S3Access1",
    "CreateDate": "2021-11-24T08:23:30.385Z",
    "MaxSessionDuration": 3600,
    "AssumeRolePolicyDocument": "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"AWS\":[\"arn:aws:iam:::user/TESTER\"]},\"Action\":[\"sts:AssumeRole\"]}]}"
}

I had to re-wire svc_role_rados back in, although I have stripped it of the methods related to role creation, deletion and retrieval. It now only has methods related to getting be_handler() etc. I have also combined svc_role_rados and svc_role into one, since there isn't a need for the two to exist separately. I still need to check the code for completeness and do some cleanup work. But if you could take a look at what I have done so far and give me feedback, it would be helpful.

Also for comparison of call stack between user create and role create (with these changes):

user create:
#0 RGWSI_MetaBackend::put (this=0x555556669030, ctx=, key="t1tenant$TESTER2", params=..., objv_tracker=, y=..., dpp=) at ../src/rgw/services/svc_meta_be.cc:136
#1 0x000055555603ccce in PutOperation::put (this=this@entry=0x7fffffff8b90, dpp=dpp@entry=0x555556497e70 <dpp()::global_dpp>) at ../src/rgw/rgw_basic_types.h:78
#2 0x00005555560395fa in RGWSI_User_RADOS::store_user_info (this=, ctx=, info=..., old_info=, objv_tracker=, mtime=..., exclusive=false, attrs=0x5555567718f0, y=...,
dpp=0x555556497e70 <dpp()::global_dpp>) at ../src/rgw/services/svc_user_rados.cc:380
#3 0x0000555555ed1628 in operator() (op=, __closure=) at ../src/rgw/services/svc_meta_be.h:227
#4 std::__invoke_impl<int, RGWUserCtl::store_info(const DoutPrefixProvider*, const RGWUserInfo&, optional_yield, const RGWUserCtl::PutParams&)::<lambda(RGWSI_MetaBackend_Handler::Op*)>&, RGWSI_MetaBackend_Handler::Op*> (__f=...)
at /usr/include/c++/10/bits/invoke.h:60
#5 std::__invoke_r<int, RGWUserCtl::store_info(const DoutPrefixProvider*, const RGWUserInfo&, optional_yield, const RGWUserCtl::PutParams&)::<lambda(RGWSI_MetaBackend_Handler::Op*)>&, RGWSI_MetaBackend_Handler::Op*> (__fn=...)
at /usr/include/c++/10/bits/invoke.h:113
#6 std::_Function_handler<int(RGWSI_MetaBackend_Handler::Op*), RGWUserCtl::store_info(const DoutPrefixProvider*, const RGWUserInfo&, optional_yield, const RGWUserCtl::PutParams&)::<lambda(RGWSI_MetaBackend_Handler::Op*)> >::_M_invoke(const std::_Any_data &, RGWSI_MetaBackend_Handler::Op *&&) (__functor=..., __args#0=) at /usr/include/c++/10/bits/std_function.h:291
#7 0x0000555556020232 in std::function<int (RGWSI_MetaBackend_Handler::Op*)>::operator()(RGWSI_MetaBackend_Handler::Op*) const (this=, __args#0=) at /usr/include/c++/10/bits/std_function.h:248
#8 0x000055555601ed8b in operator() (ctx=0x7fffffff8de0, __closure=0x7fffffff9030) at ../src/rgw/services/svc_meta_be.cc:190
#9 std::__invoke_impl<int, RGWSI_MetaBackend_Handler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int(RGWSI_MetaBackend_Handler::Op*)>)::<lambda(RGWSI_MetaBackend::Context*)>&, RGWSI_MetaBackend::Context*> (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#10 std::__invoke_r<int, RGWSI_MetaBackend_Handler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int(RGWSI_MetaBackend_Handler::Op*)>)::<lambda(RGWSI_MetaBackend::Context*)>&, RGWSI_MetaBackend::Context*> (__fn=...) at /usr/include/c++/10/bits/invoke.h:113
#11 std::_Function_handler<int(RGWSI_MetaBackend::Context*), RGWSI_MetaBackend_Handler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int(RGWSI_MetaBackend_Handler::Op*)>)::<lambda(RGWSI_MetaBackend::Context*)> >::_M_invoke(const std::_Any_data &, RGWSI_MetaBackend::Context *&&) (__functor=..., __args#0=) at /usr/include/c++/10/bits/std_function.h:291
#12 0x00005555559f1cf8 in std::function<int (RGWSI_MetaBackend::Context*)>::operator()(RGWSI_MetaBackend::Context*) const (this=, __args#0=, __args#0@entry=0x7fffffff8de0)
at /usr/include/c++/10/bits/std_function.h:248
#13 0x00005555559f140c in RGWSI_MetaBackend_SObj::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend::Context*)>) (this=0x555556669030,
opt=std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj>> [no contained value], f=...) at ../src/rgw/services/svc_meta_be_sobj.cc:109
#14 0x000055555601ebe3 in RGWSI_MetaBackend_Handler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend_Handler::Op*)>) (this=this@entry=0x5555566c90e0,
bectx_params=std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj>> [no contained value], f=...) at ../src/rgw/services/svc_meta_be.cc:187
#15 0x0000555555ed6f17 in RGWSI_MetaBackend_Handler::call(std::function<int (RGWSI_MetaBackend_Handler::Op*)>) (f=..., this=0x5555566c90e0) at /usr/include/c++/10/optional:693
#16 RGWUserCtl::store_info (this=0x5555566c8130, dpp=, info=..., y=..., params=...) at ../src/rgw/rgw_user.cc:2855
#17 0x0000555555e981cf in rgw::sal::RadosUser::store_user (this=, dpp=, y=..., exclusive=, old_info=) at ../src/rgw/rgw_user.h:781
#18 0x0000555555edb031 in RGWUser::update (this=this@entry=0x7fffffffc3d0, dpp=dpp@entry=0x555556497e70 <dpp()::global_dpp>, op_state=..., err_msg=err_msg@entry=0x7fffffff9660, y=...) at ../src/rgw/rgw_user.cc:1611
#19 0x0000555555ee02b9 in RGWUser::execute_add (this=this@entry=0x7fffffffc3d0, dpp=dpp@entry=0x555556497e70 <dpp()::global_dpp>, op_state=..., err_msg=err_msg@entry=0x7fffffff9660, y=...) at ../src/rgw/rgw_user.cc:1859
#20 0x0000555555ee04d1 in RGWUser::add (this=0x7fffffffc3d0, dpp=0x555556497e70 <dpp()::global_dpp>, op_state=..., y=..., err_msg=0x7fffffffa5c0) at ../src/rgw/rgw_user.cc:1881
#21 0x0000555555915410 in main (argc=, a

role create:
#0 RGWSI_MetaBackend::put (this=0x555556666f50, ctx=, key="f5a9b6ae-68e6-498e-8148-4beb78a85971", params=..., objv_tracker=, y=..., dpp=) at ../src/rgw/services/svc_meta_be.cc:136
#1 0x0000555555e9ece8 in rgw::sal::RadosRole::store_info (this=0x5555564c2200, dpp=0x555556497e70 <dpp()::global_dpp>, exclusive=, y=..., addprefix=) at /usr/include/c++/10/bits/unique_ptr.h:421
#2 0x0000555555eb71f3 in rgw::sal::RadosRole::create (this=0x5555564c2200, dpp=, exclusive=, y=..., addprefix=) at ../src/rgw/rgw_sal_rados.cc:3065
#3 0x0000555555916917 in main (argc=, argv=) at ../src/common/async/yield_context.h:40

src/rgw/rgw_role.cc
RGWRole(std::string id) : id(std::move(id)) {}

virtual ~RGWRole() = default;
virtual ~RGWRoleInfo() = default;
Contributor

it looks like we've removed all the other virtual functions from RGWRoleInfo, so its destructor shouldn't be virtual
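
As an illustration of this review point, the sketch below contrasts a plain data holder with a polymorphic interface. `RoleInfoSketch` and `RoleSketch` are hypothetical stand-ins, not the real RGWRoleInfo and RGWRole; the compile-time checks only confirm which of the two types carries a virtual destructor.

```cpp
#include <string>
#include <type_traits>

// Hypothetical plain data holder: no virtual functions, so the implicitly
// declared (non-virtual) destructor is enough.
struct RoleInfoSketch {
  std::string id;
  std::string name;
};

// Hypothetical polymorphic interface: it may be deleted through a base
// pointer, so its destructor is virtual.
struct RoleSketch {
  virtual ~RoleSketch() = default;
  virtual int store_info() = 0;
};

static_assert(!std::is_polymorphic_v<RoleInfoSketch>,
              "data holder should not be polymorphic");
static_assert(!std::has_virtual_destructor_v<RoleInfoSketch>,
              "data holder should not need a virtual destructor");
static_assert(std::has_virtual_destructor_v<RoleSketch>,
              "interface deleted via base pointer needs a virtual destructor");
```

On common ABIs a virtual destructor makes a type polymorphic and typically adds a vtable pointer to every instance, which is unnecessary for a type that is only ever used as a value.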

Comment on lines 106 to 108
RGWRole(std::string id);

RGWRole(RGWRoleInfo& info) : info(info) {}
Contributor

single-argument constructors should be made explicit. we especially don't want strings to implicitly convert to RGWRole
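
A minimal sketch of what `explicit` changes, using two hypothetical structs (`ImplicitRole`, `ExplicitRole`) rather than the real RGWRole: with the non-explicit constructor a `std::string` silently converts to a role, while the explicit one still allows direct construction but rejects the implicit conversion.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type with a converting constructor: strings convert implicitly.
struct ImplicitRole {
  std::string id;
  ImplicitRole(std::string id) : id(std::move(id)) {}
};

// Same shape, but the constructor is explicit.
struct ExplicitRole {
  std::string id;
  explicit ExplicitRole(std::string id) : id(std::move(id)) {}
};

// Helper taking a role; a bare std::string argument would not compile here.
inline std::size_t id_length(const ExplicitRole& r) { return r.id.size(); }

static_assert(std::is_convertible_v<std::string, ImplicitRole>,
              "non-explicit constructor allows implicit conversion");
static_assert(!std::is_convertible_v<std::string, ExplicitRole>,
              "explicit constructor blocks implicit conversion");
static_assert(std::is_constructible_v<ExplicitRole, std::string>,
              "direct construction still works");

// Usage: the caller has to spell out the construction.
inline bool demo_explicit() {
  ExplicitRole r{std::string("role-id")};
  return id_length(r) == 7;
}
```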

@@ -34,6 +35,8 @@
#include "rgw_metadata.h"
#include "rgw_otp.h"
#include "rgw_user.h"
#include "rgw_role.h"
#include "rgw_sal_rados.h"
Contributor

does rgw_service.cc really depend on rgw_sal_rados.h? i don't see where

@@ -75,6 +75,8 @@ class RGWSI_SysObj_Cache;
class RGWSI_User;
class RGWSI_User_RADOS;
class RGWDataChangesLog;
class RGWSI_Role;
Contributor

RGWSI_Role no longer exists, correct?

Comment on lines 83 to 89
#if 0
class PutRole
{
RGWSI_Role_RADOS* svc_role;
RGWSI_Role_RADOS::Svc *svc;
RGWSI_MetaBackend::Context *ctx;
rgw::sal::RGWRole& info;
Contributor

a lot of dead code in this file. is the intent to remove this #if 0 block before merge?

@mattbenjamin
Contributor


Also for comparison of call stack between user create and role create (with these changes):


If I understand what I'm looking at, this is a HUGE win.

Matt

@cbodley
Contributor

cbodley commented Jan 11, 2022

the admin rest APIs in rgw_rest_role.cc should also be modified to call forward_request_to_master() so that changes on a secondary zone get applied to the primary zone first. it looks like rgw_rest_user.cc has a lot of examples you can copy
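
The ordering described above can be sketched as follows. This is not the actual forward_request_to_master() signature or the rgw_rest_user.cc code; `Zone`, `Step` and `apply_role_change` are hypothetical names, and applying the change locally after a successful forward is an assumption made for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical zone descriptor: only records whether this zone is the
// metadata master.
struct Zone {
  bool is_meta_master;
};

// A step returns 0 on success or a negative errno-style code on failure.
using Step = std::function<int()>;

// On a secondary zone, send the request to the master first; only if the
// master accepts it is the change applied locally (assumed behaviour).
inline int apply_role_change(const Zone& zone,
                             const Step& forward_to_master,
                             const Step& apply_locally) {
  if (!zone.is_meta_master) {
    int r = forward_to_master();
    if (r < 0) {
      return r;  // master rejected the change; leave local state untouched
    }
  }
  return apply_locally();
}

// Usage: record the order of calls made for a secondary zone.
inline std::vector<std::string> demo_secondary_order() {
  std::vector<std::string> calls;
  int r = apply_role_change(
      Zone{false},
      [&] { calls.push_back("forward"); return 0; },
      [&] { calls.push_back("local"); return 0; });
  assert(r == 0);
  return calls;
}

}  // namespace sketch
```

The point of the ordering is that the master remains the single place where metadata changes are first accepted, so a change rejected there never appears only on a secondary.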

for testing, one of the metadata sync test cases in src/test/rgw/rgw_multi/tests.py could do a radosgw-admin role create on the primary zone, then (after a call to zonegroup_meta_checkpoint()) check that radosgw-admin role get finds that role on the other zones

can you think of an easy way to test that AssumeRole works on a secondary zone, based on a role that was synced from the primary?

@pritha-srivastava
Contributor Author

the admin rest APIs in rgw_rest_role.cc should also be modified to call forward_request_to_master() so that changes on a secondary zone get applied to the primary zone first. it looks like rgw_rest_user.cc has a lot of examples you can copy

for testing, one of the metadata sync test cases in src/test/rgw/rgw_multi/tests.py could do a radosgw-admin role create on the primary zone, then (after a call to zonegroup_meta_checkpoint()) check that radosgw-admin role get finds that role on the other zones

can you think of an easy way to test that AssumeRole works on a secondary zone, based on a role that was synced from the primary?

Ok, I will look into this and come up with a way to test AssumeRole.

@pritha-srivastava
Contributor Author

the admin rest APIs in rgw_rest_role.cc should also be modified to call forward_request_to_master() so that changes on a secondary zone get applied to the primary zone first. it looks like rgw_rest_user.cc has a lot of examples you can copy

for testing, one of the metadata sync test cases in src/test/rgw/rgw_multi/tests.py could do a radosgw-admin role create on the primary zone, then (after a call to zonegroup_meta_checkpoint()) check that radosgw-admin role get finds that role on the other zones

can you think of an easy way to test that AssumeRole works on a secondary zone, based on a role that was synced from the primary?

@cbodley : Why is the zonegroup id (the secondary's own zonegroup) sent to the master in every forward_request_to_master() call, and why is uid sent as a system parameter? Do the role id/name also need to be sent as system parameters?

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions github-actions bot added the tests label Mar 25, 2022
@pritha-srivastava
Contributor Author

jenkins retest this please

@pritha-srivastava
Contributor Author

@cbodley : ceph API tests and ceph Windows test are failing for no known reason. The PR checklist is also failing despite the tracker issue being referenced and the test added checkbox being ticked. Otherwise the PR is ready to be merged.

@cbodley
Contributor

cbodley commented Jun 10, 2022

jenkins test api

@mattbenjamin
Contributor

cleared needs-qa as it has passed in teuthology

@mattbenjamin
Contributor

we don't see any reason why the api tests are failing, and, they are failing with "no error";
@epuertat can you please help us?

@epuertat
Member

we don't see any reason why the api tests are failing, and, they are failing with "no error"; @epuertat can you please help us?

@mattbenjamin I see it's passing now. I checked just out of curiosity and the error was (which I assume is unrelated to this code):

2022-06-09T21:49:27.297+0000 7f3a83441500  1 mgr[py] Loading python module 'cephadm'
2022-06-09T21:49:27.441+0000 7f3a83441500 -1 mgr[py] Module not found: 'cephadm'
2022-06-09T21:49:27.441+0000 7f3a83441500 -1 mgr[py] Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/cephadm/__init__.py", line 1, in <module>
    from .module import CephadmOrchestrator
  File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/cephadm/module.py", line 31, in <module>
    from cephadm.serve import CephadmServe
  File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/cephadm/serve.py", line 23, in <module>
    import orchestrator
  File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/orchestrator/__init__.py", line 3, in <module>
    from .module import OrchestratorCli
  File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/orchestrator/module.py", line 12, in <module>
    from ceph.deployment.drive_group import DriveGroupSpec, DeviceSelection, OSDMethod
ImportError: cannot import name 'OSDMethod' from 'ceph.deployment.drive_group' (/usr/lib/python3/dist-packages/ceph/deployment/drive_group.py)

My impression is that in some Jenkins job we are installing Ceph python libraries (python-common aka ceph, ...) to site-packages dir (instead of using those from the Ceph repo path), and not cleaning afterwards, and that's causing conflicts with different versions (this is another example).

@djgalloway does this ring a bell?

@cbodley cbodley merged commit 6f765e2 into ceph:main Jun 13, 2022
@pritha-srivastava
Contributor Author

@cbodley : thanks for merging this!

6 participants