New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rgw: improve handling of Swift's error messages and limits #15369
rgw: improve handling of Swift's error messages and limits #15369
Conversation
src/common/config_opts.h
Outdated
@@ -1438,6 +1438,7 @@ OPTION(rgw_put_obj_min_window_size, OPT_INT, 16 * 1024 * 1024) | |||
OPTION(rgw_put_obj_max_window_size, OPT_INT, 64 * 1024 * 1024) | |||
OPTION(rgw_max_put_size, OPT_U64, 5ULL*1024*1024*1024) | |||
OPTION(rgw_max_put_param_size, OPT_U64, 1 * 1024 * 1024) // max input size for PUT requests accepting json/xml params | |||
OPTION(rgw_max_attrs_num_in_req, OPT_U64, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DOCUMENT ME.
7b4df18
to
9b63abf
Compare
Both Tempest and s3-tests haven't found any regression here. |
this looks good to me |
@@ -1526,7 +1526,7 @@ void RGWGetObj::execute() | |||
if (torrent.get_flag()) | |||
{ | |||
torrent.init(s, store); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you
if (op_ret == -EFBIG) { | ||
/* Handle the custom error message of exceeding maximum custom attribute | ||
* (stored as xattr) size. */ | ||
const auto error_message = boost::str( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, boost::format idiom
The branch has been tested in following Teuthology runs:
Reference point: The
|
jenkins retest this please |
jenkins retest this please ( |
jenkins retest this please |
@@ -1666,7 +1666,10 @@ void RGWInfo_ObjStore_SWIFT::list_swift_data(Formatter& formatter, | |||
|
|||
string ceph_version(CEPH_GIT_NICE_VER); | |||
formatter.dump_string("version", ceph_version); | |||
formatter.dump_int("max_meta_name_length", 81); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This were causing failures in the container-related tests of Swift (e.g. test.functional.test_container:TestContainer.test_POST_bad_metadata
). The patch rectifies that as /info
returns 80
for max_meta_name_length
now.
072ea64
to
9c622f1
Compare
jenkins retest this please (arm64 only) |
Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
Fixes: http://tracker.ceph.com/issues/17935 Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
Fixes: http://tracker.ceph.com/issues/17938 Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
…ocs. Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
Fixes: http://tracker.ceph.com/issues/17936 Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
9c622f1
to
da21be7
Compare
Rebased after a trivial merge conflict. |
With "discover=true", the logic for "rgw_max_attrs_num_in_req" fails on tempest runs. There is a silent limit set by civetweb.h mg_request_info http_headers[64]. The actual max # of metadata items is slightly slower - larger #'s will cause extra http headers to drop - this typically causes swift authentication to fail (in tempest). Setting a sufficiently small number (say, 32) causes different behavior. There's an assertion generated in src/rgw/rgw_rest_swift.cc in handle_metadata_errors when it tries to do this, Ideally, we should have teuthology or unit tests for some of this behavior. |
src/rgw/rgw_rest_swift.cc
Outdated
} else if (op_ret == -E2BIG) { | ||
const auto error_message = boost::str( | ||
boost::format("Too many metadata items; max %lld") | ||
% s->cct->_conf->get_val<int>("rgw_max_attrs_num_in_req")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The here should almost certainly be <size_t> . Better yet would be to use "cct->_conf->rgw_max_attrs_num_in_req". The boost runtime code for get_val generates a lot of runtime code - and for this bug a very hairy stack backtrace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, boost::format
& co. isn't cheap. I hoped that having it outside of main execution paths isn't an issue.
I'm not sure whether we could use cct->_conf->rgw_max_attrs_num_in_req
due to the recent changes in the configuration component. I guess it's allowed only for legacy options while rgw_max_attrs_num_in_req
is a new one. Anyway, I will check this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed the updated branch. It looks I introduced the int
bug with the rebase after the new configuration subsystem. Big thanks for pointing it out, Marcus! :-)
cct->_conf->get_val<size_t>("rgw_max_attrs_num_in_req"); | ||
if (rgw_max_attrs_num_in_req && | ||
++valid_meta_count > rgw_max_attrs_num_in_req) { | ||
return -E2BIG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's logic elsewhere in this commit that turns E2BIG into an error message. It would be much better to generate that error message here. That would result in simpler and easier to understand logic with less chance of breakage - less brittle!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid there is a reason for not doing that. IIRC the rgw_get_request_metadata
is a generic routine that is used by both our Swift and S3 implementations while the error message is specific to Swift.
src/common/options.cc
Outdated
@@ -4163,6 +4163,10 @@ std::vector<Option> get_rgw_options() { | |||
.set_default(1 * 1024 * 1024) | |||
.set_description(""), | |||
|
|||
Option("rgw_max_attrs_num_in_req", Option::TYPE_UINT, Option::LEVEL_ADVANCED) | |||
.set_default(0) | |||
.set_description("The maximum number of metadata items that can be put via single request"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where this goes, but there's a max limit on this value that's sized by civetweb.h mg_request_info http_headers[64]. The true limit is "somewhat less" than this number. It would be good to document this number somewhere. It would be good to emit a runtime warning at startup for values near this limit. It would be really good if civetweb failed requests in a more obvious fashion when this limit was exceeded. And it would be great if civetweb didn't have this limit in the first place (but given asio we might not want to bother.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why "default(0)"? ocata swift advertises 90 here (that means changing civetweb tho)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out this CivetWeb's limitation! Documenting it is definitely a very good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why "default(0)"? ocata swift advertises 90 here (that means changing civetweb tho)
I believe that's one of the compromises coming from being a multi-API program. The 0
value means do not enforce as in rgw_get_request_metadata
we have:
if (rgw_max_attrs_num_in_req &&
++valid_meta_count > rgw_max_attrs_num_in_req) {
return -E2BIG;
}
I see 2 factors driving this behavior:
- the need to not break the level of S3 compliance we offer in the default configuration,
- necessity to have the policy coherent between all APIs similarly to ACLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's not true we have a '0 = do not enforce' limit. What we currently have is a default "number slightly larger than 60 causes weird bad behavior" limit. Having a common default that causes non-weird deterministic behavior in s3 would still be an improvement over what we offer today.
If it's really the case that s3 doesn't expect a limit here than perhaps this should be a swift-only limit. And if we don't want to have any limit as a default, then the radosgw install directions need to call that out as something people caring about swift standards must set. But before any of this makes sense, we'd have to fix the civetweb restriction which requires we have a limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have many different limits across the layers. Passing a check on particular layer doesn't guarantee success elsewhere.
I perceive CivetWeb as a component located on the front-end layer. As it can't deal properly with requests carrying huge number of metadata headers, it should at least properly error there (maybe by sending a 5xx
?). I would like to avoid introducing a logic that crosses the layer boundary.
src/rgw/rgw_rest_swift.cc
Outdated
- strlen(RGW_ATTR_PREFIX RGW_AMZ_META_PREFIX); | ||
formatter.dump_int("max_meta_name_length", meta_name_limit); | ||
|
||
const size_t meta_value_limit = g_conf->osd_max_attr_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again not sure where to put this - but this really should have a default value (ocata swift advertises 256). Since this is an "osd" value and applies to all rados attributes, setting this to small values causes other interesting bits of rados to start breaking (and fixing that means restarting all osds). From an operational standpoint for rgw, I think the only sane way to set this is as an rgw specific value and have it set to a separate possibly lower value than the osd. So perhaps this should just be its own separate configuration item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setting this to small values causes other interesting bits of rados to start breaking (and fixing that means restarting all osds).
The usage pattern of osd_max_attr_size
in RadosGW assumed setting the option only in the RGW-dedicated section of ceph.conf
. Does the mentioned problem occur then? If so, we might have an issue in OSDs as a misbehaving RADOS client shouldn't cause the need to restart it, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I move this to the rgw only section I get more desirable behavior yes.
Note: naive people looking at an option that starts with "osd_" are likely to think it matters to the osd. And, oh it does, but not in a way that's useful to swift.
I'd prefer to see this be its own independent configuration name starting with "rgw_".
According to the s3 documentation, http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata , there is also a s3 metadata size limit. I don't think we want to enforce that limit, but it appears to me if we advertise a default 2K per-meta value limit, we can please the tempest tests, provide a perfectly defensible swift size limit, and an equally defensible s3 experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Renaming the configurables to rgw_
.
Fixes: http://tracker.ceph.com/issues/17934 Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
da21be7
to
21ad80a
Compare
The bug that is fixed in this patch has been responsible for failing the Tempest's test_delete_non_empty_container test case. The investigation has been made by: Marcus Watts <mwatts@redhat.com>. Fixes: http://tracker.ceph.com/issues/21169 Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
70bc1b1
to
308c8d3
Compare
@rzarzynski can you schedule an RGW suite run on this, as well? |
@mattbenjamin: sure, I will schedule on https://github.com/rzarzynski/ceph/tree/wip-rgw-swift_api_testing which should have all the patches. |
Rescheduled with higher priority: http://pulpito.ceph.com/rzarzynski-2017-08-30_12:11:05-rgw-wip-rgw-swift_api_testing---basic-smithi. |
There is interest in getting this into master/luminous to fix tempest swift tests. If there is a possibility of fixing some of the nits I'll address more above quickly that would be cool, otherwise I guess that better wait for some other clean up pass. |
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
@mdw-at-linuxbox: pushed a commit changing the names. It needs testing. Though, I'm still unhappy as |
@rzarzynski @mdw-at-linuxbox I don't quite follow the meaning of "conversion to RADOS"; changing civetweb header limits seems less useful now, I'd rather focus on polishing the ASIO frontend? normalizing attrs with OSD settings is problematic in general, so I think there are no open issues for for this PR; as regards testing, does this need another suite run? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm (updated for unique var name requested by marcus, checked by Yehuda, passed teuthology)
These patches improve Swift's error handling and bring support for several limits (mandatory in recent Tempest versions) which are enumerated in the
/info
. The number of changes have been minimised for the sake of backportability. It's expected they will be later (most likely after Luminous) supplemented with broader rework of the error handling infrastructure in RadosGW.Fixes:
Best works with: #15273, #12704.
Derogates: #12109, #12089.