New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: implement get/put object tags for S3 #13753

Merged
merged 5 commits into from Jun 26, 2017

Conversation

@theanalyst
Member

theanalyst commented Mar 2, 2017

A very initial version of AWS S3's get/put object tagging,
for early feedback.

Tags are stored as an xattr under the user.rgw.tags key,
and are restricted to max 10 tags per
object and key size of 128 and a val size of 256. The API does not reuse
the existing x-amz-meta tags (similar to amazon's behaviour).

Some details on the API:
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html

TODO

  • Implement in POST Object API
  • Reraise Invalid Tag errors when limits are exceeded (instead of a malformed xml)
  • Support tag-count in the HEAD Object API
  • Support tag-count in GET Object API
  • Support x-amz-tagging in PUT Object API
  • Implement DELETE Object Tagging
  • Add object tags in the elastic search plugin (waiting on mdsearch part 2 to be merged- done)
  • Add S3Tests for the feature (probably non trivial, boto2 has no support yet, boto3 test eg: https://github.com/theanalyst/s3tags ) ceph/s3-tests#159
  • refactor get_params (of this and a few others to probably reuse) #13815
  • Handle Tags in COPY api? (automatically handled)
  • Handle ECANCELD (Not sure what S3's behaviour is here)
  • Use Boost::split
  • Add tagging to PolicyEngine - Basic support added,
    A future improvement would be an AWS like tagging based filtering policy
@theanalyst

This comment has been minimized.

Member

theanalyst commented Mar 2, 2017

test this please

@chardan

Looks like good work Abhi-- a few suggestions for you to consider.

attrs[RGW_ATTR_TAGS] = tags_bl;
op_ret = store->set_attrs(s->obj_ctx, obj, attrs, NULL);
// handle ECANCELD

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

This comment looks like there still needs to be code written for ECANCELD?
The pointers "obj_tags" and "tagging" look like they're observing-only, so you're good on those (ditto if
an exception is thrown).

This comment has been minimized.

@theanalyst

theanalyst Mar 7, 2017

Member

Yeah still a todo item to handle ECANCELED, amazon said they return an error code when there is a conflict, just need to return the same one here (we already return a 400 I guess)

op_ret = -ERR_MALFORMED_XML;
return;
}
tagging = static_cast<RGWObjTagging_S3 *>(parser.find_first("Tagging"));

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

Are these guaranteed to be found?

This comment has been minimized.

@theanalyst

theanalyst Mar 7, 2017

Member

I expect parser.parse to error out earlier

void RGWPutObjTags::execute()
{
RGWObjTagSet_S3 *obj_tags;

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

These could be moved closer to where they are used, but are also fine here.

protected:
bufferlist tags_bl;
public:
RGWGetObjTags() {}

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

If these are just going to be default, you can either not declare them or explicitly say you want the default versions, eg.: RGWGetObjTags() = default;

This comment has been minimized.

@theanalyst

theanalyst Mar 7, 2017

Member

ack, will do

void pre_exec();
virtual void send_response_data(bufferlist& bl) = 0;
virtual const string name() { return "get_obj_tags"; }

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

Don't forget to declare such member functions const noexcept:
virtual const string name() const noexcept { return ... }

This comment has been minimized.

@theanalyst
while (tagentry) {
const string& key = tagentry->get_key();
const string& val = tagentry->get_val();
// TODO: this is wrong, we should probably do a rebuild and return different types of errors here

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

Don't forget to do this! :-)

XMLObj *RGWObjTagsXMLParser::alloc_obj(const char *el){
XMLObj* obj = nullptr;
if(strcmp(el,"Tagging") == 0) {

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

What you've written here is fine, but I /personally/ would prefer to not have any of the else clauses if we're just doing this sort of checking. (But, I have a pet tick when it comes to extra else-en.)

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

@chardan Surely the plurals are iffen and elsem? C lacks a then keyword so we need not use its plural of theen. Multiple do loops are of course loops doreaux.

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

@adamemerson Yeah, it depends on how the branchem feel about the wetherum that day. ;-)

#include "rgw_tag.h"
#include "rgw_xml.h"
using namespace std;

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

Don't do this in header files-- it pollutes the global namespace of anything that includes it.

class RGWObjTagKey_S3: public XMLObj
{
public:
RGWObjTagKey_S3() {};

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

Same comment as before on this (plus, in this case, things will get pleasantly concise! :>).

This comment has been minimized.

@theanalyst
string val;
public:
RGWObjTagEntry_S3() {}
RGWObjTagEntry_S3(string k,string v):key(k),val(v) {};

This comment has been minimized.

@chardan

chardan Mar 7, 2017

Contributor

I suggest taking these by reference to avoid a potential extra copy (eg. const string& k, etc.)

This comment has been minimized.

@theanalyst

@mattbenjamin mattbenjamin added the rgw label Mar 9, 2017

@mattbenjamin

I really like the look of this so far; I signalled to Adam Emerson to have a look; it may be possible to add the tag interactions to upcoming policy work, which would unlock the AWS use cases

~RGWPutObjTags_ObjStore_S3() {}
void send_response() override;
// void execute () override {};

This comment has been minimized.

@mattbenjamin

mattbenjamin Mar 9, 2017

Contributor

@theanalyst commented out?

This comment has been minimized.

@theanalyst

theanalyst Mar 9, 2017

Member

well RGWOp::execute() already does this, I'll remove this, I'll move the get_params method off rgw_rest and move it to rgw_rest_s3 where it'd belong (see previous comment)

#include <common/errno.h>
#include "rgw_tag.h"
#define MAX_OBJ_TAGS 10

This comment has been minimized.

@mattbenjamin

mattbenjamin Mar 9, 2017

Contributor

@theanalyst AWS only allows 10 tags; I wonder if RGW could support ... 11?

@mattbenjamin mattbenjamin requested a review from adamemerson Mar 9, 2017

@theanalyst

This comment has been minimized.

Member

theanalyst commented Mar 17, 2017

changelog:

  • rebased against the override changes
  • added support in PUT object api
@theanalyst

This comment has been minimized.

Member

theanalyst commented Mar 21, 2017

Added support for PUT/GET objects with x-amz-tagging and count similar to s3, POST is still pending @mattbenjamin @adamemerson @chardan would you mind taking a look for comments/improvements etc.

op_ret = get_obj_attrs(store, s, obj, attrs);
auto tags = attrs.find(RGW_ATTR_TAGS);
if(tags == attrs.end())
return;

This comment has been minimized.

@theanalyst

theanalyst Mar 23, 2017

Member

oops this is wrong, getting this!

@theanalyst theanalyst requested a review from cbodley Mar 24, 2017

@oritwas oritwas referenced this pull request Apr 2, 2017

Merged

rgw: usage #14287

@@ -1236,6 +1237,20 @@ static int parse_list_of_flags(struct rgw_name_to_flag *mapping,
return 0;
}
static void split(const std::string &s, char delim, std::vector<std::string> &elems) {

This comment has been minimized.

@oritwas

oritwas Apr 3, 2017

Contributor

why not boost::split

@@ -197,21 +197,6 @@ bool RGWLC::if_already_run_today(time_t& start_date)
return false;
}
static std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {

This comment has been minimized.

@oritwas

oritwas Apr 3, 2017

Contributor

I see were the split code came from, I would prefer we replaced it with boost::split

This comment has been minimized.

@theanalyst

theanalyst Apr 3, 2017

Member

ack, will do

This comment has been minimized.

@theanalyst

@oritwas oritwas assigned adamemerson and unassigned cbodley Apr 3, 2017

class RGWObjTags
{
protected:
std::map <std::string, std::string> tags;

This comment has been minimized.

@theanalyst

theanalyst Apr 4, 2017

Member

should we use an unordered_map here? (though with a size of ~10ish elements shouldn't be that much of an advantage)

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

It might be worth using a boost::container::flat_map. Since we're splitting into a vector we could sort it.

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

flat_map seems like a good fit here-- OTOH, it can also be done later. Do we know how we're actually using this, @theanalyst? For example, are there lots of small transactions on "tags", or does it just tend to grow?

This comment has been minimized.

@theanalyst

theanalyst Jun 13, 2017

Member

Well currently we are only consuming the tags in full, but in the future we may end up needing that for checking if a specific key val exists in an object, I have moved to flat_maps locally and it seems working, will push an update

@theanalyst

This comment has been minimized.

Member

theanalyst commented Apr 4, 2017

Adding the fields to elasticsearch needs some thought as to how to index, while the info is already a part of stat obj cr, it is just a matter of deciding what json hierarchy to put it under so as to have a more intuitive index queries

@Liuchang0812

This comment has been minimized.

Contributor

Liuchang0812 commented Apr 20, 2017

👍

@theanalyst theanalyst referenced this pull request Apr 20, 2017

Merged

add tests for tagging functionality #159

9 of 9 tasks complete
@theanalyst

This comment has been minimized.

Member

theanalyst commented Apr 24, 2017

I have added s3 tests for this feature at ceph/s3-tests#159, the only pending item on the list is support for tags in the elastic search plugin, which requires some looking into wrt indexing and such. This is ready for an initial review @mattbenjamin @oritwas @adamemerson

@theanalyst theanalyst changed the title from [WIP] rgw: implement get/put object tags for S3 to rgw: implement get/put object tags for S3 May 19, 2017

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 9, 2017

@adamemerson may I have a review

@adamemerson

This looks good to me. Please add Get/PubObjectTagging ops to the pile of policy op flags and make sure this works with the new permission check helper functions.

{
if (!verify_object_permission(s, RGW_PERM_READ))
return -EACCES;

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

This looks like it conflicts with the new bucket policy stuff, but should be easily adaptable

protected:
bufferlist tags_bl;
bool has_tags;
public:

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

Just a stylistic nit, but In /general/ I'd like people to start moving toward using default values for class members instead of a single default constructor with an initialization list.

virtual const string name() noexcept override { return "get_obj_tags"; }
virtual uint32_t op_mask() { return RGW_OP_TYPE_READ; }
RGWOpType get_type() override { return RGW_OP_GET_OBJ_TAGGING; }

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

Having both virtual AND override is superfluous.

This comment has been minimized.

@theanalyst

theanalyst Jun 12, 2017

Member

ack, just followed what other get_types in the same file were doing, dropping

auto tag_str = s->info.env->get("HTTP_X_AMZ_TAGGING");
if (tag_str){
obj_tags.reset(new RGWObjTags());
ret = obj_tags->set_from_string(tag_str);

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

We have a ceph::make_unique now, if you want it.

(But you don't have to use it.)

This comment has been minimized.

@theanalyst

theanalyst Jun 12, 2017

Member

sure, killing new things are fun

class RGWObjTags
{
protected:
std::map <std::string, std::string> tags;

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

It might be worth using a boost::container::flat_map. Since we're splitting into a vector we could sort it.

DECODE_FINISH(bl);
}
void dump(Formatter *f) const;

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

it's something I'd like to see loooong term since I'd like to make people use unique_ptr instead of Formatter* and *f is nicer than f.get().

(That and if people wanted to use a shared_ptr in some case the code they call wouldn't have to care.)

#include "rgw_tag_s3.h"
using namespace std;

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

Please don't do that. It's not as bad in .cc files, but opens us up to trouble when we upgrade. It's probably better to do using std::this and using std::that.

This comment has been minimized.

@theanalyst

theanalyst Jun 12, 2017

Member

ack done

return true;
}
bool RGWObjTagSet_S3::xml_end(const char* el){

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

If we don't actually use a parameter (but need it to match a type signature) it's probably best to leave it unnamed so it's explicit.

XMLObj *RGWObjTagsXMLParser::alloc_obj(const char *el){
XMLObj* obj = nullptr;
if(strcmp(el,"Tagging") == 0) {

This comment has been minimized.

@adamemerson

adamemerson Jun 9, 2017

Contributor

@chardan Surely the plurals are iffen and elsem? C lacks a then keyword so we need not use its plural of theen. Multiple do loops are of course loops doreaux.

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 12, 2017

@adamemerson updated with the IAM policy tags, the basic s3tests version (with ceph/s3-tests#159) passes, but I have to write tests using nouns as policy. Also haven't switched to flat_map yet && need to squash the commits

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 12, 2017

needs support in json parser or I'll have a segfaulting rgw
UPDATE - done

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 13, 2017

@adamemerson added the most basic of tests with https://github.com/ceph/s3-tests/pull/159/files#diff-589dd2476d44e54d3806a08cca072b1bR8616 and those seem passing, but those were basic tests just setting public read/write on tags
The more interesting use case is to allow access to object based on tags, http://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html#tagging-and-policies which allows accessing objects based on tags, but this wouldn't work easily with the current implementation which writes the tags as object xattrs, we will need to index objects by tags for that

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 13, 2017

scratch that, I guess ExistingObjectTag may not be that hard to enforce after all, since we need to do a check only after the user requests an object and we can query the object tags then

@adamemerson

This comment has been minimized.

Contributor

adamemerson commented Jun 13, 2017

Sorry, I didn't mean to heap QUITE as much work on you as you seem to have taken on. I was just wanting you to make sure that the Get/Put object tag operations THEMSELVES could have their permissions controlled with policy. (Which just requires adding the operation constants and making sure you call object permission check with them.)

If you /wish/ to control access WITH object tags that is wonderful, but I will be very happy to approve your PR without that.

@@ -1,6 +1,7 @@
#include <string.h>
#include <iostream>
#include <map>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

It's a little blurry in the github diff, but is this the same as line 4?

This comment has been minimized.

@theanalyst

theanalyst Jun 13, 2017

Member

hmm artifact from a rebase, thanks for catching

virtual void send_response_data(bufferlist& bl) = 0;
virtual const string name() noexcept override { return "get_obj_tags"; }
virtual uint32_t op_mask() { return RGW_OP_TYPE_READ; }

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

noexcept? (...then again, at some point the decoration starts to feel excessive, doesn't it? ;->)

bufferlist tags_bl;
bool has_tags{false};
public:
~RGWGetObjTags() = default;

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

If it's the default dtor, then why mention it at all? (nitpick)

@@ -953,6 +999,7 @@ class RGWPutObj : public RGWOp {
copy_source_range_fst(0),
copy_source_range_lst(0),
chunked_upload(0),
obj_tags(nullptr),

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

unique_ptr<>'s ctor will take care of this.

class RGWGetObjTags_ObjStore : public RGWGetObjTags {
public:
RGWGetObjTags_ObjStore() {};
~RGWGetObjTags_ObjStore() {};

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

...as per above, it's already the default dtor.

This comment has been minimized.

@theanalyst

theanalyst Jun 13, 2017

Member

yeah I guess this should be a part of a cleanup of this entire file, which has a very similar pattern, maybe a seperate cleanup pr

if (r < 0)
return r;
auto data_deleter = std::unique_ptr<char, decltype(free)*>{data, free};

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

Nice use of unique_ptr<>'s deleter!

@@ -3030,7 +3197,10 @@ RGWOp *RGWHandler_REST_Obj_S3::op_put()
{
if (is_acl_op()) {
return new RGWPutACLs_ObjStore_S3;
} else if (is_tagging_op()) {

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

I know this is the pattern in the surrounding code, but if we're returning, what is "else" accomplishing?

This comment has been minimized.

@adamemerson

adamemerson Jun 14, 2017

Contributor

For this sort of thing where we're following the style of a file to clean it up in one swell foop PR, just to avoid having a file that isn't internally consistent

(Some people might object to that on backport concerns. Maybe wait until some OTHER breaking change would make backports difficult anyway.)

Though all the permission checking now might be a good excuse to do a cleanup like that in rgw_op.cc

{
bufferlist tags_bl;
public:
RGWGetObjTags_ObjStore_S3() {}

This comment has been minimized.

@chardan

chardan Jun 13, 2017

Contributor

You know the drill...

@chardan

This comment has been minimized.

Contributor

chardan commented Jun 13, 2017

This is looking good to me-- hard to comment on any of the API interactions, I haven't used this Amazon service.

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 14, 2017

Rebased again after AWSv4 patch went in, hopefully the final version for now (fingers crossed :) @adamemerson review when you have the time

@adamemerson

This comment has been minimized.

Contributor

adamemerson commented Jun 14, 2017

Lampreys Glut Themselves Messily

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 16, 2017

@adamemerson scheduled a teuthology run at http://pulpito.ceph.com/abhi-2017-06-15_08:34:10-rgw-wip-rgw-tagging---basic-smithi/, initially the s3test failures were due to ceph/s3-tests#166, swift failures look related to #15711

rerun of the suites saw the s3 tests pass http://pulpito.ceph.com/abhi-2017-06-15_20:51:15-rgw-wip-rgw-tagging---basic-smithi/, however brought out the valgrind errors that were seen in other earlier runs, for eg: http://pulpito.ceph.com/cbodley-2017-06-14_14:27:41-rgw-wip-cbodley-testing-distro-basic-mira/ (cc @cbodley )

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 16, 2017

s3 failures might be fixed by #15735

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 16, 2017

rescheduled the suites

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 19, 2017

@adamemerson Now that we do have a 1/3 QE suite passing let me know if anything else needs to be done to get this in

theanalyst added some commits Jun 21, 2017

rgw: implement get/put object tags for S3
- New api for get/put/delete object tags
- PUT Object supports x-amz-tagging
- POST API supports x-amz-tagging

This is similar to AWS S3's get/put object tags, tags are stored as an
xattr under the user.rgw.tags key, and are restricted to max 10 tags per
object and key size of 128 and a val size of 256. The API does not reuse
the existing `x-amz-meta` tags.
For PUT obj to optionally accept `x-amz-tagging` and this will end up
storing the tags under x-amz-tagging xattr, the input is expected to be
in the url encoded kv format, we return InvalidTagError if we cannot
parse the tags

A future TODO: use the same xml parser for put and post apis

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>

 Conflicts:
   src/rgw/rgw_rest_s3.cc
   src/rgw/rgw_tag.cc
   mostly trivial conflicts from AWSv4 rework, importantly url_decode's
   signature was changed for the nicer, and moved rgw_tag.cc to the
   newer url_decode format

 Conflicts:
  src/rgw/rgw_common.h
  conflict as another PR that used up the error code before this was
  merged
rgw es: add support for obj tags
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
(cherry picked from commit 7bff49b)
rgw: support IAM policies for object tagging
A very basic support for the s3:{get/put/delete}objecttagging and
related versions for object versioning

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
rgw_tag: move from std::map -> boost::flat_map
well since flat_map is a vector under the hood it should be faster for
such small sizes, also renaming tags to a more understandable tag_map.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
rgw: add tagging to AWSv4 completion
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 21, 2017

@adamemerson had to do a trivial rebase again as there was another patch which used up the error code in rgw_common.h which created a merge conflict, rebased

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 22, 2017

jenkins test this please (Error: No package found /usr/share/selinux/devel/policyhelp)

@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 22, 2017

jenkins test this please

@theanalyst theanalyst requested a review from rzarzynski Jun 22, 2017

@cbodley

This comment has been minimized.

Contributor

cbodley commented Jun 23, 2017

jenkins test this please

1 similar comment
@theanalyst

This comment has been minimized.

Member

theanalyst commented Jun 23, 2017

jenkins test this please

@cbodley cbodley merged commit 5977372 into ceph:master Jun 26, 2017

4 of 5 checks passed

arm64 make check arm64 make check started
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
default Build finished.
Details
make check make check succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment