
rgw: metadata search part 2 #14351

Merged: 71 commits merged into ceph:master on Jun 2, 2017

3 participants
@yehudasa
Member

yehudasa commented Apr 5, 2017

The initial metadata search implementation only pushed data to Elasticsearch (via the sync process). This PR adds the ability to query Elasticsearch through rgw, and also to control which data is actually indexed. A few new Elasticsearch tier configurables were added:

  • num_shards, num_replicas: control how many shards and replicas the Elasticsearch index will be configured with
  • explicit_custom_meta (bool): whether all custom metadata is indexed, or only the metadata the user configured
  • index_buckets_list: a list of buckets (or bucket prefixes and/or suffixes) that will be indexed (empty means all buckets)
  • approved_owners_list: only these owners will have their data indexed (empty means all owners)
  • override_index_path: a new Elasticsearch index is created whenever the data sync process is re-initialized, and the new index gets a new name. This configurable pins the index name so that sync keeps writing to the old index.
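The bucket, owner and index-name options above could be combined roughly as follows. This is a minimal sketch and not rgw's code. `should_index` and `index_name` are hypothetical helpers. The `prefix*` / `*suffix` pattern syntax and the `<base>-<instance_id>` naming scheme are assumptions.

```python
def _matches(pattern, name):
    # Assumed syntax: "logs-*" is a prefix, "*.bak" is a suffix, else exact.
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    return name == pattern


def should_index(bucket, owner, index_buckets_list, approved_owners_list):
    # An empty list means "no restriction" on that dimension.
    if index_buckets_list and not any(_matches(p, bucket) for p in index_buckets_list):
        return False
    if approved_owners_list and owner not in approved_owners_list:
        return False
    return True


def index_name(base, instance_id, override_index_path=None):
    # A re-initialized sync gets a fresh index unless the path is pinned.
    if override_index_path:
        return override_index_path
    return f"{base}-{instance_id}"
```

Empty lists act as wildcards. This keeps the default behaviour ("index everything") unchanged for existing zones.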

The new REST APIs that rgw implements:

  • a query operation, on either a bucket or the root. On a bucket, the query is limited to that bucket. The result contains all objects that were indexed, match the query, and that the user has permission to read
  • a new user operation on the bucket to configure which custom metadata should be indexed, and with what type. Even when all custom metadata is indexed by default, this is needed if the user wants some of it indexed as a type other than string (integer, date). This operation currently needs to go to the metadata master, not to the zone that handles the metadata sync
  • a new user operation to delete the custom metadata config (currently needs to go to the master)
  • a new user operation to retrieve the custom metadata config (can go to the sync zone)
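A client could shape the four operations roughly as follows. This sketch only builds `(method, path, headers)` tuples and leaves out signing and sending. The `?query=` and `?mdsearch` parameters and the `X-Amz-Meta-Search: key; type, ...` header format are assumptions, because this PR does not spell them out. The helper names are hypothetical.

```python
from urllib.parse import quote


def query_request(query, bucket=None):
    # Send to the Elasticsearch (sync) zone. On a bucket, results are limited to it.
    path = f"/{bucket}" if bucket else "/"
    return ("GET", f"{path}?query={quote(query)}", {})


def config_request(bucket, fields):
    # fields: custom metadata key -> type name. Must go to the metadata master for now.
    value = ", ".join(f"{key}; {typ}" for key, typ in fields.items())
    return ("POST", f"/{bucket}?mdsearch", {"X-Amz-Meta-Search": value})


def get_config_request(bucket):
    # Can be served by the sync zone.
    return ("GET", f"/{bucket}?mdsearch", {})


def delete_config_request(bucket):
    # Must go to the metadata master for now.
    return ("DELETE", f"/{bucket}?mdsearch", {})
```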
@yehudasa

yehudasa Apr 19, 2017

Member

@cbodley now rebased, including changes to test_multi.py for testing an elasticsearch zone (and other modifications)


@yehudasa yehudasa changed the title from [DNM] rgw metadata search part 2 to rgw: metadata search part 2 May 2, 2017

    return nullptr;
  }
  RGWOp *op_post() {
    return nullptr;


@theanalyst

theanalyst May 3, 2017

Member

Shouldn't this be returning RGWConfigBucketMetaSearch? Also, should we support PUT instead of POST?


@theanalyst

theanalyst May 3, 2017

Member

The above would only apply if we were supporting the op on the secondary; it's not needed the way it currently works.


@yehudasa

yehudasa May 25, 2017

Member

right

@yehudasa
src/rgw/rgw_sync_module_es_rest.cc
  string id;
- RGWRESTConn *conn{nullptr};
  string index_path;
+ std::unique_ptr<RGWRESTConn> conn;


@theanalyst

theanalyst May 3, 2017

Member

👍

@theanalyst
src/rgw/rgw_sync_module_es.cc
src/test/rgw/rgw_multi/zone_rados.py
@@ -347,7 +347,7 @@ int RGWHTTPClient::init_request(const char *method, const char *url, rgw_http_re
}
curl_easy_setopt(easy_handle, CURLOPT_READFUNCTION, send_http_data);
curl_easy_setopt(easy_handle, CURLOPT_READDATA, (void *)req_data);
-if (is_upload_request(method)) {
+if (send_data_hint || is_upload_request(method)) {


@cbodley

cbodley May 3, 2017

Contributor

Does this mean we're sending a body with GET requests? Is it possible to use PUT/POST instead, where is_upload_request() will do the right thing?


@yehudasa

yehudasa May 3, 2017

Member

Can't change it, sadly that's the elasticsearch api.
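The exchange above comes down to a GET that carries a body. A rough sketch of both halves follows. `sends_body` renders the changed condition in Python. The assumption that `is_upload_request()` treats only PUT and POST as uploads is mine. `es_search_request` builds an Elasticsearch `_search` call whose query travels as a JSON body on GET. Both helper names are hypothetical.

```python
import json

UPLOAD_METHODS = {"PUT", "POST"}  # assumption: what is_upload_request() matches


def sends_body(method, send_data_hint):
    # Python rendering of: send_data_hint || is_upload_request(method)
    return send_data_hint or method in UPLOAD_METHODS


def es_search_request(index, es_query, size=100):
    # The _search endpoint reads the query from a JSON request body, so this
    # GET request has data to send even though it is not an upload.
    body = json.dumps({"size": size, "query": es_query}).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Content-Length": str(len(body))}
    return "GET", f"/{index}/_search", headers, body
```

Keying on an explicit hint instead of on the method keeps plain GETs body-less and still lets the ES search path attach its query.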


@theanalyst theanalyst self-requested a review May 4, 2017

yehudasa added some commits Mar 15, 2017

rgw: initial implementation of mdsearch query compiler
convert infix queries that look as follows:

[(]<name> <operator> <value> [)] [<and|or> ...]

into a prefix structure that is understood by elasticsearch.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
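An infix-to-prefix compiler like the one this commit describes could be sketched with the shunting-yard algorithm. This is not rgw's implementation. Several choices are assumptions: the exact Elasticsearch bool/term/range output shape, the operator set, and `and` binding tighter than `or`.

```python
import re

# Tokens: parentheses, comparison operators, or bare words (names, values, and/or).
TOKEN_RE = re.compile(r"\s*(\(|\)|==|!=|<=|>=|<|>|[^\s()<>=!]+)")
PRECEDENCE = {"or": 1, "and": 2}  # assumption: 'and' binds tighter than 'or'
RANGE_OPS = {"<": "lt", "<=": "lte", ">": "gt", ">=": "gte"}
COMPARISONS = {"==", "!="} | set(RANGE_OPS)


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def condition(name, op, value):
    """Translate one `<name> <operator> <value>` triple into an ES clause."""
    if op == "==":
        return {"term": {name: value}}
    if op == "!=":
        return {"bool": {"must_not": {"term": {name: value}}}}
    return {"range": {name: {RANGE_OPS[op]: value}}}


def compile_query(text):
    """Shunting-yard: infix conditions joined by and/or -> nested bool query."""
    tokens = tokenize(text)
    output, ops = [], []  # output: compiled clauses; ops: pending and/or/'('

    def reduce():
        op = ops.pop()
        if len(output) < 2:
            raise ValueError(f"'{op}' is missing an operand")
        right, left = output.pop(), output.pop()
        key = "must" if op == "and" else "should"
        output.append({"bool": {key: [left, right]}})

    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            ops.append(tok)
            i += 1
        elif tok == ")":
            while ops and ops[-1] != "(":
                reduce()
            if not ops:
                raise ValueError("unbalanced ')'")
            ops.pop()  # discard the matching '('
            i += 1
        elif tok in PRECEDENCE:
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                reduce()
            ops.append(tok)
            i += 1
        else:
            if i + 2 >= len(tokens) or tokens[i + 1] not in COMPARISONS:
                raise ValueError(f"expected '<name> <operator> <value>' at {tok!r}")
            output.append(condition(tok, tokens[i + 1], tokens[i + 2]))
            i += 3

    while ops:
        if ops[-1] == "(":
            raise ValueError("unbalanced '('")
        reduce()
    if len(output) != 1:
        raise ValueError("conditions must be joined with 'and' or 'or'")
    return output[0]
```

In this sketch, values stay strings. Typed comparisons (int, date) would depend on how the field is mapped in the index.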
rgw: add init callback to sync modules
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: define es index mapping
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
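An index definition that honours num_shards and num_replicas might look like the sketch below. The `settings` keys are standard Elasticsearch index settings. Everything else is hypothetical: the field names, the `object` mapping type, and the `nested` layout for custom metadata. The sketch also assumes Elasticsearch 5.x, which still used mapping types and the `keyword` field type.

```python
def es_index_body(num_shards, num_replicas):
    def nested(value_type):
        # Hypothetical layout: custom metadata as nested {name, value} pairs,
        # one nested field per configured type.
        return {"type": "nested",
                "properties": {"name": {"type": "keyword"},
                               "value": {"type": value_type}}}

    return {
        "settings": {"number_of_shards": num_shards,
                     "number_of_replicas": num_replicas},
        "mappings": {
            "object": {  # hypothetical mapping type name
                "properties": {
                    "bucket": {"type": "keyword"},
                    "name": {"type": "keyword"},
                    "owner": {"type": "keyword"},
                    "meta": {"properties": {
                        "size": {"type": "long"},
                        "mtime": {"type": "date"},
                        "custom-string": nested("keyword"),
                        "custom-int": nested("long"),
                        "custom-date": nested("date"),
                    }},
                }
            }
        },
    }
```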
rgw: implement init_sync() callback in es module
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: handle nested fields in es queries
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
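Custom metadata stored as an array of `{name, value}` objects shows why nested fields need special handling. Elasticsearch flattens plain object arrays, so `name == color` could match one entry while `value == blue` matched another. A `nested` query forces both terms to hit the same entry. The sketch below assumes the same hypothetical `meta.custom-<type>` layout, and `custom_meta_term` is a hypothetical helper.

```python
def custom_meta_term(key, value, value_type="string"):
    # Hypothetical path; name and value must match inside one nested document.
    path = f"meta.custom-{value_type}"
    return {"nested": {
        "path": path,
        "query": {"bool": {"must": [
            {"term": {f"{path}.name": key}},
            {"term": {f"{path}.value": value}},
        ]}},
    }}
```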
rgw: move code into class
just cleaning up

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: simplify es compile interface
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: rename rgw_rest_es.cc to rgw_es_query.cc
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: work on REST handler for es module
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: meta search rest handler can access es module
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: rename a few methods
just rename calls

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: make key param in RGWRESTStreamRWRequest::send_request() optional
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: can send data in RGWRESTStreamRWRequest::send_request()
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: move code around
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>

yehudasa added some commits Apr 12, 2017

rgw: don't pass sync module to rest filter in creation
sync module instance might change due to reconfiguration. Pass it into the
handler when executing op.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
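The point of this change can be shown with a small late-binding sketch. The handler keeps a reference to whatever owns the *current* sync module, and it resolves the module when the op executes, not when the handler is created. All class names here are hypothetical stand-ins, not rgw types.

```python
class SyncModuleInstance:
    def __init__(self, name):
        self.name = name


class Store:
    """Holds the current sync module instance; reconfiguration may swap it."""
    def __init__(self, sync_module):
        self.sync_module = sync_module


class MetaSearchHandler:
    def __init__(self, store):
        # Deliberately keep only the store, not the sync module itself.
        self.store = store

    def execute(self, op):
        # Resolve the module when the op runs, so a reconfigured instance is used.
        return op(self.store.sync_module)
```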
rgw: move data sync instance_id initialization to caller
so that caller can easily know the instance id.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw/test_multi: add support for elasticsearch testing
Add support for different zone types, and create an elasticsearch
zone type that deals with es testing.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: es: index and return versioned epoch
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
test/rgw/test_multi: differentiate between zone and zone connection
Instead of having a Zone type used for the connection, create a new
ZoneConn type that represents the connection. This frees us from the
need to pass in credentials all around.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
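The Zone/ZoneConn split could look roughly like this. A zone describes where something lives, and a connection binds that zone to the credentials used to reach it, so test helpers can take one object instead of threading credentials through. The real test_multi classes are not shown in this PR. All names and fields below are hypothetical, and bucket creation is stubbed out.

```python
from dataclasses import dataclass, field


@dataclass
class Credentials:
    access_key: str
    secret_key: str


@dataclass
class Zone:
    """Static description of a zone: no credentials attached."""
    name: str
    endpoint: str

    def get_conn(self, credentials):
        return ZoneConn(self, credentials)


@dataclass
class ZoneConn:
    """A zone plus the credentials used to talk to it."""
    zone: Zone
    credentials: Credentials
    buckets: dict = field(default_factory=dict)  # more than one bucket per zone

    def create_bucket(self, name):
        # Stub: a real implementation would issue an S3 CreateBucket call.
        self.buckets[name] = []
        return name

    def describe(self):
        return f"{self.credentials.access_key}@{self.zone.endpoint}"
```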
test/rgw/test_multi: initial es functional tests
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw/test_multi/es: extend test
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw/test_multi: zone_conn can hold more than one bucket per zone
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: don't send raw date header to elasticsearch
parse the header, and encode it in the json doc using
a format that ES can understand. Skip header if fails to
parse.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
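The behaviour this commit describes (parse the HTTP date header, re-encode it in a form Elasticsearch accepts as a date, and skip it on failure) can be sketched with the standard library. The ISO 8601 output format is an assumed choice; `es_date` is a hypothetical helper.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def es_date(header_value):
    """Return an ISO 8601 UTC string, or None so the caller skips the header."""
    try:
        dt = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    if dt.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
```

Returning None instead of raising mirrors the "skip header if it fails to parse" rule, so one malformed header does not block indexing of the whole object.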
rgw/test-multi: test more complicated queries
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw/test_multi: add test_es_bucket_conf test
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw/test_multi: add tests for different key types
add int and date tests

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
test_multi: realm checkpoint after init
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: use RGW_AMZ_META_PREFIX
instead of defining X_AMZ_META_PREFIX

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: adjust log levels
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: minor cleanup
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
test/rgw/test_multi: cleanup
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
test_multi: don't pass array as default param to constructor
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: system users override elasticsearch permission filter
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: package radosgw-es
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: es: fix system user check
check was inverted

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
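The permission filter, the system-user override, and the per-bucket restriction could be layered onto a compiled query as below. The `permissions` and `bucket` field names are hypothetical, and `restrict_query` is not an rgw function. The commit above notes that the system-user check was inverted. In this sketch, the intended direction is that system users skip the permission filter.

```python
def restrict_query(user_query, user_id, is_system_user, bucket=None):
    filters = []
    if bucket is not None:
        # Query issued on a bucket: limit results to that bucket.
        filters.append({"term": {"bucket": bucket}})
    if not is_system_user:
        # Regular users only see objects they are allowed to read.
        filters.append({"term": {"permissions": user_id}})
    if not filters:
        return user_query
    return {"bool": {"must": [user_query], "filter": filters}}
```

Putting the restrictions in a bool `filter` clause keeps them out of relevance scoring, and the user's own query stays intact under `must`.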
rgw: minor fixes following review
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: check init_sync return code
fix following review

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: get_rest_filter() delete original rest manager
When overriding rest manager, delete original.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
test/rgw: drop use of urllib
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: fix import
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
qa/tasks/rgw_multisite.py: adjust zone init
zone is now a ZoneConn object. Also, change import to make it relative
so that qa task can locate it.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
@yehudasa

yehudasa Jun 1, 2017

Member

@cbodley can we merge this PR? See:
http://pulpito.ceph.com/yehudasa-2017-06-01_10:41:03-rgw-wip-rgw-mdsearch-distro-basic-smithi/
and:
http://pulpito.ceph.com/yehudasa-2017-06-01_15:01:34-rgw:multisite-wip-rgw-mdsearch-distro-basic-smithi/

The first run is from before the rgw_multisite test fixes, the second from after them. Although tests failed in the second run, the test itself ran; I'm not sure these failures are regressions, and we can deal with them after the merge anyway.


@yehudasa yehudasa merged commit ea911b7 into ceph:master Jun 2, 2017

3 checks passed

  • Signed-off-by: all commits in this PR are signed
  • Unmodified Submodules: submodules for project are unmodified
  • default: Build finished.