doc: rgw add some basic documentation for sync plugins & ES
Mostly a rst formatted C-c C-v of Yehuda's mail to the ceph-devel lists

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>

1 parent 474828d, commit f9e6648

Showing 2 changed files with 270 additions and 0 deletions.

@@ -0,0 +1,181 @@
=========================
Elasticsearch Sync Module
=========================

.. versionadded:: Kraken

This sync module writes the metadata from other zones to `ElasticSearch`_. As of
Luminous, each object's metadata is stored in Elasticsearch as a JSON document
with the following fields.

::

    {
      "_index" : "rgw-gold-ee5863d6",
      "_type" : "object",
      "_id" : "34137443-8592-48d9-8ca7-160255d52ade.34137.1:object1:null",
      "_score" : 1.0,
      "_source" : {
        "bucket" : "testbucket123",
        "name" : "object1",
        "instance" : "null",
        "versioned_epoch" : 0,
        "owner" : {
          "id" : "user1",
          "display_name" : "user1"
        },
        "permissions" : [
          "user1"
        ],
        "meta" : {
          "size" : 712354,
          "mtime" : "2017-05-04T12:54:16.462Z",
          "etag" : "7ac66c0f148de9519b8bd264312c4d64"
        }
      }
    }

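
As a rough illustration of how a client might read such a document, the sketch
below parses a trimmed copy of the sample and picks out a few fields. The
helper name ``summarize_hit`` is hypothetical and not part of RGW or
Elasticsearch; only the field names come from the sample above.

```python
import json

# A trimmed copy of the sample document shown above.
SAMPLE_HIT = """
{
  "_index": "rgw-gold-ee5863d6",
  "_source": {
    "bucket": "testbucket123",
    "name": "object1",
    "owner": {"id": "user1", "display_name": "user1"},
    "permissions": ["user1"],
    "meta": {
      "size": 712354,
      "mtime": "2017-05-04T12:54:16.462Z",
      "etag": "7ac66c0f148de9519b8bd264312c4d64"
    }
  }
}
"""


def summarize_hit(raw):
    """Return a small dict with the fields most listings need.

    Hypothetical helper: it assumes the document layout shown in the sample.
    """
    source = json.loads(raw)["_source"]
    return {
        "bucket": source["bucket"],
        "name": source["name"],
        "owner": source["owner"]["id"],
        "size": source["meta"]["size"],
    }


if __name__ == "__main__":
    print(summarize_hit(SAMPLE_HIT))
```
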
Elasticsearch tier type configurables
-------------------------------------

* ``endpoint``

  Specifies the Elasticsearch server endpoint to access.

* ``num_shards`` (integer)

  The number of shards that Elasticsearch will be configured with on
  data sync initialization. Note that this cannot be changed after
  initialization. Any change here requires rebuilding the Elasticsearch
  index and reinitializing the data sync process.

* ``num_replicas`` (integer)

  The number of replicas that Elasticsearch will be configured with
  on data sync initialization.

* ``explicit_custom_meta`` (true | false)

  Specifies whether all user custom metadata will be indexed, or whether
  the user will need to configure (at the bucket level) which custom
  metadata entries should be indexed. This is false by default.

* ``index_buckets_list`` (comma-separated list of strings)

  If empty, all buckets will be indexed. Otherwise, only buckets
  specified here will be indexed. It is possible to provide bucket
  prefixes (e.g., foo\*), or bucket suffixes (e.g., \*bar).

* ``approved_owners_list`` (comma-separated list of strings)

  If empty, buckets of all owners will be indexed (subject to other
  restrictions). Otherwise, only buckets owned by the specified owners
  will be indexed. Suffixes and prefixes can also be provided.

* ``override_index_path`` (string)

  If not empty, this string will be used as the Elasticsearch index
  path. Otherwise the index path will be determined and generated on
  sync initialization.

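
The prefix and suffix forms accepted by ``index_buckets_list`` and
``approved_owners_list`` can be pictured with a small sketch. The matching
rules below are an assumption made for illustration (a trailing ``*`` is a
prefix match, a leading ``*`` is a suffix match, anything else is an exact
match); they are not taken from the RGW source, and both function names are
hypothetical.

```python
def parse_list(value):
    """Split a comma-separated option value into trimmed, non-empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def matches_list(name, patterns):
    """Return True if `name` would be selected by `patterns`.

    Assumed semantics, for illustration only: an empty list selects
    everything, `foo*` matches names starting with `foo`, `*bar` matches
    names ending with `bar`, and any other entry must match exactly.
    """
    if not patterns:
        return True
    for pattern in patterns:
        if pattern.endswith("*") and name.startswith(pattern[:-1]):
            return True
        if pattern.startswith("*") and name.endswith(pattern[1:]):
            return True
        if pattern == name:
            return True
    return False


if __name__ == "__main__":
    patterns = parse_list("foo*, *bar")
    for bucket in ("foobucket", "mybar", "other"):
        print(bucket, matches_list(bucket, patterns))
```
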
End user metadata queries
-------------------------

.. versionadded:: Luminous

Since the Elasticsearch cluster now stores object metadata, it is important that
the Elasticsearch endpoint is not exposed to the public and is accessible only
to the cluster administrators. This poses a problem for exposing metadata
queries to end users: each user should be able to query only their own metadata
and not that of other users, which would require the Elasticsearch cluster to
authenticate users in the same way RGW does.

As of Luminous, RGW in the metadata master zone can service end user requests.
This avoids exposing the Elasticsearch endpoint publicly and also solves the
authentication and authorization problem, since RGW itself can authenticate the
end user requests. For this purpose RGW introduces a new query in the bucket
APIs that can service Elasticsearch requests. All these requests must be sent
to the metadata master zone.

Syntax
~~~~~~

Get an Elasticsearch query
``````````````````````````

::

    GET /{bucket}?query={query-expr}

Request params:

- ``max-keys``: max number of entries to return
- ``marker``: pagination marker

``expression := [(]<arg> <op> <value> [)][<and|or> ...]``

``op`` is one of the following: ``<``, ``<=``, ``==``, ``>=``, ``>``

For example ::

    GET /?query=name==foo

This returns all the indexed keys that the user has read permission to and
that are named 'foo'.

The output will be a list of keys in XML that is similar to the S3
list buckets response.

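
When the expression contains characters such as ``=``, ``<``, ``>`` or spaces,
a client will usually need to percent-encode it before placing it in the query
string. The following is a minimal sketch of that step using only the Python
standard library. The function name ``build_query_path`` is hypothetical, and
request signing (which RGW requires in order to authenticate the user) is
deliberately left out.

```python
from urllib.parse import quote, urlencode


def build_query_path(bucket, expression, max_keys=None, marker=None):
    """Build the path and query string for a metadata search request.

    Hypothetical helper: it only assembles the URL described above and
    does not sign or send the request.
    """
    params = [("query", expression)]
    if max_keys is not None:
        params.append(("max-keys", str(max_keys)))
    if marker is not None:
        params.append(("marker", marker))
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return "/" + quote(bucket) + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_query_path("mybooks", "name==foo"))
    print(build_query_path("mybooks", "name==foo", max_keys=10))
```

The resulting path still has to be sent to an RGW endpoint in the metadata
master zone with valid S3 credentials.
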
Configure custom metadata fields
````````````````````````````````

Define which custom metadata entries should be indexed (under the
specified bucket), and what the types of these keys are. If explicit
custom metadata indexing is configured, this is needed so that RGW
will index the specified custom metadata values. Otherwise it is
needed in cases where the indexed metadata keys are of a type other
than string.

::

    POST /{bucket}?mdsearch
    x-amz-meta-search: <key [; type]> [, ...]

Multiple metadata fields must be comma-separated, and a type can be forced for a
field with a ``;``. The currently allowed types are string (default), integer
and date.

For example, to index the custom object metadata ``x-amz-meta-year`` as int,
``x-amz-meta-release-date`` as date, and ``x-amz-meta-title`` as string, send

::

    POST /mybooks?mdsearch
    x-amz-meta-search: x-amz-meta-year;int, x-amz-meta-release-date;date, x-amz-meta-title;string

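
The header value follows a simple ``key;type`` pattern joined by commas, so it
can be assembled programmatically. The sketch below is illustrative only: the
function name ``build_meta_search_header`` is hypothetical, and it performs no
validation of the type names, which RGW checks on its side.

```python
def build_meta_search_header(fields):
    """Build the x-amz-meta-search header value from (key, type) pairs.

    A type of None omits the type, leaving the default (string) in effect.
    Hypothetical helper; type names are passed through unchecked.
    """
    parts = []
    for key, field_type in fields:
        parts.append(key if field_type is None else f"{key};{field_type}")
    return ", ".join(parts)


if __name__ == "__main__":
    header = build_meta_search_header([
        ("x-amz-meta-year", "int"),
        ("x-amz-meta-release-date", "date"),
        ("x-amz-meta-title", "string"),
    ])
    print("x-amz-meta-search:", header)
```
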
Delete custom metadata configuration
````````````````````````````````````

Delete custom metadata bucket configuration.

::

    DELETE /<bucket>?mdsearch

Get custom metadata configuration
`````````````````````````````````

Retrieve custom metadata bucket configuration.

::

    GET /<bucket>?mdsearch

.. _`Elasticsearch`: https://github.com/elastic/elasticsearch

@@ -0,0 +1,89 @@
============
Sync Modules
============

.. versionadded:: Kraken

The `Multisite`_ functionality of RGW, introduced in Jewel, made it possible to
create multiple zones and mirror data and metadata between them. ``Sync Modules``
are built atop the multisite framework and allow data and metadata to be
forwarded to a different external tier. A sync module allows a set of actions
to be performed whenever a change in data occurs (metadata operations such as
bucket or user creation are also regarded as changes in data). As RGW multisite
changes are eventually consistent at remote sites, changes are propagated
asynchronously. This unlocks use cases such as backing up the object storage to
an external cloud cluster or a custom backup solution using tape drives, or
indexing metadata in Elasticsearch.

A sync module configuration is local to a zone. The sync module determines
whether the zone exports data or can only consume data that was modified in
another zone. As of Luminous the supported sync plugins are `elasticsearch`_;
``rgw``, which is the default sync plugin that synchronises data between the
zones; and ``log``, which is a trivial sync plugin that logs the metadata
operations that happen in the remote zones. The following docs use the example
of a zone configured with the `elasticsearch sync module`_; the process is
similar for configuring any other sync plugin.

.. note:: ``rgw`` is the default sync plugin and there is no need to explicitly
   configure this.

Requirements and Assumptions
----------------------------

Let us assume a simple multisite configuration as described in the `Multisite`_
docs, consisting of two zones, ``us-east`` and ``us-west``. Let's add a third
zone, ``us-east-es``, which only processes metadata from the other sites. This
zone can be in the same Ceph cluster as ``us-east`` or in a different one. It
only consumes metadata from other zones, and RGWs in this zone will not serve
any end user requests directly.

Configuring Sync Modules
------------------------

Create the third zone similar to the `Multisite`_ docs, for example

::

    # radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-es \
    --access-key={system-key} --secret={secret} --endpoints=http://rgw-es:80

A sync module can be configured for this zone via the following

::

    # radosgw-admin zone modify --rgw-zone={zone-name} --tier-type={tier-type} --tier-config={set of key=value pairs}

For example, for the ``elasticsearch`` sync module

::

    # radosgw-admin zone modify --rgw-zone={zone-name} --tier-type=elasticsearch \
    --tier-config=endpoint=http://localhost:9200,num_shards=10,num_replicas=1

For the various supported tier-config options refer to the `elasticsearch sync module`_ docs.

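
When zones are configured from a script, the ``--tier-config`` value can be
assembled from a mapping of option names to values. The sketch below only
builds the argument list shown in the example above; the function names are
hypothetical, and it makes no attempt to handle values that themselves contain
commas (for instance a multi-entry ``index_buckets_list``).

```python
def build_tier_config(options):
    """Join option/value pairs into the key=value,key=value form."""
    return ",".join(f"{key}={value}" for key, value in options.items())


def zone_modify_args(zone_name, tier_type, options):
    """Return the argument list for `radosgw-admin zone modify`.

    Hypothetical helper: it mirrors the command line shown above and
    leaves running the command to the caller.
    """
    return [
        "radosgw-admin", "zone", "modify",
        f"--rgw-zone={zone_name}",
        f"--tier-type={tier_type}",
        f"--tier-config={build_tier_config(options)}",
    ]


if __name__ == "__main__":
    args = zone_modify_args(
        "us-east-es",
        "elasticsearch",
        {"endpoint": "http://localhost:9200", "num_shards": 10, "num_replicas": 1},
    )
    print(" ".join(args))
```

The list can be passed to ``subprocess.run`` on a host where ``radosgw-admin``
is installed and has access to the cluster.
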
Finally, update the period

::

    # radosgw-admin period update --commit

Now start the radosgw in the zone

::

    # systemctl start ceph-radosgw@rgw.`hostname -s`
    # systemctl enable ceph-radosgw@rgw.`hostname -s`

.. _`Multisite`: ../multisite
.. _`elasticsearch sync module`: ../elastic-sync-module
.. _`elasticsearch`: ../elastic-sync-module