doc: rgw add some basic documentation for sync plugins & ES

Mostly a rst formatted C-c C-v of Yehuda's mail to the ceph-devel lists Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
ceph · Jan 25, 2018 · f9e6648 · f9e6648
1 parent 474828d
commit f9e6648
Show file tree

Hide file tree

Showing 2 changed files with 270 additions and 0 deletions.
diff --git a/doc/radosgw/elastic-sync-module.rst b/doc/radosgw/elastic-sync-module.rst
@@ -0,0 +1,181 @@
+=========================
+ElasticSearch Sync Module
+=========================
+
+.. versionadded:: Kraken
+
+This sync module writes the metadata from other zones to `ElasticSearch`_. As of
+luminous this is a json of data fields we currently store in ElasticSearch.
+
+::
+
+   {
+        "_index" : "rgw-gold-ee5863d6",
+        "_type" : "object",
+        "_id" : "34137443-8592-48d9-8ca7-160255d52ade.34137.1:object1:null",
+        "_score" : 1.0,
+        "_source" : {
+          "bucket" : "testbucket123",
+          "name" : "object1",
+          "instance" : "null",
+          "versioned_epoch" : 0,
+          "owner" : {
+            "id" : "user1",
+            "display_name" : "user1"
+          },
+          "permissions" : [
+            "user1"
+          ],
+          "meta" : {
+            "size" : 712354,
+            "mtime" : "2017-05-04T12:54:16.462Z",
+            "etag" : "7ac66c0f148de9519b8bd264312c4d64"
+          }
+        }
+      }
+
+
+
+ElasticSearch tier type configurables
+-------------------------------------
+
+* ``endpoint``
+
+Specifies the Elasticsearch server endpoint to access
+
+* ``num_shards`` (integer)
+
+The number of shards that Elasticsearch will be configured with on
+data sync initialization. Note that this cannot be changed after init.
+Any change here requires rebuild of the Elasticsearch index and reinit
+of the data sync process.
+
+* ``num_replicas`` (integer)
+
+The number of the replicas that Elasticsearch will be configured with
+on data sync initialization.
+
+* ``explicit_custom_meta`` (true | false)
+
+Specifies whether all user custom metadata will be indexed, or whether
+user will need to configure (at the bucket level) what custome
+metadata entries should be indexed. This is false by default
+
+* ``index_buckets_list`` (comma separated list of strings)
+
+If empty, all buckets will be indexed. Otherwise, only buckets
+specified here will be indexed. It is possible to provide bucket
+prefixes (e.g., foo\*), or bucket suffixes (e.g., \*bar).
+
+* ``approved_owners_list`` (comma separated list of strings)
+
+If empty, buckets of all owners will be indexed (subject to other
+restrictions), otherwise, only buckets owned by specified owners will
+be indexed. Suffixes and prefixes can also be provided.
+
+* ``override_index_path`` (string)
+
+if not empty, this string will be used as the elasticsearch index
+path. Otherwise the index path will be determined and generated on
+sync initialization.
+
+
+End user metadata queries
+-------------------------
+
+.. versionadded:: Luminous
+
+Since the ElasticSearch cluster now stores object metadata, it is important that
+the ElasticSearch endpoint is not exposed to the public and only accessible to
+the cluster administrators. For exposing metadata queries to the end user itself
+this poses a problem since we'd want the user to only query their metadata and
+not of any other users, this would require the ElasticSearch cluster to
+authenticate users in a way similar to RGW does which poses a problem.
+
+As of Luminous RGW in the metadata master zone can now service end user
+requests. This allows for not exposing the elasticsearch endpoint in public and
+also solves the authentication & authorization problem since RGW itself can
+authenticate the end user requests. For this purpose RGW introduces a new query
+in the bucket apis that can service elasticsearch requests. All these requests
+must be sent to the metadata master zone.
+
+Syntax
+~~~~~~
+
+Get an elasticsearch query
+``````````````````````````
+
+::
+
+   GET /{bucket}?query={query-expr}
+
+request params:
+ - max-keys: max number of entries to return
+ - marker: pagination marker
+
+``expression := [(]<arg> <op> <value> [)][<and|or> ...]``
+
+op is one of the following:
+<, <=, ==, >=, >
+
+For example ::
+
+  GET /?query=name==foo
+
+Will return all the indexed keys that user has read permission to, and
+are named 'foo'.
+
+Will return all the indexed keys that user has read permission to, and
+are named 'foo'.
+
+The output will be a list of keys in XML that is similar to the S3
+list buckets response.
+
+Configure custom metadata fields
+````````````````````````````````
+
+Define which custom metadata entries should be indexed (under the
+specified bucket), and what are the types of these keys. If explicit
+custom metadata indexing is configured, this is needed so that rgw
+will index the specified custom metadata values. Otherwise it is
+needed in cases where the indexed metadata keys are of a type other
+than string.
+
+::
+
+   POST /{bucket}?mdsearch
+   x-amz-meta-search: <key [; type]> [, ...]
+
+Multiple metadata fields must be comma seperated, a type can be forced for a
+field with a `;`. The currently allowed types are string(default), integer &
+date
+
+eg. if you want to index a custom object metadata x-amz-meta-year as int,
+x-amz-meta-date as type date and x-amz-meta-title as string, you'd do
+
+::
+
+   POST /mybooks?mdsearch
+   x-amz-meta-search: x-amz-meta-year;int, x-amz-meta-release-date;date, x-amz-meta-title;string
+
+
+Delete custom metadata configuration
+````````````````````````````````````
+
+Delete custom metadata bucket configuration.
+
+::
+
+   DELETE /<bucket>?mdsearch
+
+Get custom metadata configuration
+`````````````````````````````````
+
+Retrieve custom metadata bucket configuration.
+
+::
+
+   GET /<bucket>?mdsearch
+
+
+.. _`Elasticsearch`: https://github.com/elastic/elasticsearch
diff --git a/doc/radosgw/sync-plugins.rst b/doc/radosgw/sync-plugins.rst
@@ -0,0 +1,89 @@
+============
+Sync Modules
+============
+
+.. versionadded:: Kraken
+
+The `Multisite`_ functionality of RGW introduced in Jewel allowed the ability to
+create multiple zones and mirror data & metadata between them. ``Sync Modules``
+are built atop of the multisite framework that allows for forwarding data &
+metadata to a different external tier. A sync module allows for a set of actions
+to be performed whenever a change in data occurs (metadata ops like bucket or
+user creation etc. are also regarded as changes in data). As the rgw multisite
+changes are eventually consistent at remote sites, changes are propagated
+asynchronously. This would allow for unlocking use cases such as backing up the
+object storage to an external cloud cluster or a custom backup solution using
+tape drives, indexing metadata in ElasticSearch etc.
+
+A sync module configuration is local to a zone. The sync module determines
+whether the zone exports data or can only consume data that was modified in
+another zone. As of luminous the supported sync plugins are `elasticsearch`_,
+``rgw``, which is the default sync plugin that synchronises data between the
+zones and ``log`` which is a trivial sync plugin that logs the metadata
+operation that happens in the remote zones. The following docs are written with
+the example of a zone using `elasticsearch sync module`_, the process would be similar
+for configuring any sync plugin
+
+.. note ``rgw`` is the default sync plugin and there is no need to explicitly
+   configure this
+
+Requirements and Assumptions
+----------------------------
+
+Let us assume a simple multisite configuration as described in the `Multisite`_
+docs, of 2 zones ``us-east`` and ``us-west``, let's add a third zone
+``us-east-es`` which is a zone that only processes metadata from the other
+sites. This zone can be in the same or a different ceph cluster as ``us-east``.
+This zone would only consume metadata from other zones and RGWs in this zone
+will not serve any end user requests directly.
+
+
+Configuring Sync Modules
+------------------------
+
+Create the third zone similar to the `Multisite`_ docs, for example
+
+::
+
+   # radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-es \
+   --access-key={system-key} --secret={secret} --endpoints=http://rgw-es:80
+
+
+
+A sync module can be configured for this zone via the following
+
+::
+
+   # radosgw-admin zone modify --rgw-zone={zone-name} --tier-type={tier-type} --tier-config={set of key=value pairs}
+
+
+For example in the ``elasticsearch`` sync module
+
+::
+
+   # radosgw-admin zone modify --rgw-zone={zone-name} --tier-type=elasticsearch \
+                               --tier-config=endpoint=http://localhost:9200,num_shards=10,num_replicas=1
+
+
+For the various supported tier-config options refer to the `elasticsearch sync module`_ docs
+
+Finally update the period
+
+
+::
+
+    # radosgw-admin period update --commit
+
+
+Now start the radosgw in the zone
+
+::
+
+    # systemctl start ceph-radosgw@rgw.`hostname -s`
+    # systemctl enable ceph-radosgw@rgw.`hostname -s`
+
+
+
+.. _`Multisite`: ../multisite
+.. _`elasticsearch sync module`: ../elastic-sync-module
+.. _`elasticsearch`: ../elastic-sync-module