rgw: sync modules, metadata search #10731

Merged
merged 35 commits into from Oct 10, 2016

2 participants
@yehudasa
Member

yehudasa commented Aug 15, 2016

Make data sync more modular so that sync modules can be added. One example of such a module is the "log" module, which logs every object that needs to be synced. Other examples (not yet implemented) are a metadata indexing module and a backup (to external storage) module.

@yehudasa yehudasa added the feature label Aug 15, 2016

@yehudasa yehudasa changed the title from [DNM] rgw: sync modules to rgw: sync modules, metadata search Aug 26, 2016

@yehudasa yehudasa added the rgw label Aug 26, 2016

src/rgw/rgw_rados.cc
return ret;
}
{ /* opening scope so that we can do goto, sorry */

@cbodley

cbodley Sep 21, 2016

Contributor

copy/paste? i don't see a goto

src/rgw/rgw_rados.cc
map<string, bufferlist>::iterator iter = src_attrs.find(RGW_ATTR_ETAG);
if (iter != src_attrs.end()) {
bufferlist& etagbl = iter->second;
*petag = string(etagbl.c_str(), etagbl.length());

@cbodley

cbodley Sep 21, 2016

Contributor

there's a bufferlist::to_str() for this (and unlike bufferlist::c_str(), it doesn't require reallocating and copying into a contiguous buffer if the bufferlist has multiple segments)
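The difference described in this comment can be sketched with a toy segmented buffer. `SegmentedBuffer` below is a hypothetical stand-in written for illustration, not Ceph's `bufferlist`; it only models the point being made: a `c_str()`-style accessor needs one contiguous region and so has to merge segments first, while a `to_str()`-style accessor can copy each segment straight into the result.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a segmented buffer; NOT Ceph's bufferlist.
class SegmentedBuffer {
  std::vector<std::string> segments;
public:
  void append(const std::string& s) { segments.push_back(s); }

  std::size_t length() const {
    std::size_t n = 0;
    for (const auto& s : segments) n += s.size();
    return n;
  }

  std::size_t num_segments() const { return segments.size(); }

  // Needs a single contiguous region: with several segments, they are
  // merged first (an extra allocation and copy) before returning a pointer.
  const char* c_str() {
    if (segments.size() > 1) {
      std::string merged;
      merged.reserve(length());
      for (const auto& s : segments) merged += s;
      segments.clear();
      segments.push_back(std::move(merged));
    }
    return segments.empty() ? "" : segments.front().c_str();
  }

  // Copies each segment directly into the result; the buffer is unchanged.
  std::string to_str() const {
    std::string out;
    out.reserve(length());
    for (const auto& s : segments) out += s;
    return out;
  }
};
```

In this model, `std::string(buf.c_str(), buf.length())` copies the data twice when the buffer has more than one segment, while `buf.to_str()` copies it once.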

src/rgw/rgw_sync_module.h
void set_result(ceph::real_time& _mtime,
uint64_t _size,
map<string, bufferlist>& _attrs) {

@cbodley

cbodley Sep 21, 2016

Contributor

consider taking _attrs by rvalue ref, so it's obvious to the caller that _attrs is being moved
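A minimal sketch of this suggestion follows. `StatResult` and the `Attrs` alias are hypothetical simplifications (a `std::string` value stands in for `bufferlist`, and the mtime parameter is omitted); only the parameter-passing pattern is the point.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for map<string, bufferlist>.
using Attrs = std::map<std::string, std::string>;

struct StatResult {
  std::uint64_t size = 0;
  Attrs attrs;

  // Taking _attrs by rvalue reference forces the caller to pass a temporary
  // or write std::move(...), so the transfer of ownership is visible at the
  // call site instead of being hidden behind a plain lvalue reference.
  void set_result(std::uint64_t _size, Attrs&& _attrs) {
    size = _size;
    attrs = std::move(_attrs);
  }
};
```

A caller then writes `result.set_result(size, std::move(attrs));`. Passing `attrs` without `std::move` does not compile, because an lvalue cannot bind to `Attrs&&`.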

src/rgw/rgw_rados.h
RGWSyncModulesManager *get_sync_modules_manager() {
return sync_modules_manager;
}
RGWSyncModuleInstanceRef& get_sync_module() {

@cbodley

cbodley Sep 21, 2016

Contributor

consider returning by value or const reference. returning by reference allows the caller to modify it, e.g. store->get_sync_module().reset()
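The concern can be illustrated with a small sketch. It assumes the ref type is a `std::shared_ptr`-style handle (the comment's use of `.reset()` suggests a smart pointer); `Store`, `SyncModuleInstance`, and `get_sync_module_mutable()` are hypothetical names used only to contrast the two return types.

```cpp
#include <memory>

struct SyncModuleInstance {};  // hypothetical placeholder type
using SyncModuleInstanceRef = std::shared_ptr<SyncModuleInstance>;

class Store {
  SyncModuleInstanceRef sync_module = std::make_shared<SyncModuleInstance>();
public:
  // Mutable reference: any caller can reseat or reset the store's own member.
  SyncModuleInstanceRef& get_sync_module_mutable() { return sync_module; }

  // Const reference: callers can use or copy the handle, but cannot change
  // which instance the store itself points to.
  const SyncModuleInstanceRef& get_sync_module() const { return sync_module; }
};
```

With the const accessor, `store.get_sync_module().reset()` fails to compile, while a caller that copies the handle and resets its copy leaves the store's member intact.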

* in this case, we're not returning the object's content, only the prepended
* extra metadata
*/
total_len = 0;

@cbodley

cbodley Sep 21, 2016

Contributor

cool, so stat_remote_obj() works like a normal GET request that skips the data - and we get the size from Rgwx-Object-Size instead of Content-Length?

@yehudasa

yehudasa Oct 7, 2016

Member

yeah

class RGWSyncModulesManager {
Mutex lock;

@cbodley

cbodley Sep 21, 2016

Contributor

unused

yehudasa added some commits Jul 5, 2016

rgw: initial data plugin definition and default implementation
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: use data sync module callbacks
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: define sync modules manager, instance
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: define zone tier type, sync from appropriate tiers only
Can only sync from tiers that can export data.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: add tier config for zone params
Needed for sync module instance configuration

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw_admin: can set/modify zone tier's config
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
@yehudasa

yehudasa Oct 7, 2016

Member

@cbodley addressed your comments, repushed

yehudasa added some commits Aug 3, 2016

rgw: define sync_module on RGWRados
Instead of having it as part of the data sync module. Since we only have a
single sync_module, having it there will make it easier to get its properties.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: non-rgw tier is not writeable
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: add a simple logging sync module
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: helper to stat remote obj
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: add cr to stat remote obj
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: propagate attrs, mtime, size of remote object
Use new rgwx-stat http param that allows getting only object's
meta. Use that when calling stat_remote_object().

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: log sync module gets source object's meta
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: some abstraction around log sync module
Moving code that fetches remote object meta to its own classes.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: move the rgw sync code module around
No real code change

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: REST client, don't sign requests if empty key
If key is not passed in, don't try to sign the request.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: allow null store in RGWRESTConn
We're not necessarily going to connect to rgw/s3 endpoints,
we only need store param to handle s3 signing.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: a new cr to send http PUT requests
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: initial implementation of elasticsearch sync module
sync module that will handle rgw metadata indexing.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
cmake: fix linkage of ceph_test_librgw_file_nfsns
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: es sync module, send object info to elasticsearch
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: es sync module, store object attrs
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: es sync module, store acl information
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: utility function to dump iso8601
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: es sync module, keep object mtime
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: es sync module, store custom metadata
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: rest conn functions cleanup, only append zonegroup if not empty
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: add cr to send DELETE to remote endpoint
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: es module, remove entry on delete
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: support partial mesh for zone sync
Zone configuration now includes two new fields: sync_from_all,
which is a boolean, and sync_from, which is a list of zones to
sync from. By default sync_from_all is set to true. Sync happens
from all zones, or only from the specified zones if
sync_from_all is false. We also check whether the zone can
export data (depending on tier_type).

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw_admin: config options to set sync_from and sync_from_all
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw_admin: update usage
add reference to --sync-from*

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: setting sync-from zone by name not by id
Using the zone name is easier and clearer.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw_admin: sync status command shows if not syncing from zone
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: index metadata in elasticsearch using realm name for path
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
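
The rule described in the partial-mesh commit above can be sketched as a small predicate. `ZoneSyncConfig` and `syncs_from` are hypothetical names, and the `source_can_export` flag stands in for whatever check the code derives from the source zone's tier type; only the `sync_from_all` and `sync_from` fields and the default of `true` come from the commit message.

```cpp
#include <set>
#include <string>

// Hypothetical view of the two new zone fields from the commit message.
struct ZoneSyncConfig {
  bool sync_from_all = true;        // default per the commit message
  std::set<std::string> sync_from;  // zone names, used when sync_from_all is false
};

// Decide whether this zone should sync from source_zone.
// source_can_export models the tier-type check: zones whose tier cannot
// export data are never used as a sync source.
bool syncs_from(const ZoneSyncConfig& cfg, const std::string& source_zone,
                bool source_can_export) {
  if (!source_can_export) {
    return false;
  }
  return cfg.sync_from_all || cfg.sync_from.count(source_zone) > 0;
}
```

With the default configuration every exporting zone is a source; setting `sync_from_all` to false restricts sync to the zones named in `sync_from`.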
@cbodley

cbodley Oct 7, 2016

Contributor

looks good. is it passing test_multi.py?

@yehudasa

Member

yehudasa commented Oct 7, 2016

@cbodley yes

@cbodley cbodley merged commit 4ededdb into ceph:master Oct 10, 2016

2 checks passed

Signed-off-by: all commits in this PR are signed
default: Build finished.