rgw: gc use aio #20546
Conversation
still need to deal with index cleanup asynchronously. Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
to allow cross-shard concurrency. Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Force-pushed from ead95e8 to 29f3be1
I support, appreciate the fast typing, will test
src/rgw/rgw_gc.cc (outdated)

```cpp
  string tag;
};

list<IO> ios;
```
would love to avoid std::list
@mattbenjamin need a FIFO here; a vector will not cut it
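A minimal sketch of the FIFO access pattern under discussion, using std::deque (which a later comment in this thread suggests) in place of std::list; the `IO` struct and `IOQueue` wrapper here are simplified stand-ins, not the actual rgw types.

```cpp
#include <deque>
#include <string>

// Simplified stand-in for the IO entries queued by RGWGCIOManager;
// the real struct also carries an AioCompletion and shard state.
struct IO {
  std::string tag;
};

// FIFO semantics needed for cross-shard concurrency: push new work
// at the back, drain in submission order from the front. std::deque
// offers the same push_back/pop_front interface as std::list but
// with chunked contiguous storage (fewer per-node allocations).
class IOQueue {
  std::deque<IO> ios;
public:
  void push(IO io) { ios.push_back(std::move(io)); }
  bool empty() const { return ios.empty(); }
  IO pop() {
    IO io = std::move(ios.front());
    ios.pop_front();
    return io;
  }
};
```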
src/rgw/rgw_gc.cc (outdated)

```cpp
};

list<IO> ios;
std::list<string> remove_tags;
```
would love to avoid std::list (but I'm not actually suggesting an alternative would be worth the trouble)
will try to replace this one
src/rgw/rgw_gc.cc (outdated)

```cpp
}

remove_tags.push_back(io.tag);
#define MAX_REMOVE_CHUNK 16
```
I get now that this was missing from the chmagnus change; what makes 16 a good magic number?
not too small, not too high... but seriously, this can now be configured; I have no way to tell what the optimal number is. We're bundling this many operations together; more than that (how much more?) could, it seems to me, affect OSD availability. With fewer than that, latency will probably be the biggest factor.
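The tradeoff described above can be illustrated with a small sketch: completed tags accumulate and are flushed in batches of MAX_REMOVE_CHUNK. Larger chunks amortize round trips but keep one OSD op busy longer; smaller chunks let per-op latency dominate. The `add_tag`/`flush_cb` names are hypothetical, not the actual rgw helpers.

```cpp
#include <list>
#include <string>

// Illustrative chunking: flush once the batch reaches
// MAX_REMOVE_CHUNK entries. The PR makes this value configurable.
static const size_t MAX_REMOVE_CHUNK = 16;

// flush_cb stands in for the cls call that removes a batch of gc
// entries in a single operation.
template <typename FlushCb>
void add_tag(std::list<std::string>& remove_tags,
             const std::string& tag, FlushCb flush_cb) {
  remove_tags.push_back(tag);
  if (remove_tags.size() >= MAX_REMOVE_CHUNK) {
    flush_cb(remove_tags);   // one batched remove op for 16 tags
    remove_tags.clear();
  }
}
```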
src/rgw/rgw_gc.cc (outdated)

```diff
   string tag;
 };

 list<IO> ios;
-std::list<string> remove_tags;
+map<int, std::list<string> > remove_tags;
```
would love to avoid std::map of std::list--in this case, since the index represents a stable "slot", could this same mechanism be made to work with a std::vector<std::list>? That could actually be worth doing, I think.
yeah, can probably do vector.
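A sketch of the suggestion above, under the assumption stated in the review comment: because the gc shard index is a stable slot in `[0, num_shards)`, a `std::vector<std::list<...>>` can replace the `std::map<int, std::list<...>>`, trading tree lookups for O(1) slot access. The class and method names are illustrative only.

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical slot-indexed container: one list of pending remove
// tags per gc shard. rgw derives the shard count from configuration;
// here it is just a constructor argument.
class RemoveTagsByShard {
  std::vector<std::list<std::string>> slots;
public:
  explicit RemoveTagsByShard(size_t num_shards) : slots(num_shards) {}
  void add(size_t shard, const std::string& tag) {
    slots[shard].push_back(tag);   // O(1) slot access, no map insert
  }
  size_t count(size_t shard) const { return slots[shard].size(); }
};
```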
src/rgw/rgw_gc.cc (outdated)

```diff
 ~RGWGCIOManager() {
   for (auto io : ios) {
     io.c->release();
   }
 }

-int schedule_io(IoCtx *ioctx, const string& oid, ObjectWriteOperation *op, const string& tag) {
+int schedule_io(IoCtx *ioctx, const string& oid, ObjectWriteOperation *op, int index, const string& tag) {
 #warning configurable
 #define MAX_CONCURRENT_IO 5
```
what makes 5 a good magic number?
@yehudasa just one suggestion I'd act on, if it worked: can the std::map of slots work as a vector?
@yehudasa tested manually, it worked well for me; I think it delivered about the same gc throughput at concurrent=5 as the threaded version did w/ 3 threads, so the tuning seems good
(scheduling a teuthology run)
@yehudasa agree, the interesting one is the std::map<int, std::list<std::string>>; the other uses of list may already be as good as they can be
@yehudasa perhaps the one you mentioned could be a std::deque
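The bounded concurrency being tuned in this exchange can be sketched as follows. This is a simplification under stated assumptions: at most MAX_CONCURRENT_IO ops are in flight, and when the window is full the scheduler waits on the oldest completion before submitting more. `BoundedScheduler` and its methods are illustrative names, not the librados/rgw API.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// The PR makes this a tunable; 5 is the default under discussion.
static const std::size_t MAX_CONCURRENT_IO = 5;

class BoundedScheduler {
  std::deque<std::string> in_flight;  // oldest submitted op in front
public:
  // Returns the tag of the op we had to wait for before submitting,
  // or "" if the window still had room. In rgw this wait would be an
  // AioCompletion::wait_for_complete() on the oldest pending op.
  std::string schedule(const std::string& tag) {
    std::string waited;
    if (in_flight.size() >= MAX_CONCURRENT_IO) {
      waited = in_flight.front();
      in_flight.pop_front();
    }
    in_flight.push_back(tag);
    return waited;
  }
  std::size_t pending() const { return in_flight.size(); }
};
```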
Force-pushed from 29f3be1 to 4634585
and another tunable for log trim size Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
@yehudasa just noticed this; we need something like:

```diff
diff --git a/src/test/cls_rgw/test_cls_rgw.cc b/src/test/cls_rgw/test_cls_rgw.cc
index 1d72dce2a1..a9242b03e6 100644
--- a/src/test/cls_rgw/test_cls_rgw.cc
+++ b/src/test/cls_rgw/test_cls_rgw.cc
@@ -673,7 +673,7 @@ TEST(cls_rgw, gc_defer)
   ASSERT_EQ(0, truncated);
   librados::ObjectWriteOperation op3;
-  list<string> tags;
+  std::vector<std::string> tags;
   tags.push_back(tag);
   /* remove chain */
```
@yehudasa manual testing checks out
@mattbenjamin I'll push a fix
Force-pushed from 4634585 to 9d37cac
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
@yehudasa does this need a tracker issue?
@mattbenjamin wouldn't hurt
@yehudasa the only unexpected failure is a known failure of the lifecycle expiration test, which, Casey confirms, does not verify actual GC, only the bucket index (but is still failing, apparently due to a timing issue).
Hi, how fast will it be? I have a requirement here: clients issue 10000 qps of writes/puts, and all objects have a TTL set, e.g. one week or one month. Clients may read an object before expiration (perhaps a few thousand qps), but never access it after expiration. We have enough SSDs or NVMes for the OSD clusters; disk is not the bottleneck in terms of IOPS or throughput, but space is. So we must trim or delete expired objects aggressively, at better than 10000 qps, otherwise space will run out. Will this patch be fast enough to trim objects? Are there any other factors that affect gc speed?
@wjin this change actually permits a much higher workload contribution from gc--you'd increase concurrent_io to "go faster"; the max_obj value should just be a "good" value for one gc work unit. The factor then left limiting gc speed is the real workload capacity of the cluster.
@mattbenjamin Thanks for your quick response. We will set up a very "fast" cluster for clients, say 50000 qps, so that gc does not affect client usage. I will try it later; I wish it could be in 12.2.5.
@mattbenjamin can your PR make the number of concurrent GC requests based on a fraction of the number of OSDs in the cluster? Something like max(1, OSDs/10)? The goal is that it would scale naturally without interfering with application workload, and wouldn't require per-site tuning (normally). Any Ceph daemon can ask the monitor for the number of OSDs in the cluster using librados, right? Also, does your PR spread GC request generators across the cluster, for really big clusters (in the hundreds or thousands of OSDs)? thx -ben
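The scaling rule proposed in the comment above can be sketched in a few lines. Note this is the commenter's suggestion, not something the PR implements; the function name and the divisor 10 come from the comment, not from rgw.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sizing rule from the review comment: derive gc
// concurrency from the cluster's OSD count rather than a fixed
// tunable, so larger clusters get proportionally more gc parallelism
// while small clusters keep at least one in-flight op.
uint32_t gc_concurrent_io_for(uint32_t num_osds) {
  return std::max<uint32_t>(1, num_osds / 10);
}
```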