
rgw: add abstraction for ops log destination and add file logger #43329

Merged

Conversation

cfsnyder
Contributor

Adds an OpsLogSink abstraction for ops log destinations, and implements
this abstraction for a file logger, since files are easier to work with
than sockets in containerized deployments.

Fixes: https://tracker.ceph.com/issues/48752
Signed-off-by: Cory Snyder csnyder@iland.com
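The description above can be sketched roughly as follows. The class names (OpsLogSink, OpsLogFile) match the diff hunks in this PR, but the member signatures, the stand-in entry types, and the write path are simplified guesses, not the PR's actual code:

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>
#include <string>

struct req_state {};       // empty stand-ins for the real RGW types
struct rgw_log_entry {};

// Interface every ops log destination implements (socket, file, ...).
class OpsLogSink {
 public:
  virtual ~OpsLogSink() = default;
  virtual int log(req_state* s, rgw_log_entry& entry) = 0;
};

// File-backed sink: in a container the log file can simply be
// bind-mounted or tailed, whereas a unix socket needs a consumer attached.
class OpsLogFile : public OpsLogSink {
  std::FILE* file;
  std::mutex lock;
 public:
  explicit OpsLogFile(const std::string& path)
      : file(std::fopen(path.c_str(), "a")) {}
  ~OpsLogFile() override { if (file) std::fclose(file); }

  // Hypothetical helper: write one formatted entry, synchronously.
  int write_line(const std::string& line) {
    std::lock_guard<std::mutex> g(lock);
    if (!file) return -1;
    std::fputs(line.c_str(), file);
    std::fputc('\n', file);
    std::fflush(file);  // flushed per entry: no loss on process crash
    return 0;
  }

  int log(req_state*, rgw_log_entry&) override {
    // the real code formats the entry as JSON first; elided here
    return write_line("{}");
  }
};
```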

Contributor

@mattbenjamin mattbenjamin left a comment

I think this is cogently written. My main questions are about buffering options on the output file. Am I missing where that is handled?

@@ -589,10 +588,17 @@ namespace rgw {

// XXX ex-RGWRESTMgr_lib, mgr->set_logging(true)

OpsLogManifold* olog_manifold = new OpsLogManifold();
Contributor

ok, I think this is a sane design


bl.append(s);
void OpsLogManifold::add_sink(OpsLogSink *sink)
Contributor

for consistency and Stroustrupness, "OpsLogSink* sink"
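The manifold under review fans a single log call out to every registered sink. A self-contained sketch of that shape, with stand-in entry types and guessed ownership and error policy (the diff only shows add_sink's signature): the manifold deletes its sinks, tries every sink even after a failure, and reports failure if any sink failed.

```cpp
#include <cassert>
#include <vector>

struct req_state {};       // empty stand-ins for the real RGW types
struct rgw_log_entry {};

class OpsLogSink {
 public:
  virtual ~OpsLogSink() = default;
  virtual int log(req_state* s, rgw_log_entry& entry) = 0;
};

// One sink that forwards to many: lets RGW log to the socket and the
// file at the same time.
class OpsLogManifold : public OpsLogSink {
  std::vector<OpsLogSink*> sinks;
 public:
  ~OpsLogManifold() override {
    for (auto* s : sinks) delete s;   // manifold owns its sinks (assumption)
  }
  void add_sink(OpsLogSink* sink) { sinks.push_back(sink); }
  int log(req_state* s, rgw_log_entry& entry) override {
    int ret = 0;
    for (auto* sink : sinks) {
      if (sink->log(s, entry) < 0) ret = -1;  // keep going, remember failure
    }
    return ret;
  }
};

// Tiny stand-in sink, only to exercise the manifold in isolation.
class CountingSink : public OpsLogSink {
 public:
  int calls = 0;
  int log(req_state*, rgw_log_entry&) override { ++calls; return 0; }
};
```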

int OpsLogFile::log_json(struct req_state* s, bufferlist& bl)
{
lock.lock();
bl.write_stream(file);
Contributor

when we do this, is the buffer being written out synchronously?

Contributor

the socket version allows for a configurable amount of data to be in the socket buffer before being flushed--this is important for performance

Contributor Author

@mattbenjamin Yeah that was a question that I intended to ask. For our use case, we don't want to lose these ops logs in the case of a process crash, so we prefer that they be written out synchronously. I can imagine that this might be less important for other users and that it might be desirable to have a knob to turn in order to control whether (and how much) buffering is used. Since you're confirming that this is important, I'll go ahead and work on that piece and update the PR.
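A buffering knob like the one discussed could look roughly like this: entries accumulate until a configurable backlog is exceeded, analogous in spirit to the socket sink's rgw_ops_log_data_backlog option, with a backlog of zero degenerating to the fully synchronous, crash-safe behavior. All names below are illustrative, not from the PR:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of threshold-based buffering for an ops log writer. The "sunk"
// string stands in for the real file write.
class BufferedWriter {
  std::string pending;
  std::string sunk;
  size_t max_backlog;
 public:
  // backlog == 0 means every append is flushed immediately (synchronous).
  explicit BufferedWriter(size_t backlog) : max_backlog(backlog) {}

  void append(const std::string& line) {
    pending += line;
    pending += '\n';
    if (pending.size() > max_backlog) flush();
  }

  void flush() {
    sunk += pending;   // real code would write to the file here
    pending.clear();
  }

  size_t buffered() const { return pending.size(); }
  const std::string& written() const { return sunk; }
};
```

The design choice here is that the knob only bounds how much data can be lost on a crash; latency-sensitive users would set a large backlog, crash-safety-sensitive users would set zero.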

Contributor

these synchronous writes are unfortunately in conflict with the async refactoring project, where we're trying to eliminate all blocking waits from frontend threads. beast frontend requests run as coroutines, so when we can suspend that coroutine instead of blocking, that allows the thread to process other requests in the meantime

i'd eventually like to see this use something like libaio to make the file writes asynchronous
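For illustration, asynchronous file output could look like the following sketch using POSIX AIO (<aio.h>); the real change would use libaio or integrate with the beast frontend's coroutines so the request suspends instead of polling, and nothing here is the PR's code. Note the write buffer must stay alive until the operation completes:

```cpp
#include <aio.h>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Queue one ops-log line without blocking on the disk write.
// Caller must keep `line` and `cb` alive until the write completes.
bool submit_async_write(int fd, const std::string& line, aiocb& cb) {
  std::memset(&cb, 0, sizeof(cb));
  cb.aio_fildes = fd;
  cb.aio_buf = const_cast<char*>(line.data());
  cb.aio_nbytes = line.size();
  cb.aio_offset = 0;            // appended anyway when fd has O_APPEND
  return aio_write(&cb) == 0;   // returns immediately; write is in flight
}

// A coroutine would suspend here; for the sketch we just poll.
ssize_t wait_for_write(aiocb& cb) {
  while (aio_error(&cb) == EINPROGRESS) {
    // a request thread would be doing other work here
  }
  return aio_return(&cb);
}
```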

int log(struct req_state* s, struct rgw_log_entry& entry) override;
};

class JsonOpsLogSink : public OpsLogSink {
Contributor

ok, that's expected

ceph::Formatter *formatter;
-  ceph::mutex lock = ceph::make_mutex("OpsLogSocket");
+  ceph::mutex lock = ceph::make_mutex("JsonOpsLogSink");
Contributor

I guess this is too...

Contributor

I guess this may be where writing to the unix socket, and attaching something to consume it and relay it to a file, may have some advantage; still, this is a valid choice

@@ -518,12 +518,16 @@ int radosgw_Main(int argc, const char **argv)

rgw::dmclock::SchedulerCtx sched_ctx{cct.get()};

-  OpsLogSocket *olog = NULL;
+  OpsLogManifold *olog = new OpsLogManifold();
Contributor

OpsLogManifold seems like a verbose, but valid name :)

Contributor

@dang dang left a comment

Seems fine to me, once the buffering bits get in.

@cfsnyder cfsnyder force-pushed the wip-48752-generalize-ops-log-socket branch from cd07218 to 8e07cf1 Compare October 4, 2021 21:34
OpsLogSocket::~OpsLogSocket()
void OpsLogFile::flush()
{
std::scoped_lock flush_lock(flush_mutex);
Contributor

can the log buffer append while the flush buffer is flushed?

Contributor

I think, in other words, that you could get that behavior by introducing a flag and dropping this mutex during the flush (apologies if I'm missing something)

Contributor Author

The log_mutex is only held while swapping log_buffer and flush_buffer, so request threads can continue to append to the log buffer while the flush_buffer is actually being written out to disk. There should be minimal contention here between the loggers and the flusher. Let me know if I'm missing something, though! :)
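The scheme described here can be sketched as follows. The real code also guards flush() with flush_mutex and signals a condition variable; this strips it down to the two-buffer swap, with a string standing in for the file:

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Request threads append to log_buffer under log_mutex; the flusher swaps
// the buffers under the same mutex (a cheap O(1) operation) and then
// writes flush_buffer out with log_mutex released, so appends never wait
// on disk I/O.
class DoubleBufferedLog {
  std::mutex log_mutex;
  std::string log_buffer;    // request threads append here
  std::string flush_buffer;  // flusher drains this
  std::string on_disk;       // stands in for the actual file
 public:
  void append(const std::string& line) {
    std::lock_guard<std::mutex> g(log_mutex);
    log_buffer += line;
    log_buffer += '\n';
  }

  void flush() {
    {
      std::lock_guard<std::mutex> g(log_mutex);
      std::swap(log_buffer, flush_buffer);   // the only contention point
    }
    // Slow part happens without log_mutex held: appends can proceed.
    on_disk += flush_buffer;
    flush_buffer.clear();
  }

  const std::string& contents() const { return on_disk; }
};
```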

Contributor

ok, i see; this is basically fine, I think.

@cfsnyder cfsnyder force-pushed the wip-48752-generalize-ops-log-socket branch from 8e07cf1 to c4c9f39 Compare October 5, 2021 08:46
Contributor

@mattbenjamin mattbenjamin left a comment

can you set up an rgw suite run in teuthology?


@cfsnyder
Contributor Author

cfsnyder commented Oct 5, 2021

can you set up an rgw suite run in teuthology?

@mattbenjamin I haven't run teuthology suites myself yet. Any references or pointers on how to go about that?

@mattbenjamin
Contributor

hi @cfsnyder well, you need sepia (ceph test lab creds), and then you push your branch to ceph-ci.git, and then you request a teuthology run. There's detail here:
https://docs.ceph.com/en/octopus/dev/developer_guide/running-tests-using-teuth/
If you've never done this, we can try to see if others will help--I'll set the needs-qa flag.

@mattbenjamin mattbenjamin self-requested a review October 5, 2021 13:22
Contributor

@mattbenjamin mattbenjamin left a comment

approving, setting needs-qa

@cfsnyder
Contributor Author

cfsnyder commented Oct 5, 2021

hi @cfsnyder well, you need sepia (ceph test lab creds), and then you push your branch to ceph-ci.git, and then you request a teuthology run. There's detail here: https://docs.ceph.com/en/octopus/dev/developer_guide/running-tests-using-teuth/ If you've never done this, we can try to see if others will help--I'll set the needs-qa flag.

Thanks @mattbenjamin, I'll look into getting access to Sepia and figuring out how to do this for myself in the future. So far I've just been adding the needs-qa flag and letting Yuri handle that testing.

@cfsnyder
Contributor Author

Is there anything I can do to get this included in a QA run so that it can be backported in time for the next Pacific release, @mattbenjamin?

@mattbenjamin
Contributor

Is there anything I can do to get this included in a QA run so that it can be backported in time for the next Pacific release, @mattbenjamin?

I'll try to send one today

Matt

@mattbenjamin
Contributor

Ok, so I've re-reviewed this in view of @cbodley's comment regarding the in-progress async/non-blocking request processing refactor that we expect to land imminently in 2022. I do want the functionality that this PR offers, and I don't want to be anything less than welcoming of good and new work. But I think Casey has a point here, in that even the top-half/request threads are at least occasionally doing blocking operations and waiting for bottom-half flush operations to finalize. I would like to ask: is there a way we could revise the i/o strategy to be explicitly non-blocking in the request path? I think ordinary non-blocking i/o, but also libaio or similar as Casey noted, would be sound ways to approach this. @cfsnyder is that something you could consider helping with?

@mattbenjamin
Contributor

@cbodley maybe the semantics of the existing ops-log path need some review discussion as well; iirc, the ops log socket is potentially lossy, if the receiver is unable to keep its queue under its max size. Maybe we could discuss all this more in the refactoring meeting? I think you've come to those, @cfsnyder?

Contributor

@mattbenjamin mattbenjamin left a comment

per casey's feedback

@mattbenjamin mattbenjamin self-requested a review October 27, 2021 19:48
Contributor

@mattbenjamin mattbenjamin left a comment

per discussion in refactoring call, restoring approval

@mattbenjamin
Contributor

@cfsnyder wip-opslog-48752 created in ceph-ci; will start an rgw suite run when it's built

@cfsnyder
Contributor Author

@mattbenjamin I'm guessing the QA failures are unrelated to these changes but let me know if you think otherwise?

@mattbenjamin
Contributor

@cfsnyder I don't think any of the failures here are due to the ops-log change.
that said, @cbodley I haven't seen https://pulpito.ceph.com/mbenjamin-2021-10-28_10:36:16-rgw-wip-opslog-48752-distro-basic-smithi/6465138/ before; boto thinks ListBuckets needs to be re-tried? is this a client timeout?

@mattbenjamin
Contributor

ok, this passed; merging; opening a tracker issue for the new list-buckets failure on cleanup

@wjwithagen
Contributor

Clang/FreeBSD is not happy with this one:

/home/jenkins/workspace/ceph-master-compile/src/rgw/rgw_log.cc:391:16: error: no matching member function for call to 'wait'
    cond_flush.wait(lock);
    ~~~~~~~~~~~^~~~
/usr/include/c++/v1/__mutex_base:292:10: note: candidate function not viable: no known conversion from 'std::unique_lock<mutex_debug_impl<false>>' (aka 'std::unique_lock<ceph::mutex_debug_detail::mutex_debug_impl<false>>') to 'unique_lock<std::mutex> &' for 1st argument
    void wait(unique_lock<mutex>& __lk) _NOEXCEPT;
         ^
/usr/include/c++/v1/__mutex_base:403:21: note: candidate function template not viable: requires 2 arguments, but 1 was provided
condition_variable::wait(unique_lock<mutex>& __lk, _Predicate __pred)
                    ^
1 error generated.

Trying to fix this, but I keep running into all kinds of variations of this error
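The error is that libc++'s std::condition_variable::wait only accepts std::unique_lock<std::mutex>, so it rejects a unique_lock over ceph::mutex's debug implementation. One portable way out, sketched here with a stand-in mutex type, is std::condition_variable_any, which can wait on any lockable; the actual fix in #43751 may instead use the matching ceph::condition_variable alias.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Stand-in for a non-std mutex type like ceph's mutex_debug_impl:
// it satisfies BasicLockable but is not std::mutex.
struct DebugMutex {
  std::mutex impl;
  void lock() { impl.lock(); }
  void unlock() { impl.unlock(); }
};

DebugMutex mtx;
std::condition_variable_any cond_flush;  // works with any lockable type
bool flushed = false;

void wait_for_flush() {
  std::unique_lock<DebugMutex> lock(mtx);
  // std::condition_variable would fail to compile here, as in the
  // clang/libc++ error above; condition_variable_any accepts this lock.
  cond_flush.wait(lock, [] { return flushed; });
}

void signal_flush() {
  { std::lock_guard<DebugMutex> g(mtx); flushed = true; }
  cond_flush.notify_all();
}
```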

@cbodley
Contributor

cbodley commented Oct 31, 2021

thanks @wjwithagen, #43751 should fix
