
Provide logging behavior policies applied by conmon to stdout/stderr #84

portante opened this issue Nov 14, 2019 · 18 comments

@portante
Today conmon performs a byte-for-byte capture of a container's stdout/stderr pipes, recording the data as a series of JSON documents in a log file. All bytes from both file descriptors are captured without discrimination, and conmon writes what it reads from both pipes to disk as fast as possible (though sequentially).

There is a natural back-pressure presented by conmon to the processes in the container writing to stdout/stderr, resulting from the rate at which the disk can accept write()s. It is usually the case that all conmon processes serving containers write to the same disk, which means it is possible for "noisy" containers to take up much of the available disk bandwidth, leading to unwanted impacts of one container on another.

Further, the rate at which containers write to disk is independent of the rate at which log file readers (or scrapers; e.g. fluentd, fluent-bit, rsyslog, syslog-ng, filebeat, etc.) can read. Because log file rotation is performed independently of the readers and of conmon on most platforms, it is possible for conmon to write more data than a reader can take in before a log file is deleted behind the reader's back, without the reader ever seeing that data.

These two problems suggest that logging behavior policies, applied by conmon to the log stream, would give administrators a way to address them.

There are three policy behaviors we can begin to consider, each applied to stdout/stderr independently:

  1. back-pressure (a rough sketch follows this list)
  • Given a rate specified in "bytes per interval", stop reading from the given pipe once the number of bytes read so far exceeds the limit for that interval, and resume reading at the beginning of the next interval
  • The interval could simply default to 1 second, and/or be configurable along with the number of bytes allowed per interval
  2. drop
  • Given a rate specified as for back-pressure, drop bytes read over the limit for the remainder of the interval, accepting them again at the start of the next interval
  3. ignore
  • Ignore all bytes read from a given pipe without writing them to disk.
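
Below is a minimal sketch, in C, of how the back-pressure and drop accounting might look in conmon's per-pipe read loop. The names (pipe_budget, POLICY_*, and the helper functions), the fixed one-second interval, and the overall structure are assumptions for illustration only, not existing conmon code or a committed design:

```c
/* Sketch only: per-pipe rate limiting with the three proposed policies. */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

enum log_policy { POLICY_BACKPRESSURE, POLICY_DROP, POLICY_IGNORE };

struct pipe_budget {
    enum log_policy policy;
    uint64_t limit;          /* bytes allowed per 1-second interval, 0 = unlimited */
    uint64_t used;           /* bytes accounted to the current interval            */
    time_t   interval_start; /* start of the current interval                      */
};

/* Start a fresh interval once the wall clock moves to the next second. */
static void budget_tick(struct pipe_budget *b)
{
    time_t now = time(NULL);
    if (now != b->interval_start) {
        b->interval_start = now;
        b->used = 0;
    }
}

/* back-pressure: how many bytes may be read from the pipe right now.
 * Returning 0 means "do not read"; the kernel pipe buffer then fills
 * and the container's write()s block until the next interval. */
static size_t budget_may_read(struct pipe_budget *b, size_t want)
{
    budget_tick(b);
    if (b->policy != POLICY_BACKPRESSURE || b->limit == 0)
        return want;
    uint64_t left = b->limit > b->used ? b->limit - b->used : 0;
    return want < left ? want : (size_t)left;
}

/* drop / ignore: how many of the `n` bytes just read should be logged.
 * ignore logs nothing; drop logs only what still fits in the budget. */
static size_t budget_may_log(struct pipe_budget *b, size_t n)
{
    budget_tick(b);
    if (b->policy == POLICY_IGNORE)
        return 0;
    if (b->limit == 0) {
        b->used += n;
        return n;
    }
    uint64_t left = b->limit > b->used ? b->limit - b->used : 0;
    size_t logged = n < left ? n : (size_t)left;
    b->used += n;   /* all read bytes count against the interval budget */
    return logged;
}
```

The key difference between the two rate-limited policies is where the decision is made: back-pressure limits how much is read (so the writer eventually blocks on the full pipe), while drop reads everything and limits how much is logged.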

This will allow an administrator to apply a policy to all conmon processes. Since conmon processes don't "know" about each other, coordination of the setting of that policy would have to be managed externally to conmon.

It is possible that conmon processes could periodically poll for configuration changes, or accept some sort of signal to re-evaluate the change.
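
If the signal route were taken, a conventional shape for it would be a handler that only sets a flag which the main loop checks. SIGHUP, the flag name, and the idea of re-reading an externally managed policy are assumptions here, not existing conmon behavior:

```c
/* Sketch only: accept a signal as a request to re-evaluate the logging policy. */
#include <signal.h>

static volatile sig_atomic_t reload_requested = 0;

static void on_sighup(int sig)
{
    (void)sig;
    reload_requested = 1;   /* only set a flag; do the real work in the main loop */
}

static void install_reload_handler(void)
{
    struct sigaction sa = { 0 };
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGHUP, &sa, NULL);
}

/* In the main poll() loop:
 *   if (reload_requested) { reload_requested = 0; /* re-read the policy */ }
 * where the re-read step consults however the externally managed
 * settings are delivered (a file, an environment refresh, etc.). */
```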

@rhatdan
Member

rhatdan commented Nov 17, 2019

@giuseppe Could you see about adding these flags?

@giuseppe
Member

I've a few questions.

  • Is ignore the equivalent of drop with a bytes-per-interval=0?

  • Are these limits supposed to be changed at runtime? e.g. can I change bytes-per-interval while the container is running?

  • How will these settings work for Podman/CRI-O? e.g. for CRI-O: should it be done globally or in an annotation?

@mheon
Member

mheon commented Nov 18, 2019 via email

@rhatdan
Member

rhatdan commented Nov 18, 2019

I've a few questions.

* Is `ignore` the equivalent of `drop` with a bytes-per-interval=0?

@portante WDYT?

* Are these limits supposed to be changed at runtime?  e.g. can I change bytes-per-interval while the container is running?

No, these should be constant for the container.

* How will these settings work for Podman/CRI-O?  e.g. for CRI-O: should it be done globally or in an annotation?

CRI-O would need to use annotations until this gets support in upstream Kubernetes.
I agree with @mheon that Podman would use --log-opt.

@haircommander
Collaborator

* How will these settings work for Podman/CRI-O?  e.g. for CRI-O: should it be done globally or in an annotation?

CRI-O would need to use annotations until this gets support in upstream Kubernetes.
I agree with @mheon that Podman would use --log-opt.

While using annotations in CRI-O to set this would work, I'm not sure anyone would use it (us included) before it makes it upstream. conmon would still have to log in the old format to conform, and thus we'd have to maintain concurrent logging (the new way and the old way). It may make the most sense to implement this for Podman, and propose it in the CRI, before we do work in CRI-O that may not be used or really desired.

@portante
Author

Is ignore the equivalent of drop with a bytes-per-interval=0?

Certainly it could be implemented that way, but I was hoping we could make it so that, at the --log-opt interface level, drop with bytes-per-interval=0 is rejected. This way the intention is explicit instead of implicit: you want one of three behaviors, ignore (no logs at all), drop (logs if within bounds), or back-pressure (all logs, with a maximum bandwidth).

The other behavior we need is to be sure an SRE has the option to set the maximum bandwidth in a way the user cannot increase, while still allowing users to pick the behavior.

@mffiedler

An example of the behavior described by @portante is in https://bugzilla.redhat.com/show_bug.cgi?id=1741955: container logging rates and the container runtime's log rotation policy can cause the logs to outstrip the log scraper's ability to keep up.

@syedriko

syedriko commented Dec 9, 2019

I'm adding two new log options to podman CLI in my POC, passed as part of --log-opt:

policy=backpressure|drop|ignore|passthrough

Peter has covered the first three; passthrough is the current behavior with unrestricted logging.

rate-limit=RATE
Limit the transfer to a maximum of RATE bytes per second. A suffix of "K", "M", "G", or "T" can be
added to denote kibibytes (*1024), mebibytes, and so on.
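
To make the proposed interface concrete, invocations might look like the following sketch. It is based only on the option names above; the image name is a placeholder, and the exact syntax may change as the POC evolves:

```console
# Hypothetical invocations using the proposed options (subject to change):
podman run --log-opt policy=backpressure --log-opt rate-limit=1M myimage
podman run --log-opt policy=drop --log-opt rate-limit=512K myimage
podman run --log-opt policy=ignore myimage
podman run --log-opt policy=passthrough myimage   # current unrestricted behavior
```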

@rhatdan
Member

rhatdan commented Dec 9, 2019

@syedriko @portante Could one of you open a PR to add this feature?

@syedriko

syedriko commented Dec 9, 2019

@rhatdan A WIP PR: #92
Another PR in libpod for the CLI changes: containers/podman#4663

@rhatdan
Member

rhatdan commented Dec 9, 2019

@syedriko I mean a PR to containers/common to add these limits.

@bparees

bparees commented Dec 9, 2019

What happens in the back-pressure case? Do the logs continue to accumulate in the pipe/buffer indefinitely (assuming the log rate never drops)? Can this cause memory consumption issues? Whose memory budget does that come out of?

@haircommander
Collaborator

Whose memory budget does that come out of?

By default, CRI-O puts conmon in the system.slice, but OCP ships with it in the pod slice, so it'd be charged to the pod

@bparees

bparees commented Dec 9, 2019

so it'd be charged to the pod

that sounds like the right place for it, so that's good.

@syedriko

syedriko commented Dec 9, 2019

what happens in the back-pressure case?

When over output quota in a particular time period, container log writes will block on a pipe that conmon reads from until the end of that time period. There won't be log accumulation.
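
For anyone wondering where the un-read bytes live (per the memory question above): once conmon stops reading, data accumulates only up to the kernel pipe buffer's capacity, which defaults to 64 KiB on Linux, and any further write() from the container simply blocks. A standalone illustration of that blocking behavior, unrelated to conmon itself:

```c
/* Illustration only: with no reader draining the pipe, write() blocks
 * once the kernel pipe buffer (64 KiB by default on Linux) is full. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    char chunk[4096];
    size_t total = 0;

    if (pipe(fds) != 0)
        return 1;
    memset(chunk, 'x', sizeof(chunk));

    for (;;) {
        fprintf(stderr, "wrote %zu bytes so far\n", total);
        /* Blocks forever once roughly 64 KiB are queued in the pipe. */
        if (write(fds[1], chunk, sizeof(chunk)) < 0)
            return 1;
        total += sizeof(chunk);
    }
}
```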

@portante
Author

portante commented Dec 9, 2019

what happens in the back-pressure case?

When over output quota in a particular time period, container log writes will block on a pipe that conmon reads from until the end of that time period. There won't be log accumulation.

One complication is that I think the logging rate should be specified separately for stdout and stderr. We should not lump the two together.

@syedriko

syedriko commented Dec 9, 2019

what happens in the back-pressure case?

When over output quota in a particular time period, container log writes will block on a pipe that conmon reads from until the end of that time period. There won't be log accumulation.

One complication is that I think the logging rate should be specified separately for stdout and stderr. We should not lump the two together.

We can surely do that. With the PRs that are currently in flight, I'm aiming to build a minimal but end-to-end working implementation that we can play with, pick apart, and make sure makes sense.

@syedriko

@syedriko I mean a PR to containers/common to add these limits.

@rhatdan Could you give me a bit more detail? I'm new to the containers code base, and I'm not quite seeing how containers/common fits into what I'm doing.
