Proposal: Logging drivers #7195

Closed
crosbymichael opened this Issue Jul 23, 2014 · 93 comments

@crosbymichael
Member

Improved logging support

Topics:

  • Logging drivers
  • Initial logging drivers
  • Default driver improvements

Logging drivers

The driver interface should be able to support the smallest subset available for logging drivers to
implement their functionality. Stdout and stderr will still be the source of logging for containers
in this proposal. Docker will, however, take the raw streams from the containers and create discrete
messages delimited by writes. This parsed struct will then be sent to the logging drivers.

type Message struct {
    // ContainerID is the container id where the message originated from
    ContainerID string 

    // RawMessage is the raw bytes from the write
    RawMessage []byte 

    // Source specifies where this message originated, stderr, stdout, syslog
    Source string

    // Time is the time the message was received
    Time time.Time

    // Fields are user defined fields attached to the message
    Fields map[string]string
}

type Driver interface {
    // Log hands a single message from a container's stdout or stderr stream to the driver
    Log(message *Message) error

    // ReadLog fetches the messages for a specific id
    ReadLog(containerID string) (messages []*Message, err error)

    // CloseLog tells the driver that no more log messages will be written for the specific id
    // drivers can implement this to their requirements, it may mean compressing the logs or deleting
    // them off of the disk
    CloseLog(containerID string) error

    // Close ensures that any writes for the logger are properly flushed and can be
    // stopped without data loss
    Close() error
}

When creating or initializing the drivers they will be provided with a key/value map with the user defined configuration specific to the driver. Each driver will also be provided a root directory where it is able to store and manage any type of state on disk.
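
As a rough illustration of "discrete messages delimited by writes", here is a minimal sketch that wraps each container stream in an io.Writer and turns every Write call into one Message. Only Message and the Log method mirror the definitions above; `streamWriter`, `memDriver`, and the `logger` interface are hypothetical names made up for this example, not part of the proposal.

```go
package main

import (
	"fmt"
	"time"
)

// Message mirrors the struct from the proposal.
type Message struct {
	ContainerID string
	RawMessage  []byte
	Source      string
	Time        time.Time
	Fields      map[string]string
}

// logger is the subset of the proposed Driver interface used in this sketch.
type logger interface {
	Log(message *Message) error
}

// streamWriter (hypothetical) converts every Write into one discrete Message.
type streamWriter struct {
	containerID string
	source      string // "stdout" or "stderr"
	driver      logger
}

func (w *streamWriter) Write(p []byte) (int, error) {
	// Copy p because an io.Writer must not retain the caller's slice.
	raw := make([]byte, len(p))
	copy(raw, p)
	err := w.driver.Log(&Message{
		ContainerID: w.containerID,
		RawMessage:  raw,
		Source:      w.source,
		Time:        time.Now(),
	})
	if err != nil {
		return 0, err
	}
	return len(p), nil
}

// memDriver (hypothetical) keeps messages in memory for demonstration only.
type memDriver struct {
	msgs []*Message
}

func (d *memDriver) Log(m *Message) error {
	d.msgs = append(d.msgs, m)
	return nil
}

func main() {
	d := &memDriver{}
	stdout := &streamWriter{containerID: "1e4328", source: "stdout", driver: d}
	stderr := &streamWriter{containerID: "1e4328", source: "stderr", driver: d}

	// Each call below results in a single Write, and therefore a single Message.
	fmt.Fprint(stdout, "line one\nline two\n")
	fmt.Fprint(stderr, "oops\n")

	for _, m := range d.msgs {
		fmt.Printf("%s %s %q\n", m.ContainerID, m.Source, m.RawMessage)
	}
}
```

Because the source is recorded per writer, a driver can still tell stdout from stderr even though both arrive through the same Log method.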

Initial logging drivers

none - This driver will ignore the streams and log nothing for the containers. This is a totally valid
driver: because the docker daemon has to manage the logs for all containers, logging is a memory and
performance bottleneck on the daemon.

default - This driver will be the current implementation of logging in docker. It is a single
file on disk containing json objects, one per line, each with the message, timestamp, and stream of
the log message.

syslog - This driver will write to a syslog socket and use the tag field to insert the container id.

Default driver improvements

One of the biggest issues with the default driver is that there is no log truncation or rotation. Both of
these issues need to be addressed. We can either truncate based on filesize or date. I believe filesize
is better.

Truncation size can default to 10mb, with an option to override it when you select the driver.
Rotation can also be set to a specific size limit, defaulting to 500mb. To change the defaults I propose a
--logging-opt flag on the daemon, similar to --storage-opt for the storage drivers.

Usage

The usage for this feature will be managed via the daemon:

docker -d --logging none
docker -d --logging default --logging-opt truncation=20mb --logging-opt rotation=1gb
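
A sketch of how repeated --logging-opt values such as truncation=20mb might be turned into the key/value map handed to a driver. `parseOpts` and `parseSize` are hypothetical helpers, and treating kb/mb/gb as binary multiples is an assumption; the proposal does not define the unit semantics.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOpts turns repeated key=value flag values into a map.
func parseOpts(opts []string) (map[string]string, error) {
	out := make(map[string]string, len(opts))
	for _, o := range opts {
		parts := strings.SplitN(o, "=", 2)
		if len(parts) != 2 || parts[0] == "" {
			return nil, fmt.Errorf("invalid option %q, expected key=value", o)
		}
		out[parts[0]] = parts[1]
	}
	return out, nil
}

// parseSize converts values like "20mb" or "1gb" to bytes.
// Assumption: units are binary multiples (1mb = 1<<20 bytes).
func parseSize(s string) (int64, error) {
	orig := s
	s = strings.ToLower(strings.TrimSpace(s))
	units := []struct {
		suffix string
		factor int64
	}{{"kb", 1 << 10}, {"mb", 1 << 20}, {"gb", 1 << 30}}

	factor := int64(1)
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			factor = u.factor
			s = strings.TrimSuffix(s, u.suffix)
			break
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid size %q", orig)
	}
	return n * factor, nil
}

func main() {
	opts, err := parseOpts([]string{"truncation=20mb", "rotation=1gb"})
	if err != nil {
		panic(err)
	}
	for _, key := range []string{"truncation", "rotation"} {
		size, err := parseSize(opts[key])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %d bytes\n", key, size)
	}
}
```
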
@LK4D4
Contributor
LK4D4 commented Jul 23, 2014

What about per container driver choosing? For example I don't want logs for elasticsearch, but I want logs for prosody.

@crosbymichael
Member

A few questions that I have: what should we do about timestamps? Should the read return some type of structured data or should we still manage this as just streams?

Any suggestions and modifications to this proposal are welcome.

@cpuguy83
Contributor

I would still manage them as just streams and allow the implementation to handle it.

@LK4D4
Contributor
LK4D4 commented Jul 23, 2014

Hm, actually I think that we have perfect interface for logging - io.Writer :) And for none we have ioutil.Discard.

@brianm
Contributor
brianm commented Jul 23, 2014

I encourage being able to attach streams to syslog out of the box, while syslog may not be sexy in 2014, it works everywhere.

@crosbymichael
Member

@brianm would you be interested in working on a syslog driver for this initial push?

@brianm
Contributor
brianm commented Jul 23, 2014

Happy to!

@crosbymichael
Member

@brianm sounds good to me. I like to have a few different drivers so that it keeps the interface honest and makes sure that we are accounting for different needs within the driver.

I'm guessing for things like syslog we will need to pass options when we create the driver. Maybe something like:

driver, err := syslog.NewDriver("/var/lib/docker/logging/syslog", map[string]string{
    "priority": "1",
    "socket": "/somepath",
})
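
A hedged sketch of what a constructor might do with the root directory and the options map from the call above. `syslogConfig` and `newSyslogConfig` are hypothetical, as are the defaults; a real driver would go on to open the socket, which is left out here.

```go
package main

import (
	"fmt"
	"strconv"
)

// syslogConfig is a hypothetical, validated view of the driver options.
type syslogConfig struct {
	Root     string // directory where the driver may keep state
	Priority int
	Socket   string
}

// newSyslogConfig validates the user supplied key/value options.
func newSyslogConfig(root string, opts map[string]string) (*syslogConfig, error) {
	// Defaults are assumptions made for this sketch.
	cfg := &syslogConfig{Root: root, Priority: 6, Socket: "/dev/log"}
	for k, v := range opts {
		switch k {
		case "priority":
			p, err := strconv.Atoi(v)
			if err != nil {
				return nil, fmt.Errorf("invalid priority %q: %v", v, err)
			}
			cfg.Priority = p
		case "socket":
			cfg.Socket = v
		default:
			// Rejecting unknown keys surfaces typos early.
			return nil, fmt.Errorf("unknown syslog option %q", k)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := newSyslogConfig("/var/lib/docker/logging/syslog", map[string]string{
		"priority": "1",
		"socket":   "/somepath",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *cfg)
}
```
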
@jamtur01
Contributor

+1 to syslog driver and config.

@crosbymichael
Member

I just added the syslog driver to the proposal

@kuon
Contributor
kuon commented Jul 24, 2014

I am currently evaluating a gazillion ways of getting my logs to the right place with docker (app forwards logs directly, agent in the container, agent in another container, agent on the host, syslog, ...) and having docker log directly to syslog would solve it easily. All apps could just use stdout/err.

As for per-container configuration, we should be able to give the sender name and the facility.

The other possibility is to turn off logging in docker and use systemd or supervisord to forward the logs of each container to syslog.

@LK4D4
Contributor
LK4D4 commented Jul 24, 2014

In case someone missed my first comment:

  • Drivers should be configurable per container
  • Drivers should implement io.Writer, so we get a null writer and a syslog writer for free from the stdlib. This is the Go-ish way to do this.
@crosbymichael
Member

I can see the need for this to be per container but configuration will be weird on the daemon running multiple logging drivers.

-1 on the io.Writer, we need to distinguish stdout and stderr in some of the drivers so we cannot use one interface. The streams are already io.Writers coming in so we are still good.

@crosbymichael
Member

I updated the Driver interface to include CloseLog for signaling to the driver that no more logs will be written for a specific id.

@LK4D4
Contributor
LK4D4 commented Jul 24, 2014

we need to distinguish stdout and stderr

Yup, and I totally want the possibility to have different drivers for them.

@markcartertm

Should the logging driver support a syslog server option ?
This will make it easier to troubleshoot when containers are dynamically assigned to hosts by an orchestration layer.
docker -d --logging syslog

@cpuguy83
Contributor

@markcartertm Are you questioning the idea of including a syslog driver (which is much discussed above and part of the proposal) or did you not see the discussion?

@solidsnack

An implementation that chunked logs by time could be helpful for async log archiving strategies. For example, if logs were stored by minute. I'm not sure how this would interact with the truncation option.

@kuon
Contributor
kuon commented Jul 28, 2014

A first step that would allow "usual" logging processing system to work is to make docker logrotate compliant. At present there is no way to make docker re-create the log files after a rotation without restarting the containers. kill -HUP on the docker daemon restarts all the containers.

@crosbymichael
Member

I think the last question here that needs to be answered is, should this be per container or a daemon wide option?

@cpuguy83
Contributor

@crosbymichael People will want per container with a default set at the daemon level.

@kuon
Contributor
kuon commented Jul 30, 2014

Syslog configuration could be (in order of priority, from low to high):

  • Daemon level default
  • Image default
  • Container config

The configuration at the image level would obviously not include all options (like where to log) but rather what to log (stderr/stdout/both, include timestamp or not or add formatting).
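
The low-to-high priority order suggested here amounts to a layered merge in which later layers override earlier ones. A minimal sketch follows; `mergeOpts` and the option keys are hypothetical, and whether an image-level layer should exist at all is exactly what the thread is debating.

```go
package main

import "fmt"

// mergeOpts merges option layers in order; later layers win on conflicts.
func mergeOpts(layers ...map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, layer := range layers {
		for k, v := range layer {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	// Hypothetical option keys, listed from lowest to highest priority.
	daemon := map[string]string{"facility": "daemon", "timestamp": "true"}
	image := map[string]string{"streams": "stderr"}
	container := map[string]string{"facility": "local0"}

	fmt.Println(mergeOpts(daemon, image, container))
}
```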

@crosbymichael
Member

@kuon we cannot do anything at the image level because it makes images lose portability. Things like this should be host specific/ runtime dependencies.

@wking
wking commented Jul 30, 2014

On Wed, Jul 30, 2014 at 10:49:07AM -0700, Michael Crosby wrote:

we cannot do anything at the image level because it makes images
lose portability.

Setting defaults at the image level shouldn't compromise portability.
We already do this with other image metadata (e.g. via a Dockerfile's
CMD, ENTRYPOINT, EXPOSE, …).

@cpuguy83
Contributor

@wking all these things you listed are things that happen inside the container.
Log handling happens outside the container.

@crosbymichael
Member

@wking Yes, they do when it's something specific about what type of logging drivers you have installed on a specific docker host. The settings in the image are all portable. This is the reason why VOLUMES /home/michael:/root is not allowed in the image: not every docker host will have a folder structure with /home/michael.

@kuon
Contributor
kuon commented Jul 30, 2014

I guess relying on environment variables (like -e LOGLEVEL=INFO and such) is OK for this use case. It was more an idea than a "thought through" proposal.

@wking
wking commented Jul 30, 2014

On Wed, Jul 30, 2014 at 10:59:23AM -0700, Michael Crosby wrote:

Yes, they do when it's something specific about what type of logging
drivers you have installed on a specific docker host.

Right, so which driver shouldn't be in the image config, but what
gets passed from the container to the logging interface should be.
@kuon's suggestions:

Wed, Jul 30, 2014 at 10:47:59AM -0700, kuon:

The configuration at the image level would obviously not include all
options (like where to log) but rather what to log
(stderr/stdout/both, include timestamp or not or add formatting).

all sound portable to me.

@shykes
Contributor
shykes commented Jul 30, 2014

I would prefer the logging interface to be message-oriented. It turns out "logs as continuous byte streams" (ie stdout/stderr) are only a fraction of the total universe of logs out there. Basically every logging system - from syslog to splunk to loggly to systemd to logstash - expects discrete messages with a primary payload and key-value metadata attached to it. So I think that should be our logging primitive, and stdout/stderr should be chopped up and converted to discrete messages at the container boundary, before being ingested that way. Of course along the way these chopped up messages can be annotated with extra fields, for example: "stream: stdout" or something similar.

I think our message format should define a strict schema, with reserved fields and a special area for flexible userdata fields.

@crosbymichael
Member

@shykes what is a "discrete" message when we are only dealing with stdout,stderr? Only read until a \n and say that is a message or chunk by the Write that we get?

@daniel-garcia
Contributor

+1 to @shykes' comments. A scheme like that would allow containers to report all sorts of semi-structured data such as white box metrics. For example, metric_name=foo, ts=X, val=Z, tag1=baz

@erikh
Contributor
erikh commented Jul 30, 2014

Logstash internally converts from input plugins to JSON and then uses output plugins to reformat to output to the desired store or transport. Perhaps we could use its model as a source of inspiration?

-Erik


@unclejack
Contributor

Dockerfile level specification of logging driver: -1
This configuration is set on each Docker daemon as needed, the Dockerfile shouldn't specify the logging driver. This would break the reuse of the same image across all environments (dev vs. production and so on).

Default daemon level default logging driver: +1
Being able to specify the driver for each container: +1

@crosbymichael
Member

How are we supposed to get key-value format from this without having people modify their applications to support this?

@daniel-garcia
Contributor

Could the meaning of "discrete" for stdout/stderr be defined by a container level option? The default driver could "do its best" by breaking on newlines, or writes, or whatever; the json logging already breaks up the messages today... just reuse that.

Why can't there exist a new interface at /dev/dockerlog that has a special meaning?

@shykes
Contributor
shykes commented Jul 30, 2014

@crosbymichael for each logging format commonly used by applications, we should have an adapter which does the translation to our internal logging format. As a start we can bundle the 3 most requested adapters: stdout/stderr, syslog, and files. Then perhaps later we can allow 3rd-party adapters.

The problem right now is that many applications commonly use syslog and files. If our internal logging API doesn't support those well, then it will be difficult for Docker logging drivers to receive all logs.

About the stdout/err adapter: I think we have the choice between 1) newline-split and 2) write-split. I think write-split makes more sense: it allows multi-line logs, as long as the application writes them as a single write(2) call. I would like to discuss this part more, I agree it's important.
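
To make the trade-off between the two strategies concrete, here is a small sketch that feeds the same sequence of writes to both. `lineSplitter` and `writeSplitter` are hypothetical names; the point is only that write-split keeps a multi-line write together but also passes partial writes through as separate messages, while newline-split does the opposite.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineSplitter (hypothetical) emits one message per newline-terminated line,
// buffering partial lines across writes.
type lineSplitter struct {
	buf  []byte
	emit func(msg []byte)
}

func (s *lineSplitter) Write(p []byte) (int, error) {
	s.buf = append(s.buf, p...)
	for {
		i := bytes.IndexByte(s.buf, '\n')
		if i < 0 {
			break
		}
		// Copy the line so the emitted slice does not alias the buffer.
		s.emit(append([]byte(nil), s.buf[:i]...))
		s.buf = s.buf[i+1:]
	}
	return len(p), nil
}

// writeSplitter (hypothetical) emits one message per Write call.
type writeSplitter struct {
	emit func(msg []byte)
}

func (s *writeSplitter) Write(p []byte) (int, error) {
	s.emit(append([]byte(nil), p...))
	return len(p), nil
}

func main() {
	// One multi-line write, followed by a line that arrives in two writes.
	writes := []string{"panic: boom\n  at main.go:10\n", "par", "tial\n"}

	var byLine, byWrite []string
	ls := &lineSplitter{emit: func(m []byte) { byLine = append(byLine, string(m)) }}
	ws := &writeSplitter{emit: func(m []byte) { byWrite = append(byWrite, string(m)) }}
	for _, w := range writes {
		ls.Write([]byte(w))
		ws.Write([]byte(w))
	}

	fmt.Printf("newline-split: %q\n", byLine)
	fmt.Printf("write-split:   %q\n", byWrite)
}
```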

@digital-wonderland

What I am currently missing in this discussion is the ability to handle in-container log files - e.g. Elasticsearch logs to three different files below /var/log/elasticsearch/.

If I could then add something like LOGFILE ["/var/log/file1","/var/log/file2",...] to get those files integrated into Docker's log handling, it would definitely be neat.

Besides that I would need the ability to add some metadata - e.g. Logstash's 'type' to later figure out what parser to use for that line - so I would have something like { type:"foo"; msg: "The log message"}. (This is also valid for 'normal' Docker logs - i.e. depending on the image I might have to use different parsers to handle log messages)

So, for something like the above, being able to add some logging metadata within a Dockerfile would definitely make sense.

On the host I then would configure the encoder (i.e. JSON encoded) and the driver (i.e. forward to Logstash) and would like to have the ability to override anything set via Dockerfile (i.e. forward only one of those 3 elasticsearch logs).

@shykes
Contributor
shykes commented Jul 30, 2014

@digital-wonderland see my previous comment, in-container files would be collected with the appropriate adapter.

  • file -> dockerlogs adapter
  • stdout -> dockerlogs adapter
  • syslog -> dockerlogs adapter
  • etc.
@digital-wonderland

@shykes thanks, I posted before I saw your comment, this sounds fine.

Besides that I just would like to be able to add some metadata, so I can later sort out the individual logs easier. This is possible via the image name but being able to add some log specific key=value pairs within a Dockerfile (as overwritable defaults) would be pretty helpful.

@crosbymichael
Member

I'm sorry but I was not aware of these requirements and they totally conflict with the current proposal.

The requirements that I was aware of were that containers still output logs via stdout/stderr and we needed to improve the way that docker aggregates these and provide more destinations than just one massive file on disk. Taking the current logs and adding truncation and rotation for our default logging driver and adding new destinations such as forwarding these logs to syslog.

What is now being suggested are adapters that are injected into containers. This also implies that we will not only have to implement a client but also the daemon of these logging methods and understand the protocols within docker. In order to inject a syslog socket into the container then interpret writes from container processes, we will have to implement the syslog protocol and have a daemon running on the other side of the socket to read the writes from container processes and interpret/forward them to the correct destination.

If adapters that can be inserted into a user's containers are the real requirement then I suggest we close this proposal and start on a new one.

@shykes
Contributor
shykes commented Jul 31, 2014

@crosbymichael you're right we don't need to ship a syslog or file adapter. We can stick to the current functionality of only collecting stdout/stderr. However I think it's worth doing a transformation along the way into discrete messages, because in practice I think 100% of logging drivers will need that transformation. So instead of letting each driver re-implement transformation, I think it's better if we do it in the core and the drivers only need to deal with discrete messages. Presented in this way, I think it's not a huge change to the proposal.

I think I got ahead of myself in suggesting multiple adapters etc. That can easily be done later (but note that when we do, we won't need to change the driver API, which is a good thing!)

So, to rephrase my suggestion in 2 phases:

Phase 1 (this proposal)

stdout -----|
            |---> (convert to messages) ---> (log subsystem) ----> (log drivers)
stderr -----| 

Phase 2 (later, but compatible with this proposal)

stdout ---|
          |
stderr ---|
          |---> (convert to messages) ---> (log subsystem) ----> (log drivers)
syslog ---|
          |
files ----|
@kuon
Contributor
kuon commented Jul 31, 2014

There is something I thought of when I proposed configuration at the image level that I forgot to mention.

It is the ability to create additional output devices within a container, for example in a Dockerfile:

LOG ["access", "error"]

Would mount

/dev/access_log
/dev/error_log

within the container.

This would allow all apps to redirect their output to named streams. The above would suit an nginx configuration for example.

Maybe we can provide two drivers, like

LOG ["dev", "access"] (would create /dev/access_log as a byte stream)
LOG ["syslog", "error"] (would create /dev/error_log as a syslog destination)

As for handling files, I think docker shouldn't do anything with them, it's really easy to redirect them to a stream from within the container, the missing "link" is the ability to have named streams.

@crosbymichael
Member

thanks @shykes i'll update the proposal tomorrow morning with the discrete message changes.

@ajackson

@shykes With the new proposal does this mean it wouldn't be possible for an external application to have access to the direct streams from stdout / stderr? Meaning that external applications would have to consume from the log subsystem via a driver as the only way to get output from the container?

@cpuguy83
Contributor

@ajackson One should still be able to access the raw streams by attaching to the container.

@brianm
Contributor
brianm commented Jul 31, 2014

I think having docker take over syslog is... interesting, but much more invasive. I agree with focusing on stdout/stderr handling.

I still believe that a {stdout, stderr} -> syslog-on-host driver should ship with docker in order to integrate with existing infrastructures.

@brianm
Contributor
brianm commented Jul 31, 2014

Taking over syslog on the container is very reasonably done via "-v /dev/log:/dev/log" though this does lead to inconsistent hostname in log lines coming from the container -- loggers which log the full syslog datagram structured correctly will generally use the container hostname, loggers which just do line-oriented and rely on the syslog daemon to make it correct will get the host hostname.

Writing a syslog daemon into docker smells wrong. If this needs to be done then something in the design feels broken.

@wking
wking commented Jul 31, 2014

On Thu, Jul 31, 2014 at 08:47:53AM -0700, Brian McCallister wrote:

Taking over syslog on the container is very reasonably done via "-v
/dev/log:/dev/log" though this does lead to inconsistent hostname in
log lines coming from the container -- loggers which log the full
syslog datagram structured correctly will generally use the
container hostname, loggers which just do line-oriented and rely on
the syslog daemon to make it correct will get the host hostname.

Searching through RFC 5424, I don't see anything about
“line-oriented”. I do see that the HOSTNAME field is optional,
which may be where the “host syslog daemon using its hostname as the
default” issue comes up. Perhaps that's just a client-app issue
though? If this is fixable in the client app, I don't see any problem
with using ‘-v /dev/log:/dev/log’.

@brianm
Contributor
brianm commented Jul 31, 2014

Part of syslog is "accept anything you possibly can" in a datagram. There is a correct structure for the datagram, but given an incorrect one a syslog daemon SHOULD do the best it can to log whatever it receives. It is not line-oriented, but it is datagram-oriented:

$ echo "hello world" | nc -uU /dev/log

any good syslog daemon will accept it. Some things take advantage of this, even if they shouldn't.

@crosbymichael
Member

I just updated the proposal for discrete messages to the drivers.

@daniel-garcia
Contributor

Could the Message.ID be renamed to Message.ContainerID? ID seems to imply that is the message's unique name.

@crosbymichael
Member

@daniel-garcia good point

@crosbymichael
Member

@shykes updated

@daniel-garcia
Contributor

What is the id in Driver.Read() & Driver.CloseLog? containerID? Should the call return back a channel of Messages instead (to avoid overhead of large reads)?

@ubermuda
Contributor
ubermuda commented Aug 1, 2014

I find the CloseLog and Close functions naming confusing. What about:

  • CloseLog -> Close
  • Close -> Flush

Since your Close function's purpose is actually to "ensures that any writes for the logger are properly flushed", I think this would make more sense.

@pandrew
Contributor
pandrew commented Aug 1, 2014

+1

@crosbymichael
Member

@ubermuda the only problem with that is that they do not match Go's io.Closer semantics. Maybe that is ok in this situation though.

@crosbymichael
Member

@daniel-garcia maybe ReadLog needs to take a predicate func so that you can filter the type of results you want? Adding a channel to the interface adds a little complexity for the consumer.
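
A sketch of what a predicate-based ReadLog might look like. The signature and the in-memory `memStore` are hypothetical; the proposal itself only defines ReadLog(containerID string).

```go
package main

import "fmt"

// Message is trimmed down from the proposal's struct for brevity.
type Message struct {
	ContainerID string
	Source      string
	RawMessage  []byte
}

// memStore is a hypothetical in-memory driver backend.
type memStore struct {
	msgs []*Message
}

// ReadLog returns the messages for containerID that satisfy keep.
// A nil keep returns every message for that container.
func (s *memStore) ReadLog(containerID string, keep func(*Message) bool) ([]*Message, error) {
	var out []*Message
	for _, m := range s.msgs {
		if m.ContainerID != containerID {
			continue
		}
		if keep == nil || keep(m) {
			out = append(out, m)
		}
	}
	return out, nil
}

func main() {
	s := &memStore{msgs: []*Message{
		{ContainerID: "1e4328", Source: "stdout", RawMessage: []byte("started")},
		{ContainerID: "1e4328", Source: "stderr", RawMessage: []byte("disk full")},
		{ContainerID: "ffff00", Source: "stderr", RawMessage: []byte("other container")},
	}}

	onlyStderr := func(m *Message) bool { return m.Source == "stderr" }
	msgs, err := s.ReadLog("1e4328", onlyStderr)
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Source, m.RawMessage)
	}
}
```

The filter still runs after the driver has materialized each message, so it narrows the result without reducing the work done inside the driver.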

@tiborvass
Contributor

@crosbymichael Why not mimic io.Reader then?
ReadLog(id string, messages []*Message) (n int, err error) ?

@crosbymichael
Member

I didn't name the methods Write and Read because if they don't implement those standard interfaces in Go I didn't want them to have the same names.

@daniel-garcia
Contributor

@crosbymichael you are right about the complexity on the consumer. The predicate function could help in some aspects (the returned result) but not with the amount of work that has to be done in serialization/deserialization at the driver level. Most likely after you get them into the destination driver (syslog, logstash, hadoop) you won't really care about how to get them out through this interface anyway (or at least it may be a rare operation). It's good the way it is.

@tiborvass
Contributor

@crosbymichael I was not suggesting to use Read or Write, but rather to mimic the signature, so that we limit the number of messages to be sent to len(messages), which is specified by the client.

@jdef
Contributor
jdef commented Aug 4, 2014

@crosbymichael Message.Source mentions "syslog" as a possibility. Does that mean Docker will (eventually) be listening on a UNIX socket (e.g. /dev/log) inside a container for messages, and then passing them along to a driver implementation? It sounds like Driver is really an OutputDriver that:

  • stores generated logs, and;
  • lets you look at the logs generated by a container.

Whereas an InputDriver would be responsible for collecting the logs to begin with, listening on perhaps:

  • stdout and stderr
  • /dev/log inside a container;
  • a (localhost:)UDP port inside a container for log messages;
  • a (localhost:)TCP port inside a container, etc.

And perhaps where and how syslog is listening could be exposed to the container via environment variables? For example:

DOCKER_LOGGING=syslog  # or: syslog,stdout,stderr
DOCKER_LOGGING_SYSLOG_SOCKET=unix,udp
DOCKER_LOGGING_SYSLOG_SOCKET_UNIX=/dev/log
DOCKER_LOGGING_SYSLOG_SOCKET_UDP=514

At least with UDP if nothing is listening then the client process doesn't get backed up (but you would lose logs). If you wanted multiple streams, something like:

DOCKER_LOGGING_SYSLOG_SOCKET_UNIX=/dev/error_log,/dev/access_log
@jdef
Contributor
jdef commented Aug 4, 2014

Re-read the initial description, along w/ comments from @crosbymichael and @shykes, and now understand that consuming syslog messages generated by a container is not a part of this proposal. Is there a proposal to deal with this?

@zepouet
zepouet commented Aug 7, 2014

Hello,

I have been reading all comments about this new feature proposal and I would like to make some comments. As a matter of fact, I have been working these last weeks on a tool to retrieve and analyse docker's logs. To share where I am coming from: we are using docker for our "PaaS homebrew solution". Log management is a critical function for us.

At the start, we coded the whole logic to retrieve logs and expose them to users.
We were about to add new features when we decided to run a gap analysis of existing solutions.
And we decided to take one step back and select LogStash.

Our logic is to feed logstash with the logs produced by our apps. We use either file or syslog (and its flavor syslog4jjappender for the java apps). Then we chose Elasticsearch to create a data warehouse of logs (sorting/indexing), irrespective of the containers.
Additionally, deleting a container won't delete the log.

We believe the decision to use << stdout/stderr, file, syslog >> or << configuration runtime/dockerfile >> is quite relevant
and we consider several options.

My humble opinion follows:

  • we shall let people (sysadmin or devops team) use the tools they are familiar with (Kibana, Logstash... the list is open-ended).
  • They shall not be asked to deal with the configuration of containers.
  • Also, all those log reader tools already come with plugins to parse various log formats. It would be a tremendous amount of work to recode everything in Docker, even using the modular approach of plugins.
  • we shall let the developer/container architect decide which log files are to be retained, based on severity level (error, warn, info, debug, custom... ). Possibly in a static manner, since developers are the people best placed to know where the relevant infos/logs are.

We therefore would like to propose a new instruction for DockerFile

LOGS[TAG]=
LOGS[DEBUG]=/var/log/apache2/access.log
LOGS[DEBUG]=/opt/tomcat/logs/valve-access.log
LOGS[INFO]=/opt/tomcat/logs/catalina.out
LOGS[ERROR]=/var/log/apache2/error.log
LOGS[ERROR]=/var/log/apache2/mod_jk.log
LOGS[ERROR]=/var/log/apache2/mod_rewrite.log
LOGS[ERROR]=/opt/tomcat/logs/error.log

Keywords used above (DEBUG, ERROR, ...) are free text (i.e. a folksonomy as opposed to a taxonomy).

Therefore people would be free to add new keywords such as "AUDIT".

LOGS[AUDIT]=/opt/tomcat/logs/audits.log

Even better they could gain access to an archive folder for rotating logs.

LOGS[ARCHIVES]=/archives/**/*.log [WILDCARD pattern inspired by ruby syntax]
  • The Docker API will allow retrieving all logs of a given container based on one or multiple tags (e.g. INFO + AUDIT). From there a client (be that a human or a piece of software) would query logs through an API (REST API or socket).

  • The log driver of Docker shall be able to retrieve and list all tags attached to a container : "docker logs -list 1e4328"

    {
      "debug": [
        { "file1": "/var/log/apache2/access.log" },
        { "file2": "/opt/tomcat/logs/valve-access.log" }
      ],
      "info": [
        { "file3": "/opt/tomcat/logs/catalina.out" }
      ],
      "error": [
        { "file4": "/var/log/apache2/error.log" },
        { "file5": "/var/log/apache2/mod_jk.log" },
        { "file6": "/var/log/apache2/mod_rewrite.log" },
        { "file7": "/opt/tomcat/logs/error.log" }
      ],
      "audit": [
        { "file8": "/opt/tomcat/logs/audits.log" }
      ],
      "archives": [
        { "file9": "/archives/logs/catalina.1.log" },
        { "file10": "/archives/logs/catalina.2.log" }
      ]
    }
    
  • The log driver of Docker shall also be able to scroll backward through the files of a given tag with the command : "docker logs -tags:AUDIT+INFO 1e4328"

    • the last n lines, just like a tail command would do. Option: --tail:n
    • only new lines since the last call. Option: --sinceLastCall
    • to grep the lines directly without returning all information to the client 👍
      docker logs -grep 'mySpecificMessage' 1e4328
    • also an option --json to retrieve a json based file with extra information (file source/origin details + extra metadata). Default false.
    • the driver shall also be able to retrieve a file directly, e.g.
      docker logs -keys:file3 1e4328

Usage example:
1 - Let's assume an admin who needs a list of all errors of a given app launches:

docker logs -tags:ERROR 1e4328 --tail:500

He would receive the last 500 error lines - cross apps - that the developer classified as relevant in case of errors.

2 - Let's now assume a software app.
It may call the Docker API on a regular basis with

 docker logs -tags:INFOS 1e4328 --sinceLastCall 

It would receive only new lines to do whatever processing it needs (storage, trigger alert, ...)

My opinion is that log rotating and archiving shall fall under the responsibility of individual products that produce logs.
Log4j or LogBack are good examples in the Java world.

The core added value of Docker would be to focus on easy access to those tagged logs (through file, json or REST API). The static configuration in the Dockerfile could also be externalized (properties, json or xml format) and given at runtime for each container. That way an administrator could cancel and override a default developer/architect dockerfile configuration.

We could imagine having a message queue or an event broker (publish/subscribe) in the future to inform external applications. Such a feature is not relevant to the current question. It is more general.

Thank you if you read me :-) And thank you for the previous comments too !
Best regards,

Nicolas


" César - And this Paris, is it really that much bigger than Marseille?
M. Brun - In Paris, I saw at least fifty Canebières! "
(Marius - Scene 3, Marcel Pagnol, 1946 adaptation)

@vbalazs
vbalazs commented Aug 11, 2014

@zepouet I think the feature you described shouldn't be a part of Docker, there are tools out there which are doing similar things better. We shouldn't try to reimplement those.

Long running conversation on docker-dev list with several use cases and important points: https://groups.google.com/d/topic/docker-dev/3paGTWD6xyw/discussion

@discordianfish
Contributor

I agree with @LK4D4: Per container logging drivers would be very useful since for anything with high log output (load balancers, caches etc - basically every service in a traffic tier that does per-request logging) this logging would probably become a bottleneck.

Besides that, people have different auditing requirements. It's common that you just can't lose logs, so it should be possible to not rotate logs and to stop the container rather than have it do stuff without being able to log it.

@bwhaley
bwhaley commented Aug 20, 2014

@jdef in @shykes comment he says phase 2 of a logging subsystem will handle syslog output via the same mechanism as stderr and stdout. So I see no harm in using -v /dev/log:/dev/log until then.

@jdef
Contributor
jdef commented Aug 20, 2014

@bwhaley that's the plan. Though it can complicate using docker with various orchestration tools.


@crosbymichael
Member

I guess one of the last questions is if this should be per container or a daemon level option. @shykes what do you think?

@shykes
Contributor
shykes commented Aug 27, 2014

I would say we can start daemon-wide, and then add per-container later?

@crosbymichael
Member

Yes, I would feel more comfortable defining these types of settings on the daemon, at least initially until we can see how they are being used.

@zjeraar
zjeraar commented Aug 29, 2014

I really like where this is going, totally agree with @shykes let's first do this daemon-wide. I'd say per-container would be a nice-to-have for now.

For the syslog driver, I'd really like to see the container name in the tag field.
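A minimal sketch of what "container name in the tag field" could look like, using Go's standard `log/syslog` package (Unix only). The `docker/` prefix and the `syslogTag` helper are hypothetical choices for illustration, not something the proposal defines:

```go
package main

import (
	"fmt"
	"log/syslog"
	"os"
	"strings"
)

// syslogTag builds a syslog tag from a container name.
// The "docker/" prefix is an assumption made for this sketch.
func syslogTag(containerName string) string {
	// Container names are sometimes written with a leading slash; drop it.
	return "docker/" + strings.TrimPrefix(containerName, "/")
}

func main() {
	tag := syslogTag("/web")

	// syslog.New connects to the local syslog daemon and uses tag for every message.
	w, err := syslog.New(syslog.LOG_DAEMON|syslog.LOG_INFO, tag)
	if err != nil {
		// No local syslog daemon is reachable (for example inside a bare container).
		fmt.Fprintln(os.Stderr, "syslog unavailable:", err)
		return
	}
	defer w.Close()

	if err := w.Info("container started"); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
	}
}
```

With a tag like this, host-side syslog rules could filter or route on the program name without parsing the message body.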

@randywallace

IMVHO I think a syslog plugin for docker is completely pointless. Syslog daemons that run on practically every distribution already support everything that has been discussed here. This comment outlines precisely why I feel this way. This is not meant as a flame, but as an alternative discussion that relieves some pressure off the docker devs and puts more responsibility on the Dockerfile maintainer where in this case I feel (again, IMVHO) it belongs.

I am not saying that the logging doesn't need some work; the piece about handling/rotating the stderr/stdout of the container itself is incredibly useful b/c for long-running containers pushing a lot of logs to those pipes results in the issues previously described regarding disk usage. This will at some point need to be solved, though, to cover the bevy of trusted builds that currently send everything to stderr/stdout.

configuring syslog output within the container

I find that the following options work beautifully (these should obviously be expanded):

  • Most apps themselves provide syslog output via configuration. If one doesn't, it should (and it probably can be set up but just isn't documented very well). This is especially true for Java apps via slf4j, logback, log4j, etc. Dockerfiles should modify/ADD the correct syslog daemon configuration endpoints. My example is for elasticsearch's logging config (this is for @LK4D4), usually found in config/logging.yaml. The conversionPattern could be mangled by a startup script via sed, etc. to throw in the container id, hostname, or whatever you want (instead of elasticsearch:, or perhaps nothing if you are fine with just the hostname showing up in the syslog). Here is the relevant snippet (I didn't include the default console appender and level in this example):
rootLogger: INFO, syslog
appender:
  syslog:
    type: syslog
    header: true
    syslogHost: <THE_HOST_SYSLOG_DAEMON>
    Facility: USER
    layout:
      type: pattern
      conversionPattern: "elasticsearch: [%p] %t/%c{1} - %m"
  • A wrapper startup script that exec's out everything to logger. For non-daemonized processes running in a wrapper, this just magically works and handles stderr/stdout appropriately (written as a boilerplate that can be modified to run in a sourced file easily)
#!/bin/bash

if host syslog > /dev/null 2>&1; then HOST_SYSLOG_DAEMON=$(host syslog | head -n 1 | cut -f 4); fi

_enable_syslog=${SYSLOG:-true}
_host_syslog_daemon="${HOST_SYSLOG_DAEMON:-172.17.42.1}" # perhaps loaded by --env/-e; ${VAR:-DEFAULT} notation sets default if ENV variable does not exist
_unique_proc_name="randywallace/test_syslog_image"
_facility='local0' # or local1 thru local7, cron, user, etc...
_syslog_and_stdout_stderr=${TEE_OUTPUT:-false} # true/false; also could be a --env

# docker run -e TEE_OUTPUT=true -e HOST_SYSLOG_DAEMON=1.2.3.4 -d my_image
# or
# docker run --link my-rsyslog-container:syslog -d my_image
# or disabled completely
# docker run -e SYSLOG=false -d my_image

__logger() {
  local LEVEL=${1:-info}
  sed -u -r -e 's/\\n/ /g' -e 's/\s\-{3,}/;/g' -e 's/\-{3,}\s//g' |\
  /usr/bin/logger -p ${_facility}.${LEVEL}  -t "${_unique_proc_name}[$$]" -n "${_host_syslog_daemon}"
}

run_logger() {
  if $_enable_syslog; then
    if $_syslog_and_stdout_stderr; then
      tee -a >(__logger $1)
    else
      __logger $1
    fi
  else
    if [ "$1" = "err" ]; then
      cat >&2
    else
      cat
    fi
  fi
}

# Catch all STDOUT and STDERR traffic
exec > >(run_logger info) 2> >(run_logger err)

log() { echo "INFO: $*" | run_logger ; }
error() { echo "ERROR: $*" | run_logger err ; }
critical() { echo "EMERGENCY: $*" | run_logger emerg; exit 1 ; }
alert() { echo "ALERT: $*" | run_logger alert ; }
notice() { echo "NOTICE: $*" | run_logger notice ; }
debug() { echo "DEBUG: $*" | run_logger debug ; }
warning() { echo "WARN: $*" | run_logger warn ; }

log "info"
error "error"
alert "alert"
notice "notice"
debug "debug"
warning "warning"

echo "STDOUT output"
echo "STDERR output" >&2

critical "critical... exiting"

identifying the host to receive syslog traffic

  • Use an ENV setting in the Dockerfile (see wrapper example above) to indicate your preferred default syslog host. Or, for public Dockerfiles, use a default config (SYSLOG=false in the wrapper above) that is caught at startup to disable syslog output.
  • Use a docker container with a volume on /var/log to the host (perhaps on /var/log/docker/syslog/ at the host) and a syslog daemon (I use rsyslog personally). Then EXPOSE the syslog port (514), link that container to your other containers, and specify that link alias in your wrapper (no need to specify the dynamic IP b/c it shows up in /etc/hosts; an example is given in the wrapper that uses 'syslog' for the link alias).
  • Use the actual host daemon, if there is one (boot2docker does not have a syslog daemon, so I use a container and volume). This defaults to 172.17.42.1 unless docker is configured differently, but I don't ever need to change that so I set this IP statically. It would be nice if the docker0 Bridge Gateway IP was configured in /etc/hosts on the containers so that I could specify that in cases in which I may need to change the bridge subnet or something. It may already be there, food for thought.

Profit

  • The hostname of the container shows up in the syslog in all cases. Why not set this when you run the container to something useful? If you're forwarding logs from syslog to logstash/splunk/etc... The IP of the forwarding syslog server will show up, so you can always identify where container X came from.
  • The syslog daemon does not have to exist on the same host as the container. Why fight with tailing /var/log/syslog on 10 docker hosts if you could do it on one?
  • You can use syslog daemon configs to do whatever you want with that stuff getting thrown at it to include

Conclusion

Logging is not a new problem, and I seriously doubt that docker could create enough plugins, command line options, etc. to satisfy everybody. This is why rsyslog, syslog-ng, syslogd, papertrail, logstash, graylog2, splunk, fluentd, etc. exist. We've already seen this battle start here, and I don't want to be around when the smoke clears. I hope what I've said here, though, may help some of you to come up with your own solutions that could be working today!

And, if you have problems with the container's logs getting too full (those that are generated from stderr/stdout), don't send them there at all and use my example wrapper above to get rid of that problem completely!

@kuon
Contributor
kuon commented Aug 30, 2014

For what it's worth, I am now using systemd to launch containers and forward logs to syslog, along with logrotate in copytruncate mode. This works fine.

I am not arguing one way or the other, just saying that this setup works today and gives per-container configuration options.

@LK4D4
Contributor
LK4D4 commented Aug 30, 2014

@kuon This is a good approach, but Docker's internal mechanism for writing logs to stdout is not perfect. If you write long lines or have a high volume of logs, you will get huge memory and CPU overhead just from writing logs to the container's stdout. So having native syslog support would be great anyway.

@cpuguy83
Contributor

@randywallace The point is to do something with the collected logs that we already have in Docker. Instead of forcing people to implement something on their own, Docker can provide the facility to do it without having to hack stuff around (like running a syslog daemon inside your container).

@ilowe
ilowe commented Sep 9, 2014

I would like to echo the sentiment of @brianm and @zepouet and maybe suggest that there are really two discussions here:

The first one is, frankly, up to @crosbymichael and co. and it concerns how Docker handles an issue with the current implementation of logging. I think the guys are trying hard to come up with forward-facing solutions that will provide paths to new features, and I applaud that effort.

The second discussion, however, is being alluded to in previous comments; that is: is it appropriate for Docker to dictate a "logging framework"? No matter how many drivers "we" add, no matter how many options and config files, the tacit assumption becomes "everybody does logging like (or a subset of how) Docker does it".

The UNIX way is to do one thing and do it well. In the case of logging, this means let syslog do the work; for rotating logs we have logrotate, etc.

I think it would be better to have more intelligent ways to handle mounting and cross-mnt namespace access so that solutions described above like mounting /dev/log actually work. In the real world, I can't just drop all my YetAnotherSyslog code because I want to containerize something and I don't want to be a second-class citizen just because of that.

@randywallace provides an example of how easy it is to set up logging already using existing tools. I just think we're not thinking outside the box on how to provide a generally useful solution that also handles this case.

All of this is without mentioning the performance issues in containers at high load if the Docker daemon has to handle each packet. In high traffic situations, this is an absolute non-starter. We need to have access to kernel primitives at this point and a multi-layered userspace logging solution is going to force me to disable it and cobble my own each time.

As I said above, the more immediate decision is important for handling issues with log management. Of course, that should be solved. But I would hate to feel that Docker as an organization was wasting money and time building stuff I already have. The featureset so far is so out-of-the-box (both in terms of innovation and usability) that I really hate to see such a mundane and already-solved concern become the responsibility of the docker daemon.

To head off and forestall any other comments to the effect that "wouldn't it be nice if docker managed your logfiles" let me say that yes, it would. I would also like a built-in webserver so that I can launch my Node.js apps. I just don't feel that beyond the scope of improving the current stdout/stderr system this path leads anywhere but to having a whole group of people disabling docker log management and bending over backwards to use something else.

Of course, this all should be taken in the context of an assumption that the goal is to have advanced container technology that does its thing and otherwise lets you do what you want (i.e. maintaining neutrality). If docker is becoming a bit more of an "app hosting platform" where you can fully customize the inside and all the plumbing is handled for you, then this is definitely the way to go.

In case code speaks louder than words, I'm willing to work on a PR with a sketch of something if I get at least two people who will read it.

@frank-dspeed

I reviewed it all. I think the only thing that needs to happen is that the log files become rotatable; everything else will probably break my existing setups. If I need logs from a process, I can gather them via the Docker host at the OS level with existing tools; I don't need Docker to handle that.

@discordianfish
Contributor

My 2 cents: It should be easy to get some default logging (i.e. something without that huge memory and CPU overhead) and possible (without unnecessary overhead) to integrate your own logging as @randywallace suggested, all of that as lean as possible. Therefore I wouldn't try to interpret the log stream, and would just implement the bare minimum features for the default logging (truncate, rotate and maybe some 'tailing' to get only the last x lines).

@kuon
Contributor
kuon commented Sep 15, 2014

@frank-dspeed You can rotate docker logs with logrotate using copytruncate, see #7333

@tve
tve commented Sep 25, 2014

After reading all the comments I still feel that docker needs to do something to simplify logging. Some comments mentioned using -v /dev/log:/dev/log, but that apparently doesn't really work: if syslog is restarted it creates a fresh /dev/log, which breaks the link and leaves all running containers logging into a dead pipe. One can work around that by moving /dev/log to a directory of its own, such as /tmp/syslog/log and doing -v /mnt/syslog:/dev (suggested in http://jpetazzo.github.io/2014/08/24/syslog-docker/), but now all containers share /dev.

Unless I'm missing something, the suggestions made by @randywallace don't help me at all: many apps don't have the capability to log to a remote syslog; they expect a local syslog device.

@mhart
mhart commented Oct 8, 2014

Any news on this?

What @tve and others have said is very important for anyone using -v /dev/log:/dev/log – the link does indeed get broken if syslog restarts:

$ docker run -d -v /dev/log:/dev/log ubuntu sh -c 'while true; do logger hello; sleep 5; done'

$ tail -f /var/log/syslog
2014-10-08T04:04:17.009793+00:00 notice logger: hello
2014-10-08T04:04:22.014052+00:00 notice logger: hello
2014-10-08T04:04:27.018377+00:00 notice logger: hello
^C

$ sudo restart rsyslog

$ tail -f /var/log/syslog
... no new logs from docker container

Which makes this a very brittle solution...

@afolarin

Is there a reason I shouldn't just do the following (as of v1.3) if I want to inspect logs container by container, log file by log file?

$ docker exec -it my-container cat /path/to/my.log
@wking wking referenced this issue in docker/docker-registry Oct 23, 2014
Closed

NG: Logging #635

@randywallace

@mhart You shouldn't need to mount the log device. If your syslog daemon on the host is listening on the gateway of the docker bridge (172.17.42.1 by default), this should work just fine, even across host syslog daemon restarts:

docker run -d --name logger_test ubuntu:raring /bin/bash -c 'while true; do /usr/bin/logger -n $(grep default < <(ip route) | grep -Eo "([0-9]{1,3}[\.]){3}[0-9]{1,3}") -p user.info -P 514 -i -t logger_test -u /tmp/unused hello; sleep 1; done'
@lennartkoopmann

Graylog2 developer here. I don't have much practical experience with Docker yet, so I can't go much into Docker configuration specifics, but I thought I'd join and leave a few comments based on my experience from building a logging system over the last 4-5 years:

Keep it simple and hand off the logging to, for example, the local syslog subsystem as soon as possible. Tools like rsyslog or syslog-ng have spent enormous amounts of time to let the user flexibly configure stuff. Simple tasks like choosing the facility are easy to build, but you can spend a lot of time implementing different TCP syslog framing methods, for example. Do not try to build anything yourself that the popular syslog daemons are doing already. They should be available on basically every platform that Docker runs on.

If you want to ship Docker with log management capabilities like basic search, archiving or live tailing then use a log management system that already exists. Graylog2 for example has REST APIs that can be built upon while all the data management is abstracted. Even implementing something like log rotation yourself can go wrong in many different ways and cause OS compatibility nightmares.

You will also want to think about avoiding a Docker log silo that only contains Docker logs. You need to have all your logs (network hardware, OS, applications) in one place for proper correlation.

Key=Value pairs are a good way to structure data. There is also GELF, which Graylog2, Logstash, fluentd and nxlog speak. Structured syslog as defined in RFC5424 is probably the most compliant approach but could cause issues with maximum message length.
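As a small illustration of the Key=Value idea, a sketch in Go that renders a map of fields (like the `Fields map[string]string` in the proposed `Message` struct) as one line. The `formatKV` helper and the field names are hypothetical; this is neither GELF nor RFC 5424 structured data:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatKV renders fields as space-separated key="value" pairs.
// Keys are sorted so output is deterministic; values are always quoted
// so that spaces and quotes inside a value cannot break parsing.
func formatKV(fields map[string]string) string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+strconv.Quote(fields[k]))
	}
	return strings.Join(parts, " ")
}

func main() {
	// Field names here are made up for the example.
	fmt.Println(formatKV(map[string]string{
		"container_id": "abc123",
		"source":       "stdout",
		"msg":          "hello world",
	}))
}
```

Quoting every value costs a few bytes per field but keeps the format trivially parseable downstream.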

Just my suggestions. :)

@maximkulkin
Contributor

Hey guys, any volunteers to try #9513 patch and provide some feedback?

@crosbymichael
Member

I think @LK4D4 said my current proposal here is a little too complex and he is looking into something much simpler.

@LK4D4
Contributor
LK4D4 commented Jan 30, 2015

@crosbymichael Not very much simpler, but yes :) I'll prepare "proposal with code"

This was referenced Feb 4, 2015
@jessfraz jessfraz added the feature label Feb 26, 2015
@icecrime
Member

I think this is closed by #10568.

@icecrime icecrime closed this Mar 31, 2015
@varshneyjayant

Can we configure it to send application logs to the host machine's rsyslog?

@oncletom

@psquickitjayant by using --log-driver=syslog :-) cf. https://docs.docker.com/reference/run/#logging-drivers-log-driver

@varshneyjayant

@oncletom Thanks for the information. As I understand it, this sends stdout/stderr logs to the host syslog. Is it possible to send logs from applications running inside the container, like Apache, cron, Nginx, etc.?
