Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An Analysis of Conmon Container Logging Behaviors with Recommendations #262

Open
portante opened this issue May 27, 2021 · 1 comment
Open

Comments

@portante
Copy link

portante commented May 27, 2021

A detailed analysis of the container logging behavior in conmon as of tag v2.0.28.

Summary recommendations:

  • Match read size to actual pipe(7) size (64 KB by default on Linux)
    • This will reduce the number of partial log records emitted
  • Align read buffer on page boundary, don't use leading byte just for remote consoles
  • Consider reading from pipe(7) until EWOULDBLOCK
  • Raise IO vector count to 1,024 to reduce number of writev(2) system calls per buffer read(2)

An Analysis of Conmon Container Logging Behaviors

Peter Portante, May 2021
Based on the v2.0.28 (GitHub tag) code base

  1. conmon uses non-blocking pipes for reading for the stdout & stderr pipes of a container

    a. Pipes created at lines 163 [3] and 178 [4] of conmon.c using the pipe2() [5] system call, with the O_CLOEXEC flag

    • Pipes are 64 KB in size, the Linux default

    • Does Go change the size of the pipes dynamically?

    b. Registers stdio_cb(), line 17 [1] of ctr_stdio.c for poll() events

  2. In response to being invoked, stdio_cb() in turn calls read_stdio() [2], which reads from the pipe file descriptor using a STDIO_BUF_SIZE sized buffer (+2 as described at [7])

    a. Where STDIO_BUF_SIZE is 8 KB [6], for a total of 8,192 bytes used in memory

    b. [Flag] This means that 8 read system calls are required to drain the pipe

    • [Question] Does the Go integration change the pipe size to 8,192 bytes?

    • The read_stdio() only makes one read(2) system call

      • [Question] Why is only one read(2) system call made? Does that allow the polling loop to fairly round-robin between reading from stdout and stderr?

      • [Flag] We should consider having the read_stdio() continue reading until read(2) returns EWOULDBLOCK

    • [Flag] The STDIO_BUF_SIZE should be 64 KB to allow the most efficient use of system calls

    c. [Flag] This also has the effect that 8 KB reads increases the number of "partial" log records emitted by conmon, since the pipe can fill up while conmon is processing a read.

    d. The read buffer is a stack local variable in read_stdio(), but the target address used in the read(2) system call is moved by one byte [7], which means any reads returning >= 4 KB of data will result in 2 pages being used on the stack, with a maximum of 3 pages used

    • [Flag] This is only used for the remote consoles case [8], which means you could avoid the extra page use having write_back_to_remote_consoles() write the extra byte only when remote consoles are present

    d. When 1 or more bytes are returned by the read(2) system call in read_stdio():

    • write_to_logs() is invoked to write the logs to disk, which in turn calls write_journald() and write_k8s_log()

      • For now, we ignore write_journald() for the rest of this analysis

      • It is expected that the entire buffer returned by the read(2) system call is processed and written to disk

    • write_back_to_remote_consoles() is called to satisfy any remote consoles

      • For now, we ignore this case for the rest of this analysis
  3. Analysis of write_k8s_log()

    a. The approach taken by write_k8s_log() is to avoid memory copies of the data to be written to disk by using IO vectors with the writev(2) system call

    b. 128 vectors [9] are allocated [10] as a maximum to be used while processing any one buffer

    c. write_k8s_log() creates a time stamp once at the beginning and uses it as the time stamp value for all the logs it processes and writes

    d. A "log" is defined to be all the bytes from the previous "newline" character up to, and including, the next newline character

    • At the end of the buffer, if no newlines are found, a special marker is used to record that a "partial" log record was processed

    • The start of a buffer assumes the beginning of a new log entry

    e. When a new line is found, 3 vectors are used:

    1. time stamp buffer [12], 44 bytes [11]

    2. partial / full tag, 2 bytes [13]

    3. actual log contents, N bytes [14]

    f. A fourth vector is used when no "newline" characters are found in the remaining bytes of the buffer being processed

    g. For every "newline" character encountered, 46 bytes are added to the output stream

    h. 128 vectors can represent a maximum of 42 log entries (2 vectors unused)

    • For an 8,192 byte buffer, that is an average size of 195.048 bytes per line in order for one writev() system call to be used to write all bytes read to disk.

    • If the average line length is >= 196 bytes, fewer vectors are required.

    • If the average line length is < 196 bytes and >= 98 bytes, 1 additional writev() system calls will be required: 2 writev() calls per read().

    • If the average line length is < 98 bytes and >= 66 bytes, 2 additional writev() system calls will be required: 3 writev() calls per read().

    • If the average line length is < 66 bytes and >= 49 bytes, 3 additional writev() system calls will be required: 4 writev() calls per read().

    • Recommend 1,024 IO vectors to handle >= 25 byte average lines with 1 system call, and >= 13 byte average lines with 2 system calls

      • The IO Vector data structure consists of a 4 byte int and an array of 16 byte struct iovec entries, takikng up 4 pages of the stack
    • Really, we should be implementing an alternate format of writing what was read from the pipe directly to disk so that the data stream from either pipe does not impact the behavior of conmon, see WIP - Proposed Stream Log Writer cri-o/cri-o#1605

    • Here is a table how this works out as the average line length shrinks (byte ranges, and number of writev(2) system calls):

      < 49 >= 40   5
      < 40 >= 33   6
      < 33 >= 28   7
      < 28 >= 25   8
      < 25 >= 22   9
      < 22 >= 20  10
      < 20 >= 18  11
              17  12
              16  13
         15 & 14  14
              13  16
              12  17
              11  18
              10  20
               9  22
               8  25
               7  28
               6  33
               5  40
               4  49
               3  66
               2  98
               1 196
    

[1] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_stdio.c#L17
[2] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_stdio.c#L101
[3] https://github.com/containers/conmon/blob/v2.0.28/src/conmon.c#L163
[4] https://github.com/containers/conmon/blob/v2.0.28/src/conmon.c#L178
[5] https://man7.org/linux/man-pages/man2/pipe.2.html
[6] https://github.com/containers/conmon/blob/v2.0.28/src/config.h#L6
[7] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_stdio.c#L103
[8] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_stdio.c#L131
[9] https://github.com/containers/conmon/blob/v2.0.28/src/utils.h#L224
[10] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L60
[11] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L22
[12] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L330
[13] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L342
[14] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L349

@portante
Copy link
Author

I'll be updating the description to full mark down shortly.

portante added a commit to portante/conmon that referenced this issue May 31, 2021
Address a number of conditions flagged in issue containers#262.

The general thrust of this commit is to move closer to a one-to-one,
`read(2)` system call to `writev(2)` system call ratio by creating a
memory buffer for reading sized to match the stdio pipes.

The commit also allocates a maximum number of I/O vectors sized to the
read buffer size, calculating 2 I/O vectors per line, defaulting to a
target line size of 25 bytes (defined as the constant,
`AVG_LOG_LINE_TARGET`).

The commit also now allocates the I/O vectors and read buffer memory
using `mmap()` so that we don't cross page boundaries for I/O
operations.

As a result, the commit removes the `STDIO_BUF_SIZE` constance, using
`fcntl(F_GETPIPE_SZ)` calls to size all buffers for reading from pipes.

The `stdpipe_t` value is now passed to `write_to_remote_consoles()` so
that we do not have to play games with pre-/post- byte allocations to
buffers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant