Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement proposed k8s-stream-file format #265

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

portante
Copy link

@portante portante commented May 31, 2021

Instead of parsing the contents of the data read from the stdout and stderr pipes, this commit adds support for a "stream" format, named k8s-stream-file, which just records what is read from a pipe to disk.

It significantly saves on CPU spend processing the buffer read, uses only 2 I/O vectors, and never touches the memory read from the pipe.

This is a conceptual PR, meant to illustrate a proposed stream log file format that removes the byte level interpretation of stdout/stderr in favor of simply recording what data was read on each system call. It defers the interpretation of the byte stream to the consumer, allowing this writer to operate with as little overhead as possible (avoiding bad containers that write only newlines, or small numbers of bytes between newlines). The reader of the byte stream is then tasked with reassembling the stream according to whatever interpretation it sees fit to use.

The goal of this work is to provide a simple format that will stream well into an object store, such that, given enough metadata stored with the stream, the consumer can reconstruct the I/O stream at the time it is read.

This is an updated implementation of cri-o/cri-o#1605.

@portante
Copy link
Author

See also #262.

@lgtm-com
Copy link

lgtm-com bot commented May 31, 2021

This pull request introduces 1 alert when merging c0bc805 into 3161452 - view on LGTM.com

new alerts:

  • 1 for Use of potentially dangerous function

Copy link
Collaborator

@haircommander haircommander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the implementation @portante ! I have some fiirst-pass review comments, but if this is just a POC feel free to ignore them until the concept is proven.

you'll need to run make fmt and commit the changes to have the majority of CI checks run.

I am betting this will perpetually fail kubernetes e2e tests but I am curious to see what actually ends up happening.

src/ctr_logging.c Outdated Show resolved Hide resolved
src/ctr_logging.c Outdated Show resolved Hide resolved
/*
* PROPOSED: CRI Stream Format, variable length file format
*/
static int set_k8s_stream_timestamp(char *buf, ssize_t bufsiz, ssize_t *tsbuflen, const char *pipename, uint64_t offset, ssize_t buflen, ssize_t *btbw)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there seem to be some similarities between this and set_k8s_timestamp. can we either make the shared functionality a function or document why that's not possible?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not want to disturb any existing code with this to avoid introducing un-intended bugs.

If we decide that it is worth merging this, then we should consider combing those two methods into one.

Instead of parsing the contents of the data read from the `stdout` and
`stderr` pipes, this commit adds support for a "stream" format, named
`k8s-stream-file`, which just records what is read from a pipe to disk.

It significantly saves on CPU spend processing the buffer read, uses only
2 I/O vectors, and never touches the memory read from the pipe.

This is an updated implementation of
cri-o/cri-o#1605.
@portante
Copy link
Author

portante commented Jun 1, 2021

you'll need to run make fmt and commit the changes to have the majority of CI checks run.

Fixed.

Thanks for the review!

@rhatdan
Copy link
Member

rhatdan commented Jun 1, 2021

@giuseppe PTAL

off = -off;
}

len = snprintf(buf, bufsiz, "%d-%02d-%02dT%02d:%02d:%02d.%09ld%c%02d:%02d %s %lud %ld ", current_tm.tm_year + 1900,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(A drive-by comment with little context:)

If the goal is to offload the CPU processing to consumers, shouldn’t this just be a timestamp instead of all the timezone lookups and formatting?

Either way, UTC would be better than ambiguous local time — does this one need custom parser code to get a struct timespec back?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants