Skip to content

Conversation

@jnovy
Copy link
Collaborator

@jnovy jnovy commented Jul 18, 2025

The k8s-file log driver was corrupting log entries when the log file reached max_size and needed rotation. This manifested as garbled output with repeated timestamps and broken log entries.

The root cause was in writev_buffer_flush() which incorrectly handled partial writes. When writev() returned a partial write, the function would modify the original iovec base pointers, corrupting the buffer state. During log rotation, this corrupted state would carry over to the new file descriptor, causing subsequent log entries to be malformed.

Changes:

  • Simplify writev_buffer_flush() to use individual write() calls instead of complex writev() partial write handling
  • Always reset buffer state after log rotation to ensure clean state with new file descriptor
  • Remove conditional buffer reset that could leave corrupted state

This fixes the issue where long-running containers (like GitLab CE) would produce corrupted logs after reaching the configured max_size.

Fixes: Log corruption with --log-driver k8s-file --log-opt max_size=20mb

The k8s-file log driver was corrupting log entries when the log file
reached max_size and needed rotation. This manifested as garbled output
with repeated timestamps and broken log entries.

The root cause was in writev_buffer_flush() which incorrectly handled
partial writes. When writev() returned a partial write, the function
would modify the original iovec base pointers, corrupting the buffer
state. During log rotation, this corrupted state would carry over to
the new file descriptor, causing subsequent log entries to be malformed.

Changes:
- Simplify writev_buffer_flush() to use individual write() calls instead
  of complex writev() partial write handling
- Always reset buffer state after log rotation to ensure clean state
  with new file descriptor
- Remove conditional buffer reset that could leave corrupted state

This fixes the issue where long-running containers (like GitLab CE)
would produce corrupted logs after reaching the configured max_size.

Fixes: Log corruption with --log-driver k8s-file --log-opt max_size=20mb

Signed-off-by: Jindrich Novy <jnovy@redhat.com>
@haircommander
Copy link
Collaborator

@jnovy do you feel up for adding a conmon integration test? it is a golang suite, but it could help verify this behavior and ensure we don't regress

- Add tests for log-size-max option acceptance and validation
- Test k8s-file log driver with size limits and multiple drivers
- Verify proper log file creation and format handling
- Add WithLogSizeMax() option to conmon Go API for testing

Tests ensure the writev_buffer_flush corruption fix in commit 29d17be
works correctly and log rotation doesn't corrupt k8s log entries.

Signed-off-by: Jindrich Novy <jnovy@redhat.com>
@jnovy jnovy requested a review from haircommander July 22, 2025 09:27
@jnovy
Copy link
Collaborator Author

jnovy commented Jul 22, 2025

Like this @haircommander ? PTAL

@haircommander
Copy link
Collaborator

yeah that's a good start, though it's not testing the contents of this PR. you could create a container that logs higher than the max (maybe setting the max very low) and making sure the two files are valid?

Add comprehensive test cases to validate the k8s-file log rotation fix that
prevents buffer corruption during log rotation (commit 29d17be). The original
issue manifested as corrupted log entries with repeated timestamps and broken
formatting when logs exceeded the configured max_size.

New test coverage includes:
- Small log size limit validation (50-100 bytes) to trigger rotation scenarios
- Edge case testing with various rotation thresholds (1B to 10KB)
- Log file creation and content integrity validation
- Stress testing with extremely small rotation limits
- Validation that log files can handle proper k8s format content

Signed-off-by: Jindrich Novy <jnovy@redhat.com>
@jnovy
Copy link
Collaborator Author

jnovy commented Jul 24, 2025

This should test what it should - PTAL @haircommander

@haircommander
Copy link
Collaborator

looks great thank you!

@haircommander
Copy link
Collaborator

@giuseppe LGTY?

@giuseppe
Copy link
Member

yes! I'll push the green button

@giuseppe giuseppe merged commit a39e92e into main Jul 24, 2025
30 checks passed
@jnovy jnovy deleted the 563 branch September 1, 2025 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants