This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

A working docker container stopped working after upgrade to 1465.6.0 #2121

Closed
eunomie opened this issue Aug 25, 2017 · 4 comments

Comments

@eunomie

eunomie commented Aug 25, 2017

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"      
ID=coreos                             
VERSION=1465.6.0                      
VERSION_ID=1465.6.0                   
BUILD_ID=2017-08-16-0012              
PRETTY_NAME="Container Linux by CoreOS 1465.6.0 (Ladybug)"                   
ANSI_COLOR="38;5;75"                  
HOME_URL="https://coreos.com/"        
BUG_REPORT_URL="https://issues.coreos.com"                                   
COREOS_BOARD="amd64-usr" 

Environment

Amazon AWS EC2, reproduced on vagrant with virtualbox

Expected Behavior

I use a docker container, journalbeat, to send logs to an ELK stack.
I have used the same container for months without any problem.
The container reads journald logs (using go-systemd) and sends them to Logstash.
Logs should be sent to Logstash without corruption.
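
For reference, the read loop is roughly the following. This is a minimal sketch assuming the github.com/coreos/go-systemd/sdjournal package that journalbeat builds on; the Logstash forwarding step is replaced by a print and error handling is simplified.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/coreos/go-systemd/sdjournal"
)

func main() {
	// Open the local journal; in a container this requires the host's
	// journal directory (e.g. /var/log/journal) to be mounted in.
	j, err := sdjournal.NewJournal()
	if err != nil {
		log.Fatalf("open journal: %v", err)
	}
	defer j.Close()

	for {
		// Advance to the next entry; 0 means there is nothing new yet.
		n, err := j.Next()
		if err != nil {
			log.Fatalf("next entry: %v", err)
		}
		if n == 0 {
			// Block briefly until journald appends new entries.
			j.Wait(time.Second)
			continue
		}

		entry, err := j.GetEntry()
		if err != nil {
			log.Fatalf("get entry: %v", err)
		}

		// journalbeat would serialize the fields and ship them to
		// Logstash here; printing MESSAGE stands in for that step.
		fmt.Println(entry.Fields["MESSAGE"])
	}
}
```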

Actual Behavior

Since 1465.6, either no logs or corrupted logs are sent.
Corrupted logs appear when we upgrade a CoreOS host from 1406.9 to 1465.6, and a little after that no logs are sent at all.

The journalbeat binary by itself works as expected. In a docker image it doesn't, depending on the base image.

| CoreOS version | Docker base image | Result |
| --- | --- | --- |
| 1406.9 | debian:jessie | ✅ |
| 1406.9 upgraded to 1465.6 | debian:jessie | ❌ (log corruption) |
| 1465.6 | debian:jessie | ❌ (no logs) |
| 1406.9 upgraded to 1465.6 | debian:stretch-slim | ✅ |
| 1465.6 | debian:stretch-slim | ✅ |

Reproduction Steps

If needed, I can publish a Vagrant setup with units that shows the problem in this particular case.

Other Information

I don't know if it's really a CoreOS issue.
Running debian:stretch as the base image does the job, and an issue has been filed in the journalbeat project: mheese/journalbeat#106

But I find it strange that a container which worked for months doesn't work anymore. It may be related to the upgrade of one component in CoreOS, and other containers might also stop working with the latest stable.

If you need any other information, just ask; I have a running setup that I can use to reproduce the problem.

@euank
Contributor

euank commented Aug 25, 2017

I believe this is a result of this change: coreos/coreos-overlay#2593 (comment)

We enabled the LZ4 flag for journald starting from Container Linux version 1437.0.0.

The systemd binary and libraries included in a debian:jessie image do not include the LZ4 configuration option, so they're unable to read the lz4-compressed logs. systemd in debian:stretch-slim does have LZ4 support (as determined by systemctl --version's list of features).

Since journald doesn't have an option to read lz4 but write another format (i.e. there's no runtime option to pick lz4 vs. xz), I don't think there's any way to fix this other than ensuring that whatever containers parse the journal support lz4 compression.
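
A quick way to verify this from inside a given base image is to inspect the compile-time feature list mentioned above; the check itself is just systemctl --version. A minimal sketch in Go, assuming systemctl is available in the image (this helper is hypothetical, not part of journalbeat):

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"strings"
)

func main() {
	// systemctl --version prints the systemd version followed by its
	// compile-time feature flags; "+LZ4" means lz4 support was built in.
	out, err := exec.Command("systemctl", "--version").CombinedOutput()
	if err != nil {
		log.Fatalf("systemctl --version: %v", err)
	}
	if strings.Contains(string(out), "+LZ4") {
		fmt.Println("libsystemd was built with LZ4 support")
	} else {
		fmt.Println("no LZ4 support: lz4-compressed journal entries cannot be read")
	}
}
```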

@lucab

lucab commented Aug 25, 2017

For reference, go-systemd sdjournal relies on libsystemd.so via dlopen, so the consuming application picks up whatever the base image provides. I also believe this is due to the recently introduced lz4 support.

@eunomie
Author

eunomie commented Aug 25, 2017

@euank Thanks for the explanation, this is very interesting.

And this explains why there's a difference between a 1406.9.0 upgraded to 1465.6.0 and a fresh 1465.6.0 install.

@bgilbert
Contributor

At this point we're unlikely to disable LZ4 support in the Container Linux journald, so I'll close.
