This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

A working docker container stopped working after upgrade to 1465.6.0 #2121

Closed
eunomie opened this issue Aug 25, 2017 · 4 comments

Comments

@eunomie

eunomie commented Aug 25, 2017

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"      
ID=coreos                             
VERSION=1465.6.0                      
VERSION_ID=1465.6.0                   
BUILD_ID=2017-08-16-0012              
PRETTY_NAME="Container Linux by CoreOS 1465.6.0 (Ladybug)"                   
ANSI_COLOR="38;5;75"                  
HOME_URL="https://coreos.com/"        
BUG_REPORT_URL="https://issues.coreos.com"                                   
COREOS_BOARD="amd64-usr" 

Environment

Amazon AWS EC2, reproduced on vagrant with virtualbox

Expected Behavior

I use a docker container, journalbeat, to send logs to an ELK stack.
I have used the same container for months without any problem.
The container reads journald logs (using go-systemd) and sends them to Logstash.
Logs should be sent to Logstash without corruption.
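
For reference, the read loop is roughly the following. This is a minimal sketch assuming the github.com/coreos/go-systemd/sdjournal package that journalbeat builds on; the Logstash forwarding step is replaced by a print and error handling is simplified.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/coreos/go-systemd/sdjournal"
)

func main() {
	// Open the local journal; in a container this requires the host's
	// journal directory (e.g. /var/log/journal) to be mounted in.
	j, err := sdjournal.NewJournal()
	if err != nil {
		log.Fatalf("open journal: %v", err)
	}
	defer j.Close()

	for {
		// Advance to the next entry; 0 means there is nothing new yet.
		n, err := j.Next()
		if err != nil {
			log.Fatalf("next entry: %v", err)
		}
		if n == 0 {
			// Block briefly until journald appends new entries.
			j.Wait(time.Second)
			continue
		}

		entry, err := j.GetEntry()
		if err != nil {
			log.Fatalf("get entry: %v", err)
		}

		// journalbeat would serialize the fields and ship them to
		// Logstash here; printing MESSAGE stands in for that step.
		fmt.Println(entry.Fields["MESSAGE"])
	}
}
```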

Actual Behavior

Since 1465.6, either no logs or corrupted logs are sent.
Corrupted logs appear when we upgrade a CoreOS host from 1406.9 to 1465.6, and a little after that no logs are sent at all.

The journalbeat binary by itself works as expected. In a docker image it doesn't, depending on the base image.

| CoreOS version | Docker base image | Result |
| --- | --- | --- |
| 1406.9 | debian:jessie | ✅ |
| 1406.9 upgraded to 1465.6 | debian:jessie | ❌ (log corruption) |
| 1465.6 | debian:jessie | ❌ (no logs) |
| 1406.9 upgraded to 1465.6 | debian:stretch-slim | ✅ |
| 1465.6 | debian:stretch-slim | ✅ |

Reproduction Steps

If needed, I can publish a Vagrant setup with units that shows the problem in this particular case.

Other Information

I don't know if it's really a CoreOS issue.
Running debian:stretch as the base image does the job, and an issue has been filed in the journalbeat project: mheese/journalbeat#106

But I find it strange that a container which worked for months doesn't work anymore. It may be related to the upgrade of one component in CoreOS, and other containers might also stop working with the latest stable.

If you need any other information, just ask; I have a running setup that I can use to reproduce the problem.

@euank
Contributor

euank commented Aug 25, 2017

I believe this is a result of this change: coreos/coreos-overlay#2593 (comment)

We enabled the LZ4 flag for journald starting from Container Linux version 1437.0.0.

The systemd binary and libraries included in a debian:jessie image do not include the LZ4 configuration option, so they're unable to read the lz4-compressed logs. systemd in debian:stretch-slim does have LZ4 support (as determined by systemctl --version's list of features).

Since journald doesn't have an option to read lz4 but write another format (i.e. there's no runtime option to pick lz4 vs. xz), I don't think there's any way to fix this other than ensuring that whatever containers parse the journal support lz4 compression.
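
A quick way to verify this from inside a given base image is to inspect the compile-time feature list mentioned above; the check itself is just systemctl --version. A minimal sketch in Go, assuming systemctl is available in the image (this helper is hypothetical, not part of journalbeat):

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"strings"
)

func main() {
	// systemctl --version prints the systemd version followed by its
	// compile-time feature flags; "+LZ4" means lz4 support was built in.
	out, err := exec.Command("systemctl", "--version").CombinedOutput()
	if err != nil {
		log.Fatalf("systemctl --version: %v", err)
	}
	if strings.Contains(string(out), "+LZ4") {
		fmt.Println("libsystemd was built with LZ4 support")
	} else {
		fmt.Println("no LZ4 support: lz4-compressed journal entries cannot be read")
	}
}
```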

@lucab

lucab commented Aug 25, 2017

For reference, go-systemd sdjournal relies on libsystemd.so via dlopen, so the consuming application picks up whatever the base image provides. I also believe this is due to the recently introduced lz4 support.

@eunomie
Author

eunomie commented Aug 25, 2017

@euank Thanks for the explanation, this is very interesting.

And this explains why there's a difference between a 1406.9.0 upgraded to 1465.6.0 and a fresh 1465.6.0 install.

@bgilbert
Contributor

At this point we're unlikely to disable LZ4 support in the Container Linux journald, so I'll close.
