error creating overlay mount to .../merged: invalid argument. #2340

Closed
taadis opened this Issue Feb 1, 2018 · 10 comments

taadis commented Feb 1, 2018

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1576.5.0
VERSION_ID=1576.5.0
BUILD_ID=2018-01-05-1121
PRETTY_NAME="Container Linux by CoreOS 1576.5.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

Alibaba Cloud ECS

Expected Behavior

Actual Behavior

Reproduction Steps

docker pull hello-world
docker run -it hello-world
/run/torcx/bin/docker: Error response from daemon: error creating overlay mount to /var/lib/docker/overlay/dd60d72e130f87883312069f54ae40d886ce40e3a589ad5a2b5c44020f9c8e1d/merged: invalid argument.
See '/run/torcx/bin/docker run --help'.

Other Information

Member

lucab commented Feb 1, 2018

Thanks for the report. I can't reproduce this on stable, and our CI didn't seem to experience this either.

It would be great if you could add additional details on your environment. What is the content of /var/lib/update_engine/prefs/aleph-version and what does docker info say?

taadis commented Feb 5, 2018

I reset the system...

# cat /var/lib/update_engine/prefs/aleph-version
1465.8.0

and

# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.09.1-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.16-coreos
Operating System: Container Linux by CoreOS 1632.2.1 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 989.9MiB
Name: iZuf6a8pezooag036yip2wZ
ID: JC5E:7OUR:T224:IAKF:JOA5:6ESW:DJYK:5QTH:MIBM:HZGU:RGIV:IY3H
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Contributor

euank commented Feb 5, 2018

Since we don't actually publish images for Alibaba cloud, it's possible the issue is in their image or configuration.

If you're able to reproduce this issue on a different platform or figure out more information about the error, that would be helpful.

Possible steps to get more information about the error could be:

  1. Check docker's log output (set the loglevel to debug, look for any other output related to that mount or other errors)
  2. strace the containerd process, including spawned children, and find the set of calls related to that mountpoint and what arguments + return code the failed call had.
  3. Add additional debug code to the docker daemon
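
Steps 1 and 2 above could look roughly like the sketch below. It assumes the docker unit is `docker.service`, that `strace` and `pidof` are available on the host, and that the process to trace is passed in by name. The helper names and the default log path are hypothetical, not part of any tool.

```shell
#!/bin/sh
# Sketch only: helper names and the log path are illustrative assumptions.

# Step 1: follow the docker unit's journal while reproducing the failure.
# (Debug logging can be enabled with {"debug": true} in /etc/docker/daemon.json.)
follow_docker_log() {
  journalctl -u docker.service -f
}

# Step 2: attach strace to a running process (and the children it forks),
# recording only mount(2) calls. Needs root.
trace_mounts() {
  proc_name="$1"
  out_file="${2:-/tmp/mounts.strace}"
  sudo strace -f -e trace=mount -o "$out_file" -p "$(pidof -s "$proc_name")"
}

# Print the mount(2) calls in an strace log that returned an error,
# e.g. "... = -1 EINVAL (Invalid argument)".
failed_mounts() {
  grep -E 'mount\(.*\) += -1 E[A-Z]+' "$1"
}
```

A possible workflow: run `trace_mounts dockerd` in one terminal, reproduce the failure with `docker run -it hello-world` in another, stop strace, then run `failed_mounts /tmp/mounts.strace` to see the arguments of the failing call.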

ill55666 commented Mar 9, 2018

Same issue here.

@taadis Did you find a solution?

A friend of mine chatted with Aliyun customer service (we are using their public CoreOS image):

[screenshot, 2018-03-09 10:01:11]

Summary:

The image is not owned by them.
You need to debug it yourself. 😭

ill55666 commented Mar 9, 2018

I finally figured out how to fix this.
The cause seems to be that Aliyun ECS may not support the overlay storage driver,
so you need to change it to devicemapper manually.

see moby/moby#23930 (comment)

Full script:

# fix wrong driver
echo '{ "storage-driver": "devicemapper" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker.service

# fix aliyun buggy selinux
sudo sed -i 's/SELINUXTYPE=mcs/SELINUXTYPE=targeted/' /etc/selinux/config
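
To check that the daemon actually picked up the new driver after the restart, something like the following sketch may help. It parses the plain-text `docker info` output shown earlier in this thread; `storage_driver_of` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: storage_driver_of is a hypothetical helper name.

# Extract the "Storage Driver" value from plain `docker info` output on stdin.
storage_driver_of() {
  awk -F': ' '/^Storage Driver:/ { print $2; exit }'
}

# Usage after restarting the daemon (requires a running docker):
#   docker info 2>/dev/null | storage_driver_of
```

If the change took effect, this should print `devicemapper` rather than `overlay`.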

ill55666 commented Mar 9, 2018

BTW, the engineer just replied with NO solution and suggested that we use their container service instead...
😠

taadis commented Mar 9, 2018

@ill55666 Same feeling here.

taadis commented Mar 9, 2018

@ill55666 Here is the solution one Alibaba engineer gave me:

vi /run/systemd/system/docker.service
Comment out this line: #Environment=DOCKER_SELINUX=--selinux-enabled=true
Then reload the unit files:
systemctl daemon-reload
Restart docker and you're done:
systemctl restart docker

What a solution...
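
The manual `vi` edit quoted above could be scripted roughly as follows. This is only a sketch of those steps, not an endorsement of them; `disable_selinux_env` is a hypothetical helper name, and it assumes GNU `sed`.

```shell
#!/bin/sh
# Sketch only: disable_selinux_env is a hypothetical helper name.

# Comment out the DOCKER_SELINUX environment line in a systemd unit file.
# Lines that are already commented are left untouched. Uses GNU sed's -i.
disable_selinux_env() {
  sed -i 's/^Environment=DOCKER_SELINUX=/#&/' "$1"
}

# Usage on the host (needs root), following the quoted steps:
#   disable_selinux_env /run/systemd/system/docker.service
#   systemctl daemon-reload
#   systemctl restart docker
```

Files under `/run` are typically not persistent across reboots, so this change would presumably have to be reapplied after each boot.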

Contributor

euank commented Mar 9, 2018

Changing the overlay driver or selinux config shouldn't be necessary. It sounds like the Container Linux image being used might be broken, or the instance's storage might be misbehaving.

Unfortunately, since we don't publish or support those images, I don't think there's much we can do on our end to fix these problems.

If you do run into this issue on another platform or find more information about why things are broken (e.g. from steps in my previous comment), let us know and we can look again.

Best,
Euan

euank closed this Mar 9, 2018

congjie commented Mar 15, 2018

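One workaround that worked for me: edit /etc/selinux/config and change SELINUX=disable to SELINUX=permissive.

CoreOS on Alibaba Cloud has always had this problem; I haven't looked into the SELinux details yet.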
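
Before and after editing the file, the configured values can be inspected with a small helper like the sketch below. `selinux_setting` is a hypothetical name, and the default path is the `/etc/selinux/config` file mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: selinux_setting is a hypothetical helper name.

# Print the value of a KEY=value setting (e.g. SELINUX or SELINUXTYPE)
# from an SELinux config file, ignoring commented lines.
selinux_setting() {
  key="$1"
  file="${2:-/etc/selinux/config}"
  sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Usage on the host:
#   selinux_setting SELINUX
#   selinux_setting SELINUXTYPE
```

This only shows what the file requests; where the `getenforce` command is available, it reports the mode the running system is actually in.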
