Cloudstor EFS Volume not working in Docker CE for AWS 18.06.1-ce #177
Comments
I'm having the same issue with my current stack in AWS. It's been happening since the upgrade to 18.06.1-CE. |
Same issue here... for us, cloudstor seems to break when using |
Has anyone figured out how to fix this? It appears we're getting bitten by it now. @leostarcevic, how did you go about downgrading? |
@mateodelnorte I basically just rolled back to the 18.03 AMI-IDs. I've been saving previous releases in our repository, because Docker only provides the latest release AFAIK. Let me know if you need help |
Any update on solving this? |
They only link the latest template from the site, but all versions are in the bucket. The version that works for us is at https://editions-us-east-1.s3.amazonaws.com/aws/stable/18.03.0/Docker.tmpl; the only diff is a new condition.
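For anyone who wants to pin that older template, updating the CloudFormation stack to point at that URL is one way to do it. A rough sketch only; the stack name is a placeholder, and in practice you'd pass UsePreviousValue=true for each parameter your stack already has:
# Hypothetical sketch: re-point an existing Docker for AWS stack at the 18.03.0 template.
aws cloudformation update-stack \
  --stack-name my-docker-swarm \
  --template-url https://editions-us-east-1.s3.amazonaws.com/aws/stable/18.03.0/Docker.tmpl \
  --parameters ParameterKey=KeyName,UsePreviousValue=true \
  --capabilities CAPABILITY_IAM
|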
FYI, it still doesn't work with the latest version. |
Shame that they don't give a fuck enough to post even a simple response so we know where we stand. Really, like 6 months without anything? Time to switch to rexray, period. |
Has anyone found a solution to this yet? I mounted the host log directory inside a container and didn't see anything particularly meaningful (lots of timeouts). I'd really like not to be vulnerable to CVE-2019-5736... I thought this template was supposed to be "baked and tested..."
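For reference, the log inspection was along these lines (a sketch, not the exact commands; the /var/log/messages path on the Moby host is an assumption):
# Bind-mount the host's log directory read-only into a throwaway container.
docker run --rm -it -v /var/log:/hostlog:ro alpine sh
# Then, inside the container, look for NFS/hung-task noise:
grep -iE 'nfs|hung|timeout' /hostlog/messages | tail -n 50
|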
From the kernel logs: Mar 31 01:47:13 moby kernel: INFO: task portainer:4823 blocked for more than 120 seconds. |
Digging in a little further, I tried spinning up yet another brand-new stack with the "Encrypt EFS" option turned on. Still no love. Also, it looks like I can mount the EFS volume (and see/inspect its contents) on a manager node that isn't trying to run a container that requires access to the volume. Any such interaction from a manager node that is trying to run a container with that volume mapped hangs, and that container is completely unresponsive. So there doesn't appear to be anything wrong with EFS. Also, containers that don't rely on EFS work just fine. It seems like the plugin is at fault here. Does anyone know where, or if, the code for the plugin is available somewhere?
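For anyone who wants to repeat that check, mounting the stack's EFS file system directly on a node (bypassing cloudstor) looks roughly like this; the file system DNS name is a placeholder, and the options mirror AWS's usual NFS defaults:
# Mount the swarm's EFS file system by hand and inspect its contents.
sudo mkdir -p /mnt/efs-check
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs-check
ls -la /mnt/efs-check   # cloudstor volumes show up as subdirectories
sudo umount /mnt/efs-check
|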
Thanks for looking into this. I actually never got any data written to the volume... just a few directories and empty files created. I stood up a brand-new stack with three manager nodes and one worker node. Then I created an empty volume, and then I tried to start the default portainer stack (as well as a few other services). The container apparently created a few directories and empty files, but otherwise hung indefinitely. Any subsequent attempts to interact with that volume hang indefinitely. I could go to another node and see the volume.
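Roughly, the repro looked like this (a sketch, not the exact commands; the stack file and volume names are placeholders):
docker volume create -d "cloudstor:aws" --opt backing=shared portainer_data
docker stack deploy -c portainer-stack.yml portainer
# any write into the volume hangs, and later interactions with it hang too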
Thoughts?
…On Thu, Apr 4, 2019 at 14:53 MikaHjalmarsson ***@***.***> wrote:
@jderusse <https://github.com/jderusse> @paullj1 <https://github.com/paullj1> What were your test cases? Can you provide the number of files and directories?
I'm trying out the Docker 18.09.2 AMIs with T3 instances, and I've created files from 1 MB up to 1 GB with Cloudstor/EFS and can't see any problems. The swarm consists of 3 managers and 3 workers.
|
Hi, this isn't working for me either. Slightly different error than others have mentioned:
I have created the cluster with the appropriate EFS setting:
And specified the proper mount config in a compose file:
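For illustration only, a compose mount config of that shape typically looks like the following (service and volume names are placeholders, not the poster's actual file; the cloudstor volume definition follows the pattern used elsewhere in this thread):
# Illustrative placeholder config, not the poster's actual compose file.
version: "3.7"
services:
  app:
    image: alpine
    volumes:
      - appdata:/data
volumes:
  appdata:
    driver: "cloudstor:aws"
    driver_opts:
      backing: "shared"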
This is happening with the latest version:
|
I'm seeing this issue too. My test setup is simple (see the compose file quoted in the reply below). I then exec into the running container and try dd if=/dev/urandom of=/mnt/test.file bs=1M count=1. It will hang. Syslog reveals some information (also quoted below), but I think I have a line from the NFS module that is missing otherwise:
This is a single-node (manager only) deployment for testing, although I've created something similar in a larger-scale setup and seen it as well (when I ran into the problem in the first instance). It's running in ap-southeast-1, on a modified template, because the last template update was just after EFS was released into the AP region. I will, when I have time, see if I can replicate the behaviour in another region. I wonder if this is related to the mount options, such as noresvport not being set? More info here: https://forums.aws.amazon.com/message.jspa?messageID=812356#882043. I cannot see the mount options used by the opaque cloudstor:aws plugin, so it's hard to say. Given this issue has been open a long time, if the developers aren't able to support it, perhaps they should consider open sourcing it instead, or at least indicate whether EE is similarly affected?
edit: Just to add a bit more information. I can write varying amounts of data before it dies, even with
Also, I have found the mount options in the log:
I note that |
Wow, I'm glad I'm not the only one seeing this. I don't think it's one of the mount options... I've played pretty extensively with those. I have been able to get an EFS volume mounted, and create directories and files in it, with no issues. I've even been able to add small contents to files (echo "Hello world" > .keep). The problem seems to come in when you write lots of data... or maybe it's binary data causing the issue?
…On Wed, Jun 19, 2019 at 21:36 Steve Kerrison ***@***.***> wrote:
I'm seeing this issue too.
My test setup is simple:
version: "3.7"
services:
test:
image: alpine
command: "sh -c 'sleep 900'"
volumes:
- teststorage:/mnt
deploy:
restart_policy:
condition: none
volumes:
teststorage:
driver: "cloudstor:aws"
driver_opts:
backing: "shared"
I then exec into the running container and try dd if=/dev/urandom
of=/mnt/test.file bs=1M count=1
It will hang. Syslog reveals some information, but I think I have a line
from the NFS module that is missing otherwise:
Jun 20 02:00:01 moby syslogd 1.5.1: restart.
Jun 20 02:03:49 moby kernel: nfs: <<my EFS DNS name>> not responding, still trying
Jun 20 02:04:05 moby kernel: INFO: task dd:7813 blocked for more than 120 seconds.
Jun 20 02:04:05 moby kernel: Not tainted 4.9.114-moby #1
Jun 20 02:04:05 moby kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 02:04:05 moby kernel: dd D 0 7813 7188 0x00000100
Jun 20 02:04:05 moby kernel: 00000000000190c0 0000000000000000 ffff9ce08fbdb7c0 ffff9ce0a5dd8100
Jun 20 02:04:05 moby kernel: ffff9ce0a30ea040 ffff9ce0b62190c0 ffffffff8d83caf6 0000000000000002
Jun 20 02:04:05 moby kernel: ffff9ce0a30ea040 ffffc1064117bce0 7fffffffffffffff 0000000000000002
Jun 20 02:04:05 moby kernel: Call Trace:
Jun 20 02:04:05 moby kernel: [<ffffffff8d83caf6>] ? __schedule+0x35f/0x43d
Jun 20 02:04:05 moby kernel: [<ffffffff8d83cf26>] ? bit_wait+0x2a/0x2a
Jun 20 02:04:05 moby kernel: [<ffffffff8d83cc52>] ? schedule+0x7e/0x87
Jun 20 02:04:05 moby kernel: [<ffffffff8d83e8de>] ? schedule_timeout+0x43/0x101
Jun 20 02:04:05 moby kernel: [<ffffffff8d019808>] ? xen_clocksource_read+0x11/0x12
Jun 20 02:04:05 moby kernel: [<ffffffff8d12e281>] ? timekeeping_get_ns+0x19/0x2c
Jun 20 02:04:05 moby kernel: [<ffffffff8d83c739>] ? io_schedule_timeout+0x99/0xf7
Jun 20 02:04:05 moby kernel: [<ffffffff8d83c739>] ? io_schedule_timeout+0x99/0xf7
Jun 20 02:04:05 moby kernel: [<ffffffff8d83cf3d>] ? bit_wait_io+0x17/0x34
Jun 20 02:04:05 moby kernel: [<ffffffff8d83d009>] ? __wait_on_bit+0x48/0x76
Jun 20 02:04:05 moby kernel: [<ffffffff8d19e758>] ? wait_on_page_bit+0x7c/0x96
Jun 20 02:04:05 moby kernel: [<ffffffff8d10f99e>] ? autoremove_wake_function+0x35/0x35
Jun 20 02:04:05 moby kernel: [<ffffffff8d19e842>] ? __filemap_fdatawait_range+0xd0/0x12b
Jun 20 02:04:05 moby kernel: [<ffffffff8d19e8ac>] ? filemap_fdatawait_range+0xf/0x23
Jun 20 02:04:05 moby kernel: [<ffffffff8d1a060c>] ? filemap_write_and_wait_range+0x3a/0x4f
Jun 20 02:04:05 moby kernel: [<ffffffff8d2bcf98>] ? nfs_file_fsync+0x54/0x187
Jun 20 02:04:05 moby kernel: [<ffffffff8d1f6c4d>] ? filp_close+0x39/0x66
Jun 20 02:04:05 moby kernel: [<ffffffff8d1f6c99>] ? SyS_close+0x1f/0x47
Jun 20 02:04:05 moby kernel: [<ffffffff8d0033b7>] ? do_syscall_64+0x69/0x79
Jun 20 02:04:05 moby kernel: [<ffffffff8d83f64e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
This is a single-node (manager only) deployment for testing, although I've
created similar in a larger scale setup and seen it as well (when I ran
into the problem in the first instance). It's running in ap-southeast-1,
on a modified template, because the last template update was just after EFS
was released into the AP region. I will, when I have time, see if I can
replicate the behaviour in another region.
I wonder if this is related to the mount options, such as noresvport not
being set? More info here:
https://forums.aws.amazon.com/message.jspa?messageID=812356#882043
I cannot see the mount options used by the opaque cloudstor:aws plugin,
so it's hard to say.
Given this issue has been open a long time, if the developers aren't able
to support it, perhaps they should consider open sourcing it instead, or at
least indicate if EE is similarly affected?
|
Hi @paullj1, OK that's interesting. Thanks for the extra data points. How did you test the mount options? In my view it's not possible to tweak how cloudstor mounts the EFS volume it will attach to the container. If you tested those options separately it might not be a fair comparison. |
I specified them in the compose file in the mount options <https://forums.docker.com/t/how-to-mount-nfs-drive-in-container-simplest-way/46699> (which I believe is the only thing the cloudstor plugin does, but obviously I cannot confirm that since the source is nowhere to be found). Understood it may not be totally fair, but in each case the volumes showed up as cloudstor volumes when I did a volume list... also, if the options specified by the cloudstor plugin don't work at all, I'm not sure how else to troubleshoot.
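For reference, mounting EFS via explicit NFS options in a compose file generally looks like the following (a sketch; the file system DNS name is a placeholder, and the option set shown is the one discussed later in this thread):
# Sketch of a direct NFS mount of EFS using the local volume driver, outside of cloudstor.
volumes:
  efsdata:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=fs-12345678.efs.us-east-1.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
      device: ":/"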
|
We are having the exact same issue with ECS mounting EFS volumes. It looks like the mount fails/recovers intermittently, which causes containers that mount the EFS to fail with the following error:
Docker version 18.06.1-ce, build e68fc7a215d7133c34aa18e3b72b4a21fd0c6136 |
@paullj1 I suspect the cloudstor driver might not treat your mount options the same way, but I'm not sure. If those options get ignored, then all bets are off. @serkanh, if you get intermittent errors, it might still be explained by my hunch. Or it might not. I'd consider offering a bounty for this, but there's little point when the only people with access to the code don't seem to even look at their issues list... |
@paullj1 I'm sorry I re-read your message and see you were using pure NFS on a local mount. I may also do some experiments along those lines when I get a chance. |
@stevekerrison, no worries! There has to be a combination of options that works, I just haven't found it yet. Once those options are found, I suspect the only thing that will need to change for the Cloudstor plugin to work is those options. @serkanh, I see the same thing in my logs (syslog and dmesg). It's not that it's failing intermittently, it's that it periodically updates you on its failure to mount the share. Since mounting a disk mostly happens in kernel space, the kernel is letting you know that it has a hung task. Those messages should appear every 2 minutes. |
I ran a test similar to yours, and get the same failures. I mounted a local NFS mount, using docker-compose, in swarm mode, attached to the EFS volume that's supposed to be used by CloudStor. I used these options: nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport
What I did notice was that upon creating a new directory, it appeared in my swarm's docker volume ls as a cloudstor:aws volume (the volumes are just subdirectories of the EFS volume). In fact, if you inspect EFS cloudstor mounts, you'll see they go into /mnt/efs/{mode}/{name}, where {mode} differentiates between regular and maxIO. So I suspect that some part of the cloudstor plugin is interfering with my NFS mount. I'd be interested to see how the system handles NFS-mounted EFS volumes if cloudstor's EFS support is disabled. Alas, I don't know if the cloudformation without EFS will include the NFS drivers or not, as I've not dug that deep. |
Yup. I see the same thing. I have taken it further and deployed a swarm without EFS/Cloudstor support, made a manual EFS volume, then mounted it like you describe, and had the same issues. So, I can confirm it isn't Cloudstor messing anything up. I suspect it's just the EFS options. We've got to find which options cause it to hang.
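One way to hunt for the offending option would be something along these lines (an unverified sketch; the file system DNS name is a placeholder and the option sets are just examples):
# Mount EFS with different option sets and repeat the dd test that hangs.
EFS=fs-12345678.efs.us-east-1.amazonaws.com
sudo mkdir -p /mnt/efs-test
for opts in \
  "nfsvers=4.1,hard,timeo=600,retrans=2" \
  "nfsvers=4.1,hard,timeo=600,retrans=2,noresvport" \
  "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
do
  echo "== $opts =="
  sudo mount -t nfs4 -o "$opts" "$EFS:/" /mnt/efs-test
  timeout 120 dd if=/dev/urandom of=/mnt/efs-test/test.file bs=1M count=64 \
    && echo "write OK" || echo "hung or failed"
  sudo umount -f /mnt/efs-test
done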
…On Wed, Jul 10, 2019 at 02:12 Steve Kerrison ***@***.***> wrote:
I ran a test similar to yours, and get the same failures. I mounted a
local nfs mount, using docker-compose, in swarm mode, attached to the EFS
volume that's supposed to be used by CloudStor. I used these options:
nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport
What I did notice was that upon creating a new directory, it appeared in
my swarm's docker volume ls as a cloudstor:aws volume (the volumes are
just subdirectories of the EFS volume). In fact if you inspect EFS
cloudstor mounts you'll see they go into /mnt/efs/{mode}/{name} where
{mode} differentiates between regular and maxIO.
So I suspect that some part of the cloudstor plugin is interfering with my
NFS mount. I'd be interested to see how the system handles NFS-mounted EFS
volumes if cloudstor's EFS support is disabled. Alas, I don't know if the
cloudformation without EFS will include the NFS drivers or not, as I've not
dug that deep.
|
More testing... it doesn't look like it's the options. I looked at the EFS options from one of my other swarms (using an older template where Cloudstor actually works), and they're identical. The delta might be that they added the "encryption" option? Maybe that's causing issues? To recap:
|
Expected behavior
Copying data to the volume should work.
Actual behavior
Copying data to the volume just freezes the stack, and only a restart helps.
Information
yes
Steps to reproduce the behavior
docker -H 127.0.0.1:2374 volume create \
  --driver "cloudstor:aws" \
  --opt backing=shared \
  --opt perfmode=maxio \
  shared_volume

docker -H 127.0.0.1:2374 run -it --rm \
  --mount type=volume,volume-driver=cloudstor:aws,source=shared_volume,destination=/volume \
  alpine_based_image \
  rsync -az --verbose --numeric-ids --human-readable /data4share/ /volume/