Problem with grootfs store for unprivileged causes garden failure #66

Closed
sparameswaran opened this issue Mar 27, 2018 · 4 comments

@sparameswaran

sparameswaran commented Mar 27, 2018

Garden job fails to come up with Pivotal App Service Tile v2.0.8 due to grootfs issue:

diego_cell/59ad51f8-81a2-443f-a77c-a066f5c612e6:~# tail -f /var/vcap/sys/log/garden/garden.std*
==> /var/vcap/sys/log/garden/garden.stderr.log <==

==> /var/vcap/sys/log/garden/garden.stdout.log <==
{"timestamp":"1522086995.059141159","source":"guardian","message":"guardian.create.create-failed-cleaningup.volumizer.image-plugin-destroy.grootfs.delete.failed-to-initialise-image-driver","log_level":2,"data":{"cause":"running image plugin create: reading namespace file: open /var/vcap/data/grootfs/store/unprivileged/meta/namespace.json: input/output error\n: exit status 1","error":"reading namespace file: open /var/vcap/data/grootfs/store/unprivileged/meta/namespace.json: input/output error","handle":"executor-healthcheck-2001547f-aa74-45df-6286-fb6ebf812728","original_timestamp":"2018-03-26T17:56:35.0587776Z","session":"91.3.2.1"}}
{"timestamp":"1522086995.063849688","source":"guardian","message":"guardian.create.create-failed-cleaningup.volumizer.image-plugin-destroy.image-plugin-result","log_level":2,"data":{"action":"destroy","cause":"running image plugin create: reading namespace file: open /var/vcap/data/grootfs/store/unprivileged/meta/namespace.json: input/output error\n: exit status 1","error":"exit status 1","handle":"executor-healthcheck-2001547f-aa74-45df-6286-fb6ebf812728","session":"91.3.2.1","stdout":"reading namespace file: open /var/vcap/data/grootfs/store/unprivileged/meta/namespace.json: input/output error\n"}}
{"timestamp":"1522086995.064245462","source":"guardian","message":"guardian.create.create-failed-cleaningup.destroy-failed","log_level":2,"data":{"cause":"running image plugin create: reading namespace file: open /var/vcap/data/grootfs/store/unprivileged/meta/namespace.json: input/output error\n: exit status 1","error":"running image plugin destroy: reading namespace file: open /var/vcap/data/grootfs/store/unprivileged/meta/namespace.json: input/output error\n: exit status 1","handle":"executor-healthcheck-2001547f-aa74-45df-6286-fb6ebf812728","session":"91.3"}}
{"timestamp":"1522086995.064504623","source":"guardian","message":"guardian.create.create-failed-cleaningup.cleanedup","log_level":1,"data":{"cause":"running image plugin create: reading namespace file: open /var/vcap/data/grootfs/store/unprivileged/meta/namespace.json: input/output error\n: exit status 1","handle":"executor-healthcheck-2001547f-aa74-45df-6286-fb6ebf812728","session":"91.3"}}
{"timestamp":"1522086995.064779043","source":"guardian","message":"guardian.api.garden-server.create.failed","log_level":2,"data":{"error":"running image plugin create: reading namespace file: open /var/vcap/data/grootfs/store/unprivileged/meta/namespace.json: input/output error\n: exit status 1","request":{"Handle":"executor-healthcheck-2001547f-aa74-45df-6286-fb6ebf812728","GraceTime":0,"RootFSPath":"/var/vcap/packages/cflinuxfs2/rootfs.tar","BindMounts":null,"Network":"","Privileged":false,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"1.1.122"}}
{"timestamp":"1522087003.623143435","source":"guardian","message":"guardian.list-containers.starting","log_level":1,"data":{"session":"92"}}
{"timestamp":"1522087003.623309374","source":"guardian","message":"guardian.list-containers.finished","log_level":1,"data":{"session":"92"}}
{"timestamp":"1522087035.329714775","source":"guardian","message":"guardian.api.garden-server.waiting-for-connections-to-close","log_level":1,"data":{"session":"1.1"}}
{"timestamp":"1522087035.332157612","source":"guardian","message":"guardian.api.garden-server.stopping-backend","log_level":1,"data":{"session":"1.1"}}
{"timestamp":"1522087035.332249641","source":"guardian","message":"guardian.api.garden-server.stopped","log_level":1,"data":{"session":"1.1"}}

Underlying problem with the grootfs unprivileged store:

~# ls -al /var/vcap/data/grootfs/store/
ls: cannot access /var/vcap/data/grootfs/store/unprivileged: Input/output error
total 113824
drwxr-xr-x 4 root root         4096 Mar 26 17:52 .
drwxr-xr-x 3 root root         4096 Mar 26 17:51 ..
drwx------ 9 root root          118 Mar 26 17:53 privileged
-rw------- 1 root root 118579802112 Mar 26 17:55 privileged.backing-store
d????????? ? ?    ?               ?            ? unprivileged
-rw------- 1 root root 118579802112 Mar 26 17:53 unprivileged.backing-store

Running on vSphere with Ops Mgr build 2.0-build-269, PAS v2.0.8, and NSX-T tile v2.1 integration.

  • garden-runc-release version: 1.11.1
  • IaaS: vsphere
  • Stemcell version: ubuntu trusty 3468.27
  • Kernel version: 4.4.0-116-generic

Steps to reproduce

The PAS deployment fails to come up because the garden job on the diego_cell fails.

Logs

See above

Cause

We believe the root cause was an underlying issue with the IaaS.

Resolution

Recreating the diego_cell didn't work, and the folder could not be deleted because of a permissions error. The problem was actually resolved by deleting the VM from the IaaS and redeploying.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/156314215

The labels on this github issue will be updated when the story is started.

@williammartin

Thanks for opening the issue with us. The most suspicious output here is the I/O error and the very strange directory listing for the unprivileged directory.

~# ls -al /var/vcap/data/grootfs/store/
ls: cannot access /var/vcap/data/grootfs/store/unprivileged: Input/output error
total 113824
drwxr-xr-x 4 root root         4096 Mar 26 17:52 .
drwxr-xr-x 3 root root         4096 Mar 26 17:51 ..
drwx------ 9 root root          118 Mar 26 17:53 privileged
-rw------- 1 root root 118579802112 Mar 26 17:55 privileged.backing-store
d????????? ? ?    ?               ?            ? unprivileged
-rw------- 1 root root 118579802112 Mar 26 17:53 unprivileged.backing-store

Preliminary investigations would suggest a problem in your cluster with the backing data store:

https://unix.stackexchange.com/questions/39905/input-output-error-when-accessing-a-directory/39908
https://unix.stackexchange.com/questions/158443/files-in-ls-l-output

I can't really imagine anything that GrootFS could do to result in this; the kernel seems unable to get the right information from disk.
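
For context, a `d?????????` entry means `ls` could read the name from the parent directory but the follow-up stat call on it failed. The same check can be scripted; the following is a minimal sketch, where `probe` is a hypothetical helper and the store path is the one from the listing above.

```python
import errno
import os


def probe(path):
    """Classify a path by whether the kernel can stat it."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return "io-error"  # the condition `ls` reported for `unprivileged`
        if exc.errno == errno.ENOENT:
            return "missing"
        # Any other failure: report the symbolic errno name, e.g. "EACCES".
        return errno.errorcode.get(exc.errno, "unknown-error")
    return "ok"


store = "/var/vcap/data/grootfs/store"  # path taken from the listing above
for name in ("privileged", "unprivileged"):
    print(name, probe(os.path.join(store, name)))
```

An `io-error` result for one store next to `ok` for the other would point at the disk or filesystem underneath that store rather than at anything Garden or GrootFS did.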

As we discussed on Slack, let's see if deleting the cell in the IaaS and letting BOSH bring it back results in a healthy VM. It might be worth running some disk diagnostics on your cluster regardless.

@williammartin

@sparameswaran It sounds like, after further investigation, the IaaS delete resolved this. Do you need anything more from Garden, or can we close this out?

@sparameswaran

Deleting the VM from the IaaS and redeploying appears to have fixed the problem. We can close this issue. Thanks.


3 participants