This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Put /var/lib/docker under /mnt? #1307

Closed
yuvipanda opened this issue Aug 20, 2017 · 11 comments

@yuvipanda (Contributor)

Is this a request for help?: No


Is this an ISSUE or FEATURE REQUEST? (choose one): Issue


What version of acs-engine?: Master


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes, 1.7.2

What happened:

I used instances with 120G of advertised space for my agent nodes, since I expect to run a lot of heavy containers on them. However, it looks like /var/lib/docker sits on the root filesystem, which is only 30G; the bigger 120G ephemeral partition goes unused (see the illustrative disk layout below).

What you expected to happen:

/var/lib/docker is able to use the larger ephemeral disk space.

How to reproduce it (as minimally and precisely as possible):

  1. Create a k8s cluster with agent pool instance size Standard_E8s_v3
  2. Pull in a bunch of heavy images
  3. Run out of space
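
For illustration (not from the original report; device names and sizes are only indicative and vary by VM size), on such an agent node the disk layout looks roughly like this, with /var/lib/docker on the small OS disk while the large ephemeral disk sits idle at /mnt:

  $ df -h /var/lib/docker /mnt
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sda1        30G   28G  2.0G  94% /
  /dev/sdb1       118G   60M  112G   1% /mnt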
@skuda commented Sep 24, 2017

This should indeed be configurable; it makes sense to use the ephemeral disk in many cases.

@andyzhangx (Contributor)

/mnt is a temp disk, so its data will be lost on reboot. You could instead change the OS disk size, as in:
https://github.com/Azure/acs-engine/blob/master/examples/disks-managed/kubernetes-vmas.json#L11
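
A rough sketch of that change (illustrative only; the osDiskSizeGB field name comes from the linked example, and the jq path assumes the usual apimodel layout):

  # Sketch: bump the first agent pool's OS disk to 200 GB in an acs-engine apimodel.
  jq '.properties.agentPoolProfiles[0].osDiskSizeGB = 200' kubernetes-vmas.json > kubernetes-vmas-200gb.json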

@skuda commented Nov 27, 2017

Why is that a problem? Kubernetes will download images again if they are missing from /var/lib/docker after the temporary storage is lost.

There is another related issue: #543

@andyzhangx (Contributor) commented Nov 29, 2017

This sounds reasonable; running a single command during agent VM setup would work:

ln -s /mnt /var/lib/docker

@snebel29 commented Nov 29, 2017

Hi,
In case this is useful for someone: I arrived at this thread because I was having space issues in my cluster due to containers and overlay storage.

So I did a quick test by creating the symlink suggested above before running the agent provisioning scripts, like this:

"commandToExecute": "[concat('ln -s /mnt /var/lib/docker && /usr/bin/nohup /bin/bash -c \"/bin/bash /opt ..........

At first glance it seemed to work fine and all the Docker data went to /mnt, but then I realized pods were unable to get a proper resolv.conf file, showing the following error:

  FirstSeen	LastSeen	Count	From					SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----					-------------	--------	------		-------
  7m		7m		1	default-scheduler					Normal		Scheduled	Successfully assigned concepts-english-2680431353-fjlww to k8s-agentpool1-21562761-0
  7m		7m		1	kubelet, k8s-agentpool1-21562761-0			Warning		FailedSync	Error syncing pod, skipping: failed to "CreatePodSandbox" for "concepts-english-2680431353-fjlww_concepts(71a27833-d50e-11e7-9e88-000d3ab77490)" with CreatePodSandboxError: "CreatePodSandbox for pod \"concepts-english-2680431353-fjlww_concepts(71a27833-d50e-11e7-9e88-000d3ab77490)\" failed: rpc error: code = 2 desc = rewrite resolf.conf faield for pod \"concepts-english-2680431353-fjlww\": ResolvConfPath \"/mnt/containers/7960dcd95084bb56f49d7b9636813361c8a8a8b7b63e9b43e3a38f8f56052768/resolv.conf\" does not exist"

I also noticed that there are some typos in the Kubernetes error messages: resolf.conf faield.

@sheerun commented Dec 3, 2017

Yes, please...

@sheerun commented Dec 3, 2017

I tried the same thing as @snebel29, but by adding the -g /mnt/docker flag in /etc/systemd/system/docker.service.d/exec_start.conf, and I got the same errors in pods:

failed: rpc error: code = 2 desc = rewrite resolf.conf faield for pod \"clockwork-2943196675-xjkcm\"
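
(For reference, a rough sketch of that kind of drop-in; it is not taken verbatim from acs-engine, the ExecStart flags already on the node may differ, and newer Docker releases spell the flag --data-root instead of -g:)

  # Inspect the existing unit first and keep whatever flags it already passes.
  sudo systemctl cat docker
  sudo mkdir -p /etc/systemd/system/docker.service.d
  # Edit /etc/systemd/system/docker.service.d/exec_start.conf so ExecStart includes
  # -g /mnt/docker (or --data-root /mnt/docker), then reload and restart Docker:
  sudo systemctl daemon-reload
  sudo systemctl restart docker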

@sheerun commented Dec 3, 2017

It seems I fixed it by doing sudo cp /etc/resolv.conf /etc/resolf.conf. Seriously guys, what a typo. I did a grep on the whole system but it didn't find anything.

@sheerun commented Dec 3, 2017

It seems it's just a typo in the error message, and it appears because the following commit is in the ACS production build: https://github.com/kubernetes/kubernetes/pull/43368/files

Also, it turned out sudo cp /etc/resolv.conf /etc/resolf.conf didn't fix it. I'm still getting ResolvConfPath "/mnt/docker/containers/ec149c9b9ee41438742704bd7cd00f15ed06ed1375ac37676b1ba6f35ec4750e/resolv.conf" does not exist.

It seems to be because the app container is starting with "ResolvConfPath": "/mnt/docker/containers/ec149c9b9ee41438742704bd7cd00f15ed06ed1375ac37676b1ba6f35ec4750e/resolv.conf", but there's no resolv.conf there.

I tried replacing the following in /etc/systemd/system/kubelet.service and rebooting:

  --volume=/var/lib/docker/:/mnt/docker:rw \

to

  --volume=/mnt/docker/:/var/lib/docker:rw \

but it didn't help. I ended up reverting back to /var/lib/docker.

@andyzhangx (Contributor)

@sheerun I got the fix. On every node:

1.

sudo service docker stop
sudo mv /var/lib/docker /mnt
sudo ln -s /mnt/docker /var/lib/docker
sudo service docker start

2.

sudo vi /etc/systemd/system/kubelet.service and append the following (this is the key point here):

--volume=/mnt/docker:/mnt/docker:rw \

3.

sudo systemctl daemon-reload
sudo systemctl restart kubelet
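
(A hedged note on why step 2 matters: with Docker's data living under /mnt/docker, the ResolvConfPath values kubelet reads from Docker now point under /mnt/docker, so that path must also be visible inside the containerized kubelet; without the extra volume, kubelet reports "does not exist" as above. A quick sanity check might look like:)

  df -h /var/lib/docker                      # should now show the large ephemeral disk backing /mnt
  kubectl get pods --all-namespaces -o wide  # pods that failed with the resolv.conf error should start again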

stale bot commented Mar 9, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. Note that acs-engine is deprecated--see https://github.com/Azure/aks-engine instead.

stale bot added the stale label Mar 9, 2019
stale bot closed this as completed Mar 16, 2019