Xfs quota for overlay #24807

Closed
albamc wants to merge 1 commit


@albamc albamc commented Jul 19, 2016

- What I did
Apply XFS project quota to the overlay storage driver.
This is basically the same work as the XFS quota PR #24771, but applied to overlay instead of overlay2.
I planned to improve this PR and apply it to overlay2 too, but a better PR (#24771) has already been submitted...

- How I did it
Set an XFS project quota on the container's overlay storage root directory.
Use projectquota.go from #24771 and apply it to the overlay driver.
The --storage-opt size=#(m/g) option sets a size limit for a container.
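
A minimal sketch of how a value such as `size=200m` could be turned into a byte count. It assumes the suffixes are binary multiples, which matches the `204800` 1K-blocks that `df` reports for `size=200m` in the verification output. `parseSize` is a hypothetical helper written for illustration, not the function the driver uses.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts a value such as "200m" or "30g" into bytes.
// Assumption: suffixes are binary multiples (1m = 1024*1024 bytes).
// Overflow of very large values is not handled in this sketch.
func parseSize(s string) (uint64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult := uint64(1)
	switch s[len(s)-1] {
	case 'k':
		mult = 1 << 10
	case 'm':
		mult = 1 << 20
	case 'g':
		mult = 1 << 30
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the suffix before parsing the number
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n * mult, nil
}

func main() {
	b, err := parseSize("200m")
	if err != nil {
		panic(err)
	}
	// Bytes, then the same limit expressed in 1K-blocks as df shows it.
	fmt.Println(b, b/1024) // prints "209715200 204800"
}
```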

- Known issues
The quota applies only to the container's upper layer, i.e. to file changes made by the container itself.
XFS quota doesn't allow hardlinks across quota boundaries, so the basic ApplyDiff() hardlink operation is changed:

  • if the source and destination directories are in the same quota boundary, or have no quota --> hardlink the file
  • if the source and destination directories are in different quota boundaries --> copy the file

The XFS project id is not deleted when the quota is deleted by the overlay driver (e.g. container rm),
and there is no way to find out whether a project id is in use somewhere else.
So, if you want unique project ids, set a project id on the home directory of overlay. (It is used as the minimum project id for quotas.)
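
The hardlink-or-copy rule described in the known issues can be sketched as a small decision function. This is an illustration only: `copyMode` and `chooseCopyMode` are hypothetical names, and the sketch assumes that project id 0 means "no project quota assigned".

```go
package main

import "fmt"

// copyMode says how a file should be carried over from the parent layer.
type copyMode int

const (
	hardlink copyMode = iota
	copyFile
)

func (m copyMode) String() string {
	if m == hardlink {
		return "hardlink"
	}
	return "copy"
}

// chooseCopyMode applies the rule from the description: directories in the
// same quota boundary (including both having no quota, id 0) are hardlinked,
// directories in different quota boundaries are copied.
func chooseCopyMode(srcProjID, dstProjID uint32) copyMode {
	if srcProjID == dstProjID {
		return hardlink
	}
	return copyFile
}

func main() {
	fmt.Println(chooseCopyMode(0, 0))         // no quota on either side
	fmt.Println(chooseCopyMode(10000, 10000)) // same project
	fmt.Println(chooseCopyMode(10000, 0))     // different boundaries
}
```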

- How to verify it

  • basic operation
# mount | grep xfs
/dev/sda1 on / type xfs (rw,relatime,attr2,inode64,prjquota)

# dockerd -H 0.0.0.0:2375 --storage-driver=overlay -D &
albamc@albamc-laptop:~/dev/github/albamc/docker$ docker run -it --storage-opt size=200m ubuntu
root@1a58611ae018:/# df
Filesystem     1K-blocks      Used Available Use% Mounted on
overlay           204800         8    204792   1% /
tmpfs            3596008         0   3596008   0% /dev
tmpfs            3596008         0   3596008   0% /sys/fs/cgroup
/dev/sda1      234318044 185557116  48760928  80% /etc/hosts
shm                65536         0     65536   0% /dev/shm
tmpfs            3596008         0   3596008   0% /sys/firmware
root@1a58611ae018:/# dd if=/dev/zero of=/file bs=1024 count=102400000
dd: error writing '/file': No space left on device
204793+0 records in
204792+0 records out
209707008 bytes (210 MB, 200 MiB) copied, 0.905432 s, 232 MB/s
root@1a58611ae018:/# df
Filesystem     1K-blocks      Used Available Use% Mounted on
overlay           204800    204800         0 100% /
tmpfs            3596008         0   3596008   0% /dev
tmpfs            3596008         0   3596008   0% /sys/fs/cgroup
/dev/sda1      234318044 185761228  48556816  80% /etc/hosts
shm                65536         0     65536   0% /dev/shm
tmpfs            3596008         0   3596008   0% /sys/firmware
  • check for different quota boundaries
#
# tests for different quota boundaries (copy file)
#
# run container
albamc@albamc-laptop:~$ docker run -d --name qtest --storage-opt size=200m alpine ping www.naver.com
bf08c9969da3a05628ac54881e60a703790a35b5600a7c3815f0a882aae9cc65
# change container files
albamc@albamc-laptop:~$ docker exec -it qtest /bin/sh -c "echo blarblarblar > /blarblarblar"
# get lower-dir
albamc@albamc-laptop:~$ docker inspect qtest -f {{.GraphDriver.Data.LowerDir}}
/var/lib/docker/overlay/2f2e66b0f66638ffd326d20fbec1dc54858daccea958a691d89a12ce81d14bd1/root
# set project quota to lower-dir
albamc@albamc-laptop:~$ sudo xfs_quota -x -c 'project -s -p /var/lib/docker/overlay/2f2e66b0f66638ffd326d20fbec1dc54858daccea958a691d89a12ce81d14bd1 10000' / &> /dev/null
# commit container for calling ApplyDiff()
albamc@albamc-laptop:~$ docker commit qtest alpine:qset
sha256:1a5b0bd64c2838bb41ca5ff13de424edbc026c62ea4fd486b088403808407960
# get created image's root dir
albamc@albamc-laptop:~$ docker image inspect alpine:qset -f {{.GraphDriver.Data.RootDir}}
/var/lib/docker/overlay/268334faf34e0a0e2880bd314e32eccdd5acd5f31e5beaff31b2dbc986d8554f/root
# check inode number is different (copied)
albamc@albamc-laptop:~/dev/github/albamc/docker$ sudo stat /var/lib/docker/overlay/2f2e66b0f66638ffd326d20fbec1dc54858daccea958a691d89a12ce81d14bd1/root/bin/busybox
  File: '/var/lib/docker/overlay/2f2e66b0f66638ffd326d20fbec1dc54858daccea958a691d89a12ce81d14bd1/root/bin/busybox'
  Size: 805032    	Blocks: 1576       IO Block: 4096   regular file
Device: 801h/2049d	Inode: 6737775     Links: 3
...
albamc@albamc-laptop:~/dev/github/albamc/docker$ sudo stat /var/lib/docker/overlay/268334faf34e0a0e2880bd314e32eccdd5acd5f31e5beaff31b2dbc986d8554f/root/bin/busybox
  File: '/var/lib/docker/overlay/268334faf34e0a0e2880bd314e32eccdd5acd5f31e5beaff31b2dbc986d8554f/root/bin/busybox'
  Size: 805032    	Blocks: 1576       IO Block: 4096   regular file
Device: 801h/2049d	Inode: 406782300   Links: 1
...
# 
# tests for same quota boundaries (or quota is not set)
#
# clear project quota
albamc@albamc-laptop:~$ sudo xfs_quota -x -c 'project -C -p /var/lib/docker/overlay/2f2e66b0f66638ffd326d20fbec1dc54858daccea958a691d89a12ce81d14bd1 10000' / &> /dev/null
# change container files
albamc@albamc-laptop:~/dev/github/albamc/docker$ docker exec -it qtest /bin/sh -c "echo blarblarblar2 > /blarblarblar2"
# commit container for calling ApplyDiff()
albamc@albamc-laptop:~/dev/github/albamc/docker$ docker commit qtest alpine:qunset
sha256:813e7bca6da679d3ee7601e4b8541fcc7b3f586da0fb59d1171f8d1a385b1d0a
# get created image's root dir
albamc@albamc-laptop:~/dev/github/albamc/docker$ docker image inspect alpine:qunset -f {{.GraphDriver.Data.RootDir}}
/var/lib/docker/overlay/aa42873df47dc2812bcaeccefd635339562e91cceeb1b0e3ba17fe1ea6fc5b78/root
# check inode number is same (hardlink)
albamc@albamc-laptop:~/dev/github/albamc/docker$ sudo stat /var/lib/docker/overlay/2f2e66b0f66638ffd326d20fbec1dc54858daccea958a691d89a12ce81d14bd1/root/bin/busybox
  File: '/var/lib/docker/overlay/2f2e66b0f66638ffd326d20fbec1dc54858daccea958a691d89a12ce81d14bd1/root/bin/busybox'
  Size: 805032    	Blocks: 1576       IO Block: 4096   regular file
Device: 801h/2049d	Inode: 6737775     Links: 3
...
albamc@albamc-laptop:~/dev/github/albamc/docker$ sudo stat /var/lib/docker/overlay/aa42873df47dc2812bcaeccefd635339562e91cceeb1b0e3ba17fe1ea6fc5b78/root/bin/busybox
  File: '/var/lib/docker/overlay/aa42873df47dc2812bcaeccefd635339562e91cceeb1b0e3ba17fe1ea6fc5b78/root/bin/busybox'
  Size: 805032    	Blocks: 1576       IO Block: 4096   regular file
Device: 801h/2049d	Inode: 6737775     Links: 3
...
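
The inode comparison done by hand with `stat` in the transcript could also be scripted. The following is a sketch using `os.SameFile`, which compares device and inode on Unix-like systems; `sameInode` and `demo` are hypothetical helpers, and the demo uses a temporary directory rather than the real layer paths.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// sameInode reports whether two paths refer to the same underlying file,
// i.e. whether one is a hardlink of the other rather than a copy.
func sameInode(a, b string) (bool, error) {
	fa, err := os.Stat(a)
	if err != nil {
		return false, err
	}
	fb, err := os.Stat(b)
	if err != nil {
		return false, err
	}
	return os.SameFile(fa, fb), nil
}

// demo creates a file, a hardlink to it and an independent copy, and
// returns the comparison result for the hardlink and for the copy.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "sameinode")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	orig := filepath.Join(dir, "orig")
	linked := filepath.Join(dir, "linked")
	copied := filepath.Join(dir, "copied")
	if err := os.WriteFile(orig, []byte("busybox"), 0o644); err != nil {
		return false, false, err
	}
	if err := os.Link(orig, linked); err != nil {
		return false, false, err
	}
	if err := os.WriteFile(copied, []byte("busybox"), 0o644); err != nil {
		return false, false, err
	}
	linkSame, err := sameInode(orig, linked)
	if err != nil {
		return false, false, err
	}
	copySame, err := sameInode(orig, copied)
	return linkSame, copySame, err
}

func main() {
	linkSame, copySame, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("hardlink shares inode:", linkSame)
	fmt.Println("copy shares inode:", copySame)
}
```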

- Description for the changelog
apply xfs project quota for overlay over xfs

- A picture of a cute animal (not mandatory but encouraged)
image

Signed-off-by: Jonghyun Lee <albam.c@navercorp.com>

@amir73il
Contributor

Some timing we have... nice work :-)
There are a few problems with this PR:

  • "Xfs project quota requires an unique project id so I use directory inode number for project id."

That's a smart move, but it has one problem: an xfs inode number is 64bit and a project id is 32bit,
so while it may take time until you overflow the project id, it is going to overflow, especially
on a host with a large xfs volume.
I can't say that my solution in PR #24771 is complete - it's not - but it is up to the storage driver to manage and assign unique project ids to containers.
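
To make the 64-bit versus 32-bit concern concrete, here is a minimal sketch of a guarded conversion. `projectIDFromInode` is a hypothetical helper, not code from either PR; it refuses inode numbers that would be truncated instead of silently wrapping. The first two sample values are inode numbers taken from the `stat` output in the PR description.

```go
package main

import (
	"fmt"
	"math"
)

// projectIDFromInode uses an inode number as a project id only when it
// fits into 32 bits; larger inode numbers would be truncated and could
// collide with another directory's id, so they are rejected.
func projectIDFromInode(ino uint64) (uint32, error) {
	if ino > math.MaxUint32 {
		return 0, fmt.Errorf("inode %d does not fit in a 32-bit project id", ino)
	}
	return uint32(ino), nil
}

func main() {
	for _, ino := range []uint64{6737775, 406782300, 1 << 32} {
		id, err := projectIDFromInode(ino)
		fmt.Println(ino, id, err)
	}
}
```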

  • "Should use ioctl system calls instead of xfs_quota command line. (it requires xfs_quota)"

I did this for you :-)
feel free to rip my code to a common file and use it
send me the patch for review and to get my signed-off-by if you do.

  • "I'm not sure xfs project is cleaned up when directory is removed (container rm)"

it's not. not that it matters so much if only docker uses this specific project id.

But most importantly, in my PR I wrote:
"overlay (1) cannot use project quotas, because hardlinks are not allowed across
directories that are assigned to different project quotas."
see: http://oss.sgi.com/archives/xfs/2012-11/msg00613.html

So I may have been wrong with this statement, but are you sure that all the cases
where overlay creates hardlinks are not crossing project quota boundaries with your implementation?
If indeed I was wrong and there is no conflict wrt project quotas and overlay hardlinks,
then by all means, let's work on unifying the storage options and documentation to cover overlay*

@albamc
Author

albamc commented Jul 21, 2016

@amir73il Thanks for your comments :)
Oops... the xfs project id is 32 bits?
It may be easier to generate unique ids automatically if I rip your code and add a way to remove the project id on container "rm". I'll try it and send you a patch for review.
Yes, you are right (hardlinks are not allowed across project-quota-enabled directories). In the tests I ran before sending this PR I couldn't find any errors, but I'm not sure I tested all cases. I'll check that too.

@amir73il
Contributor

Correction: xfs 64bit is an opt-in feature only available when the mount option inode64 is used.
So it should be safe to use directory inode number as project id as long as you verify that the inode64
mount option is not used.
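
A sketch of the check suggested here: given the comma-separated option string of a mount (as in the `mount | grep xfs` output in the PR description), test whether a particular option is present. `hasMountOption` is a hypothetical helper; reading the option string from the system is left out of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// hasMountOption reports whether name appears as a whole entry in a
// comma-separated mount option string such as "rw,relatime,inode64".
func hasMountOption(opts, name string) bool {
	for _, o := range strings.Split(opts, ",") {
		if strings.TrimSpace(o) == name {
			return true
		}
	}
	return false
}

func main() {
	// Option string copied from the mount output in the PR description.
	opts := "rw,relatime,attr2,inode64,prjquota"
	fmt.Println(hasMountOption(opts, "inode64"), hasMountOption(opts, "prjquota")) // prints "true true"
}
```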

@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Sep 1, 2016
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Sep 1, 2016
	}
	if projectID <= uint32(fsx.fsx_projid) {
		projectID = uint32(fsx.fsx_projid) + 1
	}
Contributor

don't you want to set this projid in the q.quotas[] map? if you don't, users won't be able to call getQuota() for those directories later on
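
What this review comment asks for could look roughly like the sketch below: when an existing project id is discovered on a directory, remember it in the map and also bump the next free id. Only the `quotas` map name comes from the comment; the `quotaCtl` layout, `newQuotaCtl`, `record` and `lookup` are hypothetical and simplified (no locking, no overflow handling at the 32-bit limit).

```go
package main

import "fmt"

// quotaCtl is a simplified, hypothetical stand-in for the driver's quota
// bookkeeping: the next free project id plus a path-to-id map.
type quotaCtl struct {
	nextProjectID uint32
	quotas        map[string]uint32
}

func newQuotaCtl(minProjectID uint32) *quotaCtl {
	return &quotaCtl{
		nextProjectID: minProjectID + 1,
		quotas:        make(map[string]uint32),
	}
}

// record registers a project id already present on path, so that later
// lookups succeed, and keeps nextProjectID above every id seen so far.
func (q *quotaCtl) record(path string, projid uint32) {
	if projid == 0 {
		return // assumption: 0 means no project is assigned
	}
	q.quotas[path] = projid
	if q.nextProjectID <= projid {
		q.nextProjectID = projid + 1
	}
}

// lookup returns the project id recorded for path, if any.
func (q *quotaCtl) lookup(path string) (uint32, bool) {
	id, ok := q.quotas[path]
	return id, ok
}

func main() {
	q := newQuotaCtl(100)
	q.record("/var/lib/docker/overlay/a", 150)
	q.record("/var/lib/docker/overlay/b", 120)
	id, ok := q.lookup("/var/lib/docker/overlay/a")
	fmt.Println(id, ok, q.nextProjectID) // prints "150 true 151"
}
```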

@albamc albamc force-pushed the xfs_quota_for_overlay branch 2 times, most recently from 2e88ece to 0cc50ec Compare September 22, 2016 03:52
@thaJeztah
Copy link
Member

ping @albamc could you rebase? Trying to see if we can get both #24771, and this one moving again

@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Oct 17, 2016
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Oct 17, 2016
@albamc
Author

albamc commented Oct 17, 2016

@thaJeztah I took projectquota.go from #24771 and applied it (for the rebase).
Umm... is that right? There are conflicting files, but I don't know how to fix them.

@thaJeztah
Member

@albamc #24771 was merged, so you can rebase on master

also ping @crosbymichael @dmcgowan for review

@albamc
Author

albamc commented Oct 18, 2016

@thaJeztah I did rebase on master.
BTW, there's one big drawback in this PR: xfs quota doesn't allow hardlinks across different quota boundaries, and I sometimes saw file IO fail when a quota is set.
I changed the basic applyDiff() function to copy files instead of hardlinking them when a quota is set, and that can be a big drawback...

@@ -420,7 +467,13 @@ func (d *Driver) ApplyDiff(id string, parent string, diff archive.Reader) (size
}
}()

if err = copyDir(parentRootDir, tmpRootDir, copyHardlink); err != nil {
// if quota is used, hardlinks between different volumes are not supported.
Contributor

@amir73il amir73il Oct 18, 2016

I think maybe this condition is too harsh.
Anyone using overlay over xfs with pquota (even without any intention to use pquota with docker) will get much worse performance when starting containers.
Probably the right thing to do is to use d.quotaCtl.GetQuota() to find out if either the source or the target directory has a pquota set. Otherwise hardlinks should be used.

Also, I did not look at the driver code closely, but I have the feeling that the --size argument to create/run applies to both the lower and upper layer directories. Is that correct? If this is the case, then not setting a project quota on the lower dir should enable applying the diff with hardlinks.

Truth is that if project quota is to be applied to both lower and upper dir, then they should be using the same project id, and that is not what quotaCtl.SetQuota() does.
The semantics of setting quota only on the upper dir are the same as overlay2 (limited amount of modifications to base fs), while the semantics of setting quota on both lower and upper are more like devmapper (limited amount of space for the entire composed image).

Member

I think we should limit this to just the upper layer as is the case with overlay2. We can set the quota on the upper directory and not worry about the potential for project to apply to a root directory. Any directory that has ApplyDiff called on it will not be used as a container layer and the root directory will never be writable from a container.

@amir73il
Contributor

Looks good besides my comment on hardlinks in ApplyDiff(). IMO, the exact use cases and exact consequences should be clarified before merge. And as I wrote, there is a possibility that this drawback can be completely avoided.

@albamc
Author

albamc commented Oct 21, 2016

I agree. Of course, the hardlink problem must be solved before merge.
I think there are 2 solutions (which should be tested):

  1. set the quota on the upper or merged directory instead of the container directory root (to avoid the hardlink problem)
  2. temporarily disable the quota, create the hardlinks, and re-enable the quota (or remove the quota and create it again)

If neither solution works, at least check the hardlink source/destination and copy files only across different quota boundaries.
But I think most hardlinks will cross quota boundaries, given how overlay operates.
Today I tested case 1) and found that the quota does not work after the upper directory is created.
I need more time to figure out why the quota is not working.
The error message when creating a hardlink looks like this:

albamc@albamc-laptop:~/dev/oss/albam-c/scollector$ docker run -it registry.navercorp.com/centos:centos7 
lsUnable to find image 'registry.navercorp.com/centos:centos7' locally
centos7: Pulling from centos
5f70bf18a086: Pull complete 
dc44cf6f74b6: Pull complete 
c2a26298f8e0: Download complete 
edb4ed46d3b6: Download complete 
docker: failed to register layer: link /docker/overlay/601ee9760686da2e9986ad784e592f9ef48406dba024f18673559c469f09af22/root/anaconda-post.log /docker/overlay/273888049ba06c3ed1a03e7bd3dd8ffa68d3603d6680963c3b07b5a7aa567c77/tmproot429554697/anaconda-post.log: invalid cross-device link.
See 'docker run --help'.
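
The "invalid cross-device link" text in this output is how Go reports the `EXDEV` errno. One possible way to cope with it, sketched under the assumption that a copy is acceptable whenever a hardlink is refused, is to try the link first and fall back to copying. `linkOrCopy`, `copyRegularFile` and `demo` are hypothetical helpers for illustration and are not the PR's implementation; ownership, timestamps and xattrs are not preserved here.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// linkOrCopy hardlinks src to dst and falls back to a plain copy when the
// link is refused with EXDEV. It reports whether a copy was made.
func linkOrCopy(src, dst string) (bool, error) {
	err := os.Link(src, dst)
	if err == nil {
		return false, nil
	}
	if !errors.Is(err, syscall.EXDEV) {
		return false, err
	}
	return true, copyRegularFile(src, dst)
}

// copyRegularFile copies file contents and permission bits only.
func copyRegularFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	info, err := in.Stat()
	if err != nil {
		return err
	}
	out, err := os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_EXCL, info.Mode().Perm())
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// demo runs linkOrCopy inside a temporary directory and returns the
// content found at the destination.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "linkorcopy")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "src")
	dst := filepath.Join(dir, "dst")
	if err := os.WriteFile(src, []byte("hello"), 0o644); err != nil {
		return "", err
	}
	if _, err := linkOrCopy(src, dst); err != nil {
		return "", err
	}
	data, err := os.ReadFile(dst)
	return string(data), err
}

func main() {
	content, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
}
```

Trying the link first keeps the fast path for directories that share a quota boundary and only pays the copy cost where the filesystem actually refuses the link.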

@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Oct 24, 2016
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Oct 24, 2016
@albamc
Author

albamc commented Oct 24, 2016

@amir73il
The quota is applied to both the upper and lower directories because my code sets the quota on the container root (/var/lib/docker/overlay/#containerid).
I also tested applying the quota to the upper directory (/var/lib/docker/overlay/#containerid/upper), but it doesn't work.
I think overlayfs doesn't recognize the upper directory's quota limits. (Any suggestions?)
And I think the quota limit is the same as overlay2's (limited amount of modifications to base fs), because overlay only stores lower-id files in the container root directory. (The lower layer is stored in another directory.)

@cpuguy83
Member

@albamc Let me know if you want some help with this.

@albamc
Author

albamc commented Apr 10, 2017

@cpuguy83
Yes, setting the quota for all upper directories is a problem.
But if I set it on the rw layer only, the quota doesn't seem to work.
I still can't find a way to set it on the rw layer only and have the quota work well.
Any ideas or suggestions?

@cpuguy83
Member

@albamc cpuguy83@6048a98

It's been a week or so since I've tried it, but IIRC this worked for me.

@albamc
Author

albamc commented Apr 13, 2017

@cpuguy83
Umm... maybe I tried the wrong way to set the quota.
I'll test your patches soon.
Do you want me to test it and apply it to this PR?

@cpuguy83
Member

@albamc It's definitely not in perfect condition, but somewhere to start.

@cpuguy83
Member

@albamc Any news?

@albamc
Author

albamc commented May 22, 2017

@cpuguy83
Sorry for the late answer.
I found this error and am trying to solve it.

$ docker run -d --name g30 --storage-opt size=30g nginx
docker: Error response from daemon: open /var/lib/docker/overlay/8be34a39540846ae59170a7cdeeca6883ee1bf4761ff265de46fbfc4e5e9abca-init/merged/dev/console: invalid cross-device link.
$ docker run -d --name g30 --storage-opt size=30g nginx
docker: Error response from daemon: mkdir /var/lib/docker/overlay/3cc62e831bb7b1036afb0f5795826f03a46a97789e9b2cb3e05a0d173138e1fa-init/merged/dev/shm: invalid cross-device link.

@amir73il
Contributor

@albamc the reason for the error is written in my original PR #24771 under section Why only overlay2?:
"overlay (1) cannot use project quotas, because hardlinks are not allowed across
directories that are assigned to different project quotas."
/merged/ is a directory created from hardlinks from several other lower layers, so merged dir must have the same project id (0) as all the lower layers it combines.
Probably the only way to make this work is to set the quota on upper dir only (not sure why that didn't work for you) and make sure that when upper is committed as an image that could later be used as lower, its content is copied to a directory with no pquota, or all files and dirs in upper get set to project id 0.

@cpuguy83
Member

ping

@thaJeztah
Member

ping @albamc do you have time to work on this, or should this be closed? The command line reference docs should now be moved to the https://github.com/docker/cli repository, thus split from this PR

@thaJeztah thaJeztah added this to backlog in maintainers-session Jul 20, 2017
@thaJeztah
Member

@albamc if you don't have time to work on this, let us know as well, then we can look if someone can carry this (perhaps @amir73il is interested 😇 )

@thaJeztah thaJeztah removed this from backlog in maintainers-session Jul 20, 2017
@amir73il
Contributor

I'll go ahead and forward that 👼 to @imkin who did a fine job on refactoring the storage options code in #32977 and to @cpuguy83 who has already offered his help on this PR and claims to have been able to solve the hardlink copying problem. My opinion remains that this PR is as good as overlay2.size in its current form, give or take the improvements of #32977, and that IMO copying instead of hardlinking when committing a limited-quota container is simply good enough for the overlay driver. As always I'm available for review. Cheers

@albamc
Author

albamc commented Jul 23, 2017

@thaJeztah
I tried to apply @cpuguy83's comments (set the quota on the upper directory), but I could not find a solution.
If this PR can be merged without setting the quota on the upper directory, I would like to get it merged with some additional work.
However, if setting the quota on upper is required for merge, it's OK for someone else to solve this problem...

@amir73il
Contributor

amir73il commented Jul 23, 2017

@dmcgowan I think this thread has gone into an unnecessary spin, partly on my account, because I did not study how the overlay driver works before replying on this thread. When you commented in December that quota should be applied only to upper to "match the behavior of the overlay2 quota" I did not read enough into it, but looking back, I do think that @albamc's current work is a match to the overlay2 quota behavior, unless I am missing something.

IIUC, overlay driver creates 2 types of dirs: images (ro) and containers (rw). is that correct?
For images, quota will never be applied by this change which only affects creating containers.
For containers, there is nothing that takes disk usage in the container dir besides 'upper' and 'work',
which MUST be on the same project id anyway, so there is really no point whatsoever to apply
project quota to anything else but the root container dir.

Now with ApplyDiff(), the solution that @albamc has implemented is really the only solution I can think of, because as discussed on PR #32977, quota should NOT apply to images, and the only way to "release" the changed files out of the project quota account is by copying them to the new root dir.

IMO, this PR should be rebased, an additional commit to match PR #32977 should be applied and it should be ready to go.

@dmcgowan do you agree with my analysis? If not, please explain where I am wrong.

w.r.t. @cpuguy83 comments about re-factoring, I do agree with them, but I suggest that since they apply to both overlay and overlay2, that this re-factoring be postponed to after merging this PR and perhaps done by someone who knows more about the design of the graphdriver.

@bklau

bklau commented Aug 11, 2017

@amir73il @thaJeztah So if I got this correct: if the storage driver is "overlay2" and XFS is used, then quota will be automatically ENABLED? Is this correct? No other daemon start options needed, right? Please clarify. Thanks.

@dmcgowan
Member

@amir73il sorry I didn't respond sooner. I was coming up with a response but wasn't quite sure. Looking at how the quota is not applied on ApplyDiff, it does work as intended. It still feels hacky to me but in practice, we don't use the ro and rw methods. My only concern would be any changes to how the graph driver interface is used could cause unintended side effects with quota. I don't foresee that being a problem nor are we looking to make changes to this interface or how it is used. I agree we should rebase and get it in.

@dmcgowan
Member

@bklau this PR is not related to overlay2, quotas must always be explicitly enabled for this PR and overlay2

@bklau

bklau commented Aug 12, 2017

@dmcgowan Hi Derek, how is the "quota" enabled? Are any special start option flags
needed for the Docker daemon?

@thaJeztah thaJeztah added this to backlog in maintainers-session Sep 28, 2017
@thaJeztah
Member

ping @albamc can you rebase this PR? Looks like we may be able to get it in (looking at @dmcgowan's last comment)

@thaJeztah thaJeztah removed this from backlog in maintainers-session Sep 28, 2017
@albamc
Author

albamc commented Oct 13, 2017

@thaJeztah
Oops. I'll rebase it soon. sorry!

Signed-off-by: albam.c <albam.c@navercorp.com>
@vdemeester
Member

ping @albamc what's the status here ?

@AkihiroSuda
Member

This PR has been open for almost two years, but I guess people have already moved to overlay2.

Can we close this for now?

cc @dmcgowan @stevvooe

@vdemeester
Member

@AkihiroSuda let's close it for now.
@albamc thx for the contribution, and please comment if you still wanna work on this, I'll re-open 😉

@vdemeester vdemeester closed this Mar 22, 2018