
File system friendly CAS storage #568

walles opened this issue Oct 28, 2020 · 5 comments
@walles
Contributor

walles commented Oct 28, 2020

NTFS has a limit of at most 1023 hard links to each file.

And this document indicates Buildfarm won't work well with that:

A strongly recommended filesystem to back this is XFS, due to its high link counts limits per inode. A strongly discouraged filesystem is ext4, which places a hard limit of 65000 link counts per inode.

If we ever want to set up Windows workers at some point in the future, what would be the best way for us to do that?

Notes

I think it would be nice to have an on-disk storage format that:

  • Did not put millions of files in a single directory (problematic for ext4fs which is really common on Linux)
  • Did not use hard links (which is problematic on both ext4fs and NTFS to different degrees)

On ext4fs you have to run tune2fs -O large_dir /dev/sda1 to cope with the number of files the current CAS storage implementation puts in a single directory, but that won't change the hard link limitation.

I checked APFS, and at least on paper it should be fine. IIUC it can handle 2bn files per directory and 2bn hard links to the same file; I have no idea about its performance in this situation, but at least the limits are high enough.

@werkt
Collaborator

werkt commented Nov 16, 2020

There are definitely very early versions of buildfarm that did literal copies of files in order to make exec dirs, and we should be able to (and should take the opportunity to) reintroduce that. The strategy should be injectable, possibly through filesystem/OS type detection.

The first part of your notes is accomplished via #589 - you can now set hex_bucket_levels to a value from 0-4, to give yourself that many levels of hash-derived subdivision for CAS contents.
The second part will require some strategy injection, which honestly should have been done already, to extend exec filesystems to support heavyweight copies. Presumably the benefit of running under Windows is worth such a performance tradeoff.
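For anyone curious what that subdivision looks like in practice, here is a minimal sketch (not buildfarm's actual code - the class name, root path, and the choice of two hex characters per level are assumptions for illustration): each bucket level peels characters off the hex digest and becomes a directory component, so no single directory has to hold every blob.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class HexBucketPath {
  // Two hex characters per bucket level is assumed purely for illustration.
  static Path casPath(Path root, String hexDigest, int levels) {
    Path dir = root;
    for (int i = 0; i < levels; i++) {
      // Each level becomes one directory component derived from the digest.
      dir = dir.resolve(hexDigest.substring(2 * i, 2 * i + 2));
    }
    return dir.resolve(hexDigest);
  }

  public static void main(String[] args) {
    // With levels=2, "ab12cd34ef56" lands under /cas/ab/12/ab12cd34ef56
    System.out.println(casPath(Paths.get("/cas"), "ab12cd34ef56", 2));
  }
}
```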

@walles
Contributor Author

walles commented Nov 27, 2020

Presumably the benefit of running under windows is worth such a performance tradeoff.

Just curious here, but do you know what the performance tradeoff actually was for regular files vs hard links? On any platform?

Naively I can't see why plain files would perform worse than hard links, but I assume you found there was a difference?

@werkt
Collaborator

werkt commented Nov 27, 2020

Presumably the benefit of running under windows is worth such a performance tradeoff.

Just curious here, but do you know what the performance tradeoff actually was for regular files vs hard links? On any platform?

It is the cost of creating copies of all of the files specified as inputs - read-streamed writes to the tune of N files of M size.

Naively I can't see why plain files would perform worse than hard links, but I assume you found there was a difference?

The problem was not in the performance of the executions, it was in preparing for them - with substantial numbers of very-low-wall-time operations, we could see starvation in the worker execution stages, wasting their numerous CPUs waiting to get content prepared for an exec. Hardlinks and directory symlinks are required to help keep the average input preparation cost reasonable compared to the executions themselves.

If execution time is drastically (many times) higher than the maximum cost for a set of actions to have their files copied and mkdir'd into place, then this doesn't apply. As granularity increases, if input set culling does not happen, this is likely to get substantially (i.e. nonlinearly) worse.
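To make the tradeoff concrete, a minimal sketch using plain java.nio (not buildfarm's code) of the two emplacement paths being compared - a hard link is a metadata-only operation regardless of file size, while a heavyweight copy re-reads and re-writes every byte of every input:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class Emplace {
  // Metadata-only: O(1) regardless of file size, but consumes one of the
  // inode's limited link slots (1023 on NTFS, 65000 on ext4).
  static void hardLink(Path casFile, Path execDirEntry) throws IOException {
    Files.createLink(execDirEntry, casFile);
  }

  // Heavyweight copy: a read-streamed write of the entire file, so preparing
  // N inputs of size M costs O(N * M) I/O before the action can even start.
  static void copy(Path casFile, Path execDirEntry) throws IOException {
    Files.copy(casFile, execDirEntry);
  }
}
```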

@fwingerter-Ocient

We ran into this recently. Although one option to deal with hard-link limits is to avoid hard links entirely, they do seem to have real space-saving benefits, even on NTFS. Ideally bazel-buildfarm would treat the maximum hardlink count as a basic attribute of the filesystem and create a new file (with a fresh inode that can support another $HARDLINK_LIMIT hardlinks and its own identical copy of the data) when it hits that limit on the current one.

There's a FIXME to this effect here.
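For illustration, a rough sketch of that fallback (not the FIXME'd code itself; detecting link-count exhaustion via FileSystemException is an assumption, and the class name is made up):

```java
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;

class BoundedLinker {
  // The CAS file, or a later full copy of it, that new links point at.
  private Path currentTarget;

  BoundedLinker(Path casFile) {
    this.currentTarget = casFile;
  }

  void emplace(Path execDirEntry) throws IOException {
    try {
      Files.createLink(execDirEntry, currentTarget);
    } catch (FileSystemException e) {
      // Assumed to indicate link-count exhaustion (e.g. 1023 links on NTFS):
      // materialize a fresh copy and let it absorb the next batch of links.
      Files.copy(currentTarget, execDirEntry);
      currentTarget = execDirEntry;
    }
  }
}
```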

@werkt
Collaborator

werkt commented Jan 26, 2021

Agreed as to their benefits, even with limited link-count capacities. My strategic solution for this will be to create an abstract strategy for emplacing files that takes advantage of as much link count as the filesystem is willing to give us. That will have applications beyond NTFS, say in ext4 with its 65k limit, and can still reduce file duplication by a factor of the maximum link count.
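As a rough sketch of the shape such an injectable strategy could take (the interface and class names here are hypothetical, not buildfarm's API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

interface FileEmplacementStrategy {
  void emplace(Path casFile, Path execDirEntry) throws IOException;
}

// For filesystems (or OSes) where hard links are unavailable or too limited.
class CopyStrategy implements FileEmplacementStrategy {
  public void emplace(Path casFile, Path execDirEntry) throws IOException {
    Files.copy(casFile, execDirEntry);
  }
}

// For filesystems with effectively unlimited link counts, e.g. XFS.
class HardLinkStrategy implements FileEmplacementStrategy {
  public void emplace(Path casFile, Path execDirEntry) throws IOException {
    Files.createLink(execDirEntry, casFile);
  }
}
```

A link-count-bounded implementation along the lines of the sketch in the previous comment would slot in as a third choice wherever the filesystem reports a finite maximum.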
