Support of sparse files in container images #1091

Closed
alicefr opened this issue Dec 20, 2021 · 8 comments · Fixed by #1102

Comments

@alicefr
Contributor

alicefr commented Dec 20, 2021

Today, sparse files cannot be preserved when they are copied into a container image. This is reported as a limitation in the OCI spec, and a similar issue has been reported in buildkit. Is there any chance this could be supported in the future?

My particular use case involves KubeVirt and container disks; I reported the issue in kubevirt/kubevirt#6976. In KubeVirt, we use container images to ship VM images, and sparse files allow us to reduce the final container image size. I understand that this is a very specific use case, but the feature might be useful in other situations.

@alicefr alicefr changed the title from "Support of sparse files" to "Support of sparse files in container images" on Dec 20, 2021
@giuseppe
Member

It sounds like a useful feature, and we can probably extend our format to support them (I haven't looked into it yet, so I am not sure how difficult it would be with the existing Go libraries for handling tarballs).

Should the holes be automatically detected (like cp --sparse= can do), or should we use what the file system tells us (via lseek(SEEK_HOLE))?
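To make the second option concrete, here is a minimal Linux-only sketch of asking the file system for data extents with lseek. It is an illustration, not code from this project: the names `extent`, `dataExtents` and `sampleExtents` are hypothetical, and the whence values 3 and 4 are the Linux values for SEEK_DATA and SEEK_HOLE, hard-coded because package syscall does not export them (golang.org/x/sys/unix does, as unix.SEEK_DATA and unix.SEEK_HOLE).

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Linux whence values for lseek(2). Assumption: this sketch targets Linux
// only; other systems may number these differently.
const (
	seekData = 3
	seekHole = 4
)

// extent is a half-open byte range [Start, End) that the file system
// reports as containing data.
type extent struct{ Start, End int64 }

// dataExtents walks the file with SEEK_DATA/SEEK_HOLE and returns the data
// ranges. On a file system without hole support the whole file is reported
// as a single extent.
func dataExtents(f *os.File) ([]extent, error) {
	info, err := f.Stat()
	if err != nil {
		return nil, err
	}
	fd := int(f.Fd())
	var out []extent
	for off := int64(0); off < info.Size(); {
		start, err := syscall.Seek(fd, off, seekData)
		if err == syscall.ENXIO {
			break // only a hole remains between off and EOF
		}
		if err != nil {
			return nil, err
		}
		end, err := syscall.Seek(fd, start, seekHole)
		if err != nil {
			return nil, err
		}
		out = append(out, extent{start, end})
		off = end
	}
	return out, nil
}

// sampleExtents creates a temporary file of the given size whose only
// written bytes are a short payload at the very end, then reports its
// data extents.
func sampleExtents(size int64) ([]extent, error) {
	f, err := os.CreateTemp("", "sparse-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	payload := []byte("payload")
	if _, err := f.WriteAt(payload, size-int64(len(payload))); err != nil {
		return nil, err
	}
	return dataExtents(f)
}

func main() {
	exts, err := sampleExtents(1 << 20)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range exts {
		fmt.Printf("data: [%d, %d)\n", e.Start, e.End)
	}
}
```

The extents returned depend on the file system's allocation granularity, so the start of the last extent is usually rounded down to a block boundary rather than matching the payload offset exactly.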

@alicefr
Contributor Author

alicefr commented Dec 22, 2021

This seems to be a long-standing missing feature in Go: golang/go#13548

@giuseppe
Member

Could we circumvent the limitation and inject the missing metadata into the tar header?

@alicefr
Contributor Author

alicefr commented Dec 22, 2021

I haven't dug too deeply into the technical alternatives. However, I think the main issue might be that we need the ability to write the missing metadata at build time and read it back when we pull and extract the layer. So this depends on which tool was used to build the image.

@chainmail

Hi,
I'm excited to see this coming back with a real chance of being implemented. Our application stack uses sparse files heavily, which means our test framework containers are very big in some cases (tens of GB). If you need anyone to do some testing, please give me a shout.

@giuseppe
Member

I think we could add it as part of the zstd:chunked format, building on top of #1092.

I don't think it is worth trying to add this information to the tar stream itself, since the compression already deals with sparse files quite well.

giuseppe added a commit to giuseppe/storage that referenced this issue Jan 12, 2022
automatically detect holes in sparse files (the threshold is hardcoded
at 1kb for now) and add this information to the manifest file.

The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.

Closes: containers#1091

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member

I've opened a PR for sparse files support in zstd:chunked: #1102

I think this is the best we can achieve with the current container images.

The holes are automatically detected; the threshold is currently hardcoded to 1kb. So it doesn't "preserve" the existing hole information. Doing that would require a lot of changes in the container tools, because we first create the tar stream and then pass it around without any access to the original file's additional metadata (including holes).
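As a rough illustration of what automatic detection with a threshold means, here is a minimal sketch that scans a buffer for runs of zero bytes and reports only the runs that reach a minimum length. It is not the code from the PR: `hole`, `findHoles` and the sample buffer are hypothetical names and data, and 1024 is used as the threshold on the assumption that "1kb" means 1024 bytes.

```go
package main

import "fmt"

// hole describes a run of zero bytes as a half-open range [Start, End).
type hole struct {
	Start, End int
}

// findHoles returns every run of zero bytes in data whose length is at
// least minLen. Shorter zero runs are treated as ordinary data.
func findHoles(data []byte, minLen int) []hole {
	var holes []hole
	runStart := -1 // -1 means we are not currently inside a zero run
	for i, b := range data {
		if b == 0 {
			if runStart < 0 {
				runStart = i
			}
			continue
		}
		// A non-zero byte ends the current run, if any.
		if runStart >= 0 && i-runStart >= minLen {
			holes = append(holes, hole{runStart, i})
		}
		runStart = -1
	}
	// A zero run may extend to the end of the buffer.
	if runStart >= 0 && len(data)-runStart >= minLen {
		holes = append(holes, hole{runStart, len(data)})
	}
	return holes
}

func main() {
	// Same shape as the test file in this thread, at a smaller scale:
	// zeros, a short non-zero record, then zeros again.
	data := make([]byte, 4096)
	data = append(data, []byte("payload")...)
	data = append(data, make([]byte, 4096)...)

	for _, h := range findHoles(data, 1024) {
		fmt.Printf("hole: [%d, %d)\n", h.Start, h.End)
	}
}
```

Because detection looks only at content, a file that was fully allocated on the sender but happens to contain long zero runs would also come out with holes, which is the sense in which the original layout is not preserved.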

@alicefr is that fine for your use case?

A simple test:

$ (dd if=/dev/zero of=/dev/stdout bs=1 count=1M; date; dd if=/dev/zero of=/dev/stdout bs=1 count=1M) > zeros
$ cat Containerfile 
FROM scratch
COPY zeros /
$ podman build --format oci -t 192.168.125.1:5000/sparse
$ podman push --format oci --compression-format zstd:chunked 192.168.125.1:5000/sparse

The generated layer:

$ skopeo copy docker://192.168.125.1:5000/sparse oci:/tmp/sparse-oci
Getting image source signatures
WARN[0000] Compressor for blob with digest sha256:ffeb0a38199ca1cc2bba94cee8acf4c8d256150b62fc6423e46f43ec21f8fe15 previously recorded as zstd:chunked, now zstd 
Copying blob ffeb0a38199c done  
Copying config e5c2b3fa5f done  
Writing manifest to image destination
Storing signatures
$ ls -ln /tmp/sparse-oci/blobs/sha256/
total 12
-rw-r--r--. 1 1000 1000 699 Jan 12 11:05 3c36a2024e7d6e0c84fb969f8562bb6944c15b06e2f79bccb998d62a0ed8a325
-rw-r--r--. 1 1000 1000 497 Jan 12 11:05 e5c2b3fa5f3f863e13afcdc59e39435dad5e2fb39384854ae1d65ad0b55df295
-rw-r--r--. 1 1000 1000 689 Jan 12 11:05 ffeb0a38199ca1cc2bba94cee8acf4c8d256150b62fc6423e46f43ec21f8fe15

And on the receiver side:

$ bin/podman pull 192.168.125.1:5000/sparse
Trying to pull 192.168.125.1:5000/sparse:latest...
Getting image source signatures
Copying blob ffeb0a38199c done  689.0b / 689.0b (skipped: 278.0b = 40.35%)
Copying config e5c2b3fa5f done  
Writing manifest to image destination
Storing signatures
e5c2b3fa5f3f863e13afcdc59e39435dad5e2fb39384854ae1d65ad0b55df295

$ podman unshare sh -c 'find ~/.local/share/containers/storage/ -name zeros -type f | xargs stat'
  File: ~/.local/share/containers/storage/overlay/ffeb0a38199ca1cc2bba94cee8acf4c8d256150b62fc6423e46f43ec21f8fe15/diff/zeros
  Size: 2097184   	Blocks: 128        IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 275206130   Links: 1
Access: (0664/-rw-rw-r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:container_ro_file_t:s0
Access: 2022-01-12 11:03:00.790172814 +0100
Modify: 2022-01-12 11:03:00.794172844 +0100
Change: 2022-01-12 11:03:00.805172927 +0100
 Birth: 2022-01-12 11:03:00.790172814 +0100
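
In the stat output above, the apparent size is 2097184 bytes while only 128 blocks are allocated (64 KiB, assuming stat's usual 512-byte block units), so the zeros were not written to disk. A minimal sketch of how a receiver can get that result by seeking over holes and then truncating to the final size is shown below. It uses os.File.Seek and os.File.Truncate from the standard library rather than the unix.Seek and unix.Ftruncate calls mentioned in the commit message, and `segment`, `writeSparse` and `rebuild` are hypothetical names, not the PR's API.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// segment is a hypothetical description of one piece of a file:
// either literal data, or a hole of the given length.
type segment struct {
	Data    []byte // literal bytes; nil for a hole
	HoleLen int64  // length of the hole when Data is nil
}

// writeSparse recreates a file from segments. It assumes f is empty and
// positioned at offset 0. Data is written normally; holes are skipped with
// Seek so that the file system does not have to allocate blocks for them.
func writeSparse(f *os.File, segs []segment) error {
	var size int64
	for _, s := range segs {
		if s.Data == nil {
			if _, err := f.Seek(s.HoleLen, io.SeekCurrent); err != nil {
				return err
			}
			size += s.HoleLen
			continue
		}
		n, err := f.Write(s.Data)
		if err != nil {
			return err
		}
		size += int64(n)
	}
	// Seeking past EOF does not grow the file by itself, so a trailing
	// hole only exists once the file is truncated up to its final size.
	return f.Truncate(size)
}

// rebuild writes the segments to a temporary file and reads the result
// back, so the logical content can be inspected.
func rebuild(segs []segment) ([]byte, error) {
	f, err := os.CreateTemp("", "sparse-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if err := writeSparse(f, segs); err != nil {
		return nil, err
	}
	return os.ReadFile(f.Name())
}

func main() {
	got, err := rebuild([]segment{
		{HoleLen: 1 << 20},
		{Data: []byte("payload")},
		{HoleLen: 1 << 20},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("apparent size:", len(got))
}
```

Reading the file back returns zeros for the skipped ranges; whether those ranges actually stay unallocated depends on the underlying file system.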

@alicefr
Contributor Author

alicefr commented Jan 12, 2022

@giuseppe I'll give it a try and let you know :) thanks

giuseppe added a commit to giuseppe/storage that referenced this issue Jan 13, 2022
automatically detect holes in sparse files (the threshold is hardcoded
at 1kb for now) and add this information to the manifest file.

The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.

Closes: containers#1091

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>