
chunked: various improvements #1092

Merged: 35 commits, Jan 10, 2022
Conversation

@giuseppe (Member) commented Dec 21, 2021

Various improvements for the pkg/chunked package.

Noteworthy:

  • support splitting a file into multiple chunks and performing network deduplication.
  • keep a local metadata cache for each layer, avoiding re-parsing the JSON metadata file of every layer in the storage.
  • use valyala/gozstd, which is much faster, instead of github.com/klauspost/compress.
  • copy local files from multiple goroutines.
  • use json-iterator to improve JSON manifest parsing.
  • reuse a static buffer for I/O operations.

More details in each commit.

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

@giuseppe (Member, Author) commented Dec 21, 2021

@cgwalters PTAL

It is still a draft as it is barely tested, but I think this could help with the deduplication of the large Go binaries we were discussing.

The deduplication, in this case, would be just for the network data since locally it is not usable as reflinks require the data to be aligned.
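The splitting itself is not shown in this thread, but the idea behind content-defined chunking can be sketched with a simple rolling sum over a fixed window, declaring a boundary whenever the low bits of the sum match a mask. All names below are hypothetical; this is an illustration of the technique, not the algorithm in pkg/chunked:

```go
package main

import "fmt"

// splitChunks is a minimal content-defined chunker: a sum rolls over a
// fixed-size window, and a chunk boundary is declared whenever the low
// `bits` bits of the sum are all ones. Fewer bits produce smaller and
// more numerous chunks. Illustrative sketch only.
func splitChunks(data []byte, bits uint) [][]byte {
	const window = 64
	mask := uint32(1)<<bits - 1
	var chunks [][]byte
	var sum uint32
	start := 0
	for i := range data {
		sum += uint32(data[i])
		if i >= window {
			sum -= uint32(data[i-window]) // slide the window forward
		}
		// Only split once the current chunk is at least one window long.
		if i-start+1 >= window && sum&mask == mask {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // trailing partial chunk
	}
	return chunks
}

func main() {
	data := make([]byte, 1<<20)
	for i := range data {
		data[i] = byte((i * 2654435761) >> 16) // arbitrary varied content
	}
	total := 0
	for _, c := range splitChunks(data, 13) {
		total += len(c)
	}
	fmt.Println(total == len(data)) // chunks always reassemble to the input
}
```

Because boundaries depend only on the bytes inside the window, shifting data by inserting bytes early in a file lets most later boundaries re-align, which is what makes deduplicating identical chunks across image versions possible.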

@giuseppe giuseppe force-pushed the chunked-intra-files branch 4 times, most recently from 989f9c0 to 98c459b on December 22, 2021 14:35
@cgwalters (Contributor) left a comment

Can you toss up some sort of even brief description or diagram (e.g. https://asciiflow.com/#/ ) of how the chunks are stored? How does it relate to the container layers, etc? It's hard to just jump in and look at code without that.

Also, it'd really help build confidence in this to have even some basic unit tests. See e.g. https://github.com/ostreedev/ostree/blob/main/tests/test-rollsum.c

The ostree static delta code which uses this is also tested by unit tests that verify that it regenerates exactly the same files.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe (Member, Author)

> Can you toss up some sort of even brief description or diagram (e.g. https://asciiflow.com/#/ ) of how the chunks are stored? How does it relate to the container layers, etc? It's hard to just jump in and look at code without that.
>
> Also, it'd really help build confidence in this to have even some basic unit tests. See e.g. https://github.com/ostreedev/ostree/blob/main/tests/test-rollsum.c
>
> The ostree static delta code which uses this is also tested by unit tests that verify that it regenerates exactly the same files.

yes, I need to write some documentation about it.

The metadata JSON file is based on the same format as estargz (https://github.com/containerd/stargz-snapshotter/blob/main/docs/estargz.md); chunks are represented using the estargz TOC file.

For the tarball itself there is not much difference: the only change is that compression must be restarted whenever a chunk is created, so we can record the offset and use it when generating the TOC JSON metadata file that is appended at the end of the tarball.

Locally there is no difference in how the files are stored. We look up the layers' metadata to know which files/chunks are present. If a chunk is already present locally, we read it directly from the checked-out file.

For example, we have the following entries in the metadata for a layer present in the local store:

    {
      "type": "reg",
      "name": "usr/bin/podman",
      "mode": 493,
      "size": 49984904,
      "modtime": "2021-12-23T13:10:02+01:00",
      "accesstime": "0001-01-01T00:00:00Z",
      "changetime": "0001-01-01T00:00:00Z",
      "digest": "sha256:8270b870b26cdca16bc158f6d230a1d6ca1d3d7503b2f97d9bcf85dbb6327f1b",
      "offset": 131,
      "endOffset": 17374600,
      "chunkSize": 1625,
      "chunkDigest": "sha256:1b1a58f4ba739b5d076472a4559c15a7c31b20182c7e1abd5bb55761bd185a8b"
    },
    {
      "type": "chunk",
      "name": "usr/bin/podman",
      "offset": 1055,
      "chunkSize": 9188,
      "chunkOffset": 1625,
      "chunkDigest": "sha256:5d7700542a3069fecd4cdfc5f1e1f5f7f50f5109e78ff137267be199af52f724"
    },
    {
      "type": "chunk",
      "name": "usr/bin/podman",
      "offset": 4720,
      "chunkSize": 2799,
      "chunkOffset": 10813,
      "chunkDigest": "sha256:ebb089e3b60dac8a238d1154e9259efee9ffd298d3da1ba1d13a34c3eda9340a"
    },
If we pull a new image that has a chunk with "chunkDigest": "sha256:ebb089e3b60dac8a238d1154e9259efee9ffd298d3da1ba1d13a34c3eda9340a", then we copy it from the file $LAYER_CHECKOUT/usr/bin/podman, reading 2799 bytes at offset 10813 (the chunkOffset recorded in the metadata).

Loading the JSON metadata for each layer in the storage is quite expensive; this is something that could be improved, but I've not looked into it yet.

I've done some tests with appending data to a big file, and the rolling checksum seems to help a lot. Unfortunately, I don't see a big improvement with Go binaries: comparing podman 3.4.0 and podman 3.4.4, I get ~12% dedup if I use 13 bits for the rolling checksum, and ~5% if using 16 bits as it is set now. With 13 bits, too many chunks are generated, and that has an impact on the final tarball size. I wonder if there is anything we can do with the linker settings to generate more similar binaries.

@giuseppe giuseppe changed the title [WIP] chunked: split files using a rolling checksum chunked: split files using a rolling checksum Dec 23, 2021
@giuseppe giuseppe force-pushed the chunked-intra-files branch 5 times, most recently from b92cb8e to 5dd63b3 on December 24, 2021 12:22
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
It solves a problem where the discard could be performed before the
compression handler was closed (through a deferred call).

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the chunked-intra-files branch 5 times, most recently from 0795a4d to 45e6852 on December 24, 2021 16:48
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the chunked-intra-files branch 3 times, most recently from b4340ae to c2defb9 on January 10, 2022 10:22
@giuseppe giuseppe changed the title chunked: split files using a rolling checksum chunked: various improvements Jan 10, 2022
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Avoid using slices. I've seen a drop of ~20M in memory
usage with a Fedora image.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
A reference to the just-created parent directory is already open, so
use it directly.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Avoid parsing each JSON TOC file for the layers in the local storage;
instead, attempt to create a lookaside cache in a custom format that is
faster to load (and potentially mmap'able).

The same cache is used to look up files, chunks and candidates for
deduplication with hard links.

There are three kinds of digests stored:

- digest(file.payload)
- digest(digest(file.payload) + file.UID + file.GID + file.mode + file.xattrs)
- digest(i) for each i in chunks(file.payload)

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
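The second digest kind in the commit message above can be sketched as follows; the exact serialization of the fields is an assumption made for illustration, since the real cache defines its own format:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// metadataDigest combines the payload digest with ownership, mode and
// xattrs, so two files are dedup candidates only when both content and
// metadata match. The concatenation order here is an assumption; the
// real lookaside cache defines its own serialization.
func metadataDigest(payloadDigest string, uid, gid, mode int, xattrs map[string]string) string {
	h := sha256.New()
	fmt.Fprintf(h, "%s%d%d%d", payloadDigest, uid, gid, mode)
	keys := make([]string, 0, len(xattrs))
	for k := range xattrs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for map iteration
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s", k, xattrs[k])
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil))
}

func main() {
	payload := fmt.Sprintf("sha256:%x", sha256.Sum256([]byte("payload")))
	a := metadataDigest(payload, 0, 0, 0o755, nil)
	b := metadataDigest(payload, 0, 0, 0o644, nil)
	fmt.Println(a != b) // different mode, so different metadata digest
}
```

Keeping the payload digest and the metadata digest separate lets the cache answer two different questions: "is this content already present?" (for chunk reuse) and "can this file be hard-linked?" (which also requires matching ownership, mode and xattrs).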
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reduce the number of allocations done by the parser by reading into a
bytes.Buffer.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@rhatdan (Member) commented Jan 10, 2022

LGTM

@rhatdan rhatdan merged commit a6837c9 into containers:main Jan 10, 2022