
Memory leak when running the KinD example #88

Closed
sequix opened this issue Apr 29, 2020 · 5 comments
Labels
bug Something isn't working

Comments


sequix commented Apr 29, 2020

Environment:
CentOS Linux release 7.7.1908 (Core)
Linux instance-aai99beh 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Docker version 19.03.8, build afacb8b

After running the KinD example on my server, the memory usage of stargz-snapshotter grew exponentially. In the end, I had to reboot my server...

[attached screenshot: memory usage over time]


ktock commented May 7, 2020

@sequix Sorry for the late reply (I was on a one-week vacation). It seems the filesystem uses too much memory. Could you try again after #89 is merged?


sequix commented May 7, 2020

No worries, I took a week off too ^v^.

ktock added the bug label on May 8, 2020

ktock commented May 8, 2020

@sequix Memory usage has been improved in #89. Can you try this again?


sequix commented May 8, 2020

Tried it again; no more memory leak. Great job.

I wonder whether this snapshotter caches the original OCI image as well, and if so, how?

@sequix sequix closed this as completed May 11, 2020

ktock commented May 11, 2020

@sequix Great! Thanks for retrying!

I wonder whether this snapshotter caches the original OCI image as well, and if so, how?

Yes, we cache layer data at chunk granularity and have caches for several Readers used in this snapshotter. To understand them, let's look at the Readers involved when a file on the filesystem is read.

  1. *Reader.file.ReadAt corresponding to the inode is called. It reads decompressed file contents from the stargz lib (*github.com/google/crfs/stargz.fileReader.ReadAt) and returns them to the kernel.
  2. *github.com/google/crfs/stargz.fileReader recognizes the required files and gzip boundaries in the layer. It fetches the gzip-compressed chunk of the layer via *Remote.blob.ReadAt and decompresses it.
  3. *Remote.blob fetches chunks of the layer blob from the registry. We want to minimize the number of requests, so *Remote.blob.ReadAt also fetches and caches a few neighbouring KB/MB of data in addition to the required chunk range.
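As a rough illustration of step 3, here is a minimal sketch of a blob reader that caches chunk-aligned ranges so that neighbouring reads avoid extra registry requests. All names here are hypothetical and do not match the real stargz-snapshotter code; the registry fetch is simulated by an in-memory byte slice.

```go
package main

import (
	"fmt"
	"io"
)

// remoteBlob mimics the role of *Remote.blob: it serves ranges of the layer
// blob and caches whole chunkSize-aligned chunks, so a later read of a
// neighbouring range hits the cache instead of issuing another request.
type remoteBlob struct {
	data      []byte           // stands in for the remote layer blob
	chunkSize int64            // granularity of fetching and caching
	cache     map[int64][]byte // chunk-aligned offset -> chunk contents
	fetches   int              // counts simulated registry requests
}

func (b *remoteBlob) ReadAt(p []byte, off int64) (int, error) {
	n := 0
	for n < len(p) {
		cur := off + int64(n)
		base := cur - cur%b.chunkSize // align down to the chunk boundary
		chunk, ok := b.cache[base]
		if !ok {
			// Simulated HTTP Range request for one whole chunk.
			b.fetches++
			end := base + b.chunkSize
			if end > int64(len(b.data)) {
				end = int64(len(b.data))
			}
			chunk = b.data[base:end]
			b.cache[base] = chunk
		}
		c := copy(p[n:], chunk[cur-base:])
		if c == 0 {
			return n, io.EOF
		}
		n += c
	}
	return n, nil
}

func main() {
	blob := &remoteBlob{
		data:      []byte("hello, stargz chunked layer blob!"),
		chunkSize: 8,
		cache:     map[int64][]byte{},
	}
	p := make([]byte, 5)
	blob.ReadAt(p, 0) // fetches chunk [0,8) in one request
	fmt.Printf("%s\n", p)
	blob.ReadAt(p, 3) // neighbouring read: served from the cached chunk
	fmt.Println("fetches:", blob.fetches)
}
```

Because the second read falls inside the already-cached chunk, the fetch counter stays at 1: the extra bytes fetched with the first request pay off on the neighbouring access.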

We have two caches: one in *Reader.file and one in *Remote.blob. *Reader.file caches decompressed file contents, while *Remote.blob caches raw (gzip-compressed) chunks of layers. Obviously there is duplication between these caches, and we should integrate them into a single cache in the future. But in this post, I describe why we currently have two separate caches.

As mentioned above, we want to minimize the number of requests to registries by fetching and caching a few neighbouring KB/MB of data with a single HTTP Range request, in addition to the range required by the kernel. But this is difficult for the stargz lib client (*Reader.file), because the client can only see the targeted file entry through the Reader (*github.com/google/crfs/stargz.fileReader) passed by the stargz lib; the offsets/sizes of neighbouring file entries (and gzip boundaries) are invisible to it.

To solve this, we implemented another cache below the stargz lib (*Remote.blob) for raw (gzip-compressed) data, which lets us view the whole layer blob and fetch and cache neighbouring additional bytes. *Remote.blob is agnostic to file/gzip boundaries, so it cannot decompress data. The decompression is done in the stargz lib (*github.com/google/crfs/stargz.fileReader.ReadAt), and the decompressed contents of the target file are cached in *Reader.file. This reduces decompression time on later accesses.
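The decompressed-content side of this design can be sketched as follows. The names are hypothetical and the real *Reader.file logic differs; the point is only that a hit in the decompressed cache skips the gzip step entirely on later accesses.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// fileCache mimics the role of *Reader.file's cache: it stores decompressed
// file contents, so repeated reads of the same file avoid re-decompression.
type fileCache struct {
	mu             sync.Mutex
	decompressed   map[string][]byte // file name -> decompressed contents
	decompressions int               // counts actual gzip decompressions
}

func (c *fileCache) read(name string, compressed []byte) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if d, ok := c.decompressed[name]; ok {
		return d, nil // cache hit: no decompression needed
	}
	c.decompressions++
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	d, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	c.decompressed[name] = d
	return d, nil
}

// gz is a helper that gzip-compresses a string, standing in for a raw chunk
// as it would come out of the blob-level cache.
func gz(s string) []byte {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write([]byte(s))
	w.Close()
	return buf.Bytes()
}

func main() {
	c := &fileCache{decompressed: map[string][]byte{}}
	blob := gz("contents of /etc/hostname")
	d1, _ := c.read("/etc/hostname", blob)
	d2, _ := c.read("/etc/hostname", blob) // second read hits the cache
	fmt.Println(string(d1) == string(d2), c.decompressions)
}
```

The duplication mentioned above is visible here: the same bytes end up stored once compressed (in the blob-level cache) and once decompressed (here).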

The ideal implementation would be:

  • having one cache in *Reader.file for decompressed file contents, and
  • enabling it to fetch and cache a few neighbouring KB/MB in addition to the range required by the kernel.

Maybe we need to fork and patch the stargz lib for this.
