
Memory leak when running the KinD example #88

Closed
sequix opened this issue Apr 29, 2020 · 5 comments
Labels
bug Something isn't working

Comments


sequix commented Apr 29, 2020

Environment:
CentOS Linux release 7.7.1908 (Core)
Linux instance-aai99beh 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Docker version 19.03.8, build afacb8b

After running the KinD example on my server, the memory usage of stargz-snapshotter grew exponentially. In the end, I had to reboot my server...

[attached screenshot: memory usage over time]


ktock commented May 7, 2020

@sequix Sorry for the late reply (I was on a one-week vacation). It seems the filesystem uses too much memory. Could you try again after #89 is merged?


sequix commented May 7, 2020

No worries, I took a week off too ^v^.

ktock added the bug label on May 8, 2020

ktock commented May 8, 2020

@sequix Memory usage has been improved in #89. Can you try this again?


sequix commented May 8, 2020

Tried it again; no more memory leak. Great job.

I wonder whether this snapshotter caches the original OCI image as well, and if so, how?

@sequix sequix closed this as completed May 11, 2020

ktock commented May 11, 2020

@sequix Great! Thanks for retrying!

I wonder whether this snapshotter caches the original OCI image as well, and if so, how?

Yes, we cache layer data at chunk granularity and have caches for several Readers used in this snapshotter. To understand them, let's look at the Readers involved when a file on the filesystem is read.

  1. *Reader.file.ReadAt corresponding to the inode is called. It reads decompressed file contents from the stargz lib (*github.com/google/crfs/stargz.fileReader.ReadAt) and returns them to the kernel.
  2. *github.com/google/crfs/stargz.fileReader recognizes the required files and gzip boundaries in the layer. It fetches the gzip-compressed chunk of the layer via *Remote.blob.ReadAt and decompresses it.
  3. *Remote.blob fetches chunks of the layer blob from the registry. We want to minimize the number of requests, so *Remote.blob.ReadAt also fetches and caches a few neighbouring KB/MB of data in addition to the required chunk range.
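As a rough illustration of step 3, here is a minimal sketch of a blob reader that caches chunk-aligned ranges so that neighbouring reads avoid extra registry requests. All names here are hypothetical and do not match the real stargz-snapshotter code; the registry fetch is simulated by an in-memory byte slice.

```go
package main

import (
	"fmt"
	"io"
)

// remoteBlob mimics the role of *Remote.blob: it serves ranges of the layer
// blob and caches whole chunkSize-aligned chunks, so a later read of a
// neighbouring range hits the cache instead of issuing another request.
type remoteBlob struct {
	data      []byte           // stands in for the remote layer blob
	chunkSize int64            // granularity of fetching and caching
	cache     map[int64][]byte // chunk-aligned offset -> chunk contents
	fetches   int              // counts simulated registry requests
}

func (b *remoteBlob) ReadAt(p []byte, off int64) (int, error) {
	n := 0
	for n < len(p) {
		cur := off + int64(n)
		base := cur - cur%b.chunkSize // align down to the chunk boundary
		chunk, ok := b.cache[base]
		if !ok {
			// Simulated HTTP Range request for one whole chunk.
			b.fetches++
			end := base + b.chunkSize
			if end > int64(len(b.data)) {
				end = int64(len(b.data))
			}
			chunk = b.data[base:end]
			b.cache[base] = chunk
		}
		c := copy(p[n:], chunk[cur-base:])
		if c == 0 {
			return n, io.EOF
		}
		n += c
	}
	return n, nil
}

func main() {
	blob := &remoteBlob{
		data:      []byte("hello, stargz chunked layer blob!"),
		chunkSize: 8,
		cache:     map[int64][]byte{},
	}
	p := make([]byte, 5)
	blob.ReadAt(p, 0) // fetches chunk [0,8) in one request
	fmt.Printf("%s\n", p)
	blob.ReadAt(p, 3) // neighbouring read: served from the cached chunk
	fmt.Println("fetches:", blob.fetches)
}
```

Because the second read falls inside the already-cached chunk, the fetch counter stays at 1: the extra bytes fetched with the first request pay off on the neighbouring access.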

We have two caches: one in *Reader.file and one in *Remote.blob. *Reader.file caches decompressed file contents, while *Remote.blob caches raw (gzip-compressed) chunks of layers. Obviously there is duplication between these caches, and we should integrate them into a single cache in the future. But in this post, I describe why we currently have two separate caches.

As mentioned above, we want to minimize the number of requests to registries by fetching and caching a few neighbouring KB/MB of data with a single HTTP Range request, in addition to the range required by the kernel. But this is difficult for the stargz lib client (*Reader.file), because the client can only see the targeted file entry through the Reader (*github.com/google/crfs/stargz.fileReader) passed by the stargz lib; the offsets/sizes of neighbouring file entries (and gzip boundaries) are invisible to it.

To solve this, we implemented another cache below the stargz lib (*Remote.blob) for raw (gzip-compressed) data, which lets us view the whole layer blob and fetch and cache neighbouring additional bytes. *Remote.blob is agnostic to file/gzip boundaries, so it cannot decompress data. The decompression is done in the stargz lib (*github.com/google/crfs/stargz.fileReader.ReadAt), and the decompressed contents of the target file are cached in *Reader.file. This reduces decompression time on later accesses.
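The decompressed-content side of this design can be sketched as follows. The names are hypothetical and the real *Reader.file logic differs; the point is only that a hit in the decompressed cache skips the gzip step entirely on later accesses.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// fileCache mimics the role of *Reader.file's cache: it stores decompressed
// file contents, so repeated reads of the same file avoid re-decompression.
type fileCache struct {
	mu             sync.Mutex
	decompressed   map[string][]byte // file name -> decompressed contents
	decompressions int               // counts actual gzip decompressions
}

func (c *fileCache) read(name string, compressed []byte) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if d, ok := c.decompressed[name]; ok {
		return d, nil // cache hit: no decompression needed
	}
	c.decompressions++
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	d, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	c.decompressed[name] = d
	return d, nil
}

// gz is a helper that gzip-compresses a string, standing in for a raw chunk
// as it would come out of the blob-level cache.
func gz(s string) []byte {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write([]byte(s))
	w.Close()
	return buf.Bytes()
}

func main() {
	c := &fileCache{decompressed: map[string][]byte{}}
	blob := gz("contents of /etc/hostname")
	d1, _ := c.read("/etc/hostname", blob)
	d2, _ := c.read("/etc/hostname", blob) // second read hits the cache
	fmt.Println(string(d1) == string(d2), c.decompressions)
}
```

The duplication mentioned above is visible here: the same bytes end up stored once compressed (in the blob-level cache) and once decompressed (here).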

The ideal implementation would be:

  • having one cache in *Reader.file for decompressed file contents, and
  • enabling it to fetch and cache a few neighbouring KB/MB in addition to the range required by the kernel.

Maybe we need to fork and patch the stargz lib for this.
