layer resolver: Avoid many cache misses when many images are pulled in parallel #600
When `filesystem.Mount` is called for the first layer in an image, stargz-snapshotter resolves all layers in that image in parallel and caches the resolved layers' metadata in an LRU cache. When `filesystem.Mount` is called for the neighbouring layers of that image, the cached layers can be used to speed up the mounts. However, when many images are pulled in parallel, layers are quickly evicted from the LRU cache by the other image pulls, and many cache misses happen. This results in high resource consumption (e.g. fds), many duplicated requests to the registry, etc.
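To make the eviction problem concrete, here is a minimal Go sketch. `lruCache` and `simulate` are hypothetical names, not stargz-snapshotter code, and the model is deliberately simplified: the first mount of every image resolves and caches all of that image's layers before any neighbouring layer is mounted, which roughly approximates many pulls running in parallel. The cache capacity and layer counts are made-up parameters.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a hypothetical, minimal LRU cache used only to illustrate
// the eviction behaviour; it is not stargz-snapshotter's implementation.
type lruCache struct {
	capacity int
	ll       *list.List               // front = most recently used key
	items    map[string]*list.Element // key -> list element holding the key
}

func newLRU(capacity int) *lruCache {
	return &lruCache{capacity: capacity, ll: list.New(), items: make(map[string]*list.Element)}
}

// get reports whether key is cached and marks it as recently used.
func (c *lruCache) get(key string) bool {
	if e, ok := c.items[key]; ok {
		c.ll.MoveToFront(e)
		return true
	}
	return false
}

// add inserts key, evicting the least recently used key when over capacity.
func (c *lruCache) add(key string) {
	if e, ok := c.items[key]; ok {
		c.ll.MoveToFront(e)
		return
	}
	c.items[key] = c.ll.PushFront(key)
	if c.ll.Len() > c.capacity {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
}

// simulate counts cache misses for the neighbouring-layer mounts.
func simulate(capacity, images, layersPerImage int) (misses int) {
	c := newLRU(capacity)
	key := func(img, layer int) string { return fmt.Sprintf("img%d/layer%d", img, layer) }
	// Phase 1: the first Mount of every image resolves and caches all its layers.
	for img := 0; img < images; img++ {
		for l := 0; l < layersPerImage; l++ {
			c.add(key(img, l))
		}
	}
	// Phase 2: the neighbouring layers (1..n-1) are mounted and look up the cache.
	for img := 0; img < images; img++ {
		for l := 1; l < layersPerImage; l++ {
			if !c.get(key(img, l)) {
				misses++
				c.add(key(img, l)) // a miss means resolving the layer again
			}
		}
	}
	return misses
}

func main() {
	fmt.Println("1 image, capacity 10:", simulate(10, 1, 10))  // prints "1 image, capacity 10: 0"
	fmt.Println("7 images, capacity 10:", simulate(10, 7, 10)) // prints "7 images, capacity 10: 63"
}
```

With a single image every neighbouring mount hits the cache, but with 7 images of 10 layers sharing a 10-entry cache, all 63 neighbouring mounts miss and each layer has to be resolved again.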
This commit solves this by using a TTL-based cache instead of an LRU cache. A TTL cache has no size limit and evicts each element once its TTL expires. This avoids the quick evictions and frequent cache misses described above when many images are pulled in parallel. When `filesystem.Mount` is called for a layer of an image, mounts of the neighbouring layers usually follow soon after. And once a layer's mount completes, reuse of that layer is managed on the snapshotter's side, not by the filesystem, so the layer does not need to be cached long-term. A TTL-based cache is therefore a better fit than an LRU cache here.

Max fd consumption comparison
The following client command mounts 7 images (74 snapshots) in parallel.
Each snapshot consumes 1 fd for FUSE and at most 2 fds for registry connections.
So stargz-snapshotter is expected to consume around `3 * number_of_snapshots` fds (about 3 * 74 = 222 here), plus miscellaneous fds for the local cache, etc. The main branch consumes many more fds than expected because of frequent cache misses and duplicated layer resolution.