
Cache and Prefetch

Nydus stores blobs in an OCI-compatible registry, in OSS (Aliyun Object Storage Service), or on a local file system. The local file system storage backend does not restrict blobs to a local disk: they can also be stored on a NAS device, as long as it is accessible through POSIX file system interfaces.

Nydus divides each regular file into 1 MB segments, and each segment is compressed with a configurable compressor such as lz4. The compressed segments are called chunks, and they are laid out contiguously within a blob file.
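To make the chunk layout concrete, here is a minimal Python sketch. It assumes a blob is simply compressed segments appended back to back, plus a side table recording where each chunk starts. It uses zlib from the standard library as a stand-in for lz4, and the function names are hypothetical; real nydus blobs carry metadata that is not modeled here.

```python
import zlib
from typing import List, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB segments, as described above

# (blob_offset, compressed_size, uncompressed_size) for one chunk
Chunk = Tuple[int, int, int]


def build_blob(data: bytes, chunk_size: int = CHUNK_SIZE) -> Tuple[bytes, List[Chunk]]:
    """Split `data` into fixed-size segments, compress each one and append
    the compressed chunks contiguously into a single blob."""
    blob = bytearray()
    chunks: List[Chunk] = []
    for start in range(0, len(data), chunk_size):
        segment = data[start:start + chunk_size]
        compressed = zlib.compress(segment)  # stand-in for lz4
        chunks.append((len(blob), len(compressed), len(segment)))
        blob += compressed  # next chunk begins right after this one
    return bytes(blob), chunks


def read_chunk(blob: bytes, chunk: Chunk) -> bytes:
    """Locate one chunk by its recorded offset and decompress it."""
    offset, compressed_size, _ = chunk
    return zlib.decompress(blob[offset:offset + compressed_size])
```

Because every chunk's offset and size are known, a reader can fetch any single chunk, or a run of adjacent chunks, with one ranged read of the blob.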

Prefetch works by merging several backend reads into a single read that spans a wider range. It also takes advantage of the fact that users usually know which files are most likely to be read when a container starts.
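A minimal sketch of read merging follows. It assumes each pending read is an (offset, length) pair within the same blob and that only touching or overlapping ranges are merged; a real implementation may also tolerate small gaps. The name merge_reads is hypothetical, and its merging_size argument plays the role of the Rafs configuration option of the same name.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) within one blob


def merge_reads(ranges: List[Range], merging_size: int) -> List[Range]:
    """Coalesce touching or overlapping reads into larger backend requests,
    never letting a merged request grow beyond `merging_size` bytes.
    A single range that is already larger than the limit is kept as is."""
    merged: List[Range] = []
    for off, length in sorted(ranges):
        if merged:
            m_off, m_len = merged[-1]
            m_end = m_off + m_len
            new_end = max(m_end, off + length)
            # Merge only if this range starts at or before the current end
            # and the combined request stays within the size limit.
            if off <= m_end and new_end - m_off <= merging_size:
                merged[-1] = (m_off, new_end - m_off)
                continue
        merged.append((off, length))
    return merged
```

For example, three adjacent 4 KiB reads at offsets 0, 4096 and 8192 become one 12 KiB backend request when the limit is 128 KiB, while a distant read stays separate.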

Prefetch policy:

When converting or creating a nydus image, we can tell nydus-image which files should be prefetched when nydusd starts. This is done with the option --prefetch-policy <prefetch-policy>, which takes one of three values:

  • fs
  • blob
  • none

When --prefetch-policy is given, nydus-image reads from stdin a list of files proposed for prefetch. The list may contain both regular files and directories, and may even contain a file that belongs to a directory that is also in the list.
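One way to picture how such a list could be normalized is sketched below. It assumes an entry is redundant when one of its ancestor directories is also listed. normalize_prefetch_list is a hypothetical helper, not nydus-image's actual logic. It only compares paths and never touches the file system.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def normalize_prefetch_list(lines: Iterable[str]) -> List[str]:
    """Parse one path per line, drop blanks and duplicates, and drop entries
    already covered by an ancestor directory appearing in the same list."""
    paths: List[PurePosixPath] = []
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        p = PurePosixPath(line)  # also normalizes a trailing "/"
        if p not in seen:
            seen.add(p)
            paths.append(p)
    # p.parents yields every ancestor, e.g. /etc/nginx -> /etc, /
    return [str(p) for p in paths if not any(a in seen for a in p.parents)]
```

Called as normalize_prefetch_list(sys.stdin), it would consume the same one-path-per-line input that nydus-image reads from stdin.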

Note that fs_prefetch must be enabled in the rafs configuration file for prefetch to take effect.

1. File System Level

Nydus issues prefetch requests to the backend and pulls the needed chunks into local storage, so that subsequent read IO can hit the blobcache that prefetch has already filled. For file system level prefetch, the requests are issued from the Rafs layer, which makes it easier to take the on-disk file layout and the relationships between files and directories into account. Prefetch built on top of the file system is therefore flexible and nimble.

Prefetch is configured through the following options in the Rafs configuration file:

  • threads_count

    Prefetch runs on its own background workers. This parameter sets how many threads nydus starts to issue reads to the backend storage in parallel.

  • merging_size

    The upper limit, in bytes, on the size of a single request to the backend storage.

  • bandwidth_rate

    In bytes per second. To mitigate possible backend bandwidth contention, prefetch can be given a bandwidth rate limit. Note that bandwidth_rate limits the aggregate backend bandwidth consumed by all the threads configured by threads_count, so with a low bandwidth_rate, adding more prefetch threads may bring no benefit.
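Because bandwidth_rate caps the aggregate rather than the per-thread bandwidth, the limiter has to be shared by every worker. Below is a minimal token-bucket sketch. It assumes bandwidth_rate is in bytes per second and that each worker calls acquire() with a request's size before sending it. SharedRateLimiter and its burst parameter are hypothetical and not taken from nydus.

```python
import threading
import time


class SharedRateLimiter:
    """Token bucket shared by all prefetch workers, so `rate` bounds the
    total bandwidth of every thread combined."""

    def __init__(self, rate: int, burst: int):
        if rate <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = rate          # bytes per second (bandwidth_rate)
        self.capacity = burst     # most bytes that may be sent in one burst
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, nbytes: int) -> None:
        """Block until `nbytes` tokens are available, then consume them."""
        if nbytes > self.capacity:
            raise ValueError("request larger than burst capacity")
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at capacity.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                wait = (nbytes - self.tokens) / self.rate
            time.sleep(wait)  # sleep outside the lock so others can refill
```

With any number of workers sharing one limiter, total throughput cannot exceed roughly burst + rate × elapsed bytes. This is why extra threads stop helping once the rate limit, rather than backend latency, is the bottleneck. The burst should be at least merging_size so that a full-size request can ever be admitted.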

An example rafs configuration file follows (only $.fs_prefetch is shown; other properties are omitted):

{
  "fs_prefetch": {
    "enable": true,
    "threads_count": 4,
    "merging_size": 131072,
    "bandwidth_rate": 10485760
  }
}
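The values above can be turned into a quick back-of-the-envelope budget. This sketch assumes bandwidth_rate is in bytes per second. prefetch_budget is a hypothetical helper that only does arithmetic on the JSON shown.

```python
import json

CONFIG = """
{
  "fs_prefetch": {
    "enable": true,
    "threads_count": 4,
    "merging_size": 131072,
    "bandwidth_rate": 10485760
  }
}
"""


def prefetch_budget(config_text: str) -> dict:
    """Derive rough throughput figures from the fs_prefetch section."""
    p = json.loads(config_text)["fs_prefetch"]
    rate = p["bandwidth_rate"]
    return {
        # How many requests of the maximum merged size fit under the limit each second.
        "full_size_requests_per_sec": rate // p["merging_size"],
        # Average share of the aggregate limit per thread when all are busy.
        "per_thread_bytes_per_sec": rate / p["threads_count"],
    }
```

With this configuration, 80 full 128 KiB requests fit under the 10 MiB/s limit each second, and when all four threads are busy each averages about 2.5 MiB/s.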

1.1 Prefetch Hints

nydus-image statically and permanently writes a list of inode numbers into a compact prefetch table in the bootstrap. When the image is mounted, the prefetch table tells nydus which files to prefetch from the storage backend.
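To illustrate what a compact table of inode numbers might look like on disk, here is a sketch. It assumes each entry is a little-endian 32-bit inode number and that the table is zero-padded to an 8-byte boundary. This layout is illustrative, not a statement of the actual RAFS format. The same applies to the assumption that the entry count is stored elsewhere in the bootstrap.

```python
import struct
from typing import List


def encode_prefetch_table(inodes: List[int]) -> bytes:
    """Pack inode numbers as little-endian u32 values, zero-padded to an
    8-byte boundary (an assumed layout for illustration)."""
    raw = struct.pack(f"<{len(inodes)}I", *inodes)
    padding = (-len(raw)) % 8
    return raw + b"\x00" * padding


def decode_prefetch_table(buf: bytes, count: int) -> List[int]:
    """Read back `count` inode numbers; the count itself is assumed to be
    recorded in some other bootstrap field."""
    return list(struct.unpack_from(f"<{count}I", buf))
```

At four bytes per entry, a table listing a few thousand files is only some kilobytes, so reading it at mount time costs little.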

1.2 Dynamically Specified Files

Thanks to the rafs disk layout, even if no prefetch hint was given when the nydus image was created, we can still pass the option --prefetch-files <prefetch-files>... to nydusd. rafs then prefetches the listed files when the mount is initiated and, where possible, merges backend read requests to reduce latency. A good practice is to list directories, because the files inside a directory are more likely to be merged, which raises prefetch efficiency. This way of initiating prefetch does not conflict with the prefetch hints stored in the bootstrap prefetch table: rafs first loads the prefetch table and then takes the specified file list into account.
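The ordering described above, with prefetch-table entries first and then the files given with --prefetch-files, can be sketched as follows. The path-to-inode lookup dictionary and build_prefetch_queue are hypothetical stand-ins for rafs internals. Skipping already-queued inodes and ignoring unknown paths are assumptions about how duplicates and errors are handled.

```python
from typing import Dict, Iterable, List


def build_prefetch_queue(table_inodes: Iterable[int],
                         prefetch_files: Iterable[str],
                         lookup: Dict[str, int]) -> List[int]:
    """Return inode numbers to prefetch: prefetch-table entries first, then
    the files passed at mount time, skipping duplicates and unknown paths."""
    queue: List[int] = []
    seen = set()
    for ino in table_inodes:
        if ino not in seen:
            seen.add(ino)
            queue.append(ino)
    for path in prefetch_files:
        ino = lookup.get(path)  # None if the path is not in the image
        if ino is not None and ino not in seen:
            seen.add(ino)
            queue.append(ino)
    return queue
```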

1.3 Prefetch policy (future work)

Currently, Nydus can only prefetch data from the backend given an explicit hint, either from the prefetch table or from a command line flag at startup. None of the following globally configured prefetch policies is available yet:

  • Prefetch all the blobs entirely from the backend, slowly and at low priority.
  • Block-level readahead triggered by user IO.
  • Prefetch the parent directory when one of its children is read.

2. Blob Level

Unlike file system level prefetch, blob level prefetch directly prefetches a contiguous region of a blob when nydusd starts. This procedure is not aware of files, directories, or chunk structures. When the nydus image is created, a range descriptor composed of readahead_offset and readahead_length is written into the bootstrap. Blob level prefetch does not cache any data into the blobcache or any other kind of nydus cache; it works at the StorageBackend level, which sits below RafsCache. For now, blob level prefetch only benefits the LocalFs backend, which can use the Linux system call readahead(2) to load readahead_length bytes starting at readahead_offset.
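A minimal sketch of the LocalFs readahead step follows. Python's standard library does not wrap readahead(2), so this substitutes os.posix_fadvise with POSIX_FADV_WILLNEED. That call likewise asks the kernel to start reading the given range into its page cache. The function name blob_readahead is hypothetical.

```python
import os


def blob_readahead(path: str, readahead_offset: int, readahead_length: int) -> bool:
    """Hint the kernel to prefetch [readahead_offset,
    readahead_offset + readahead_length) of a local blob file.

    Uses posix_fadvise(POSIX_FADV_WILLNEED) as a stand-in for readahead(2).
    Returns False on platforms without posix_fadvise (e.g. macOS)."""
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, readahead_offset, readahead_length,
                         os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
    return True
```

Like readahead(2), the hint only warms the kernel's cache for the file. Nothing is copied into nydus's own blobcache, which matches the StorageBackend-level behavior described above.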

Sequence Diagram demonstrating prefetch workflow

[Figure: Container Cold Startup]