Instead of calling pread multiple times on a shared io context, mmap up to 'maxmmap' bytes of a file and use memcpy to serve read requests that fall entirely within the mapped area. This is intended to help parallel executable load time. The default value of 'maxmmap' is 4MB. It can be changed via the diod.conf maxmmap variable or the diod server -m,--maxmmap command line option.
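
The read path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this patch: struct mapped_file, map_prefix() and mapped_read() are hypothetical names, and diod's real IOCtx carries more state.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-open-file state (not diod's actual IOCtx). */
struct mapped_file {
    int    fd;      /* descriptor used for the pread fallback */
    void  *map;     /* start of the mapped prefix, or NULL if unmapped */
    size_t maplen;  /* bytes mapped: min(file size, maxmmap) */
};

/* Map at most maxmmap bytes from the start of the file, read-only.
 * Returns 0 on success (including an empty file, which is left unmapped)
 * and -1 on error. */
int map_prefix(struct mapped_file *mf, int fd, size_t maxmmap)
{
    struct stat sb;
    size_t len;
    void *p;

    assert(mf != NULL);
    mf->fd = fd;
    mf->map = NULL;
    mf->maplen = 0;
    if (fstat(fd, &sb) < 0)
        return -1;
    if (sb.st_size == 0 || maxmmap == 0)
        return 0;
    len = (size_t)sb.st_size < maxmmap ? (size_t)sb.st_size : maxmmap;
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    mf->map = p;
    mf->maplen = len;
    return 0;
}

/* Serve the read with memcpy when [offset, offset + count) lies entirely
 * inside the mapping; otherwise fall back to pread.
 * Returns the number of bytes read, or -1 on error. */
ssize_t mapped_read(const struct mapped_file *mf, void *buf,
                    size_t count, off_t offset)
{
    assert(mf != NULL && buf != NULL);
    if (mf->map != NULL && offset >= 0
        && (size_t)offset <= mf->maplen
        && count <= mf->maplen - (size_t)offset) {
        memcpy(buf, (const char *)mf->map + offset, count);
        return (ssize_t)count;
    }
    return pread(mf->fd, buf, count, offset);
}
```

With the default of 4MB, a caller in this sketch would pass 4 * 1024 * 1024 as maxmmap; reads that straddle or lie beyond the mapped prefix still go through pread.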
This is a candidate performance/scalability enhancement for the case where a parallel job tries to load many shared libraries, or otherwise read the same files simultaneously. We were running out of file descriptors in the Pynamic test case, for example, when running with ~2048 tasks opening around 500 files each.

First, upon walking a fid to a pathname, put the pathname in a hash and create references to it rather than recreating it in every fid. This should save a bit of memory. It also provides a place where fids associated with the same file can coordinate sharing.

Next, encapsulate the file descriptor and other "open file" state formerly stored in the fid in an IOCtx struct, linked to the fid. In addition to linking to the fid, add it to a linked list on the Path struct.

If a path hashes to an existing entry, and a candidate IOCtx exists on the path's list, evaluate the criteria for sharing the IOCtx. The criteria are:
- must have same path
- must refer to a regular file (not directory, etc)
- must be opened by same user
- must be opened with identical open flags
- must be opened O_RDONLY (no writing)

The path hash can be viewed by monitoring ctl:files, which has the format:
<refs> <tot open> <act open> <path>

Advisory locking would have been complicated by this change were we not promoting record locking to full file locking. Since only read-only file descriptors are shared, there should be no issue with promotion of read locks to write locks.

There is a global feature flag to enable sharing that is turned on by default in this patch. TODO: add config file support to enable on a per-file system basis, and make it off by default.
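
The sharing criteria listed above can be expressed as a single predicate. The sketch below is only an illustration: struct open_req and ioctx_can_share() are hypothetical names, and the fields are assumed summaries of state that diod actually keeps in its Path and IOCtx structs.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical summary of one open of a file (not a diod type). */
struct open_req {
    const char *path;   /* pathname the fid was walked to */
    mode_t      mode;   /* st_mode of the file, used for the type check */
    uid_t       uid;    /* user performing the open */
    int         flags;  /* flags passed to open(2) */
};

/* Return true if an IOCtx opened as 'a' could be shared by open 'b'. */
bool ioctx_can_share(const struct open_req *a, const struct open_req *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->path, b->path) == 0            /* same path */
        && S_ISREG(a->mode) && S_ISREG(b->mode)     /* regular file */
        && a->uid == b->uid                         /* same user */
        && a->flags == b->flags                     /* identical flags */
        && (a->flags & O_ACCMODE) == O_RDONLY;      /* read-only */
}
```

Because the flags must be identical, checking the access mode of one side is enough to establish that both opens are O_RDONLY.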
We set req->flushreq while walking the work queue with srv->lock held. The original reply is sent while req is still in the work queue, so there is a race: flushreq could be set just after the postprocess function tests it, and the reply would be discarded. Defer the flushreq reply until after the req has been removed from the work queue.
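
The ordering in this fix can be illustrated with a simplified model. Everything here is an assumption made for illustration: struct req, struct srv, send_reply(), flush_request() and finish_request() are hypothetical stand-ins, work queue membership is reduced to a flag, and replying immediately when the target is no longer queued is a guess rather than diod's documented behavior.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct req {
    bool        in_workq;      /* stands in for work queue membership */
    struct req *flushreq;      /* pending flush request, if any */
    int         replies_sent;  /* counts replies, for demonstration only */
};

struct srv {
    pthread_mutex_t lock;      /* protects the work queue and flushreq */
};

/* Placeholder for writing a response to the connection. */
void send_reply(struct req *r)
{
    r->replies_sent++;
}

/* Flush path: record the flush on the target while holding the lock.
 * Returns true if the flush reply was deferred to finish_request(). */
bool flush_request(struct srv *s, struct req *target, struct req *flush)
{
    bool deferred = false;

    assert(s != NULL && target != NULL && flush != NULL);
    pthread_mutex_lock(&s->lock);
    if (target->in_workq) {
        target->flushreq = flush;
        deferred = true;
    }
    pthread_mutex_unlock(&s->lock);
    if (!deferred)
        send_reply(flush);     /* assumed: target already finished */
    return deferred;
}

/* Completion path: flushreq is examined only in the same critical
 * section that removes the request from the work queue, so a flush
 * that found the request queued cannot be missed. */
void finish_request(struct srv *s, struct req *r)
{
    struct req *flush;

    assert(s != NULL && r != NULL);
    send_reply(r);             /* original reply, sent while still queued */
    pthread_mutex_lock(&s->lock);
    r->in_workq = false;
    flush = r->flushreq;
    r->flushreq = NULL;
    pthread_mutex_unlock(&s->lock);
    if (flush != NULL)
        send_reply(flush);     /* deferred flush reply */
}
```

The key point of the sketch is that the test of flushreq and the removal from the queue happen under one acquisition of the lock, rather than the test happening earlier while the request is still visible to the flush path.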