FUSE file system with transparent access to zip files as if they were folders.

christophgil/ZipROFS

ZipROFS

ZipROFS is a FUSE file system that acts as a pass-through to another file system, except that it expands zip files like folders and allows direct, transparent access to their contents.
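The pass-through idea can be illustrated by how a path below the mount point is split into a real path on the underlying file system plus a path inside an archive. This is a hypothetical Python sketch, not ZipROFS code: `split_zip_path` is an illustrative name, and it assumes that the first path component that is a zip file on disk marks where the archive-internal path begins. Because it stops at the first archive, zip files nested inside it stay ordinary members.

```python
import os
import zipfile


def split_zip_path(root: str, virtual_path: str):
    """Hypothetical helper: map a mount-relative path to (real_path, member).

    Walks the path components from the top. The first component that is a
    real zip file on disk ends the pass-through part; the remainder is a
    path inside that archive ("" means the archive's top level). If no zip
    file is found, member is None and the path is a plain pass-through.
    """
    parts = [p for p in virtual_path.split("/") if p]
    real = root
    for i, part in enumerate(parts):
        real = os.path.join(real, part)
        if os.path.isfile(real) and zipfile.is_zipfile(real):
            return real, "/".join(parts[i + 1:])
    # No archive on the way: serve the file from the underlying file system.
    return real, None
```

For example, `/test.zip/folder/subfolder/file.txt` resolves to the archive `root/test.zip` and the member `folder/subfolder/file.txt`.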

We created a branch of ZipROFS to adapt it to the needs of mass spectrometry software. Our mass spectrometry records are stored in ZIP files:

File tree with zip files on NAS server:
 ├── brukertimstof
 │   └── 202302
 │       ├── 20230209_hsapiens_Sample_001.d.Zip
 │       ├── 20230209_hsapiens_Sample_002.d.Zip
 │       └── 20230209_hsapiens_Sample_003.d.Zip

...

With the original version of ZipROFS, these archives would appear as folders ending in .d.Zip. However, the software requires folders ending in .d, like this:

Virtual file tree presented by ZipROFS:
 ├── brukertimstof
 │   └── 202302
 │       ├── 20230209_hsapiens_Sample_001.d
 │       │   ├── analysis.tdf
 │       │   └── analysis.tdf_bin
 │       ├── 20230209_hsapiens_Sample_002.d
 │       │   ├── analysis.tdf
 │       │   └── analysis.tdf_bin
 │       └── 20230209_hsapiens_Sample_003.d
 │           ├── analysis.tdf
 │           └── analysis.tdf_bin
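The renaming from `.d.Zip` to `.d` can be sketched as a pair of name-mapping functions: one applied when listing a directory, one applied when looking up a path component. This is a minimal Python illustration, not the branch's actual code; `to_virtual_name` and `to_real_name` are hypothetical names, and the case-sensitive match on the suffix `.d.Zip` is an assumption based on the file names shown above.

```python
# Hypothetical suffix rule: only archives named "*.d.Zip" are renamed,
# matched case-sensitively as in the file names above.
ZIP_SUFFIX = ".Zip"
DIR_SUFFIX = ".d"


def to_virtual_name(name: str) -> str:
    """Name shown in a directory listing for an entry on the NAS."""
    if name.endswith(DIR_SUFFIX + ZIP_SUFFIX):
        return name[: -len(ZIP_SUFFIX)]  # strip ".Zip", keep ".d"
    return name


def to_real_name(virtual: str, siblings: set) -> str:
    """Reverse mapping: find the real entry among the directory's real names."""
    candidate = virtual + ZIP_SUFFIX
    if virtual.endswith(DIR_SUFFIX) and candidate in siblings:
        return candidate  # assumption: the archive wins over a same-named folder
    return virtual
```

Keeping the mapping in two small pure functions makes it easy to apply the same rule consistently in both directory listings and path lookups.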
 

A current problem is that computations run more slowly on ZipROFS than on conventional file systems.

One reason lies in the closed-source shared library timsdata.dll. Reading proprietary mass spectrometry files with this library generates a huge number of file system requests, and each of these requests has to cross the boundary between user space and the kernel. Another reason for the reduced performance is that the files are not read sequentially.
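Why non-sequential reads hurt can be illustrated with Python's standard `zipfile` module. A deflated zip member is one compressed stream, so `ZipExtFile.seek()` can only move forward by decompressing and discarding bytes; seeking backwards restarts decompression at the beginning of the member. The sketch below is a hypothetical `read_at` helper (not ZipROFS code) that emulates a positioned read, as a FUSE `read()` handler must; the member name `analysis.tdf_bin` is only an example.

```python
import zipfile


def read_at(zip_path: str, member: str, offset: int, size: int) -> bytes:
    """Hypothetical helper: emulate pread() on a zip member.

    The member is opened fresh on each call, so for a deflated member every
    call decompresses `offset` bytes before it can return any data. Random
    access patterns therefore cost far more than sequential ones.
    """
    with zipfile.ZipFile(zip_path) as zf, zf.open(member) as f:
        f.seek(offset)  # forward seek: decompress and discard `offset` bytes
        return f.read(size)
```

Keeping the member open between calls helps only while reads move forward; any jump backwards falls back to decompressing from the start.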

To solve the performance problem, we take two approaches:

  • Re-implementing ZipROFS in the C language: ZIPsFS.

  • Intercepting calls to the file API with the LD_PRELOAD technique, filtering those calls, and caching directory listings: cache_readdir_stat.
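The idea behind caching directory listings can be sketched in Python. The real cache_readdir_stat works in C by intercepting calls through LD_PRELOAD, which this sketch does not attempt; it only shows the memoization that saves repeated round trips through the file system. The class name `DirStatCache` and the time-to-live expiry are assumptions for illustration.

```python
import os
import time


class DirStatCache:
    """Hypothetical memoizing wrapper for listdir() and stat() results.

    Entries expire after `ttl` seconds (an assumed policy) so that changes
    on the underlying file system eventually become visible.
    """

    def __init__(self, ttl: float = 60.0):
        self.ttl = ttl
        self._listings = {}  # path -> (timestamp, sorted list of names)
        self._stats = {}     # path -> (timestamp, os.stat_result)

    def _fresh(self, entry) -> bool:
        return entry is not None and time.monotonic() - entry[0] < self.ttl

    def listdir(self, path: str) -> list:
        entry = self._listings.get(path)
        if not self._fresh(entry):
            entry = (time.monotonic(), sorted(os.listdir(path)))
            self._listings[path] = entry
        return list(entry[1])  # copy, so callers cannot mutate the cache

    def stat(self, path: str) -> os.stat_result:
        entry = self._stats.get(path)
        if not self._fresh(entry):
            entry = (time.monotonic(), os.stat(path))
            self._stats[path] = entry
        return entry[1]
```

A short TTL keeps results close to the live file system; a long one trades freshness for fewer requests, which suits archives on a NAS that rarely change.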

Dependencies

  • FUSE
  • fusepy

Limitations

  • Read only
  • Nested zip files are not expanded; they remain ordinary files

Example usage

To mount, run ziprofs.py:

$ ./ziprofs.py ~/root ~/mount -o allowother,cachesize=2048

Example results:

$ tree root
root
├── folder
├── test.zip
└── text.txt

$ tree mount
mount
├── folder
├── test.zip
│   ├── folder
│   │   ├── emptyfile
│   │   └── subfolder
│   │       └── file.txt
│   ├── script.sh
│   └── text.txt
└── text.txt
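To reproduce this example, the root directory can be created with a short Python script. This is a hypothetical helper (`make_example_root` is an illustrative name); the file contents are placeholders, since the listing above shows only names. The explicit directory entries `folder/` and `folder/subfolder/` mirror what `zip -r` would store.

```python
import os
import zipfile


def make_example_root(root: str) -> None:
    """Hypothetical helper: create the example tree; contents are placeholders."""
    os.makedirs(os.path.join(root, "folder"), exist_ok=True)
    with open(os.path.join(root, "text.txt"), "w") as f:
        f.write("hello\n")
    with zipfile.ZipFile(os.path.join(root, "test.zip"), "w") as zf:
        zf.writestr("folder/", "")               # directory entry
        zf.writestr("folder/emptyfile", "")      # zero-length file
        zf.writestr("folder/subfolder/", "")     # directory entry
        zf.writestr("folder/subfolder/file.txt", "inner file\n")
        zf.writestr("script.sh", "#!/bin/sh\necho hi\n")
        zf.writestr("text.txt", "hello from the zip\n")
```

Usage: `make_example_root(os.path.expanduser("~/root"))`, then mount with the command above.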

You can later unmount it using:

$ fusermount -u ~/mount

Or:

$ umount ~/mount

Full help:

$ ./ziprofs.py -h
usage: ziprofs.py [-h] [-o options] [root] [mountpoint]

ZipROFS read only transparent zip filesystem.

positional arguments:
  root        filesystem root (default: None)
  mountpoint  filesystem mount point (default: None)

optional arguments:
  -h, --help  show this help message and exit
  -o options  comma separated list of options: foreground, debug, allowother, async, cachesize=N (default: {})
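The -o value is a comma-separated list of flags and key=value pairs. How ziprofs.py parses it internally is not shown here; the following hypothetical sketch (`parse_options` and `build_parser` are illustrative names) reproduces the documented command-line interface with `argparse`, treating the four documented flags as booleans and converting cachesize to an integer.

```python
import argparse

# Flags documented in the help text above.
KNOWN_FLAGS = {"foreground", "debug", "allowother", "async"}


def parse_options(opt_string: str) -> dict:
    """Hypothetical parser for a -o value such as "allowother,cachesize=2048"."""
    opts = {}
    for item in filter(None, (s.strip() for s in opt_string.split(","))):
        key, sep, value = item.partition("=")
        if sep:
            opts[key] = int(value) if key == "cachesize" else value
        elif key in KNOWN_FLAGS:
            opts[key] = True
        else:
            raise ValueError(f"unknown option: {key}")
    return opts


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="ZipROFS read only transparent zip filesystem.")
    p.add_argument("root", nargs="?", help="filesystem root")
    p.add_argument("mountpoint", nargs="?", help="filesystem mount point")
    p.add_argument("-o", metavar="options", dest="options", type=parse_options,
                   default={}, help="comma separated list of options")
    return p
```

Passing `parse_options` as the `type` means a malformed option string is reported by argparse as a usage error instead of failing later at mount time.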

The foreground and allowother options are passed directly to FUSE.

The debug option prints all syscall details to stdout.

By default, ZipROFS disables async reads to improve performance: FUSE can reorder async syscalls, which heavily impacts read speeds. If async reads are preferable, pass the async option when mounting.

The cachesize option sets the size of the in-memory zip file cache; it defaults to 1000.
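The README does not say what unit cachesize counts or how entries are evicted. The sketch below assumes it bounds the number of open `zipfile.ZipFile` objects and uses least-recently-used eviction; `ZipFileCache` is a hypothetical name. Reusing an open `ZipFile` avoids re-reading the archive's central directory, which `ZipFile` parses when it is opened for reading.

```python
import zipfile
from collections import OrderedDict


class ZipFileCache:
    """Hypothetical LRU cache holding at most `cachesize` open ZipFile objects."""

    def __init__(self, cachesize: int = 1000):
        self.cachesize = max(1, cachesize)  # never evict the handle just opened
        self._cache = OrderedDict()         # path -> zipfile.ZipFile

    def get(self, path: str) -> zipfile.ZipFile:
        zf = self._cache.get(path)
        if zf is not None:
            self._cache.move_to_end(path)  # mark as most recently used
            return zf
        zf = zipfile.ZipFile(path)  # reads the central directory once
        self._cache[path] = zf
        if len(self._cache) > self.cachesize:
            _, evicted = self._cache.popitem(last=False)  # least recently used
            evicted.close()
        return zf

    def close(self) -> None:
        for zf in self._cache.values():
            zf.close()
        self._cache.clear()
```

A larger cache keeps more archives open at once, trading memory and file descriptors for fewer re-opens when many archives are accessed alternately.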
