A work-in-progress, experimental, storage-agnostic filesystem representation.
This project began as a way of sharing state for a transportable agent system I started building as a hobby project between undergrad and grad school. I lost interest in that, but found the idea behind the storage was still worth having. The original was built in a combination of Python and C and used a custom binary format; this re-implementation in Go uses Protocol Buffers and eliminates the need for FFI.
A file in FFS is represented as a Merkle tree encoded in a content-addressable blob store. Unlike files in POSIX-style filesystems, all files in FFS have the same structure, consisting of binary content, children, and metadata. In other words, every "file" is also potentially a "directory", and vice versa.
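To make the Merkle-tree idea concrete, here is a minimal sketch in Go. It is not the FFS API: the Store and File types, the line-based text encoding, and the choice of SHA-256 as the content address are all assumptions made for illustration (FFS itself encodes nodes as protocol buffer messages).

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Store is a toy in-memory content-addressable blob store (hypothetical).
// The key of a blob is the hex-encoded SHA-256 digest of its content.
type Store map[string][]byte

// Put stores data and returns its content address.
func (s Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	s[key] = data
	return key
}

// File is a simplified stand-in for an FFS file: it may carry data,
// children, or both, so there is no separate "directory" type.
type File struct {
	Data     []byte
	Children map[string]*File
}

// Flush writes f and its descendants to s and returns the storage key of f.
// The key of a parent depends on the keys of its children, which is what
// makes the structure a Merkle tree.
func (f *File) Flush(s Store) string {
	var buf bytes.Buffer
	if len(f.Data) > 0 {
		fmt.Fprintf(&buf, "data %s\n", s.Put(f.Data))
	}
	names := make([]string, 0, len(f.Children))
	for name := range f.Children {
		names = append(names, name)
	}
	sort.Strings(names) // fixed order so equal trees encode identically
	for _, name := range names {
		fmt.Fprintf(&buf, "child %q %s\n", name, f.Children[name].Flush(s))
	}
	return s.Put(buf.Bytes())
}

func main() {
	s := Store{}
	root := &File{
		Data: []byte("a file with content and children"),
		Children: map[string]*File{
			"notes.txt": {Data: []byte("hello")},
			"empty":     {},
		},
	}
	fmt.Println("root key:", root.Flush(s))
	fmt.Println("blobs stored:", len(s))
}
```

Because each key is derived from content, two identical subtrees are stored once, and any change to a descendant produces a new key for every ancestor up to the root.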
Files are encoded in storage using wire-format protocol buffer messages as defined in wiretype.proto. The key messages are:
- A Node is the top-level encoding of a file. The storage key for a file is the content address (storage key) of its wire-encoded node message. An empty Node message is a valid encoding of an empty file with no children and no metadata.
- An Index records the binary content of a file, if any. An index records the total size of the file along with the sizes, offsets, and storage keys of its data blocks.
- A Child records the name and storage key of a child of a file. Children are ordered lexicographically by name.
Binary file content is stored in discrete blocks. The block size is not fixed, but varies over a (configurable) predefined range of sizes. Block boundaries are chosen by splitting the file data with a rolling hash, similar to the technique used in rsync or LBFS, and contents are stored as raw blobs.
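The splitting step can be sketched as follows. This is an illustration of content-defined chunking in general, not the splitter FFS uses: the Split function, the polynomial rolling hash, the window size, and the boundary rule (low bits of the hash all set) are assumptions chosen for brevity.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	window = 16  // bytes covered by the rolling hash (hypothetical value)
	base   = 257 // multiplier for the polynomial hash (hypothetical value)
)

// Split cuts data into blocks of at least minSize and at most maxSize bytes
// (the final block may be shorter). A boundary is placed after any byte
// where the rolling hash of the preceding window has all bits of mask set.
func Split(data []byte, minSize, maxSize int, mask uint32) [][]byte {
	// pow = base^window, used to remove the byte that leaves the window.
	var pow uint32 = 1
	for i := 0; i < window; i++ {
		pow *= base
	}

	var blocks [][]byte
	var hash uint32
	start := 0
	for i, b := range data {
		// Slide the window forward by one byte.
		hash = hash*base + uint32(b)
		if i >= window {
			hash -= pow * uint32(data[i-window])
		}
		size := i - start + 1
		if (size >= minSize && hash&mask == mask) || size >= maxSize {
			blocks = append(blocks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		blocks = append(blocks, data[start:])
	}
	return blocks
}

func main() {
	r := rand.New(rand.NewSource(1))
	data := make([]byte, 1<<16)
	for i := range data {
		data[i] = byte(r.Intn(256))
	}
	blocks := Split(data, 256, 4096, 0x3FF)
	fmt.Printf("%d bytes split into %d blocks\n", len(data), len(blocks))
}
```

Since a boundary depends only on the bytes in the window (subject to the size limits), an insertion or deletion early in a file tends to disturb only nearby blocks, and later blocks keep their content addresses.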
The blocks belonging to a particular file are recorded in one or more extents, where each extent represents an ordered, contiguous sequence of blocks. Ranges of file content that consist of all zero-valued bytes are not stored, allowing sparse files to be represented compactly.
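As a rough illustration of how extents allow zero ranges to be skipped, consider the sketch below. The Index, Extent, and Block types here are simplified stand-ins and do not reproduce the messages in wiretype.proto. For brevity the sketch uses fixed-size blocks and only skips blocks that are entirely zero, and it assumes SHA-256 keys in an in-memory map.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Block records the size and storage key of one stored data block.
type Block struct {
	Bytes int64
	Key   string
}

// Extent is an ordered, contiguous run of blocks starting at offset Base.
type Extent struct {
	Base   int64
	Bytes  int64
	Blocks []Block
}

// Index records the total file size and the extents that hold stored data.
type Index struct {
	TotalBytes int64
	Extents    []Extent
}

func isZero(p []byte) bool {
	for _, b := range p {
		if b != 0 {
			return false
		}
	}
	return true
}

// BuildIndex stores the non-zero blocks of data in store and returns an
// index describing them. An all-zero block ends the current extent and is
// not stored at all.
func BuildIndex(data []byte, blockSize int, store map[string][]byte) Index {
	ix := Index{TotalBytes: int64(len(data))}
	open := false // whether the last extent can still be extended
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[off:end]
		if isZero(chunk) {
			open = false
			continue
		}
		sum := sha256.Sum256(chunk)
		key := hex.EncodeToString(sum[:])
		store[key] = append([]byte(nil), chunk...)
		if !open {
			ix.Extents = append(ix.Extents, Extent{Base: int64(off)})
			open = true
		}
		ext := &ix.Extents[len(ix.Extents)-1]
		ext.Blocks = append(ext.Blocks, Block{Bytes: int64(len(chunk)), Key: key})
		ext.Bytes += int64(len(chunk))
	}
	return ix
}

// Expand rebuilds the file content; gaps between extents read as zeros.
func (ix Index) Expand(store map[string][]byte) []byte {
	out := make([]byte, ix.TotalBytes)
	for _, ext := range ix.Extents {
		pos := ext.Base
		for _, b := range ext.Blocks {
			copy(out[pos:], store[b.Key])
			pos += b.Bytes
		}
	}
	return out
}

func main() {
	data := make([]byte, 40)
	copy(data[0:], "sixteen bytes!!!")
	copy(data[32:], "tail")
	store := map[string][]byte{}
	ix := BuildIndex(data, 8, store)
	for _, ext := range ix.Extents {
		fmt.Printf("extent at offset %d: %d bytes in %d blocks\n",
			ext.Base, ext.Bytes, len(ext.Blocks))
	}
}
```

The total size is kept separately from the extents, so a file that ends in a long run of zeros still reports its full length even though nothing is stored for the tail.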
The children of a file are themselves files. Within the node, each child is recorded as a pair comprising a non-empty string name and the storage key of another file. Each name must be unique among the children of a given file, but it is fine for multiple children to share the same storage key.
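The rules for children (non-empty names, unique within a file, ordered by name) can be sketched with a sorted slice. The Child type and SetChild function below are hypothetical helpers for illustration and are not taken from the FFS packages.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Child pairs a name with the storage key of another file.
type Child struct {
	Name string
	Key  string
}

// SetChild adds or replaces the child with the given name, keeping kids
// sorted lexicographically by name. It rejects an empty name.
func SetChild(kids []Child, name, key string) ([]Child, error) {
	if name == "" {
		return kids, errors.New("child name must be non-empty")
	}
	// Binary search for the first entry whose name is >= name.
	i := sort.Search(len(kids), func(i int) bool { return kids[i].Name >= name })
	if i < len(kids) && kids[i].Name == name {
		kids[i].Key = key // name already present: replace its key
		return kids, nil
	}
	// Open a slot at position i and insert the new child there.
	kids = append(kids, Child{})
	copy(kids[i+1:], kids[i:])
	kids[i] = Child{Name: name, Key: key}
	return kids, nil
}

func main() {
	var kids []Child
	for _, name := range []string{"src", "README", "docs"} {
		// Several names may point at the same storage key.
		kids, _ = SetChild(kids, name, "key-of-some-file")
	}
	for _, c := range kids {
		fmt.Println(c.Name, c.Key)
	}
}
```

Keeping the slice sorted means two files with the same set of children always list them in the same order, which matters when the encoded node is hashed to produce its storage key.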
Files have no required metadata, but for convenience the node representation includes optional Stat and XAttr messages that encode typical filesystem metadata like POSIX permissions, file type, modification timestamp, and ownership. These fields are persisted in the encoding of a node, and thus affect its storage key, but are not otherwise interpreted.
The ffstools repository defines command-line tools for manipulating FFS data structures.
In addition, the ffuse repository defines a FUSE filesystem that exposes the FFS data format.
# To install:
go install github.com/creachadair/ffuse/cmd/ffuse@latest