Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Bloom^ Filesystem

BFS implements a chunked, distributed filesystem (mostly) in the Bloom
language.  BFS is architecturally based on BOOMFS [1], which is itself based
on the Google Filesystem (GFS) [2].  As in GFS, a single master node manages
filesystem metadata, while data blocks are replicated and stored on a large
number of storage nodes.  Writing or reading data involves a multi-step
protocol in which clients interact with the master, retrieving metadata and
possibly changing state, then interact with storage nodes to read or write
chunks.  Background jobs running on the master will contact storage nodes to
orchestrate chunk migrations, during which storage nodes communicate with
other storage nodes.  As in BFS, the communication protocols and the data
channel used for communication between clients and datanodes and between
datanodes is written outside Bloom (in Ruby).

BFS is structured as follows:

FSMaster implements the basic filesystem metadata operations defined in
FSProtocol.  FSMaster uses the KVSProtocol, storing FS metadata in a flat
namespace.  As GFS is a central master system, we choose BasicKVS from the
implementations of KVSProtocol that the library provides, though using
one of the varieties of replicated KVS instead is a simple change.

ChunkedFSMaster extends FSMaster to add operations specific to chunked data
storage: specifically, the ability to generate new chunk identifiers, to
list the chunks associated with a file and to list the storage nodes in
possession of a chunk.  It relies on cached chunk metadata that is managed by

HBMaster collects heartbeat messages from storage nodes and maintains
summary information about chunk distribution and node state.

BFSDatanode represents the storage node logic.  It is currently fairly
trivial: it polls the data directory for copies of chunks, and uses the
HeartbeatProtocol to communicate its list of local chunks to the master.

BFSDataProtocol implements the data transfer protocol run on clients and
storage nodes, and is written strictly in Ruby.

BFSClient is a hybrid of Bloom and Ruby code.  It provides a filesystem-like
API to calling ruby code, and manages the (rather complex) client side of the
distributed filesystem.  Communication with the master is performed by
asynchronous interaction with a local bud instance, which interacts with the
master over a Bud channel.  Communication with storage nodes uses the

BFSMaster glues together the ChunkedFSMaster API with the BFSClient API.

[1] Alvaro, P., Condie, T., Conway, N., Elmeleegy, K., Hellerstein, J. M., &
Sears, R. (2010). BOOM Analytics: exploring data-centric, declarative
programming for the cloud. EuroSys.

[2] Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google File
System. SOSP.