Latest commit 72230b4
Jun 8, 2013
|Failed to load latest commit information.|
Bloom^ Filesystem --------------------------------------------------------------------------- BFS implements a chunked, distributed filesystem (mostly) in the Bloom language. BFS is architecturally based on BOOMFS , which is itself based on the Google Filesystem (GFS) . As in GFS, a single master node manages filesystem metadata, while data blocks are replicated and stored on a large number of storage nodes. Writing or reading data involves a multi-step protocol in which clients interact with the master, retrieving metadata and possibly changing state, then interact with storage nodes to read or write chunks. Background jobs running on the master will contact storage nodes to orchestrate chunk migrations, during which storage nodes communicate with other storage nodes. As in BFS, the communication protocols and the data channel used for communication between clients and datanodes and between datanodes is written outside Bloom (in Ruby). BFS is structured as follows: FSMaster implements the basic filesystem metadata operations defined in FSProtocol. FSMaster uses the KVSProtocol, storing FS metadata in a flat namespace. As GFS is a central master system, we choose BasicKVS from the implementations of KVSProtocol that the library provides, though using one of the varieties of replicated KVS instead is a simple change. ChunkedFSMaster extends FSMaster to add operations specific to chunked data storage: specifically, the ability to generate new chunk identifiers, to list the chunks associated with a file and to list the storage nodes in possession of a chunk. It relies on cached chunk metadata that is managed by HBMaster. HBMaster collects heartbeat messages from storage nodes and maintains summary information about chunk distribution and node state. BFSDatanode represents the storage node logic. It is currently fairly trivial: it polls the data directory for copies of chunks, and uses the HeartbeatProtocol to communicate its list of local chunks to the master. BFSDataProtocol implements the data transfer protocol run on clients and storage nodes, and is written strictly in Ruby. BFSClient is a hybrid of Bloom and Ruby code. It provides a filesystem-like API to calling ruby code, and manages the (rather complex) client side of the distributed filesystem. Communication with the master is performed by asynchronous interaction with a local bud instance, which interacts with the master over a Bud channel. Communication with storage nodes uses the BFSDataProtocol. BFSMaster glues together the ChunkedFSMaster API with the BFSClient API.  Alvaro, P., Condie, T., Conway, N., Elmeleegy, K., Hellerstein, J. M., & Sears, R. (2010). BOOM Analytics: exploring data-centric, declarative programming for the cloud. EuroSys.  Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google File System. SOSP.