fun projects
This is a list of fun project ideas that no one is currently working on.
Storage systems have become much more specialized in recent years, with each system providing expertise in certain areas: Hadoop and proprietary data warehouses provide batch processing capabilities, search indexes support complex ranked text queries, and a variety of distributed databases have sprung up. Voldemort is a specialized key-value system, but the same data stored in Voldemort may need to be indexed by search, churned over in Hadoop, or otherwise processed by another system. Each of these systems needs the ability to subscribe to the changes happening in Voldemort and receive a stream of those changes that it can process in its own specialized way.
Indeed, even Voldemort nodes could subscribe to one another as a quick catch-up mechanism for recovering from failure.
Amazon has implemented this functionality in their Dynamo system as a “Merkle tree” data structure, which allows nodes to compare their contents quickly and catch up on differences they have missed, but this is not the only approach. It could be a simple secondary index that maintains a node-specific logical counter and records a modification number for each key.
The API provided would be something like getAllChangesSince(int changeNumber), which would return the latest change for each key modified since the given change number.
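The counter-based approach can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class name `ChangeTrackingStore`, the `Change` record, and the Python spelling `get_all_changes_since` are hypothetical and not part of Voldemort. A real implementation would persist the index and would also need to handle deletes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    change_number: int  # value of the node-local logical counter at write time
    key: str
    value: bytes


class ChangeTrackingStore:
    """Toy store (hypothetical) that remembers the latest change per key."""

    def __init__(self) -> None:
        self._counter = 0                      # node-specific logical counter
        self._latest: Dict[str, Change] = {}   # key -> most recent change

    def put(self, key: str, value: bytes) -> int:
        # Every write gets the next change number; an overwrite replaces
        # the earlier record, so only the latest change per key survives.
        self._counter += 1
        self._latest[key] = Change(self._counter, key, value)
        return self._counter

    def get_all_changes_since(self, change_number: int) -> List[Change]:
        # Return the latest change for each key modified after
        # change_number, ordered by change number.
        newer = [c for c in self._latest.values()
                 if c.change_number > change_number]
        return sorted(newer, key=lambda c: c.change_number)
```

A subscriber would remember the highest change number it has processed and pass it on its next call, so a system that was offline for a while only receives the keys that changed in the meantime. The linear scan keeps the sketch short; an index ordered by change number would avoid it.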
There is a protocol buffers network protocol for accessing the Voldemort server. The goal of this project would be to create a Python, C++, or other protocol buffers client that provides an excellent interface to the system, one that models the guarantees the system provides in the best possible way in the implementation language.
A minimal implementation must let the caller deal with conflicting results and must handle server failure (by reconnecting to another node).
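The two minimal requirements, conflict handling and failover, could be shaped roughly as follows. This is a sketch under stated assumptions, not the actual client API: `FailoverClient`, the `Node` callable, the integer version stand-in, and the resolver callback are all hypothetical, and the protocol buffers transport is left out entirely.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a versioned value: (version, value). A real
# client would carry whatever version metadata the server returns.
Versioned = Tuple[int, bytes]
# A node is modeled as a callable that returns every version it holds for
# a key and raises ConnectionError when it cannot be reached.
Node = Callable[[str], List[Versioned]]
Resolver = Callable[[List[Versioned]], Versioned]


class FailoverClient:
    def __init__(self, nodes: Sequence[Node], resolver: Resolver) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._resolver = resolver

    def get_all(self, key: str) -> List[Versioned]:
        # Try each node in turn; a connection failure moves on to the next.
        last_error: Optional[ConnectionError] = None
        for node in self._nodes:
            try:
                return node(key)
            except ConnectionError as err:
                last_error = err
        raise ConnectionError("all nodes failed") from last_error

    def get(self, key: str) -> Optional[Versioned]:
        # Conflicting versions are handed to the caller-supplied resolver
        # rather than being silently discarded by the client.
        versions = self.get_all(key)
        if not versions:
            return None
        if len(versions) == 1:
            return versions[0]
        return self._resolver(versions)
```

Exposing both `get_all` and a resolver-backed `get` is one possible design: callers that care about conflicts see every version, while simpler callers state their resolution policy once at construction time.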
The network protocol is pluggable so a slightly more difficult implementation could add both a network protocol and a client (say in a language not well supported by protocol buffers).
One of the primary problems for a practical distributed system is knowing the state of the system. Voldemort has a rudimentary GUI that provides basic information. This project would be to build a first-rate management GUI and the corresponding control functionality, so that operators can see the performance and availability of each node in the system and perform basic operations such as starting and stopping nodes (or the whole cluster), running queries, etc.
Part of this project would be providing remote access to the administrative functionality that the GUI can invoke. Some of the basic administrative functionality could be shared with the Scala shell project.
Voldemort comes with a very simple text shell. A better way to build such a thing is to fully integrate a language with an interpreter and provide a set of predefined administrative commands as functions in the shell. Scala has a flexible syntax and integrates easily with Java so it would be a good choice for such a shell.
Part of this project would be providing the administrative commands that the shell could invoke. Some of the basic administrative functionality could be shared with the Operational Interface project.
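The idea of embedding a full interpreter and exposing administrative commands as ordinary functions can be sketched in a few lines. The project proposes Scala; the sketch below uses Python's standard `code` module purely to illustrate the pattern. The command names (`node_status`, `stop_node`, `start_node`) and the in-memory `CLUSTER` table are hypothetical placeholders, not real Voldemort administrative calls.

```python
import code
from typing import Dict

# Hypothetical cluster state; a real shell would query the nodes instead.
CLUSTER: Dict[int, str] = {0: "running", 1: "running"}


def node_status(node_id: int) -> str:
    """Return the recorded state of one node."""
    return CLUSTER[node_id]


def stop_node(node_id: int) -> str:
    """Mark a node as stopped and return its new state."""
    CLUSTER[node_id] = "stopped"
    return CLUSTER[node_id]


def start_node(node_id: int) -> str:
    """Mark a node as running and return its new state."""
    CLUSTER[node_id] = "running"
    return CLUSTER[node_id]


def make_shell() -> code.InteractiveConsole:
    # The predefined commands are plain functions in the interpreter's
    # namespace, so they compose with loops, variables, and so on.
    commands = {
        "node_status": node_status,
        "stop_node": stop_node,
        "start_node": start_node,
    }
    return code.InteractiveConsole(locals=commands)
```

Calling `make_shell().interact()` would start an interactive prompt in which an operator could type, for example, `[stop_node(n) for n in (0, 1)]`, which is the kind of composition a fixed-command text shell cannot offer.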