
Future Work

Chiyoung Seo edited this page Feb 25, 2016 · 8 revisions

We plan to add several new features to ForestDB in upcoming releases. The list below briefly describes some of them:

  • Solid-State Drive (SSD) optimization

    • Leverage an asynchronous I/O library (e.g., libaio) to exploit the parallel I/O capabilities of SSDs
      • This would be particularly useful when querying secondary indexes, where the items satisfying a query predicate are often located in multiple blocks that can be read in parallel.
    • Volume manager inside ForestDB
      • The volume manager runs on unformatted SSDs, maintains the list of blocks used by each database file, and garbage-collects invalid blocks by returning them to the free block list.
      • This allows us to bypass the entire OS file system stack and perform raw block I/O directly on the SSDs, which, together with ForestDB's own buffer cache, should give better I/O performance.
    • Lightweight and I/O efficient compaction
      • If ForestDB manages the list of blocks used by each database file through its own volume manager (as explained above), compaction can be performed simply by remapping the valid blocks of the old file to a new file and moving the stale blocks of the old file to the free block list.
  • Reduce feature

    • The "reduce" operation (part of the map/reduce algorithm) aggregates multiple documents into a single value, such as an average or a standard deviation. Some databases, such as Apache CouchDB, store intermediate reduce values in interior B-tree nodes, which greatly speeds up reduce operations over arbitrary key ranges.
    • There are many use cases where this reduce feature would be quite useful and would provide scalable performance for reduce queries. To support it, we need to extend ForestDB's HB+-Trie so that each B+ tree stores intermediate reduce values internally and the root node of the top B+ tree in the HB+-Trie holds the final reduce value.
  • Pause/resume of compaction

    • When a ForestDB instance holds a very large number of documents (e.g., billions), a compaction task can take a very long time, from tens of minutes to several hours. If ForestDB shuts down during compaction, then upon the next open operation all new documents have to be moved from the new (partially compacted) file back to the old file, and compaction has to start over from the beginning. This can be a significant and unnecessary overhead, which ForestDB could avoid by supporting pause/resume of compaction.
  • N1QL query language support

    • N1QL is a new query language for JSON document databases, recently proposed by Couchbase's query language team. We plan to implement a lightweight query processor on top of ForestDB to process queries expressed in N1QL.
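
To make the "Lightweight and I/O efficient compaction" item more concrete, the following is a minimal sketch of compaction by block remapping. It is not ForestDB code: the `VolumeManager` class, its methods, and the file names are hypothetical, and blocks are modeled as plain integers rather than raw SSD blocks.

```python
class VolumeManager:
    """Toy volume manager (hypothetical): tracks per-file blocks and free blocks."""

    def __init__(self, total_blocks):
        self.free_blocks = list(range(total_blocks))  # free block list
        self.file_blocks = {}                         # file name -> owned block ids

    def allocate(self, filename):
        """Take one block from the free list and assign it to the file."""
        block = self.free_blocks.pop(0)
        self.file_blocks.setdefault(filename, []).append(block)
        return block

    def compact(self, old_file, new_file, stale):
        """Remap valid blocks to new_file and free the stale ones.

        Only block ownership changes; no block contents are read or copied.
        """
        old_blocks = self.file_blocks.pop(old_file)
        self.file_blocks[new_file] = [b for b in old_blocks if b not in stale]
        self.free_blocks.extend(b for b in old_blocks if b in stale)


vm = VolumeManager(total_blocks=8)
for _ in range(5):
    vm.allocate("db.1")            # db.1 now owns blocks 0..4

vm.compact("db.1", "db.2", stale={1, 3})
print(vm.file_blocks)              # {'db.2': [0, 2, 4]}
print(vm.free_blocks)              # [5, 6, 7, 1, 3]
```

In this sketch the cost of compaction depends only on the number of block ids in the mapping, not on the amount of data stored in the blocks.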
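
The idea behind the "Reduce feature" item, storing intermediate reduce values in interior tree nodes, can be illustrated with a small static tree. This is a simplified sketch under stated assumptions, not the HB+-Trie design: the `Node` layout, the `build` and `range_reduce` helpers, and the fanout are all hypothetical, keys are integers, and the reduce value is a (count, sum) pair from which an average can be derived.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    min_key: int
    max_key: int
    count: int      # intermediate reduce value: number of items in this subtree
    total: float    # intermediate reduce value: sum of values in this subtree
    children: list[Node] = field(default_factory=list)  # empty for leaf entries


def build(items, fanout=4):
    """Build a tree bottom-up from a non-empty list of (key, value) sorted by key."""
    level = [Node(k, k, 1, v) for k, v in items]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [
            Node(g[0].min_key, g[-1].max_key,
                 sum(c.count for c in g), sum(c.total for c in g), g)
            for g in groups
        ]
    return level[0]


def range_reduce(node, lo, hi):
    """Return (count, sum) over keys in [lo, hi], reusing stored subtree values."""
    if hi < node.min_key or node.max_key < lo:
        return 0, 0.0                    # subtree entirely outside the range
    if lo <= node.min_key and node.max_key <= hi:
        return node.count, node.total    # fully covered: no need to descend
    count, total = 0, 0.0
    for child in node.children:
        c, t = range_reduce(child, lo, hi)
        count += c
        total += t
    return count, total


items = [(k, float(k * 10)) for k in range(1, 17)]
root = build(items)
count, total = range_reduce(root, 3, 10)
print(count, total, total / count)       # 8 520.0 65.0
print(root.count, root.total)            # 16 1360.0 (final reduce value at the root)
```

A query over the whole key space is answered from the root alone, and a range query only descends into subtrees that partially overlap the range. A standard deviation could be supported the same way by additionally storing the sum of squares in each node.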
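
For the "Pause/resume of compaction" item, one possible approach is to keep a checkpoint of how far compaction has progressed, so that a later run continues from that point instead of starting over. The sketch below only illustrates that idea: `CompactionState`, `compact`, and the in-memory lists standing in for the old and new files are hypothetical, and a real implementation would have to persist the checkpoint durably.

```python
from dataclasses import dataclass


@dataclass
class CompactionState:
    next_index: int = 0   # hypothetical checkpoint: next old-file entry to scan


def compact(old_file, new_file, state, max_entries=None):
    """Copy live documents from old_file to new_file, starting at the checkpoint.

    Stops after scanning max_entries entries (simulating a pause or shutdown).
    Returns True when the old file has been fully processed, False if paused.
    """
    scanned = 0
    while state.next_index < len(old_file):
        if max_entries is not None and scanned >= max_entries:
            return False                      # paused; checkpoint is kept in state
        key, value, live = old_file[state.next_index]
        if live:
            new_file.append((key, value))     # stale entries are simply skipped
        state.next_index += 1
        scanned += 1
    return True


old_file = [("a", 1, True), ("b", 2, False), ("c", 3, True), ("d", 4, True)]
new_file = []
state = CompactionState()

finished_first = compact(old_file, new_file, state, max_entries=2)
snapshot = list(new_file)                     # contents at the time of the pause
finished_second = compact(old_file, new_file, state)   # resume from checkpoint

print(finished_first, snapshot)               # False [('a', 1)]
print(finished_second, new_file)              # True [('a', 1), ('c', 3), ('d', 4)]
```

The second call does not rescan the entries handled before the pause, which is the overhead the pause/resume feature is meant to avoid.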