Skip to content

NoSQL systems design research project built upon Facebook's existing RocksDB. Funded by NSERC URSA.

License

GPL-2.0 and 2 other licenses found

Licenses found

GPL-2.0
COPYING
Apache-2.0
LICENSE.Apache
Unknown
LICENSE.leveldb
Notifications You must be signed in to change notification settings

BriannHu/NSERC_Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

NoSQL Systems Design for Real-Time Data Analytics

This research project is generously supported by the NSERC USRA, and supervised by Dr. Oana Balmau. It is built upon existing work on RocksDB developed by Facebook.

Motivation

In our modern era of technology, the goal of most (if not all) online platforms is to provide a customized experience to their users, by suggesting relevant items to adjust to their preferences. These recommendation systems ingest enormous amounts of information at a very high rate, and require a datastore that can simultaneously provide:

  1. High write throughput to ingest the incoming user events.
  2. High read throughput and low read latency to efficiently merge new information with the old data.
  3. Persistent storage, since the amount of incoming data is large and is growing continuously.

It is achieving all three objectives simultaneously that is a challenge.

Project Description

To address the challenge of designing an optimal recommendation system that satisfies the three requirements (high write throughput, high read throughput, persistent storage), this project will build upon Facebook's existing RocksDB, which uses log-structured merge trees. LSM trees are a popular data structure for write-optimized, persistent NoSQL datastores, but are not optimized for reads and provide little support for complex data structures like those in recommendation systems.

This project aims to augment the LSM tree that RocksDB uses, to support real-time recommendations on fresh data, and meet the three requirements of an effective modern recommendation system.

Phase 1 - Implementing and Running a Representative Benchmark

To simulate a recommendation system for a social media platform, there are four "maps" (distinguished using different column families in RocksDB), which represent the different key-value pairs that store the data generated during a benchmark:

Map Key Value*
1) Picture Annotations ImageID Set[Annotation]
2) Picture User Time Series UserID Map{ImageID → Map{ActionID → (List[TimeStamp], Counter)}}
3) User Annotation Time Series UserID Map{AnnotationID → Map{ActionID → (List[TimeStamp], Counter)}}
4) User Annotation Scores (UserID, AnnotationID) Map{ActionID → Score}

*In Phase 1, these maps are represented using RocksDB's Slice structure.

The data of interest collected from the benchmark is:

  • Average throughput of using all maps
  • Average read/write latency of all maps
  • 50th and 99th percentile read/write latency of all maps

The following is an example figure generated using the benchmark:

Map2 99th Percentile Figure

Phase 2 - Modifying the LSM Tree used by RocksDB

To augment the underlying LSM tree, RocksDB needs to be modified to support data structures other than just the Slice. The benchmark will then be re-run using these new data structures to compare the read/write performance against that of using a Slice.

Currently in progress!

Relevant Files

The following table highlights the files that have been modified/created for this project and their purpose.

File Status Description
tools/db_bench_tool.cc Modified Implemented custom benchmark to simulate social media workload.
util/json_serializer.cc Created* Created functions to stringify all four custom maps.
util/zipf.cc** Created* Created functions to generate random number between specified range using Zipfian distribution.
CMakeList.txt + src.mk Modified Added created files to enable Makefile to target.

* Corresponding header files have also been created for these files.
** Original author: Oana Balmau (link to code).

About

NoSQL systems design research project built upon Facebook's existing RocksDB. Funded by NSERC URSA.

Topics

Resources

License

GPL-2.0 and 2 other licenses found

Licenses found

GPL-2.0
COPYING
Apache-2.0
LICENSE.Apache
Unknown
LICENSE.leveldb

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published