README

MRIS: Multi-Resultion Image Store
CSE602 Course Project

In many scenarios, the access patterns and the sizes of structured objects
present a property that matches the property of the storage hierarchy. For
example, in an image storage system, the attributes and thumbnails of stored
images are small in size but frequently accessed, which makes them fit well in
small but fast and expensive top-tiered storage devices, such as NVRAM and SSD.
Whereas the images themselves are big but less accessed, which makes them
suitable for large but slow and inexpensive bottom-tiered storage devices, such
as HDD and tape. Moreover, slow seeks of bottom-tiered devices can be amortized
by fast sequential access followed.

For small objects, throughput is important in term of op/sec. They tend to be
accessed randomly because of their small sizes and the implication that they
are likely to be metadata and attributes. Top-tiered devices, e.g., NVRAM,
exhibit great IOPS performance, and they allow storage to be used in finer
granularity which causes less inner fragmentation as well. But for large
objects, throughput is more important in terms of mb/s. Their I/Os tend to be
more sequential as well. Bottom-tiered devices, e.g., HDD, are large in
capacity and exhibit satisfactory throughput for sequential I/Os. 

This project tries to prove the idea that when objects present a size-tiered
property, a corresponding size-tiered object store can provide good trade-off
between cost and performance as it gets the best from different tiers of the
storage hierarchy. Essentially, we are considering the object size as one more
cost in the design of storage cache (NVRAM). Some in-memory cache already do
that, for example, the "cache charge" in LevelDB cache. However, most storage
systems do not because of the traditional view of block devices as blocks of
fixed size.

This project is a follow-up work of my labmate's work publised in "An Efficient
Multi-Tier Tablet Server Storage Architecture". The idea is also supported by
some recent publications, for example, Facebook's Sigmetrics'12 paper "Workload
analysis of a large-scale key-value store". The project is based on KV
databases include LevelDB from Google and KVDB from FSL Stony Brook.