I enjoyed reading the BigTable paper so much that I decided to try implementing some of its ideas.
Concepts worked on:

Tablet
- ❌ Scan/filter entire tablet
- ❌ Scan/filter entire row
- ✅ Timestamped values
- ✅ Read/write to memtable
- 💡 Commit log
  - Still need to figure out how the log is stored and how to checkpoint it.
- ✅ Read/flush to SSTable
  - ✅ Amazon S3
  - ✅ Local filesystem
- ✅ Tablet compaction
- ❌ Tablet split
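
A minimal sketch of timestamped values in a memtable, a flush to a sorted run, and a merging compaction, assuming cells are addressed by `(row, column)` and every write keeps its own version. `Memtable`, `compact`, `max_versions` and the microsecond default timestamp are hypothetical names and choices for illustration, not this project's API. The "keep the newest N versions" policy is loosely modeled on the paper's per-column-family version limits.

```python
import heapq
import time
from typing import Dict, Iterable, List, Optional, Tuple

Entry = Tuple[str, str, int, bytes]  # (row, column, timestamp, value)


def _order(entry: Entry):
    # Sort by row, then column, then newest timestamp first.
    return (entry[0], entry[1], -entry[2])


class Memtable:
    """Hypothetical in-memory buffer holding every timestamped version of each cell."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], List[Tuple[int, bytes]]] = {}

    def put(self, row: str, column: str, value: bytes, ts: Optional[int] = None) -> int:
        # Assumed default timestamp: wall-clock time in microseconds.
        if ts is None:
            ts = time.time_ns() // 1000
        self._cells.setdefault((row, column), []).append((ts, value))
        return ts

    def get(self, row: str, column: str, as_of: Optional[int] = None) -> Optional[bytes]:
        # Newest version at or before `as_of` (or the newest overall if as_of is None).
        best: Optional[Tuple[int, bytes]] = None
        for ts, value in self._cells.get((row, column), []):
            if (as_of is None or ts <= as_of) and (best is None or ts > best[0]):
                best = (ts, value)
        return None if best is None else best[1]

    def flush(self) -> List[Entry]:
        # Produce a sorted run (what would be written to an SSTable) and empty the memtable.
        run = [(r, c, ts, v) for (r, c), versions in self._cells.items() for ts, v in versions]
        run.sort(key=_order)
        self._cells.clear()
        return run


def compact(runs: Iterable[List[Entry]], max_versions: int = 1) -> List[Entry]:
    """Merge already-sorted runs into one run, keeping the newest `max_versions` per cell."""
    out: List[Entry] = []
    prev_cell, seen = None, 0
    for entry in heapq.merge(*runs, key=_order):
        cell = (entry[0], entry[1])
        seen = seen + 1 if cell == prev_cell else 1
        prev_cell = cell
        if seen <= max_versions:
            out.append(entry)
    return out
```

Because each flushed run is already sorted, compaction is a streaming k-way merge rather than a full re-sort.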

SSTable
- ✅ Blocks compressed with GZIP
- ✅ Footer compressed with GZIP
- ✅ Configurable block size
- ✅ Configurable compression (GZIP, Snappy, or uncompressed)
- ✅ Storage agnostic
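
One way to keep SSTables storage agnostic with a configurable codec is to write them against a small storage interface and look compression up by name. This is a sketch under that assumption: `Storage`, `LocalStorage`, `MemoryStorage`, `CODECS`, `write_block` and `read_block` are hypothetical. Snappy is left out because Python needs a third-party binding for it. An S3 backend could implement `read_range` with boto3's `get_object` and its `Range` parameter; that is not shown here.

```python
import gzip
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Callable, Dict, Tuple

# Hypothetical codec registry: name -> (compress, decompress).
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "gzip": (gzip.compress, gzip.decompress),
    "none": (lambda b: b, lambda b: b),
}


class Storage(ABC):
    """Assumed minimal interface an SSTable could be written against."""

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def read_range(self, name: str, offset: int, length: int) -> bytes: ...


class LocalStorage(Storage):
    def __init__(self, root: str) -> None:
        self.root = root

    def write(self, name: str, data: bytes) -> None:
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(data)

    def read_range(self, name: str, offset: int, length: int) -> bytes:
        # Only the requested byte range is read, as a ranged S3 GET would do.
        with open(os.path.join(self.root, name), "rb") as f:
            f.seek(offset)
            return f.read(length)


class MemoryStorage(Storage):
    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = bytes(data)

    def read_range(self, name: str, offset: int, length: int) -> bytes:
        return self.files[name][offset:offset + length]


def write_block(storage: Storage, name: str, payload: bytes, codec: str = "gzip") -> int:
    """Compress `payload` with the chosen codec, store it, and return the stored size."""
    compress, _ = CODECS[codec]
    data = compress(payload)
    storage.write(name, data)
    return len(data)


def read_block(storage: Storage, name: str, length: int, codec: str = "gzip") -> bytes:
    _, decompress = CODECS[codec]
    return decompress(storage.read_range(name, 0, length))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = LocalStorage(root)
        size = write_block(store, "tablet-0.sst", b"row-data" * 64, codec="gzip")
        assert read_block(store, "tablet-0.sst", size, codec="gzip") == b"row-data" * 64
```

The SSTable code only ever sees `Storage`, so swapping the local filesystem for S3 does not touch the file format.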

TabletServer
- ✅ Each tablet is responsible for a row range
- ❌ Column family locality
  - In progress
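
Giving each tablet a row range means a server can route a row key with a binary search over the tablets' start keys. A minimal sketch, assuming half-open ranges `[start_row, end_row)` with `None` meaning "to the end of the table"; `Tablet`, `TabletServer` and `tablet_for` are hypothetical names.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tablet:
    start_row: str          # inclusive
    end_row: Optional[str]  # exclusive; None means "to the end of the table"
    data: dict = field(default_factory=dict)


class TabletServer:
    """Routes each row key to the single tablet whose row range contains it."""

    def __init__(self, tablets: List[Tablet]) -> None:
        self.tablets = sorted(tablets, key=lambda t: t.start_row)
        self._starts = [t.start_row for t in self.tablets]

    def tablet_for(self, row: str) -> Tablet:
        # Rightmost tablet whose start_row <= row.
        i = bisect.bisect_right(self._starts, row) - 1
        if i < 0:
            raise KeyError(row)
        tablet = self.tablets[i]
        # The row may fall in a gap between tablets.
        if tablet.end_row is not None and row >= tablet.end_row:
            raise KeyError(row)
        return tablet

    def put(self, row: str, value: bytes) -> None:
        self.tablet_for(row).data[row] = value

    def get(self, row: str) -> Optional[bytes]:
        return self.tablet_for(row).data.get(row)
```

A tablet split would then amount to replacing one `Tablet` with two adjacent ranges and rebuilding `_starts`.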
In an SSTable, blocks have a fixed length, and the block size is stored in the header. Every block, as well as the footer, is compressed with the configured compression algorithm.

The reader will:
- Read the header
- Read the footer

Blocks are read only when a value is requested, and are cached if appropriate.
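
The layout and read path above can be sketched end to end. The assumptions are loud here: the header is taken to be a fixed 16-byte struct holding the block size plus the footer's offset and length; blocks are fixed-length slices of the uncompressed value stream, each gzipped on its own; and the footer is gzipped JSON holding each block's location and a key index. `write_sstable`, `SSTableReader`, `cache_blocks` and `block_reads` are hypothetical, and the cache is a plain dict with no eviction.

```python
import gzip
import json
import struct
from typing import Dict, Optional

# Assumed header layout: block size, footer offset, footer length (big-endian, 16 bytes).
HEADER = struct.Struct(">IQI")


def write_sstable(items: Dict[str, bytes], block_size: int = 64) -> bytes:
    """Pack sorted values into fixed-length blocks, gzip each block, then append a gzipped footer."""
    data = bytearray()
    index = {}  # key -> [offset, length] in the uncompressed value stream
    for key in sorted(items):
        index[key] = [len(data), len(items[key])]
        data += items[key]
    body = bytearray()
    blocks = []  # [file offset, compressed length] for each block
    offset = HEADER.size
    for start in range(0, len(data), block_size):
        chunk = gzip.compress(bytes(data[start:start + block_size]))
        blocks.append([offset, len(chunk)])
        body += chunk
        offset += len(chunk)
    footer = gzip.compress(json.dumps({"blocks": blocks, "index": index}).encode())
    return HEADER.pack(block_size, offset, len(footer)) + bytes(body) + footer


class SSTableReader:
    def __init__(self, blob: bytes, cache_blocks: bool = True) -> None:
        self._blob = blob  # stands in for a storage backend
        # 1. Read the header: block size and where the footer lives.
        self.block_size, footer_off, footer_len = HEADER.unpack_from(blob, 0)
        # 2. Read and decompress the footer: block locations and key index.
        footer = json.loads(gzip.decompress(blob[footer_off:footer_off + footer_len]))
        self._blocks = footer["blocks"]
        self._index = footer["index"]
        self._cache: Dict[int, bytes] = {}
        self._cache_blocks = cache_blocks
        self.block_reads = 0  # counts block decompressions

    def _block(self, i: int) -> bytes:
        # 3. Blocks are fetched and decompressed lazily, then optionally cached.
        if i in self._cache:
            return self._cache[i]
        off, length = self._blocks[i]
        block = gzip.decompress(self._blob[off:off + length])
        self.block_reads += 1
        if self._cache_blocks:
            self._cache[i] = block
        return block

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._index:
            return None
        pos, length = self._index[key]
        if length == 0:
            return b""
        # Every block except the last holds exactly block_size bytes.
        first = pos // self.block_size
        last = (pos + length - 1) // self.block_size
        joined = b"".join(self._block(i) for i in range(first, last + 1))
        start = pos - first * self.block_size
        return joined[start:start + length]
```

Because the block size is fixed, the reader can compute which blocks hold a value directly from its uncompressed offset, without scanning earlier blocks.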