mv sst rewrite

Matthew Von-Maszewski edited this page Jul 12, 2016 · 4 revisions

Status

  • merged to master - July 25, 2015
  • code complete - July 16, 2015
  • development started - June 24, 2015

History / Context

Riak is used in commercial environments. This requires that new features provide a way both to migrate to a newer version and to revert to a previous version. This branch contains the skeleton for a new tool intended to allow migration and reversion testing. It is not complete. It is a skeleton to get testing, measurement, and other development rolling. Future branches will add features, such as selecting the target compression mode.

As part of its research nature, this branch contains an initial implementation of LZ4 compression.

Branch Description

The key source file for the sst_rewrite tool is tools/sst_rewrite.cc. Its structure is based on the earlier sst_scan.cc utility. The main difference between the two is when they were written: sst_scan.cc was created early in Basho's use of leveldb and takes many unnecessary steps to set up read access to a single leveldb .sst table file, whereas sst_rewrite.cc accesses the files in a much simpler fashion. sst_rewrite's method could and should be backported to sst_scan.

The design goal of sst_rewrite is a series of command line parameters that change the global leveldb::Options structure. Future options, e.g. -b for block_size and/or -z for compression method selection, would allow entire .sst datasets to be recreated easily using new internal architectures. The included -c option allows files from two different architectures to be compared to validate data integrity.
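Two files written with different compression settings will differ byte for byte, so an integrity check has to compare their logical contents instead. The sketch below assumes each file has already been read into an ordered list of key/value pairs. This page does not describe how -c performs its comparison, so Entry and FirstMismatch are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one table entry: (key, value).
using Entry = std::pair<std::string, std::string>;

// Returns the index of the first entry that differs between the two
// sequences, or -1 when both hold exactly the same entries in the same order.
// A length difference is reported at the index where the shorter one ends.
long FirstMismatch(const std::vector<Entry>& a, const std::vector<Entry>& b) {
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) return static_cast<long>(i);
  }
  return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

Reporting the first differing index, not just a yes/no answer, makes it easier to locate the block responsible for a mismatch.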

tools/sst_rewrite.cc

This is the basic tool file. It is new to this branch. The code loops over the command line arguments and acts on each argument as it is found. This approach lets a user create multiple variations of a given file once future editions of the code support more commands.
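The argument loop can be pictured with a minimal sketch. It is not the code in tools/sst_rewrite.cc. RewriteOptions, Job, and PlanJobs are hypothetical names, RewriteOptions stands in for the real leveldb::Options, the -b flag is only proposed on this page, and the treatment of -c as a simple mode switch is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the few leveldb::Options fields a rewrite tool
// might change; the real structure lives in include/leveldb/options.h.
struct RewriteOptions {
  bool compare_only = false;      // -c: assumed here to be a mode switch
  std::size_t block_size = 4096;  // -b: proposed on this page, not implemented
};

// One unit of work: a file plus a snapshot of the options in force when the
// file name was encountered on the command line.
struct Job {
  std::string file;
  RewriteOptions options;
};

// Walks the arguments left to right. Flags mutate the running options; any
// other argument is treated as a file and paired with the options seen so far.
std::vector<Job> PlanJobs(const std::vector<std::string>& args) {
  RewriteOptions current;
  std::vector<Job> jobs;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];
    if (arg == "-c") {
      current.compare_only = true;
    } else if (arg == "-b" && i + 1 < args.size()) {
      current.block_size = std::stoul(args[++i]);  // throws on non-numeric input
    } else {
      jobs.push_back(Job{arg, current});
    }
  }
  return jobs;
}
```

Because each file captures the options seen up to that point, naming the same file twice with different flags in between yields two variations of it in a single run.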

include/leveldb/options.h

Previous Basho branches defined a kNoCompressionAutomated constant to indicate when compression was disabled internally, not user directed. This branch adds the kLZ4Compression compression constant to mark blocks using LZ4 instead of the Google default kSnappyCompression.

table/format.cc

ReadBlock() decodes all blocks read from disk. Its job is to first validate the CRC of a block, then potentially decompress it based upon the block's type code. This branch adds the decode case for kLZ4Compression blocks.
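The validate-then-dispatch order can be sketched as follows. This is an illustration, not the code in table/format.cc. DecodeBlock, DecodeStatus, ChecksumFn, and DecompressFn are hypothetical names. The checksum and both decompressors are injected so the example needs no crc32c, Snappy, or LZ4 library. The numeric type code used for LZ4 is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Type codes stored after each block. 0 and 1 match upstream leveldb; the
// value used for LZ4 here is an assumption for illustration.
enum BlockType : std::uint8_t {
  kNoCompression = 0,
  kSnappyCompression = 1,
  kLZ4Compression = 2
};

// Injected dependencies (hypothetical signatures).
using ChecksumFn = std::function<std::uint32_t(const std::string&)>;
using DecompressFn = std::function<bool(const std::string&, std::string*)>;

enum class DecodeStatus { kOk, kChecksumMismatch, kCorruptBlock, kUnknownType };

// `raw` is the block payload followed by its one-byte type code.
DecodeStatus DecodeBlock(const std::string& raw, std::uint32_t stored_crc,
                         const ChecksumFn& checksum,
                         const DecompressFn& snappy, const DecompressFn& lz4,
                         std::string* out) {
  if (raw.empty()) return DecodeStatus::kCorruptBlock;

  // Step 1: verify the checksum before trusting any byte, type code included.
  if (checksum(raw) != stored_crc) return DecodeStatus::kChecksumMismatch;

  // Step 2: pick a decoder based on the type code.
  const std::string payload = raw.substr(0, raw.size() - 1);
  switch (static_cast<std::uint8_t>(raw.back())) {
    case kNoCompression:
      *out = payload;
      return DecodeStatus::kOk;
    case kSnappyCompression:
      return snappy(payload, out) ? DecodeStatus::kOk
                                  : DecodeStatus::kCorruptBlock;
    case kLZ4Compression:  // the kind of case this branch adds
      return lz4(payload, out) ? DecodeStatus::kOk
                               : DecodeStatus::kCorruptBlock;
    default:
      return DecodeStatus::kUnknownType;
  }
}
```

Checking the CRC over the payload and the type byte together means a corrupted type code is caught before it can select the wrong decoder.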

table/table_builder.cc

TableBuilder::WriteBlock() encodes all blocks written to disk. This branch adds the encode case for kLZ4Compression.
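A minimal sketch of an encode path with a new compression case follows. It is not the code in table/table_builder.cc. EncodeBlock, EncodedBlock, and CompressFn are hypothetical names, the compressor is injected so the example needs no LZ4 library, and the LZ4 type code is an assumption. Upstream leveldb keeps a Snappy result only when it saves at least 12.5% of the block. Applying the same rule to LZ4 is also an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// 0 and 1 match upstream leveldb; the LZ4 value is assumed for illustration.
enum BlockType : std::uint8_t {
  kNoCompression = 0,
  kSnappyCompression = 1,
  kLZ4Compression = 2
};

// Injected compressor (hypothetical signature): returns false on failure.
using CompressFn = std::function<bool(const std::string&, std::string*)>;

struct EncodedBlock {
  std::string payload;  // bytes that would be written to disk
  BlockType type;       // type code recorded alongside them
};

// Compresses `raw` with the requested method, but stores it uncompressed when
// compression fails or saves less than one eighth of the original size.
EncodedBlock EncodeBlock(const std::string& raw, BlockType requested,
                         const CompressFn& compress) {
  if (requested != kNoCompression) {
    std::string compressed;
    if (compress(raw, &compressed) &&
        compressed.size() < raw.size() - raw.size() / 8u) {
      return EncodedBlock{compressed, requested};
    }
  }
  return EncodedBlock{raw, kNoCompression};
}
```

Recording the type code that was actually used, not the one requested, is what lets the read side decode each block without consulting the options it was written with.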

util/lz4.c / util/lz4.h

BSD-licensed code from https://github.com/Cyan4973/lz4. Its reuse within Basho was reviewed and approved by Basho's legal department.