SQUISH: Compression for Archival and Distribution of Structured Datasets

This repository contains code for near-optimal compression of relational datasets, using a combination of Bayesian networks and arithmetic coding.

Details are provided in the paper "Squish: Near-Optimal Compression for Archival of Relational Datasets", available at this link: http://arxiv.org/abs/1602.04256
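
As a rough illustration of why pairing a dependency model with arithmetic coding pays off, the sketch below compares the ideal code length (about -log2 p bits per value, which an arithmetic coder approaches) of one attribute coded on its own against the same attribute coded conditioned on another. It is not taken from this repository: the `Row` alias and both function names are hypothetical, the probabilities are plain empirical frequencies, and the cost of storing the model and of coding the first attribute is ignored.

```cpp
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy row: two categorical attributes (A, B).
using Row = std::pair<int, int>;

// Ideal bits to code column B using only its marginal distribution p(B).
double BitsIndependent(const std::vector<Row>& rows) {
    std::map<int, double> count_b;
    for (const Row& r : rows) count_b[r.second] += 1.0;

    double bits = 0.0;
    for (const Row& r : rows) {
        double p = count_b[r.second] / rows.size();
        bits += -std::log2(p);
    }
    return bits;
}

// Ideal bits to code column B conditioned on column A, i.e. using p(B | A).
// This corresponds to a two-node network with an edge from A to B.
double BitsConditional(const std::vector<Row>& rows) {
    std::map<int, double> count_a;
    std::map<Row, double> count_ab;
    for (const Row& r : rows) {
        count_a[r.first] += 1.0;
        count_ab[r] += 1.0;
    }

    double bits = 0.0;
    for (const Row& r : rows) {
        double p = count_ab[r] / count_a[r.first];
        bits += -std::log2(p);
    }
    return bits;
}
```

When B is fully determined by A, `BitsConditional` returns zero while `BitsIndependent` still pays for every value, which is the kind of saving a learned dependency structure is meant to capture.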

The project is configured as a library, which can be used to create a compression program for any relational file format.

An example program using SQUISH can be found in examples/sample.cpp; it compresses CSV-style files:

#Compress:
./sample -c covtype.data covtype.compressed covtype.config
#Decompress:
./sample -d covtype.compressed covtype.recovered covtype.config

SQUISH lets users define new data types and create an associated SquID for each, so that values of those types can be compressed with SQUISH. The SquID interface can be found in model.h. The SquIDModel class allows more flexible SquID creation. Implementing it is optional in the sense that most of its functions can simply return 0 or do nothing.

In SQUISH, all primitive data types are implemented as SquIDs: categorical_model.h/cpp, numerical_model.h/cpp, and string_model.h/cpp. A simpler SquID example, built on the numerical SquID, can be found in examples/corel.h.
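
To give a feel for what a per-type model contributes to the compressor, here is a minimal sketch of a pluggable type model. It deliberately does not reproduce the SquID or SquIDModel interfaces from model.h: `ToyTypeModel`, `ToyCategoricalModel`, and every method name below are hypothetical, chosen only to show the general idea that a type supplies probabilities and the coder turns them into bits.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical interface for illustration only; NOT the one declared in model.h.
class ToyTypeModel {
public:
    virtual ~ToyTypeModel() = default;

    // Update the model's statistics with one training value.
    virtual void Observe(const std::string& value) = 0;

    // Probability the model assigns to a value.
    virtual double Probability(const std::string& value) const = 0;

    // Ideal arithmetic-coding cost of a value under this model, in bits.
    double CodeLengthBits(const std::string& value) const {
        return -std::log2(Probability(value));
    }
};

// A toy categorical type: probabilities are empirical frequencies.
class ToyCategoricalModel : public ToyTypeModel {
public:
    void Observe(const std::string& value) override {
        ++counts_[value];
        ++total_;
    }

    double Probability(const std::string& value) const override {
        auto it = counts_.find(value);
        // Unseen values get probability 0 in this sketch; a practical model
        // would need an escape mechanism so that they remain codable.
        if (it == counts_.end() || total_ == 0) return 0.0;
        return static_cast<double>(it->second) / total_;
    }

private:
    std::map<std::string, std::size_t> counts_;
    std::size_t total_ = 0;
};
```

A new type would follow the same pattern: learn statistics from the data, then report a probability for each value so that frequent values cost fewer bits than rare ones.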

