examples
Makefile
README.md
base.h
categorical_model.cpp
categorical_model.h
categorical_model_test.cpp
compression.cpp
compression.h
compression_test.cpp
data_io.cpp
data_io.h
data_io_test.cpp
decompression.cpp
decompression.h
decompression_test.cpp
model.cpp
model.h
model_learner.cpp
model_learner.h
model_learner_test.cpp
model_test.cpp
numerical_model.cpp
numerical_model.h
numerical_model_test.cpp
string_model.cpp
string_model.h
string_model_test.cpp
test_run.cpp
unit_test.h
utility.cpp
utility.h
utility_test.cpp

README.md

SQUISH: Compression for Archival and Distribution of Structured Datasets

This repository contains code for near-optimal compression of relational datasets, using a combination of Bayesian networks and arithmetic coding.
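To give a feel for why this combination works, here is a minimal sketch that is not taken from the SQUISH code. An arithmetic coder spends roughly -log2 P(x) bits on a message x, and a Bayesian network factors the probability of a tuple into per-attribute conditional probabilities, so the ideal cost of a tuple is the sum of the per-attribute costs. The two-attribute network A -> B, the struct ToyNetwork, and the functions MakeToyNetwork and IdealBits are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy network with two categorical attributes, A -> B.
// This is an illustration only; it is not the SQUISH data structure.
struct ToyNetwork {
    std::vector<double> p_a;                       // p_a[a]            = P(A = a)
    std::vector<std::vector<double>> p_b_given_a;  // p_b_given_a[a][b] = P(B = b | A = a)
};

// Builds a small network with probabilities chosen so the costs are whole bits.
ToyNetwork MakeToyNetwork() {
    ToyNetwork net;
    net.p_a = {0.5, 0.5};
    net.p_b_given_a = {{0.25, 0.75}, {0.5, 0.5}};
    return net;
}

// Ideal code length, in bits, of the tuple (a, b):
//   -log2 P(A = a) - log2 P(B = b | A = a)
// A real arithmetic coder approaches this total over a whole file,
// up to a small constant overhead.
double IdealBits(const ToyNetwork& net, std::size_t a, std::size_t b) {
    assert(a < net.p_a.size());
    assert(b < net.p_b_given_a[a].size());
    return -std::log2(net.p_a[a]) - std::log2(net.p_b_given_a[a][b]);
}
```

With these numbers, IdealBits(net, 0, 0) is 1 bit for A plus 2 bits for B, and a better-fitting model (higher probabilities for the values that actually occur) directly lowers the cost.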

Details are provided in the paper "Squish: Near-Optimal Compression for Archival of Relational Datasets", available at this link: http://arxiv.org/abs/1602.04256

The project is configured as a library, which can be used to create a compression program for any relational file format.

An example program using SQUISH can be found in examples/sample.cpp, which compresses CSV-style files:

#Compress:
./sample -c covtype.data covtype.compressed covtype.config
#Decompress:
./sample -d covtype.compressed covtype.recovered covtype.config

SQUISH allows users to define new data types and create an associated SquID for each, so that values of those types can be compressed using SQUISH. The SquID interface can be found in model.h. The SquIDModel class allows more flexible SquID creation. It is optional, in the sense that most functions can simply return 0 or do nothing.
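The general shape of a pluggable data-type model can be sketched as follows. This is a hypothetical interface written for illustration; it is NOT the interface declared in model.h, and the names ToyTypeModel, ToyCategorical, Probability, Observe, and CodeLengthBits are all assumptions. It shows one required function (the probability the model assigns to a value) and one optional hook with a do-nothing default, in the spirit of functions that can "simply return 0 or do nothing".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class; see model.h for the real SquID interface.
class ToyTypeModel {
public:
    virtual ~ToyTypeModel() = default;
    // Required: probability assigned to `value` (0 if it cannot be encoded).
    virtual double Probability(const std::string& value) const = 0;
    // Optional learning hook; the default does nothing.
    virtual void Observe(const std::string& /*value*/) {}
};

// A toy categorical model over a fixed vocabulary with add-one smoothing.
class ToyCategorical : public ToyTypeModel {
public:
    explicit ToyCategorical(std::vector<std::string> vocabulary)
        : vocabulary_(std::move(vocabulary)),
          counts_(vocabulary_.size(), 1),
          total_(static_cast<long>(vocabulary_.size())) {}

    double Probability(const std::string& value) const override {
        const std::size_t i = IndexOf(value);
        if (i == vocabulary_.size()) return 0.0;  // unknown value
        return static_cast<double>(counts_[i]) / static_cast<double>(total_);
    }

    void Observe(const std::string& value) override {
        const std::size_t i = IndexOf(value);
        if (i == vocabulary_.size()) return;  // ignore unknown values
        ++counts_[i];
        ++total_;
    }

private:
    // Returns the position of `value`, or vocabulary_.size() if absent.
    std::size_t IndexOf(const std::string& value) const {
        for (std::size_t i = 0; i < vocabulary_.size(); ++i) {
            if (vocabulary_[i] == value) return i;
        }
        return vocabulary_.size();
    }

    std::vector<std::string> vocabulary_;  // declared first: initialized first
    std::vector<long> counts_;
    long total_;
};

// Ideal number of bits an arithmetic coder would spend on `value`.
double CodeLengthBits(const ToyTypeModel& model, const std::string& value) {
    const double p = model.Probability(value);
    assert(p > 0.0);
    return -std::log2(p);
}
```

In this sketch, a new data type only has to answer "how likely is this value?"; the encoder can stay generic because it works purely from those probabilities.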

In SQUISH, all primitive data types are implemented as SquIDs: categorical_model.h/cpp, numerical_model.h/cpp, and string_model.h/cpp. A simpler SquID example, built upon the numerical SquID, can be found in examples/corel.h.