Developed @ ISTI CNR - HPC Lab (Pisa)
Full paper here
TSXor, a simple yet effective encoder/decoder for time series that achieves high compression ratios and fast decompression speed. TSXor exploits the redundancy/similarity between close-in-time values through a window that acts as a cache.
TSXor compares each vn value with its preceding W ≤ 127 values, logically corresponding to the values seen in the time range [tn−W , tn−1]. The goal is to compress vn relative to this “window” containing the previous W values. We distinguish between 3 cases, namely Reference, XOR, and Exception.
If vn is equal to a value in the window, just output its position p in the window. Since the window contains at most 127 values, 1 byte suffices to write the position with the most significant bit always equal to 0.
If the window does not contain vn, then we search for the value u in the window such that x = vn ⨁ u has the largest number of leading (LZ) and trailing (TZ) zeros bytes. Let p be the position of u in the window. We first write p + 128 using 1 byte. In this case the most significant bit will always be 1 because of sum, which allows us to distinguish this case from the Reference case.
If LZ +TZ ≥ 2, we output a byte where 4 bits are dedicated to TZ and the other 4 bits to the length (in bytes) of the segment of x between the leading and trailing zero bytes. We then write such middle bytes.
We output an exception code, i.e., the value 255 using 1 byte, followed by the plain double-precision representation of vn using 8 bytes.
The code has been tested both on Linux and MacOS.
No dependencies are needed.
Just clone this repo and execute:
make all
The algorithm can process any .csv
file containing numbers only.
You need first to convert the .csv
into a .bin
file using the csv_to_bin
utility as follows:
cd util
./csv_to_bin.o path/to/MY_DATASET.csv
Please note: the first column will be interpreted as the timestamp, the rest will be interpreted as values.
To run a compression test of a .bin
file, execute the following commands:
cd test
./compression.o path/to/MY_DATASET.bin
This will produce a file called compressed_data.tsx
To decompress the file compressed_data.tsx
, run the command:
./decompression.o
The following tables show the comparison between TSXor with respect to Gorilla by Facebook and FPC by Burtscher and Ratanaworabhan. The experiments were run on an Ubuntu 18.04 machine with Intel i7-7700 CPU @ 3.60GHz.
FPC | Gorilla | TSXor | |
---|---|---|---|
AMPds2 | 339,28 | 703,72 | 66,59 |
Bar Crawl | 423,71 | 466,49 | 28,74 |
Max-Planck | 313,40 | 870,58 | 51,74 |
Kinect | 166,28 | 696,10 | 17,14 |
Oxford-Man | 170,27 | 630,33 | 15,43 |
PAMAP | 181,59 | 521,41 | 45,05 |
UCI Gas Sensor Array | 286,94 | 654,32 | 21,93 |
FPC | Gorilla | TSXor | |
---|---|---|---|
AMPds2 | 411,29 | 666,52 | 1173,65 |
Bar Crawl | 436,12 | 447,42 | 709,68 |
Max-Planck | 355,30 | 858,68 | 1057,00 |
Kinect | 287,18 | 635,74 | 665,47 |
Oxford-Man | 221,80 | 573,67 | 604,54 |
PAMAP | 223,86 | 487,41 | 949,28 |
UCI Gas Sensor Array | 454,91 | 578,41 | 642,40 |
FPC | Gorilla | TSXor | |
---|---|---|---|
AMPds2 | 1,10x | 2,03x | 6,39x |
Bar Crawl | 1,20x | 1,44x | 2,36x |
Max-Planck | 1,06x | 2,97x | 4,84x |
Kinect | 1,09x | 1,41x | 1,37x |
Oxford-Man | 1,06x | 1,28x | 1,30x |
PAMAP | 1,01x | 1,38x | 4,85x |
UCI Gas Sensor Array | 1,19x | 1,23x | 3,50x |
This is a beta version. Use it at your own risk.