Skip to content
Error-bounded Lossy Data Compressor (for floating-point/integer datasets)
C Shell Makefile Fortran C++ Cuda Other
Branch: master
Clone or download
Sheng Di
Sheng Di Fix a bug (error couldn't be bounded) happening when setting point-wi…
…se relative error bound for 3D double-precision data.
Latest commit a7e8c12 Aug 9, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
NetCDFReader fix the problem of missing -lzlib in Makefile.linux of NetCDFReader May 16, 2018
adiosReader fix a potential bug in the optimization of # quantization bins: int n… Jan 29, 2018
cmake Optionally use system zstd/zlib Jul 27, 2019
doc update user guide because of updated temporal compression Jun 14, 2019
example Optionally use system zstd/zlib Jul 27, 2019
hdf5-filter/H5Z-SZ remove mpiio from Makefile in test/ because it needs to install paral… May 22, 2019
sz Fix a bug (error couldn't be bounded) happening when setting point-wi… Aug 10, 2019
test add the testing codes for ArithmeticCoding and HuffmanCoding, respect… Dec 18, 2018
zlib copy header files in the include/ directory during installation Jul 28, 2019
zstd copy header files in the include/ directory during installation Jul 28, 2019
.gitignore update .gitignore to ignore more files from CMake Dec 4, 2018
.travis.yml revise .travis.yml Oct 3, 2018
CMakeLists.txt Remove useless flags Jul 29, 2019
COPYRIGHT.txt accelerate point-wise relative error bounded compression significantl… Feb 28, 2019
Makefile.am release 2.0.2.0, which replaces Zlib/Gzip by Zstd because of its fast… Jul 21, 2018
Makefile.in accelerate point-wise relative error bounded compression significantl… Feb 28, 2019
README Fix a bug (error couldn't be bounded) happening when setting point-wi… Aug 10, 2019
aclocal.m4 further revise the setting of autoconf Sep 11, 2017
ar-lib SZ-1.4.6-beta Oct 29, 2016
config.guess further revise the setting of autoconf Sep 11, 2017
config.h.in further revise the setting of autoconf Sep 11, 2017
config.sub further revise the setting of autoconf Sep 11, 2017
configure accelerate point-wise relative error bounded compression significantl… Feb 28, 2019
configure.ac accelerate point-wise relative error bounded compression significantl… Feb 28, 2019
depcomp accelerate point-wise relative error bounded compression significantl… Feb 28, 2019
install-sh further revise the setting of autoconf Sep 11, 2017
ltmain.sh further revise the setting of autoconf Sep 11, 2017
missing accelerate point-wise relative error bounded compression significantl… Feb 28, 2019

README

--------SZ (version 2.1.5): Error-bounded Lossy Compressor for HPC Data--------
 (C) 2016 by Mathematics and Computer Science (MCS), Argonne National Laboratory.
       See COPYRIGHT in top-level directory.

*** Major Authors: Sheng Di, Dingwen Tao, Xin Liang ***
*** Supervisor: Franck Cappello ***
*** Other Contributors: Robert Underwood, Sihuan Li, Ali M. Gok, Xiangyu Zou, Tao Lu, Wen Xia, Xuan Wang, Weizhe Zhang ***

*** Citations ***
**Kindly note: If you mention SZ in your paper, please cite the following three references, because they cover different versions of SZ completely and latest versions also include the design of early versions.**

SZ 2.0+: Xin Liang, Sheng Di, Dingwen Tao, Zizhong Chen, Franck Cappello, "Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets", in IEEE International Conference on Big Data (Bigdata 2018), Seattle, WA, USA, 2018.

SZ 1.4.0-1.4.13: Dingwen Tao, Sheng Di, Franck Cappello, "Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization", in IEEE International Parallel and Distributed Processing Symposium (IPDPS 2017), Orlando, Florida, USA, 2017.

SZ 0.1-1.0: Sheng Di, Franck Cappello, "Fast Error-bounded Lossy HPC Data Compression with SZ", in IEEE International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago, IL, USA, 2016.

Point-wise relative error bound mode (i.e., PW_REL): Xin Liang, Sheng Di, Dingwen Tao, Zizhong Chen, Franck Cappello, "Efficient Transformation Scheme for Lossy Data Compression with Point-wise Relative Error Bound", in IEEE International Conference on Clustering Computing (CLUSTER 2018), Belfast, UK, 2018. (Best Paper)

=================================
This document simply introduces how to install and use the SZ compressor. More details can be found in doc/userguide.pdf.

===========Installation==========
---------------------------------

Installation way 1:
./configure --prefix=[INSTALL_DIR]
(Please use --enable-fortran if you need Fortran interface)
make
make install

Installation way 2:
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX:PATH=[INSTALL_DIR]
make
make install

Then, you'll find all the executables in [INSTALL_DIR]/bin and .a and .so libraries in [INSTALL_DIR]/lib

===========Testing Example============
--------------------------------------

Examples can be found in the [SZ_PACKAGE]/example

You can use the executable 'sz' command to do the compression/decompression. Please see the user guide or run 'sz --help' for details.

Alternatively, you can also also call our API to do the compression/decompressoin. Here are two examples: testfloat_compress.c and testfloat_decompress.c

===Compress===
--------------
./test_compress sz.config testdouble_8_8_8_128.dat 8 8 8 128
./test_compress sz.config testdouble_8_8_128.dat 8 8 128

Decription: 

testdouble_8_8_128.dat and testdouble_8_8_8_128.dat are two binary testing files (small-endian format), which contains a 3d array (128X8X8) and a 4d array (128X8X8X8) respectively. Their data values are shown in the two plain text files, testdouble_8_8_128.txt and testdouble_8_8_8_128.txt. These two data files are from FLASH_Blast2 and FLASH_MacLaurin respectively (both are at time step 100). The compressed data files are namely testdouble_8_8_8_128.dat.sz and testdouble_8_8_128.dat.sz respectively.

sz.config is the configuration file. The key settings are errorBoundMode, absErrBound, and relBoundRatio, which are described below.

*absErrBound refers to the absolute error bound, which is to limit the (de)compression errors to be within an absolute error. For example, absErrBound=0.0001 means the decompressed value must be in [V-0.0001,V+0.0001], where V is the original true value.

*relBoundRatio refers to relative bound ratio, which is to limit the (de)compression errors by considering the global data value range size (i.e., taking into account the range size (max_value - min_value)). For example, suppose relBoundRatio is set to 0.01, and the data set is {100,101,102,103,104,...,110}, so the global value range size is 110-100=10, so the error bound will actually be 10*0.01=0.1, from the perspective of "relBoundRatio".

*errorBoundMode is to indicate the error-bounding way in the compression, such as based on absolute error bound, relative error bound, etc. 
The options are shown below.
	ABS: take only "absolute error bound" into account. That is, relative bound ratio will be ignored.
	REL: take only "relative bound ratio" into account. That is, absolute error bound will be ignored. 
	ABS_AND_REL: take both of the two bounds into account. The compression errors will be limited using both absErrBound and relBoundRatio*rangesize. That is, the two bounds must be both met.
	ABS_OR_REL: take both of the two bounds into account. The compression errors will be limited using either absErrBound or relBoundRatio*rangesize. That is, only one bound is required to be met.
	PW_REL: take "point-wise relative error bound" in the compression. 
===Decompress===
----------------
./test_decompress sz.config testdouble_8_8_8_128.dat.sz 8 8 8 128
./test_decompress sz.config testdouble_8_8_128.dat.sz 8 8 128

The output files are testdouble_8_8_8_128.dat.sz.out and testdouble_8_8_128.dat.sz.out respectively. You can compare .txt file and .out file for checking the compression errors for each data point. For instance, compare testdouble_8_8_8_128.txt and testdouble_8_8_8_128.dat.sz.out.

===Application Programming Interface (API)===
---------------------------------------------
Programming interfaces are procides in two programming languages - C and Java: 
The interfaces are listed below. More details can be found in the user guide. 
----C Interface----

int SZ_Init();

char *SZ_compress(void *data, size_t *outSize, size_t r5, size_t r4, size_t r3, size_t r2, size_t r1);
char *SZ_compress_args(void *data, size_t *outSize, int errBoundMode, double absErrBound, double relBoundRatio, size_t r5, size_t r4, size_t r3, size_t r2, size_t r1);

double *SZ_decompress(char *bytes, size_t byteLength, size_t r5, size_t r4, size_t r3, size_t r2, size_t r1);
void SZ_Finalize();

----JAVA Interface----
----------------------
Java interfaces are provided in SZ-0.5.15, but not in SZ-1.0. If you are interested in Java interfaces, please download and install SZ-0.5.15 instead.

Class name: compression.SingleFillingCurveDecompressVariable

public static void SZ_Init(String configFilePath);

public static byte[] SZ_compress(double[] doubleData, size_t r5, size_t r4, size_t r3, size_t r2, size_t r1)

public static byte[] SZ_compress(double[] doubleData, int errBoundMode, double absErrBound, double relBoundRatio, 
			size_t r5, size_t r4, size_t r3, size_t r2, size_t r1)

public static double[] SZ_decompress(byte[] bytes, size_t r5, size_t r4, size_t r3, size_t r2, size_t r1)

===Limitation of this version===
--------------------------------
SZ is not suitable for compressing tiny datastes (such as the size <20KB)

===version history===
---------------------
version		New features
SZ 0.2-0.4	Compression ratio is the same as SZ 0.5. The key difference is different implementation ways, such that SZ 0.5 is much faster than SZ 0.2-0.4.
SZ 0.5.1	Support version checking
SZ 0.5.2	finer compression granularity for unpredictable data, and also remove redundant Java storage bytes
SZ 0.5.3 	Integrate with the dynamic segmentation support
SZ 0.5.4	Gzip_mode: defaut --> fast_mode ; Support reserved value	
SZ 0.5.5	runtime memory is shrinked (by changing int xxx to byte xxx in the codes)
		The bug that writing decompressed data may encounter exceptions is fixed.
		Memory leaking bug for ppc architecture is fixed.
SZ 0.5.6	improve compression ratio for some cases (when the values in some segementation are always the same, this segment will be merged forward)
SZ 0.5.7	improve the decompression speed for some cases
SZ 0.5.8	Refine the leading-zero granularity (change it from byte to bits based on the distribution). For example, in SZ0.5.7, the leading-zero is always in bytes, 0, 1, 2, or 3. In SZ0.5.8, the leading-zero part could be xxxx xxxx xx xx xx xx xxxx xxxx (where each x means a bit in the leading-zero part)
SZ 0.5.9	optimize the offset by using simple right-shifting method. Experiments show that this cannot improve compression ratio actually, because simple right-shifting actually make each data be multiplied by 2^ {-k}, where k is # right-shifting bits. The pros is to save bits because of more leading-zero bytes, but the cons is much more required bits to save. A good solution is SZ 0.5.10!
SZ 0.5.10	optimze the offset by using the optimized formula of computing the median_value based on optimized right-shifting method. Anyway, SZ0.5.10 improves compression ratio a lot for hard-to-compress datasets. (Hard-to-compress datasets refer to the cases whose compression ratios are usually very limited)
SZ 0.5.11	In a very few cases, SZ 0.5.10 cannot guarantee the error-bounds to a certain user-specified level. For example, when absolute error bound = 1E-6, the maximum decompression error may be 0.01(>>1E-6) because of the huge value range even in the optimized segments such that the normalized data cannot reach the required precision even stoaring all of the 64 or 32 mantissa bits. SZ 0.5.11 fixed the problem well, with degraded compression ratio less than 1%.
SZ 0.5.12 	A parameter setting called "offset" is added to the configuration file sz.config. The value of offset is an integer in [1,7]. Generally, we recommend offset=2 or 3, while we also find that some other settings (such as offset=7) may lead to better compression ratios in some cases. How to automize/optimize the selection of offset value would be the future work. In addition, the compression speed is improved, by replacing java List by array implementation in the code.
SZ 0.5.13	Compression performance is improved, by replacing some class instances in the source code by primitive data type implementation.
SZ 0.5.14 	fixed a design bug, which improves the compression ratio further.
SZ 0.5.15	improved the compression ratio for single-precision data compression, by tuning the offset.

The version 0.x were all coded in Java, and C/Fortran interfaces were provided by using JNI and C/Fortran wrapper. SZ 1.0 is coded in C purely.
SZ 1.0		Pure C version. In this version, the users don't need to install JDK and make the relative configurations any more. It provides dataEndienType in the sz.config 		file, so it can be used to compress the data file which was generated on different endian-type systems.
SZ 1.2		Improve compression: using 8 intervals (best selection between linear-curve-fitting and quadratic-curve fitting)
SZ 1.3		Significantly improved compression: using 255 intervals, with multi-dimensional data prediction (on 1D, 2D, and 3D)
SZ 1.4		Use 65536 intervals
SZ 1.4.2	Extending the number of intervals from 255 to 65536, by tailoring/reimplementing the Huffman encoding by ourselves.
SZ 1.4.3	Add the intervals_count to the configuration file (sz.config), allowing users to control it.
SZ 1.4.4	Remove segmentation step quantization_intervals
SZ 1.4.5	Optimize the number of intervals: the # intervals will be automatically optimized before the compression if quantization_itnervals is set to 0.
SZ 1.4.6-beta   Three compression modes are provided (SZ_BEST_SPEED, SZ_BEST_COMPRESSION, SZ_DEFAULT_COMPRESSION), the maximum # quantization intervals is 65536.
SZ 1.4.7-beta   Fix some mem leakage bugs. Fix the bugs about memory crash or segmentation faults when the number of data points is pretty large. Fix the sementation fault bug happening when the data size is super small for 1D array. Fix the error bound may not be guaranteed in some cases.
SZ 1.4.9-beta	Support point-wise relative error bound setting, and optional Fortran compilation.
SZ 1.4.9.1-beta Fix the bug in the fortran interface about SZ_batch_compression
SZ 1.4.9.2-beta Update the user guide by describing how to optimize the compression quality on demand.
SZ 1.4.9.3-beta Fix the sementation fault bug happening when the data size is super small for 2D array and 3D array. (Specifically, when the data size is very small while the error bound is set to very small too, the Huffman tree overhead will be relatively huge such that the compressed size may exceed the original data size, leading to segmentation fault when further compressing it by the last lossless compression step. Solution: In this case, the data will be compressed by Zlib for simplicity, with no compression errors).
SZ 1.4.10-beta (1) Support direct sub-block data compression; (2) Support compression of large data file directly (i.e., the number of data points could be up to as large as LONG size, unlike the previous version that can only compress 2^{32} data points each time): that is, int nbEle --> size_t nbEle; (3) separate the internel functions from the sz.h; 
SZ 1.4.11-beta (1) Support HDF5. (2) Support integer data compression. (3) Provide optional wavelet transform as a preprocessing step in SZ and an optional Tucker tensor decomposition.
SZ 1.4.11: (1) This is a stable version which have went through a long period test. (2) Fix a small bug (the maximum compression error may be slightly greater than error bound in some cases); (3) Support integer compression (for all types of integers); (4) Support HDF5-SZ for all types of integers; (5) Support getting the metadata from a given compressed data file (by using SZ_getMetaData) and printing the metadata (by SZ_printMetaData) with -p option of the executable command ("sz"); (6) Change libsz.a to libSZ.a in case of conflict with szip (note that szip has already used libsz.a).
SZ 1.4.12: (1) Support thresholding-based strategy for 1D data compression based on point-wise relative error bound. (In order to test it, please select errBoundMode = PW_REL, and set the point-wise relative error bound using the parameter pw_relBoundRatio in the sz.config.) For other dimensions of data, point-wise relative error based compression is using block-based strategy (see our DRBSD-2 paper for details) (2) fix the bug in the callZlib.c (previously, segmentation fault might happen when using best_compression mode). (2) Fix a small bug that happened when the data size is extremely huge (nbEle>4G) and the compression mode is SZ_BEST_COMPRSSION. Specifically, the previous call to zlib functions has one potential bug that may lead to segmentation fault, which has been fixed.
SZ 1.4.12.1: (1) Fix the bug tha segmentation fault may happen when the error bound is greater than the value range size. 
SZ 1.4.13.0: Support openMP
SZ 1.4.13.1: Integrate PaSTRI algorithm into SZ for GAMESS two-electron integral data compression.
SZ 1.4.13.2: Clean the code by putting all the global variables into two data structures (conf_params and exe_params) and change Huffman tree encoder from global state to local state in order to support thread-safe feature.
SZ 1.4.13.3: Support the time-based compression for 1D array
SZ 1.4.13.4: Support the time-based compression for 1D,2D, and 3D array (use --enable-timecmpr to switch it on). SZ 1.4.13.0/1/2/3/ has a bug when doing compression/decompression alternatively in the same progress. SZ 1.4.13.4 fixed it.
SZ 1.4.13.5: remove the pwrType in SZ_compress_args() interface, because of uselessness of pwrType setting in the new design of point-wise relative error based compression in the later version of SZ.
SZ 2.0.0.0: Significantly improve the compression quality for the high-compression cases (i.e., improve the PSNR for the cases with high compression ratios)
SZ 2.0.1.0: Further improve the compression quality for high-compression cases than 2.0.0.0. Moreover, improve point-wise relative error bounded compression.
SZ 2.0.2.0: Further improve the compression/decompression rate and also compression ratio (by 10-20%), by replacing Zlib by Zstd as default setting.
SZ 2.1: Significantly improve the compressoin speed for point-wise relative error bound (please see our paper published in MSST19 for details)
SZ 2.1.1: We accelerated compression for absolute error bound slightly. We fixed the compilation issue for OSX.
SZ 2.1.2: Fix memory a little memory leak for point-wise relative error bound Remove a useless step (optimization of # quantization bins) for point-wise relative error bound mode, such that the compression rate is further improved (3%).
SZ 2.1.4.1: Fix a few bugs which were introduced when integrating fast point-wise relative error bounded compression. The bugs could cause unbounded decompression and slow speed, especially for 1D datasets. CMakeLists are also revised to enable SZ to be compiled on Cray system such as Cori.
SZ 2.1.4.2: Revise a bug in 2.1.4.2: The bug happened only when the data size is too small (smaller than a datablock such as 6x6x6) for 2D and 3D cases.
SZ 2.1.5: Fix some bugs in temporal compression and support random-access variable selection for compression/decompression. Fix compilation bug for Fortran version. 
SZ 2.1.6: Fix a bug (error couldn't be bounded) happening when setting point-wise relative error bound for 3D double-precision data.
You can’t perform that action at this time.