From d81e7f0e8083808741906735997d343ca4623cc0 Mon Sep 17 00:00:00 2001
From: Andreas Unterweger
Date: Mon, 1 Apr 2019 10:39:37 +0200
Subject: [PATCH 1/6] Converted readme to MarkDown

---
 DataCompressor/common/doc/overview.txt | 12 ------------
 DataCompressor/common/doc/readme.md    | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 12 deletions(-)
 delete mode 100644 DataCompressor/common/doc/overview.txt
 create mode 100644 DataCompressor/common/doc/readme.md

diff --git a/DataCompressor/common/doc/overview.txt b/DataCompressor/common/doc/overview.txt
deleted file mode 100644
index 2888dd7..0000000
--- a/DataCompressor/common/doc/overview.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-common is a collection of headers for other libraries
-
-log.h: Provides a printf-style logging macro
-
-io.h: Provides types for file I/O as well as ftell and fopen macros (for 64-bit file I/O on platform supports it)
-
-err_codes.h: Provides constants for common errors
-
-Notes on usage:
-* IO_SIZE_BITS specifies the number of bits used for file-I/O-related operations. In particular, the size of return values for Read/Write functions in dependent libraries are based on it
-* If IO_SIZE_BITS is the same size as size_t, the Read/Write functions in dependent libraries do not work properly if the MSB of a size_t variable specifying the size to be read/written is used. For example, if IO_SIZE_BITS is 32 and sizeof(size_t) is 4, the maximum size (parameter value) that the Read/Write function can work with is 2^31 - 1, i.e., the 32nd bit cannot be used. If it is used, the return value of the functions will be interpreted as an error (since it is interpreted as a negative number)
-* Error codes have to be negative in order to distinguish them from return values which signal the amount of bytes read/written (which is positive)
diff --git a/DataCompressor/common/doc/readme.md b/DataCompressor/common/doc/readme.md
new file mode 100644
index 0000000..6ad1b69
--- /dev/null
+++ b/DataCompressor/common/doc/readme.md
@@ -0,0 +1,17 @@
+Overview
+---
+
+common is a collection of headers for other libraries
+
+`log.h`: Provides a printf-style logging macro
+
+`io.h`: Provides types for file I/O as well as ftell and fopen macros (for 64-bit file I/O on platform supports it)
+
+`err_codes.h`: Provides constants for common errors
+
+Notes on usage
+---
+
+* `IO_SIZE_BITS` specifies the number of bits used for file-I/O-related operations. In particular, the size of return values for Read/Write functions in dependent libraries are based on it
+* If `IO_SIZE_BITS` is the same size as size_t, the Read/Write functions in dependent libraries do not work properly if the MSB of a size_t variable specifying the size to be read/written is used. For example, if `IO_SIZE_BITS` is 32 and `sizeof(size_t)` is 4, the maximum size (parameter value) that the Read/Write function can work with is `2^31 - 1`, i.e., the 32nd bit cannot be used. If it is used, the return value of the functions will be interpreted as an error (since it is interpreted as a negative number)
+* Error codes have to be negative in order to distinguish them from return values which signal the amount of bytes read/written (which is positive)
From aaca71fd84a9e8ab3c1dc12919fdeb1df740a6c9 Mon Sep 17 00:00:00 2001
From: Andreas Unterweger
Date: Mon, 1 Apr 2019 10:42:40 +0200
Subject: [PATCH 2/6] Convert DCIOLib documentation to Markdown

---
 DataCompressor/DCIOLib/doc/overview.txt | 12 ------------
 DataCompressor/DCIOLib/doc/readme.md    | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 12 deletions(-)
 delete mode 100644 DataCompressor/DCIOLib/doc/overview.txt
 create mode 100644 DataCompressor/DCIOLib/doc/readme.md

diff --git a/DataCompressor/DCIOLib/doc/overview.txt b/DataCompressor/DCIOLib/doc/overview.txt
deleted file mode 100644
index a020f1e..0000000
--- a/DataCompressor/DCIOLib/doc/overview.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-DCIOLib is a library which allows performing bit-wise reading and writing operations on buffers which are linked to either files or memory.
-
-buffer.h (buffer_t): A buffer implementation for byte-wise reading and writing operations. It can be resized, if necessary, while retaining the old data. Life cycle: AllocateBuffer -> InitBuffer -> (read, write or other operations) -> UninitBuffer -> FreeBuffer
-
-file_buffer.h (file_buffer_t): A buffer implementation for byte-wise reading and writing operations on files. It wraps a buffer_t and can thus also be used to read or write in memory. It is possible to switch between reading and writing. Life cycle: AllocateFileBuffer -> InitFileBuffer with an opened file or InitFileBufferInMemory -> (read, write or other operations) -> UninitFileBuffer -> FreeFileBuffer
-
-bit_file_buffer.h (bit_file_buffer_t): A buffer implementation for bit-wise reading and writing on files or in memory. It provides single-bit and constant-bit-size read/write access. It uses uses a file_buffer_t which needs to be initialized and uninitialized separately. It is possible to switch from writing to reading; the opposite way is not supported. Life cycle: AllocateBitFileBuffer -> InitBitFileBuffer with an initialized file_buffer_t instance -> (read, write or other operations) -> UninitBitFileBuffer -> FreeBitFileBuffer
-
-Notes on usage:
-* Although bit_file_buffer_t cannot be changed from reading mode back to writing mode, it is possible to reset the buffer which discards buffered data
-* When file_buffer_t is used to write to memory, the underlying buffer will be automatically resized when it is too small
-* file_buffer_t and bit_file_buffer_t flush contents automatically when they are uninitialized. To do so before uninitializing, an explicit flush operation is required
\ No newline at end of file
diff --git a/DataCompressor/DCIOLib/doc/readme.md b/DataCompressor/DCIOLib/doc/readme.md
new file mode 100644
index 0000000..7a9db66
--- /dev/null
+++ b/DataCompressor/DCIOLib/doc/readme.md
@@ -0,0 +1,17 @@
+Overview
+---
+
+DCIOLib is a library which allows performing bit-wise reading and writing operations on buffers which are linked to either files or memory.
+
+`buffer.h` (`buffer_t`): A buffer implementation for byte-wise reading and writing operations. It can be resized, if necessary, while retaining the old data. Life cycle: `AllocateBuffer` -> `InitBuffer` -> (read, write or other operations) -> `UninitBuffer` -> `FreeBuffer`
+
+`file_buffer.h` (`file_buffer_t`): A buffer implementation for byte-wise reading and writing operations on files. It wraps a `buffer_t` and can thus also be used to read or write in memory. It is possible to switch between reading and writing. Life cycle: `AllocateFileBuffer` -> `InitFileBuffer` with an opened file or `InitFileBufferInMemory` -> (read, write or other operations) -> `UninitFileBuffer` -> `FreeFileBuffer`
+
+`bit_file_buffer.h` (`bit_file_buffer_t`): A buffer implementation for bit-wise reading and writing on files or in memory. It provides single-bit and constant-bit-size read/write access. It uses uses a `file_buffer_t` which needs to be initialized and uninitialized separately. It is possible to switch from writing to reading; the opposite way is not supported. Life cycle: `AllocateBitFileBuffer` -> `InitBitFileBuffer` with an initialized `file_buffer_t` instance -> (read, write or other operations) -> `UninitBitFileBuffer` -> `FreeBitFileBuffer`
+
+Notes on usage
+---
+
+* Although `bit_file_buffer_t` cannot be changed from reading mode back to writing mode, it is possible to reset the buffer, which discards buffered data
+* When `file_buffer_t` is used to write to memory, the underlying buffer will be automatically resized when it is too small
+* `file_buffer_t` and `bit_file_buffer_t` flush contents automatically when they are uninitialized. To do so before uninitializing, an explicit flush operation is required
From aa10435c1ae8f3921af6510ad07e6ce5021e9c13 Mon Sep 17 00:00:00 2001
From: Andreas Unterweger
Date: Mon, 1 Apr 2019 10:47:42 +0200
Subject: [PATCH 3/6] Convert DCLib documentation to Markdown

---
 DataCompressor/DCLib/doc/overview.txt | 27 -------------------
 DataCompressor/DCLib/doc/readme.md    | 38 +++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 27 deletions(-)
 delete mode 100644 DataCompressor/DCLib/doc/overview.txt
 create mode 100644 DataCompressor/DCLib/doc/readme.md

diff --git a/DataCompressor/DCLib/doc/overview.txt b/DataCompressor/DCLib/doc/overview.txt
deleted file mode 100644
index 47bc12e..0000000
--- a/DataCompressor/DCLib/doc/overview.txt
+++ /dev/null
@@ -1,27 +0,0 @@
-DCLib is a library which allows compressing and decompressing (referred to as encoding and decoding henceforth) data from buffers (see DCIOLib).
-
-enc_dec.h: Allows listing and using all implemented encoders/decoders as well as their options. Life cycle: GetEncoder -> (optional option configuration, see below) -> enc_dec_t.encoder (for encoding) or enc_dec_t.decoder (for decoding) call on initialized input and output bit buffers. Optional option configuration: (optional) OptionNameExists -> (optional) EncoderSupportsOption -> GetOptionType -> GetAllowedOptionValueRange -> SetOptionValue
-
-Encoders/decoders:
-* aggregate: Sums of 'num_values' (option name) consecutive floating-point values (no decoder!)
-* bac: Performs binary arithmetic coding as implemented by Witten et al.
-* copy: Copies the input to the output, i.e., it performs no compression whatsoever. This encoder/decoder operates on blocks of 'blocksize' (option name) bits size
-* csv: Reads lines of comma-separated values and converts the strings in column number 'column' (option name) of each line to a list of (binary) floating-point values when encoding; performs the reverse conversion when decoding and inserts blank columns if necessary
-* diff: Encodes (signed) differences between consecutive (unsigned) values of 'valuesize' (option name) bits size when encoding; reconstructs (unsigned) values of 'valuesize' (option name) bits size from their consecutive (signed) differences when decoding
-* lzmh: Performs LZMH coding and decoding from Ringwelski et al. This is an integrated third-party implementation
-* normalize: Converts floating-point values to (signed) integer values of 'valuesize' (option name) bits size when encoding; performs the reverse conversion when decoding. To preserve decimal places after the decimal point, all values are multiplied by 'normalization_factor' (option name) when encoding, and divided when decoding
-* seg: Creates Exponential Golomb code words from values when encoding; reconstructs Exponential Golomb code words when decoding. All values are 'valuesize' (option name) bits in size and signed
-
-Supported encoder input and output formats (decoder input and formats are reversed, if there is a decoder):
-* aggregate: binary float in, binary float out
-* bac: arbitrary in, binary out
-* copy: arbitrary in, arbitrary out
-* csv: ASCII float in, binary float out
-* diff: unsigned int in, signed int out
-* lzmh: ASCII float in, binary out
-* normalize: float in, signed int out
-* seg: signed int in, binary out
-
-Notes on usage:
-* GetEncoderNames requires a char* array with GetNumberOfEncoders fields
-* When adding or renaming encoders/decoders or options, make sure the arrays remain sorted by name. Otherwise, the find operations will not work as expected.
\ No newline at end of file
diff --git a/DataCompressor/DCLib/doc/readme.md b/DataCompressor/DCLib/doc/readme.md
new file mode 100644
index 0000000..05a8f92
--- /dev/null
+++ b/DataCompressor/DCLib/doc/readme.md
@@ -0,0 +1,38 @@
+Overview
+---
+
+DCLib is a library which allows compressing and decompressing (referred to as encoding and decoding henceforth) data from buffers (see DCIOLib).
+
+`enc_dec.h`: Allows listing and using all implemented encoders/decoders as well as their options. Life cycle: `GetEncoder` -> (optional option configuration, see below) -> `enc_dec_t.encoder` (for encoding) or `enc_dec_t.decoder` (for decoding) call on initialized input and output bit buffers. Optional option configuration: (optional) `OptionNameExists` -> (optional) `EncoderSupportsOption` -> `GetOptionType` -> `GetAllowedOptionValueRange` -> `SetOptionValue`.
+
+Encoders/decoders
+---
+
+* aggregate: Sums of `num_values` (option name) consecutive floating-point values (no decoder!).
+* bac: Performs binary arithmetic coding as implemented by Witten et al.
+* copy: Copies the input to the output, i.e., it performs no compression whatsoever. This encoder/decoder operates on blocks of `blocksize` (option name) bits size.
+* csv: Reads lines of comma-separated values and converts the strings in column number `column` (option name) of each line to a list of (binary) floating-point values when encoding; performs the reverse conversion when decoding and inserts blank columns if necessary.
+* diff: Encodes (signed) differences between consecutive (unsigned) values of `valuesize` (option name) bits size when encoding; reconstructs (unsigned) values of `valuesize` (option name) bits size from their consecutive (signed) differences when decoding
+* lzmh: Performs LZMH coding and decoding from Ringwelski et al. This is an integrated third-party implementation.
+* normalize: Converts floating-point values to (signed) integer values of `valuesize` (option name) bits size when encoding; performs the reverse conversion when decoding. To preserve decimal places after the decimal point, all values are multiplied by `normalization_factor` (option name) when encoding, and divided when decoding.
+* seg: Creates Exponential Golomb code words from values when encoding; reconstructs Exponential Golomb code words when decoding. All values are `valuesize` (option name) bits in size and signed.
+
+Supported encoder input and output formats
+---
+
+Note: Decoder input and formats are reversed, if there is a decoder).
+
+* aggregate: binary float in, binary float out
+* bac: arbitrary in, binary out
+* copy: arbitrary in, arbitrary out
+* csv: ASCII float in, binary float out
+* diff: unsigned int in, signed int out
+* lzmh: ASCII float in, binary out
+* normalize: float in, signed int out
+* seg: signed int in, binary out
+
+Notes on usage
+---
+
+* GetEncoderNames requires a `char*` array with `GetNumberOfEncoders` fields.
+* When adding or renaming encoders/decoders or options, make sure the arrays remain sorted by name. Otherwise, the find operations will not work as expected.
From ad9d5b652040c792658f88b5281a189d1cd57b04 Mon Sep 17 00:00:00 2001
From: Andreas Unterweger
Date: Mon, 1 Apr 2019 10:48:17 +0200
Subject: [PATCH 4/6] Fixed missing periods

---
 DataCompressor/DCIOLib/doc/readme.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/DataCompressor/DCIOLib/doc/readme.md b/DataCompressor/DCIOLib/doc/readme.md
index 7a9db66..420930c 100644
--- a/DataCompressor/DCIOLib/doc/readme.md
+++ b/DataCompressor/DCIOLib/doc/readme.md
@@ -3,15 +3,15 @@ Overview

 DCIOLib is a library which allows performing bit-wise reading and writing operations on buffers which are linked to either files or memory.

-`buffer.h` (`buffer_t`): A buffer implementation for byte-wise reading and writing operations. It can be resized, if necessary, while retaining the old data. Life cycle: `AllocateBuffer` -> `InitBuffer` -> (read, write or other operations) -> `UninitBuffer` -> `FreeBuffer`
+`buffer.h` (`buffer_t`): A buffer implementation for byte-wise reading and writing operations. It can be resized, if necessary, while retaining the old data. Life cycle: `AllocateBuffer` -> `InitBuffer` -> (read, write or other operations) -> `UninitBuffer` -> `FreeBuffer`.

-`file_buffer.h` (`file_buffer_t`): A buffer implementation for byte-wise reading and writing operations on files. It wraps a `buffer_t` and can thus also be used to read or write in memory. It is possible to switch between reading and writing. Life cycle: `AllocateFileBuffer` -> `InitFileBuffer` with an opened file or `InitFileBufferInMemory` -> (read, write or other operations) -> `UninitFileBuffer` -> `FreeFileBuffer`
+`file_buffer.h` (`file_buffer_t`): A buffer implementation for byte-wise reading and writing operations on files. It wraps a `buffer_t` and can thus also be used to read or write in memory. It is possible to switch between reading and writing. Life cycle: `AllocateFileBuffer` -> `InitFileBuffer` with an opened file or `InitFileBufferInMemory` -> (read, write or other operations) -> `UninitFileBuffer` -> `FreeFileBuffer`.

-`bit_file_buffer.h` (`bit_file_buffer_t`): A buffer implementation for bit-wise reading and writing on files or in memory. It provides single-bit and constant-bit-size read/write access. It uses uses a `file_buffer_t` which needs to be initialized and uninitialized separately. It is possible to switch from writing to reading; the opposite way is not supported. Life cycle: `AllocateBitFileBuffer` -> `InitBitFileBuffer` with an initialized `file_buffer_t` instance -> (read, write or other operations) -> `UninitBitFileBuffer` -> `FreeBitFileBuffer`
+`bit_file_buffer.h` (`bit_file_buffer_t`): A buffer implementation for bit-wise reading and writing on files or in memory. It provides single-bit and constant-bit-size read/write access. It uses uses a `file_buffer_t` which needs to be initialized and uninitialized separately. It is possible to switch from writing to reading; the opposite way is not supported. Life cycle: `AllocateBitFileBuffer` -> `InitBitFileBuffer` with an initialized `file_buffer_t` instance -> (read, write or other operations) -> `UninitBitFileBuffer` -> `FreeBitFileBuffer`.

 Notes on usage
 ---

-* Although `bit_file_buffer_t` cannot be changed from reading mode back to writing mode, it is possible to reset the buffer, which discards buffered data
-* When `file_buffer_t` is used to write to memory, the underlying buffer will be automatically resized when it is too small
-* `file_buffer_t` and `bit_file_buffer_t` flush contents automatically when they are uninitialized. To do so before uninitializing, an explicit flush operation is required
+* Although `bit_file_buffer_t` cannot be changed from reading mode back to writing mode, it is possible to reset the buffer, which discards buffered data.
+* When `file_buffer_t` is used to write to memory, the underlying buffer will be automatically resized when it is too small.
+* `file_buffer_t` and `bit_file_buffer_t` flush contents automatically when they are uninitialized. To do so before uninitializing, an explicit flush operation is required.
From 1208278e7afed2203491c6aeb96f326a86e1183d Mon Sep 17 00:00:00 2001
From: Andreas Unterweger
Date: Mon, 1 Apr 2019 10:50:56 +0200
Subject: [PATCH 5/6] Convert DCCLI documentation to Markdown

---
 .../DCCLI/doc/{overview.txt => readme.md} | 23 ++++++++++++-------
 1 file changed, 15 insertions(+), 8 deletions(-)
 rename DataCompressor/DCCLI/doc/{overview.txt => readme.md} (61%)

diff --git a/DataCompressor/DCCLI/doc/overview.txt b/DataCompressor/DCCLI/doc/readme.md
similarity index 61%
rename from DataCompressor/DCCLI/doc/overview.txt
rename to DataCompressor/DCCLI/doc/readme.md
index 9e7045e..073e3e4 100644
--- a/DataCompressor/DCCLI/doc/overview.txt
+++ b/DataCompressor/DCCLI/doc/readme.md
@@ -1,11 +1,18 @@
+Overview
+---
+
 DCCLI is a command line application which allows compressing and decompressing (referred to as encoding and decoding henceforth) files using DCLib.

-Usage:
-The list of encoders/decoders is separated by a separate #. Each encoder/decoder must specify either 'encode' or 'decode', followed by the encoder/decoder name. Options can be specified separately after that. They affect only encoder/decoder that precedes them in the command line. Options are specified as = or for boolean options.
-Example: input.dat output.dat encode copy # decode copy blocksize=8
+Usage: ` `
+
+The list of encoders/decoders is separated by a separate #. Each encoder/decoder must specify either `encode` or `decode`, followed by the encoder/decoder name. Options can be specified separately after that. They affect only encoder/decoder that precedes them in the command line. Options are specified as `=` or `` for boolean options.
+
+Example: `input.dat output.dat encode copy # decode copy blocksize=8`
+
+Notes on usage
+---

-Notes on usage:
-* If a fractional number of bytes (i.e., a number of bits not divisible by eight) is written to the output file, decoding said output file later may lead to errors at the last byte when processing the superfluous bits at the end of the file
-* When using only one encoder/decoder, data read from the input file is processed and written directly (buffered) to the output file, requiring no additional memory. If, however, multiple encoders/decoders are used, data read from the input file is processed and written to a temporary buffer. For all but the last encoder/decoder, data is read from this temporary buffer, processed and written to another temporary buffer. For the last encoder/decoder, data from this temporary buffer is read, processed and written to the output file. Since all data is processed by one encoder/decoder after another, all intermediate data will be held in the described temporary buffers. Processing large files can therefore lead to high memory consumption
-* The size of the temporary buffers described above may be reduced at compile-time via TEMP_BUFFER_SIZE. However, since the buffers resize themselves automatically, TEMP_BUFFER_SIZE is only their initial size, which is no indicator of the acutal memory consumption when processing larger files with more than one encoder/decoder
-* The size of the input and output file buffers may be reduced at compile-time via READ_BUFFER_SIZE and WRITE_BUFFER_SIZE. Both are guaranteed to remain unchanged throughout the execution of the program
\ No newline at end of file
+* If a fractional number of bytes (i.e., a number of bits not divisible by eight) is written to the output file, decoding said output file later may lead to errors at the last byte when processing the superfluous bits at the end of the file.
+* When using only one encoder/decoder, data read from the input file is processed and written directly (buffered) to the output file, requiring no additional memory. If, however, multiple encoders/decoders are used, data read from the input file is processed and written to a temporary buffer. For all but the last encoder/decoder, data is read from this temporary buffer, processed and written to another temporary buffer. For the last encoder/decoder, data from this temporary buffer is read, processed and written to the output file. Since all data is processed by one encoder/decoder after another, all intermediate data will be held in the described temporary buffers. Processing large files can therefore lead to high memory consumption.
+* The size of the temporary buffers described above may be reduced at compile-time via `TEMP_BUFFER_SIZE`. However, since the buffers resize themselves automatically, `TEMP_BUFFER_SIZE` is only their initial size, which is no indicator of the acutal memory consumption when processing larger files with more than one encoder/decoder
+* The size of the input and output file buffers may be reduced at compile-time via `READ_BUFFER_SIZE` and `WRITE_BUFFER_SIZE`. Both are guaranteed to remain unchanged throughout the execution of the program.
From 0d4194e7ccac3296b81501e835e5f309710f61bf Mon Sep 17 00:00:00 2001
From: Andreas Unterweger
Date: Mon, 1 Apr 2019 10:51:35 +0200
Subject: [PATCH 6/6] Fixed missing periods

---
 DataCompressor/common/doc/readme.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/DataCompressor/common/doc/readme.md b/DataCompressor/common/doc/readme.md
index 6ad1b69..454f06c 100644
--- a/DataCompressor/common/doc/readme.md
+++ b/DataCompressor/common/doc/readme.md
@@ -12,6 +12,6 @@ common is a collection of headers for other libraries
 Notes on usage
 ---

-* `IO_SIZE_BITS` specifies the number of bits used for file-I/O-related operations. In particular, the size of return values for Read/Write functions in dependent libraries are based on it
-* If `IO_SIZE_BITS` is the same size as size_t, the Read/Write functions in dependent libraries do not work properly if the MSB of a size_t variable specifying the size to be read/written is used. For example, if `IO_SIZE_BITS` is 32 and `sizeof(size_t)` is 4, the maximum size (parameter value) that the Read/Write function can work with is `2^31 - 1`, i.e., the 32nd bit cannot be used. If it is used, the return value of the functions will be interpreted as an error (since it is interpreted as a negative number)
-* Error codes have to be negative in order to distinguish them from return values which signal the amount of bytes read/written (which is positive)
+* `IO_SIZE_BITS` specifies the number of bits used for file-I/O-related operations. In particular, the size of return values for Read/Write functions in dependent libraries are based on it.
+* If `IO_SIZE_BITS` is the same size as size_t, the Read/Write functions in dependent libraries do not work properly if the MSB of a size_t variable specifying the size to be read/written is used. For example, if `IO_SIZE_BITS` is 32 and `sizeof(size_t)` is 4, the maximum size (parameter value) that the Read/Write function can work with is `2^31 - 1`, i.e., the 32nd bit cannot be used. If it is used, the return value of the functions will be interpreted as an error (since it is interpreted as a negative number).
+* Error codes have to be negative in order to distinguish them from return values which signal the amount of bytes read/written (which is positive).
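
Reviewer note on the `IO_SIZE_BITS` remark in the common readme: the following is a minimal sketch of why a size with its most significant bit set cannot be told apart from an error, assuming `IO_SIZE_BITS` is 32 and the return type is a signed 32-bit integer. `ReinterpretAsSigned` and `SizeIsUsable` are hypothetical helpers written only for this illustration; they are not taken from `io.h` or any of the libraries touched by this series.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of io.h): shows the signed 32-bit value a
   caller would see if a Read/Write function reported `size` bytes through
   a signed return type of the same width. The bits are copied with memcpy
   so that the reinterpretation is well-defined. */
int32_t ReinterpretAsSigned(uint32_t size)
{
    int32_t result;
    memcpy(&result, &size, sizeof result);
    return result;
}

/* Hypothetical guard: a requested size is only usable if it stays below
   the sign bit, i.e., if it is at most 2^31 - 1 (INT32_MAX). */
bool SizeIsUsable(uint32_t size)
{
    return size <= (uint32_t)INT32_MAX;
}
```

Under these assumptions, a caller could reject any request for which `SizeIsUsable` is false before it reaches the Read/Write functions, since the resulting byte count would otherwise read as a negative value and therefore as an error code.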