Add a complete description of the chunk format
FrancescAlted committed Jul 2, 2020
1 parent 7edd65c commit 8491305
Showing 1 changed file with 35 additions and 3 deletions.
38 changes: 35 additions & 3 deletions README_HEADER.rst → README_CHUNK_FORMAT.rst
@@ -1,5 +1,16 @@
Blosc Header Format
===================
Blosc Chunk Format
==================

The chunk is composed of a header and a blocks / splits section::

    +---------+--------+---------+
    | header  | blocks / splits  |
    +---------+--------+---------+

These are described below.

The header section
------------------

Blosc (as of Version 1.0.0) has the following 16 byte header that stores
information about the compressed buffer::
@@ -12,7 +23,7 @@ information about the compressed buffer::
| +----------versionlz
+--------------version

Datatypes of the Header Entries
Datatypes of the header entries
-------------------------------

All entries are little endian.
@@ -63,3 +74,24 @@ All entries are little endian.
(``uint32``) Size of internal blocks.
:cbytes:
(``uint32``) Compressed size of the buffer (including this header).
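
As a rough illustration of the header described above, the following Python sketch decodes the 16 bytes with the standard ``struct`` module. The diff elides part of the field list, so the exact field order used here (``version``, ``versionlz``, ``flags``, ``typesize``, ``nbytes``, ``blocksize``, ``cbytes``) is an assumption based on the Blosc 1 header, and ``parse_header`` is a hypothetical helper, not part of any Blosc API.

```python
import struct

# Assumed layout of the 16-byte header (all little-endian):
# four uint8 fields followed by three uint32 fields.
HEADER = struct.Struct("<BBBBIII")

FIELD_NAMES = ("version", "versionlz", "flags", "typesize",
               "nbytes", "blocksize", "cbytes")


def parse_header(chunk: bytes) -> dict:
    """Decode the fixed-size header at the start of a chunk (hypothetical helper)."""
    if len(chunk) < HEADER.size:
        raise ValueError("chunk is shorter than the 16-byte header")
    return dict(zip(FIELD_NAMES, HEADER.unpack_from(chunk, 0)))


# Hand-built header for illustration only; these are not real Blosc values.
example = struct.pack("<BBBBIII", 2, 1, 0x01, 4, 1000, 256, 500)
fields = parse_header(example)
```

Because ``cbytes`` includes the header itself, a reader could compare it against the length of the buffer it was handed before touching anything past the header.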

The blocks / splits section
---------------------------

After the header comes the blocks / splits section. Blocks are equal-sized parts of the chunk, except for the last block, which can be shorter than or equal in size to the rest.
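
The block partitioning can be sketched with a little arithmetic. The example below assumes that the uncompressed size of the buffer (called ``nbytes`` here) and the ``blocksize`` header field are already known; ``block_sizes`` is a hypothetical helper written for this illustration.

```python
def block_sizes(nbytes: int, blocksize: int) -> list[int]:
    """Return the uncompressed size of each block in a chunk.

    Every block holds `blocksize` bytes except possibly the last one,
    which holds whatever remains.
    """
    if nbytes < 0 or blocksize <= 0:
        raise ValueError("nbytes must be >= 0 and blocksize must be > 0")
    full, rest = divmod(nbytes, blocksize)
    return [blocksize] * full + ([rest] if rest else [])


sizes = block_sizes(1000, 256)   # three full blocks plus a shorter one
even = block_sizes(512, 256)     # the last block equals the others
```

The length of the returned list is the number of blocks, which is also the number of entries a reader would expect in the ``bstarts`` list.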

At the beginning of the blocks section comes a list of ``int32_t`` values, the ``bstarts``, indicating where each encoded block starts (counting from the end of this ``bstarts`` section)::

    +=========+=========+========+=========+
    | bstart0 | bstart1 |  ...   | bstartN |
    +=========+=========+========+=========+
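
A minimal sketch of reading this list, assuming the number of blocks is already known and taking the offset base literally from the description above (the end of the ``bstarts`` section). ``read_bstarts`` and ``HEADER_SIZE`` are names invented for this example; an actual decoder should be checked against the library source.

```python
import struct

HEADER_SIZE = 16  # size of the chunk header described earlier


def read_bstarts(chunk: bytes, nblocks: int) -> list[int]:
    """Return the absolute offset of each encoded block inside `chunk`.

    Assumption: stored values are relative to the end of the bstarts
    list, as the text states, so that base is added back here.
    """
    table_end = HEADER_SIZE + 4 * nblocks
    if len(chunk) < table_end:
        raise ValueError("chunk is too short to hold the bstarts list")
    relative = struct.unpack_from(f"<{nblocks}i", chunk, HEADER_SIZE)
    return [table_end + offset for offset in relative]


# Synthetic chunk: zeroed header, three bstarts, then filler bytes.
chunk = bytes(HEADER_SIZE) + struct.pack("<3i", 0, 10, 25) + bytes(40)
offsets = read_bstarts(chunk, 3)
```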

Finally comes the actual list of compressed block / split data streams. A block may optionally (see bit 4 in ``flags`` above) be further divided into so-called splits, which are the actual data streams passed to codecs for compression. If a block is not split, then the split is equivalent to the whole block. Each split in the list is preceded by its compressed size, expressed as an ``int32_t``::

    +========+========+========+========+========+========+========+
    | csize0 | split0 | csize1 | split1 |  ...   | csizeN | splitN |
    +========+========+========+========+========+========+========+
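
The size-prefixed layout above can be walked with a short loop. This is only a sketch: ``iter_splits`` is a hypothetical helper, the number of splits is assumed to be known in advance, and the rejection of negative sizes is a defensive choice of this example rather than something the format description specifies.

```python
import struct


def iter_splits(stream: bytes, nsplits: int):
    """Yield each compressed split from a csize-prefixed byte stream."""
    pos = 0
    for _ in range(nsplits):
        # Each split is preceded by its compressed size (little-endian int32).
        (csize,) = struct.unpack_from("<i", stream, pos)
        pos += 4
        if csize < 0 or pos + csize > len(stream):
            raise ValueError("corrupt or truncated split")
        yield stream[pos:pos + csize]
        pos += csize


# Synthetic stream with two splits; the payloads are placeholders,
# not real codec output.
stream = struct.pack("<i", 3) + b"abc" + struct.pack("<i", 2) + b"de"
splits = list(iter_splits(stream, 2))
```

Each yielded byte string is what would be handed to the codec's decompressor.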


*Note*: all integers are stored in little-endian byte order.
