coriolinus/idx-rs

IDX-RS: Index File Decoder

What is this?

A pure-Rust implementation of a streaming IDX file reader.

IDX File Format

As described in the MNIST Database of Handwritten Digits:

The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types. The basic format is

magic number
size in dimension 0
size in dimension 1
size in dimension 2
.....
size in dimension N
data

The magic number is an integer (MSB first). The first 2 bytes are always 0.

The third byte codes the type of the data:

0x08: unsigned byte
0x09: signed byte
0x0B: short (2 bytes)
0x0C: int (4 bytes)
0x0D: float (4 bytes)
0x0E: double (8 bytes)
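The type-code table above maps naturally onto a Rust enum. The following is a minimal sketch, not this crate's actual API: the names `DataType`, `from_code`, and `size_in_bytes` are hypothetical and chosen only for illustration. The element sizes for the two byte types are assumed to be 1 byte each.

```rust
/// Element types listed in the IDX type-code table.
/// (Hypothetical type for illustration; not taken from the crate.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    UnsignedByte,
    SignedByte,
    Short,
    Int,
    Float,
    Double,
}

impl DataType {
    /// Map the third byte of the magic number to a data type.
    /// Returns `None` for codes that are not in the table (for example 0x0A).
    pub fn from_code(code: u8) -> Option<Self> {
        match code {
            0x08 => Some(Self::UnsignedByte),
            0x09 => Some(Self::SignedByte),
            0x0B => Some(Self::Short),
            0x0C => Some(Self::Int),
            0x0D => Some(Self::Float),
            0x0E => Some(Self::Double),
            _ => None,
        }
    }

    /// Size of one element in bytes, as given in the table.
    pub fn size_in_bytes(self) -> usize {
        match self {
            Self::UnsignedByte | Self::SignedByte => 1,
            Self::Short => 2,
            Self::Int | Self::Float => 4,
            Self::Double => 8,
        }
    }
}
```

Returning `Option` rather than panicking lets a reader reject a file with an unknown type code before touching the data section.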

The fourth byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices, and so on.

The sizes in each dimension are 4-byte integers (MSB first, i.e. big-endian, as on most non-Intel processors).
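As a rough illustration of the header layout described so far, the sketch below reads the magic number and the dimension sizes from any `std::io::Read` source. The `Header` struct and `read_header` function are hypothetical names, not this crate's API, and the sketch assumes the dimension sizes are unsigned.

```rust
use std::io::{self, Read};

/// Parsed IDX header (hypothetical struct for illustration).
#[derive(Debug, PartialEq, Eq)]
pub struct Header {
    /// Third byte of the magic number: the element type code.
    pub type_code: u8,
    /// One size per dimension, in file order.
    pub dims: Vec<u32>,
}

/// Read the magic number and dimension sizes from the start of a stream.
pub fn read_header<R: Read>(reader: &mut R) -> io::Result<Header> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;

    // The first two bytes of the magic number are always zero.
    if magic[0] != 0 || magic[1] != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "first two bytes of the magic number must be zero",
        ));
    }

    let type_code = magic[2];
    let ndims = magic[3] as usize;

    // Each dimension size is a 4-byte integer, most significant byte first.
    let mut dims = Vec::with_capacity(ndims);
    for _ in 0..ndims {
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        dims.push(u32::from_be_bytes(buf));
    }

    Ok(Header { type_code, dims })
}
```

Because the function only requires `Read`, it works on a file, a network stream, or an in-memory `std::io::Cursor`, and it leaves the reader positioned at the first data byte.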

The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
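To make the C-array ordering concrete, here is a small sketch that converts a multidimensional index into a flat element position. The function name `flat_index` is hypothetical. It returns an element index, not a byte offset; multiply by the element size and add the header length to locate the bytes in a file.

```rust
/// Convert a multidimensional index into a flat element index,
/// using C (row-major) order: the last dimension changes fastest.
/// Returns `None` if the ranks differ or any index is out of bounds.
pub fn flat_index(dims: &[usize], index: &[usize]) -> Option<usize> {
    if dims.len() != index.len() {
        return None;
    }
    let mut offset = 0usize;
    for (&size, &i) in dims.iter().zip(index) {
        if i >= size {
            return None;
        }
        // Horner-style accumulation: shift by this dimension's size,
        // then add the position within it.
        offset = offset * size + i;
    }
    Some(offset)
}
```

For a 10000 x 28 x 28 array, stepping the last index by one moves one element, stepping the middle index moves 28 elements, and stepping the first index moves 784 elements.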

Happy hacking.

Example of IDX Format:

TEST SET IMAGE FILE (t10k-images-idx3-ubyte):


[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  10000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel
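The offsets in the table follow directly from the format: 4 bytes of magic number plus 4 bytes per dimension, then the data. The sketch below computes the data offset and the total length implied by a header, which can serve as a sanity check against the actual file size. The function names are hypothetical, and the arithmetic assumes the sizes fit in a `u64`.

```rust
/// Byte offset of the first data element: 4 bytes of magic number
/// plus one 4-byte size per dimension.
pub fn data_offset(ndims: u64) -> u64 {
    4 + 4 * ndims
}

/// Total file length in bytes implied by the header:
/// header bytes plus (product of dimension sizes) * element size.
pub fn expected_file_len(dims: &[u32], elem_size: u64) -> u64 {
    let elements: u64 = dims.iter().map(|&d| u64::from(d)).product();
    data_offset(dims.len() as u64) + elements * elem_size
}
```

For the test set image file above (three dimensions of 10000, 28, and 28, with 1-byte elements), `data_offset(3)` is 16, matching the offset of the first pixel in the table, and `expected_file_len(&[10000, 28, 28], 1)` is 7840016.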

About

Pure-Rust implementation of a streaming IDX file reader. See http://yann.lecun.com/exdb/mnist/
