
NChilada File Format

Thomas Quinn edited this page Jan 14, 2016 · 1 revision

The NChilada File Format represents an effort to use the file system as a database. Rather than writing one monolithic file, NChilada creates a separate file for every property stored in the data. This makes the format easy to extend. As simulations grow into the billions of particles, many ~1 GB files are also more manageable than a single file of hundreds of GB that you have to read in full every time you want to analyze anything.

Every number in the files is written in XDR (eXternal Data Representation; the "standard" format in the language of tipsy and pkdgrav/gasoline) for portability. In C, that means including the following headers:

 #include <rpc/types.h>
 #include <rpc/xdr.h>
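XDR stores every value big-endian and pads every item to a multiple of 4 bytes, so an XDR int is its 32-bit two's-complement value with the most significant byte first. The following is a minimal sketch of encoding and decoding one XDR int by hand. It assumes you want to avoid depending on the Sun RPC headers, which on many current Linux systems come from libtirpc rather than the C library. The `nc_` helper names are hypothetical.

```c
#include <stdint.h>

/* Write a 32-bit int the way xdr_int does: 4 bytes, most significant first. */
static void nc_put_i32(unsigned char *p, int32_t v)
{
    uint32_t u = (uint32_t)v;            /* well-defined modular conversion */
    p[0] = (unsigned char)(u >> 24);
    p[1] = (unsigned char)(u >> 16);
    p[2] = (unsigned char)(u >> 8);
    p[3] = (unsigned char)u;
}

/* Read 4 big-endian bytes as an unsigned value. */
static uint32_t nc_get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read an XDR int: reinterpret the bits as two's complement
 * without relying on implementation-defined signed casts. */
static int32_t nc_get_i32(const unsigned char *p)
{
    uint32_t u = nc_get_u32(p);
    return u <= 0x7FFFFFFFu ? (int32_t)u : -(int32_t)(~u) - 1;
}
```

For example, the magic number 1062053 (0x1034A5) is stored as the four bytes 00 10 34 A5, regardless of the byte order of the machine that wrote it.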


Description

The format is hierarchical so that it can hold multiple particle types (or "families"). Typically the top-level directory corresponds to one output step, for example

 % ls
 simulation.00192

This directory contains the family subdirectories, for example

 % ls simulation.00192
 dark   gas  star

Each family directory contains all of the property files for that family:

 % ls -lh simulation.00192/dark
 total 29G
 -rw-r--r--  1 user G 2.5G Aug  1 13:05 den
 -rw-------  1 user G 2.5G Aug  1 13:04 iord
 -rw-------  1 user G 2.5G Aug  1 13:03 mass
 -rw-------  1 user G 7.3G Aug  1 13:01 pos
 -rw-------  1 user G 2.5G Aug  1 13:03 pot
 -rw-------  1 user G 2.5G Aug  1 13:03 smoothlength
 -rw-------  1 user G 2.5G Aug  1 13:05 soft
 -rw-------  1 user G 7.3G Aug  1 13:02 vel
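Because every property lives at <output step>/<family>/<property>, locating a file is just string joining. The following is a minimal sketch. `nc_property_path` is a hypothetical helper, not part of any NChilada library. The check that family and property names contain no '/' is an added assumption that each is a single path component.

```c
#include <stdio.h>
#include <string.h>

/* Build "<step>/<family>/<property>", e.g. "simulation.00192/dark/pos".
 * Returns 0 on success, or -1 if a name contains '/' or the result
 * would not fit in buf (snprintf reports the untruncated length). */
static int nc_property_path(char *buf, size_t len, const char *step,
                            const char *family, const char *property)
{
    if (strchr(family, '/') || strchr(property, '/'))
        return -1;
    int n = snprintf(buf, len, "%s/%s/%s", step, family, property);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```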

Header

Each file begins with a 28-byte header that describes some basic information. The header consists of a 4-byte "magic number" (1062053), an 8-byte double "time", an 8-byte integer "number of particles", a 4-byte int "number of dimensions" (1 or 3 for scalar or vector data), and a 4-byte data type code (4 + 8 + 8 + 4 + 4 = 28 bytes).

Following this basic header are the minimum and maximum values of the property stored in the file. Each of these has ndim components. For vector properties, each component is the extreme value along that dimension.
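Taken together, a property file holds the 28-byte header, then ndim minimum values, ndim maximum values, and then the per-particle data. The following is a minimal sketch of the resulting sizes and offsets. It rests on three assumptions the text above does not state: the min/max values use the same on-disk element size as the data, the data follows the maximum immediately, and vector components are interleaved per particle (x0 y0 z0 x1 ...). The function names are hypothetical.

```c
#include <stdint.h>

#define NC_HEADER_BYTES 28u  /* magic(4) + time(8) + nbodies(8) + ndim(4) + code(4) */

/* Offset of the first particle's data: header, then ndim minima and ndim maxima.
 * elem_size is the on-disk size of one component (e.g. 4 for float32). */
static uint64_t nc_data_offset(uint32_t ndim, uint32_t elem_size)
{
    return NC_HEADER_BYTES + 2ull * ndim * elem_size;
}

/* Offset of component `dim` of particle `i`, assuming interleaved vectors. */
static uint64_t nc_element_offset(uint64_t i, uint32_t dim,
                                  uint32_t ndim, uint32_t elem_size)
{
    return nc_data_offset(ndim, elem_size) + (i * ndim + dim) * elem_size;
}

/* Expected total file size for nbodies particles. */
static uint64_t nc_file_size(uint64_t nbodies, uint32_t ndim, uint32_t elem_size)
{
    return nc_data_offset(ndim, elem_size) + nbodies * ndim * elem_size;
}
```

Under these assumptions, a 3-vector file such as pos or vel should be roughly three times the size of a scalar file of the same type, which matches the 7.3G versus 2.5G files in the listing above.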

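A C struct describing the header is as follows. The 8-byte particle count is split into two 4-byte ints:

 struct nchilada_dump {
  int    magic;     /* 1062053 */
  double time;
  int    iHighWord; /* high 32 bits of the particle count */
  int    nbodies;   /* low 32 bits of the particle count */
  int    ndim;      /* 1 (scalar) or 3 (vector) */
  int    code;      /* data type code, see below */
 };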
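Since the 8-byte particle count is read as two 4-byte XDR ints, with the high word first in XDR's big-endian order, code that uses the header has to reassemble the count. The following is a minimal sketch. `nc_particle_count` is a hypothetical helper, and the struct is repeated so the snippet compiles on its own.

```c
#include <stdint.h>

struct nchilada_dump {
    int    magic;
    double time;
    int    iHighWord;   /* upper 32 bits of the 64-bit particle count */
    int    nbodies;     /* lower 32 bits of the 64-bit particle count */
    int    ndim;
    int    code;
};

/* Reassemble the 8-byte particle count. Both words are converted to
 * uint32_t first, so a low word of 2^31 or more (which a signed int
 * shows as negative) still contributes its full unsigned value. */
static uint64_t nc_particle_count(const struct nchilada_dump *h)
{
    return ((uint64_t)(uint32_t)h->iHighWord << 32) | (uint32_t)h->nbodies;
}
```

For example, a 3-billion-particle file has iHighWord = 0 and a low word that reads as -1294967296 in a signed int. The helper still returns 3000000000.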

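A simple XDR function to read the header into this structure is shown below. Because XDR filters work in both directions, the same function also writes the header when it is given a stream created with XDR_ENCODE. To read from a FILE *, create the stream with xdrstdio_create(&xdrs, fp, XDR_DECODE).

 /* Read (or write) the 28-byte header; returns 1 on success, 0 on failure. */
 int xdr_NCHeader(XDR *pxdrs,struct nchilada_dump *ph)
 {
    if (!xdr_int(pxdrs,&ph->magic)) return 0;
    if (!xdr_double(pxdrs,&ph->time)) return 0;
    /* the 8-byte particle count is read as two 4-byte ints, high word first */
    if (!xdr_int(pxdrs,&ph->iHighWord)) return 0;
    if (!xdr_int(pxdrs,&ph->nbodies)) return 0;
    if (!xdr_int(pxdrs,&ph->ndim)) return 0;
    if (!xdr_int(pxdrs,&ph->code)) return 0;
    return 1;
 }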
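The same header can be decoded by hand from the first 28 bytes of a file. This serves as a cross-check on the field order and works on systems without the Sun RPC headers. The following is a minimal sketch. It follows the field order of xdr_NCHeader and assumes the host `double` is IEEE 754 with the same byte order as its integers, as on mainstream platforms (XDR itself uses IEEE 754). All `nc_` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NC_MAGIC 1062053

struct nchilada_dump {
    int    magic;
    double time;
    int    iHighWord;
    int    nbodies;
    int    ndim;
    int    code;
};

static uint32_t nc_get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* XDR int: two's complement, big-endian; avoids implementation-defined casts. */
static int nc_get_int(const unsigned char *p)
{
    uint32_t u = nc_get_u32(p);
    return u <= 0x7FFFFFFFu ? (int)u : -(int)(~u) - 1;
}

/* XDR double: 8 big-endian bytes of an IEEE 754 double. */
static double nc_get_double(const unsigned char *p)
{
    uint64_t bits = ((uint64_t)nc_get_u32(p) << 32) | nc_get_u32(p + 4);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Decode the 28-byte header; returns 1 if the magic number matches, else 0. */
static int nc_parse_header(const unsigned char buf[28], struct nchilada_dump *h)
{
    h->magic     = nc_get_int(buf);
    h->time      = nc_get_double(buf + 4);
    h->iHighWord = nc_get_int(buf + 12);
    h->nbodies   = nc_get_int(buf + 16);
    h->ndim      = nc_get_int(buf + 20);
    h->code      = nc_get_int(buf + 24);
    return h->magic == NC_MAGIC;
}
```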

The data type code is one of:

  1: int8     signed 8-bit integer
  2: uint8    unsigned 8-bit integer
  3: int16    signed 16-bit integer
  4: uint16   unsigned 16-bit integer
  5: int32    signed 32-bit integer
  6: uint32   unsigned 32-bit integer
  7: int64    signed 64-bit integer
  8: uint64   unsigned 64-bit integer
  9: float32  floating point (single precision)
 10: float64  floating point (double precision)

The enum for "code":

 enum NCDataTypeCode {
  int8 = 1,
  uint8,
  int16,
  uint16,
  int32,
  uint32,
  int64,
  uint64,
  float32,
  float64
 };
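To read or seek within a file, you need the on-disk size of one component for each type code. XDR pads every item to a multiple of 4 bytes, so 8- and 16-bit values written with xdr_char or xdr_short still occupy 4 bytes each. This is a minimal sketch that assumes the writer used those XDR filters for the small integer types. `nc_xdr_element_size` is a hypothetical helper built on the enum above.

```c
#include <stddef.h>

enum NCDataTypeCode {
    int8 = 1, uint8, int16, uint16, int32, uint32,
    int64, uint64, float32, float64
};

/* On-disk bytes per component for each type code (XDR 4-byte padding
 * applied to the small integer types). Returns 0 for an unknown code. */
static size_t nc_xdr_element_size(int code)
{
    switch (code) {
    case int8: case uint8: case int16: case uint16:
    case int32: case uint32: case float32:
        return 4;
    case int64: case uint64: case float64:
        return 8;
    default:
        return 0;
    }
}
```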

Example code

The file format was first implemented in structures/tree_xdr.h in the nchilada CVS repository. Since then, C code has been written to read, process, and write simulation outputs in NChilada format.

There is a header file, tipsydefs.h, and an example program that uses it, splitbox.c. Of note in the C file is the xdr_type function, which reads any data element based on the type code it is passed.
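A type-code dispatcher in the spirit of xdr_type can be sketched as follows. This is not the code from splitbox.c. `nc_decode_element` is a hypothetical function that decodes one component from a byte buffer and widens it to double. It assumes the small integer types are padded to 4 bytes, as xdr_char and xdr_short do, and that the host uses IEEE 754 floating point.

```c
#include <stdint.h>
#include <string.h>

enum NCDataTypeCode {
    int8 = 1, uint8, int16, uint16, int32, uint32,
    int64, uint64, float32, float64
};

/* Read nbytes big-endian bytes into an unsigned 64-bit value. */
static uint64_t nc_get_be(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode one component at p according to `code`, widened to double.
 * Sets *nread to the bytes consumed; returns 0 on success, -1 for an
 * unknown code. uint64/int64 values beyond 2^53 lose precision. */
static int nc_decode_element(const unsigned char *p, int code,
                             double *out, size_t *nread)
{
    uint64_t u;
    switch (code) {
    case int8: case int16: case int32:       /* signed, padded to 4 bytes */
        u = nc_get_be(p, 4);
        *out = (u & 0x80000000u) ? (double)((int64_t)u - 4294967296LL)
                                 : (double)u;
        *nread = 4;
        return 0;
    case uint8: case uint16: case uint32:    /* unsigned, padded to 4 bytes */
        *out = (double)nc_get_be(p, 4);
        *nread = 4;
        return 0;
    case int64:                              /* two's complement, 8 bytes */
        u = nc_get_be(p, 8);
        *out = (u >> 63) ? -(double)(~u) - 1.0 : (double)u;
        *nread = 8;
        return 0;
    case uint64:
        *out = (double)nc_get_be(p, 8);
        *nread = 8;
        return 0;
    case float32: {
        uint32_t bits = (uint32_t)nc_get_be(p, 4);
        float f;
        memcpy(&f, &bits, sizeof f);
        *out = f;
        *nread = 4;
        return 0;
    }
    case float64: {
        uint64_t bits = nc_get_be(p, 8);
        double d;
        memcpy(&d, &bits, sizeof d);
        *out = d;
        *nread = 8;
        return 0;
    }
    default:
        return -1;
    }
}
```

Returning the byte count lets a caller walk a buffer of mixed or unknown type without a separate size table.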

The hard part about writing code for the NChilada format is that there are so many files. splitbox.c reads all the file names in a family directory and dynamically creates a file handle for each one.
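That pattern can be sketched with POSIX `opendir`/`readdir`. This is not the code from splitbox.c. `nc_property`, `nc_open_family`, and `nc_close_family` are hypothetical names. Skipping names that start with '.' assumes no property file is hidden. Because the code opens one handle per property file, the process's open-file limit is worth keeping in mind.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* One open property file, e.g. name "pos" in "simulation.00192/dark". */
struct nc_property {
    char  name[256];
    FILE *fp;
};

/* Open every regular file in a family directory for reading, growing the
 * array as needed. Returns the number opened and sets *props (release with
 * nc_close_family), or returns -1 if the directory cannot be opened. */
static int nc_open_family(const char *dir, struct nc_property **props)
{
    DIR *d = opendir(dir);
    if (!d) return -1;
    struct nc_property *arr = NULL;
    int n = 0, cap = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;
        if (e->d_name[0] == '.') continue;      /* ".", "..", hidden files */
        int len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || len >= (int)sizeof path) continue;
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode)) continue;
        if (n == cap) {                         /* grow the handle array */
            int newcap = cap ? 2 * cap : 8;
            struct nc_property *tmp = realloc(arr, newcap * sizeof *tmp);
            if (!tmp) break;
            arr = tmp;
            cap = newcap;
        }
        FILE *fp = fopen(path, "rb");
        if (!fp) continue;
        snprintf(arr[n].name, sizeof arr[n].name, "%s", e->d_name);
        arr[n].fp = fp;
        n++;
    }
    closedir(d);
    *props = arr;
    return n;
}

/* Close every handle and free the array. */
static void nc_close_family(struct nc_property *props, int n)
{
    for (int i = 0; i < n; i++)
        fclose(props[i].fp);
    free(props);
}
```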

Working with Salsa

Salsa can read NChilada-formatted files. As of this writing, Salsa expects an XML file at the same level as the family directories. Additionally, Salsa requires position and velocity (XXX?) files so that it has something to display.