# Convert CSV files to HDF5
## 1. Introduction

COMPAS can produce log files in several formats:

- Hierarchical Data Format version 5 (`HDF5`)

- Comma Separated Values (`CSV`)

- Tab Separated Values (`TSV`)

- Plain text: space separated values (`TXT`)

Here we show how to combine the data of a COMPAS simulation that produced multiple non-`HDF5` files into a single `HDF5` file, using the `postProcessing` script in the `postProcessing` directory.

*Notes:*

- The script can also be called from the terminal (for more information, read the script). Here we show how to call it from another script, such as an IPython notebook.
- You will need to have run COMPAS at least once, and produced output files with some content, in order to use this.

### 1.1 Imports

In [26]:
import sys   # for adding the script's directory to the Python path
import os    # for handling paths
# Note that the script has its own imports

# Set the path to the data conversion script and import the script
compasRootDir = os.environ['COMPAS_ROOT_DIR']  # This environment variable should already be set if you've successfully run COMPAS
defaultsFolder = os.path.join(compasRootDir, 'postProcessing')  # works with or without a trailing slash in COMPAS_ROOT_DIR

sys.path.append(defaultsFolder)

import postProcessing as postProc


## 2. User-Specified Options

The following parameters have default values in the `postProcessing.py` script, but the user should adjust them as needed.

### 2.1 dataRootDir
The path to the data files depends on whether the simulation used regular or HPC COMPAS.

1 - Regular COMPAS uses a single core, which creates one `COMPAS_Output` folder with all the data files. Set *dataRootDir* to this folder.

2 - HPC COMPAS splits the run over multiple cores and produces one `COMPAS_Output` folder per core. Set *dataRootDir* to the parent directory of the output folders. A *tree walker* will grab and collect the data in the subdirectories.
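To illustrate what a tree walker does in the HPC case, here is a minimal sketch built on the standard library's `os.walk`. It is not the code used by the `postProcessing` script; the function name `find_data_files` and its arguments are hypothetical, and it assumes data files are named `<prefix>...<.extension>`.

```python
import os

def find_data_files(root_dir, prefix="BSE_", extension="csv"):
    """Hypothetical helper: recursively collect matching data files under root_dir."""
    matches = []
    # os.walk visits root_dir and every subdirectory below it
    for current_dir, _subdirs, filenames in os.walk(root_dir):
        for name in filenames:
            # Keep only files with the expected prefix and extension
            if name.startswith(prefix) and name.endswith("." + extension):
                matches.append(os.path.join(current_dir, name))
    return sorted(matches)
```

Called on the parent directory of several `COMPAS_Output` folders, this returns one path per matching file in every subfolder; called on a single `COMPAS_Output` folder, it returns only that folder's files.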

### 2.2 prefix, delimiter, extension
The file prefixes and extensions, and the data delimiters, can be set as desired in `pythonSubmit`. The values given here must match those settings.

### 2.3 h5Name
The desired name of the `HDF5` file that will be created.

In [27]:
dataRootDir    = '.'                # Location of root directory of the data     # defaults to '.'
prefix         = 'BSE_'             # Prefix of the data files                   # defaults to 'BSE_'
delimiter      = ','                # Delimiter used in the output CSV files     # defaults to ','
extension      = 'csv'              # Extension of the data files                # defaults to 'csv'
h5Name         = 'COMPAS_Output.h5' # Name of the output h5 file                 # defaults to 'COMPAS_Output.h5'
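
Since the delimiter must match the one used when the files were written, a quick sanity check before converting can save a failed run. The sketch below is not part of the `postProcessing` script; the function name `delimiter_is_consistent` is hypothetical. It assumes every line of a data file (header lines included) has the same number of fields and that no field contains the delimiter inside quotes.

```python
def delimiter_is_consistent(path, delimiter=",", max_lines=20):
    """Hypothetical check: True if the first lines all split into the same number (>1) of fields."""
    field_counts = set()
    with open(path, "r") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            line = line.rstrip("\n")
            if line.strip():  # ignore blank lines
                field_counts.add(len(line.split(delimiter)))
    # Exactly one distinct count, and more than one field per line
    return len(field_counts) == 1 and min(field_counts) > 1
```

A result of `False` suggests that the delimiter does not match the file, or that one of the assumptions above does not hold for that file.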

### 2.4 filesToCombine

This parameter determines which COMPAS output files to include in the `HDF5` file. To select only certain output files, uncomment the relevant lines in the list below (and comment out the first line).

*Note:* If a COMPAS run is small enough, a given output file might not be produced (if, for example, there are no double compact objects formed there will be no `BSE_Double_Compact_Objects` file).

In [28]:
filesToCombine = None    # default None means to use all of them (apologies if that's counterintuitive...)

#filesToCombine = [
#    'SystemParameters',
#    'CommonEnvelopes',
#    'DoubleCompactObjects',
#    'Supernovae',
#    'RLOF',
#    'errors',
#    'output'
#]

## 3. Remarks

Because the script copies and combines all the data into new `CSV` files, you will need free disk space comparable to the size of the original data in order to create these files. They are removed automatically after the `HDF5` file is created.
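
As a rough pre-flight check of the space requirement, one could compare the total size of the existing data files with the free space on the same disk. This is a sketch only and not part of the `postProcessing` script; the function name `space_check` is hypothetical, and it assumes the temporary files are written to the same filesystem as `root_dir`.

```python
import os
import shutil

def space_check(root_dir, extension="csv"):
    """Hypothetical helper: return (bytes_needed, bytes_free, is_enough) for root_dir."""
    needed = 0
    # Sum the sizes of all data files with the given extension under root_dir
    for current_dir, _subdirs, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith("." + extension):
                needed += os.path.getsize(os.path.join(current_dir, name))
    # Free space on the filesystem that contains root_dir
    free = shutil.disk_usage(root_dir).free
    return needed, free, free > needed
```

The comparison is deliberately approximate: the `HDF5` file itself also takes up space, so some extra margin is advisable.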

The `HDF5` file will be placed in the same directory as the path given. 

The final `HDF5` file groups and contents are printed to `stdout`.

## 4. Example


In [None]:
postProc.main(dataRootDir    = dataRootDir,
              prefix         = prefix,
              delimiter      = delimiter,
              extension      = extension,
              h5Name         = h5Name,
              filesToCombine = filesToCombine)