# Imports

In [1]:
#Needed for reading the data
import h5py as h5
#Needed for checking file existence
import os 
#Needed for array handling and slicing
import numpy as np



# Custom imports (i.e. own scripts)

In [2]:
"""

If you use this ipython notebook outside its original directory,
please set the path to the COMPAS repository and uncomment the lines below.
This way Python automatically finds the custom scripts that we need to import.

"""
#pathCOMPASrepo       = '/home/cneijssel/Documents/COMPASpop/'
#pathToPostProcessing = 'popsynth/Papers/NeijsselEtAl/PostProcessing/1_H5File/'
#import sys
#sys.path.append(pathCOMPASrepo + pathToPostProcessing)

#The script we use to quickly write an h5file
import WriteH5File as Writer


# Path to Data

In [3]:
#Tip: an absolute path keeps the
#script working from anywhere on your PC
path     = '/home/cneijssel/Desktop/RLOF_test/First_Solar/'
filename = 'COMPASOutput.h5'

# Writing the Data h5file

The procedure presented here assumes you already have
a COMPASOutput.h5 file and want to write a reduced version of it.
This is useful, for example, when you publish data: there is no
need to give people the full 90GB when you only used 10GB of it.

Let's first quickly check that we have given the correct path to the input file.


In [4]:
fullPath = os.path.join(path, filename)
if not os.path.isfile(fullPath):
    raise FileNotFoundError("h5 file not found. Wrong path given?")
# Open read-only ('r'): we only read from the original file
Data = h5.File(fullPath, 'r')

If opening the file gives the following error:

OSError: Unable to create file (unable to open file: name = '/......../COMPASOutput.h5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2)

then there is a good chance the file is already
open in another program. Close it there and try again.
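A slightly more defensive way to open the original file is sketched below. It opens the file read-only and re-raises an OSError with a hint about the most common cause. The helper name openOriginal is hypothetical and not part of the COMPAS scripts. Because h5py File objects are context managers, the helper can be used in a with-block so the file is closed again even if a later step fails.

```python
import os

import h5py as h5


def openOriginal(path, filename):
    """Open an existing COMPAS h5 file read-only (hypothetical helper)."""
    fullPath = os.path.join(path, filename)
    if not os.path.isfile(fullPath):
        raise FileNotFoundError(f"h5 file not found: {fullPath}. Wrong path given?")
    try:
        # 'r' = read-only, so the original file can never be modified here
        return h5.File(fullPath, 'r')
    except OSError as err:
        # A common cause is that the file is still open in another program
        raise OSError(f"Could not open {fullPath}; is it open in another program?") from err


# Example use:
# with openOriginal(path, filename) as Data:
#     print(list(Data.keys()))
```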

# The writer function

This script defines a class that takes a few input variables when you create an instance of it.
On creation it automatically checks whether the input makes sense and raises an error if it doesn't.
It can also print warnings, which you may choose to ignore.

The input arguments of the class:

- pathToOriginal (STRING):
    The path to the original data file you want to use. In this notebook
    it is the same path you set at the start (i.e. pathToOriginal=path).

- fileNameOriginal (STRING):
    The name of the original data file, usually COMPASOutput.h5. In this notebook
    it is the same name you set at the start (i.e. fileNameOriginal=filename).

- pathToNew (STRING):
    The path where the new data file will be written. If you want it next to your
    original data file, just write pathToNew=path. In that case, make sure
    that fileNameNew differs from the original file name.

- fileNameNew (STRING):
    The name you want to give the new data file. Be careful if you write to the same
    folder as the original file; the script should raise an error if the input and
    output files are the same.

- seeds (1D ARRAY):
    The seeds you want to keep from the original data file. This is a 1D array,
    i.e. if you have X seeds the shape is (X,).

- groups (LIST OF STRINGS or STRING):
    A list of the groups you want to write to the new file. Reminder: groups are
    what COMPAS calls files, such as RLOF, systems or doubleCompactObjects. In this
    notebook you can type Data.keys() to see the groups. If you want all of them,
    pass the string 'All' instead of a list of strings.

- dataSets (LIST OF (LISTS OF STRINGS or STRING)):
    A list of the datasets (i.e. column names) you want per group.
    If you have three groups, pass a list of three entries in the same
    order as groups. If you want all columns of a group, pass the string 'All'
    for that entry. Examples follow below. To see the available column names of
    a group, type Data[group].keys() in this notebook.
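The checks the class performs on creation are not shown in this notebook. The sketch below shows what such validation could look like, based only on the argument descriptions above. The function checkWriterInput is hypothetical; the real checks in WriteH5File may differ.

```python
import os

import numpy as np


def checkWriterInput(pathToOriginal, fileNameOriginal, pathToNew, fileNameNew,
                     seeds, groups, dataSets):
    """Hypothetical sanity checks; raises ValueError on inconsistent input."""
    original = os.path.abspath(os.path.join(pathToOriginal, fileNameOriginal))
    new = os.path.abspath(os.path.join(pathToNew, fileNameNew))
    if original == new:
        raise ValueError("fileNameNew would overwrite the original file")
    if np.ndim(seeds) != 1:
        raise ValueError(f"seeds must be a 1D array, got shape {np.shape(seeds)}")
    if groups == 'All':
        return  # the group names are only known once the file is opened
    if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
        raise ValueError("groups must be 'All' or a list of strings")
    if not isinstance(dataSets, list) or len(dataSets) != len(groups):
        raise ValueError("dataSets needs exactly one entry per group, in the same order")
    for group, columns in zip(groups, dataSets):
        if columns == 'All':
            continue
        if not isinstance(columns, list) or not all(isinstance(c, str) for c in columns):
            raise ValueError(f"columns for {group!r} must be 'All' or a list of strings")
```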

# Examples

### Define the filenames

In [5]:
pathToOriginal   = path
fileNameOriginal = 'COMPASOutput.h5'

pathToNew        = path
fileNameNew      = 'COMPAS_BBH.h5'

### get the seeds

In [6]:
# I only want the seeds of BBHs

DCOgroup  =  Data['doubleCompactObjects']
type1     =  DCOgroup['stellarType1'][...].squeeze()
type2     =  DCOgroup['stellarType2'][...].squeeze()
# Stellar type 14 is a black hole, so both components must be of type 14
maskBBH   =  (type1==14) & (type2==14)
seedsBBH  =  DCOgroup['seed'][...].squeeze()[maskBBH]
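To see what the mask does, and how a set of seeds can pick out the matching rows of a different group, here is a toy version with made-up arrays. Only the stellar type 14 comes from the cell above. Using np.isin for the row selection is our assumption of how such a selection can be done; we have not checked how WriteH5File does it internally.

```python
import numpy as np

# Toy stand-ins for DCOgroup['stellarType1'], ['stellarType2'] and ['seed']
type1 = np.array([14, 13, 14, 14])
type2 = np.array([14, 14, 13, 14])
dcoSeeds = np.array([101, 102, 103, 104])

# Both components must be black holes (stellar type 14)
maskBBH = (type1 == 14) & (type2 == 14)
seedsBBH = dcoSeeds[maskBBH]

# Toy stand-in for a second group with one row per system
systemSeeds = np.array([100, 101, 102, 103, 104, 105])
systemMass = np.array([10.0, 35.0, 12.0, 20.0, 40.0, 8.0])

# np.isin marks the rows whose seed appears in seedsBBH,
# regardless of the order in which seeds appear in each group
rowMask = np.isin(systemSeeds, seedsBBH)
print(systemMass[rowMask])  # [35. 40.]
```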

In [7]:
## I want all the initial parameters from the systems file
## I want only M1, M2 and separation from the doubleCompactObjects file

#To be reminded of the group names, uncomment the next line
# Data.keys()



In [8]:
groups        = ['systems', 'doubleCompactObjects']

#For clarity I list them separately
columnSystems = 'All'
columnDCOs    = ['M1', 'M2', 'separationDCOFormation']

#combine the lists in a list for input
dataSets      = [columnSystems, columnDCOs]

# Say you want all columns of all files. Do a quick loop
#Note that this will break if one of the groups is empty, for example XrayBinaries
#groups   = 'All'
#dataSets = []
#for group in Data.keys():
#    dataSets.append('All')
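One way around empty groups is to list the groups explicitly and leave out the empty ones. The sketch below does this. The helper name is hypothetical, and treating a group as empty when it has no datasets or its first dataset has zero rows is our assumption. It only uses keys(), item access and shape, so it works on an open h5py File as well as on a plain dict of NumPy arrays.

```python
def allColumnsOfNonEmptyGroups(h5file):
    """Return (groups, dataSets) selecting every column of every non-empty group.

    Hypothetical helper: a group counts as empty if it has no datasets or
    its first dataset has zero rows (for example an unused XrayBinaries group).
    """
    groups, dataSets = [], []
    for groupName in h5file.keys():
        group = h5file[groupName]
        columnNames = list(group.keys())
        if not columnNames or group[columnNames[0]].shape[0] == 0:
            continue  # skip empty groups
        groups.append(groupName)
        # append, not extend: extend('All') would add 'A', 'l', 'l'
        dataSets.append('All')
    return groups, dataSets


# Example use with the open original file:
# groups, dataSets = allColumnsOfNonEmptyGroups(Data)
```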

In [9]:
h5writer = Writer.createDataSet(pathToOriginal=pathToOriginal, fileNameOriginal=fileNameOriginal,
                                pathToNew=pathToNew, fileNameNew=fileNameNew,
                                seeds=seedsBBH, groups=groups, dataSets=dataSets)



Good to go: just call writeToFile(). It prints 'done' when it has finished.


In [10]:
h5writer.writeToFile()


done


### Test the data file

In [11]:
import PrintAllH5Columns
PrintAllH5Columns.printAllColumnsInH5(pathToNew, filename=fileNameNew)

Filename = doubleCompactObjects
----------------------
	   column name                             unit                length
	   --------------------------------------------------------------------
	   M1                                      b'Msol'                641
	   M2                                      b'Msol'                641
	   separationDCOFormation                  b'AU'                  641
Filename = systems
----------------------
	   column name                             unit                length
	   --------------------------------------------------------------------
	   CE_Alpha                                b'#'                   641
	   ID                                      b'#'                   641
	   LBV_multiplier                          b'#'                   641
	   Metallicity1                            b'#'                   641
	   Metallicity2                            b'#'                   641
	   -------------------------------------------
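The printed table already shows 641 rows for every column. A programmatic version of that check is sketched below. It reports every column whose length differs from the rest of its group, or from an expected length if one is given. checkColumnLengths is a hypothetical helper, not part of PrintAllH5Columns. In this example every group should have one row per BBH seed, so expectedLength=len(seedsBBH) is a natural choice. That assumption would not hold for groups that can store several rows per system.

```python
def checkColumnLengths(h5file, expectedLength=None):
    """Return a list of (group, column, length) entries that look inconsistent.

    Hypothetical helper: within a group all columns should have the same
    number of rows; if expectedLength is given, every column must match it.
    Works on an open h5py File or a dict-like mapping of the same shape.
    """
    problems = []
    for groupName in h5file.keys():
        group = h5file[groupName]
        lengths = {name: group[name].shape[0] for name in group.keys()}
        reference = expectedLength
        if reference is None and lengths:
            reference = next(iter(lengths.values()))  # first column sets the norm
        for name, length in lengths.items():
            if length != reference:
                problems.append((groupName, name, length))
    return problems


# Example use on the reduced file (assumes h5py, os and the earlier variables):
# with h5.File(os.path.join(pathToNew, fileNameNew), 'r') as newData:
#     print(checkColumnLengths(newData, expectedLength=len(seedsBBH)))  # [] if consistent
```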

# The main commands/syntax

Essentially, COMPAS sorts its output into several topics.
Each topic gets a separate file with a specific number of columns.
In h5py terms, each file becomes a group
and each column becomes a dataset within that group.


Some of the examples were taken from

https://www.christopherlovell.co.uk/blog/2016/04/27/h5py-intro.html

-- create a data file

hf = h5py.File('data.h5', 'w')

-- create a group (i.e. a COMPAS "file") within it

g1 = hf.create_group('group1')

-- add a dataset (i.e. a column) to the group, where d1 is for example a NumPy array

g1.create_dataset('data2', data=d1)

You can also add attributes to a file, group or dataset to store metadata.
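Putting these commands together, the sketch below writes a small file with one group, one dataset and one attribute, then reads it back. It writes to a temporary directory so nothing is left in your working directory. The names group1 and data2 and the 'units' attribute are only illustrative; we have not checked how COMPAS itself stores units.

```python
import os
import tempfile

import h5py
import numpy as np

d1 = np.random.random(size=1000)  # example data for one column

fileName = os.path.join(tempfile.mkdtemp(), 'data.h5')

# Write: file -> group (a COMPAS "file") -> dataset (a column) + metadata
with h5py.File(fileName, 'w') as hf:
    g1 = hf.create_group('group1')
    dset = g1.create_dataset('data2', data=d1)
    dset.attrs['units'] = 'Msol'  # illustrative attribute name and value

# Read it back; leaving the with-block closes the file
with h5py.File(fileName, 'r') as hf:
    print(list(hf.keys()))            # ['group1']
    print(list(hf['group1'].keys()))  # ['data2']
    data2 = hf['group1/data2'][...]   # load the column into a NumPy array
    units = hf['group1/data2'].attrs['units']
```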


In [12]:
#I am done here see ya!
Data.close()