# Introduction


The most important number in the COMPAS data is the seed. The seed is the unique identifier of a specific system in a simulation. Therefore, the properties of a single system can be recovered by looking up its seed in the different types of files.

Here we introduce the basics of manipulating the data using the seeds, for example how to get the initial parameters of systems that ended up forming double compact objects.

When starting out with python, we often use 'for loops' and add the systems of interest to a list. However, such loops can take a long time.
Here we present how we can 'slice' the data more efficiently using boolean masks. These take slightly more getting used to, but they are quick and use intuitive logic.


We assume you already have an h5file with data. If not, see section 1 for how to create the h5file from the csv data of your simulation, or download some data from compas.science.

# Careful: these cells show examples which take long if you test them on large data.



# Path to be set by user


In [1]:
pathToData = '/home/cneijssel/Desktop/Test/COMPAS_output.h5'

# Imports

In [2]:
#python libraries
import numpy as np               #for handling arrays
import h5py as h5                #for reading the COMPAS data
import time                      #for timing computation time

In [3]:
Data  = h5.File(pathToData, 'r') #open read-only
print(Data.keys())
nrSystems = len(Data['SystemParameters']['SEED'][()])
Data.close()

<KeysViewHDF5 ['CommonEnvelopes', 'DoubleCompactObjects', 'RLOF', 'Supernovae', 'SystemParameters']>


The print statement shows the different types of files that are combined in your h5file.
The seed is the number that links the information in, say, the Supernovae group to the information in the SystemParameters group.

### Question: What were the initial total masses of the double compact objects?

### The classic way when starting with python

In [4]:
def returnTotalMasses(pathData=None):
    #use the path passed to the function, not the global pathToData
    Data  = h5.File(pathData, 'r')
    
    totalMasses = []
    
    #for syntax see section 1 with basic syntax
    seedsDCOs     = Data['DoubleCompactObjects']['SEED'][()]
    
    #get info from ZAMS
    seedsSystems  = Data['SystemParameters']['SEED'][()]
    M1ZAMSs       = Data['SystemParameters']['Mass@ZAMS_1'][()]
    M2ZAMSs       = Data['SystemParameters']['Mass@ZAMS_2'][()]

    
    
    for seedDCO in seedsDCOs:
        for nrseed in range(len(seedsSystems)):
            seedSystem = seedsSystems[nrseed]
            if seedSystem == seedDCO:
                M1 = M1ZAMSs[nrseed]
                M2 = M2ZAMSs[nrseed]
                Mtot = M1+M2
                totalMasses.append(Mtot)
    Data.close()
    return totalMasses

In [5]:
start   = time.time()
MtotOld = returnTotalMasses(pathData=pathToData)
end     = time.time()
print(end - start, 'seconds for %s systems' %(nrSystems)) 

120.01296210289001 seconds for 300000 systems


# Steps of optimising the above loop

## 1 - using boolean masks in one file

An array and a list are both sequences of values.
However, when you work with arrays you can use numpy to do some optimised tricks,
for example adding the entries of two arrays element by element.

In [6]:
Data  = h5.File(pathToData, 'r') #open read-only

M1ZAMS  = Data['SystemParameters']['Mass@ZAMS_1'][()]
M2ZAMS  = Data['SystemParameters']['Mass@ZAMS_2'][()]
Mtotal  = np.add(M1ZAMS, M2ZAMS)
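
As a small illustration of the difference, independent of the COMPAS file: adding two python lists with `+` concatenates them, whereas adding two numpy arrays sums them entry by entry. The masses below are made up and the variable names are only for this sketch.

```python
import numpy as np

# made-up masses, not taken from any simulation
listM1 = [10.0, 20.0, 15.0]
listM2 = [5.0, 8.0, 12.0]

# '+' on lists concatenates, so the result has six entries
listSum = listM1 + listM2
print(listSum)

# np.add (or simply '+') on arrays sums element by element
arrayM1   = np.array(listM1)
arrayM2   = np.array(listM2)
arrayMtot = np.add(arrayM1, arrayM2)
print(arrayMtot)
```

The array version returns one total mass per system, which is what we want here.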



A useful trick is selecting elements based on a condition.
Where in the classic way we put the condition in a for loop and if statement,
now we will work with an array of booleans, a so-called mask.

In [7]:
#mask which gives total masses below or equal to 40
maskMtot = Mtotal <=40
#apply mask to get the masses
MtotalBelow40 = Mtotal[maskMtot]

The crucial trick is that you can apply this mask to other columns in the same file,
as long as the mask has the same length as the column that you apply it to.


In [8]:
# seeds of systems with total masses below 40
seeds  = Data['SystemParameters']['SEED'][()]
seedsMtotBelow40 = seeds[maskMtot]

Data.close()

Note that this works because the order of the two columns (seeds and total masses) is
the same. Rephrased, the total mass at the third entry corresponds to the seed at the third entry.
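
The same idea on a toy example that runs without any data file. The four seeds and masses are made up; the sketch also shows that masks can be combined with `&`, `|` and `~`.

```python
import numpy as np

# made-up example: four systems in one group, same row order in every column
seeds  = np.array([1, 2, 3, 4])
Mtotal = np.array([30.0, 55.0, 40.0, 90.0])

# one boolean per row: True where the condition holds
maskMtot = Mtotal <= 40
print(maskMtot)

# the same mask selects the matching rows in any column of equal length
print(Mtotal[maskMtot])
print(seeds[maskMtot])

# masks can be combined with & (and), | (or) and ~ (not);
# the brackets around each condition are required
maskBetween = (Mtotal > 30) & (Mtotal <= 60)
print(seeds[maskBetween])
```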

## 2 - using seeds as mask between files

Before we continue it is useful to realise how the COMPAS-popsynth printing works.
Every time you simulate a system the output is printed to the different files.
Hence if you have four systems with seeds 1,2,3,4 then COMPAS will evolve seed 1, print the output, and then continue to evolve seed 2, etc.

If all four systems each had one supernova and one double compact object, then the array 'SEED' in each group would always look like [1,2,3,4]. Hence, if you create a mask based on information in one group (like the example above), you could immediately apply it to a column in the other group because they are ordered the same AND of the SAME LENGTH.
However, this is usually not the case.

Not every system forms a double compact object; maybe only seeds 2 and 4 do.
The trick is that the order in which the seeds are evaluated in COMPAS remains the same.
Hence if you can create a mask that marks all the systems in the SystemParameters file that became a DCO, then after applying the mask the information in the two files
is both ordered the same and of the same length.

In [9]:
#small example
SystemSeeds = np.array([1,2,3,4])
SystemMass1 = np.array([10,20,15,45])
DCOSeeds    = np.array([2,4])

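#mark which elements of the first 1-d array also appear in the second
#(np.isin replaces the older np.in1d and gives the same result for 1-d arrays)
mask = np.isin(SystemSeeds, DCOSeeds)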
print(mask)
print(SystemSeeds[mask])
print(SystemMass1[mask])


[False  True False  True]
[2 4]
[20 45]


The above shows how you can get the initial masses of the DCOs with seeds 2 and 4.
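
One detail worth checking: the mask follows the row order of the array it is built for (here SystemSeeds), not the order of the DCO seeds. The sketch below uses the same made-up values with the DCO seeds deliberately reversed, and adds a simple sanity check on the number of selected rows. It uses `np.isin`, which behaves like `np.in1d` for one-dimensional arrays; the names `DCOSeedsReversed` and `foundAll` are only for this illustration.

```python
import numpy as np

SystemSeeds = np.array([1, 2, 3, 4])
SystemMass1 = np.array([10, 20, 15, 45])
# same DCO seeds as before, but deliberately listed in reverse order
DCOSeedsReversed = np.array([4, 2])

mask = np.isin(SystemSeeds, DCOSeedsReversed)

# the selected rows come out in the order of SystemSeeds
print(SystemSeeds[mask])
print(SystemMass1[mask])

# sanity check: the number of selected systems equals the number of DCO seeds
foundAll = np.sum(mask) == len(DCOSeedsReversed)
print(foundAll)
```

So the masked columns only line up row by row with the DCO columns if both groups list the seeds in the same order, which is the case when COMPAS writes them in the order of evolution as described above.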

# Optimised loop

In [10]:
def returnTotalMasses2(pathData=None):
    #use the path passed to the function, not the global pathToData
    Data  = h5.File(pathData, 'r')
    
    #for syntax see section 1 with basic syntax
    seedsDCOs     = Data['DoubleCompactObjects']['SEED'][()]
    #get info from ZAMS
    seedsSystems  = Data['SystemParameters']['SEED'][()]
    M1ZAMSs       = Data['SystemParameters']['Mass@ZAMS_1'][()]
    M2ZAMSs       = Data['SystemParameters']['Mass@ZAMS_2'][()]
    
    #total mass per system, using the arrays read inside this function
    MZAMStotal    = np.add(M1ZAMSs, M2ZAMSs)
    
    #True for every system whose seed also appears in the DCO group
    #(np.isin replaces the older np.in1d)
    maskSeedsBecameDCO  = np.isin(seedsSystems, seedsDCOs)
    totalMassZAMSDCO    = MZAMStotal[maskSeedsBecameDCO]
    
    Data.close()
    return totalMassZAMSDCO

In [11]:
start   = time.time()
MtotNew = returnTotalMasses2(pathData=pathToData)
end     = time.time()
print(end - start, 'seconds for %s systems' %(nrSystems)) 

0.038631439208984375 seconds for 300000 systems


In [12]:
# test if I was lying (need to turn list into array)
print(np.array_equal(np.array(MtotOld), MtotNew))

True
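
If you do not have a large data file at hand, the same comparison can be reproduced on synthetic arrays. This is only a sketch: the 2000 systems, the mass ranges and the choice that every 50th system becomes a DCO are all made up, and the names `MtotLoop`, `MtotMask` and `maskDCO` exist only in this example.

```python
import numpy as np

# synthetic stand-in for the file: 2000 systems with made-up masses
rng          = np.random.default_rng(42)
seedsSystems = np.arange(1, 2001)
M1ZAMSs      = rng.uniform(5, 150, size=2000)
M2ZAMSs      = rng.uniform(5, 150, size=2000)
# pretend every 50th system became a double compact object
seedsDCOs    = seedsSystems[::50]

# classic nested loop: compare every DCO seed with every system seed
MtotLoop = []
for seedDCO in seedsDCOs:
    for nrseed in range(len(seedsSystems)):
        if seedsSystems[nrseed] == seedDCO:
            MtotLoop.append(M1ZAMSs[nrseed] + M2ZAMSs[nrseed])

# boolean mask: one vectorised membership test
maskDCO  = np.isin(seedsSystems, seedsDCOs)
MtotMask = (M1ZAMSs + M2ZAMSs)[maskDCO]

# both approaches select the same systems in the same order
print(np.array_equal(np.array(MtotLoop), MtotMask))
```

The nested loop does (number of DCOs) x (number of systems) comparisons in python, which is why it slows down so much on large files.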


Note that the above function can easily be expanded with more conditions.
If you do not want the initial total masses of all the DCOs but only of the double neutron stars, then you just need to reduce seedsDCOs to the seeds of systems that became a double neutron star.

In [13]:
def returnTotalMassesDNS(pathData=None):
    #use the path passed to the function, not the global pathToData
    Data  = h5.File(pathData, 'r')
    
    #for syntax see section 1 with basic syntax
    seedsDCOs     = Data['DoubleCompactObjects']['SEED'][()]
    type1         = Data['DoubleCompactObjects']['Stellar_Type_1'][()]
    type2         = Data['DoubleCompactObjects']['Stellar_Type_2'][()]
    #keep only the DCOs where both stars have stellar type 13
    maskDNS       = (type1 == 13) & (type2 == 13)
    seedsDNS      = seedsDCOs[maskDNS]
    
    #get info from ZAMS
    seedsSystems  = Data['SystemParameters']['SEED'][()]
    M1ZAMSs       = Data['SystemParameters']['Mass@ZAMS_1'][()]
    M2ZAMSs       = Data['SystemParameters']['Mass@ZAMS_2'][()]
    
    #total mass per system, using the arrays read inside this function
    MZAMStotal    = np.add(M1ZAMSs, M2ZAMSs)
    
    #True for every system whose seed is among the DNS seeds
    #(np.isin replaces the older np.in1d)
    maskSeedsBecameDNS  = np.isin(seedsSystems, seedsDNS)
    totalMassZAMSDNS    = MZAMStotal[maskSeedsBecameDNS]
    
    Data.close()
    return totalMassZAMSDNS
#returnTotalMassesDNS(pathData=pathToData)

## Warning, not all files have zero or one line per seed

Sometimes a system can have multiple lines in a file. For example, a system can experience two supernovae. The trick is then to think of a condition which is unique to the system.

For example:

If you want to link the double compact objects to the stellar types of the systems before their supernovae, then the question is ill-phrased since you can have multiple supernovae per system. The slicing will then not work, since the SN array can have multiple instances of the same seed.

However, if you ask what the stellar types were of the primaries that went supernova, then the slicing works, since a star can only go supernova once.
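
A toy sketch of this situation, using made-up arrays rather than the real Supernovae group. The columns `SNWhichStar` and `SNPrevType` are hypothetical stand-ins (the actual column names in your file may differ), and the stellar type values are arbitrary integers.

```python
import numpy as np

# made-up supernova group: seed 2 appears twice (two supernovae)
SNSeeds     = np.array([1, 2, 2, 4])
# hypothetical column: which star exploded (1 = primary, 2 = secondary)
SNWhichStar = np.array([1, 1, 2, 2])
# hypothetical column: stellar type of the exploding star before the supernova
SNPrevType  = np.array([8, 8, 9, 7])
DCOSeeds    = np.array([2, 4])

# count the rows per seed to see whether a group has repeated seeds
uniqueSeeds, counts = np.unique(SNSeeds, return_counts=True)
repeatedSeeds = uniqueSeeds[counts > 1]
print(repeatedSeeds)

# masking on the seed alone returns three rows for two DCOs
maskDCO = np.isin(SNSeeds, DCOSeeds)
print(SNSeeds[maskDCO])

# a condition that holds at most once per system gives at most one row per seed
maskPrimarySN = maskDCO & (SNWhichStar == 1)
print(SNSeeds[maskPrimarySN])
print(SNPrevType[maskPrimarySN])
```

In this toy data the primary of seed 4 never went supernova, so the final selection has fewer rows than there are DCOs. It is therefore worth comparing the selected seeds with the DCO seeds before combining columns from the two groups.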