## Data Structure

### cluster particle data

The Horizon Run 5 simulation data used in our study are split into files named "clusters{snap}.hdf5", where {snap} is the snapshot number of the simulation. For example, the last snapshot, at redshift 0.625, is number 296, so its file name is clusters296.hdf5.
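
The naming convention above can be sketched as a small helper. The function names `cluster_file` and `top_level_keys` and the `outdir` argument are illustrative assumptions, not part of the project's module. The internal layout of the HDF5 files is not described here, so the second helper only lists whatever top-level names a file contains.

```python
from pathlib import Path


def cluster_file(snap: int, outdir: str = ".") -> Path:
    """Return the path of the particle file for snapshot `snap`."""
    return Path(outdir) / f"clusters{snap}.hdf5"


def top_level_keys(path: Path) -> list:
    """List the top-level groups/datasets of an HDF5 file (requires h5py)."""
    import h5py  # imported lazily so the path helper works without h5py

    with h5py.File(path, "r") as f:
        return list(f.keys())


print(cluster_file(296).name)  # clusters296.hdf5
```

Calling `top_level_keys(cluster_file(296, outdir))` on a real file is a quick way to see how the data are organised before using the analysis module.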

### cluster merger files

These are CSV files that give, for each snapshot, the snapshot number and the ID of the cluster at that snapshot.
Each file contains 8 columns; the following are of interest for this study:

- time: Lookback time in Gyr
- snap: snapshot number of simulation
- HostHaloID: HostHaloID of the cluster at that snapshot
- ClusMass(Msun): Mass of the cluster at that epoch, in solar masses
- Massfraction: Cluster mass at that epoch as a fraction of the final mass
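
As a minimal sketch of how these columns might be used, the snippet below reads a merger file with the standard `csv` module and looks up the HostHaloID of the cluster at a given snapshot. The sample rows are made up purely for illustration and are not simulation output. The helper names `load_history` and `host_id_at` are hypothetical, and the sample includes only the five columns listed above.

```python
import csv
import io

# Illustrative rows only: these values are made up, not simulation output.
SAMPLE = """time,snap,HostHaloID,ClusMass(Msun),Massfraction
6.0,296,111,2.0e14,1.0
6.5,280,222,1.5e14,0.75
7.5,250,333,8.0e13,0.4
"""


def load_history(handle):
    """Read a merger file and convert the columns of interest to numbers."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "time": float(rec["time"]),
            "snap": int(rec["snap"]),
            "HostHaloID": int(rec["HostHaloID"]),
            "ClusMass(Msun)": float(rec["ClusMass(Msun)"]),
            "Massfraction": float(rec["Massfraction"]),
        })
    return rows


def host_id_at(rows, snap):
    """Return the HostHaloID of the cluster at snapshot `snap`, or None."""
    for row in rows:
        if row["snap"] == snap:
            return row["HostHaloID"]
    return None


history = load_history(io.StringIO(SAMPLE))
print(host_id_at(history, 280))  # 222
```

For a real file, pass an open file handle instead of the in-memory sample, for example `with open(path, newline="") as fh: history = load_history(fh)`. `pd.read_csv` would work equally well if pandas is already in use.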


## Analysis

The <a>params.ini</a> file sets the paths. Set <b>outdir</b> to the directory that contains the HDF5 files.
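
To check which path is actually being picked up, the file can be read with Python's standard `configparser`. This is only a sketch: the section name `Paths` and the example value are assumptions, since the real layout of params.ini is not shown here.

```python
import configparser

# Hypothetical file contents: the section name "Paths" and the value
# are assumptions made for this example.
EXAMPLE_INI = """
[Paths]
outdir = /path/to/hdf5/files
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)  # for the real file: config.read("params.ini")

outdir = config.get("Paths", "outdir")
print(outdir)
```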

You can use HR5_module.py to carry out the analysis. It contains the functions and classes needed.
The cells below walk through the functions available for analysis.

In [5]:
# import the modules
import HR5_cluster as hr5
import numpy as np 
import matplotlib.pyplot as plt
import pandas as pd 

In [18]:
# Get the IDs of all clusters present at the given snapshot
snapshot = 296

# Load the cluster catalogue (host halo IDs and total masses) into a DataFrame
clus296 = pd.read_csv('../Data/groups5e13.csv')

clus296.columns

Index(['HostHaloID', 'HostMtot(Msun)'], dtype='object')

In [19]:
# List of the cluster IDs
cluslist = clus296['HostHaloID'].tolist()
print(cluslist)
# Pick the cluster at index 10 (the 11th entry, since indexing starts at 0)
clusid = cluslist[10]


[1561636, 1581385, 1664541, 1758257, 1808858, 1827559, 1847383, 1954735, 1983863, 2002628, 2013898, 2071135, 2199507, 2227715, 2246014, 2290169, 2290206, 2507502, 2592446, 2623756, 2734822, 2885792, 2892837, 2937863, 2944981, 3016893, 3069850, 3094112, 3178107, 3200641, 3226680, 3259117, 3284456, 3355500, 3359016, 3540174, 3651767, 3672299, 3689051, 3700933, 3744183, 3748260, 3780410, 3811219, 3882383, 3889787, 3929760, 3929767, 3929805, 3933724, 3945090, 4040811, 4063505, 4068321, 4081915, 4083100, 4147752, 4195553, 4226110, 4289749, 4309863, 4328263, 4415004, 4465008, 4465020, 4477204, 4481843, 4481851, 4549606, 4560890, 4617549, 4622511, 4658833, 4658856, 4714730, 4827650, 4838809, 4899516, 4901245, 4921745, 4966761, 4968706, 5011121, 5028157, 5033760, 5044774, 5049205, 5208836, 5259095, 5265012, 5332403, 5358495, 5375604, 5409575, 5462023, 5493185, 5520106, 5548455, 5573913, 5634590, 5638753, 5699819, 5715147, 5753402, 5816171, 5871163, 5884858, 5957816, 5960768, 5979633, 6143068, 

In [21]:
# Initialize the class instance with the given `snapno` and `clusno`.
# The instance holds all the information about the cluster and the functions
# for further processing.
clus = hr5.Cluster(snapshot, clusid)


In [24]:
# List all the functions and variables of the clus instance with dir(clus).
# Names starting with '_' are private by convention (Python does not actually
# block access to them); here we keep only the public ones.

# Get the list of all attributes and functions of the clus object
all_attributes = dir(clus)

# Filter the list to only include public attributes and functions
public_attributes = [attr for attr in all_attributes if not attr.startswith('_')]

# Print the resulting list of public attributes and functions
print(public_attributes)


['BCG_ID', 'clusID', 'clus_mdm', 'clus_mgas', 'clus_msink', 'clus_mstar', 'clus_mtot', 'clus_ngas', 'clus_nsink', 'clus_nstar', 'clus_nsub', 'clus_pos', 'clus_vel', 'f', 'get_all_parts', 'get_alldat_gal', 'get_galids', 'save_yt_dataset', 'snap']
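
The list above mixes functions and variables. One possible way to tell them apart is to test each public name with `callable`. The sketch below uses a stand-in class, `DemoCluster`, which is hypothetical and only borrows a few of the names printed above. Note that `getattr` evaluates each attribute, which may be slow if a real object computes values lazily.

```python
class DemoCluster:
    """Stand-in for the real Cluster class; for illustration only."""

    def __init__(self):
        self.snap = 296
        self.clus_nsub = 0  # placeholder value

    def get_galids(self):
        return []


def split_public(obj):
    """Split the public names of `obj` into methods and data attributes."""
    names = [n for n in dir(obj) if not n.startswith("_")]
    methods = [n for n in names if callable(getattr(obj, n))]
    data = [n for n in names if not callable(getattr(obj, n))]
    return methods, data


methods, data = split_public(DemoCluster())
print(methods)  # ['get_galids']
print(data)     # ['clus_nsub', 'snap']
```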


Attributes are the functions and variables that can be accessed as clus.{attribute_name}, where {attribute_name} is the name of the attribute. The public attributes are:

- BCG_ID
- clusID
- clus_mdm
- clus_mgas
- clus_msink
- clus_mstar
- clus_mtot
- clus_ngas
- clus_nsink
- clus_nstar
- clus_nsub
- clus_pos
- clus_vel
- f
- get_all_parts
- get_alldat_gal
- get_galids
- save_yt_dataset
- snap

<HDF5 file "clusters296.hdf5" (mode r)>