## This notebook outlines the process for extracting particle positions from .h5 files

#### Example for one file

In [5]:
import h5py
import glob

# Get list of all .h5 files
filepaths = glob.glob("C:/Users/arnav/Personal - Arnav Chhajed/Northwestern/genome_organization/examples/arnav/test_output/blocks_*.h5")

# Sort the list so the selection below is deterministic
# (plain sort() orders names as strings, not by block number)
filepaths.sort()

# Pick the first one just as an example
filepath = filepaths[0]

with h5py.File(filepath, 'r') as f:
    group_name = list(f.keys())[0]
    pos = f[group_name]['pos'][:]

print(pos.shape)
print(pos)

(100, 3)
[[ 2.92744554  0.7850631   2.93205285]
 [ 2.70731298  1.2725971   2.16629245]
 [ 2.12048737  0.86586159  1.54403145]
 [ 1.73764095  1.48128745  0.99914121]
 [ 1.53747177  2.35696565  0.48012666]
 [ 0.8409037   3.08506179  0.11158837]
 [-0.02374492  2.594173    0.32141344]
 [-0.57029128  3.54348856  0.07646505]
 [-1.37977172  4.03055055 -0.16763047]
 [-2.31156372  3.44182931 -0.52756678]
 [-2.5894045   2.82072849 -1.2902448 ]
 [-2.69497083  1.84811919 -1.49151229]
 [-1.7952914   2.12374301 -1.1508306 ]
 [-1.75560557  2.51386286 -0.24218504]
 [-1.4697546   2.10338905  0.63590078]
 [-0.54362811  2.20973637  0.87234576]
 [ 0.03210329  2.07429792  1.60711569]
 [ 0.16354972  2.96337294  1.65646872]
 [ 1.13962828  3.39449894  1.55075026]
 [ 1.20730272  3.34690344  2.51999296]
 [ 1.8544264   2.47151491  2.46632857]
 [ 2.68444076  1.86427845  2.35460878]
 [ 3.08207841  1.9549712   1.29878713]
 [ 2.1003      2.47309981  1.0980004 ]
 [ 1.54345756  2.08738128  1.92930674]
 [ 1.062541    2

#### Example for extracting positions from a directory of .h5 files

In [6]:
import glob
import re
from pathlib import Path

import h5py
import numpy as np

output_dir = "C:/Users/arnav/Personal - Arnav Chhajed/Northwestern/genome_organization/examples/arnav/test_output/"

# It is convenient to read the .h5 files in order of simulation time,
# so that index i of the result corresponds to the i-th saved block

# Get all .h5 block files
block_files = glob.glob(str(Path(output_dir) / "blocks_*.h5"))

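# Sort key: the starting block index parsed from names like blocks_<start>-<end>.h5
# (files that do not match the pattern get -1 and therefore sort first)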
def extract_start_index(filename):
    match = re.search(r'blocks_(\d+)-\d+\.h5', filename)
    return int(match.group(1)) if match else -1

# Sort by the extracted start index
block_files_sorted = sorted(block_files, key=extract_start_index)

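# N = number of particles in the simulation
# (must match the first dimension of each 'pos' dataset)
N = 100

# Preallocate the output array: one (N, 3) frame per block file.
# There are other ways to collect the frames, but filling a preallocated
# array avoids repeatedly growing a list or array.
pos_array = np.empty((len(block_files_sorted), N, 3))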

for idx, filepath in enumerate(block_files_sorted):
    with h5py.File(filepath, 'r') as f:
        group_name = list(f.keys())[0]
        pos = f[group_name]['pos'][:]
        pos_array[idx] = pos

print(pos_array.shape)

(100, 100, 3)
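
The numeric sort key in the cell above matters because a plain string sort does not follow block order once the indices have different numbers of digits. A minimal sketch of the difference, using hypothetical file names that follow the `blocks_<start>-<end>.h5` pattern assumed by the regular expression:

```python
import re

# Hypothetical file names; real names come from glob on the output directory.
names = ["blocks_10-19.h5", "blocks_0-9.h5", "blocks_100-109.h5", "blocks_20-29.h5"]

def extract_start_index(filename):
    # Same key as in the cell above: the first integer after "blocks_"
    match = re.search(r'blocks_(\d+)-\d+\.h5', filename)
    return int(match.group(1)) if match else -1

lexicographic = sorted(names)                         # compares names as strings
numeric = sorted(names, key=extract_start_index)      # compares start indices as integers

print(lexicographic)
print(numeric)
```

In the string-sorted list, `blocks_100-109.h5` lands before `blocks_20-29.h5`, which would scramble the time order of `pos_array`.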


## TODO For Arnav

Remember that monomer_types is aligned with pos: index i in both arrays refers to the same particle.

monomer_types is of shape (N,), here (100,)
pos is of shape (N,3)

monomer_types[0] tells you the type of the first particle
You get the position of the first particle with pos[0,:]
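
Because the two arrays share the particle index, a boolean mask built from monomer_types can select the matching rows of pos. A small sketch with made-up data; the integer type codes 0 and 1 are an assumption, and the real encoding in the simulation output may differ:

```python
import numpy as np

# Hypothetical stand-in data: 6 particles, two types encoded as 0 and 1.
monomer_types = np.array([0, 1, 0, 1, 1, 0])       # shape (N,)
pos = np.arange(18, dtype=float).reshape(6, 3)     # shape (N, 3)

# Boolean mask that is True for particles of type 0
is_type0 = monomer_types == 0

# Rows of pos that belong to type-0 particles, shape (n_type0, 3)
pos_type0 = pos[is_type0]

print(pos_type0.shape)
```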

Now that you have the positions of the particles extracted, you can ask questions about the simulation.

For example, what is the radial distribution of each monomer type? (Plot the number of particles on the y axis against the distance from the origin on the x axis.)
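
One possible way to compute these counts is sketched below. The function name `radial_histogram`, the random test data, and the bin edges are all illustrative assumptions, not part of the simulation code:

```python
import numpy as np

def radial_histogram(pos, monomer_types, bin_edges):
    """Count particles of each type in bins of distance from the origin."""
    r = np.linalg.norm(pos, axis=1)  # distance of each particle from the origin, shape (N,)
    counts_by_type = {}
    for t in np.unique(monomer_types):
        # Histogram only the particles whose type equals t
        counts, _ = np.histogram(r[monomer_types == t], bins=bin_edges)
        counts_by_type[t.item()] = counts
    return counts_by_type

# Hypothetical data: 100 particles with random positions and two type codes.
rng = np.random.default_rng(0)
pos = rng.normal(size=(100, 3))
monomer_types = rng.integers(0, 2, size=100)
bin_edges = np.linspace(0.0, 5.0, 11)  # 10 bins of width 0.5

radial_counts = radial_histogram(pos, monomer_types, bin_edges)
for t, counts in radial_counts.items():
    print(t, counts)
```

Each entry of `radial_counts` can then be plotted against the bin centers, `0.5 * (bin_edges[:-1] + bin_edges[1:])`, one curve per type.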

What is the spatial distribution of particles by type? (Plot the number of pairs against the distance between particles of the same type, or for various combinations such as A-A distances, A-B distances, etc.)
- To plot the distribution of A-A distances: calculate the distance between every A-A pair, bin those distances (for example, count the pairs that are 0.5-1 units apart, and choose the bins so that every distance falls into one of them), then plot the resulting distribution (number of pairs in each bin against the distance range the bin spans).
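
The pair-distance recipe above can be sketched with NumPy broadcasting. The helper name `pair_distance_histogram`, the `same_type` flag, and the example coordinates are assumptions made for illustration:

```python
import numpy as np

def pair_distance_histogram(pos_a, pos_b, bin_edges, same_type):
    """Histogram of distances between particles in pos_a and particles in pos_b."""
    # All pairwise difference vectors via broadcasting, shape (n_a, n_b, 3)
    diff = pos_a[:, None, :] - pos_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (n_a, n_b)
    if same_type:
        # For A-A style pairs, keep each unordered pair once and drop self-distances
        i, j = np.triu_indices(len(pos_a), k=1)
        dist = dist[i, j]
    else:
        dist = dist.ravel()
    counts, _ = np.histogram(dist, bins=bin_edges)
    return counts

# Hypothetical positions of three type-A particles and one type-B particle.
pos_A = np.array([[0.0, 0.0, 0.0], [0.75, 0.0, 0.0], [3.2, 0.0, 0.0]])
pos_B = np.array([[0.0, 0.0, 1.0]])
bin_edges = np.linspace(0.0, 4.0, 9)  # 8 bins of width 0.5

aa_counts = pair_distance_histogram(pos_A, pos_A, bin_edges, same_type=True)
ab_counts = pair_distance_histogram(pos_A, pos_B, bin_edges, same_type=False)
print(aa_counts)
print(ab_counts)
```

Building the full (n_a, n_b) distance matrix is fine for N = 100; for much larger systems a neighbor-search approach would likely be needed instead.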

What do these distributions look like at a few different time points (different .h5 files)?
- Make a separate plot for each of a few .h5 files.

What does the average distribution look like?
- Create a distance-distribution histogram for each .h5 file using the same bin edges, do not plot them individually, average all the histograms, and plot the average.
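
A sketch of the averaging step, assuming the frames are stacked in an array of shape (number of files, N, 3) like pos_array above. It uses distance from the origin as the per-frame quantity; any other per-frame histogram computed on shared bin edges could be averaged the same way. The function name and the random data are illustrative only:

```python
import numpy as np

def mean_radial_histogram(pos_array, bin_edges):
    """Average the per-frame histogram of distances from the origin over all frames."""
    n_frames = len(pos_array)
    per_frame = np.empty((n_frames, len(bin_edges) - 1))
    for idx, pos in enumerate(pos_array):
        r = np.linalg.norm(pos, axis=1)                  # shape (N,)
        counts, _ = np.histogram(r, bins=bin_edges)      # same edges for every frame
        per_frame[idx] = counts
    # Averaging bin by bin is only meaningful because every frame shares bin_edges
    return per_frame.mean(axis=0)

# Hypothetical stand-in for pos_array: 5 frames of 100 particles.
rng = np.random.default_rng(1)
fake_pos_array = rng.normal(size=(5, 100, 3))
bin_edges = np.linspace(0.0, 10.0, 21)

mean_counts = mean_radial_histogram(fake_pos_array, bin_edges)
print(mean_counts.shape)
```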
