### **Catalog Loading Example**

This notebook loads DC2 snapshot catalog data from the following two catalogs:

* **Galaxies:** baseDC2_snapshot_z1.01_v0.1
* **DM Particles:** /global/projecta/projectdirs/lsst/groups/CS/cosmoDC2/Outer_snapshots/z1.01/m000.mpicosmo.247

The first two code cells load the galaxy data (Examples 1 and 2), the third loads the DM data, and the last computes power spectra from the saved files.

### **Load Galaxy Data (Example 1)**

To load the full galaxy catalog, you need the GCRCatalogs library. There are several ways to load galaxy positions; the most reliable way to avoid memory errors is to use the iterator provided by the GCRCatalogs reader. The first example iterates through the catalog chunk by chunk, keeps the galaxies brighter than a specified r-band magnitude cut (r < 24.5), and saves their x, y, and z positions as arrays.
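
The chunk-by-chunk pattern can be tried without access to the catalog. The sketch below is a minimal illustration, not the GCRCatalogs API: `fake_chunks` is a hypothetical generator standing in for the catalog iterator, and it yields dictionaries of NumPy arrays filled with made-up values. The filtered pieces are collected in a list and concatenated once at the end, which avoids copying a growing array on every iteration.

```python
import numpy as np

def fake_chunks(n_chunks=3, chunk_size=5, seed=0):
    """Hypothetical stand-in for a catalog iterator (random, made-up data)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield {
            "position_x": rng.uniform(0.0, 100.0, chunk_size),
            "mag_r": rng.uniform(20.0, 28.0, chunk_size),  # apparent magnitude
        }

def collect_bright(chunks, mag_cut):
    """Keep position_x for entries with mag_r < mag_cut, one chunk at a time."""
    kept = []
    for chunk in chunks:
        mask = chunk["mag_r"] < mag_cut
        kept.append(chunk["position_x"][mask])
    # Single concatenation at the end instead of appending inside the loop
    return np.concatenate(kept) if kept else np.array([], dtype=float)

x_bright = collect_bright(fake_chunks(), mag_cut=24.5)
print(x_bright.shape)
```

Only one chunk plus the kept entries are held in memory at any time, which is the property that matters for the full catalog.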

In [None]:
import sys
import h5py
genericio_path = (r"/global/u2/s/samgolds/DC2-analysis/contributed/"
                  "nonlinear_bias/genericio/python/")
gcrcatalogs_path = r"/global/u2/s/samgolds/gcr-catalogs/"

sys.path.append(genericio_path)
sys.path.append(gcrcatalogs_path)

import GCRCatalogs
import numpy as np
import pyccl

# Define helper function for keeping track of progress
def progress_bar(cur_val, final_val):
    """ 
    Function to keep track of progress during computations by displaying
    a progress bar

    Parameters:
    cur_val (int/float): current iteration/value calculation is on
    final_val (int/float): final iteration/value that calculation will take
    """

    bar_length = 20
    percent = float(cur_val) / final_val
    arrow = '-' * int(round(percent * bar_length)-1) + '>'
    spaces = ' ' * (bar_length - len(arrow))

    sys.stdout.write("\rProgress: [{0}]"
                     " {1}%".format(arrow + spaces, int(round(percent * 100))))
    sys.stdout.flush()
    
# Initialize galaxy catalog and cosmology
cat_str = "baseDC2_snapshot_z1.01_v0.1"
cat = GCRCatalogs.load_catalog(cat_str)

COSMO, Z_RED_SHIFT = cat.cosmology, cat.redshift
H0 = 71.0 # For conversions between Mpc and Mpc/h


"""------------------------------------------EXAMPLE 1----------------------------------------------------"""
# Load galaxy positions with r < 24.5 using iterator

# Get catalog iterator
cat_vals = cat.get_quantities(["position_x", "position_y", "position_z",  "Mag_true_r_lsst_z0"],
                              return_iterator=True)
mag_cut = 24.5
x_positions = np.array([], dtype=float)
y_positions = np.array([], dtype=float)
z_positions = np.array([], dtype=float)

for data_val in cat_vals:

    # Convert catalog magnitude to apparent magnitude at the snapshot redshift
    r_Mag = data_val["Mag_true_r_lsst_z0"]
    r_mag = r_Mag+cat.cosmology.distmod(Z_RED_SHIFT).value

    # Keep only entries brighter than mag_cut (r_mag < mag_cut)
    filtered_indices = np.where(r_mag < mag_cut)[0]
    
    x_positions = np.append(x_positions, data_val["position_x"][filtered_indices])
    y_positions = np.append(y_positions, data_val["position_y"][filtered_indices])
    z_positions = np.append(z_positions, data_val["position_z"][filtered_indices])   

### **Load Galaxy Data (Example 2)**

Similar to the first example, but saves the positions to a local HDF5 file. HDF5 files for several limiting r-band magnitudes are already saved at the locations below, so you only need to run this cell if you want a different limiting magnitude or more than just the positional data.

* **r < 24.5:**  *'/global/cscratch1/sd/samgolds/gal_cat_24_5.h5'*
* **r < 23:**  *'/global/cscratch1/sd/samgolds/gal_cat_23.h5'*
* **r < 21:**  *'/global/cscratch1/sd/samgolds/gal_cat_21.h5'*
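
The limiting magnitudes above are apparent magnitudes. The cells in this notebook read `Mag_true_r_lsst_z0` and add the distance modulus at the snapshot redshift before applying the cut. As a worked illustration of that arithmetic, the sketch below uses the standard relation mu = 5 log10(d_L / 10 pc). The luminosity distance and absolute magnitude are made-up round numbers, not values from the catalog, and the function names are hypothetical.

```python
import math

def distance_modulus(d_lum_mpc):
    """Distance modulus mu = 5 * log10(d_L / 10 pc), with d_L given in Mpc."""
    d_lum_pc = d_lum_mpc * 1.0e6
    return 5.0 * math.log10(d_lum_pc / 10.0)

def apparent_magnitude(abs_mag, d_lum_mpc):
    """Apparent magnitude m = M + mu (no k-correction or extinction applied)."""
    return abs_mag + distance_modulus(d_lum_mpc)

# Made-up example: M = -20 at d_L = 1000 Mpc gives mu = 40 and m = 20
m = apparent_magnitude(-20.0, 1000.0)
print(m, m < 24.5)
```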

In [21]:
"""------------------------------------------EXAMPLE 2----------------------------------------------------"""
# Load galaxy positions with r < 24.5 using iterator and save to HDF file

# Get catalog iterator

mag_cut = 24.5

def get_n_galaxies(mag_cut):
    """
    Function to determine the number of galaxies below a specific magnitude cut, used to pre-allocate the HDF5
    dataset. There is probably a better way to avoid this extra pass (e.g. dynamically changing the size of the
    HDF5 dataset), but this works and doesn't take too long to run on the galaxies.
    """
    
    cat_vals = cat.get_quantities(["position_x", "position_y", "position_z",  "Mag_true_r_lsst_z0"], 
                                  return_iterator=True)
    n_gal = 0
    
    for data_val in cat_vals:
        
        r_Mag = data_val["Mag_true_r_lsst_z0"]
        r_mag = r_Mag+cat.cosmology.distmod(Z_RED_SHIFT).value

        # Count only entries brighter than mag_cut (r_mag < mag_cut)
        filtered_indices = np.where(r_mag < mag_cut)[0]

        n_gal += len(filtered_indices)
        
    return n_gal


print("Computing Number of Galaxies Below Mag Cut")
n_gal = get_n_galaxies(mag_cut)

# Open an HDF file and proceed as in example 1, but save results
with h5py.File('/global/cscratch1/sd/samgolds/gal_cat_24_5.h5' , 'w') as ff: 
    
    pos = ff.create_dataset("Position", dtype=("f8"), shape=(n_gal, 3))
    cur_index = 0

    # Get catalog iterator
    cat_vals = cat.get_quantities(["position_x", "position_y", "position_z",  "Mag_true_r_lsst_z0"], 
                                  return_iterator=True)
    
    for data_val in cat_vals:
        
        r_Mag = data_val["Mag_true_r_lsst_z0"]
        r_mag = r_Mag+cat.cosmology.distmod(Z_RED_SHIFT).value

        # Remove all entries below mag_cut
        filtered_indices = np.where(r_mag < mag_cut)[0]

        positional_data = np.vstack((data_val["position_x"][filtered_indices],
                                     data_val["position_y"][filtered_indices], 
                                     data_val["position_z"][filtered_indices])).T
        
        # Write to dataset
        n_pos = len(positional_data)
        pos[cur_index:cur_index+n_pos] = positional_data
        
        cur_index += n_pos
        
        # Update progress
        progress_bar(cur_index, n_gal)
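
The docstring of `get_n_galaxies` wonders whether the counting pass can be avoided by letting the HDF5 file grow dynamically. One possible approach, sketched here under the assumption that h5py is available, is a resizable dataset: create it with `maxshape=(None, 3)` and call `resize` before writing each chunk. The output path, chunk contents, and the `append_rows` helper are hypothetical, and this has not been tried on the full catalog.

```python
import os
import tempfile

import h5py
import numpy as np

def append_rows(dset, rows):
    """Grow a resizable dataset along axis 0 and write rows at the end."""
    if len(rows) == 0:
        return
    n_old = dset.shape[0]
    dset.resize(n_old + len(rows), axis=0)
    dset[n_old:] = rows

# Hypothetical output location (a temporary directory, for illustration only)
out_path = os.path.join(tempfile.mkdtemp(), "resizable_demo.h5")

with h5py.File(out_path, "w") as ff:
    # maxshape=(None, 3) makes the first axis unlimited; this requires chunking
    pos = ff.create_dataset("Position", shape=(0, 3), maxshape=(None, 3),
                            dtype="f8", chunks=True)
    for n_rows in (4, 7):  # made-up chunk sizes
        append_rows(pos, np.ones((n_rows, 3)))

with h5py.File(out_path, "r") as ff:
    final_shape = ff["Position"].shape

print(final_shape)
```

The trade-off is that the total count is no longer known in advance, so a progress bar based on `n_gal` would need a different denominator.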

### **Load DM Data**

Explicitly loading the DM particles requires using genericio. The DM catalog is divided into 256 different files, which need to be iterated over to load all the data. As before, I save the data to an HDF5 file, which can be accessed at *'/global/cscratch1/sd/samgolds/dm_cat.h5'*.
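
Two small conventions are used in the cell below: each of the 256 pieces is addressed by appending `#` and its index to the base path, and positions are multiplied by 100/H0, which converts Mpc/h to Mpc (dividing by h = H0/100). A minimal sketch of both, using the same H0 = 71.0 as earlier in the notebook; the helper names are hypothetical.

```python
H0 = 71.0  # km/s/Mpc, same value as used earlier in the notebook

def mpc_per_h_to_mpc(length_mpc_per_h, h0=H0):
    """Convert a comoving length from Mpc/h to Mpc, with h = h0 / 100."""
    return length_mpc_per_h * 100.0 / h0

def sub_file_names(base_path, n_files):
    """Build the 'base#index' names used to address each piece."""
    return ["{0}#{1}".format(base_path, i) for i in range(n_files)]

base = "m000.mpicosmo.247"  # base file name only; the directory is omitted here
names = sub_file_names(base, 256)
print(names[0], names[-1], len(names))

# A 3000 Mpc/h length expressed in Mpc
print(round(mpc_per_h_to_mpc(3000.0), 2))
```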

In [None]:
import genericio

def load_dark_matter_positions(file_str):
    """
    Loads dark matter particle positions from a genericio file and converts
    them from Mpc/h to Mpc

    Parameters:
    file_str (string): location of file to load dark matter particles from

    Returns:
    (n_particles*3 array): x, y, z positions of the particles in Mpc
    """

    # Load data from catalog using genericio and convert Mpc/h -> Mpc
    x_data = np.array(genericio.gio_read(file_str, "x")[0])*100/H0
    y_data = np.array(genericio.gio_read(file_str, "y")[0])*100/H0
    z_data = np.array(genericio.gio_read(file_str, "z")[0])*100/H0

    return np.vstack((x_data, y_data, z_data)).T

dark_matter_file_str = ("/global/projecta/projectdirs/lsst/groups/CS/"
                        "cosmoDC2/Outer_snapshots/z1.01/m000.mpicosmo.247")

N_part = 10765080312 # Number of total DM particles in the catalog (pre-determined)

with h5py.File('/global/cscratch1/sd/samgolds/dm_cat.h5' , 'w') as ff:    
    
    pos = ff.create_dataset("Position", dtype=("f8"), shape=(N_part, 3))
    cur_index = 0

    # Iterate through all 256 DM catalog files 
    for i in range(256):

        # Load positions
        positional_data =  load_dark_matter_positions(dark_matter_file_str+"#"+str(i))
        
        # Save positions
        n_pos = positional_data.shape[0]
        pos[cur_index:cur_index+n_pos] = positional_data
        cur_index += n_pos

        # Update progress
        progress_bar(i + 1, 256)

### **Computing Galaxy and Matter Power Spectra Using Loaded Data**

I compute power spectra using nbodykit, which streams data from HDF5 files. This is very useful for avoiding memory errors when computing the matter auto and cross power spectra, because the DM file is very large. nbodykit can probably compute 2-point correlation functions as well.
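
As a rough picture of what a 1D power spectrum estimate involves, the sketch below paints particles onto a mesh, takes an FFT, and averages |delta_k|^2 in bins of |k|. It is a toy NumPy illustration, not nbodykit's implementation: it uses nearest-grid-point assignment with no window compensation, holds the whole mesh in memory, and `toy_power_spectrum` and its arguments are hypothetical names. For unclustered random points the result should scatter around the shot-noise level V/N.

```python
import numpy as np

def toy_power_spectrum(positions, box_size, n_mesh, n_bins=8):
    """Toy 1D power spectrum: nearest-grid-point mesh, FFT, average in |k| bins."""
    # Nearest-grid-point assignment: count particles per cell
    cell = np.floor(positions / box_size * n_mesh).astype(int) % n_mesh
    counts = np.zeros((n_mesh,) * 3)
    np.add.at(counts, (cell[:, 0], cell[:, 1], cell[:, 2]), 1.0)

    # Overdensity field delta = n / nbar - 1
    delta = counts / counts.mean() - 1.0

    # Normalised FFT, so that P(k) = V * |delta_k|^2
    delta_k = np.fft.fftn(delta) / n_mesh**3
    power_3d = (box_size**3 * np.abs(delta_k) ** 2).ravel()

    # |k| for every mode on the FFT grid
    k_1d = 2.0 * np.pi * np.fft.fftfreq(n_mesh, d=box_size / n_mesh)
    kx, ky, kz = np.meshgrid(k_1d, k_1d, k_1d, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()

    # Linear bins from half the fundamental mode up to the Nyquist frequency;
    # the k = 0 mode falls below the first edge and is skipped
    k_fund = 2.0 * np.pi / box_size
    k_nyq = np.pi * n_mesh / box_size
    edges = np.linspace(0.5 * k_fund, k_nyq, n_bins + 1)
    which = np.digitize(k_mag, edges) - 1
    valid = (which >= 0) & (which < n_bins)

    n_modes = np.bincount(which[valid], minlength=n_bins)
    k_sum = np.bincount(which[valid], weights=k_mag[valid], minlength=n_bins)
    p_sum = np.bincount(which[valid], weights=power_3d[valid], minlength=n_bins)

    # Drop empty bins to avoid dividing by zero
    keep = n_modes > 0
    return k_sum[keep] / n_modes[keep], p_sum[keep] / n_modes[keep]

# Made-up test case: uniform random points in a small box
rng = np.random.default_rng(0)
box, n_part = 100.0, 20000
pos = rng.uniform(0.0, box, size=(n_part, 3))
k, pk = toy_power_spectrum(pos, box, n_mesh=16)
print(pk / (box**3 / n_part))  # ratio to the shot-noise level V / N
```

A mesh of 1536^3 cells and roughly 10^10 particles is far beyond what this in-memory approach can handle, which is why the cell below relies on nbodykit instead.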

In [None]:
from nbodykit.lab import *
from nbodykit import style

box_size = 3000*100/H0

# Grid size to compute power spectra on
N = 1536

# Location of saved h5 positions for galaxies and DM
gal_cat_str ='/global/cscratch1/sd/samgolds/gal_cat_24_5.h5'
dm_cat_str = '/global/cscratch1/sd/samgolds/dm_cat.h5'


def load_mesh_h5(cat_str, N):
    """Helper function to create nbodykit mesh from h5 file"""
    
    print("Initializing H5 Catalog")
    f =  HDFCatalog(cat_str)
    f.attrs['BoxSize'] = box_size
    
    print("Constructing Mesh")
    return  f.to_mesh(Nmesh=N, compensated=True)    


# Construct galaxy and DM meshes (should be very fast)
mesh_gal = load_mesh_h5(gal_cat_str, N)
mesh_dm = load_mesh_h5(dm_cat_str, N)

# Compute power spectra (galaxy-galaxy could take around an hour;
# galaxy-matter and matter-matter could take up to 5 hours)
print("Computing Power:")
P_3D_gg = FFTPower(mesh_gal, mode='1d', dk=0.01, kmin=0.001)
P_3D_mm = FFTPower(mesh_dm, mode='1d', dk=0.01, kmin=0.001)
P_3D_mg = FFTPower(mesh_dm, second=mesh_gal, mode='1d', dk=0.01, kmin=0.001)