<font size = 6><h1 align = center>Monte Carlo Fun </h1 ></font>      

<font size = 4><h2 align = center> Derek Sikorski </h2></font>

---
---
---


<font size = 5><h1 align = center>File Summary</h1 ></font>      

**Purpose:** 
This file is meant to handle MCing data used for the Hyperion SMF project. The data broadly includes:
1. COSMOS2020 photometry
2. Assorted ground-based spectroscopy
3. HST grism data

**Outputs:**
This code produces two separate catalogs of MCed redshifts. One is for the COSMOS photometry, while the other is for any other assortment of observations. The logic is that the photoz's are drawn completely randomly for all galaxies in the sample, whereas the redshifts of galaxies with other observations may be chosen by a variety of functions. Therefore, it is easiest to simply replace the redshifts of the galaxies with extra observations in place.
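A minimal sketch of what "replace in place" could look like, assuming both catalogs are NumPy structured arrays sharing an `ID` field and `MC_iter{n}` columns (the layout of the arrays saved later in this notebook). The helper name `merge_catalogs` and all toy values are hypothetical and only there for illustration.

```python
import numpy as np

def merge_catalogs(phot, spec):
    """Return a copy of `phot` whose rows are overwritten by `spec` rows with matching IDs."""
    merged = phot.copy()
    # Sort the photometric IDs once so each spectroscopic ID can be found by binary search
    order = np.argsort(merged["ID"])
    pos = np.searchsorted(merged["ID"], spec["ID"], sorter=order)
    pos = np.clip(pos, 0, len(merged) - 1)
    rows = order[pos]
    found = merged["ID"][rows] == spec["ID"]   # spec objects absent from phot are skipped
    merged[rows[found]] = spec[found]
    return merged

niter = 2
dtypes = [("ID", ">i8")] + [(f"MC_iter{n}", ">f8") for n in range(niter)]

phot = np.zeros(4, dtype=dtypes)
phot["ID"] = [10, 11, 12, 13]
phot["MC_iter0"] = [2.1, 2.5, 2.9, 3.3]
phot["MC_iter1"] = [2.2, 2.4, 3.0, 3.1]

spec = np.zeros(2, dtype=dtypes)
spec["ID"] = [12, 99]            # ID 99 is not in the photometric catalog
spec["MC_iter0"] = [2.75, 1.0]
spec["MC_iter1"] = [2.76, 1.0]

merged = merge_catalogs(phot, spec)
print(merged["MC_iter0"])
```

Only the row with ID 12 changes; the original `phot` array is left untouched because the helper works on a copy.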

**General Logic**


---
---

In [3]:
import numpy as np
from scipy.stats import skewnorm
from scipy.optimize import minimize
import matplotlib.pyplot as plt
from astropy.io import fits
import os
from tqdm.notebook import tqdm

<font size = 5><h1 align = center>MC Functions</h1 ></font>    

---> Define the MC function and my PDF (a skew-normal in this case)
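As a quick illustration of the weight rule that `MCz` applies (a new z is drawn if the random number is >= the weight), here is a toy sketch. The arrays are made up, and `new_draws` is just a stand-in for values that would come from each galaxy's p(z).

```python
import numpy as np

rng = np.random.default_rng(0)         # seeded only so the toy run is reproducible

zs = np.array([2.3, 2.5, 2.7])          # toy catalog redshifts
weights = np.array([1.0, 0.0, 0.7])     # toy confidence in each catalog redshift
new_draws = np.array([9.1, 9.2, 9.3])   # stand-in for values drawn from each p(z)

# A redshift is replaced whenever the uniform random number is >= its weight,
# so weight 1 is never replaced and weight 0 is always replaced.
redraw = rng.random(len(zs)) >= weights
z_out = np.where(redraw, new_draws, zs)
print(redraw, z_out)
```

A weight of 0.7 therefore keeps the catalog redshift in roughly 70% of iterations.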

In [4]:
def MCz(niter, zs, weights, z_range, MC_fn, plot_field="", plot_zrange="", verbose=False, **kwargs):
    """
    Performs a Monte Carlo on the redshift distribution of input galaxies

    INPUTS:
        - niter (int)   = Number of MC iterations to run
        - zs (array)    = List of median redshift values
        - weights (array)   = List of the MC weights for each object. A new z is drawn from the PDF if random_number >= weight
        - z_range (array)   = Range of redshifts to keep
        - MC_fn (fn)   = Python function used to generate the new redshift values for the galaxies
        - plot_field (str)    = Path to the directory where plots should be saved. If left as "", then no plots are saved (currently unused)
        - plot_zrange (str)    = Path to the directory where plots of galaxies in z_range should be saved (currently unused)
        - verbose (bool)    = If you want to print the status bar via tqdm.notebook
        - **kwargs = Keyword arguments passed on to MC_fn
        
    OUTPUTS:
        - (array) --> Indices in redshift array of objects falling in z_range at least once
        - (array) --> 2D array of redshifts of shape (len(zs), niter)
        - (list) --> For each iteration, the np.where output of indices which failed the MC draw (i.e. new z was drawn)
    """
    ## Setup for MC
    new_zs = [] # Fill with new redshifts
    iterable = tqdm(range(niter)) if verbose else range(niter)      # What to iterate over in for-loop based on 'verbose' option
    drawn_idxs = [] # Fill with idxs that were drawn

    ## Run the MC iterations
    for n in iterable:
        z_in = np.copy(zs)      # Copy of redshifts to manipulate

        ## Pick galaxies to get new z's and change where need
        new_idxs = np.where( np.random.random(size=len(zs)) >= weights )    # Which z's to change
        nz = MC_fn(**kwargs)            # Pick new z's (kwargs passed by name, so their order does not matter)
        z_in[new_idxs] = nz[new_idxs]    # Replace redshifts where needed

        new_zs.append(z_in)      # Add to list of redshifts   
        drawn_idxs.append(new_idxs)

    new_zs = np.array(new_zs)   # MCed redshifts

    ## Find which galaxies fell in z-range at least once ###
    z_bool = ((z_range[0]< new_zs) & (new_zs < z_range[1])).any(axis=0)
    good_idxs = np.where(z_bool)[0]     # Where the condition is met

    return good_idxs, new_zs.transpose(), drawn_idxs

In [5]:
def my_PDF(alpha, omega, loc):
    """
    PDFs to draw the new redshifts from. Skewed-normal based on the confidence interval from COSMOS2020

    INPUTS:
        - alpha (array)    = shape parameters
        - omega (array)   = scale parameters
        - loc (array)   = location parameters
    OUTPUTS:
        - (array)   = New redshift values.
    """
    z_vals = skewnorm.rvs(a=alpha, loc=loc, scale=omega) # Find new zs based on skew-normal
    return z_vals

----
---
---

<font size = 5><h1 align = center>Fit Skew-Normal Distributions</h1 ></font>      

---> The first step is to fit skew-normal distributions to each of the COSMOS2020 objects that I want. So, I need to first define the functions for this
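Before fitting real objects, it can help to sanity-check the idea on a distribution with known parameters. The sketch below assumes the same kind of objective as `costFn` (matching 0.34 of the area on each side of the median, plus the median itself). The function name `three_point_cost` and the toy parameters are hypothetical.

```python
import numpy as np
from scipy.stats import skewnorm

def three_point_cost(params, z_stats):
    """Squared mismatch in the two 34% areas and in the median."""
    alpha, omega, loc = params
    z_med, z_l68, z_u68 = z_stats
    dist = skewnorm(alpha, scale=omega, loc=loc)
    area_left = dist.cdf(z_med) - dist.cdf(z_l68)
    area_right = dist.cdf(z_u68) - dist.cdf(z_med)
    return (area_left - 0.34)**2 + (area_right - 0.34)**2 + (dist.median() - z_med)**2

# Build three-point statistics from a known skew-normal (toy parameters)
true_params = (4.0, 0.3, 2.2)
truth = skewnorm(true_params[0], scale=true_params[1], loc=true_params[2])
z_l68, z_med, z_u68 = truth.ppf([0.16, 0.50, 0.84])

cost_true = three_point_cost(true_params, (z_med, z_l68, z_u68))
cost_off = three_point_cost((0.0, 0.3, 2.2), (z_med, z_l68, z_u68))   # symmetric guess
print(cost_true, cost_off)
```

The cost is essentially zero at the true parameters and larger for the symmetric guess, which is the behavior the minimizer relies on.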

In [6]:
def costFn(params, z_params):
    """Calculate the residual based on the area between the 16th/84th percentile, and difference between median"""
    # Unpack the parameters
    alpha, omega, loc = params
    
    # Create the skew-normal distribution
    dist = skewnorm(alpha, scale=omega, loc=loc)

    # Calculate the CDF at points 16th, 50th, and 84th percentiles
    cdf_16, cdf_50, cdf_84 = dist.cdf(z_params[1]),  dist.cdf(z_params[0]), dist.cdf(z_params[2])
    
    # The areas we want to match
    area_left = cdf_50 - cdf_16
    area_right = cdf_84 - cdf_50
    
    # Objective is to make both areas equal to 0.34
    return (area_left - 0.34)**2 + (area_right - 0.34)**2 + (dist.median()-z_params[0])**2

In [7]:
def fitDist(zmed, l68, u68, plot_path = "", cid = None):
    """Fit for skew-normal parameters based on the 3-point redshift statistics"""

    wi = 2/3*(u68 - l68)    # Initial omega
    var = ((zmed-l68)**2 + (u68-zmed)**2)/2     # variance estimate

    # Estimate location and shape based on which side has the smaller error
    if (zmed - l68) < (u68-zmed): li, ai = l68 -wi/4, 50*var**0.5       # upper error is larger
    else:   li, ai =  u68 + wi/4, -50*var**0.5          # lower error is larger

    # Minimize the objective function
    result = minimize(costFn, [ai, wi, li], tol=1e-14, options={'maxiter':100},
                    args=([zmed, l68, u68],), bounds = ((-12, 12), (0,5), (0,7)))
    
    af, wf, lf = result.x   # Optimized parameters
    residual = costFn(result.x, [zmed, l68, u68])

    if plot_path != "": fit_Plot(af, wf, lf, zmed, l68, u68, plot_path, cid)

    return af, wf, lf, residual

In [8]:
def fit_Plot(af, wf, lf, zmed, l68, u68, plot_path, cid):
    """Plot for the fitting function above"""
    dist = skewnorm(af, scale=wf, loc=lf)
    rvs = dist.rvs(10000)

    delta = af / np.sqrt(1 + af**2)
    # Skewness of a skew-normal: (4-pi)/2 * (delta*sqrt(2/pi))^3 / (1 - 2*delta^2/pi)^(3/2)
    gamma = (4-np.pi)/2 * (delta*np.sqrt(2/np.pi))**3 / (1-2*delta**2/np.pi)**1.5
    mu = dist.mean()
    final_med = dist.median()

    # Final skew-normal distribution
    p_l68 = dist.cdf(zmed) - dist.cdf(l68)
    p_u68 = dist.cdf(u68) - dist.cdf(zmed)
    xs = np.linspace(zmed-3*(zmed-l68), zmed+3*(u68-zmed), 1000)
    plt.figure(figsize=(7, 5))
    plt.title(rf"C20 ID = {int(cid)}    $\alpha$={round(af, 2)}       $\omega$={round(wf, 2)}      $\xi$={round(lf, 2)}    $\gamma$={round(gamma, 2)}")
    plt.hist(rvs, density=True, bins=100, label=f"N = {10000} draws", alpha=0.75)
    plt.plot(xs, dist.pdf(xs))
    plt.vlines(mu, ymin=0, ymax=2, color='red', label=f"Mean = {round(mu, 3)}")
    plt.vlines(final_med, ymin=0, ymax=2, color='orange', label=f"Median = {round(final_med, 3)}    ({zmed})")
    plt.vlines(l68, ymin=0, ymax=1, color='g', label=f"Lower-bound  ({round(p_l68,3)})")
    plt.vlines(u68, ymin=0, ymax=1, color='m', label=f"Upper-bound  ({round(p_u68, 3)})")
    plt.legend()
    plt.savefig(plot_path)
    plt.close()

Great, now we're ready to actually fit this to the COSMOS data. First, we need to trim the data down to only those galaxies we're interested in.

We start by trimming galaxies that are missing any of the three-point statistics (the median or either bound of the 68% interval).

In [13]:
## READ IN COSMOS DATA ##

cosmos_file = fits.open(r"C:/Users/sikor/OneDrive/Desktop/BigData/COSMOS2020/COSMOS2020_CLASSIC_R1_v2.0.fits")
c20p = cosmos_file[1].data

## FIND BAD GALAXIES ##
bad_ids = np.where(np.isnan(c20p["lp_zPDF"]) |        # No redshift from lephare
                   np.isnan(c20p["lp_zPDF_l68"]) |    # No lower-68-percentile from lephare
                   np.isnan(c20p["lp_zPDF_u68"]))[0]  # no upper-68-percentile from lephare

c20p_cut = np.delete(c20p, bad_ids)

print(f"Number of galaxies = {len(c20p)}")
print(f"Number of bad galaxies = {len(bad_ids)}")

Number of galaxies = 1720700
Number of bad galaxies = 19258


Next, let's cut based on RA, Dec, LP-type, and IRAC Magnitude

In [14]:
## CUT DATA TO POTENTIALLY USABLE ##
ra_range = (149.6, 150.52)  
dec_range = (1.74, 2.73)
IRAC_cut = 25.4


g_idxs = np.where((c20p_cut["ALPHA_J2000"] >= ra_range[0]) & (c20p_cut["ALPHA_J2000"] <= ra_range[1])       # RA
                & (c20p_cut["DELTA_J2000"] >= dec_range[0]) & (c20p_cut["DELTA_J2000"] <= dec_range[1])     # DEC
                & ((c20p_cut["IRAC_CH1_MAG"] <= IRAC_cut) | (c20p_cut["IRAC_CH2_MAG"] <= IRAC_cut)) # IRAC
                & ((c20p_cut["lp_type"] == 0) | (c20p_cut["lp_type"] == 2)))        # LePhare type

g_c20p = c20p_cut[g_idxs]

print(f"Number of galaxies = {len(g_c20p)}")
print(f"Number of bad galaxies = {len(c20p_cut) - len(g_c20p)}")

Number of galaxies = 287363
Number of bad galaxies = 1414079


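Finally, we make one more cut based on how close the redshift PDF of each galaxy comes to our redshift range of interest. In our case, we care about $2\leq z \leq 3$, so we cut based on that.

Specifically, with $\sigma_-$ and $\sigma_+$ the distances from the median to the 16th and 84th percentiles, we keep a galaxy if its median is at or below $z=3$ and its 84th percentile plus $3\sigma_+$ reaches $z=2$, or if its median is at or above $z=2$ and its 16th percentile minus $3\sigma_-$ reaches $z=3$.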
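A toy version of this proximity cut, using made-up three-point statistics, shows which objects survive. The variable names `reaches_up` and `reaches_down` are just for illustration.

```python
import numpy as np

z_min, z_max, n_sig = 2.0, 3.0, 3

# Toy three-point statistics (made up for illustration)
med = np.array([1.50, 1.90, 2.50, 3.20, 4.00])
l68 = np.array([1.45, 1.80, 2.40, 3.10, 3.95])
u68 = np.array([1.55, 2.00, 2.60, 3.30, 4.05])

sig_l, sig_u = med - l68, u68 - med

# Median at or below z_max, with an upper tail that reaches z_min
reaches_up = (med <= z_max) & (u68 + n_sig * sig_u >= z_min)
# Median at or above z_min, with a lower tail that reaches z_max
reaches_down = (med >= z_min) & (l68 - n_sig * sig_l <= z_max)

keep = reaches_up | reaches_down
print(keep)
```

The narrow PDFs at z = 1.5 and z = 4.0 are dropped, while the objects near the edges of the range are kept.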

In [14]:
## CUT BASED ON REDSHIFT PROXIMITY ##
z_min, z_max = 2, 3     # Redshift range
n_sig = 3           # Number of sigma acceptable

med, l68, u68 = g_c20p["lp_zPDF"], g_c20p["lp_zPDF_l68"], g_c20p["lp_zPDF_u68"]
sig_l, sig_u = med - l68, u68 - med

in_zrange = np.where(( (med<=z_max) & (u68+n_sig*sig_u >=z_min) ) | ( (med>=z_min) & (l68-n_sig*sig_l <= z_max) )  )

small_c20p = g_c20p[in_zrange]

print("Number of remaining galaxies = ", len(small_c20p))

Number of remaining galaxies =  99999


Great. We have our sample of galaxies. Let's go ahead and do our fits now:

In [19]:
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning) 
# save_fits = np.zeros((small_c20p.shape[0], 5))
save_fits = np.load('zFits/zFits.npy')

for idx in tqdm(range(small_c20p.shape[0])):

    c_index = small_c20p["ID"][idx]

    if np.any(save_fits[:,0] == c_index):
        continue
    else:
        zm, l68, u68 = small_c20p["lp_zPDF"][idx], small_c20p["lp_zPDF_l68"][idx], small_c20p["lp_zPDF_u68"][idx]
        a, w, l, r = fitDist(zm, l68, u68)

        save_fits = np.append(save_fits, np.array([[c_index, a, w, l, r]]), axis=0)

# np.save(f"zFits/zFits.npy", save_fits)


---
---
---


<font size = 5><h1 align = center>Generate Only photoz catalogs</h1 ></font>      

---> I want to make MC catalogs for just the COSMOS catalog. So, for each iteration, I simply draw from the p(z) for that photometric object in the COSMOS catalog. Later on, I can replace these when needed

First, I need to find the C20 Galaxies with good skew-normal fits. This is based on how far the median of the skew-normal is from the median redshift reported by LePhare
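A possible vectorized way to do this check is sketched here, assuming a fit table with columns `[ID, alpha, omega, loc]` and a matching array of catalog medians. The toy values and the names `fits_toy` and `z_med_cat` are hypothetical.

```python
import numpy as np
from scipy.stats import skewnorm

max_sep = 0.1     # max allowed |fit median - catalog median|

# Toy fit table with columns [ID, alpha, omega, loc] and toy catalog medians
fits_toy = np.array([[1, 0.0, 0.2, 2.5],
                     [2, 5.0, 0.3, 2.0]])
z_med_cat = np.array([2.5, 2.6])

# skewnorm.median broadcasts over parameter arrays, so no Python loop is needed
fit_medians = skewnorm.median(fits_toy[:, 1], loc=fits_toy[:, 3], scale=fits_toy[:, 2])
good = np.abs(fit_medians - z_med_cat) <= max_sep
print(fit_medians, good)
```

The first toy fit reproduces its catalog median and is kept; the second misses by far more than `max_sep` and would be flagged as a bad fit.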

In [21]:
### FIND GOOD FITS ###
all_fits = np.load("zFits/zFits.npy")
max_sep = 0.1   # Max allowable redshift separation

# Pack up the separations
med_diffs = []
for f in tqdm(all_fits):
    skn = skewnorm(f[1], scale=f[2], loc=f[3])
    z_med = c20p["lp_zPDF"][int(f[0])-1]
    med_diffs.append(skn.median() - z_med) 

# Find bad separations and delete
b_fits = np.where(np.abs(med_diffs) > max_sep)[0]    
g_fits = np.delete(all_fits, b_fits, axis=0)

print("Number of good fits = ", len(g_fits))
print("Number of bad fits = ", len(b_fits))

Number of good fits =  70049
Number of bad fits =  234


I can now MC the photometric sources by just drawing random values from the skew-normal
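Since every photometric redshift is redrawn on every iteration (all weights are zero), the draws could in principle be made in a single call. This is only a sketch with made-up parameters for three galaxies, not necessarily how the catalogs were produced.

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(42)     # seeded only so the toy run is reproducible
niter = 100

# Toy skew-normal parameters for three galaxies (made up for illustration)
alpha = np.array([0.0, 3.0, -3.0])
omega = np.array([0.05, 0.20, 0.20])
loc = np.array([2.5, 2.2, 2.8])

# One call draws every iteration: rows are iterations, columns are galaxies
draws = skewnorm.rvs(a=alpha, loc=loc, scale=omega,
                     size=(niter, len(alpha)), random_state=rng)
new_pzs = draws.T                   # shape (n_galaxies, niter), like the second output of MCz
print(new_pzs.shape)
```

The transposed array has one row per galaxy and one column per iteration, which is the layout written into the `MC_iter{n}` columns.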

In [26]:
#### RUN THE MC ####
# ========================================================
# ========================================================
for run in range(1):
    niter = 100      # Number of iterations

    z_range = [2,3]         # Redshift range of interest

    phot_med = c20p["lp_zPDF"][g_fits[:,0].astype(int)-1]

    # ========================================================
    # ========================================================
    # ========================================================

    ## MC ##
    # MCz returns three values; the drawn indices are not needed here
    phot_ids, new_pzs, _ = MCz(niter, phot_med, np.zeros(len(g_fits)), z_range, my_PDF,
                         verbose=True, alpha=g_fits[:,1], omega=g_fits[:,2], loc=g_fits[:,3])

    ## Update bad galaxies ##

    ## WRITE TO RESULT FILE ##

    # Update dtypes
    dtypes = [c20p.dtype.descr[0]] + [(f"MC_iter{n}", ">f8") for n in range(niter)]

    # Make array to fill
    write_arr = np.zeros(shape=(len(g_fits)), dtype=dtypes)

    write_arr["ID"] = c20p["ID"][g_fits[:,0].astype(int)-1]
    for n in range(niter):
        write_arr[f"MC_iter{n}"] = new_pzs[:,n]

    np.save(rf"C:/Users/sikor/OneDrive/Desktop/BigData/COSMOS2020/C20_MC_100_{run}.npy", write_arr)


Sweet, we now have our MC iterations for the photometric sources

---
---
---

<font size = 6><h1 align = center>Spectroscopy</h1 ></font>      

---> Now, I'll run MCs for the spectra (ground-based and grism). However, some objects are repeated in both spec-catalogs, so I need to first pull out those objects as they are handled differently. For now, I'll just load in the data...

For each object, I still need to know if the object has a well-defined p(z) to draw from in the COSMOS2020 catalog, or if it is in the catalog in the first place. If it's not a COSMOS object, I can't use it because it has no photometry (thus no physical properties). If it's missing the p(z), it's not useful for the MC process.

In [9]:
# Load Photoz's to check p(z) for spectra
cosmos_file = fits.open(r"C:/Users/sikor/OneDrive/Desktop/BigData/COSMOS2020/COSMOS2020_CLASSIC_R1_v2.0.fits")
c20p = cosmos_file[1].data

In [10]:
### LOAD SPECTRA ###

# GROUND-BASED (GB)
specz_cat = np.loadtxt("./Data/master_specz_COSMOS_BF_v4b.cat", dtype=object)   # Load in the data
# Fix up the formatting for the spec data-file:
new_array = []
for idx in range(specz_cat.shape[1]):
    try:
        col = specz_cat[:,idx].astype(np.float32)
    except (ValueError, TypeError):   # Column is not numeric, so keep it as-is
        col = specz_cat[:,idx]
    new_array.append(col)

c20s = np.array(new_array, dtype=object)
c20s = np.transpose(c20s)

print(f"Number of C20 spectra: {c20s.shape[0]}")

# ----------------------------------------------------------------------
# ----------------------------------------------------------------------

# GRISM
# Need blended flags from griz_cat
griz_cat = np.loadtxt("./Data/HST_Hyp_zcat.v1.2.cat",  usecols=range(16), dtype=object)   # Load in the data
new_array = []
for idx in range(griz_cat.shape[1]):
    try:
        col = griz_cat[:,idx].astype(np.float32)
    except (ValueError, TypeError):   # Column is not numeric, so keep it as-is
        col = griz_cat[:,idx]
    new_array.append(col)

griz = np.array(new_array, dtype=object)
griz = np.transpose(griz)

print(f"Number of Grism redshifts: {griz.shape[0]}")

Number of C20 spectra: 42776
Number of Grism redshifts: 12764


We now check:
- Are the objects in COSMOS?
- Do the objects have IRAC1 or IRAC2 <= the IRAC cut?
- Do the objects have well-defined redshift PDFs in COSMOS?
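The three checks can be sketched on a toy table as follows. This assumes 1-based COSMOS IDs (so row = ID - 1) and a non-positive sentinel such as -99 for "no COSMOS match". The table `phot` and all values are made up for illustration.

```python
import numpy as np

IRAC_cut = 25.4

# Toy stand-in for the COSMOS2020 table (values are made up)
phot = np.zeros(3, dtype=[("IRAC_CH1_MAG", "f8"), ("IRAC_CH2_MAG", "f8"),
                          ("lp_zPDF", "f8"), ("lp_zPDF_l68", "f8"), ("lp_zPDF_u68", "f8")])
phot["IRAC_CH1_MAG"] = [24.0, 26.0, 23.5]
phot["IRAC_CH2_MAG"] = [24.5, 26.5, 23.0]
phot["lp_zPDF"] = [2.4, 2.6, np.nan]
phot["lp_zPDF_l68"] = [2.3, 2.5, np.nan]
phot["lp_zPDF_u68"] = [2.5, 2.7, np.nan]

spec_ids = np.array([1, 2, 3, -99])     # assumed 1-based IDs; -99 marks "no match"

in_cosmos = spec_ids > 0
rows = phot[np.where(in_cosmos, spec_ids - 1, 0)]   # placeholder row 0 for unmatched spectra
bright = (rows["IRAC_CH1_MAG"] <= IRAC_cut) | (rows["IRAC_CH2_MAG"] <= IRAC_cut)
has_pdf = ~(np.isnan(rows["lp_zPDF"]) | np.isnan(rows["lp_zPDF_l68"]) | np.isnan(rows["lp_zPDF_u68"]))

usable = in_cosmos & bright & has_pdf
print(usable)
```

Only the first toy spectrum passes all three checks: the second is too faint, the third has no p(z), and the fourth is not in COSMOS.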

In [15]:
## MAKE MAG CUT AND CHECK MASS ##
    # Find idx in original C20 cat and cut based on mag
    # Check if in COSMOS catalog

## GB

spec_cids = c20s[:,0].astype(int) - 1   
spec_gals = c20p[spec_cids]
g_spec = np.where(  ((spec_gals["IRAC_CH1_MAG"] <= IRAC_cut) | (spec_gals["IRAC_CH2_MAG"] <= IRAC_cut))     # IRAC
                  & (c20s[:,0] > 0)        # Is in COSMOS
                 & (spec_gals['lp_zPDF'] == spec_gals['lp_zPDF'])       # Median is defined
                 & (spec_gals['lp_zPDF_l68'] == spec_gals['lp_zPDF_l68'])   # l68 is defined
                 & (spec_gals['lp_zPDF_u68'] == spec_gals['lp_zPDF_u68'])  )    # upper 68 is defined

c20s = c20s[g_spec]

## GRISM

griz_cids = griz[:,4].astype(int)  -1
griz_gals = c20p[griz_cids]
g_griz = np.where(((griz_gals["IRAC_CH1_MAG"] <= IRAC_cut) | (griz_gals["IRAC_CH2_MAG"] <= IRAC_cut))
                  & (griz[:,4] > 0)        # Is in COSMOS
                 & (griz_gals['lp_zPDF'] == griz_gals['lp_zPDF'])       # Median is defined
                 & (griz_gals['lp_zPDF_l68'] == griz_gals['lp_zPDF_l68'])   # l68 is defined
                 & (griz_gals['lp_zPDF_u68'] == griz_gals['lp_zPDF_u68']) ) # upper 68 is defined
griz = griz[g_griz]

print("POST CUT")
print(f"Number of C20 spectra: {c20s.shape[0]}")
print(f"Number of Grism redshifts: {griz.shape[0]}")

POST CUT
Number of C20 spectra: 37976
Number of Grism redshifts: 11856


<font size = 5><h1 align = center>Finding common objects</h1 ></font>      

---> Some objects have an associated ground-based and grism-based redshift. I need to handle these differently for the MC, so I'll separate those out first.
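For unique IDs, the split into common and unique objects could also be sketched with `np.isin`. The toy IDs below are made up, and this sketch does not reproduce the handling of repeated IDs in the loop that follows.

```python
import numpy as np

spec_ids = np.array([101, 102, 103, 104])    # toy C20 IDs in the ground-based catalog
grism_ids = np.array([103, 200, 101])        # toy C20 IDs in the grism catalog

in_both_spec = np.isin(spec_ids, grism_ids)    # spec rows that also have a grism redshift
in_both_grism = np.isin(grism_ids, spec_ids)   # grism rows that also have a spec redshift

common = spec_ids[in_both_spec]
spec_only = spec_ids[~in_both_spec]
grism_only = grism_ids[~in_both_grism]
print(common, spec_only, grism_only)
```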

In [16]:
## FIND COMMON OBJECTS ##
gids = []       # idx in grism catalog of common object
sids = []       # idx in spectrum catalog of common object 

sim_objs = []   # Keep track of info of the object for MC use --> [C20_ID, zs, qf_s, zg, qf_gz]

for g_id, c_id in enumerate(griz[:,4].astype(int)):
    if c_id > 0:    # First, make sure it's a cosmos object
        t = np.where(c_id == c20s[:,0])[0] # Find which objects in the spec-catalog have the cosmos id

        if len(t) != 0: # Object is found in both
            for ts in t:
                gids.append(g_id)   # Add grism id
                sids.append(ts)   # Add spec id
                sim_objs.append([c_id, griz[g_id][10], griz[g_id][11], griz[g_id][13], griz[g_id][15]])  # Useful data for later


# Create unique catalogs
spec_unique = np.delete(c20s, sids, axis=0)     
griz_unique = np.delete(griz, gids, axis=0)
sim_objs = np.array(sim_objs, dtype=float)

print(f"Unique Grism Objects = {len(griz_unique)}")
print(f"Unique Spec Objects = {len(spec_unique)}")
print(f"Common Objects = {len(sim_objs)}")

Unique Grism Objects = 9267
Unique Spec Objects = 35428
Common Objects = 2589


<font size = 5><h1 align = center>Run MC for Spectra</h1 ></font>      

---> So, we now have three classes of spectra to run an MC for. We have unique GB spectra, unique Grism spectra, and ones that have a mix
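For the mixed class, the choice between the three redshift sources can be sketched as a single random draw per object. With weights $w_{hi} \geq w_{lo}$, the better spectrum is chosen with probability $w_{hi}$, the worse one with probability $w_{lo}(1-w_{hi})$, and the photo-z otherwise. The function name `choose_redshift_source` is hypothetical, and the random numbers are fixed by hand so the outcome is easy to follow.

```python
import numpy as np

def choose_redshift_source(w_spec, w_grism, rn):
    """Return 0 for the ground-based spec-z, 1 for the grism-z, 2 for the photo-z."""
    w_hi = np.maximum(w_spec, w_grism)
    w_lo = np.minimum(w_spec, w_grism)
    better = np.where(w_grism > w_spec, 1, 0)   # ties go to the spec-z, like np.argmax
    worse = 1 - better
    # np.select takes the first condition that is true for each object
    return np.select([rn < w_hi, rn < w_hi + w_lo * (1 - w_hi)],
                     [better, worse], default=2)

w_spec = np.array([0.993, 0.993, 0.993, 0.7])
w_grism = np.array([0.668, 0.668, 0.668, 0.925])
rn = np.array([0.5, 0.995, 0.999, 0.5])         # hand-picked "random" numbers

choice = choose_redshift_source(w_spec, w_grism, rn)
print(choice)
```

In the first three toy objects the spec-z is the better measurement, so it wins for a typical draw, the grism-z is used only in the narrow band just above 0.993, and the photo-z is left for the remaining sliver. In the fourth object the grism-z has the higher weight and is chosen.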

In [17]:
## PREP STORAGE ARRAY ## 
niter = 100

dtypes = [c20p.dtype.descr[0]] + [(f"MC_iter{n}", ">f8") for n in range(niter)]

# Make array to fill
spec_mc = np.zeros(shape=(len(spec_unique) + len(griz_unique) + len(sim_objs)), dtype=dtypes)

which_z = np.zeros(shape=(len(spec_unique) + len(griz_unique) + len(sim_objs)), dtype=dtypes)

In [18]:
### Narrow down the spec-targets ###

qfs = spec_unique[:,13] % 10        # Find last digit of qf

spec_use_idxs = np.where( (ra_range[0]<= spec_unique[:,4]) & (spec_unique[:,4] <= ra_range[1])          # RA check
                 & (dec_range[0] <= spec_unique[:,6]) & (spec_unique[:,6] <= dec_range[1])      # Dec check
                 & (  ((qfs >=2.)&(qfs<3.))  |  ((qfs>=9.)&(qfs<10.)) | ((qfs>=3.)&(qfs<5.)))   )[0] # QF check         


spec_use = spec_unique[spec_use_idxs]     # Trim the spec catalog to only include galaxies I care about
print("Good spectra: ", len(spec_use))

Good spectra:  17934


In [48]:
### Create small catalog of spec/ grism fits ###
s_fits = []

for s_id, s in tqdm(enumerate(spec_use), total=len(spec_use)):
    c_id = int(s[0])
    
    # Check if already been fit
    f_ids = np.where(g_fits[:,0].astype(int) == c_id)[0]
    if len(f_ids) != 0:    # Already been fit
        s_fits.append(g_fits[f_ids[0]])
        continue
    
    # Hasn't been fit
    else:
        a, w, l, r = fitDist(c20p["lp_zPDF"][c_id-1], c20p["lp_zPDF_l68"][c_id-1], c20p["lp_zPDF_u68"][c_id-1])
        #Check to see if fit converged
        f_med = skewnorm(a,scale=w,loc=l).median()
        if np.abs(f_med - c20p["lp_zPDF"][c_id-1] ) < 0.1:
            s_fits.append([c_id, a,w,l,r])
        else:
            continue

s_fits = np.array(s_fits)
np.save("zFits/sFits.npy", s_fits)

print("Number of spectra = ", len(spec_use))
print("Number of converged fits = ", len(s_fits))


Number of spectra =  17934
Number of converged fits =  17924


In [19]:
### Pack the fits for use ###
s_params = []
b_params = []
s_fits = np.load("zFits/sFits.npy")
for idx, c_id in enumerate(spec_use[:,0]):
    idxs = np.where(s_fits[:,0] == c_id)[0]
    if len(idxs) != 0:
        i = idxs[0]
        s_params.append([s_fits[i][1],s_fits[i][2], s_fits[i][3]])
    else :
        b_params.append(idx)
    

s_params = np.array(s_params)
final_spec = np.delete(spec_use, b_params, axis=0)

print(final_spec.shape)

(17924, 32)


In [29]:
#### RUN THE MC ####
# ========================================================
# ========================================================
z_range = [2,3]         # Redshift range of interest


spec_z = final_spec[:,11]         # original spec-z

# Set the MC weights based on the quality flags
qfs = final_spec[:,13] % 10      
spec_weights = np.select( [(qfs >=2.)&(qfs<3.),(qfs>=9.)&(qfs<10.), (qfs>=3.)&(qfs<5.) ],
                [0.7, 0.7, 0.993],
                default=0)
# ========================================================
# ========================================================
# ========================================================

## MC ##
spec_ids, new_szs, drawn_idxs = MCz(niter, spec_z, spec_weights, z_range, my_PDF,
                     verbose=True, alpha = s_params[:,0], omega = s_params[:,1], loc = s_params[:,2])

## WRITE TO RESULT FILE ##

# Update dtypes
dtypes = [c20p.dtype.descr[0]] + [(f"MC_iter{n}", ">f8") for n in range(niter)]

# Make array to fill
write_arr = np.zeros(shape=(len(final_spec)), dtype=dtypes)

write_arr["ID"] = final_spec[:,0]
for n in range(niter):
    write_arr[f"MC_iter{n}"] = new_szs[:,n]
    new_ids = drawn_idxs[n]

    which_z[f"MC_iter{n}"][:len(new_szs)] = 1             # 1 = spec-z kept
    which_z[f"MC_iter{n}"][:len(new_szs)][new_ids] = 0    # 0 = photo-z drawn instead

np.save(r"C:/Users/sikor/OneDrive/Desktop/BigData/COSMOS2020/C20_spec_MC_100.npy", write_arr)

spec_mc["ID"][:len(new_szs)] = final_spec[:,0]     # update with cosmos IDs
which_z[f"ID"][:len(new_szs)] = final_spec[:,0]


for n in range(niter):
    spec_mc[f"MC_iter{n}"][:len(new_szs)] = new_szs[:,n]


In [21]:
### Narrow down the griz-targets ###

qfs = griz_unique[:,15]  

griz_use_idxs = np.where( (ra_range[0]<= griz_unique[:,1]) & (griz_unique[:,1] <= ra_range[1])          # RA check
                 & (dec_range[0] <= griz_unique[:,2]) & (griz_unique[:,2] <= dec_range[1])      # Dec check
                 & (  (qfs==3)  |  (qfs==4) | (qfs==5) )        # QF check
                 & (griz_unique[:,5]== 0))[0]          # Not a blended object

griz_use = griz_unique[griz_use_idxs]     # Trim the grism catalog to only include galaxies I care about
print("Good Grism spectra: ", len(griz_use))

Good Grism spectra:  3768


In [55]:
### Create small catalog of spec/ grism fits ###
griz_fits = []

for g_id, g in tqdm(enumerate(griz_use), total=len(griz_use)):
    g_id = int(g[4])
    
    # Check if already been fit
    f_ids = np.where(g_fits[:,0].astype(int) == g_id)[0]
    if len(f_ids) != 0:    # Already been fit
        griz_fits.append(g_fits[f_ids[0]])
        continue
    
    # Hasn't been fit
    else:
        a, w, l, r = fitDist(c20p["lp_zPDF"][g_id-1], c20p["lp_zPDF_l68"][g_id-1], c20p["lp_zPDF_u68"][g_id-1])
        #Check to see if fit converged
        f_med = skewnorm(a,scale=w,loc=l).median()
        if np.abs(f_med - c20p["lp_zPDF"][g_id-1] ) < 0.1:
            griz_fits.append([g_id, a,w,l,r])
        else:
            continue

griz_fits = np.array(griz_fits)
np.save("zFits/gFits.npy", griz_fits)

print("Number of spectra = ", len(griz_use))
print("Number of converged fits = ", len(griz_fits))


Number of spectra =  3768
Number of converged fits =  3767


In [22]:
### Pack the fits for use ###
g_params = []
b_params = []
griz_fits = np.load("zFits/gFits.npy")

for idx, c_id in enumerate(griz_use[:,4]):
    idxs = np.where(griz_fits[:,0] == c_id)[0]
    if len(idxs) != 0:
        i = idxs[0]
        g_params.append([griz_fits[i][1],griz_fits[i][2], griz_fits[i][3]])
    else :
        b_params.append(idx)
    

g_params = np.array(g_params)
final_griz = np.delete(griz_use, b_params, axis=0)
print(final_griz.shape)

(3767, 16)


In [45]:
#### RUN THE MC ####
# ========================================================
# ========================================================
z_range = [2,3]         # Redshift range of interest


griz_z = final_griz[:,13].astype(float)         # original grism-z
griz_width = 46/14100*(1+griz_z)       # Width of the normal distribution to draw from



# Set the MC weights based on the quality flags
qfs = final_griz[:,-1]  
griz_weights = np.select( [qfs==5, qfs==4, qfs==3 ],
                [0.925, 0.818, 0.668],
                default=0)

# ========================================================
# ========================================================
# ========================================================
spec_mc["ID"][len(new_szs):len(new_szs) + len(final_griz)] = final_griz[:,4]     # update with cosmos IDs
which_z["ID"][len(new_szs):len(new_szs) + len(final_griz)] = final_griz[:,4]

for n in tqdm(range(niter)):

    gzs = np.random.normal(griz_z, griz_width)

    ## MC ##
    griz_ids, new_g, new_ids = MCz(1, gzs, griz_weights, z_range, my_PDF, 
                        verbose=False, alpha = griz_fits[:,1], omega = griz_fits[:,2], loc = griz_fits[:,3])
    
    new_gzs = new_g.flatten()
    
    spec_mc[f"MC_iter{n}"][len(new_szs):len(new_szs) + len(final_griz)] = new_gzs
    
    which_z[f"MC_iter{n}"][len(new_szs):len(new_szs) + len(final_griz)] = 2                 # 2 = grism-z kept
    which_z[f"MC_iter{n}"][len(new_szs):len(new_szs) + len(final_griz)][new_ids[0]] = 0     # 0 = photo-z drawn instead




# WRITE TO RESULT FILE ##

# # Update dtypes
# dtypes = [c20p.dtype.descr[0]] + [(f"MC_iter{n}", ">f8") for n in range(niter)]

# # Make array to fill
# write_arr = np.zeros(shape=(len(griz)), dtype=dtypes)

# write_arr["ID"] = griz[:,0]
# for n in range(niter):
#     write_arr[f"MC_iter{n}"] = new_gzs[:,n]

# np.save(r"C:/Users/sikor/OneDrive/Desktop/BigData/COSMOS2020/grizli_MC_1000.npy", write_arr)


In [81]:
## Pack parameters of spectra from cosmos catalog ##
bad_com = [] # where parameters have a nan
com_fits = []
for idx, c_id in tqdm(enumerate(sim_objs[:,0].astype(int)), total=len(sim_objs[:,0]), ):

    ra, dec = c20p["ALPHA_J2000"][c_id-1], c20p["DELTA_J2000"][c_id-1]
    if (ra >= ra_range[1]) or (ra <= ra_range[0]) or (dec >= dec_range[1]) or (dec<= dec_range[0]):
        continue

    if c_id == -99:
        continue
    
    # Check if already been fit
    f_ids = np.where(g_fits[:,0].astype(int) == c_id)[0]
    if len(f_ids) != 0:    # Already been fit
        com_fits.append(g_fits[f_ids[0]])
        continue
    
    # Hasn't been fit
    else:
        a, w, l, r = fitDist(c20p["lp_zPDF"][c_id-1], c20p["lp_zPDF_l68"][c_id-1], c20p["lp_zPDF_u68"][c_id-1])
        #Check to see if fit converged
        f_med = skewnorm(a,scale=w,loc=l).median()
        if np.abs(f_med - c20p["lp_zPDF"][c_id-1] ) < 0.1:
            com_fits.append([c_id, a,w,l,r])
        else:
            continue


com_fits = np.array(com_fits)
bad_com = np.array(bad_com)
np.save("zFits/cFits.npy", com_fits)

print("Number of spectra = ", len(sim_objs))
print("Number of converged fits = ", len(com_fits))


Number of spectra =  2589
Number of converged fits =  2584


In [24]:
### Pack the fits for use ###
g_params = []
b_params = []
com_fits = np.load("zFits/cFits.npy")


for idx, c_id in enumerate(sim_objs[:,0].astype(float).astype(int)):
    idxs = np.where(com_fits[:,0] == c_id)[0]
    if len(idxs) != 0:
        i = idxs[0]
        g_params.append([com_fits[i][1],com_fits[i][2], com_fits[i][3]])
    else :
        b_params.append(idx)
    

g_params = np.array(g_params)
final_c = np.delete(sim_objs, b_params, axis=0)
print(final_c.shape)

(2584, 5)


In [73]:
#### RUN THE MC ####
# ========================================================
# ========================================================
z_range = [2,3]         # Redshift range of interest


## Weights

# Spectra weights
qfs = final_c[:,2] % 10      # Last digit of the quality flag
spec_weights = np.select( [(qfs >=2.)&(qfs<3.),(qfs>=9.)&(qfs<10.), (qfs>=3.)&(qfs<5.) ],
                [0.7, 0.7, 0.993],
                default=0)

# Grizli weights 
qfg = final_c[:,-1]  
griz_weights = np.select( [qfg==5, qfg==4, qfg==3 ],
                [0.925, 0.818, 0.668],
                default=0)

# Combine
sim_weights = np.c_[spec_weights, griz_weights]

# Keep track of which flag is higher
max_id = np.argmax(sim_weights, axis=1)

# Sort the weights
sim_weights = np.sort(sim_weights, axis=1)


spec_mc["ID"][len(new_szs) + len(final_griz):len(new_szs) + len(final_griz) + len(final_c)] = final_c[:,0]     # update with cosmos IDs

which_z["ID"][len(new_szs) + len(final_griz):len(new_szs) + len(final_griz) + len(final_c)] = final_c[:,0]     # update with cosmos IDs

# ========================================================
# ========================================================
# ========================================================

for n in tqdm(range(niter)):

    # Draw random number for each object:
    mc_rns = np.random.random(size=len(sim_weights))

    # Choose specz, griz, or photoz
    z_choice = []   
    for rn_idx, rn in enumerate(mc_rns):
        sw = sim_weights[rn_idx]    # weights for this spectrum

        if rn < sw[1]: 
            z_choice.append(max_id[rn_idx]) # Choose better spectrum

        elif (rn >=sw[1]) and (rn < sw[1]+sw[0]*(1-sw[1])):
            z_choice.append(int(not(max_id[rn_idx])))    # Choose worse spectrum

        else:
            z_choice.append(2)  # Choose photoz

    z_choice = np.array(z_choice)
    
    # Make random grism redshifts
    g_rand = np.random.normal(final_c[:,3], 46/14100*(1+final_c[:,3]) )

    # Pick which redshift to use (the 2 is only a placeholder: photo-z choices get weight 0 below, so they are always redrawn)
    z_meds = np.select([z_choice == 0, z_choice == 1, z_choice == 2], 
                       [final_c[:,1],  g_rand, 2])
    
    # For "which_z" storage only
    wz = np.select([z_choice==0, z_choice==1, z_choice==2],
                   [1, 2, 0])
    
    # Assign weights
    ws = [0 if zi == 2 else 1 for zi in z_choice]


    ## MC ##
    _, new_sim , _ = MCz(1, z_meds, ws, z_range, my_PDF,
                       verbose=False, alpha = g_params[:,0], omega = g_params[:,1], loc = g_params[:,2])
    
    new_simz = new_sim.flatten()
    # new_simz[bad_com] = -99
    
    spec_mc[f"MC_iter{n}"][len(new_szs) + len(final_griz):len(new_szs) + len(final_griz) + len(final_c)] = new_simz
    which_z[f"MC_iter{n}"][len(new_szs) + len(final_griz):len(new_szs) + len(final_griz) + len(final_c)] = wz



In [76]:
spec_mc = spec_mc[:len(new_szs) + len(final_griz) + len(final_c)]
which_z = which_z[:len(new_szs) + len(final_griz) + len(final_c)]

In [77]:
np.save(r"C:/Users/sikor/OneDrive/Desktop/BigData/COSMOS2020/MC_spec.npy", spec_mc)
np.save(r"C:/Users/sikor/OneDrive/Desktop/BigData/COSMOS2020/MC_which.npy", which_z)