## **01.a Building Roman-Rubin simulation with Y1 error**
#### Authors: **Amanda Farias (afariassantos2@gmail.com), Iago Lopes (iagolops2012@gmail.com), Bruno Moraes (bruno.a.l.moraes@gmail.com)**
#### Creation date: **08/09/2024**
#### Last Verified to Run: **11/21/2024** (by @iago)


This notebook loads the Roman-Rubin simulation data stored at NERSC and adds simulated LSST Year 1 (Y1) photometric errors to it, for the project "**Investigating cosmological biases due to errors in the shape of observed tomographic redshift distributions**" [[393]](https://lsstdesc.slac.stanford.edu/DESCPub/app/PB/show_project?pid=393). We use the *photerr* code from this [GitHub repository](https://github.com/jfcrenshaw/photerr/tree/main), which came out of [DESC project 00165](https://portal.lsstdesc.org/DESCPub/app/PB/pub/show_publication?pubid=165) and is based on "LSST: From Science Drivers to Reference Design and Anticipated Data Products" (see __[arXiv:0805.2366](https://arxiv.org/abs/0805.2366)__ for details). For extended sources, the calculation is based on [van den Busch et al. (2020)](https://ui.adsabs.harvard.edu/abs/2020A%26A...642A.200V/abstract) and [Kuĳken et al. (2019)](https://www.aanda.org/articles/aa/full_html/2019/05/aa34918-18/aa34918-18.html).
$~$
##### Logistics: This notebook is intended to be run through the JupyterLab interface at NERSC, available at __[jupyter.nersc.gov](https://jupyter.nersc.gov/)__.

#### Setting Up the Environment
To run these notebooks, you need to use an existing environment or create a new one.
<br> $~$
##### Creating a New Environment
If you need to create a new environment, use the following command: <br>
``conda create --name <environment-name>``<br>
$~$
##### Enabling the Conda Environment as a Python Kernel
Now, we'll enable the conda environment as a python kernel. <br>
``conda activate <environment-name>`` <br>
``python -m ipykernel install --user --name <environment-name> --display-name rail`` <br>
$~$
##### RAIL installation, based on the __[RAIL Read the Docs page](https://rail-hub.readthedocs.io/en/latest/source/installation.html)__
``git clone https://github.com/LSSTDESC/rail.git`` <br>
``cd rail`` <br>
``conda activate <environment-name>`` <br>
``conda install -n <environment-name> -c conda-forge --file conda-reqs.txt`` <br>
``pip install .[algos]``

In [None]:
import sys
import tables_io
import numpy as np
import pandas as pd
from tqdm import tqdm
import scipy.interpolate
import matplotlib.pyplot as plt
from photerr import LsstErrorModel

%matplotlib inline

<div class="alert alert-block alert-warning">
<b>ATTENTION:</b> To run this notebook on your NERSC account, update the <code>nersc_name</code> variable in the next cell to your NERSC username.
</div> 

In [None]:
nersc_name = 'iago'

In [None]:
run = True  # set to False to skip the calculations
save_file = True  # set to False to skip writing the .csv file
# Perlmutter scratch directories follow /pscratch/sd/<first letter>/<username>
path_file = f"/pscratch/sd/{nersc_name[0]}/{nersc_name}/roman_rubin.csv"  # output path

In [None]:
path_origin_data = (
    "/global/cfs/cdirs/lsst/projecta/lsst/groups/CS/roman_rubin_2023_v1.1.3"
)

pixels = [
    10050,10051,10052,10053,10177,10178,10179,
    10180,10181,10305,10306,10307,10308,10429,
    10430,10431,10432,10549,10550,10551,10552,
    10665,10666,10667,10668,10777,10778,10779,
    9921,9922,9923,9924,9925,
] # all pixels

filts = ("u", "g", "r", "i", "z", "y")
short_mag_cols = [f"LSST_obs_{filt}" for filt in filts]
mag_cols = [f"mag_{filt}_lsst" for filt in filts]
mag_cols_err = [f"mag_err_{filt}_lsst" for filt in filts]
short_mag_cols_err = [f"mag_{filt}_lsst_err" for filt in filts]
    
dic_rename = {}
for i,filt in enumerate(filts):
    dic_rename[short_mag_cols[i]] = mag_cols[i]
    dic_rename[short_mag_cols_err[i]] = mag_cols_err[i]
    
cols_to_keep = ['redshift','ra','dec','diskHalfLightRadiusArcsec', 'spheroidHalfLightRadiusArcsec', 'bulge_frac', 'totalEllipticity','galaxy_id']
cols_to_keep = cols_to_keep + mag_cols
cols_to_keep
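
To make the renaming in `dic_rename` easier to check, here is a minimal sketch that rebuilds the same mapping in plain Python and prints it. The name `dic_rename_check` is hypothetical and only used for illustration. The `mag_<band>_lsst_err` names are assumed to be the error columns that the error model appends later in the notebook, as the renaming cell above implies.

```python
filts = ("u", "g", "r", "i", "z", "y")

# Catalog magnitude columns -> names used in this notebook
rename_mags = {f"LSST_obs_{f}": f"mag_{f}_lsst" for f in filts}

# Error columns (assumed to be appended later by the error model)
# -> names used in this notebook
rename_errs = {f"mag_{f}_lsst_err": f"mag_err_{f}_lsst" for f in filts}

# Hypothetical stand-in for dic_rename, built in one step
dic_rename_check = {**rename_mags, **rename_errs}

for old, new in dic_rename_check.items():
    print(f"{old:>16} -> {new}")
```

Printing the mapping once before the long read loop is a cheap way to catch a misspelled column name.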

In [None]:
errModel = LsstErrorModel(
    extendedType="auto",  # extended-source error model ("auto" option)
    ndFlag=99,  # value assigned to non-detections
    nYrObs=1,  # one year of observations (Y1)
    renameDict=dict(zip(filts, mag_cols)),  # map band names to catalog columns
)
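
For intuition about the size of the errors, the sketch below evaluates the point-source random error from arXiv:0805.2366, $\sigma_{rand}^2 = (0.04 - \gamma)\,x + \gamma\,x^2$ with $x = 10^{0.4\,(m - m_5)}$. It is not a reproduction of what *photerr* does: the actual model includes further terms, such as a systematic error floor and the size-dependent correction for extended sources. The function name `point_source_mag_err` and the depth `m5_example` are hypothetical values chosen for illustration.

```python
import math


def point_source_mag_err(mag, m5, gamma=0.039):
    """Random magnitude error for a point source (illustrative sketch).

    mag   : true magnitude of the source
    m5    : 5-sigma limiting magnitude of the band
    gamma : band-dependent constant (about 0.039 for most LSST bands)
    """
    x = 10 ** (0.4 * (mag - m5))
    return math.sqrt((0.04 - gamma) * x + gamma * x**2)


m5_example = 24.0  # hypothetical depth, not an actual Y1 value
for mag in (22.0, 24.0, 25.0):
    err = point_source_mag_err(mag, m5_example)
    print(f"mag = {mag:.1f}  ->  sigma_rand = {err:.3f} mag")
```

By construction the error equals 0.2 mag when the source sits exactly at the 5-sigma depth, and it grows quickly for fainter sources.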


For the major and minor axes we use the following formulas:

- **Major Axis**:
  $$
  a_{gal} = \frac{s}{\sqrt{q}}
  $$

- **Minor Axis**:
  $$
  b_{gal} = s \sqrt{q}
  $$

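Where:
- $s$ is the weighted size of the galaxy:
  $$
  s = s_b f_b + s_d (1 - f_b)
  $$
  with $s_b$ the spheroid (bulge) half-light radius, $s_d$ the disk half-light radius, and $f_b$ the bulge fraction.

- $q = \frac{1 - e}{1 + e}$ is the axis ratio, where $e$ is the ellipticity.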


Ref: [van den Busch et al. (2020)](https://arxiv.org/pdf/2007.01846) eqs (4) and (5).
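
As a quick worked example of these formulas, the sketch below computes the axes for a single galaxy in plain Python. The function name `galaxy_axes` and the input values are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def galaxy_axes(s_b, s_d, f_b, e):
    """Return (a_gal, b_gal) from bulge/disk sizes, bulge fraction, ellipticity."""
    s = s_b * f_b + s_d * (1.0 - f_b)  # weighted size
    q = (1.0 - e) / (1.0 + e)          # axis ratio
    return s / math.sqrt(q), s * math.sqrt(q)


# Illustrative values (arcsec for the sizes), not taken from the catalog
a_gal, b_gal = galaxy_axes(s_b=0.3, s_d=0.6, f_b=0.25, e=0.2)
print(f"a_gal = {a_gal:.4f} arcsec, b_gal = {b_gal:.4f} arcsec")
```

Two sanity checks follow directly from the definitions: the product $a_{gal}\, b_{gal}$ equals $s^2$, and the ratio $b_{gal} / a_{gal}$ equals $q$, so a round galaxy ($e = 0$) has both axes equal to $s$.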

In [None]:
# FIXME: This is copy-pasted from a notebook by Sam Schmidt, give precise credit somewhere.

cols_to_write = ["redshift", "galaxy_id", "ra", "dec"]
for filt in filts:
    cols_to_write.append(f"mag_{filt}_lsst")
    cols_to_write.append(f"mag_err_{filt}_lsst")

df = pd.DataFrame([])
z_interval = [0, 1, 2, 3] # FILES ARE SEPARATED BY REDSHIFT RANGE

if run:
    for pix in tqdm(pixels):  # loop over the pixels listed above
        print(f"working in pix: {pix}")

        for index in range(len(z_interval) - 1):  # loop over the redshift bins
            hd5_pix = tables_io.read(
                f"{path_origin_data}/roman_rubin_2023_z_{z_interval[index]}_"
                f"{z_interval[index+1]}_cutout_{pix}.hdf5"
            )

            # Each file holds several keys; all but the last one are read
            keys = list(hd5_pix.keys())[:-1]
            for key in keys:
                df_origin = pd.DataFrame(hd5_pix[key])  # data for this key

                if df_origin.empty:
                    break  # stop at the first empty key in this file

                df_temp = df_origin.rename(columns=dic_rename)
                df_temp = df_temp[cols_to_keep]
                # Magnitude cut: keep objects with 17.5 < i < 26.5
                df_temp = df_temp[
                    (df_temp["mag_i_lsst"] < 26.5) & (df_temp["mag_i_lsst"] > 17.5)
                ].copy()  # copy so that adding a column does not touch a view

                df_temp["pix"] = pix
                df = pd.concat([df, df_temp], ignore_index=True)

        print(f"size: {len(df)}")
        
        
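    # Still inside `if run:` so that nothing is computed on an empty DataFrame
    # Compute the major and minor axes of each object
    s_b = df["spheroidHalfLightRadiusArcsec"]  # spheroid (bulge) half-light radius
    s_d = df["diskHalfLightRadiusArcsec"]  # disk half-light radius
    f_b = df["bulge_frac"]  # bulge fraction
    e = df["totalEllipticity"]  # ellipticity

    s = s_b * f_b + s_d * (1 - f_b)  # weighted size
    q = (1 - e) / (1 + e)  # axis ratio

    df["major"] = np.array(s / np.sqrt(q))
    df["minor"] = np.array(s * np.sqrt(q))

    # Add the Y1 photometric errors; the fixed seed makes the result reproducible
    df = errModel(df, random_state=42)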
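
The read loop above opens one HDF5 file per pixel and redshift bin. As a minimal sketch, the file names it expects can be listed ahead of time, which makes it easy to confirm the naming pattern before starting a long run. The helper `expected_files` is hypothetical, and only two pixels are used here to keep the output short.

```python
path_origin_data = (
    "/global/cfs/cdirs/lsst/projecta/lsst/groups/CS/roman_rubin_2023_v1.1.3"
)


def expected_files(base, pixels, z_edges):
    """Hypothetical helper: list the HDF5 paths the read loop would open."""
    return [
        f"{base}/roman_rubin_2023_z_{z_lo}_{z_hi}_cutout_{pix}.hdf5"
        for pix in pixels
        for z_lo, z_hi in zip(z_edges[:-1], z_edges[1:])
    ]


files = expected_files(path_origin_data, [10050, 10051], [0, 1, 2, 3])
print(f"{len(files)} files expected")
for path in files:
    print(path)
```

Passing each path to `os.path.exists` before the main loop would flag a missing file early, instead of partway through the run.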

In [None]:
# Rename the error columns added by the error model
# (mag_<band>_lsst_err -> mag_err_<band>_lsst)
df = df.rename(columns=dic_rename)
df

In [None]:
if save_file:
    df.to_csv(path_file, sep=" ", index=False)