Application to Euclid data: spectroscopic and photometric galaxy simulations
This repository contains a suite of tools to link astrophysical simulation codes, which are combined in a pipeline to generate galaxy photometry and spectroscopy representative of the expected quality of the Euclid Mission data. This simulated data set can be used to help develop and benchmark spectroscopic and photometric redshift estimation methods, and to assess whether their redshift accuracy meets the official Mission Requirements for successful cosmological measurements.
Corresponding Author: Bruno Moraes (University College London), email@example.com
Table of Contents
- Getting Started
- Example: Realistic Euclid spectroscopic data set creation
The European Space Agency Euclid Mission aims to measure the global properties of the Universe to unprecedented accuracy, with particular emphasis on the properties of the mysterious Dark Energy that is driving the acceleration of the expansion of the Universe. These properties can be inferred from the statistical distribution of galaxies in the Universe and from the effects of the matter distribution on their observed shapes through gravitational lensing. However, this requires extreme precision and accuracy on shape and positional measurements of galaxies.
In particular, measuring their radial distances from us is one of the most challenging problems in modern observational cosmology. The way we infer those distances is through the Doppler effect: due to the expansion of the Cosmos, galaxies are receding from us and their light is consequently shifted towards longer wavelengths (the "red" side of the electromagnetic spectrum). This redshift is directly related to a galaxy's distance, and by measuring it from the properties of the received light, we can reconstruct its position.
In astronomy, there are two main methods for measuring galaxy redshifts. Measuring spectroscopic redshifts consists of observing the full spectral energy distribution (SED) of a galaxy and identifying features that allow a secure redshift determination. A galaxy's spectrum is a consequence of a series of relatively well-understood physical phenomena, mostly concerning the nuclear and chemical reactions inside stars and the types and ages of stellar populations within the galaxy in question. Atomic emission and absorption lines give rise to very distinct peaks and troughs in a galaxy SED. Once such a feature is identified, its observed wavelength can be compared to the known wavelength of the same transition measured in laboratories on Earth, and this yields the value of the redshift by the cosmological Doppler effect.
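As a minimal illustration of that comparison, the sketch below computes a redshift from the observed wavelength of an identified line, using z = λ_obs / λ_rest - 1. The function name is hypothetical and not part of this repository, and the Hα rest wavelength is the standard air value of 6562.8 Å.

```python
# Rest-frame wavelength of the H-alpha line in air, in Angstroms.
HALPHA_REST_ANGSTROM = 6562.8


def redshift_from_line(observed_wavelength, rest_wavelength=HALPHA_REST_ANGSTROM):
    """Return z = lambda_obs / lambda_rest - 1 for an identified spectral line.

    Hypothetical helper for illustration; both wavelengths must share units.
    """
    if rest_wavelength <= 0:
        raise ValueError("rest_wavelength must be positive")
    return observed_wavelength / rest_wavelength - 1.0


# H-alpha observed at twice its rest wavelength corresponds to z = 1.
z_example = redshift_from_line(2 * HALPHA_REST_ANGSTROM)
print(z_example)
```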
Photometric redshift measurements, on the other hand, try to reconstruct the redshift value from only a handful of numbers representing the integrated light flux in broadband filters. Degeneracies abound, making results less precise and possibly biased, but they circumvent the need for a spectrograph and can also reach fainter magnitudes, as light is integrated in broad wavelength ranges. Robust photometric redshifts depend much more on a correct model for spectral templates and understanding of the global properties and types of galaxies, and less on the detection of specific features.
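To make "integrated light flux in a broadband filter" concrete, here is a rough sketch of a transmission-weighted mean flux. It is not how COSMOSSNAP computes its photometry; the function names and the toy top-hat filter are assumptions made for illustration only.

```python
def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over the grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def broadband_flux(wavelength, sed_flux, transmission):
    """Transmission-weighted mean flux density of an SED through one filter.

    Hypothetical helper: all three sequences are sampled on the same grid.
    """
    if not (len(wavelength) == len(sed_flux) == len(transmission)):
        raise ValueError("inputs must be sampled on the same wavelength grid")
    norm = trapezoid(transmission, wavelength)
    if norm <= 0:
        raise ValueError("filter transmission integrates to zero")
    weighted = [f * t for f, t in zip(sed_flux, transmission)]
    return trapezoid(weighted, wavelength) / norm


# Toy check: a flat SED seen through a top-hat filter returns the flat level.
wl = [9000.0 + 10.0 * i for i in range(1101)]
flat_sed = [3.0] * len(wl)
top_hat = [1.0 if 11000.0 < w < 14000.0 else 0.0 for w in wl]
print(broadband_flux(wl, flat_sed, top_hat))
```

Because the whole filter width collapses into a single number, narrow features such as emission lines are diluted, which is one way to see where the degeneracies come from.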
When generating a large realistic simulated spectroscopic and photometric data set to be used as a test bed for redshift estimation, we need to ensure that it is representative of the expected quality of Euclid data. Another requirement is to have a realistic distribution of galaxies in several photometric observational parameters. We want our simulated data to follow representative redshift, color, magnitude and spectral type distributions. These quantities depend on each other in intricate ways; correctly capturing the correlations is essential if we want to have a realistic assessment of the accuracy and improvements of our proposed methods.
We also require realistic spectral energy distributions (SEDs) and emission-line strengths. Euclid will observe an estimated 50 million spectra through slitless spectroscopy. The required sensitivity is defined in terms of the significance of the detection of the Hα Balmer transition line. These requirements imply a detection rate that depends on magnitude and redshift, therefore demanding that we simulate realistic Hα line widths and strengths, which depend on the properties of the continuum of the spectral distribution. In addition to continuum and line properties, extinction of light by dust within each galaxy needs to be simulated.
All necessary Python dependencies are included in the Anaconda distribution. The code is compatible with Python 2 and 3.
COSMOSSNAP is a FORTRAN 77 package to generate spectrophotometric simulations based on real data from the Hubble Telescope COSMOS survey. It uses galaxy properties catalogs from COSMOS observations, combined with many follow-up ground observations, to generate a synthetic catalog reproducing the relevant observational properties, whilst associating a "true" consistent redshift and SED to each galaxy. Details are described in Jouvel et al. (2009).
Previous public links to COSMOSSNAP are deprecated. To obtain a copy of the software and installation instructions, please contact the Corresponding Author.
TIPS is a software package designed to perform simulations of astronomical slitless spectroscopy observations. It is the pixel simulator of the NISP slitless spectrograph of the ESA Euclid space mission (http://sci.esa.int/euclid). It is based on the aXeSIM code, which is part of the aXe software package developed for the support of the Hubble Space Telescope slitless spectroscopic observation modes.
To install TIPS, clone and follow instructions from https://gitlab.in2p3.fr/in2p3_euclid/tips
ISAP (Interactive Sparse Astronomical Data Analysis Packages) is a collection of packages in IDL and C++ related to sparsity and its application in astronomical data analysis. You will only need it if you intend to estimate spectroscopic redshifts or denoise galaxy spectra. Instructions for downloading and installing ISAP can be found on its main webpage.
You will also need an IDL installation and license for this functionality to work. IDL is a numerical analysis software analogous to Matlab and is often used in astrophysics, especially in legacy code.
You will definitely need a computational cluster if you intend to run COSMOSSNAP or TIPS on a non-trivial amount of data. This repository provides PBS scripts to run the different steps of the data processing in such a setting. These were tested on a cluster using the C Shell (I know, sorry...), so you will need to adapt some lines to Bash if that's what your system uses. This should be easy to do if you have moderate experience with shell scripting.
To use the functionality developed in this repository, simply clone it into your preferred location and ensure that your system's PYTHONPATH points to it. Setting paths for the Requirements will be necessary only for the specific functionality you intend to use.
All calls to data in any of the examples refer to the relative location of a 'data' folder. To set this up, create an empty 'data' folder in ./dedale_d51/, download this tarball (for example, with wget) and untar it within the data folder. BEWARE: This currently contains several GBs of data. This alternative tarball contains only the data needed to run the example below. For alternative ways to distribute the full data set, contact the Corresponding Author.
In this example, we demonstrate the use of most of the functionality included in the repository. There are 4 main steps:
1. Run COSMOSSNAP to generate a simulated galaxy catalog with broadband photometry and their associated clean SEDs.
2. Run TIPS on the galaxy spectra to obtain realistic Euclid-like noisy SEDs.
3. Select the galaxy population that will be detected by Euclid slitless spectroscopy.
4. Run Darth Fader on the resulting spectra to measure redshifts.
Steps 1, 2 and 4 require the use of a computational cluster. In-between these steps, a non-trivial amount of data processing and formatting is required, which is part of the functionality provided in this repository. These intermediate steps are also described in this Example.
To run COSMOSSNAP, you need to define a configuration file and an error file. You can find the ones used in this example here.
To launch the code in batch runs in a PBS cluster, the command is:
$ python example/qsub_batch_cosmossnap.py
This Python script generates multiple PBS files, one for each batch of the base data. The generic form of the COSMOSSNAP command is:
$ $COSMOSSNAPDIR/source/cosmossnap -c /path/to/config.para -SNAP_OUT /path/to/outfile.out -STAR [YES/NO] -AGN [YES/NO] -SPECTRA_OUT /path/to/spectra.fits -SPECTRA_LAMBDA Lmin,Lmax,dL -LINES_STARTEND min,max
- '-c' is the path to the configuration file. Error configurations and broadband filters are defined there. If you want to add filters, copy the files in the appropriate format to COSMOSSNAP/filt/.
- '-SNAP_OUT' is the path to the output photometric subcatalogs.
- '-STAR' indicates whether stars are excluded from the catalog.
- '-AGN' indicates whether Active Galactic Nuclei (AGNs) are excluded from the catalog.
- '-SPECTRA_OUT' produces FITS files with the corresponding spectra, and saves them to the path given.
- '-SPECTRA_LAMBDA' defines the wavelength range and resolution.
- '-LINES_STARTEND' runs the code on a subselection of the base catalog between the lines given. There is no randomness in the sampling of the catalog, i.e. a given entry number will always correspond to the same galaxy.
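The sketch below shows one way the batch commands could be assembled from the flags listed above. It is not the code in example/qsub_batch_cosmossnap.py; the function names, the catalog size, the batch size and the wavelength grid are hypothetical, and it assumes that '-LINES_STARTEND' takes 1-based, inclusive line numbers (check the COSMOSSNAP README for the actual convention).

```python
import os


def batch_line_ranges(n_lines, batch_size, first_line=1):
    """Split n_lines catalog lines into consecutive (start, end) ranges.

    Assumption: line numbers are 1-based and both ends are inclusive.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ranges = []
    start = first_line
    last_line = first_line + n_lines - 1
    while start <= last_line:
        end = min(start + batch_size - 1, last_line)
        ranges.append((start, end))
        start = end + 1
    return ranges


def cosmossnap_command(cosmossnap_dir, config, out_dir, batch, line_range,
                       lambda_grid, star, agn):
    """Build the argument list for one batch from the documented flags.

    star and agn are the literal strings "YES" or "NO" passed to the flags.
    """
    start, end = line_range
    return [
        os.path.join(cosmossnap_dir, "source", "cosmossnap"),
        "-c", config,
        "-SNAP_OUT", os.path.join(out_dir, "example_run_phot_%d.out" % batch),
        "-STAR", star,
        "-AGN", agn,
        "-SPECTRA_OUT", os.path.join(out_dir, "example_run_spec_%d.fits" % batch),
        "-SPECTRA_LAMBDA", ",".join(str(v) for v in lambda_grid),  # Lmin,Lmax,dL
        "-LINES_STARTEND", "%d,%d" % (start, end),
    ]


# Illustrative values only: 2500 catalog lines in batches of 1000.
commands = [
    cosmossnap_command("/path/to/COSMOSSNAP", "/path/to/config.para",
                       "../data/cosmossnap", batch, line_range,
                       (10000.0, 20000.0, 5.0), "NO", "NO")
    for batch, line_range in enumerate(batch_line_ranges(2500, 1000))
]
for cmd in commands:
    print(" ".join(cmd))
```

Since a given entry number always corresponds to the same galaxy, splitting by line ranges in this way gives reproducible batches.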
For more detail on how to run COSMOSSNAP and what it produces, please look at COSMOSSNAP's README. For information purposes, we include a copy here.
The COSMOSSNAP output files must be reformatted to the TIPS input format. For that we use a batch PBS creator written in Python, which can be called as:
$ python example/qsub_batch_cosmossnap_to_tips.py
This is a simple script which, given a number of batches, creates a PBS script for each batch, launches it on a cluster node and deletes it on the fly. The PBS scripts call a Python script that performs the necessary operations on the FITS files. The generic call is:
python ../scripts/process_cosmossnap_to_tips.py ../data/cosmossnap/example_run_phot_%d.out ../data/cosmossnap/example_run_spec_%d.fits ../data/cosmossnap/output/
where %d are integers indicating the file in each PBS run. Bear in mind that the paths in the call are relative paths.
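As an illustration of how the %d placeholders expand, the sketch below builds one call string per batch. The helper name is hypothetical, and starting the batch numbering at 0 is an assumption; use whatever numbering your COSMOSSNAP run produced.

```python
# Generic call from above, with %d standing for the batch number.
CALL_TEMPLATE = (
    "python ../scripts/process_cosmossnap_to_tips.py "
    "../data/cosmossnap/example_run_phot_%d.out "
    "../data/cosmossnap/example_run_spec_%d.fits "
    "../data/cosmossnap/output/"
)


def conversion_calls(n_batches, first_batch=0):
    """Return one conversion call per batch (hypothetical helper).

    Assumption: batches are numbered consecutively from first_batch.
    """
    return [CALL_TEMPLATE % (i, i)
            for i in range(first_batch, first_batch + n_batches)]


for call in conversion_calls(3):
    print(call)
```

Because the paths are relative, each call only resolves correctly when it is run from a directory that has scripts/ and data/ as siblings of its parent.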
As mentioned in the requirements section, TIPS is a pixel simulator developed to provide simulated images of Euclid's NISP near-infrared slitless spectrometer. It has functionality to simulate full 2D images, with realistic astrophysical and detector effects. TIPS also provides functionality to directly output 1D spectra, bypassing some of the time-consuming steps of generating the full images. This simplification is still realistic enough for most interesting applications. Here, we focus on this latter use case. For more details on the full image simulator, consult Zoubian et al. (2014) and this example.
To run TIPS, we use a PBS launcher to run batches of jobs with a predefined number of spectra:
$ qsub example/qsub_launcher_tips_array.sh
This command will launch the Python script scripts/run_tips_on_block.py on a single cluster node. This script spawns subprocesses to use the node's cores, and each subprocess calls the TIPS 1D spectrum simulation script:
$ python $TIPSDIR/tips/scripts/mk_spc_1d.py path_to_spc_file sigma_src number_exposure sky_background path_to_tips_conf working_directory
- 'path_to_spc_file' is the path to the spectroscopic input of a run, coming from COSMOSSNAP postprocessing.
- 'sigma_src' is the size of the simulated 2d galaxy image in arcseconds. We use 1.5" for all our simulations.
- 'number_exposure' is the number of images observed that compose the full integrated 2d image. We use 4 for 'Euclid Wide' and 20 for 'Euclid Deep'.
- 'sky_background' is the constant flux due to diffuse astrophysical sources such as zodiacal light.
- 'path_to_tips_conf' is the path to the TIPS configuration file.
- 'working_directory' is the path where TIPS creates several intermediate files and folders.
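A rough sketch of the spawn-one-subprocess-per-input pattern described above is given below. It is not the content of scripts/run_tips_on_block.py; the function names, file names and the sky background value are placeholders, and only the positional order of the arguments is taken from the call above.

```python
import os
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def tips_command(tips_dir, spc_file, sigma_src, number_exposure,
                 sky_background, tips_conf, working_directory):
    """Argument list for one mk_spc_1d.py call, in the documented order."""
    script = os.path.join(tips_dir, "tips", "scripts", "mk_spc_1d.py")
    return [sys.executable, script, spc_file, str(sigma_src),
            str(number_exposure), str(sky_background), tips_conf,
            working_directory]


def run_commands(commands, n_workers):
    """Run the commands concurrently, n_workers at a time.

    Each worker thread only waits on its child process, so threads are
    enough to keep n_workers cores busy. Returns the list of exit codes.
    """
    pool = ThreadPool(n_workers)
    try:
        return pool.map(subprocess.call, commands)
    finally:
        pool.close()
        pool.join()


# Placeholder inputs: 1.5 arcsec and 4 exposures follow the 'Euclid Wide'
# values quoted above; the sky background of 1.0 is NOT a recommended value.
example = tips_command("/path/to/TIPS", "block_0.spc", 1.5, 4, 1.0,
                       "tips.conf", "/tmp/tips_work")
print(" ".join(example))
```

A non-zero entry in the list returned by run_commands flags a subprocess that exited with an error, although this will not catch spectra that TIPS leaves empty without failing.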
During TIPS runs, scripts/run_tips_on_block.py saves the intermediate files in a local folder on the cluster node to minimize multiple transfers through the cluster network. The 1D spectra are saved to a local folder on the login node at the end of the full process.
For more details on TIPS outputs and options, check the documentation in the $TIPSDIR/tips/scripts/mk_spc_1d.py script.
As a last step of the generation process, we want to ensure that the population of simulated galaxies represents the characteristics of those that will be detected by Euclid.
Firstly, Euclid requirements and methods are driven by the detection of the Hα emission line from the Balmer series. Its spectrograph is designed such that, for galaxies within redshifts [0.9, 1.8], the Hα line will fall in the range of observed wavelengths (which corresponds to the near-infrared section of the electromagnetic spectrum). We apply this selection criterion with the help of the COSMOSSNAP catalog, which contains the wavelength of the observed Hα line. We exclude all galaxies - and corresponding spectra - whose line falls outside the observed wavelength range.
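The wavelength cut can be sketched as follows. This is not the notebook's code: the function names are hypothetical, the wavelengths are assumed to be in Ångström, and the window is derived here from the quoted redshift range with the air rest wavelength of Hα (6562.8 Å) rather than from the instrument specification.

```python
HALPHA_REST_ANGSTROM = 6562.8  # H-alpha rest wavelength in air
Z_MIN, Z_MAX = 0.9, 1.8        # redshift range quoted above


def halpha_window(z_min=Z_MIN, z_max=Z_MAX):
    """Observed-wavelength window for H-alpha, lambda_rest * (1 + z)."""
    return (HALPHA_REST_ANGSTROM * (1.0 + z_min),
            HALPHA_REST_ANGSTROM * (1.0 + z_max))


def select_halpha_visible(observed_halpha, window=None):
    """Indices of galaxies whose observed H-alpha line falls in the window.

    observed_halpha: per-galaxy observed H-alpha wavelengths (assumed to be
    read from the COSMOSSNAP catalog, in Angstroms).
    """
    low, high = halpha_window() if window is None else window
    return [i for i, w in enumerate(observed_halpha) if low <= w <= high]


# Three toy galaxies: only the middle one (z = 1) passes the cut.
print(halpha_window())
print(select_halpha_visible([10000.0, 13125.6, 19000.0]))
```

The returned indices can then be used to drop the same rows from both the property catalog and the spectra.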
Secondly, TIPS fails silently and produces an empty spectrum when the flux of the galaxy is too faint to be detected in the 2D slitless spectroscopy image. We filter out all the 'NaN' and match back to the COSMOSSNAP property catalog to exclude those galaxies.
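A minimal sketch of this filtering and matching step is shown below. The data layout (dictionaries keyed by a galaxy identifier) and the function names are assumptions for illustration; the notebook works on the actual TIPS and COSMOSSNAP files. A spectrum is treated as failed here if it is empty or contains any NaN.

```python
import math


def is_valid_spectrum(flux):
    """True if the spectrum is non-empty and contains no NaN values."""
    return len(flux) > 0 and not any(math.isnan(f) for f in flux)


def filter_failed_spectra(spectra, catalog):
    """Keep only galaxies with a valid spectrum and a catalog entry.

    spectra: dict mapping galaxy id -> list of flux values (hypothetical layout)
    catalog: dict mapping galaxy id -> galaxy properties (hypothetical layout)
    Returns the filtered (spectra, catalog) pair, sharing the same ids.
    """
    good_ids = sorted(i for i, flux in spectra.items()
                      if i in catalog and is_valid_spectrum(flux))
    return ({i: spectra[i] for i in good_ids},
            {i: catalog[i] for i in good_ids})


# Toy example: galaxy 2 is all NaN and galaxy 3 is empty, so only 1 survives.
toy_spectra = {1: [1.0, 2.0], 2: [float("nan")] * 2, 3: []}
toy_catalog = {1: {"z": 1.0}, 2: {"z": 1.2}, 3: {"z": 1.5}, 4: {"z": 1.7}}
kept_spectra, kept_catalog = filter_failed_spectra(toy_spectra, toy_catalog)
print(sorted(kept_spectra))
```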
These operations are performed and discussed in the Jupyter notebook, which needs to be run in order to generate the final products. The notebook can be run either in standard interactive mode or as part of a pipeline from the command line:
$ jupyter nbconvert --to notebook --execute example/2017-12-07_Euclid_spectroscopic_selection.ipynb
The figure below illustrates the effect of those two selection steps. The notebook generates a final galaxy property catalog and spectroscopic catalogs for both the Wide and Deep components of Euclid's spectroscopic survey.
That's it! You can now apply your favorite redshift estimation method to realistic simulated Euclid data!