#Photo-z Data Analysis

##Purpose

This notebook is intended to provide a complete analysis of the data generated during the photo-z run on J-PAS spectra. There is
about 1 GB of **output** data to analyze, plus x GB of input data that "we already know very well"...

The idea is to explain the biases and uncertainties in the physical properties of distant galaxies (stellar mass, stellar age,
stellar metallicity and dust content) retrieved via SED fitting. It is now known that the biases and uncertainties for nearby galaxies
($z=0$) are a consequence of the well-known age-mass-metallicity-extinction degeneracies. For very distant galaxies ($z\geq 1$)
the problem is expected to be more complicated because of two main factors:

* The effective wavelength coverage will be diminished as a consequence of the redshift.
* The lack of evolutionary indicators in the UV, since this range is dominated by newborn massive stars and by very old remnants of
  intermediate-mass stars.
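
To make the first point concrete, here is a minimal sketch of how a fixed observed window maps to the rest frame through $\lambda_\text{rest} = \lambda_\text{obs}/(1+z)$. The window limits (3500 and 10000 Å) are round numbers assumed for illustration, not the actual J-PAS filter edges, and `rest_frame_window` is a hypothetical helper.

```python
import numpy as np

def rest_frame_window(redshift, lam_obs_min=3500.0, lam_obs_max=10000.0):
    """Rest-frame (min, max, width) in Angstrom seen through a fixed observed window.

    The default limits are assumed round numbers, not the real J-PAS limits.
    """
    redshift = np.asarray(redshift, dtype=float)
    lam_min = lam_obs_min / (1.0 + redshift)
    lam_max = lam_obs_max / (1.0 + redshift)
    return lam_min, lam_max, lam_max - lam_min

for zi in (0.0, 1.0, 3.0):
    lam_lo, lam_hi, width = rest_frame_window(zi)
    print("z = {0:.1f}: rest frame {1:.0f}-{2:.0f} A (width {3:.0f} A)".format(zi, lam_lo, lam_hi, width))
```

The rest-frame width shrinks as $1/(1+z)$, so at $z=1$ the same filters sample half the rest-frame range they sample at $z=0$.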

An *a priori* taste of this bittersweet situation can be had from the distribution of the *angles* between SSPs at different redshifts (yes, up to the
age of the universe at those redshifts), shown as follows:

As you can see...
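
The text does not spell out how the *angle* is defined. One natural reading, assumed in this sketch, is the angle between two spectra treated as vectors of fluxes sampled on the same wavelength grid. The power-law spectra below are toy stand-ins, not BC03 SSPs, and `spectral_angle` is a hypothetical helper.

```python
import numpy as np

def spectral_angle(f1, f2):
    """Angle in degrees between two spectra sampled on the same wavelength grid."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    cos_theta = np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2))
    # clip guards against round-off pushing the cosine slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

wl_toy = np.linspace(3500.0, 10000.0, 200)   # toy wavelength grid in Angstrom
blue   = (wl_toy / 5500.0) ** -2.0           # toy "young" spectrum
red    = (wl_toy / 5500.0) ** 1.0            # toy "old" spectrum

print(spectral_angle(blue, 3.0 * blue))      # a pure rescaling leaves the angle at (nearly) zero
print(spectral_angle(blue, red))             # different shapes give a clearly non-zero angle
```

Under this reading, a small angle means two nearly parallel spectra, i.e. SSPs that a fit can hardly tell apart.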

##The procedure

We generated all the output data by carrying out a theoretical-theoretical analysis (TTA), which consists in constructing a set of synthetic data
(to mimic the observed universe) with **well known physical properties**, and then trying to retrieve those properties using our home-made
spectral synthesis code, `DynBaS` (see Magris et al. (2014), in prep.).

##The inputs

We explore a considerable portion of the Star Formation History (SFH) parameter space, generated using the
<a href="http://arxiv.org/abs/1108.4719">Chen et al. (2012)</a> recipe and the Stellar Population Synthesis (SPS) models of
<a href="http://arxiv.org/abs/astro-ph/0309134">Bruzual & Charlot (2003)</a> (BC03, hereafter). The recipe can be summarized as follows:

* A Star Formation Rate (SFR) that consists of:
  * An exponentially declining continuum star-forming phase, parametrized by the initial onset of star formation, $t_\text{form}$,
    and the $e$-folding time, $\tau$. This SFR can undergo a second phase that we call "truncation", in which the SFR suddenly begins
    to decline faster than before. It is parametrized with another exponential starting at the truncation time, $t_\text{trunc}$,
    with a much shorter $e$-folding time (usually $<1$ Gyr).
  * A burst of star formation that blends with the continuum part at any time, characterized by a starting time, $t_\text{burst}$, a
    duration time $t_\text{ext}$, and an amplitude, $A\equiv M_\text{burst} / M_\text{cont}$.
* The dust content is parametrized following <a href="http://arxiv.org/abs/astro-ph/0003128">Charlot & Fall (2000)</a>: the $V$-band
  optical depth, $\tau_V$, and the fraction of it that affects stars older than $10^7$ yr, $\mu$.
* The metallicities are interpolated among those available in the BC03 models, thus producing mono-metallic mock galaxies.
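
The SFR part of the recipe can be sketched in a few lines. This is an illustration under stated assumptions, not the code used to build the mock galaxies: time runs forward from $t_\text{form}$, the burst is a top hat, the continuum is normalized to unit mass, and `toy_sfr`, `integrate` and `tau_trunc` are hypothetical names.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal integral of y(x)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def toy_sfr(t, tau, t_trunc, tau_trunc, t_burst, t_ext, A):
    """Continuum and burst SFR on a grid t of times (yr) since the onset of star formation.

    The continuum is normalized to form unit mass, so the burst forms a mass A.
    """
    t = np.asarray(t, dtype=float)
    # exponentially declining continuum ...
    cont = np.exp(-t / tau)
    # ... that switches to a shorter e-folding time after the truncation time
    late = t > t_trunc
    cont[late] = np.exp(-t_trunc / tau) * np.exp(-(t[late] - t_trunc) / tau_trunc)
    cont /= integrate(cont, t)
    # top-hat burst (assumed shape), scaled so that M_burst / M_cont = A
    burst = ((t >= t_burst) & (t <= t_burst + t_ext)).astype(float)
    burst *= A / integrate(burst, t)
    return cont, burst

t_grid = np.linspace(0.0, 10e9, 10001)   # yr since t_form
cont, burst = toy_sfr(t_grid, tau=4e9, t_trunc=6e9, tau_trunc=0.5e9,
                      t_burst=2e9, t_ext=1e8, A=0.1)
print(integrate(burst, t_grid) / integrate(cont, t_grid))   # ratio M_burst / M_cont
```

Normalizing both components numerically on the same grid keeps the mass ratio equal to $A$ regardless of the grid resolution.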

For more details on the distributions of these parameters see <a href="http://arxiv.org/abs/1108.4719">Chen et al. (2012)</a>.
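
The two dust parameters act on a spectrum as sketched below. The $\lambda^{-0.7}$ attenuation curve normalized at 5500 Å is the form commonly associated with the Charlot & Fall (2000) model and is taken here as an assumption; this notebook does not state the exact curve used for the mock galaxies. `cf00_transmission` is a hypothetical helper.

```python
import numpy as np

def cf00_transmission(wl, age, tau_V, mu, t_bc=1e7, slope=-0.7):
    """Fraction of light transmitted at wavelength wl (Angstrom) for stars of a given age (yr).

    Stars younger than t_bc see the full optical depth tau_V;
    older stars see only the fraction mu of it.
    """
    wl = np.asarray(wl, dtype=float)
    tau_eff = tau_V if age <= t_bc else mu * tau_V
    tau_lambda = tau_eff * (wl / 5500.0) ** slope   # assumed power-law attenuation curve
    return np.exp(-tau_lambda)

wl_demo = np.array([1500.0, 5500.0, 9000.0])
print(cf00_transmission(wl_demo, age=1e6, tau_V=1.0, mu=0.3))   # young stars: full tau_V
print(cf00_transmission(wl_demo, age=1e9, tau_V=1.0, mu=0.3))   # old stars: only mu * tau_V
```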

###The problem SEDs

Each mock galaxy is then SED-fitted at different cosmic epochs. That is, we fit the SED of the same mock galaxy as seen at
different redshifts (i.e., the same galaxy at different ages, in the observer frame) and then compare the (known) input properties with the
output of `DynBaS`.
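
The comparison boils down to residuals between retrieved and input values. A minimal sketch, assuming the residual is defined as output minus input (the convention used inside `data_loader` is not shown in this notebook) and using made-up numbers; `summarize_residuals` is a hypothetical helper.

```python
import numpy as np

def summarize_residuals(true_values, fitted_values):
    """Mean (bias) and standard deviation (scatter) of fitted - true."""
    diff = np.asarray(fitted_values, dtype=float) - np.asarray(true_values, dtype=float)
    return diff.mean(), diff.std()

# made-up input and retrieved values of some property, for illustration only
toy_true   = [10.0, 10.5, 11.0]
toy_fitted = [10.1, 10.4, 11.3]
toy_bias, toy_scatter = summarize_residuals(toy_true, toy_fitted)
print(toy_bias, toy_scatter)
```

With this sign convention a positive mean residual means the fit overestimates the property on average.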

##The outputs

In [1]:
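#LOAD THE BULK IN/OUT-PUT DATA

# explicit imports of the NumPy functions used in this cell (harmless under %pylab)
from numpy import append, arange, genfromtxt, lexsort, loadtxt

import data_loader as dl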

indir       = "inputs/photoz3/"
z           = append(arange(0.0, 1.0, 0.05), arange(1.0, 3.1, 0.1))
zform_table = genfromtxt(indir+"set3.zform", dtype = None, names = True)
zform_table = zform_table[lexsort((zform_table["zform"], zform_table["Galaxy"]))]
SFH_IDs     = zform_table["Galaxy"]
zform       = {SFH_ID : zf for SFH_ID, zf in zip(SFH_IDs, zform_table["zform"])}
IDs_counts  = genfromtxt(indir+"set3.counts", dtype = None)
IDs, counts = IDs_counts["f0"], IDs_counts["f1"]
data_z      = {ID : dl.load_data(ID, count, n_trials = 50) for ID, count in zip(IDs, counts) if ID}
z2ID        = {value : ID for value, ID in zip(z, IDs)}
ID2z        = {ID : value for value, ID in zip(z, IDs)}

del zform_table

# SFR tables of each SFH, kept only at redshifts below its formation redshift
SFRs_z = {
    SFH_ID : {
        ID : loadtxt(indir+"{0}_{1}.sfr".format(SFH_ID, ID))
        for ID in IDs if ID2z[ID]<zform[SFH_ID]
    }
    for SFH_ID in SFH_IDs
}

In [18]:
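#MAKE SOME PLOTS

from matplotlib import rc
# explicit imports of the plotting and NumPy functions used in this cell (harmless under %pylab)
from matplotlib.pyplot import subplots, xlim, ylim, xlabel
from numpy import array, mean, std

rc("text", usetex = False)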

fig, axs = subplots(5, 1, figsize = (20, 10), sharex = True, sharey = True)

xlim(0., 3.)
ylim(-1., +1.)

xlabel(r"Redshift", fontsize = 12)

# one panel per property: mean residual and its scatter as a function of redshift
for i, par in enumerate(["M", "log_t_M", "log_t_L", "log_Z_M", "Av"]) :
    ave_res = array([mean(data_z[ID].residuals[par]) for ID in IDs])
    std_res = array([std(data_z[ID].residuals[par]) for ID in IDs])

    axs[i].set_ylabel(r"residual " + par, fontsize = 12)
    axs[i].errorbar(z, ave_res, std_res, fmt = "-", ecolor = "#FF2100", color = "#6D6D6D", lw = 1.5)

In [None]:
#ANALYZE AN INDIVIDUAL SFH

import numpy as np

SFH_ID = "SSAG3022"
sfrs   = SFRs_z[SFH_ID]   # {ID : table} for this SFH; column 0 is used as time, column 1 as SFR
ids    = sorted(sfrs)
# select the last 1e9 time units before the observation (1 Gyr if the time column is in yr)
mask   = lambda ID : sfrs[ID][:, 0] >= sfrs[ID][:, 0].max() - 1e9

t_obs      = np.array([sfrs[ID][:, 0].max() for ID in ids])
redshifts  = np.array([ID2z[ID] for ID in ids])
mass_total = np.array([np.trapz(sfrs[ID][:, 1], sfrs[ID][:, 0]) for ID in ids])
# NOTE: despite its name, mass_1Myr integrates over the 1e9-wide window defined by mask
mass_1Myr  = np.array([np.trapz(sfrs[ID][mask(ID), 1], sfrs[ID][mask(ID), 0]) for ID in ids])
residuals  = np.array([np.median(data_z[ID].get_SFH_residuals("{0}_{1}.jpas".format(SFH_ID, ID))["M"]) for ID in ids])
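
The masses above are time integrals of the SFR. As a sanity check of that step, the sketch below integrates a toy constant SFR, for which the answer is known exactly. The two-column layout (time in yr, SFR in solar masses per yr) is an assumption about the `.sfr` files, the numbers are made up, and `formed_mass` is a hypothetical helper.

```python
import numpy as np

def formed_mass(sfr_table, window=None):
    """Trapezoidal integral of SFR over time.

    If window is given, only the last `window` time units before the
    latest time in the table are integrated.
    """
    t, sfr = sfr_table[:, 0], sfr_table[:, 1]
    if window is not None:
        keep = t >= t.max() - window
        t, sfr = t[keep], sfr[keep]
    return np.sum(0.5 * (sfr[1:] + sfr[:-1]) * np.diff(t))

# toy table: constant SFR of 2 (per yr) sustained for 5e9 yr
toy_table = np.column_stack([np.linspace(0.0, 5e9, 501), np.full(501, 2.0)])
print(formed_mass(toy_table))               # total mass formed
print(formed_mass(toy_table, window=1e9))   # mass formed in the last 1e9 yr
```

For a constant SFR the result is just rate times duration, which makes it easy to confirm that the window selection picks the intended time span.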

##The outcome