# PCA on continua and emission lines

This notebook demonstrates how to apply PCA to the continuum spectra and to the emission lines. First, we import the module that contains the necessary routines.

In [None]:
from LineSpec import pca_programs as pg
import numpy
import scipy.interpolate

The spectral datasets used here are stored in a _python dictionary_ that contains the following keys:
  -  'Spectrum': The measured spectrum, convolved with the right velocity dispersion kernel and transformed back to the rest frame. This is a 2d _array_, in which the first column is the wavelength and the second column is the flux.
  -  'Model': The fitted continuum flux. For the fitting we made use of the spectral templates of Bruzual & Charlot (2003). The fitting procedure is detailed in Csörnyei & Dobos (2020). The model spectra use the same wavelength grid as the 'Spectrum', so this should be a 1d _array_. If no model flux is available for a given galaxy, this key must be set to either 'None' or an _array_ filled with zeros.

These keys are the core components of the _dictionary_; they must be present for the scripts to work.

In [None]:
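# allow_pickle=True is needed by recent NumPy versions to load a pickled dictionary
spectra = numpy.load('all_spec.npy', allow_pickle=True).item()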
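
To make the expected layout concrete, here is a minimal sketch of such a dictionary built from made-up data. It assumes that the outer dictionary has one entry per galaxy and that each entry holds the 'Spectrum' and 'Model' keys; the identifiers `galaxy_0` and `galaxy_1` and all flux values are hypothetical and are not taken from `all_spec.npy`.

```python
import numpy

# Hypothetical toy dataset: identifiers and flux values are made up
wavelength = numpy.linspace(3700.0, 6800.0, 8)
flux = numpy.ones_like(wavelength)

toy_spectra = {
    'galaxy_0': {
        # 2d array: first column wavelength, second column flux
        'Spectrum': numpy.column_stack([wavelength, flux]),
        # 1d array on the same wavelength grid as 'Spectrum'
        'Model': 0.9 * flux,
    },
    'galaxy_1': {
        'Spectrum': numpy.column_stack([wavelength, 2.0 * flux]),
        # No model available: an array filled with zeros
        'Model': numpy.zeros_like(wavelength),
    },
}

for name, entry in toy_spectra.items():
    has_model = bool(numpy.any(entry['Model'] != 0))
    print(name, entry['Spectrum'].shape, entry['Model'].shape, has_model)
```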

The line data are stored in a separate python _dictionary_ whose entries have no further keys: every entry contains only a single 1d _array_ with ten values. These values are the equivalent widths of the emission lines listed in Csörnyei & Dobos (2020), ordered by increasing wavelength.

In [None]:
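# allow_pickle=True is needed by recent NumPy versions to load a pickled dictionary
line_data = numpy.load('all_lines.npy', allow_pickle=True).item()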

As a first step, a common wavelength grid has to be defined that is covered by every spectrum. The spectra are then resampled onto this grid and simultaneously normalised, following the normalisation procedure detailed by Beck et al. (2016).

In [None]:
new_wl = numpy.linspace(3724, 6761, 5062)

In [None]:
normed_spec = pg.normalize_model(spectra, new_wl) # The spectra here are also organized into a dictionary
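
The resampling step can be illustrated with plain NumPy. The sketch below is not the code of `pg.normalize_model`: the helper `resample_and_normalise` is hypothetical, the input spectrum is made up, and the scaling to unit median flux is only a placeholder that stands in for the Beck et al. (2016) normalisation.

```python
import numpy

def resample_and_normalise(wavelength, flux, new_wl):
    """Interpolate a spectrum onto new_wl and scale it to unit median flux."""
    # Linear interpolation onto the common grid
    resampled = numpy.interp(new_wl, wavelength, flux)
    # Placeholder normalisation (assumption): NOT the Beck et al. (2016) procedure
    return resampled / numpy.median(resampled)

# Made-up spectrum on a grid that covers the common wavelength range
wavelength = numpy.linspace(3700.0, 6800.0, 4000)
flux = 2.0 + 1e-4 * (wavelength - 3700.0)

new_wl = numpy.linspace(3724, 6761, 5062)
normed = resample_and_normalise(wavelength, flux, new_wl)
print(normed.shape, numpy.median(normed))
```

`numpy.interp` holds the edge values constant outside the input range, so the common grid should lie inside the range covered by every spectrum, as required above.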

Next, the average spectrum has to be calculated. It is subtracted from every normalised continuum model, and PCA is then applied to the resulting residual spectra to find the spectral components with the highest variance.

In [None]:
avg_spec = pg.average_spec(normed_spec)

In [None]:
red_spec = pg.get_reduced_spectra(normed_spec, avg_spec)
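
In array form, these two steps amount to a column-wise mean and a broadcast subtraction. The following sketch uses a made-up stack of spectra with one row per galaxy; the array layout used internally by `pg.average_spec` and `pg.get_reduced_spectra` is an assumption here.

```python
import numpy

# Made-up stack of normalised spectra: one row per galaxy, one column per wavelength bin
rng = numpy.random.default_rng(1)
normed = 1.0 + 0.1 * rng.standard_normal((6, 50))

avg = normed.mean(axis=0)   # average spectrum, shape (50,)
residual = normed - avg     # broadcasting subtracts avg from every row

print(avg.shape, residual.shape)
```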

After subtracting the average from the spectra, we apply PCA to them. The script outputs two _arrays_: one contains the eigenspectra (the first five, as presently set), while the other contains the corresponding principal component coefficients for each galaxy. These values will be used for modelling in the later steps.

In [None]:
eig5, PCs = pg.run_pca(red_spec, new_wl, 5)
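
One common way to compute such a decomposition is a singular value decomposition of the residual matrix. The sketch below shows that approach on random data; the helper `pca_svd` is hypothetical, and whether `pg.run_pca` uses an SVD or another algorithm is not stated here.

```python
import numpy

def pca_svd(residuals, n_components):
    """Return the leading eigenvectors (as rows) and the per-object coefficients."""
    # Rows of vt are orthonormal directions sorted by decreasing singular value
    _, _, vt = numpy.linalg.svd(residuals, full_matrices=False)
    eigvecs = vt[:n_components]
    # Project every residual spectrum onto the retained eigenvectors
    coeffs = residuals @ eigvecs.T
    return eigvecs, coeffs

# Made-up residuals: 40 objects, 200 wavelength bins
rng = numpy.random.default_rng(2)
data = rng.standard_normal((40, 200))
residuals = data - data.mean(axis=0)

eigenspectra, coefficients = pca_svd(residuals, 5)
print(eigenspectra.shape, coefficients.shape)   # (5, 200) (40, 5)

# Approximate reconstruction of the residuals from five components
reconstruction = coefficients @ eigenspectra
```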

The resulting eigenspectra and principal components both contain information on every type of galaxy. To infer the eigenvectors of the emission line equivalent widths, we have to separate the passive or weak line galaxies from the strong emission line galaxies, which exhibit all ten emission lines (for details, see Csörnyei & Dobos (2020)). This step is done below, where we sort the line equivalent width arrays and the continuum principal components into different _dictionaries_.

In [None]:
emiss_lines, emiss_pcs, no_lines, no_pcs = pg.separate_emission_galaxies(spectra, line_data, PCs)
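
The idea of the separation can be sketched with a simple mask over the ten equivalent widths. Everything in this example is hypothetical: the identifiers and values are made up, and the criterion (all ten widths finite and non-zero) is only an assumed stand-in for the selection described in Csörnyei & Dobos (2020).

```python
import numpy

# Hypothetical inputs: identifiers, equivalent widths and coefficients are made up
toy_lines = {
    'a': numpy.arange(1.0, 11.0),              # all ten lines measured
    'b': numpy.array([0.0] * 4 + [2.0] * 6),   # some lines missing
    'c': numpy.full(10, 3.5),
}
toy_pcs = {key: numpy.full(5, float(i)) for i, key in enumerate(toy_lines)}

emission, passive = {}, {}
for key, widths in toy_lines.items():
    # Assumed criterion: every equivalent width is finite and non-zero
    has_all_lines = numpy.all(numpy.isfinite(widths) & (widths != 0))
    target = emission if has_all_lines else passive
    target[key] = (widths, toy_pcs[key])

print(sorted(emission), sorted(passive))   # ['a', 'c'] ['b']
```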

As a result, we obtain separate arrays for the emission line equivalent widths and the continuum principal component coefficients. These will be used to model the distributions of the galaxy catalog in order to set up a realistic mock catalog generator. Before that, we also have to apply PCA separately to the emission lines, taking into account only those spectra that exhibit all ten emission lines.

In [None]:
V, E_PCs = pg.run_line_pca(emiss_lines, 10)
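
Because there are only ten lines, the line PCA can also be written as an eigendecomposition of the 10 × 10 covariance matrix. The sketch below does this for made-up equivalent widths; whether `run_line_pca` centres or otherwise transforms the widths before the decomposition is an assumption.

```python
import numpy

rng = numpy.random.default_rng(3)
# Made-up equivalent widths: 100 galaxies, ten lines each
widths = rng.lognormal(mean=1.0, sigma=0.5, size=(100, 10))

centred = widths - widths.mean(axis=0)
cov = numpy.cov(centred, rowvar=False)        # (10, 10) covariance matrix
eigvals, eigvecs = numpy.linalg.eigh(cov)     # eigenvalues in ascending order
order = numpy.argsort(eigvals)[::-1]          # reorder by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

line_pcs = centred @ eigvecs                  # coefficients, shape (100, 10)
explained = eigvals / eigvals.sum()           # fraction of variance per component
print(explained.round(3))
```

Keeping all ten components, as in the call above, makes the transformation a pure rotation of the centred data, so no information is lost.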