# Walk through the whole process of acquiring and cross-matching data
In this notebook, we will go through the stages required to transform our spectra into a useful dataset, complemented by available photometry and information.

## 1. Ensure that the correct data path is known to the system

In [None]:
import os

# .get avoids a KeyError when the variable has not been set yet
os.environ.get("FORS2DATALOC", "")

In [None]:
if os.environ.get("FORS2DATALOC", "") == "":
    os.environ["FORS2DATALOC"] = os.path.abspath(os.path.join("..", "..", "src", "data"))
os.environ["FORS2DATALOC"]

It is strongly recommended to add the following to your `.bashrc` or `.bash_aliases` file:
```bash
export FORS2DATALOC="[path to this repository]/src/data"
```
Then log out and log back in, or `source` the file: the environment variable will be set now and automatically at the start of each new session.
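The check performed in the cells above can be wrapped in a small helper. This is only a sketch: `resolve_data_location` and `ENV_VAR` are hypothetical names introduced here, not part of `process_fors2`, and the fallback path is the same relative path used above.

```python
import os
from pathlib import Path

ENV_VAR = "FORS2DATALOC"  # name of the environment variable used in this notebook


def resolve_data_location(fallback):
    """Return the data directory from the environment.

    If the variable is unset or empty, set it to the absolute form of
    `fallback` and return that value instead.
    """
    value = os.environ.get(ENV_VAR, "")
    if not value:
        value = str(Path(fallback).resolve())
        os.environ[ENV_VAR] = value
    return value


# Hypothetical usage with the relative path from the cell above
data_loc = resolve_data_location(os.path.join("..", "..", "src", "data"))
print(data_loc)
```

Using `os.environ.get` with a default means the helper works whether the variable is missing or merely empty.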

## 2. Explore available data
FITS tables for FORS2 and GALEX data are queried automatically.
The FITS table for 9-band KiDS photometry must be queried externally from the ESO Archive website and saved under the appropriate name. It should, however, already be part of the data cloned from the GitHub repository.

In [None]:
from process_fors2.fetchData import queryTargetInSimbad

### Simbad query
For illustration purposes only. Note the `MAIN_ID` field, which gives the handle of the target in Simbad. It is already hard-coded in our package.

In [None]:
simbadtable = queryTargetInSimbad()

In [None]:
simbadtable

### Vizier query
This is how we obtain data related to the spectra that come with this package. The function can be used to query other objects but defaults to argument values that are hard-coded in the package.

In [None]:
from process_fors2.fetchData import DEFAULTS_DICT, getFors2FitsTable

DEFAULTS_DICT

In [None]:
os.path.isfile(DEFAULTS_DICT["FITS location"])

In [None]:
fors2table_vizier = getFors2FitsTable()

In [None]:
fors2table_vizier

In [None]:
os.path.isfile(DEFAULTS_DICT["FITS location"])

The table has been queried from Vizier and correctly written to disk.
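The before/after `os.path.isfile` checks above follow a common "fetch only if missing" pattern. Whether `getFors2FitsTable` skips the query when the file already exists is not shown in this notebook; the sketch below assumes such caching purely for illustration. `fetch_if_missing` and `fake_query` are hypothetical names, and the placeholder bytes are not real FITS content.

```python
import os
import tempfile


def fetch_if_missing(path, fetch):
    """Write the bytes returned by `fetch()` to `path` unless `path` exists.

    Returns True if the file was created by this call, False otherwise.
    """
    if os.path.isfile(path):
        return False
    data = fetch()
    with open(path, "wb") as out:
        out.write(data)
    return True


def fake_query():
    """Stand-in for a remote query (hypothetical, returns placeholder bytes)."""
    return b"placeholder table contents"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "table.fits")
    first = fetch_if_missing(target, fake_query)   # file absent: query runs
    second = fetch_if_missing(target, fake_query)  # file present: query skipped
    print(first, second)
```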

### GALEX query

In [None]:
os.path.isfile(DEFAULTS_DICT["GALEX FITS"])

In [None]:
from process_fors2.fetchData import queryGalexMast

In [None]:
galextable_mast = queryGalexMast()

In [None]:
galextable_mast

In [None]:
os.path.isfile(DEFAULTS_DICT["GALEX FITS"])

The table has been queried from MAST and correctly written to disk.

### 9-band photometry from KiDS
These data are not as easily available through astroquery and must be downloaded from the ESO Archive website, then saved under an appropriate name, such as the one in the default parameters.
The existing file was obtained with a query centered on the cluster region, in a $12' \times 12'$ box, keeping only galaxies with a filter on the parameter `SG_FLAG`.

In [None]:
os.path.isfile(DEFAULTS_DICT["KiDS FITS"])

In [None]:
from process_fors2.fetchData import readKids

In [None]:
kidstable_eso = readKids()

In [None]:
kidstable_eso

In [None]:
kidstable_eso.columns

## 3. Check spectra
Spectra of galaxies in the field described above are shipped with this package. Here, we process them to obtain a final file that gathers all available data, cross-matched, thus combining spectroscopic and photometric information for those galaxies.

In [None]:
os.listdir(DEFAULTS_DICT["FORS2 spectra"])

In [None]:
os.listdir(DEFAULTS_DICT["Starlight spectra"])

In [None]:
from process_fors2.fetchData import fors2ToH5

In [None]:
os.path.isfile(DEFAULTS_DICT["FORS2 HDF5"])

In [None]:
fors2ToH5()

In [None]:
import numpy as np

uniques, counts = np.unique(fors2table_vizier["ID"], return_counts=True)
uniques[counts > 1]

In [None]:
_sel = fors2table_vizier["ID"] == 72
fors2table_vizier[_sel]
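The duplicate check in the two cells above relies on `np.unique` with `return_counts=True`. A minimal sketch on toy data, with IDs invented for the example rather than taken from the catalogue, shows how to recover both the duplicated values and the rows that carry them:

```python
import numpy as np

# Toy IDs, invented for illustration (not taken from the Vizier table)
ids = np.array([3, 72, 15, 72, 8])

# np.unique returns the sorted unique values and how often each one occurs
uniques, counts = np.unique(ids, return_counts=True)
duplicated = uniques[counts > 1]

# Row indices of every entry whose ID occurs more than once
rows = np.flatnonzero(np.isin(ids, duplicated))
print(duplicated, rows)
```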

In [None]:
os.path.isfile(DEFAULTS_DICT["FORS2 HDF5"])

In [None]:
from process_fors2.fetchData import starlightToH5

In [None]:
os.path.isfile(DEFAULTS_DICT["Starlight HDF5"])

In [None]:
starlightToH5()

In [None]:
os.path.isfile(DEFAULTS_DICT["Starlight HDF5"])

We have now generated HDF5 files containing the catalog data and the available spectra. We have also noticed one caveat of the script, related to `ID` values that appear more than once in the Vizier table (such as `ID` 72 inspected above), and checked that no data would conflict. Let's decode the files that were created!

In [None]:
from process_fors2.fetchData import readH5FileAttributes

In [None]:
sl_df = readH5FileAttributes(DEFAULTS_DICT["Starlight HDF5"])

In [None]:
sl_df

In [None]:
sl_df[sl_df["num"] == 72]

In [None]:
import h5py

In [None]:
with h5py.File(DEFAULTS_DICT["Starlight HDF5"], "r") as sl_in:
    for tag in sl_in:
        print(tag)

In [None]:
with h5py.File(DEFAULTS_DICT["Starlight HDF5"], "r") as sl_in:
    for tag in list(sl_in.keys())[:1]:
        group = sl_in.get(tag)
        # Iterating over a group yields the names of its datasets
        for dataset_name in group:
            print(dataset_name)

In [None]:
import matplotlib.pyplot as plt

with h5py.File(DEFAULTS_DICT["Starlight HDF5"], "r") as sl_in:
    for tag in list(sl_in.keys())[:4]:
        group = sl_in.get(tag)
        wl = np.array(group.get("wl"))
        fl = np.array(group.get("fl"))
        fl_ext = np.array(group.get("fl_ext"))
        plt.plot(wl, fl, label="Flux corrected for dust extinction")
        plt.plot(wl, fl_ext, label="Flux not corrected for dust extinction")
        plt.xscale("log")
        plt.yscale("log")
        plt.xlabel("Wavelength [Ang.]")
        plt.ylabel("Flux [arbitrary units]")
        plt.legend()
        plt.show()

In [None]:
with h5py.File(DEFAULTS_DICT["FORS2 HDF5"], "r") as f2_in:
    for tag in list(f2_in.keys())[:4]:
        group = f2_in.get(tag)
        wl = np.array(group.get("wl"))
        fl = np.array(group.get("fl"))
        # Boolean mask: True where the stored mask value is positive
        msk = np.array(group.get("mask")) > 0
        plt.plot(wl, fl, label="Observed flux")
        plt.plot(wl[msk], fl[msk], lw=0.5, label="Masked portions of the flux")
        plt.xscale("log")
        plt.yscale("log")
        plt.xlabel("Wavelength [Ang.]")
        plt.ylabel("Flux [arbitrary units]")
        plt.legend()
        plt.show()

We have shown that our `hdf5` files contain all the information from the initial table and all available spectra, from observations (FORS2) or SPS extrapolation (Starlight), along with mask information and fluxes with and without dust extinction.
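To make this layout concrete, here is a small round trip on a toy file. It is a sketch under assumptions: one group per spectrum, catalogue values stored as group attributes, and arrays stored as datasets. The dataset names `wl`, `fl` and `mask` and the attribute `num` mirror those read above; the group name `SPEC1`, the attribute `redshift` and all values are invented for the example and do not describe the package's actual files.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_spectra.h5")

    # Write: one group per spectrum, metadata as attributes, arrays as datasets
    with h5py.File(path, "w") as h5out:
        group = h5out.create_group("SPEC1")  # hypothetical group name
        group.attrs["num"] = 1
        group.attrs["redshift"] = 0.5  # hypothetical attribute
        group.create_dataset("wl", data=np.linspace(4000.0, 9000.0, 6))
        group.create_dataset("fl", data=np.ones(6))
        group.create_dataset("mask", data=np.array([0, 0, 1, 1, 0, 0]))

    # Read everything back while the file is still open
    with h5py.File(path, "r") as h5in:
        group = h5in["SPEC1"]
        dataset_names = sorted(group.keys())
        attributes = dict(group.attrs)
        wl = group["wl"][:]
        masked = group["mask"][:] > 0

print(dataset_names)
print(attributes)
print(wl[masked])
```

Iterating over a group lists its datasets, whereas `group.attrs` holds the per-spectrum metadata; the two are read through different interfaces.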

## 4. Merge catalogs
We will now generate a single `hdf5` file that gathers all appropriate data from the tables above and the spectra. This file will be used as input for various studies.

In [None]:
from process_fors2.fetchData import crossmatchFors2KidsGalex

In [None]:
filename = "resulting_merge_from_walkthrough.h5"
outfile = os.path.abspath(os.path.join(".", filename))

In [None]:
crossmatchFors2KidsGalex(outfile)
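The matching criteria used by `crossmatchFors2KidsGalex` are not shown in this notebook. As a general illustration of positional cross-matching, the sketch below pairs a target with its nearest catalogue neighbour on the sky, within a tolerance. The function names, the 1 arcsec tolerance and the coordinates are all assumptions made for the example, not the package's method or data.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def nearest_match(target, catalogue, max_sep_arcsec=1.0):
    """Return (index, separation in arcsec) of the closest entry, or None if too far."""
    best = None
    for idx, (ra, dec) in enumerate(catalogue):
        sep = angular_separation_deg(target[0], target[1], ra, dec) * 3600.0
        if best is None or sep < best[1]:
            best = (idx, sep)
    if best is None or best[1] > max_sep_arcsec:
        return None
    return best


# Invented (RA, Dec) positions in degrees
spectro = (13.5000, -28.4000)
photo = [(13.6000, -28.4000), (13.5001, -28.4001), (13.5000, -28.5000)]

match = nearest_match(spectro, photo)
print(match)
```

A brute-force loop is adequate for a few hundred objects; for large catalogues a spatial index would be preferable.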