# Pre-Process the GLADE+ Galaxy Catalog

The [GLADE+ Catalog](https://glade.elte.hu/) contains information on ~22.5 million galaxies and ~750,000 quasars. To use this notebook, first download the catalog as a plain-text file. This notebook assumes the file is saved as `GLADE+.txt`.

This notebook reduces the 40-column catalog to a two-column CSV file (`GLADE+_2048.csv`) that contains only the objects with a measured redshift:
>`Pix`: The HEALPix pixel index of the object on a map with `nside=2048` (RING ordering, the healpy default). It is computed from the `RA` (Right Ascension) and `Dec` (Declination) columns of the original catalog.

>`Z_Bin`: The index of the redshift bin containing the object, using the bin edges created with the [Tomographer interactive tool](http://tomographer.org/guest/jobs#). These bins group the galaxies by redshift.
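Because the output is a plain two-column CSV, it can be read back with Python's standard `csv` module. The sketch below tallies objects per redshift bin. The `count_per_bin` helper and the sample rows are hypothetical and are not taken from the catalog.

```python
import csv
import io
from collections import Counter


def count_per_bin(csv_file):
    """Count how many objects fall in each Z_Bin of a Pix,Z_Bin CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row['Z_Bin']) for row in reader)


# Hypothetical sample in the same format as GLADE+_2048.csv
sample = io.StringIO("Pix,Z_Bin\n1024,0\n2048,2\n4096,2\n")
counts = count_per_bin(sample)
print(dict(counts))  # {0: 1, 2: 2}
```

On the real file, pass the handle from `open('GLADE+_2048.csv')` instead of the in-memory sample.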

### Imports
>Healpy: for converting `RA` and `Dec` from angular coordinates to HEALPix pixel indices.

>Numpy: for converting degrees to radians in the pixel-index calculation.

>Pandas: for reading the redshift bin edges from the Tomographer CSV file.

>Tqdm: for displaying a progress bar while the catalog is processed.

In [None]:
import healpy as hp
import numpy as np
import pandas as pd
from tqdm import tqdm

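### Get Redshift Bin Cutoffs

Read the redshift bin edges from the `z` column of the CSV file exported from Tomographer (saved here as `Tomographer_GLADE+.csv`).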

In [None]:
# read the redshift bin edges from the 'z' column of the Tomographer output
z_bins = pd.read_csv('Tomographer_GLADE+.csv')['z'].to_numpy()

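### Determine the Redshift Bin for Each Object

Each object is assigned the index of the first bin edge that exceeds its redshift. Objects with a redshift at or above the last edge fall outside every bin.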

In [None]:
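def get_z_bin(z_val):
    """Return the index of the first bin edge greater than z_val.

    Returns None if z_val is at or above the last edge (outside all bins).
    """
    for i in range(len(z_bins)):
        if z_val < z_bins[i]:
            return i
    return None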
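Calling `get_z_bin` once per row scans the bin edges linearly in Python. Here is a hedged alternative that assumes the Tomographer edges are sorted in ascending order. `np.searchsorted` with `side='right'` returns the same index, the position of the first edge strictly greater than each redshift, for a whole array at once. The function name `z_bins_vectorized` and the edge values below are hypothetical. Out-of-range objects are flagged with `-1` instead of `None` so the result stays an integer array.

```python
import numpy as np


def z_bins_vectorized(z_values, edges):
    """Return the bin index for each redshift, or -1 if it is at or beyond the last edge.

    Assumes `edges` is sorted in ascending order. For each z, the result is the
    index of the first edge strictly greater than z, matching get_z_bin.
    """
    edges = np.asarray(edges, dtype=float)
    idx = np.searchsorted(edges, np.asarray(z_values, dtype=float), side='right')
    # searchsorted returns len(edges) when z >= edges[-1]; flag those as -1
    return np.where(idx < len(edges), idx, -1)


edges = [0.05, 0.1, 0.2, 0.4]           # hypothetical bin edges
z = np.array([0.01, 0.05, 0.15, 0.5])
print(z_bins_vectorized(z, edges).tolist())  # [0, 1, 2, -1]
```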

### Set `nside` Parameter for Pixel Coordinate Calculation

`nside` is the HEALPix resolution parameter. A full-sky map has 12 × `nside`² pixels, so larger values give finer resolution at the cost of more memory and computation. The dust intensity models were generated with `nside=2048`, so this notebook must use the same value for the galaxy pixel indices to line up with the dust-map pixels.
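This quick check shows what `nside=2048` implies. It uses the HEALPix relation N_pix = 12 · nside² and treats each pixel as a square of equal area to get a characteristic size. The helper names are illustrative. healpy exposes the same quantities through `hp.nside2npix` and `hp.nside2resol`.

```python
import math


def healpix_npix(nside):
    """Number of pixels in a full-sky HEALPix map: 12 * nside**2."""
    return 12 * nside ** 2


def healpix_resolution_arcmin(nside):
    """Approximate pixel size: square root of the pixel solid angle, in arcminutes."""
    pixel_area_sr = 4 * math.pi / healpix_npix(nside)
    return math.degrees(math.sqrt(pixel_area_sr)) * 60


print(healpix_npix(2048))                         # 50331648
print(round(healpix_resolution_arcmin(2048), 2))  # 1.72
```

Each doubling of `nside` quadruples the pixel count and halves the pixel size.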

In [None]:
nside = 2048

### Construct the `.csv` File
The following code reads the `RA` (zero-based column index 8) and `Dec` (index 9) fields from each row of the whitespace-separated text file and converts them to a HEALPix pixel index. It also reads the redshift (index 27), skips rows where the redshift is `null`, and uses the value to assign the redshift bin.
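The per-row parsing can be isolated into a small function so it can be tested on its own. This is a hypothetical sketch: the `parse_row` name and the synthetic row are illustrative only. It assumes rows are whitespace-separated and that missing values are written as `null`, which is what the loop below expects.

```python
RA_COL, DEC_COL, Z_COL = 8, 9, 27  # zero-based column indices used in this notebook


def parse_row(line):
    """Return (ra, dec, z) as floats, or None if the row has no redshift."""
    e = line.split()
    # blank or truncated rows and rows with a null redshift are skipped
    if len(e) <= Z_COL or e[Z_COL] == 'null':
        return None
    return float(e[RA_COL]), float(e[DEC_COL]), float(e[Z_COL])


# Synthetic 40-column row: 'null' everywhere except RA, Dec and redshift
fields = ['null'] * 40
fields[RA_COL], fields[DEC_COL], fields[Z_COL] = '150.0', '2.2', '0.03'
print(parse_row(' '.join(fields)))  # (150.0, 2.2, 0.03)
```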

In [None]:
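# Stream the catalog line by line: with more than 20 million rows, loading
# the whole file into memory with readlines() is unnecessary
with open('GLADE+.txt') as f_in, open('GLADE+_2048.csv', 'w') as f_out:
    f_out.write('Pix,Z_Bin\n')
    for line in tqdm(f_in):
        e = line.split()
        # skip blank/truncated rows and objects without a redshift
        if len(e) < 28 or e[27] == 'null':
            continue
        # RA and Dec in degrees (zero-based columns 8 and 9)
        ra = float(e[8])
        dec = float(e[9])
        # theta is the colatitude; phi = 360 - RA mirrors the longitude relative
        # to the usual healpy convention (phi = RA), presumably to match the
        # orientation of the dust maps
        theta = np.deg2rad(90. - dec)
        phi = np.deg2rad(360. - ra)
        pix = hp.ang2pix(nside, theta, phi)
        # assign the redshift bin; skip objects beyond the last bin edge
        z_bin = get_z_bin(float(e[27]))
        if z_bin is None:
            continue
        f_out.write(f'{pix},{z_bin}\n')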
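The angle conversion in the loop above is worth spelling out. healpy expects a colatitude θ = 90° − Dec and a longitude φ, both in radians. The common convention for equatorial coordinates is φ = RA, which is equivalent to `hp.ang2pix(nside, ra, dec, lonlat=True)`. The notebook instead uses φ = 360° − RA, which mirrors the sky east to west. The hypothetical helper `radec_to_thetaphi` below shows both choices side by side, so you can check which one matches the dust maps. This section does not state which convention those maps use.

```python
import numpy as np


def radec_to_thetaphi(ra_deg, dec_deg, mirror_ra=True):
    """Convert RA/Dec in degrees to HEALPix (theta, phi) in radians.

    theta is the colatitude (0 at the north pole). With mirror_ra=True,
    phi = 360 - RA as in this notebook; otherwise phi = RA.
    """
    ra = np.asarray(ra_deg, dtype=float)
    dec = np.asarray(dec_deg, dtype=float)
    theta = np.deg2rad(90.0 - dec)
    lon = (360.0 - ra) if mirror_ra else ra
    # wrap into [0, 360) degrees so RA = 0 maps to phi = 0 in both conventions
    phi = np.deg2rad(np.mod(lon, 360.0))
    return theta, phi


theta, phi = radec_to_thetaphi([0.0, 90.0], [90.0, 0.0])
print(np.rad2deg(theta), np.rad2deg(phi))
```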