# Quantifying galaxy environment

notebook by _Alex Malz (GCCL@RUB)_, _Kara Ponder (UC Berkeley)_, _Ben Moews (Edinburgh)_, add your name here

In [None]:
from astropy.io import fits
import corner
import galenv
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd

%matplotlib inline
np.random.seed(42)  # seed the global RNG so the random draws below are reproducible

We want to get the spectra of galaxies matching conditions found [here](http://www.gama-survey.org/dr3/schema/table.php?id=31).

In [None]:
with fits.open('SpecObj.fits') as hdul:
    hdul.info()
#     print(hdul[1].header)
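    raw = np.array(hdul[1].data)
    # FITS data is big-endian; convert to native byte order for pandas (ndarray.newbyteorder() was removed in NumPy 2.0)
    df = pd.DataFrame(raw.byteswap().view(raw.dtype.newbyteorder()))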
    print(df.columns)
    df.index = df['CATAID']

## Select spectra by redshift and field

## Galaxy redshifts

Each galaxy in the GAMA catalog has a spectroscopically confirmed redshift.  
We're going to match these redshifts to the snapshots of the particle data.
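
As a rough illustration of the matching, the sketch below assigns each galaxy to the snapshot closest to it in redshift by using the midpoints between consecutive snapshots as bin edges. The snapshot values are the first five listed in this notebook. The galaxy redshifts and the names `z_snapshots`, `z_galaxies`, `nearest`, and `matched_z` are made up for the example.

```python
import numpy as np

# First five snapshot redshifts from this notebook; galaxy redshifts are toy values.
z_snapshots = np.array([0.042, 0.080, 0.130, 0.221, 0.317])
z_galaxies = np.array([0.05, 0.10, 0.20, 0.30])

# Midpoints between consecutive snapshots act as bin edges, so each
# galaxy lands in the bin of the snapshot nearest to it in redshift.
edges = (z_snapshots[1:] + z_snapshots[:-1]) / 2.
nearest = np.digitize(z_galaxies, edges)  # index of the matching snapshot
matched_z = z_snapshots[nearest]
```

Galaxies below the first midpoint or above the last one are assigned to the first or last snapshot respectively, so no galaxy is left unmatched.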

In [None]:
z_SLICS = np.array([0.042, 0.080, 0.130, 0.221, 0.317, 0.418, 0.525, 0.640, 0.764, 0.897, 
           1.041, 1.199, 1.372, 1.562, 1.772, 2.007, 2.269, 2.565, 2.899])
z_mids = (z_SLICS[1:] + z_SLICS[:-1]) / 2.
# bin edges: snapshot midpoints, bracketed by the minimum and maximum observed redshift
z_bins = np.insert(z_mids, 0, min(df['Z']))
z_bins = np.append(z_bins, max(df['Z']))
plt.hist(df['Z'], bins=z_bins)
plt.semilogy()
plt.xlabel('redshift')
plt.ylabel('number of galaxies')

The redshift histogram is skewed by the use of `z=10` as a placeholder for galaxies without a secure redshift.  
GAMA has a quality flag we can use to filter for redshifts that were considered of sufficient quality for science use, which they define as `NQ > 2`.
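
A minimal sketch of that cut, using toy arrays in place of the `Z` and `NQ` columns. Pairing the `z=10` placeholder with low `NQ` values here is an assumption made for illustration, not something checked against the catalog.

```python
import numpy as np

# Toy values only; the real ones come from the 'Z' and 'NQ' columns.
z = np.array([0.12, 10.0, 0.31, 0.08, 10.0])
nq = np.array([4, 1, 3, 2, 0])

# Keep only redshifts flagged as science quality (NQ > 2).
secure = nq > 2
z_secure = z[secure]
```

The same boolean mask works on the DataFrame, for example `df.loc[df['NQ'] > 2, 'Z']`.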

In [None]:
moar_bins = np.arange(z_bins[0], z_bins[-1] + z_bins[1], z_bins[1])
for i in range(5):
    quality = df.loc[df['NQ'] == i+1, 'Z']
    plt.hist(quality, alpha=0.5, label=str(i+1), bins=moar_bins)
plt.legend(loc='upper right')
plt.semilogy()
plt.xlim(moar_bins[0], moar_bins[-1])
plt.xlabel('Z')
plt.ylabel('number of galaxies')
plt.title('redshift distributions by quality flag "NQ"')

## Survey regions

GAMA observed galaxies in four disjoint regions of the sky.
Since environment is about the immediate vicinity of each galaxy, we'll have to divide the galaxies by region, effectively building our redshift-environment-color distribution separately for each region before combining those findings.
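
One way to label galaxies by region is to bin them in right ascension. The sketch below uses the RA bin edges that appear later in this notebook. The positions are invented, and `region` and `counts` are illustrative names.

```python
import numpy as np

# RA bin edges in degrees (as used later in this notebook); positions are toy values.
ra_edges = np.array([0., 80., 160., 200., 360.])
ra = np.array([35.2, 130.1, 179.9, 215.6, 340.0, 129.0])

# np.digitize returns i such that ra_edges[i-1] <= ra < ra_edges[i];
# subtracting 1 gives a zero-based region index.
region = np.digitize(ra, ra_edges) - 1
counts = np.bincount(region, minlength=len(ra_edges) - 1)
```

Each galaxy then carries a region index, and any per-region statistic can be computed by masking on `region == k`.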

In [None]:
corner.corner(np.array([df['RA'], df['DEC']]).T, labels=['RA', 'DEC'], show_titles=True)

In [None]:
RA_bin_ends = [0., 80., 160., 200., 360.]
subsamples, lens = [], []
for i in range(len(RA_bin_ends)-1):
    subsamples.append(df.loc[(df['RA'] >= RA_bin_ends[i]) & (df['RA'] < RA_bin_ends[i+1]) 
                             & (df['NQ'] > 2) & (df['Z'] >= z_bins[1]) & (df['Z'] < z_bins[2]), 
                             ['CATAID', 'RA', 'DEC', 'Z', 'NQ']])
    lens.append(len(subsamples[-1]))

In [None]:
subset = np.argmin(lens)
print(lens[subset])

In [None]:
data = np.vstack((subsamples[subset]['DEC'], subsamples[subset]['RA'])).T
print(data.shape)
print(data[42])

## Galaxy environment

Within each field, we can quantify the density of the local region around each galaxy, which is really what the notion of "galaxy environment" is getting at.
We're going to use the number of neighboring galaxies at each of several given distances in angular coordinates, so as not to incur the computational cost of calculating the distances between all galaxies.
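
To make the idea concrete, here is a small NumPy-only sketch that counts the neighbors of a single galaxy within several angular radii. It is a stand-in for illustration, not the `galenv` implementation: the helper `angular_separation` (a haversine formula), the positions, and the radii are all assumptions of this example.

```python
import numpy as np

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2.) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.) ** 2)
    return np.degrees(2. * np.arcsin(np.sqrt(a)))

# Toy positions in degrees, all on the celestial equator for easy checking.
ra = np.array([180.00, 180.04, 180.10, 180.30, 181.00])
dec = np.zeros_like(ra)

# Separations from the first galaxy to each of the others.
sep = angular_separation(ra[0], dec[0], ra[1:], dec[1:])

# Number of neighbors inside each trial radius (degrees).
radii = np.array([0.05, 0.2, 0.5])
n_neighbors = np.array([(sep <= r).sum() for r in radii])
```

Querying one target galaxy this way costs one pass over the sample. Doing it for every galaxy is where a tree-based neighbor search becomes worthwhile.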

In [None]:
help(galenv)

### Choose some reasonable radii

The distance measure is normalized to the radius of Earth, but our angular positions are in degrees, so we try a range of radii and check how many neighbors each one returns.
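
One possible reading of "normalized to the radius of Earth" is that the distance is an arc length divided by the sphere's radius, which is an angle in radians. That reading is an assumption, not confirmed by the `galenv` documentation, so `help(galenv)` remains the authority. Under it, the conversions look like this, with `R_EARTH_KM` and the trial angles chosen only for illustration:

```python
import numpy as np

R_EARTH_KM = 6371.0  # mean Earth radius; only needed for the arc-length view

theta_deg = np.array([0.05, 0.1, 0.2])  # trial angular radii in degrees
theta_rad = np.radians(theta_deg)       # angle in radians = arc length / radius
arc_km = theta_rad * R_EARTH_KM         # same angles as arc lengths on an Earth-sized sphere
```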

In [None]:
chosen_ind = np.random.randint(0, len(data), 1)[0]
try_distances = np.geomspace(0.05, 0.2, 10)
res = []
for dist in try_distances:
    res.append(len(galenv.nn_finder(data, data[chosen_ind], dist)))

In [None]:
plt.plot(try_distances, res)

# Next steps

## Construct redshift-environment-SED/color relationship

In [None]:
# with fits.open(just_fn) as hdul:
#     arr = np.array(hdul[0].data).byteswap().newbyteorder()
#     metadata = hdul[0].header