# Converting Galaxy Magnitudes to Photometric Redshifts

To convert the magnitudes into a photometric redshift, we *estimate* that redshift. Most of the `Estimators` in RAIL have an *inform* stage and an *estimation* stage.
The inform stage trains a model to do the conversion, so it needs both the magnitude data and the true redshifts of the galaxies.
We can then pass a new set of magnitudes (the ones we're actually interested in) to the *estimator*, along with the model that the informer created. The estimator applies the model to the new magnitudes to calculate a redshift value.
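
To make the two-stage pattern concrete, here is a minimal sketch that does not use RAIL at all. The functions `toy_inform` and `toy_estimate`, the linear model, and the synthetic data are all hypothetical and chosen only for illustration; real RAIL informers and estimators are more sophisticated.

```python
import numpy as np


def toy_inform(magnitudes, true_redshifts):
    """Hypothetical inform stage: fit a linear model from magnitudes to redshift."""
    # Append a column of ones so the last coefficient acts as an intercept.
    design = np.column_stack([magnitudes, np.ones(len(magnitudes))])
    coeffs, *_ = np.linalg.lstsq(design, true_redshifts, rcond=None)
    return {"coeffs": coeffs}


def toy_estimate(magnitudes, model):
    """Hypothetical estimation stage: apply the fitted model to new magnitudes."""
    design = np.column_stack([magnitudes, np.ones(len(magnitudes))])
    return design @ model["coeffs"]


rng = np.random.default_rng(0)

# Synthetic training set: 200 galaxies, 2 magnitudes each, with known redshifts.
train_mags = rng.uniform(18.0, 26.0, size=(200, 2))
train_z = 0.1 * train_mags[:, 0] - 0.05 * train_mags[:, 1] + 0.2

# Synthetic "new" galaxies whose redshifts we want to estimate.
new_mags = rng.uniform(18.0, 26.0, size=(5, 2))

# Inform: needs both the magnitudes and the true redshifts.
toy_model = toy_inform(train_mags, train_z)
# Estimate: needs only the new magnitudes and the model from the inform stage.
new_z = toy_estimate(new_mags, toy_model)
print(new_z)
```

The point of the split is that the model is created once from data with known redshifts and can then be reused on any number of new magnitude sets.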


We start by importing the packages we need and using `find_rail_file` to locate the example training and test files.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import tables_io
from rail.utils.path_utils import find_rail_file

from rail import interactive as ri

trainFile = find_rail_file("examples_data/testdata/test_dc2_training_9816.hdf5")
testFile = find_rail_file("examples_data/testdata/test_dc2_validation_9816.hdf5")
print(trainFile)

In [None]:
training_data = tables_io.read(trainFile)
print(type(training_data), training_data.keys())
training_data = training_data["photometry"]
training_data = tables_io.convert(training_data, "pandasDataFrame")
print(training_data.info())

`training_data` is now a Pandas DataFrame, containing information on 10,225 galaxies. It has magnitude information for the *ugrizy* bands, including errors, and the true redshift of these galaxies.

We'll now also load in the test data, which contains the magnitudes for the galaxies we actually want to calculate redshifts for. Just as a showcase, we'll leave the test data in the format given by `tables_io`. Either format can be used with RAIL functions, though each is passed in a slightly different way.
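
One way to picture the difference between the two formats: the data read by `tables_io` is nested under a `"photometry"` group, while the DataFrame is already the flat table. The sketch below uses a hypothetical helper, `select_group`, and made-up column values to show how a group name could select the table from nested data, with an empty name meaning "use the data as given". This is an assumption about the general idea, not RAIL's actual code.

```python
def select_group(data, groupname):
    """Hypothetical helper: pick a named group out of nested data.

    An empty group name means the data is already the flat table.
    """
    if groupname:
        return data[groupname]
    return data


# Nested layout, similar in spirit to what tables_io.read returns here.
# The column name and values are illustrative only.
nested = {"photometry": {"mag_i_lsst": [22.1, 23.4], "redshift": [0.3, 0.9]}}

# Flat layout: the table itself, with no enclosing group.
flat = nested["photometry"]

from_nested = select_group(nested, "photometry")
from_flat = select_group(flat, "")
print(from_nested == from_flat)
```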

In [None]:
test_data = tables_io.read(testFile)
print(test_data["photometry"].keys())

## Random Gauss

This estimation algorithm doesn't use any of the magnitude information to estimate a redshift, but instead just pulls a random value out of a Gaussian distribution. As such, while it has an informer stage, that stage doesn't do anything, so we can skip it.

Naturally, since this estimator just picks random values, it's not very accurate, but we'll use it to get a feel for the shape of the data.
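
As a rough illustration of what "ignoring the magnitudes" means, the sketch below builds one Gaussian pdf per galaxy around a randomly chosen centre. The function `toy_random_gauss` and its parameters (`zmin`, `zmax`, `width`, `seed`) are hypothetical, and the way the centre and width are chosen is an assumption; the RAIL implementation may differ in its details.

```python
import numpy as np


def toy_random_gauss(magnitudes, zgrid, zmin=0.0, zmax=3.0, width=0.05, seed=0):
    """Hypothetical estimator: one Gaussian pdf per galaxy at a random centre."""
    rng = np.random.default_rng(seed)
    # The magnitudes are only used to count the galaxies; their values are ignored.
    n_galaxies = len(magnitudes)
    centres = rng.uniform(zmin, zmax, size=n_galaxies)
    # Evaluate a Gaussian pdf on the grid, one row per galaxy.
    diff = zgrid[np.newaxis, :] - centres[:, np.newaxis]
    pdfs = np.exp(-0.5 * (diff / width) ** 2) / (width * np.sqrt(2.0 * np.pi))
    return centres, pdfs


toy_zgrid = np.linspace(0, 3.0, 301)
toy_mags = np.full((4, 6), 24.0)  # four galaxies, six identical made-up magnitudes

centres, pdfs = toy_random_gauss(toy_mags, toy_zgrid)
print(centres)
print(pdfs.shape)
```

Even though all four toy galaxies have identical magnitudes, each one gets a different pdf, which is why this kind of estimator cannot be accurate.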

In [None]:
# Print the docstring for the estimator
ri.estimation.algos.random_gauss.random_gauss_estimator?

In [None]:
# run the random gauss estimator with default options
rg_result_default = ri.estimation.algos.random_gauss.random_gauss_estimator(
    input=test_data
)

# The estimator returns a dictionary whose "output" key points to a qp.Ensemble
print(rg_result_default)

We will extract data from the output ensemble in a few ways:
- Calculate the probability density function (pdf) for a specific galaxy (row), at specific points (the grid) (`ensemble[row_number].pdf(grid)`)
- Access the mode of the pdf for each galaxy (`ensemble.ancil["zmode"]`)
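
The relationship between these two quantities can be sketched without RAIL or `qp`: evaluate a pdf on a grid, then take the grid point where it peaks as the mode. The function `gaussian_pdf` and the values `mean=0.8` and `width=0.1` are hypothetical, chosen only for illustration.

```python
import numpy as np


def gaussian_pdf(z, mean, width):
    """Gaussian probability density evaluated at the points z."""
    return np.exp(-0.5 * ((z - mean) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))


toy_grid = np.linspace(0, 3.0, 301)

# Evaluate the pdf of one made-up galaxy at every grid point.
toy_pdf = gaussian_pdf(toy_grid, mean=0.8, width=0.1)

# The mode is the grid point where the pdf is largest.
toy_mode = toy_grid[np.argmax(toy_pdf)]

# Trapezoidal integration over the grid; a well-sampled pdf integrates to about 1.
area = np.sum(0.5 * (toy_pdf[1:] + toy_pdf[:-1]) * np.diff(toy_grid))

print(toy_mode, area)
```

A mode found this way is only as precise as the grid spacing, which is 0.01 here.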

In [None]:
# Reuse the result of the default-options run above
result = rg_result_default

zgrid = np.linspace(0, 3.0, 301)
galid = 9529
truez = test_data["photometry"]["redshift"][galid]
single_gal = np.squeeze(result["output"][galid].pdf(zgrid))
single_zmode = result["output"].ancil["zmode"][galid]

plt.plot(zgrid, single_gal, color="k", label="single pdf")
plt.axvline(single_zmode, color="k", ls="--", label="mode")
plt.axvline(truez, color="r", label="true redshift")
plt.legend(loc="upper right")
plt.xlabel("redshift")
plt.ylabel("p(z)")
plt.show()

In [None]:
# Run the estimator again with rand_width=0.5 to see how changing a single
# parameter changes the pdf
result = ri.estimation.algos.random_gauss.random_gauss_estimator(
    input=test_data, rand_width=0.5
)

zgrid = np.linspace(0, 3.0, 301)
galid = 9529
truez = test_data["photometry"]["redshift"][galid]
single_gal = np.squeeze(result["output"][galid].pdf(zgrid))
single_zmode = result["output"].ancil["zmode"][galid]

plt.plot(zgrid, single_gal, color="k", label="single pdf")
plt.axvline(single_zmode, color="k", ls="--", label="mode")
plt.axvline(truez, color="r", label="true redshift")
plt.legend(loc="upper right")
plt.xlabel("redshift")
plt.ylabel("p(z)")
plt.show()

In [None]:
# Compare the mode of each pdf with the true redshift. A perfect estimator would
# put every point on the red dashed line; randomly chosen redshifts do not
# follow it
plt.figure(figsize=(8, 8))
plt.scatter(
    test_data["photometry"]["redshift"],
    result["output"].ancil["zmode"].flatten(),
    s=1,
    c="k",
    label="random gauss mode",
)
plt.plot([0, 3], [0, 3], "r--")
plt.xlabel("true redshift")
plt.ylabel("random gauss photo-z")
plt.show()

## K-Nearest Neighbors

In [None]:
ri.estimation.algos.k_nearneigh.k_near_neig_informer?

In [None]:
model = ri.estimation.algos.k_nearneigh.k_near_neig_informer(
    input=training_data, hdf5_groupname=""
)
print(model)

In [None]:
ri.estimation.algos.k_nearneigh.k_near_neig_estimator?

In [None]:
# Pass in the model created by the informer
# (note: `model` seems to be missing from the docstring)
result = ri.estimation.algos.k_nearneigh.k_near_neig_estimator(
    input=test_data, model=model["model"]
)
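
The module name `k_nearneigh` suggests a k-nearest-neighbours method. As a rough sketch of that general technique, and not of RAIL's implementation, the hypothetical function `toy_knn_redshift` below estimates each test galaxy's redshift as the mean redshift of its `k` closest training galaxies. The one-dimensional features and all values are made up.

```python
import numpy as np


def toy_knn_redshift(train_features, train_z, test_features, k=3):
    """Hypothetical k-nearest-neighbours point estimate of redshift."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diff = test_features[:, np.newaxis, :] - train_features[np.newaxis, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=2))
    # Indices of the k closest training galaxies for each test galaxy.
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Average the true redshifts of those neighbours.
    return train_z[nearest].mean(axis=1)


# Two well-separated clumps of training galaxies with different redshifts.
toy_train_features = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
toy_train_z = np.array([0.1, 0.1, 0.1, 0.9, 0.9, 0.9])

# One test galaxy near each clump.
toy_test_features = np.array([[0.05], [1.15]])

toy_z = toy_knn_redshift(toy_train_features, toy_train_z, toy_test_features, k=3)
print(toy_z)
```

Unlike the random estimator, this approach uses the training data: a test galaxy inherits its redshift from training galaxies that look similar to it.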

In [None]:
zgrid = np.linspace(0, 3.0, 301)
galid = 9529
truez = test_data["photometry"]["redshift"][galid]
single_gal = np.squeeze(result["output"][galid].pdf(zgrid))
single_zmode = result["output"].ancil["zmode"][galid]

plt.plot(zgrid, single_gal, color="k", label="single pdf")
plt.axvline(single_zmode, color="k", ls="--", label="mode")
plt.axvline(truez, color="r", label="true redshift")
plt.legend(loc="upper right")
plt.xlabel("redshift")
plt.ylabel("p(z)")
plt.show()

In [None]:
plt.figure(figsize=(8, 8))
plt.scatter(
    test_data["photometry"]["redshift"],
    result["output"].ancil["zmode"].flatten(),
    s=1,
    c="k",
    label="KNN mode",
)
plt.plot([0, 3], [0, 3], "r--")
plt.xlabel("true redshift")
plt.ylabel("KNN photo-z")
plt.show()