# Exploring the BPZ Test Data

_Alex Malz & Phil Marshall_

We have a small dataset to test our `qp` approximations on: 30,000 photometric redshift 1D posterior PDFs, in "gridded" format, from Melissa Graham (UW, LSST). In this notebook we visualize these distributions, and develop machinery to evaluate our approximations on the whole set in "survey mode." 

## Set-up, Ingest

In [None]:
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

import qp

In [None]:
# fake_zs = np.arange(0., 1., 1./350)
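# Redshift grid on which the PDFs are tabulated: 0.01 to 3.50 in steps of 0.01
z = np.arange(0.01, 3.51, 0.01, dtype='float')
# Width of the redshift range, used below to derive the grid spacing
zrange = 3.51-0.01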

In [None]:
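with open('bpz_euclid_test_10_2.probs', 'r') as data_file:
    # Split each line on whitespace; the generator is consumed inside the `with` block
    lines = (line.split(None) for line in data_file)
    # Skip the first line of the file (presumably a header)
    next(lines)
    # Skip the first column of each row and parse the remaining entries as floats
    pdfs = np.array([[float(line[k]) for k in range(1,len(line))] for line in lines])
    pdf_shape = np.shape(pdfs)
    # Diagnostic: raw row sums divided by the redshift range
    print(np.sum(pdfs, axis=1)[:100] / zrange)
    # Grid spacing: redshift range divided by the number of grid points
    norm_factor = zrange / pdf_shape[1]
    # Divide by the grid spacing so that each row is a density over z
    pdfs /= norm_factor
    # Check: Riemann-sum integral of each PDF, close to 1 if the rows are normalized
    print(np.sum(pdfs * norm_factor, axis=1)[:100])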

The array `z` defined above holds the redshifts, the $z$ in $p(z)$. For the data we are using it is 350 elements long, running from 0.01 to 3.50 in steps of 0.01.
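As a quick check of the grid and of the rescaling applied to `pdfs`, here is a minimal sketch that builds the same 350-point grid with `np.linspace` and normalizes a toy array standing in for the BPZ data. The names `z_grid`, `dz`, `toy_pdfs`, and `integrals` are illustrative only and are not part of the notebook or of `qp`. The sketch assumes each raw row holds per-bin probabilities that sum to 1.

```python
import numpy as np

# Same grid as the notebook's `z`, built with linspace so the length is explicit.
z_grid = np.linspace(0.01, 3.50, 350)
dz = z_grid[1] - z_grid[0]  # grid spacing, about 0.01

# Toy stand-in for the BPZ rows: two Gaussian-shaped bumps,
# scaled so that each row sums to 1 (an assumption about the file format).
centers = np.array([0.5, 1.2])
toy_pdfs = np.exp(-0.5 * ((z_grid[None, :] - centers[:, None]) / 0.1) ** 2)
toy_pdfs /= toy_pdfs.sum(axis=1, keepdims=True)

# Convert per-bin probabilities to densities by dividing by the bin width.
toy_pdfs /= dz

# Riemann-sum integral of each density over the grid; should be close to 1.
integrals = toy_pdfs.sum(axis=1) * dz
print(z_grid.shape, integrals)
```

Using `np.linspace` with an explicit point count avoids relying on floating-point rounding at the end point, which `np.arange` with a non-integer step can be sensitive to.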

## Visualizing the BPZ $p(z)$'s

Let's plot a few PDFs, first directly with `matplotlib` and then using the `qp.PDF` class.

In [None]:
print(pdfs[1])

In [None]:
plt.plot(z, pdfs[1])
plt.xlabel('redshift $z$', fontsize=16);

In [None]:
G = qp.PDF(gridded=(z, pdfs[1]))
G.plot()

In [None]:
G.sample(100, vb=True)
G.plot()

## Approximating the BPZ $p(z)$'s

First, let's try drawing some samples.

Quantiles cannot be computed directly from gridded PDFs: we first need to fit a Gaussian mixture model (GMM), and then instantiate a new PDF object with that GMM as its `truth`.

In [None]:
samples = G.sample(1000, vb=False)
# S = qp.PDF(samples=samples)
# S.plot()
G.plot()
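To make the sampling step concrete, here is a minimal sketch of drawing samples from a gridded PDF by inverse-transform sampling with plain NumPy. This is one reasonable approach under stated assumptions, not necessarily what `G.sample` does internally. The helper `sample_gridded` and the toy Gaussian are hypothetical.

```python
import numpy as np

def sample_gridded(grid, pdf, n_samples, rng):
    """Draw samples from a PDF tabulated on `grid` via its interpolated CDF."""
    # Cumulative trapezoid integral gives the CDF at each grid point.
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]  # rescale so the CDF ends at exactly 1
    # Invert the CDF by linear interpolation at uniform random draws.
    u = rng.uniform(0.0, 1.0, size=n_samples)
    return np.interp(u, cdf, grid)

z_grid = np.linspace(0.01, 3.50, 350)
toy_pdf = np.exp(-0.5 * ((z_grid - 1.0) / 0.2) ** 2)  # unnormalized toy Gaussian
rng = np.random.default_rng(42)
toy_samples = sample_gridded(z_grid, toy_pdf, 1000, rng)
print(toy_samples.mean(), toy_samples.std())
```

Because the CDF is rescaled inside the helper, the input PDF does not need to be normalized beforehand.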

In [None]:
mixmod = G.mix_mod_fit(n_components=3)
G.plot()

In [None]:
print(mixmod.ppf(np.array([0.1, 0.2, 0.95])))
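Evaluating quantiles of a mixture model amounts to inverting its CDF. The following minimal sketch illustrates this for a 1D Gaussian mixture using bisection. It is an illustration of the idea, not the implementation behind `mixmod.ppf`. The functions `mixture_cdf` and `mixture_ppf`, and the three-component parameters, are hypothetical.

```python
import math
import numpy as np

def mixture_cdf(x, weights, means, stds):
    """CDF of a 1D Gaussian mixture: weighted sum of normal CDFs."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(weights, means, stds))

def mixture_ppf(q, weights, means, stds, lo=-10.0, hi=10.0, tol=1e-10):
    """Quantile at probability q, found by bisection on the monotonic CDF."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, stds) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical three-component mixture, loosely resembling a multimodal p(z).
weights, means, stds = [0.5, 0.3, 0.2], [0.4, 1.0, 2.0], [0.05, 0.1, 0.3]
probs = (0.1, 0.2, 0.95)
quantiles = np.array([mixture_ppf(q, weights, means, stds) for q in probs])
print(quantiles)
```

Bisection is used here because the mixture CDF is monotonic but has no closed-form inverse. The bracket `[lo, hi]` is assumed to contain essentially all of the probability mass.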

In [None]:
M = qp.PDF(truth=mixmod)
M.plot()