# Exploring the BPZ Test Data

_Alex Malz & Phil Marshall_

We have a small dataset to test our `qp` approximations on: 30,000 photometric redshift 1D posterior PDFs, in "gridded" format, from Melissa Graham (UW, LSST). In this notebook we visualize these distributions, and develop machinery to evaluate our approximations on the whole set in "survey mode." 

## Set-up, Ingest

In [None]:
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

import qp

In [None]:
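with open('bpz_euclid_test_10_2.probs', 'r') as data_file:
    # Split each line on whitespace; the generator is consumed inside the `with` block.
    lines = (line.split() for line in data_file)
    next(lines)  # skip the first line
    # next(lines)
    # Drop the first column of each row and convert the remaining entries to floats.
    pdfs = np.array([[float(value) for value in line[1:]] for line in lines])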

In [None]:
print(np.shape(pdfs))

We'll need an array of redshifts too, the $z$ in $p(z)$. This is 350 elements long, for the data we are using.

In [None]:
# fake_zs = np.arange(0., 1., 1./350)
# linspace fixes the length at 350; arange with a float step can be off by one element.
z = np.linspace(0.01, 3.50, 350)
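
Before plotting, it may be worth checking that the grid has one entry per PDF column and that each gridded PDF integrates to one. The following is a minimal sketch on synthetic stand-in data, not the BPZ file; `mock_pdfs` and `normalize_on_grid` are hypothetical names introduced here for illustration.

```python
import numpy as np

# Redshift grid with the same 350 points used in this notebook (0.01 to 3.50).
z = np.linspace(0.01, 3.50, 350)

# Synthetic stand-in for the BPZ array: three unnormalized Gaussian bumps (assumption).
centers = np.array([0.5, 1.2, 2.0])
mock_pdfs = np.exp(-0.5 * ((z[None, :] - centers[:, None]) / 0.1) ** 2)


def normalize_on_grid(grid, values):
    """Rescale each row of `values` so its trapezoidal integral over `grid` is 1."""
    dz = np.diff(grid)
    # Trapezoidal rule written out explicitly, one integral per row.
    integrals = np.sum(0.5 * (values[:, 1:] + values[:, :-1]) * dz, axis=1)
    return values / integrals[:, None]


normed = normalize_on_grid(z, mock_pdfs)

# Re-integrate the normalized rows as a consistency check.
check = np.sum(0.5 * (normed[:, 1:] + normed[:, :-1]) * np.diff(z), axis=1)
print(mock_pdfs.shape[1] == len(z))
print(np.allclose(check, 1.0))
```

The same two checks could be run on `pdfs` and `z` once the real file has been read in.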

## Visualizing the BPZ $p(z)$'s

Let's look at one of the PDFs, first as raw numbers and with plain `matplotlib`, then through the `qp.PDF` class.

In [None]:
print(pdfs[1])

In [None]:
plt.plot(z, pdfs[1])
plt.xlabel('redshift $z$', fontsize=16);

In [None]:
G = qp.PDF(gridded=(z, pdfs[1]))
G.plot()

## Approximating the BPZ $p(z)$'s

First let's try drawing some samples:

In [None]:
samples = G.sample(100, vb=True)
S = qp.PDF(samples=samples)
S.plot()

This raises an `UnboundLocalError`:
```
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-45-e72df3b030ba> in <module>()
----> 1 samples = G.sample(100, vb=True)
      2 S = qp.PDF(samples=samples)
      3 S.plot()

/Users/pjm/work/bayesian/qp/qp/pdf.py in sample(self, N, infty, using, vb)
    286                 weights = self.histogram[1]
    287 
--> 288             ncats = len(weights)
    289             cats = range(ncats)
    290             sampbins = [0]*ncats

UnboundLocalError: local variable 'weights' referenced before assignment
```
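
As a possible workaround outside `qp`, samples can be drawn from a gridded PDF by inverse-transform sampling: integrate the PDF into a CDF on the grid, then map uniform random numbers through the inverse of that CDF. The sketch below uses only NumPy and a synthetic PDF; `sample_from_grid` and `mock_pdf` are hypothetical names, and this is not how `qp` itself samples.

```python
import numpy as np


def sample_from_grid(grid, pdf, n_samples, rng):
    """Draw samples from a gridded PDF by linearly interpolating its inverse CDF."""
    # Cumulative trapezoidal integral, starting at 0 on the first grid point.
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]  # normalize so the CDF ends at exactly 1
    # Map uniform draws through the inverse CDF (linear interpolation between grid points).
    u = rng.uniform(0.0, 1.0, size=n_samples)
    return np.interp(u, cdf, grid)


# Synthetic example: a single Gaussian bump on the notebook's redshift grid (assumption).
z = np.linspace(0.01, 3.50, 350)
mock_pdf = np.exp(-0.5 * ((z - 1.0) / 0.2) ** 2)

rng = np.random.default_rng(42)
samples = sample_from_grid(z, mock_pdf, 100, rng)
print(samples.shape, samples.min() >= z[0], samples.max() <= z[-1])
```

An array produced this way could then be handed to `qp.PDF(samples=...)`, as in the cell above.
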
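Sampling from gridded representations seems not to work in `qp` yet: judging by the traceback, `weights` is never assigned when only a gridded parametrization is available. Let's try making some quantiles instead: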

In [None]:
quantiles = G.quantize(number=10)
Q = qp.PDF(quantiles=quantiles)
Q.plot()

Quantiles cannot be computed directly from gridded PDFs: we first need to fit a Gaussian mixture model (GMM) to the gridded PDF, and then instantiate a new PDF object with that GMM as its `truth`.
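
One way to obtain such a Gaussian mixture from a gridded PDF is weighted expectation-maximization, treating each grid point as a data point weighted by the PDF value there. The sketch below is plain NumPy on a synthetic two-component PDF and assumes a uniform grid; `fit_gmm_to_grid` and `gauss` are hypothetical helpers, and how the fitted parameters are passed to `qp` as `truth` is not covered here.

```python
import numpy as np


def gauss(x, mu, sigma):
    """Normalized 1D Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_gmm_to_grid(grid, pdf, n_components, n_iter=200):
    """Fit a 1D Gaussian mixture to a gridded PDF with weighted EM (uniform grid assumed)."""
    w = pdf / pdf.sum()  # per-grid-point weights
    # Initialize means at evenly spaced quantiles, widths at the overall standard deviation.
    probs = (np.arange(n_components) + 0.5) / n_components
    means = np.interp(probs, np.cumsum(w), grid)
    overall_mean = np.sum(w * grid)
    overall_std = np.sqrt(np.sum(w * (grid - overall_mean) ** 2))
    stds = np.full(n_components, overall_std)
    amps = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each grid point.
        dens = amps[:, None] * gauss(grid[None, :], means[:, None], stds[:, None])
        resp = dens / np.maximum(dens.sum(axis=0), 1e-300)
        # M-step: update amplitudes, means, and widths using PDF-weighted responsibilities.
        rw = resp * w[None, :]
        nk = rw.sum(axis=1)
        amps = nk / nk.sum()
        means = (rw * grid[None, :]).sum(axis=1) / nk
        var = (rw * (grid[None, :] - means[:, None]) ** 2).sum(axis=1) / nk
        stds = np.sqrt(np.maximum(var, 1e-8))  # floor guards against collapsed components
    return amps, means, stds


# Synthetic bimodal PDF on the notebook's redshift grid (assumption, not BPZ data).
z = np.linspace(0.01, 3.50, 350)
mock_pdf = 0.6 * gauss(z, 0.8, 0.1) + 0.4 * gauss(z, 1.8, 0.2)

amps, means, stds = fit_gmm_to_grid(z, mock_pdf, n_components=2)
print(amps, means, stds)
```

The number of components is a free choice in this sketch; for real BPZ posteriors it would need to be tuned, or chosen per object.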