# Exploring a `qp` file

This notebook walks through the data structure of a `qp` `Ensemble` and what a `qp` HDF5 file contains.

In [3]:
import qp
import h5py
import tables_io
import numpy as np
from scipy import stats

## What's in a `qp` file?

First, let's read in an `Ensemble` from an HDF5 file using `qp` and take a look at the metadata.

In [31]:
ens_i = qp.read("interp-ensemble.hdf5")
ens_i

Ensemble(the_class=interp,shape=(10, 50))

In [32]:
ens_i.metadata

{'pdf_name': array([b'interp'], dtype='|S6'),
 'pdf_version': array([0]),
 'xvals': array([-1.        , -0.87755102, -0.75510204, -0.63265306, -0.51020408,
        -0.3877551 , -0.26530612, -0.14285714, -0.02040816,  0.10204082,
         0.2244898 ,  0.34693878,  0.46938776,  0.59183673,  0.71428571,
         0.83673469,  0.95918367,  1.08163265,  1.20408163,  1.32653061,
         1.44897959,  1.57142857,  1.69387755,  1.81632653,  1.93877551,
         2.06122449,  2.18367347,  2.30612245,  2.42857143,  2.55102041,
         2.67346939,  2.79591837,  2.91836735,  3.04081633,  3.16326531,
         3.28571429,  3.40816327,  3.53061224,  3.65306122,  3.7755102 ,
         3.89795918,  4.02040816,  4.14285714,  4.26530612,  4.3877551 ,
         4.51020408,  4.63265306,  4.75510204,  4.87755102,  5.        ])}

Now that we know this file contains an `Ensemble`, let's use `tables_io` to read the raw tables in the file and see how they are formatted.

In [43]:
file_tab_i = tables_io.read("./interp-ensemble.hdf5")
file_tab_i.keys()

odict_keys(['ancil', 'data', 'meta'])

The result is an ordered dictionary with three keys: **meta** for the metadata, **data** for the object data (`objdata`, the per-distribution values), and **ancil** for the ancillary data. Each of these is itself a dictionary of arrays, so let's look at each one in turn:

In [61]:
file_tab_i["meta"]

OrderedDict([('pdf_name', array([b'interp'], dtype='|S6')),
             ('pdf_version', array([0])),
             ('xvals',
              array([[-1.        , -0.87755102, -0.75510204, -0.63265306, -0.51020408,
                      -0.3877551 , -0.26530612, -0.14285714, -0.02040816,  0.10204082,
                       0.2244898 ,  0.34693878,  0.46938776,  0.59183673,  0.71428571,
                       0.83673469,  0.95918367,  1.08163265,  1.20408163,  1.32653061,
                       1.44897959,  1.57142857,  1.69387755,  1.81632653,  1.93877551,
                       2.06122449,  2.18367347,  2.30612245,  2.42857143,  2.55102041,
                       2.67346939,  2.79591837,  2.91836735,  3.04081633,  3.16326531,
                       3.28571429,  3.40816327,  3.53061224,  3.65306122,  3.7755102 ,
                       3.89795918,  4.02040816,  4.14285714,  4.26530612,  4.3877551 ,
                       4.51020408,  4.63265306,  4.75510204,  4.87755102,  5.        ]]))])

In [44]:
file_tab_i["data"]

OrderedDict([('yvals',
              array([[3.84729931e-22, 1.45447239e-19, 3.77974403e-17, 6.75191079e-15,
                      8.29083747e-13, 6.99805751e-11, 4.06035506e-09, 1.61941410e-07,
                      4.43975721e-06, 8.36696446e-05, 1.08388705e-03, 9.65178258e-03,
                      5.90797206e-02, 2.48586039e-01, 7.18989314e-01, 1.42947162e+00,
                      1.95360176e+00, 1.83528678e+00, 1.18516614e+00, 5.26092288e-01,
                      1.60528458e-01, 3.36704974e-02, 4.85461093e-03, 4.81134752e-04,
                      3.27782998e-05, 1.53501817e-06, 4.94137729e-08, 1.09342729e-09,
                      1.66317981e-11, 1.73898526e-13, 1.24985604e-15, 6.17492215e-18,
                      2.09705770e-20, 4.89549569e-23, 7.85579903e-26, 8.66545669e-29,
                      6.57052311e-32, 3.42464720e-35, 1.22698465e-38, 3.02182853e-42,
                      5.11573344e-46, 5.95324006e-50, 4.76218523e-54, 2.61858437e-58,
                      9.8976993

In [45]:
file_tab_i["ancil"]

OrderedDict([('ids', array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]))])
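To see the layout of all three tables at a glance, it can help to print only the shape of each column. The sketch below uses a hypothetical helper, `describe_tables` (not part of `qp` or `tables_io`), on a mock dictionary with the same layout as the file above: 10 distributions on a 50-point grid, with placeholder zeros instead of the real "yvals". It assumes each table is a plain mapping of column names to NumPy arrays, as `tables_io` returned here.

```python
import numpy as np


def describe_tables(tables):
    """Return {table_name: {column_name: shape}} for a dict of tables.

    Assumes `tables` maps table names ("meta", "data", "ancil") to
    mappings of column names to NumPy arrays.
    """
    return {
        table_name: {col: np.shape(values) for col, values in table.items()}
        for table_name, table in tables.items()
    }


# Mock with the same layout as interp-ensemble.hdf5; yvals are placeholders.
mock_tables = {
    "meta": {
        "pdf_name": np.array([b"interp"]),
        "pdf_version": np.array([0]),
        "xvals": np.linspace(-1, 5, 50)[np.newaxis, :],  # shape (1, 50)
    },
    "data": {"yvals": np.zeros((10, 50))},        # one row per distribution
    "ancil": {"ids": np.arange(10, dtype=float)},  # one id per distribution
}

for name, columns in describe_tables(mock_tables).items():
    print(name, columns)
```

The first dimension of each column is its row count: **meta** has a single row, while **data** and **ancil** have one row per distribution, which lines up with the `shape=(10, 50)` reported for the `Ensemble`. In the notebook, the same helper could be called as `describe_tables(file_tab_i)`.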

We can get a similar data structure directly from the `Ensemble` with its `build_tables` method, which is called to assemble a dictionary of the three main data tables before an `Ensemble` is written to file.

In [42]:
tables_i = ens_i.build_tables()
tables_i.keys()

dict_keys(['meta', 'data', 'ancil'])

We can compare the metadata table generated by `build_tables` with the one read from file by `tables_io`:

In [47]:
print("From build_tables method:")
print(tables_i["meta"])
print("From file:")
print(file_tab_i["meta"])

From build_tables method:
{'pdf_name': array([b'interp'], dtype='|S6'), 'pdf_version': array([0]), 'xvals': array([[-1.        , -0.87755102, -0.75510204, -0.63265306, -0.51020408,
        -0.3877551 , -0.26530612, -0.14285714, -0.02040816,  0.10204082,
         0.2244898 ,  0.34693878,  0.46938776,  0.59183673,  0.71428571,
         0.83673469,  0.95918367,  1.08163265,  1.20408163,  1.32653061,
         1.44897959,  1.57142857,  1.69387755,  1.81632653,  1.93877551,
         2.06122449,  2.18367347,  2.30612245,  2.42857143,  2.55102041,
         2.67346939,  2.79591837,  2.91836735,  3.04081633,  3.16326531,
         3.28571429,  3.40816327,  3.53061224,  3.65306122,  3.7755102 ,
         3.89795918,  4.02040816,  4.14285714,  4.26530612,  4.3877551 ,
         4.51020408,  4.63265306,  4.75510204,  4.87755102,  5.        ]])}
From file:
OrderedDict({'pdf_name': array([b'interp'], dtype='|S6'), 'pdf_version': array([0]), 'xvals': array([[-1.        , -0.87755102, -0.75510204, -0.6326
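Comparing long printouts by eye is error-prone. A minimal sketch of a programmatic check uses a hypothetical `tables_equal` helper (not part of `qp` or `tables_io`) that compares two nested dictionaries of arrays while ignoring the container type (`dict` vs `OrderedDict`) and key order. It uses exact equality via `np.array_equal`, assuming values survive the HDF5 round trip unchanged; `np.allclose` would be the tolerant alternative for float columns.

```python
from collections import OrderedDict

import numpy as np


def tables_equal(a, b):
    """Compare two {table: {column: array}} mappings by contents only."""
    if set(a) != set(b):  # same table names, in any order
        return False
    for name in a:
        if set(a[name]) != set(b[name]):  # same column names
            return False
        for col in a[name]:
            # array_equal also returns False if the shapes differ
            if not np.array_equal(np.asarray(a[name][col]), np.asarray(b[name][col])):
                return False
    return True


# Small stand-ins for tables_i["meta"] and file_tab_i["meta"]
built = {"meta": {"pdf_name": np.array([b"interp"]), "pdf_version": np.array([0])}}
from_file = OrderedDict(
    meta=OrderedDict(pdf_version=np.array([0]), pdf_name=np.array([b"interp"]))
)
print(tables_equal(built, from_file))  # True
```

In the notebook this would be called as `tables_equal(tables_i, file_tab_i)`.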

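They contain the same data; the only visible difference is that `tables_io` returns an `OrderedDict` rather than a plain `dict`. Note also that `build_tables` can encode any string columns in the **ancil** table to bytes if you provide it with the appropriate arguments. This is useful when writing to HDF5 files, because h5py cannot store NumPy unicode string arrays directly.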
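The exact `build_tables` arguments for this are not shown in this notebook, so the sketch below does the same kind of encoding by hand with NumPy. The `encode_strings` helper and the `names` column are hypothetical, chosen only to illustrate turning unicode (`<U`) columns into byte-string (`S`) columns like the `b'interp'` entry in **meta**.

```python
import numpy as np

# Hypothetical ancil table with a string column stored as unicode ("<U8")
demo_ancil = {
    "ids": np.arange(3),
    "names": np.array(["galaxy_a", "galaxy_b", "galaxy_c"]),
}


def encode_strings(table, encoding="utf-8"):
    """Return a copy of `table` with unicode columns encoded to bytes."""
    out = {}
    for col, values in table.items():
        values = np.asarray(values)
        if values.dtype.kind == "U":
            # np.char.encode converts each str element to bytes ("S" dtype)
            values = np.char.encode(values, encoding)
        out[col] = values  # non-string columns are passed through unchanged
    return out


encoded = encode_strings(demo_ancil)
print(encoded["names"].dtype)  # |S8
```

After reading such a file back, `np.char.decode(encoded["names"], "utf-8")` recovers the original strings.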

## Creating a `qp` file from scratch 

Now let's create a `qp` file from scratch and check that we can read it back in as an `Ensemble`. First, we need a metadata table (a dictionary) with the appropriate keys. We'll again use the **interpolation** (`interp`) parameterization, so the metadata needs "pdf_name", "pdf_version", and "xvals". For this dictionary to be a `Table-like` object, it has to satisfy two rules:
1. All values must be iterable, so they must all be arrays.
2. All arrays must have the same length (first dimension). Since "xvals" naturally has more values than the other metadata entries, we make it a 2D array of shape (1, N) so that its first dimension is 1, matching the others.
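The two rules above can be checked mechanically. The sketch below uses a hypothetical helper, `check_table_like`, which illustrates the rules as stated here; it is not the check that `tables_io` itself performs. It shows that a 1D "xvals" breaks rule 2 while the (1, N) version passes.

```python
import numpy as np


def check_table_like(table):
    """Check the two rules for a {column: array} table.

    Returns the shared first dimension (the number of rows), or raises
    ValueError naming the first column that breaks a rule.
    """
    n_rows = None
    for col, values in table.items():
        # Rule 1: every value must be an array with at least one dimension
        if not isinstance(values, np.ndarray) or values.ndim == 0:
            raise ValueError(f"{col!r} must be an array with at least one dimension")
        # Rule 2: every array must share the same first dimension
        if n_rows is None:
            n_rows = values.shape[0]
        elif values.shape[0] != n_rows:
            raise ValueError(f"{col!r} has first dimension {values.shape[0]}, expected {n_rows}")
    return n_rows


grid = np.linspace(0, 5, 10)
meta_1d = {"pdf_name": np.array([b"interp"]), "pdf_version": np.array([0]), "xvals": grid}
meta_2d = dict(meta_1d, xvals=grid[np.newaxis, :])  # xvals now has shape (1, 10)

print(check_table_like(meta_2d))  # 1
try:
    check_table_like(meta_1d)
except ValueError as err:
    print(err)  # 'xvals' has first dimension 10, expected 1
```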

In [57]:
# Wrap the 10-point grid in a list so xvals has shape (1, 10)
xvals = np.array([np.linspace(0, 5, 10)])
new_meta = {"pdf_name": np.array(["interp".encode()]), "pdf_version": np.array([0]), "xvals": xvals}
new_meta

{'pdf_name': array([b'interp'], dtype='|S6'),
 'pdf_version': array([0]),
 'xvals': array([[0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
         2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ]])}

That matches the format of the metadata table from our `Ensemble` above. Next, let's make a **data** table with the "yvals" for 3 distributions, each evaluated at the 10 "xvals".

In [50]:
yvals = np.array([[0,1,2,3,4,4,3,2,1,0],[0,0.1,0.2,0.3,0.4,0.4,0.3,0.2,0.1,0],[0,1,2,3,4,5,4,2,1,0]])
new_data = {"yvals": yvals}
new_data

{'yvals': array([[0. , 1. , 2. , 3. , 4. , 4. , 3. , 2. , 1. , 0. ],
        [0. , 0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1, 0. ],
        [0. , 1. , 2. , 3. , 4. , 5. , 4. , 2. , 1. , 0. ]])}

Finally, let's make an **ancil** table holding one ID per distribution.

In [52]:
ancil = np.linspace(0,2,3)
new_ancil = {"ids": ancil}
new_ancil

{'ids': array([0., 1., 2.])}
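Before writing, it is worth confirming that the three tables fit together: each row of "yvals" should have one value per "xvals" grid point, and **ancil** should have one row per distribution. The hypothetical `check_interp_tables` below encodes those assumptions, which are inferred from the tables printed in this notebook rather than taken from `qp`'s own validation. The demo reuses the values built above under new names.

```python
import numpy as np


def check_interp_tables(meta, data, ancil):
    """Cross-table checks for an interp-style set of tables.

    Assumes meta["xvals"] has shape (1, n_x), data["yvals"] has shape
    (n_pdf, n_x), and every ancil column has n_pdf rows.
    Returns (n_pdf, n_x).
    """
    xvals = meta["xvals"]
    yvals = data["yvals"]
    if xvals.ndim != 2 or xvals.shape[0] != 1:
        raise ValueError(f"xvals should have shape (1, n_x), got {xvals.shape}")
    if yvals.ndim != 2:
        raise ValueError(f"yvals should be 2D, got shape {yvals.shape}")
    n_pdf, n_x = yvals.shape
    if n_x != xvals.shape[1]:
        raise ValueError(f"yvals has {n_x} columns but there are {xvals.shape[1]} xvals")
    for col, values in ancil.items():
        if len(values) != n_pdf:
            raise ValueError(f"ancil column {col!r} has {len(values)} rows, expected {n_pdf}")
    return n_pdf, n_x


demo_meta = {"xvals": np.array([np.linspace(0, 5, 10)])}
demo_data = {"yvals": np.array([[0, 1, 2, 3, 4, 4, 3, 2, 1, 0],
                                [0, 0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1, 0],
                                [0, 1, 2, 3, 4, 5, 4, 2, 1, 0]])}
demo_ancil = {"ids": np.linspace(0, 2, 3)}
print(check_interp_tables(demo_meta, demo_data, demo_ancil))  # (3, 10)
```

The returned pair is 3 distributions on 10 grid points, the shape we expect the new `Ensemble` to have.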

In [62]:
data_tables = {"meta": new_meta, "data": new_data, "ancil": new_ancil}
tables_io.write(data_tables, "new-interp-ensemble.hdf5")

'new-interp-ensemble.hdf5'

The tables have been written to file, but is the result a valid `qp` file? Let's check:

In [58]:
qp.is_qp_file("new-interp-ensemble.hdf5")

True

Yay! We've successfully created a `qp` file. Let's try reading it in as an `Ensemble` to make sure. 

In [60]:
new_ens = qp.read("new-interp-ensemble.hdf5")
new_ens

Ensemble(the_class=interp,shape=(3, 10))