# Iterator Example

`qp` has a built-in method that creates a generator object for iterating through a `qp` file in chunks. In this notebook we test it by reading `Ensembles` from a file.

In [1]:
import numpy as np
import os
import qp

Let's read in our file and see what the `Ensemble` looks like.

In [2]:
# the path to the file
QP_DIR = os.path.abspath(os.path.dirname(qp.__file__))
data_file = "../../tests/test_data/test.hdf5"

In [3]:
ens = qp.read(data_file)
print(ens)

Ensemble(the_class=mixmod,shape=(100, 3))


We have an `Ensemble` of 100 Gaussian mixture model distributions, with 3 Gaussian components each. That's a lot to handle at once. Instead of reading in the whole file, we can use the `iterator` method to create a generator that yields one chunk of the file at a time, each as a smaller `Ensemble`. To pick a sensible chunk size we first need to know how many distributions are in the file, which we can get from the `qp.data_length` function:

In [4]:
qp.data_length(data_file)

100

Since we have 100 distributions, let's pick a chunk size of 10:

In [5]:
itr = qp.iterator(data_file, chunk_size=10)
type(itr)

generator

Now that we have our generator, we can iterate through the file one chunk at a time. Each step yields the start index, the end index, and an `Ensemble` holding the 10 distributions in that range. Let's check that the PDFs of each chunk match the PDFs of the corresponding slice of the full `Ensemble`. We'll evaluate the PDF at `test_vals` for each chunk.

In [6]:
test_vals = np.linspace(0., 1., 11)

In [7]:
for start, end, ens_i in itr:
    print(f"Chunk indices are: ({start}:{end})")
    if np.allclose(ens[start:end].pdf(test_vals), ens_i.pdf(test_vals)):
        print("The PDF values match")
    else:
        print("The PDF values for the iterated chunk do not match the values for the chunk from the whole Ensemble")

Chunk indices are: (0:10)
The PDF values match
Chunk indices are: (10:20)
The PDF values match
Chunk indices are: (20:30)
The PDF values match
Chunk indices are: (30:40)
The PDF values match
Chunk indices are: (40:50)
The PDF values match
Chunk indices are: (50:60)
The PDF values match
Chunk indices are: (60:70)
The PDF values match
Chunk indices are: (70:80)
The PDF values match
Chunk indices are: (80:90)
The PDF values match
Chunk indices are: (90:100)
The PDF values match
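Because each step yields `start` and `end` along with the chunk, per-chunk results can be written straight into their place in a full-length output. The sketch below illustrates that pattern in plain Python. It does not use `qp`: `fake_iterator` is a hypothetical stand-in that yields `(start, end, chunk)` tuples in the same shape as the loop above, and doubling each value stands in for whatever is computed per chunk.

```python
def fake_iterator(values, chunk_size):
    """Hypothetical stand-in for a chunked reader: yields (start, end, chunk)."""
    for start in range(0, len(values), chunk_size):
        end = min(start + chunk_size, len(values))
        yield start, end, values[start:end]


values = [float(i) for i in range(100)]

# Preallocate the output, then fill it slice by slice as chunks arrive.
results = [None] * len(values)
for start, end, chunk in fake_iterator(values, chunk_size=10):
    results[start:end] = [2.0 * v for v in chunk]

print(len(results))
```

Only one chunk of input is held at a time, while the output keeps the original ordering.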


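A generator can only be consumed once, so `itr` is now exhausted. To iterate again we create a new generator, which can be done directly in the `for` statement, as shown below. This time we use a chunk size of 11 to demonstrate how the iteration behaves when the number of distributions is not evenly divisible by the chunk size: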

In [8]:
for start, end, ens_chunk in qp.iterator(data_file, chunk_size=11):
    print(f"Indices are: ({start}, {end})")
    print(ens_chunk)

Indices are: (0, 11)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (11, 22)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (22, 33)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (33, 44)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (44, 55)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (55, 66)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (66, 77)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (77, 88)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (88, 99)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (99, 100)
Ensemble(the_class=mixmod,shape=(1, 3))
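The index pairs printed above follow a simple pattern: each chunk starts where the previous one ended, and the final chunk is cut off at the total length. A minimal sketch of that arithmetic in plain Python is shown here. It is an illustration only, not `qp`'s implementation, and `chunk_bounds` is a hypothetical helper name.

```python
def chunk_bounds(n_total, chunk_size):
    """Yield (start, end) index pairs that cover range(n_total) in chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_total, chunk_size):
        # The last chunk is shortened when chunk_size does not divide n_total.
        yield start, min(start + chunk_size, n_total)


bounds = list(chunk_bounds(100, 11))
print(len(bounds))   # number of chunks
print(bounds[-1])    # the final, shorter chunk
```

For 100 distributions and a chunk size of 11 this gives 10 chunks, the last of which is `(99, 100)`, matching the output above.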


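You can also use the `iterator` function in parallel. It takes `rank` and `parallel_size` arguments, which identify the current process and the total number of processes, so that each process iterates over its own share of the file.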
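To give a feel for what splitting the iteration across processes means, the sketch below divides chunk ranges among ranks in plain Python. How `qp` actually assigns chunks to each rank is not shown in this notebook, so the round-robin rule used here is an assumption made purely for illustration, and `chunks_for_rank` is a hypothetical helper, not part of `qp`.

```python
def chunks_for_rank(n_total, chunk_size, rank=0, parallel_size=1):
    """Return the (start, end) chunks handled by one rank.

    Assumption for illustration: chunks are dealt out round-robin,
    so rank r takes chunks r, r + parallel_size, r + 2 * parallel_size, ...
    """
    bounds = [
        (start, min(start + chunk_size, n_total))
        for start in range(0, n_total, chunk_size)
    ]
    return bounds[rank::parallel_size]


n_total, chunk_size, parallel_size = 100, 10, 4
per_rank = [
    chunks_for_rank(n_total, chunk_size, rank, parallel_size)
    for rank in range(parallel_size)
]
for rank, chunks in enumerate(per_rank):
    print(f"rank {rank}: {chunks}")
```

Whatever the actual assignment rule, the property that matters is the same: every chunk is handled by exactly one rank, so the ranks together cover the whole file without overlap.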