# Iterate Over Tabular Data

**Author:** Eric Charles

**Last Run Successfully:** April 26, 2022


This notebook demonstrates three ways to iterate over tabular data:

1. Using the `tables_io.iteratorNative` function

2. Using the `rail.core.data.TableHandle` data handle object

3. Using the `rail.core.stage.RailStage` functionality

In [None]:
# Basic imports
import os
import rail
import tables_io
from rail.core.stage import RailStage
from rail.core.data import TableHandle

Get access to the RAIL DataStore, and set it to allow us to overwrite data.

Allowing overwrites will prevent errors when re-running cells in the notebook.

In [None]:
DS = RailStage.data_store
DS.__class__.allow_overwrite = True

Set up the path to the test data.

In [None]:
from rail.core.utils import find_rail_file
pdfs_file = find_rail_file("examples_data/testdata/test_dc2_training_9816.hdf5")

Read the data directly, using the `DataStore.read_file` function.

This will load the entire table from the file we are reading.

In [None]:
data = DS.read_file('input', TableHandle, pdfs_file)

In [None]:
print(data())

## tables_io.iteratorNative function

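This will open the HDF5 file and iterate over it, returning the data in chunks of `chunk_size` rows. Each item produced by the iterator is a tuple holding the start row, the end row, and the chunk of data.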

In [None]:
# Set up the iterator, and see what type of object the iterator is
x = tables_io.iteratorNative(pdfs_file, groupname='photometry', chunk_size=1000)
print(x)
for xx in x:
    print(xx[0], xx[1], xx[2]['id'][0])
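
To make the chunking pattern concrete, here is a minimal pure-Python sketch of an iterator that yields `(start, end, chunk)` tuples. It is an illustration only, not the `tables_io` implementation: `iter_chunks` and `toy_table` are hypothetical names, and the table is a plain dict of lists rather than an HDF5 file.

```python
def iter_chunks(table, chunk_size):
    """Yield (start, end, chunk) tuples over a dict of equal-length columns.

    Hypothetical helper for illustration; not part of tables_io.
    """
    n_rows = len(next(iter(table.values())))
    for start in range(0, n_rows, chunk_size):
        end = min(start + chunk_size, n_rows)
        # Slice every column to the same row range
        chunk = {name: column[start:end] for name, column in table.items()}
        yield start, end, chunk


# Hypothetical in-memory table with 2500 rows
toy_table = {
    "id": list(range(2500)),
    "mag": [20.0 + 0.001 * i for i in range(2500)],
}

chunk_bounds = []
for start, end, chunk in iter_chunks(toy_table, chunk_size=1000):
    chunk_bounds.append((start, end))
    # Same shape of output as the loop above: start, end, first id in the chunk
    print(start, end, chunk["id"][0])
```

The last chunk is shorter than `chunk_size` whenever the row count is not an exact multiple of it.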

## rail.core.data.TableHandle data handle object

This will create a `TableHandle` object that points to the correct file, which can be used to iterate over that file.

In [None]:
th = TableHandle('data', path=pdfs_file)
x = th.iterator(groupname='photometry', chunk_size=1000)
print(x)
for xx in x:
    print(xx[0], xx[1], xx[2]['id'][0])
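
One reason to iterate in chunks is to compute a summary without holding the whole table at once. The sketch below assumes the iterator yields `(start, end, data)` tuples, as the loops in this notebook suggest. `summarize` and `toy_chunks` are hypothetical stand-ins, so that the example runs without RAIL or an HDF5 file.

```python
def summarize(chunks, column):
    """Return (row count, mean of `column`) from an iterable of (start, end, data).

    Hypothetical helper for illustration; not part of RAIL or tables_io.
    """
    n_rows = 0
    total = 0.0
    for start, end, data in chunks:
        # Only running totals are kept; each chunk can be discarded afterwards
        n_rows += end - start
        total += sum(data[column])
    mean = total / n_rows if n_rows else float("nan")
    return n_rows, mean


# Hypothetical stand-in for the output of a chunked iterator
toy_chunks = [
    (0, 3, {"id": [0, 1, 2], "mag": [20.0, 21.0, 22.0]}),
    (3, 5, {"id": [3, 4], "mag": [23.0, 24.0]}),
]

n_rows, mean_mag = summarize(toy_chunks, "mag")
print(n_rows, mean_mag)
```

Unpacking the tuple as `start, end, data` in the `for` statement reads more clearly than indexing with `xx[0]`, `xx[1]`, and `xx[2]`.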

## rail.core.stage.RailStage functionality

This will create a `RailStage` pipeline stage that takes an HDF5 file as input,
so the `input_iterator` function can be used to iterate over that file.

In [None]:
from rail.core.util_stages import ColumnMapper

In [None]:
cm = ColumnMapper.make_stage(input=pdfs_file, chunk_size=1000, hdf5_groupname='photometry', columns=dict(id='bob'))
x = cm.input_iterator('input')
for xx in x:
    print(xx[0], xx[1], xx[2]['id'][0])
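
A stage that processes its input chunk by chunk typically applies the same transformation to every chunk. The sketch below assumes that the `columns` option above is a mapping from old column names to new ones. That is an assumption about `ColumnMapper`, and the helpers `rename_columns` and `map_chunks` are hypothetical, written in plain Python so that the example runs without RAIL.

```python
def rename_columns(chunk, mapping):
    """Return a copy of `chunk` with column names remapped; unmapped names are kept."""
    return {mapping.get(name, name): values for name, values in chunk.items()}


def map_chunks(chunks, mapping):
    """Lazily apply the renaming to each (start, end, data) tuple."""
    for start, end, data in chunks:
        yield start, end, rename_columns(data, mapping)


# Hypothetical stand-in for the output of a chunked iterator
toy_chunks = [
    (0, 3, {"id": [0, 1, 2], "mag": [20.0, 21.0, 22.0]}),
    (3, 5, {"id": [3, 4], "mag": [23.0, 24.0]}),
]

renamed = list(map_chunks(toy_chunks, {"id": "bob"}))
for start, end, data in renamed:
    print(start, end, sorted(data))
```

Because `map_chunks` is a generator, each chunk is transformed only when the consumer asks for it, so a long file never has to be renamed in one pass.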