# Working with collections and object selections

- RDataFrame reads collections as the special type [ROOT::RVec](https://root.cern/doc/master/classROOT_1_1VecOps_1_1RVec.html): for example, a branch containing an array of floating-point numbers can be read as a `ROOT::RVec<float>`.

- C-style arrays (with variable or static size), `std::vector`s and many other collection types can be read this way.

- When reading ROOT data, columns read as `ROOT::RVec<T>` do not copy the underlying array.

- `RVec` is a container similar to `std::vector` and can be used in the same way, but it also offers a rich interface to operate on all array elements at once in a vectorised fashion, similar to Python's NumPy arrays.
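To make the "like a `std::vector`, but vectorised" idea concrete, here is a minimal plain-Python sketch. `MiniRVec` and `vsqrt` are hypothetical names invented for illustration; they are not part of ROOT and only mimic a small subset of the behaviour described above: list-like access, element-wise arithmetic, comparisons that return a 0/1 mask, and indexing with a mask.

```python
import math


class MiniRVec:
    """Hypothetical stand-in for ROOT::RVec, for illustration only."""

    def __init__(self, values):
        self.values = list(values)

    # std::vector-like interface: size, element access, iteration
    def __len__(self):
        return len(self.values)

    def __iter__(self):
        return iter(self.values)

    def __getitem__(self, index):
        # Indexing with a mask keeps the elements whose mask entry is non-zero
        if isinstance(index, MiniRVec):
            if len(index) != len(self):
                raise ValueError("mask and vector must have the same size")
            return MiniRVec(v for v, m in zip(self.values, index) if m)
        return self.values[index]

    def _elementwise(self, other, op):
        # Two vectors of equal size are combined element by element
        if isinstance(other, MiniRVec):
            if len(other) != len(self):
                raise ValueError("vectors must have the same size")
            return MiniRVec(op(a, b) for a, b in zip(self.values, other.values))
        # A scalar operand is applied to every element
        return MiniRVec(op(a, other) for a in self.values)

    def __add__(self, other):
        return self._elementwise(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._elementwise(other, lambda a, b: a * b)

    def __gt__(self, other):
        # Comparisons produce a mask with one 0/1 entry per element
        return self._elementwise(other, lambda a, b: int(a > b))

    def __repr__(self):
        return f"MiniRVec({self.values})"


def vsqrt(vec):
    """Element-wise square root, analogous to sqrt() applied to an RVec."""
    return MiniRVec(math.sqrt(v) for v in vec)


v = MiniRVec([1.0, 2.0, 3.0])
print(len(v), v[1])   # used like a list/std::vector
print(v * 2)          # element-wise: MiniRVec([2.0, 4.0, 6.0])
print(v[v > 1.5])     # mask selection: MiniRVec([2.0, 3.0])
```

The real `RVec` is a C++ class with many more operations; the sketch only models the semantics relevant to this tutorial.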

```python
import ROOT

treename = "myDataset"
filename = "../../data/collections_dataset.root"
df = ROOT.RDataFrame(treename, filename)

print(f"Columns in the dataset: {df.GetColumnNames()}")
```

```
Columns in the dataset: { "E", "nPart", "px", "py" }
```


To quickly inspect the data, we can export it as a dictionary of NumPy arrays with the `AsNumpy` RDataFrame method.

Note that for each row, `E` is an array of values:

```python
npy_dict = df.AsNumpy(["E"])

for row, vec in enumerate(npy_dict["E"]):
    print(f"\nRow {row} contains:\n{vec}")
```

```
Row 0 contains:
[1.30000009e+05 9.38279986e-01 9.39570896e-01 9.39570896e-01
 9.38279986e-01 9.39570896e-01 9.38279986e-01 9.39570896e-01
 9.39570896e-01 9.38279986e-01 9.39570896e-01 9.38279986e-01
 9.38279986e-01 9.46127220e-01 9.42971799e-01 1.06159914e+05
 1.55559735e+04 6.24647739e+03 7.82957493e+02 1.07818414e+03
 1.56453232e+02 1.96152407e+01 1.17119611e+04 3.84401191e+03
 3.08882265e+02 4.74075176e+02 8.68160556e+02 2.10023672e+02
 1.42949194e+02 1.35040429e+01 1.33994463e+01 2.80286398e+00
 3.41293143e+00 2.58320273e+02 2.15754872e+02 5.89137994e+01
 8.09246766e+02 2.58900920e+00 2.13855368e-01 9.38925202e+00]

Row 1 contains:
[1.30000009e+05 9.39570896e-01 9.38279986e-01 9.39570896e-01
 9.39570896e-01 9.38279986e-01 9.38279986e-01 9.39570896e-01
 9.38279986e-01 9.38279986e-01 9.39570896e-01 9.39570896e-01
 9.38279986e-01 9.43326970e-01 9.51940387e-01 9.40538388e-01
 9.43493281e-01 9.38381843e-01 9.45394108e-01 9.39982503e-01
 1.22217219e+05 7.39572446e+03 2.16711040e+00 5.97
```

### Define a new column with operations on RVecs

```python
df1 = df.Define("good_pt", "sqrt(px*px + py*py)[E>100]")
```

Breaking down the expression `sqrt(px*px + py*py)[E>100]`:

- `px`, `py` and `E` are columns of the dataset; in each row, their values are `RVec`s

- Arithmetic operations and mathematical functions on `RVec`s, such as addition, multiplication and `sqrt`, act element by element and return an `RVec` of the same size

- `E > 100`: comparisons on `RVec`s return a mask, that is, an array of the same size that records which elements pass the selection (e.g. `[0, 1, 0, 0]` if only the second element satisfies the condition)

- `[E>100]`: indexing an `RVec` with a mask keeps only the elements for which the mask is non-zero
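The evaluation of `sqrt(px*px + py*py)[E>100]` for a single row can be traced step by step in plain Python. This is only a model of the semantics, not how ROOT executes it (RDataFrame compiles the string as C++), and the input values below are made up for illustration; they are not taken from the dataset.

```python
import math

# Illustrative values for one row; NOT taken from the dataset
px = [0.75, 1.5, -0.5, 2.0]
py = [1.0, 2.0, 1.25, 0.0]
E = [50.0, 150.0, 80.0, 300.0]

# px*px + py*py and sqrt(...) act element by element and keep the size
pt = [math.sqrt(x * x + y * y) for x, y in zip(px, py)]

# E > 100 produces a mask with one 0/1 entry per element
mask = [int(e > 100) for e in E]

# Indexing with the mask keeps only the elements where the mask is 1
good_pt = [v for v, m in zip(pt, mask) if m]

print("mask    =", mask)     # [0, 1, 0, 1]
print("good_pt =", good_pt)  # [2.5, 2.0]
```

Note that `good_pt` can be shorter than the input arrays, and can even be empty for a row where no element passes the selection.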

### Now we can plot the newly defined column values in a histogram

```python
c = ROOT.TCanvas()
h = df1.Histo1D(("pt", "pt", 16, 0, 4), "good_pt")
h.Draw()
c.Draw()
```
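When the column passed to `Histo1D` is a collection, RDataFrame fills the histogram element by element: each row contributes as many entries as its `good_pt` array has elements, and an empty array contributes none. The plain-Python sketch below models that filling for the binning used above (16 bins between 0 and 4). `fill_histogram` is a hypothetical helper written for illustration; it follows the TH1 convention that bin 0 is the underflow and bin `nbins + 1` the overflow, with each bin including its lower edge.

```python
def fill_histogram(rows, nbins, low, high):
    """Hypothetical model of filling a fixed-bin 1D histogram from a
    collection column: every element of every row is one entry."""
    counts = [0] * (nbins + 2)  # index 0: underflow, nbins + 1: overflow
    width = (high - low) / nbins
    for row in rows:
        for x in row:
            if x < low:
                counts[0] += 1
            elif x >= high:
                counts[nbins + 1] += 1
            else:
                # Guard against floating-point rounding just below `high`
                index = min(int((x - low) / width), nbins - 1)
                counts[1 + index] += 1
    return counts


# Three made-up rows of good_pt values; the second row is empty
rows = [[2.5, 2.0], [], [0.1, 3.99, 4.0, -1.0]]
counts = fill_histogram(rows, nbins=16, low=0.0, high=4.0)
print(sum(counts))  # 6 entries: one per element, none for the empty row
```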