# Exercise: RDataFrame basics

The file `../data/example_file.root` contains a `TTree` named `"dataset"` with two scalar columns, `a` and `b`.<br>
There is a normal distribution hidden in there, but to display it you have to plot the natural logarithm of `a` only for entries in which `b <= 0.5`.
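
To see why the cut and the logarithm matter, here is a minimal sketch in plain Python on synthetic data. It does not read the exercise file. The way `a` is generated below is purely an assumption for illustration (log-normal when `b <= 0.5`, uniform otherwise); the real file may be built differently.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for the tree (ASSUMPTION, not the real file contents):
# for b <= 0.5, a is log-normal, so ln(a) is normal with mean 0 and sigma 1;
# for b > 0.5, a is drawn from an unrelated uniform distribution.
rows = []
for _ in range(10_000):
    b = random.random()
    if b <= 0.5:
        a = math.exp(random.gauss(0.0, 1.0))
    else:
        a = random.uniform(0.1, 30.0)
    rows.append((a, b))

# Select entries with b <= 0.5, then take the natural logarithm of a.
ln_a = [math.log(a) for a, b in rows if b <= 0.5]

mean = sum(ln_a) / len(ln_a)
std = math.sqrt(sum((x - mean) ** 2 for x in ln_a) / len(ln_a))
print(f"selected {len(ln_a)} entries, mean = {mean:.3f}, std = {std:.3f}")
```

With this toy model the selected `ln(a)` values have a mean close to 0 and a standard deviation close to 1, whereas without the cut the uniform entries would distort the shape.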

### Useful links

- [RDataFrame class reference](https://root.cern/doc/master/classROOT_1_1RDataFrame.html)
- [RDataFrame tutorials](https://root.cern.ch/doc/master/group__tutorial__dataframe.html)

In [56]:
import ROOT as rt

In [57]:
!rootls -t ../data/example_file.root

TTree  Aug 25 14:32 2021 dataset;1 "dataset" 
  b     "b/D"   16071
  a     "a/D"   16071
  vec2  "vec2"  47968
  vec1  "vec1"  47968
  Cluster INCLUSIVE ranges:
   - # 0: [0, 1999]
  The total number of clusters is 1

In [58]:
treename = "dataset"
filename = "../data/example_file.root"
df0 = rt.RDataFrame(treename, filename)
df0.Display().Print()

+-----+-----------+----------+-----------+-----------+
| Row | a         | b        | vec1      | vec2      | 
+-----+-----------+----------+-----------+-----------+
| 0   | 0.977711  | 0.999742 | -3.220121 | 0.894402  | 
+-----+-----------+----------+-----------+-----------+
| 1   | 2.280201  | 0.484974 | -1.808350 | 0.080087  | 
|     |           |          | 0.236065  | 0.479906  | 
|     |           |          | -3.977131 | 0.519888  | 
|     |           |          | -0.293643 | 0.317273  | 
+-----+-----------+----------+-----------+-----------+
| 2   | 0.563482  | 0.392314 |           |           | 
+-----+-----------+----------+-----------+-----------+
| 3   | 3.042156  | 0.333539 | 0.727539  | 0.796610  | 
|     |           |          | -3.812584 | 0.331128  | 
|     |           |          | -2.874165 | -0.002779 | 
+-----+-----------+----------+-----------+-----------+
| 4   | 28.574399 | 0.648126 | -4.706250 | 0.427770  | 
|     |           |          | -4.449087 | -0.800848 |

In [59]:
df = df0.Filter("b<=0.5", "b <= 0.5 filter")

In [60]:
df = df.Define("ln_a", "log(a)")

In [61]:
df.Display().Print()

+-----+----------+----------+-----------+-----------+-----------+
| Row | a        | b        | ln_a      | vec1      | vec2      | 
+-----+----------+----------+-----------+-----------+-----------+
| 1   | 2.280201 | 0.484974 | 0.824264  | -1.808350 | 0.080087  | 
|     |          |          |           | 0.236065  | 0.479906  | 
|     |          |          |           | -3.977131 | 0.519888  | 
|     |          |          |           | -0.293643 | 0.317273  | 
+-----+----------+----------+-----------+-----------+-----------+
| 2   | 0.563482 | 0.392314 | -0.573619 |           |           | 
+-----+----------+----------+-----------+-----------+-----------+
| 3   | 3.042156 | 0.333539 | 1.112566  | 0.727539  | 0.796610  | 
|     |          |          |           | -3.812584 | 0.331128  | 
|     |          |          |           | -2.874165 | -0.002779 | 
+-----+----------+----------+-----------+-----------+-----------+
| 5   | 0.311502 | 0.207780 | -1.166350 |           |           | 
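
As a quick cross-check, the filter and the new column can be reproduced by hand in plain Python. The `(a, b)` pairs below are copied from rows 0 to 4 of the first `Display()` output; the dictionary name `selected` is only illustrative.

```python
import math

# (a, b) pairs copied from rows 0-4 of the first Display() output.
rows = [
    (0.977711, 0.999742),
    (2.280201, 0.484974),
    (0.563482, 0.392314),
    (3.042156, 0.333539),
    (28.574399, 0.648126),
]

# Same logic as Filter("b<=0.5") followed by Define("ln_a", "log(a)"):
# keep the original row index so it can be compared with the "Row" column.
selected = {i: math.log(a) for i, (a, b) in enumerate(rows) if b <= 0.5}

for i, ln_a in selected.items():
    print(f"row {i}: ln_a = {ln_a:.6f}")
```

Rows 0 and 4 are dropped because their `b` is above 0.5, which is why the `Row` column in the filtered table skips them. The remaining values agree with the `ln_a` column up to the rounding of the displayed inputs.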


In [None]:
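c = rt.TCanvas("c", "canvas", 800, 600)
# Book a 100-bin histogram of ln_a in [-4, 4]; title format is "title;x label;y label".
h = df.Histo1D(("h", "ln(a) for b <= 0.5;ln(a);Entries", 100, -4, 4), "ln_a")
# Histo1D is lazy: the event loop runs when the result is first accessed, here on Draw().
h.Draw()
c.Draw()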