# The ROOT file

* With ROOT, objects can be written to files

* ROOT provides its own file class, [TFile](https://root.cern/doc/master/classTFile.html), to interact with these files

* ROOT files are _binary_ and can be transparently _compressed_ to reduce disk usage

* ROOT files have a logical “file-system-like” structure

  * E.g. a directory hierarchy

Let's start with importing ROOT as usual

In [10]:
# To-do: import the ROOT module
import ROOT


This is how you create a `TFile`:

In [11]:
f = ROOT.TFile("my_file.root", "RECREATE")

<center><img src="images/tfile1.png"><center>

and how you close it (note that when `f` is destroyed, the file is closed automatically):

In [12]:
f.Close()

In [14]:
# Try it!
# To-do: Create a root file named "try_it.root" with the option that a new file is created and opened only if it doesn't exist
f_try = ROOT.TFile("try_it.root", "RECREATE")

# To-do: Now close it using the function Close()
f_try.Close()

- The file we've just created is empty, let's actually write something in it this time.

- We will write a histogram object in it. 

- Note how we create the histogram after creating the file, we write the histogram and we finally close the file.

In [15]:
# Example code
# Type with me!

f = ROOT.TFile("my_file.root", "RECREATE")

h = ROOT.TH1F("my_histo", "Example histogram",100, -4, 4)
f.WriteObject(h,h.GetName())

f.Close

<cppyy.CPPOverload at 0x74faeb107a40>

In [17]:
# Try it!
# To-do: Open the "try_it.root" file that we have created for updating the content
f_try = ROOT.TFile("try_it.root", "UPDATE")

# To-do: Fill it with the histogram that we created in the example but store it with the name "try_histo"
h1 = ROOT.TH1F("try_histo", "Example histogram", 100, -4, 4)
f_try.WriteObject(h,"try_histo")

# To-do: Close the file using the function Close()
f_try.Close()


The `"my_histo"` argument of the `TH1F` constructor is the name of the histogram, and it is also how it will be identified inside the file, we'll see that in a minute.

We should now have a file called `my_file.root` in the current directory. We will check that by using the `%%bash` magic, which allows us to run bash commands from a cell:

In [18]:
%%bash
ls -l my_file.root

-rw-r--r-- 1 jovyan jovyan 512 Jun  3 16:28 my_file.root


In [19]:
# Try it!
# To-do: check if the try_it.root file exists
!ls -l try_it.root

-rw-r--r-- 1 jovyan jovyan 3800 Jun  3 16:32 try_it.root


We can also use the `rootls` command to inspect the contents of the ROOT file. See how the file contains an object called `my_histo` of type `TH1F`.

In [20]:
%%bash
rootls -l my_file.root

TH1F  Jun 03 16:28 2025 my_histo;1 "Example histogram" 


In [21]:
# Try it!
# To-do: Check the content of try_it.root. Verify that the histogram is stored under the name "try_histo"
!rootls -l try_it.root

[1mTH1F  [0mJun 03 16:32 2025 try_histo;1 "Example histogram" 


Finally, let's see how we can programmatically retrieve the histogram we just wrote in the file. 

We can access the histogram by its name using `TFile::Get()`.

In [22]:
f = ROOT.TFile("my_file.root") # READ is the default mode

h = f.Get("my_histo")
print(h)

Name: my_histo Title: Example histogram NbinsX: 100


Info in <TFile::Recover>: my_file.root, recovered key TH1F:my_histo at address 220


In [26]:
# Try it!
# To-do: Fetch the histogram from try_it.root with Get()
f_try = ROOT.TFile("try_it.root", "READ")
h_try = f_try.Get("try_histo")

print(h_try)

Name: my_histo Title: Example histogram NbinsX: 100


# The HEP dataset

High Energy Physics data is made of many statistically independent collision events. 

Laying data into an "event class", then serialise and write out `N` instances of the class into a file would be very inefficient. 

In ROOT, a dataset is organised columns that can store elements of any C++ type:
* fundamental types: `int`, `float`
* C++ standard collections: `std::vector`, `std::map`
* User created C++ classes

The ROOT dataset is represented by the `TTree` class and is often simply called a tree. Columns in the dataset are instances of the `TBranch` class (often referred to as "branches").

<center><img src="images/dataset.png"></center>

- A `TTree` dataset can be written to a `TFile` (just like any other C++ object). 

- The ROOT format is logically and physically (on disk) a columnar format. 

- Different columns can be read from disk independently. 

- This translates into faster IO performance with respect to other dataset formats (HDF5, SQL).

In [None]:
%%bash
rootls -l data/example_file.root