# Converting CSV to ROOT Format: A Simple C++ Example

In this notebook, we’ll walk through converting a CSV file into ROOT format using C++. We’ll download an ATLAS Open Data CSV file, transform it into ROOT format, and then use ROOT to produce some plots.

<CENTER>
    <a href="http://opendata.atlas.cern" class="icons"><img src="../../images/ATLASOD.gif" style="width:40%"></a>
</CENTER>

## Required Libraries

First, we need to include some essential libraries for handling files, trees, and system commands in ROOT.

In [1]:
#include "Riostream.h"
#include "TString.h"
#include "TFile.h"
#include "TTree.h"
#include "TSystem.h"
#include <stdio.h>
#include <stdlib.h>

We also turn on JSROOT for interactive visualization.

In [2]:
%jsroot on

## Downloading the CSV File

We will download an ATLAS Open Data CSV file. This dataset comes from the 8 TeV release and is the input of a web app called the Histogram Analyzer.

In [3]:
// These two lines can be commented out if the CSV file is already provided. 
// They are used to remove any existing outreach.csv file and then download a fresh copy.

gSystem->Exec("rm -f outreach.csv"); // -f: no error message if the file does not exist yet
gSystem->Exec("curl -L -o outreach.csv https://github.com/atlas-outreach-data-tools/notebooks-collection-opendata/raw/master/8-TeV-examples/cpp/outreach.csv");

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 8596k  100 8596k    0     0  1699k      0  0:00:05  0:00:05 --:--:-- 2346k


‼️ **Note**: The first line of outreach.csv contains the column names rather than data. If it is left in place, ROOT may print a warning when the file is read into the tree; removing that line beforehand avoids the warning.
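If you prefer to strip the header line programmatically rather than by hand, the following is a minimal sketch using only the C++ standard library. The helper names `dropFirstLine` and `stripCsvHeader` are hypothetical and are not part of ROOT; the sketch assumes the file is small enough to hold in memory, which is the case for a file of a few megabytes.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns `text` without its first line (everything up to and including the
// first '\n'). If there is no newline at all, the result is empty.
std::string dropFirstLine(const std::string& text) {
    const std::string::size_type newline = text.find('\n');
    if (newline == std::string::npos) {
        return std::string();
    }
    return text.substr(newline + 1);
}

// Reads `inPath`, drops the header line, and writes the rest to `outPath`.
// Returns false if either file could not be opened or written.
bool stripCsvHeader(const std::string& inPath, const std::string& outPath) {
    std::ifstream in(inPath, std::ios::binary);
    if (!in) return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();  // slurp the whole file into memory
    std::ofstream out(outPath, std::ios::binary);
    if (!out) return false;
    out << dropFirstLine(buffer.str());
    return static_cast<bool>(out);
}
```

A call such as `stripCsvHeader("outreach.csv", "outreach_noheader.csv")` would write a header-free copy; the output file name here is only an example.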

## Defining the CSV File in ROOT

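Next, we build the path of the directory that contains the current macro. The resulting `dir` string is not used by the cells below, which read `outreach.csv` from the working directory; the CSV layout itself (comma-separated values) is declared later, when the file is read into the tree.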

In [4]:
TString dir = gSystem->UnixPathName(__FILE__);
dir.ReplaceAll("outreach.C","");
dir.ReplaceAll("/./","/");

## Creating the ROOT File
Now, we define a new ROOT file to store the converted data from the CSV file.

In [5]:
TFile *f = new TFile("outreach.root","RECREATE");

## Setting Up the ROOT Tree

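This is the key step. We define the structure of the new ROOT tree to match the columns in the CSV file so that each value is stored with the right type. In the branch descriptor, every column is listed as `name/type`, where `/I` marks a 32-bit integer and `/F` a 32-bit float, and the last argument (`','`) sets the delimiter.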

In [6]:
TTree *tree = new TTree("ntuple","data from csv file");
// The CSV file contains the following columns: type, Channel, NJets, MET, Mll, LepDeltaPhi, METLLDeltaPhi, SumLepPt, BTags, weight
tree->ReadFile("outreach.csv","type/I:Channel/I:NJets/I:MET/F:Mll/F:LepDeltaPhi/F:METLLDeltaPhi/F:SumLepPt/F:BTags/F:weight/F",',');
f->Write();



‼️ **Note**: You may see a warning when running this cell. It occurs when the first line of the CSV still contains the column names: ROOT cannot parse them as numbers, so it skips that line and reads the remaining rows normally.
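To make the `name/type` layout of the branch descriptor concrete, here is a small stand-alone sketch that splits such a string into (name, type code) pairs. The function `parseBranchDescriptor` is a hypothetical helper written for illustration; it is not how ROOT parses the descriptor internally, and it assumes every field carries an explicit one-letter type code, as in the descriptor used above.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Splits a descriptor such as "type/I:MET/F" into (name, type code) pairs.
// Simplification for this sketch: every field must have the form name/X,
// with X a single character. ROOT itself accepts more forms than this.
std::vector<std::pair<std::string, char>>
parseBranchDescriptor(const std::string& descriptor) {
    std::vector<std::pair<std::string, char>> fields;
    std::string::size_type start = 0;
    while (start <= descriptor.size()) {
        std::string::size_type end = descriptor.find(':', start);
        if (end == std::string::npos) end = descriptor.size();
        const std::string token = descriptor.substr(start, end - start);
        const std::string::size_type slash = token.find('/');
        if (slash == std::string::npos || slash == 0 ||
            slash + 2 != token.size()) {
            throw std::invalid_argument("expected name/T, got: " + token);
        }
        fields.emplace_back(token.substr(0, slash), token[slash + 1]);
        start = end + 1;  // continue after the ':' separator
    }
    return fields;
}
```

Running it on the descriptor from the cell above yields ten fields, which is a quick way to check that the descriptor and the CSV header list the same number of columns.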

## Verifying the Output

Let's check the size of the generated files. You’ll notice that the ROOT file is considerably smaller, roughly 36% of the original CSV size (3.0 MB versus 8.4 MB in the listing below), which is an advantage when dealing with large datasets.

In [7]:
system("ls -lhrt outreach.*");

-rw-r--r--  1 marianaiv  staff   8.4M Sep 11 16:31 outreach.csv
-rw-r--r--  1 marianaiv  staff   3.0M Sep 11 16:31 outreach.root
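The sizes printed by `ls -lh` are rounded, so the ratio can also be computed from the exact byte counts. The sketch below uses only the C++ standard library; `fileSizeBytes` and `sizeRatio` are hypothetical helper names introduced for this example.

```cpp
#include <fstream>
#include <string>

// Returns the size of a file in bytes, or -1 if it cannot be opened.
long long fileSizeBytes(const std::string& path) {
    // std::ios::ate positions the stream at the end, so tellg() is the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long long>(in.tellg());
}

// Ratio of the converted size to the original size.
// Returns -1.0 if the original size is not positive.
double sizeRatio(long long convertedBytes, long long originalBytes) {
    if (originalBytes <= 0) return -1.0;
    return static_cast<double>(convertedBytes) /
           static_cast<double>(originalBytes);
}
```

For instance, `sizeRatio(fileSizeBytes("outreach.root"), fileSizeBytes("outreach.csv"))` gives the fraction directly, provided both files exist.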


## Loading the ROOT File
We can now open the newly created ROOT file and start analyzing the data it contains.

In [8]:
TFile *_file0 = TFile::Open("outreach.root");

## Creating Basic Plots

Let’s generate some simple plots to visualize the data from our ROOT file.

### 3D Scatter Plot: Missing Transverse Energy (MET) vs. Invariant Mass of Lepton Pairs (Mll) vs. Angular Separation of Leptons (LepDeltaPhi)
In this plot, we’ll visualize the relationships between three key variables:

- **MET (Missing Transverse Energy)**: The energy not accounted for in the visible particles, often due to neutrinos or other invisible particles.
- **Mll**: The invariant mass of the two leptons in the event.
- **LepDeltaPhi**: The angular separation between the two leptons in the event.
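The CSV file already contains these quantities, so nothing needs to be recomputed. For readers who want to see what the last two variables mean, here is a minimal sketch of how they are commonly computed from each lepton's transverse momentum, pseudorapidity and azimuthal angle. It assumes massless leptons and the usual azimuthal definition of the angular separation; whether the dataset uses exactly these definitions is an assumption, and the function names are hypothetical.

```cpp
#include <cmath>

// Azimuthal separation between two particles, folded into [0, pi].
double deltaPhi(double phi1, double phi2) {
    const double pi = std::acos(-1.0);
    double d = std::fabs(std::fmod(phi1 - phi2, 2.0 * pi));
    if (d > pi) d = 2.0 * pi - d;
    return d;
}

// Invariant mass of two leptons treated as massless:
//   m^2 = 2 * pt1 * pt2 * (cosh(eta1 - eta2) - cos(phi1 - phi2))
// The result has the same units as pt1 and pt2.
double invariantMass(double pt1, double eta1, double phi1,
                     double pt2, double eta2, double phi2) {
    const double m2 = 2.0 * pt1 * pt2 *
                      (std::cosh(eta1 - eta2) - std::cos(phi1 - phi2));
    return std::sqrt(m2 > 0.0 ? m2 : 0.0);  // guard against rounding below 0
}
```

As a sanity check, two back-to-back leptons at the same pseudorapidity, each with a transverse momentum of 45, give an invariant mass of 90 in the same units.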

In [9]:
TCanvas *c3D = new TCanvas("c3D","c3D",10,10,400,400);
ntuple->Draw("MET:Mll:LepDeltaPhi","MET>0.");
c3D->Draw();

This 3D scatter plot helps to see if there is any correlation between these variables in events where MET is greater than 0. For example, large values of MET may indicate events involving neutrinos or other unseen particles.

### 2D Heatmap: Invariant Mass of Lepton Pairs (Mll) vs. Missing Transverse Energy (MET) Weighted by Event Weight
Next, we’ll plot a 2D heatmap in which the color of each bin represents the sum of the event weights in that bin. This plot shows the relationship between Mll and MET and how the weighted events are distributed across these variables.

In [10]:
TCanvas *cz = new TCanvas("cz","cz",10,10,400,400);
// Multiplying by weight makes each event contribute its weight to the bin;
// a purely boolean selection would count every passing event once.
ntuple->Draw("Mll:MET","weight*(weight>-999)","colz");
cz->Draw();

The heatmap is particularly useful for identifying regions of interest, such as where certain mass ranges of leptons coincide with large missing energy, which could indicate new physics or particle decays.
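In general, a weighted 2D histogram adds each event's weight to the bin that contains the event, instead of adding 1. The stand-alone sketch below illustrates that idea with plain vectors; `weightedHist2D` is a hypothetical helper with fixed, equal-width bins, not ROOT's histogram implementation.

```cpp
#include <cstddef>
#include <vector>

// Fills an nx-by-ny grid of equal-width bins with the sum of weights.
// Bin (ix, iy) is stored at index iy * nx + ix. Points outside
// [xMin, xMax) x [yMin, yMax) are dropped in this sketch.
std::vector<double> weightedHist2D(const std::vector<double>& x,
                                   const std::vector<double>& y,
                                   const std::vector<double>& w,
                                   int nx, double xMin, double xMax,
                                   int ny, double yMin, double yMax) {
    std::vector<double> sumW(static_cast<std::size_t>(nx) * ny, 0.0);
    for (std::size_t i = 0; i < x.size() && i < y.size() && i < w.size(); ++i) {
        if (x[i] < xMin || x[i] >= xMax || y[i] < yMin || y[i] >= yMax) continue;
        int ix = static_cast<int>((x[i] - xMin) / (xMax - xMin) * nx);
        int iy = static_cast<int>((y[i] - yMin) / (yMax - yMin) * ny);
        if (ix >= nx) ix = nx - 1;  // protect against floating-point rounding
        if (iy >= ny) iy = ny - 1;
        sumW[static_cast<std::size_t>(iy) * nx + ix] += w[i];
    }
    return sumW;
}
```

With unit weights this reduces to a plain event count per bin, which is why the choice of weight changes what the color scale means.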

### 2D Heatmap: Invariant Mass of Lepton Pairs (Mll) vs. Angular Separation of Leptons (LepDeltaPhi)
Finally, let’s look at how the invariant mass of lepton pairs correlates with the angular separation between the leptons. This can be useful in studying the kinematics of the event, particularly when searching for new physics or specific particle decay signatures.

In [11]:
TCanvas *c2D = new TCanvas("c2D","c2D",10,10,400,400);
ntuple->Draw("Mll:LepDeltaPhi","MET>0.","colz");
c2D->Draw();

Here, we filter the events where MET is greater than zero, which may indicate unseen particles (like neutrinos). The plot allows us to analyze if there is a pattern between the angular separation of the leptons and their invariant mass.

## Further Analysis

Feel free to experiment with more complex analyses based on the other examples available in this repository. You can explore additional variables and create more sophisticated plots.