# Analysis of an open dataset with TDataFrame
<hr style="border-top-width: 4px; border-top-color: #34609b;">
This ROOTbook produces a plot of the dimuon invariant mass spectrum starting from a subset of the CMS collision events of Run2010B. 
Every entry in the dataset represents a muon pair: the suffixes 1 and 2 refer to the two muons of the pair and *C* is the muon charge. Among the available columns, with their types, are:
- *E1*   double
- *eta1* double
- *phi1* double
- *px1*  double
- *py1*  double
- *pz1*  double
- *C1*   Long64_t
- *E2*   double
- *eta2* double
- *phi2* double
- *px2*  double
- *py2*  double
- *pz2*  double
- *C2*   Long64_t

Dataset Reference:<br>
McCauley, T. (2014). Dimuon event information derived from the Run2010B public Mu dataset. CERN Open Data Portal. DOI: [10.7483/OPENDATA.CMS.CB8H.MFFA](http://opendata.cern.ch/record/700).
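The kinematic columns are not independent: the transverse momentum (the *pt1* and *pt2* columns used in the cuts below), the pseudorapidity and the azimuthal angle all follow from the Cartesian momentum components. The sketch below illustrates the standard relations in plain C++; the `Momentum` struct and the helper names are hypothetical, and the code does not read the dataset. Note that the pseudorapidity is undefined for a muon with zero transverse momentum.

```cpp
#include <cmath>

// Hypothetical helper type holding the Cartesian momentum of one muon (GeV).
struct Momentum {
    double px, py, pz;
};

// Transverse momentum: magnitude of the momentum projected on the plane
// orthogonal to the beam (z) axis.
double pt(const Momentum &p) { return std::hypot(p.px, p.py); }

// Azimuthal angle in [-pi, pi], the range used for the phi histograms.
double phi(const Momentum &p) { return std::atan2(p.py, p.px); }

// Pseudorapidity eta = -ln(tan(theta/2)), written in the equivalent and
// numerically convenient form asinh(pz / pt). Undefined when pt == 0.
double eta(const Momentum &p) { return std::asinh(p.pz / pt(p)); }
```

For example, a muon with momentum (3, 4, 0) GeV has a transverse momentum of 5 GeV and zero pseudorapidity, since it is emitted perpendicular to the beam.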

## Objective of this exercise
The objective is to illustrate how to use TDataFrame to produce the plot of the dimuon invariant mass spectrum, apply some cuts, inspect their efficiency and create some control plots.
In some sense, this is a minimal form of what could be conceptually called an "analysis".

### TDataFrame creation
Let's create a *TDataFrame*, which is located in the *ROOT::Experimental* namespace. The first argument is the name of the tree to read (here *data*); the file name(s) can be specified as a single string (a path, a URL or a glob) or as a list of strings.

In [1]:
auto fileName = "https://root.cern/files/teaching/CMS_Open_Dataset.root";
auto tdf = ROOT::Experimental::TDataFrame("data", fileName);

### Definition of the "analysis" cuts
We need to apply some quality cuts to our muons:
- Central muons, with an absolute pseudorapidity smaller than 2.3
- Muons of opposite charge :)
- Muons with transverse momentum greater than 10 GeV

We define the charge cut as a C++ lambda and the other two as strings, which TDataFrame compiles just in time.
The filtered dataframe needs to be saved in a variable: we'll need it later.

In [2]:
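// Charge cut as a C++ lambda: keep pairs whose muons have opposite charge
auto chargeCut = [](Long64_t c1, Long64_t c2){ return c1 != c2;};
// Pseudorapidity and transverse momentum cuts as strings, compiled just in time
auto etaCutStr = "fabs(eta1) < 2.3 && fabs(eta2) < 2.3";
auto ptCutStr = "pt1 > 10 && pt2 > 10";
// Chain the three named filters; the names are used later by Report()
auto tdf_f = tdf.Filter(chargeCut, {"C1", "C2"}, "Opposite Charge")
                .Filter(etaCutStr, "|Eta| < 2.3")
                .Filter(ptCutStr, "Pt > 10");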
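The selection applied by the chain of filters above can be mirrored, as a sketch, by plain C++ predicates on a hypothetical `MuonPair` record that holds only the columns the cuts need. Chained filters behave like a logical AND evaluated in order: an entry rejected by one filter is not tested by the following ones, which is also how C++ `&&` short-circuits.

```cpp
#include <cmath>

// Hypothetical flat record mirroring the columns used by the three filters.
struct MuonPair {
    long long C1, C2;   // charges (Long64_t in the dataset)
    double eta1, eta2;  // pseudorapidities
    double pt1, pt2;    // transverse momenta in GeV
};

// Same logic as the lambda passed to Filter: keep opposite-charge pairs.
bool oppositeCharge(const MuonPair &m) { return m.C1 != m.C2; }

// Same logic as the "|Eta| < 2.3" string filter.
bool central(const MuonPair &m) {
    return std::fabs(m.eta1) < 2.3 && std::fabs(m.eta2) < 2.3;
}

// Same logic as the "Pt > 10" string filter.
bool hardEnough(const MuonPair &m) { return m.pt1 > 10 && m.pt2 > 10; }

// The full selection: the cuts are evaluated in the same order as the
// Filter chain and evaluation stops at the first failing cut.
bool passesSelection(const MuonPair &m) {
    return oppositeCharge(m) && central(m) && hardEnough(m);
}
```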



### Creation of a column holding the invariant mass of the dimuon system
Here we create a new column, which does not exist in the original dataset. We compute it with the usual invariant mass formula, $M = \sqrt{(E_1+E_2)^2 - |\vec{p}_1+\vec{p}_2|^2}$, passed to *Define* as a string.
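For reference, the formula passed to *Define* can be written as an ordinary C++ function. This is a sketch with a hypothetical `FourMomentum` type (energies and momenta in GeV, natural units); unlike the string used in the notebook, it clamps to zero the small negative values of $M^2$ that rounding can produce for nearly massless systems, an assumption added here for robustness.

```cpp
#include <cmath>

// Hypothetical four-momentum type (E, px, py, pz) in GeV, with c = 1.
struct FourMomentum {
    double E, px, py, pz;
};

// Invariant mass of a two-particle system:
// M^2 = (E1 + E2)^2 - |p1 + p2|^2, the same expression as the Define string.
double invariantMass(const FourMomentum &a, const FourMomentum &b) {
    const double E = a.E + b.E;
    const double px = a.px + b.px;
    const double py = a.py + b.py;
    const double pz = a.pz + b.pz;
    const double m2 = E * E - (px * px + py * py + pz * pz);
    // Rounding can make m2 slightly negative; clamp before the square root.
    return std::sqrt(std::fmax(m2, 0.0));
}
```

For example, two muons with E = 5 GeV and momenta of +3 GeV and -3 GeV along z have a total momentum of zero, so the invariant mass equals the total energy, 10 GeV.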


In [None]:

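// Invariant mass of the muon pair from the sum of the two four-momenta
auto invMassFormula = "sqrt(pow(E1+E2, 2) - (pow(px1+px2, 2) + pow(py1+py2, 2) + pow(pz1+pz2, 2)))";
auto tdf_fd = tdf_f.Define("invMass", invMassFormula);

// Control plots: transverse momenta of both muons, before any cut
auto pt1_h = tdf.Histo1D("pt1");
auto pt2_h = tdf.Histo1D("pt2");

// Invariant mass spectrum and phi1 vs phi2 map, after the cuts
auto invMass_h = tdf_fd.Histo1D({"invMass","CMS Opendata;#mu#mu mass [GeV];Events",512,5,110},"invMass");
auto pi = TMath::Pi();
auto phis_h = tdf_fd.Histo2D({"phis", "CMS Opendata;#phi_{1};#phi_{2}", 64, -pi, pi, 64, -pi, pi}, "phi1", "phi2");

// Results are filled lazily: a single event loop runs the first time one of them is accessed
TCanvas muonsPts;
pt1_h->Draw("PL PLC PMC");
pt2_h->Draw("Same PL PLC PMC");
muonsPts.Draw();

TCanvas phis;
phis_h->Draw("col");
phis.Draw();

// Print, for each named filter, the number of entries passing it and its efficiency
tdf.Report();

// Invariant mass spectrum on log-log axes
TCanvas invMass;
invMass_h->Draw();
invMass.SetLogy();
invMass.SetLogx();
invMass.SetGrid();
invMass.Draw();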
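*Report* summarises, for each named filter, how many entries reached it, how many passed and the resulting efficiency. The bookkeeping behind such a cut flow can be sketched in plain C++ as below; `NamedCut`, `CutStats` and `cutFlow` are hypothetical names, and this is a model of the idea rather than how TDataFrame implements it.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical named cut, standing in for a named Filter in the chain.
template <typename Event>
struct NamedCut {
    std::string name;
    std::function<bool(const Event &)> pass;
};

// Counters for one cut: entries that reached it and entries that passed it.
struct CutStats {
    std::string name;
    std::size_t all = 0;
    std::size_t pass = 0;
    double efficiency() const { return all ? double(pass) / all : 0.0; }
};

// Apply the cuts in order to every event, stopping at the first failing cut,
// and count how many entries reach and pass each of them.
template <typename Event>
std::vector<CutStats> cutFlow(const std::vector<Event> &events,
                              const std::vector<NamedCut<Event>> &cuts) {
    std::vector<CutStats> stats;
    for (const auto &c : cuts) {
        CutStats s;
        s.name = c.name;
        stats.push_back(s);
    }
    for (const auto &ev : events) {
        for (std::size_t i = 0; i < cuts.size(); ++i) {
            ++stats[i].all;
            if (!cuts[i].pass(ev)) break;
            ++stats[i].pass;
        }
    }
    return stats;
}
```

Because each cut only sees the entries that survived the previous ones, the overall (cumulative) efficiency of the chain is the `pass` count of the last cut divided by the `all` count of the first.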

A little extra: interactive ROOT JavaScript visualisation.