![ROOT Logo](http://root.cern.ch/img/logos/ROOT_Logo/website-banner/website-banner-%28not%20root%20picture%29.jpg)
<br />
# **NanoAOD files processed with Distributed RDataFrame in Python**
<hr style="border-top-width: 4px; border-top-color: #34609b;">


The [`df102_NanoAODDimuonAnalysis`](https://root.cern.ch/doc/master/df102__NanoAODDimuonAnalysis_8py.html) ROOT tutorial, adapted to run with PyRDF.

The NanoAOD-like input files contain 66 million events from CMS Open Data with muon candidates from the 2012 dataset ([DOI: 10.7483/OPENDATA.CMS.YLIC.86ZZ](http://opendata.cern.ch/record/6004) and [DOI: 10.7483/OPENDATA.CMS.M5AD.Y3V3](http://opendata.cern.ch/record/6030)).

The macro selects muon pairs and produces a histogram of the dimuon invariant mass spectrum, which shows resonances up to the Z mass. Note that the bump at 30 GeV is not a resonance but a trigger effect.
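For reference, the dimuon invariant mass is obtained by summing the two muons' four-momenta. The sketch below is a plain-Python version of that computation, assuming each muon is described by (pt, eta, phi, mass) in GeV as in the NanoAOD `Muon_*` branches. The function `invariant_mass` is a hypothetical illustration, not ROOT's `InvariantMass` implementation.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(pts, etas, phis, masses):
    """Invariant mass of the sum of the given particles' four-vectors."""
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for pt, eta, phi, m in zip(pts, etas, phis, masses):
        e, px, py, pz = four_vector(pt, eta, phi, m)
        e_sum += e
        px_sum += px
        py_sum += py
        pz_sum += pz
    m2 = e_sum**2 - px_sum**2 - py_sum**2 - pz_sum**2
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(m2, 0.0))


# Two back-to-back muons with pt = 45.6 GeV each form a Z-like pair
print(invariant_mass([45.6, 45.6], [0.0, 0.0], [0.0, math.pi], [0.1057, 0.1057]))
```

Because it works on full four-vectors, the same function also covers muon pairs with different pseudorapidities.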

Some more details about the dataset:
- It contains about 66 million events (muon and electron collections, plus some other information, e.g. about primary vertices)
- It spans two compressed ROOT files stored on EOS, with a total size of about 7.5 GB.

Date: April 2019<br>
Author: Stefan Wunsch (KIT, CERN)<br>
Adapted to PyRDF: Javier Cervantes Villanueva (CERN)

**Requirements: ROOT-HEAD (use the Bleeding Edge software stack in the SWAN configuration)**

In [1]:
import PyRDF
import ROOT

# Configure PyRDF to run on the AWS (Lambda) backend, splitting the dataset into 32 partitions
PyRDF.use("AWS", {'npartitions': '32'})

# Only needed with the Spark backend (`sc` is the SparkContext): ship PyRDF to the workers
#sc.addPyFile('./PyRDF.zip')

# Create dataframe from NanoAOD files
files = [
    "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root",
    "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012C_DoubleMuParked.root"
]

df = PyRDF.RDataFrame("Events", files);

Welcome to JupyROOT 6.22/07
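With `npartitions` set to 32, the backend divides the input into 32 chunks and, as the log of the next cell shows, invokes one Lambda per chunk. The sketch below illustrates the idea with a hypothetical `split_entries` helper that cuts a range of entries into contiguous, near-equal pieces. PyRDF's actual splitting logic may differ (for instance by aligning ranges to the files' entry clusters), so treat this only as an approximation.

```python
def split_entries(n_entries, n_partitions):
    """Split entries [0, n_entries) into n_partitions contiguous (start, end) ranges.

    Hypothetical helper for illustration: the first `n_entries % n_partitions`
    ranges get one extra entry, so range sizes differ by at most one.
    """
    if n_partitions <= 0:
        raise ValueError("n_partitions must be positive")
    base, extra = divmod(n_entries, n_partitions)
    ranges = []
    start = 0
    for i in range(n_partitions):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# A round 66 million entries as a stand-in for this dataset, over 32 partitions
ranges = split_entries(66_000_000, 32)
print(len(ranges), ranges[0], ranges[-1])  # → 32 (0, 2062500) (63937500, 66000000)
```

Each (start, end) pair would then be processed independently, and the partial histograms merged at the end.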


In [3]:
import logging
# For simplicity, select only events with exactly two muons and require opposite charge
df_2mu = df.Filter("nMuon == 2", "Events with exactly two muons");
df_os  = df_2mu.Filter("Muon_charge[0] != Muon_charge[1]", "Muons with opposite charge");

# Compute invariant mass of the dimuon system
# Uses InvariantMass, provided by RVec (see the reference for other RVec helper functions)
df_mass = df_os.Define("Dimuon_mass", "InvariantMass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")

# Make histogram of dimuon mass spectrum
h = df_mass.Histo1D(("Dimuon_mass", "Dimuon_mass", 30000, 0.25, 300), "Dimuon_mass")

# Report is not supported yet in this distributed backend
# report = df_mass.Report()

# Print INFO-level log messages, e.g. from the backend while it invokes the lambdas
logging.basicConfig(level=logging.INFO)

# Produce pdf of the plot
ROOT.gStyle.SetOptStat(0)
ROOT.gStyle.SetTextFont(42)

c = ROOT.TCanvas("c", "", 800, 700);
c.SetLogx(); c.SetLogy();
h.SetTitle("");
h.GetXaxis().SetTitle("m_{#mu#mu} (GeV)"); h.GetXaxis().SetTitleSize(0.04);
h.GetYaxis().SetTitle("N_{Events}"); h.GetYaxis().SetTitleSize(0.04);
h.Draw();

label = ROOT.TLatex()
label.SetNDC(True);

label.DrawLatex(0.175, 0.740, "#eta");
label.DrawLatex(0.205, 0.775, "#rho,#omega");
label.DrawLatex(0.270, 0.740, "#phi");
label.DrawLatex(0.400, 0.800, "J/#psi");
label.DrawLatex(0.415, 0.670, "#psi'");
label.DrawLatex(0.485, 0.700, "Y(1,2,3S)");
label.DrawLatex(0.755, 0.680, "Z");
label.SetTextSize(0.040); label.DrawLatex(0.100, 0.920, "#bf{CMS Open Data}");
label.SetTextSize(0.030); label.DrawLatex(0.630, 0.920, "#sqrt{s} = 8 TeV, L_{int} = 11.6 fb^{-1}");

c.SaveAs("dimuon_spectrum.pdf");
# Report is not supported yet in this distributed backend
#report.Print();

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:root:Before lambdas invoke. Number of lambdas: 32
INFO:root:New lambda - 0
INFO:root:New lambda - 1
INFO:root:New lambda - 2
INFO:root:New lambda - 3
INFO:root:New lambda - 4
INFO:root:New lambda - 5
INFO:root:New lambda - 6
INFO:root:New lambda - 7
INFO:root:New lambda - 8
INFO:root:New lambda - 9
INFO:root:New lambda - 10
INFO:root:New lambda - 11
INFO:root:New lambda - 12
INFO:root:New lambda - 13
INFO:root:New lambda - 14
INFO:root:New lambda - 15
INFO:root:New lambda - 16
INFO:root:New lambda - 17
INFO:root:New lambda - 18
INFO:root:New lambda - 19
INFO:root:New lambda - 20
INFO:root:New lambda - 21
INFO:root:New lambda - 22
INFO:root:New lambda - 23
INFO:root:New lambda - 24
INFO:root:New lambda - 25
INFO:root:New lambda - 26
INFO:root:New lambda - 27
INFO:root:New lambda - 28
INFO:root:New lambda - 29
INFO:root:New lambda - 30
INFO:root:New lambda - 31
INFO:root:All lambdas have been 

(32, 17.27684760093689, 45.67579483985901, 18.204992055892944)


Info in <TCanvas::Print>: pdf file dimuon_spectrum.pdf has been created
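The selection and histogramming in the cell above can be mimicked in plain Python to make their semantics explicit. In this sketch the events are a hypothetical list of dictionaries keyed by the NanoAOD branch names, and `find_bin` follows ROOT's fixed-width binning convention (bin 0 is the underflow, bins 1 to N are the regular bins, N + 1 is the overflow) for the 30000-bin axis from 0.25 to 300 GeV. Both helpers are illustrative, not part of PyRDF.

```python
NBINS, XMIN, XMAX = 30000, 0.25, 300.0


def passes_selection(event):
    """Exactly two muons with opposite charge, as in the two Filter calls."""
    return event["nMuon"] == 2 and event["Muon_charge"][0] != event["Muon_charge"][1]


def find_bin(x, nbins=NBINS, xmin=XMIN, xmax=XMAX):
    """Bin index on a fixed-width axis: 0 = underflow, nbins + 1 = overflow."""
    if x < xmin:
        return 0
    if x >= xmax:
        return nbins + 1
    # Clamp to guard against rounding for values just below xmax
    return min(nbins, 1 + int(nbins * (x - xmin) / (xmax - xmin)))


# Hypothetical toy events carrying only the branches the selection needs
events = [
    {"nMuon": 2, "Muon_charge": [1, -1]},     # kept
    {"nMuon": 2, "Muon_charge": [1, 1]},      # rejected: same charge
    {"nMuon": 3, "Muon_charge": [1, -1, 1]},  # rejected: three muons
]
selected = [ev for ev in events if passes_selection(ev)]
print(len(selected))           # → 1
print((XMAX - XMIN) / NBINS)   # bin width in GeV
print(find_bin(91.2))          # bin that a Z-mass pair would fill
```

With 30000 bins the bin width is about 0.01 GeV, fine enough to resolve the narrow low-mass resonances on the logarithmic x axis.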


The previous cell computed the result and saved the plot as a PDF file. We can now display it interactively with JSROOT without triggering the event loop again, since the results have been cached:
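RDataFrame results are lazy: the event loop runs the first time a result is accessed, and the filled object is kept afterwards. A minimal sketch of this compute-once pattern, using a hypothetical `LazyResult` class rather than PyRDF's own result proxies:

```python
class LazyResult:
    """Hypothetical stand-in for a lazy result proxy: compute once, then reuse."""

    def __init__(self, compute):
        self._compute = compute  # callable standing in for the (expensive) event loop
        self._value = None
        self._done = False

    def get(self):
        # Run the computation only on the first access, then return the cached value
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


runs = []


def event_loop():
    runs.append(1)  # record each time the "event loop" actually executes
    return "histogram"


result = LazyResult(event_loop)
print(len(runs))  # → 0: nothing has run yet
result.get()      # first access triggers the computation
result.get()      # second access reuses the cached value
print(len(runs))  # → 1
```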

In [4]:
%jsroot on

ROOT.gStyle.SetTextFont(42)

c2 = ROOT.TCanvas("c2", "", 800, 700);
c2.SetLogx(); c2.SetLogy();

h.SetTitle("");
h.GetXaxis().SetTitle("m_{#mu#mu} (GeV)"); h.GetXaxis().SetTitleSize(0.04);
h.GetYaxis().SetTitle("N_{Events}"); h.GetYaxis().SetTitleSize(0.04);
h.SetStats(False)
h.Draw();

label = ROOT.TLatex()
label.SetNDC(True);

label.DrawLatex(0.175, 0.740, "#eta");
label.DrawLatex(0.205, 0.775, "#rho,#omega");
label.DrawLatex(0.270, 0.740, "#phi");
label.DrawLatex(0.400, 0.800, "J/#psi");
label.DrawLatex(0.415, 0.670, "#psi'");
label.DrawLatex(0.485, 0.700, "Y(1,2,3S)");
label.DrawLatex(0.755, 0.680, "Z");
label.SetTextSize(0.040); label.DrawLatex(0.100, 0.920, "#bf{CMS Open Data}");
label.SetTextSize(0.030); label.DrawLatex(0.630, 0.920, "#sqrt{s} = 8 TeV, L_{int} = 11.6 fb^{-1}");

label.DrawClone("Same")
c2.Draw()