# Reading histograms

One of the most common tasks is translating [ROOT](https://root.cern.ch) histograms into the HEPData format. `hepdata_lib` helps with that, and this notebook demonstrates how.

As explained in the [Getting started notebook](Getting_started.ipynb), a `Submission` needs to exist or be created. Here, we'll just create one without any additional information:

In [1]:
from hepdata_lib import Submission
submission = Submission()

Welcome to JupyROOT 6.14/00


Each plot is represented by a `Table`. In this example it is Figure 4a from page 12 (upper left) of the [publication](https://cms-results.web.cern.ch/cms-results/public-results/publications/B2G-17-009/index.html), which shows the distribution of the reconstructed B quark mass for the data as well as for the individual simulated processes. Let's add this information to the `Table`, together with a description, the location in the paper, keywords and the plot itself (used for thumbnail creation):

In [2]:
from hepdata_lib import Table
table = Table("Figure 4a")
table.description = "Distribution in the reconstructed B quark mass, after applying all selections to events with no forward jet, compared to the background distributions estimated before fitting. The plot refers to the low-mass mB analysis. The expectations for signal MC events are given by the blue histogram lines. Different contributions to background are indicated by the colour-filled histograms. The grey-hatched error band shows total uncertainties in the background expectation. The ratio of observations to background expectations is given in the lower panel, together with the total uncertainties prior to fitting, indicated by the grey-hatched band."
table.location = "Data from Figure 4 (upper left), located on page 12."
table.keywords["observables"] = ["N"]
table.add_image("example_inputs/CMS-B2G-17-009_Figure_004-a.pdf")

The individual plot components are stored in three ROOT files: one for the background processes (one histogram per process plus the total), one for the data, and one for the signal process. All histograms here are of type [TH1](https://root.cern.ch/doc/master/classTH1.html), but 2-dimensional [TH2](https://root.cern.ch/doc/master/classTH2.html) histograms can be read in the same way using `read_hist_2d(...)` instead of `read_hist_1d(...)`:

In [3]:
from hepdata_lib import RootFileReader

reader = RootFileReader("example_inputs/mlfit_lm_1000.root")
reader_data = RootFileReader("example_inputs/Data_cat0_singleH.root")
reader_signal = RootFileReader("example_inputs/BprimeBToHB1000_cat0_singleH.root")

TotalBackground = reader.read_hist_1d("shapes_prefit/cat0_singleH/total_background")
TT = reader.read_hist_1d("shapes_prefit/cat0_singleH/TT")
QCD = reader.read_hist_1d("shapes_prefit/cat0_singleH/QCDTT")
WJets = reader.read_hist_1d("shapes_prefit/cat0_singleH/WJets")
ZJets = reader.read_hist_1d("shapes_prefit/cat0_singleH/ZJets")

Data = reader_data.read_hist_1d("h_bprimemass_SRlm")

signal = reader_signal.read_hist_1d("h_bprimemass_SRlm")


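Since the background file contains one histogram per process plus the total, a quick consistency check can be worthwhile before building the table. The sketch below is plain Python with made-up numbers standing in for the `y` lists (bin contents) of `TT`, `QCD`, `WJets` and `ZJets`. The helper `sums_to_total` is hypothetical, and the check only makes sense under the assumption that these four processes are the complete set of backgrounds entering the total.

```python
import math


def sums_to_total(components, total, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if the bin-by-bin sum of `components` matches `total`.

    `components` is a list of lists of bin contents (one list per process);
    `total` is a list with one entry per bin. Hypothetical helper.
    """
    if any(len(c) != len(total) for c in components):
        raise ValueError("all histograms must have the same number of bins")
    for i, expected in enumerate(total):
        summed = sum(c[i] for c in components)
        if not math.isclose(summed, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True


# Made-up bin contents standing in for TT["y"], QCD["y"], WJets["y"], ZJets["y"]
tt_y = [10.0, 8.0, 3.0]
qcd_y = [5.0, 4.0, 1.0]
wjets_y = [1.0, 0.5, 0.25]
zjets_y = [0.5, 0.25, 0.125]
total_y = [16.5, 12.75, 4.375]

print(sums_to_total([tt_y, qcd_y, wjets_y, zjets_y], total_y))  # True
```

With the real inputs one would pass `[TT["y"], QCD["y"], WJets["y"], ZJets["y"]]` and `TotalBackground["y"]`; a `False` result would suggest that further processes contribute to the total.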
The content of each histogram is returned as a dictionary with the keys `x` (bin center), `y` (bin content, or for a `TH2` the bin center in the second dimension), `z` (`TH2` only: bin content), and the bin uncertainties `dy` (`dz` for a `TH2`). In addition, the lower and upper edges of each bin are stored (`x_edges`, and for a `TH2` also `y_edges`):

In [4]:
TotalBackground.keys()

dict_keys(['x', 'y', 'x_edges', 'dy'])
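To make this layout concrete, here is a plain-Python sketch that arranges bin edges, contents and errors into dictionaries with the same keys. The helpers `hist1d_as_dict` and `hist2d_as_dict` are hypothetical and not part of `hepdata_lib`; in particular, the order in which the 2D sketch walks through the bins is an arbitrary choice and may differ from what `read_hist_2d(...)` does.

```python
def hist1d_as_dict(edges, contents, errors):
    """Arrange 1D bin information with the keys x, y, x_edges, dy.

    `edges` holds the n + 1 bin boundaries for n bins. Hypothetical helper.
    """
    if not len(edges) - 1 == len(contents) == len(errors):
        raise ValueError("need n + 1 edges and n contents/errors")
    x_edges = [(edges[i], edges[i + 1]) for i in range(len(contents))]
    return {
        "x": [0.5 * (lo + hi) for lo, hi in x_edges],  # bin centers
        "y": list(contents),
        "x_edges": x_edges,
        "dy": list(errors),
    }


def hist2d_as_dict(x_bounds, y_bounds, contents, errors):
    """Flatten a 2D grid into per-bin lists with the keys x, y, z, dz, x_edges, y_edges.

    `contents[i][j]` and `errors[i][j]` belong to x bin i and y bin j.
    The loop order (x outer, y inner) is an arbitrary choice. Hypothetical helper.
    """
    out = {"x": [], "y": [], "z": [], "dz": [], "x_edges": [], "y_edges": []}
    for i in range(len(x_bounds) - 1):
        for j in range(len(y_bounds) - 1):
            x_edge = (x_bounds[i], x_bounds[i + 1])
            y_edge = (y_bounds[j], y_bounds[j + 1])
            out["x"].append(0.5 * sum(x_edge))  # bin center, 1st dimension
            out["y"].append(0.5 * sum(y_edge))  # bin center, 2nd dimension
            out["z"].append(contents[i][j])
            out["dz"].append(errors[i][j])
            out["x_edges"].append(x_edge)
            out["y_edges"].append(y_edge)
    return out


h1 = hist1d_as_dict([0.0, 100.0, 200.0], [3.0, 5.0], [1.7, 2.2])
print(list(h1.keys()))  # ['x', 'y', 'x_edges', 'dy']
print(h1["x"])  # [50.0, 150.0]
```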

The `RootFileReader` automatically recognises whether a histogram has symmetric or asymmetric errors, based on [TH1::GetBinErrorOption()](https://root.cern.ch/doc/master/classTH1.html#ac6e38c12259ab72c0d574614ee5a61c7). If this returns `TH1::kNormal`, the errors are symmetric and stored as a plain `float` per bin (as in this example); otherwise each bin's error is a `tuple` of `float`. The bin edges are always stored as a `tuple` per bin:

In [5]:
from __future__ import print_function
for key in TotalBackground.keys():
    print(key, type(TotalBackground[key]), type(TotalBackground[key][0]))

x <class 'list'> <class 'float'>
y <class 'list'> <class 'float'>
x_edges <class 'list'> <class 'tuple'>
dy <class 'list'> <class 'float'>
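The distinction matters later, because the `Uncertainty` objects created below take an `is_symmetric` flag. Here is a plain-Python sketch of inspecting a `dy` list to choose that flag. The helper `errors_are_symmetric` is hypothetical, and it deliberately makes no assumption about the ordering of the two entries inside the asymmetric tuples.

```python
def errors_are_symmetric(dy):
    """Return True for one number per bin, False for one 2-tuple per bin.

    Raises TypeError for mixed or unexpected entries. Hypothetical helper.
    """
    if all(isinstance(e, (int, float)) and not isinstance(e, bool) for e in dy):
        return True
    if all(isinstance(e, tuple) and len(e) == 2 for e in dy):
        return False
    raise TypeError("expected either numbers or 2-tuples for every bin")


print(errors_are_symmetric([1.5, 2.0, 0.7]))  # True
print(errors_are_symmetric([(1.2, 1.5), (0.4, 0.9)]))  # False
```

In the notebook, `Uncertainty("total uncertainty", is_symmetric=errors_are_symmetric(TotalBackground["dy"]))` would then pick the flag from the data instead of hard-coding it.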


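Now we define the variables: the reconstructed mass as the independent variable (x-axis), and the event counts for signal, total background, each background process and data as dependent variables (y-axis):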

In [6]:
from hepdata_lib import Variable, Uncertainty

# x-axis: B quark mass
mmed = Variable("$M_{bH}$", is_independent=True, is_binned=False, units="GeV")
mmed.values = signal["x"]

# y-axis: N events
sig = Variable("Number of signal events", is_independent=False, is_binned=False, units="")
sig.values = signal["y"]

totalbackground = Variable("Number of background events", is_independent=False, is_binned=False, units="")
totalbackground.values = TotalBackground["y"]

tt = Variable("Number of ttbar events", is_independent=False, is_binned=False, units="")
tt.values = TT["y"]

qcd = Variable("Number of qcd events", is_independent=False, is_binned=False, units="")
qcd.values = QCD["y"]

wjets = Variable("Number of wjets events", is_independent=False, is_binned=False, units="")
wjets.values = WJets["y"]

zjets = Variable("Number of zjets events", is_independent=False, is_binned=False, units="")
zjets.values = ZJets["y"]

data = Variable("Number of data events", is_independent=False, is_binned=False, units="")
data.values = Data["y"]
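Here the x-axis values are taken from the signal histogram, while the y values come from three different files, so all histograms need to share the same binning for the columns to line up. A plain-Python sketch of such a check, comparing bin centers; the helper `mismatched_binning` is hypothetical:

```python
import math


def mismatched_binning(reference_x, others, rel_tol=1e-9):
    """Return the labels whose bin centers differ from `reference_x`.

    `reference_x` is a list of bin centers (e.g. signal["x"]); `others` maps a
    label to another list of bin centers (e.g. {"TT": TT["x"]}). Hypothetical helper.
    """
    mismatched = []
    for label, xs in others.items():
        same = len(xs) == len(reference_x) and all(
            math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(xs, reference_x)
        )
        if not same:
            mismatched.append(label)
    return mismatched


# Made-up bin centers: "Data" has one bin fewer than the reference
ref = [50.0, 150.0, 250.0]
print(mismatched_binning(ref, {"TT": [50.0, 150.0, 250.0], "Data": [50.0, 150.0]}))  # ['Data']
```

With the real inputs, one would pass `signal["x"]` and a dictionary such as `{"TT": TT["x"], "Data": Data["x"]}`; an empty list means the binning agrees.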

For the data as well as the background total, we will also provide the associated uncertainties:

In [7]:
from hepdata_lib import Uncertainty

unc_totalbackground = Uncertainty("total uncertainty", is_symmetric=True)
unc_totalbackground.values = TotalBackground["dy"]

unc_data = Uncertainty("Poisson errors", is_symmetric=True)
unc_data.values = Data["dy"]

totalbackground.add_uncertainty(unc_totalbackground)
data.add_uncertainty(unc_data)
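The label "Poisson errors" reflects that, for unweighted event counts, the default symmetric bin error of a ROOT histogram is the square root of the bin content. Here is a short sketch of that approximation, which can be used to cross-check `Data["dy"]`. The helper `sqrt_n_errors` is hypothetical, and the comparison only holds if the data histogram was filled with unit weights and uses `TH1::kNormal` errors:

```python
import math


def sqrt_n_errors(counts):
    """Symmetric sqrt(N) uncertainties for unweighted event counts."""
    if any(c < 0 for c in counts):
        raise ValueError("event counts must be non-negative")
    return [math.sqrt(c) for c in counts]


# Made-up counts standing in for Data["y"]
counts = [4.0, 9.0, 0.0]
print(sqrt_n_errors(counts))  # [2.0, 3.0, 0.0]
```

For bins with few or no events this symmetric approximation is poor (an empty bin gets zero uncertainty). Histograms using ROOT's `TH1::kPoisson` option carry asymmetric intervals instead, which the `RootFileReader` returns as tuples.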

Now we can add the variables to the `Table` and the `Table` to the `Submission`, and create the files. Please refer to the [Getting started notebook](Getting_started.ipynb) for a complete example.

In [8]:
table.add_variable(mmed)
table.add_variable(sig)
table.add_variable(totalbackground)
table.add_variable(tt)
table.add_variable(qcd)
table.add_variable(zjets)
table.add_variable(wjets)
table.add_variable(data)

submission.add_table(table)

submission.create_files("example_output", remove_old=True)

In [9]:
!head example_output/figure_4a.yaml

dependent_variables:
- header:
    name: Number of signal events
    units: ''
  values:
  - value: 0.95335
  - value: 0.83057
  - value: 3.5395
  - value: 1.8212
  - value: 0.04857
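The YAML file follows the HEPData table format: a list of `independent_variables` and a list of `dependent_variables`, each with a `header` (`name`, `units`) and a list of `values`. Uncertainties attached with `add_uncertainty(...)` should appear under an `errors` key of each value, with symmetric ones written as `symerror` plus a `label`. Here is a plain-Python sketch of one such entry, built as a dictionary. The helper `dependent_variable_entry` and the numbers are hypothetical; `hepdata_lib` writes the real file.

```python
def dependent_variable_entry(name, units, values, errors=None, error_label=None):
    """Build a dict mimicking one `dependent_variables` entry of a HEPData table.

    `errors` (optional) holds one symmetric uncertainty per value. Hypothetical helper.
    """
    if errors is not None and len(errors) != len(values):
        raise ValueError("need one uncertainty per value")
    entries = []
    for i, value in enumerate(values):
        entry = {"value": value}
        if errors is not None:
            entry["errors"] = [{"symerror": errors[i], "label": error_label}]
        entries.append(entry)
    return {"header": {"name": name, "units": units}, "values": entries}


# Made-up numbers for a data column with symmetric uncertainties
data_entry = dependent_variable_entry(
    "Number of data events", "", [12.0, 7.0], [3.46, 2.65], "Poisson errors"
)
print(data_entry["values"][0])
# {'value': 12.0, 'errors': [{'symerror': 3.46, 'label': 'Poisson errors'}]}
```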
