# Example for reading a correlation matrix

This example illustrates how to include a correlation matrix in your HEPData entry. 

## Creating example input data
First, we create an example input. You can safely skip reading this part of the code in detail. Just have a look at the plot below to see what the input data looks like, so that you can later check that it agrees with what you see in HEPData.

The input is a two-dimensional ROOT histogram containing the correlation coefficients between a number of signal region bins. The histogram is saved into a ROOT file, which we will subsequently read using the tools provided in `hepdata_lib`.

In [1]:
import os

import numpy as np
import ROOT as r

# Create and fill histogram: bin (i, j) holds the coefficient exp(-(i-j)^2)
nbins = 5
h2d = r.TH2D("correlation", "correlation", nbins, 0.5, nbins+0.5, nbins, 0.5, nbins+0.5)
for i in range(1, nbins+1):
    for j in range(1, nbins+1):
        h2d.Fill(i, j, np.exp(-(i-j)**2))

# Save output
f = r.TFile("correlation.root", "RECREATE")
h2d.Write()
f.Close()

# Plot for your convenience (make sure the "plots" directory exists first)
os.makedirs("plots", exist_ok=True)
c1 = r.TCanvas()
h2d.Draw("COLZ,TEXT")
c1.SaveAs("plots/correlation.png")

Welcome to JupyROOT 6.28/06


Info in <TCanvas::Print>: png file plots/correlation.png has been created


The input histogram looks like this:
<img src="./plots/correlation.png" width="450">
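For reference, the bin contents filled above follow $\rho_{ij} = e^{-(i-j)^2}$, so the matrix is symmetric, has ones on the diagonal, and neighbouring bins have $\rho = e^{-1} \approx 0.37$. If you want to cross-check the numbers in the plot without ROOT, a minimal NumPy sketch can rebuild the same matrix; `expected_correlation` is a hypothetical helper written for this illustration only.

```python
import numpy as np

def expected_correlation(nbins):
    """Return the nbins x nbins matrix rho[i, j] = exp(-(i - j)**2)."""
    idx = np.arange(1, nbins + 1)  # bin numbers 1..nbins, as in the ROOT loop
    # Broadcasting a column against a row gives all pairwise differences i - j
    return np.exp(-(idx[:, None] - idx[None, :]) ** 2)

rho = expected_correlation(5)
print(np.round(rho, 2))
```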

## Reading the histogram into Python
Now that we have our example input, let's read it back. Reading from ROOT files into Python is handled by the `RootFileReader` class.

In [2]:
from hepdata_lib import RootFileReader

# Create a reader for the input file
reader = RootFileReader("correlation.root")

# Read the histogram, "correlation" is the histogram name
data = reader.read_hist_2d("correlation")


The data read out of the histogram is returned as a dictionary:

In [3]:
data.keys()

dict_keys(['x', 'y', 'x_edges', 'y_edges', 'z', 'dz', 'x_labels', 'y_labels'])

Most values are plain lists of numbers with one entry per bin, for example the x coordinates of the bin centers:

In [4]:
data["x"]

[1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0,
 3.0,
 3.0,
 3.0,
 3.0,
 3.0,
 4.0,
 4.0,
 4.0,
 4.0,
 4.0,
 5.0,
 5.0,
 5.0,
 5.0,
 5.0]

The `x_edges` and `y_edges` entries instead hold lists of `(lower, upper)` tuples giving the boundaries of each bin:

In [5]:
data["x_edges"]

[(0.5, 1.5),
 (0.5, 1.5),
 (0.5, 1.5),
 (0.5, 1.5),
 (0.5, 1.5),
 (1.5, 2.5),
 (1.5, 2.5),
 (1.5, 2.5),
 (1.5, 2.5),
 (1.5, 2.5),
 (2.5, 3.5),
 (2.5, 3.5),
 (2.5, 3.5),
 (2.5, 3.5),
 (2.5, 3.5),
 (3.5, 4.5),
 (3.5, 4.5),
 (3.5, 4.5),
 (3.5, 4.5),
 (3.5, 4.5),
 (4.5, 5.5),
 (4.5, 5.5),
 (4.5, 5.5),
 (4.5, 5.5),
 (4.5, 5.5)]
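Judging from the output above, the 2D histogram is flattened into parallel lists with the x bin as the outer loop and the y bin as the inner loop, so entry `k` of `data["x"]`, `data["y"]` and `data["z"]` describes the same bin. The pure-Python sketch below mimics that layout for a uniformly binned histogram. `flatten_hist_2d` is a hypothetical helper, not part of `hepdata_lib`, and the ascending order of `y` within each x bin is an assumption, since only `x` and `x_edges` are printed above.

```python
import numpy as np

def flatten_hist_2d(matrix, low=0.5, width=1.0):
    """Flatten a square matrix of bin contents into parallel lists.

    The x bin is the outer loop and the y bin the inner loop, matching the
    ordering of data["x"] shown above. Bins are assumed to be uniform, with
    the given lower edge and width (hypothetical helper for illustration).
    """
    n = len(matrix)
    out = {"x": [], "y": [], "x_edges": [], "y_edges": [], "z": []}
    for ix in range(n):
        x_lo = low + ix * width
        for iy in range(n):
            y_lo = low + iy * width
            out["x"].append(x_lo + width / 2)          # bin center
            out["y"].append(y_lo + width / 2)
            out["x_edges"].append((x_lo, x_lo + width))  # (lower, upper)
            out["y_edges"].append((y_lo, y_lo + width))
            out["z"].append(float(matrix[ix][iy]))
    return out

# Rebuild the example matrix and flatten it
idx = np.arange(1, 6)
rho = np.exp(-(idx[:, None] - idx[None, :]) ** 2)
flat = flatten_hist_2d(rho)

# Round trip: reshaping the flat z list (x-major order) recovers the matrix
assert np.allclose(np.array(flat["z"]).reshape(5, 5), rho)
```

The final reshape is also a handy sanity check on the real output, e.g. `np.array(data["z"]).reshape(nbins, nbins)` should reproduce the matrix in the plot if the ordering assumption holds.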

You do not have to worry much about this format, because we will feed these lists directly into the `Variable` objects of our HEPData entry, just as in the other examples.

## Writing the matrix into a HEPData entry

This part is straightforward: we create two independent `Variable` objects to represent the x and y axes, and a dependent `Variable` to hold the correlation coefficient in each bin.

In [6]:
from hepdata_lib import Submission, Variable, Table

# Create variable objects
x = Variable("First Bin", is_independent=True, is_binned=False)
x.values = data["x"]

y = Variable("Second Bin", is_independent=True, is_binned=False)
y.values = data["y"]

correlation = Variable("Correlation coefficient", is_independent=False, is_binned=False)
correlation.values = data["z"]


Here, we chose to define the x and y `Variable` objects as unbinned, but we could equally well have defined them as binned. In that case, we would feed the `x_edges` and `y_edges` lists into the `Variable` objects instead of the bin centers (`x` and `y`).

Finally, as always, we add the `Variables` to a `Table`, add the `Table` to a `Submission` and create the output files.

In [7]:
# Create the table object and add the variables
table = Table("Correlation coefficients between signal region bins")
for var in [x,y,correlation]:
    table.add_variable(var)
table.add_additional_resource("ROOT file", "correlation.root", copy_file=True)  # optional

# Create the submission object and write output
sub = Submission()
sub.add_table(table)
sub.create_files("./output/", remove_old=True)

!ls -l submission.tar.gz

-rw-r--r--  1 watt  staff  4820 13 Oct 19:14 submission.tar.gz


That's it! The `submission.tar.gz` file can be uploaded directly to your [HEPData sandbox](https://www.hepdata.net/record/sandbox). An already uploaded version of this example is [available in the sandbox](https://www.hepdata.net/record/sandbox/1554305576). In the sandbox, HEPData shows you a rendered version of your data, which you can compare to our initial plot above.
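Before uploading, it can be reassuring to look inside the archive. The sketch below uses only Python's standard `tarfile` module; `list_submission` is a hypothetical helper. You should find `submission.yaml`, which every HEPData submission needs, plus one YAML file per table and, since `copy_file=True` was used, a copy of the ROOT file. The exact file names are chosen by `hepdata_lib`.

```python
import tarfile

def list_submission(path="submission.tar.gz"):
    """Return the sorted member names of a gzipped submission archive."""
    with tarfile.open(path, "r:gz") as tar:
        return sorted(tar.getnames())

# In the notebook, after sub.create_files(...) has run:
# for name in list_submission():
#     print(name)
```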