# MadMiner physics tutorial (part 4B)

Johann Brehmer, Felix Kling, Irina Espejo, and Kyle Cranmer 2018-2019

## 0. Preparations

In [1]:
import logging

from madminer.fisherinformation import FisherInformation
from madminer.plotting import plot_fisher_information_contours_2d

In [2]:
# MadMiner output
logging.basicConfig(
    format="%(asctime)-5.5s %(name)-20.20s %(levelname)-7.7s %(message)s",
    datefmt="%H:%M",
    level=logging.INFO,
)

# Output of all other modules (e.g. matplotlib)
for key in logging.Logger.manager.loggerDict:
    if "madminer" not in key:
        logging.getLogger(key).setLevel(logging.WARNING)

## 1. Calculating the Fisher information from a SALLY model

We can use SALLY estimators (see part 3b of this tutorial) not just to define optimal observables, but also to calculate the (expected) Fisher information in a process. In `madminer.fisherinformation` we provide the `FisherInformation` class that makes this more convenient.

In [3]:
fisher = FisherInformation("data/lhe_data_shuffled.h5")
# fisher = FisherInformation('data/delphes_data_shuffled.h5')

17:00 madminer.analysis.da INFO    Loading data from data/lhe_data_shuffled.h5
17:00 madminer.utils.inter ERROR   HDF5 file does not contain nuisance parameters information
17:00 madminer.utils.inter ERROR   HDF5 file does not contain finite difference information
17:00 madminer.utils.inter ERROR   HDF5 file does not contain systematic information
17:00 madminer.analysis.da INFO    Found 2 parameters
17:00 madminer.analysis.da INFO      0: CWL2 (LHA: dim6 2, Power: 2, Range: (-20.0, 20.0))
17:00 madminer.analysis.da INFO      1: CPWL2 (LHA: dim6 5, Power: 2, Range: (-20.0, 20.0))
17:00 madminer.analysis.da INFO    Did not find nuisance parameters
17:00 madminer.analysis.da INFO    Found 6 benchmarks
17:00 madminer.analysis.da INFO    Found 3 observables
17:00 madminer.analysis.da INFO    Found 119750 events
17:00 madminer.analysis.da INFO      49987 signal events sampled from benchmark sm
17:00 madminer.analysis.da INFO      10000 signal events sampled from benchmark w
17:00 madminer.a

This class provides different functions:
- `rate_information()` calculates the Fisher information in total rates,
- `histo_information()` calculates the Fisher information in 1D histograms,
- `histo_information_2d()` calculates the Fisher information in 2D histograms,
- `full_information()` calculates the full detector-level Fisher information using a SALLY estimator, and
- `truth_information()` calculates the truth-level Fisher information.
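
As a rough illustration of what a histogram-based Fisher information involves, the sketch below applies the standard formula for independent Poisson-distributed bin counts, `I_ij = L * sum_b (1 / sigma_b) * dsigma_b/dtheta_i * dsigma_b/dtheta_j`. It is not MadMiner's implementation: the helper `histogram_fisher_information` and all numbers are made up for this example, and the cross sections and luminosity are simply assumed to be in consistent units.

```python
import numpy as np


def histogram_fisher_information(sigma, dsigma, luminosity):
    """Fisher information of a histogram with independent Poisson bins.

    sigma:      shape (n_bins,), expected cross section in each bin
    dsigma:     shape (n_bins, n_params), derivatives of the bin cross
                sections with respect to the parameters
    luminosity: integrated luminosity, in units consistent with sigma
    """
    sigma = np.asarray(sigma, dtype=float)
    dsigma = np.asarray(dsigma, dtype=float)
    # I_ij = L * sum_b dsigma_bi * dsigma_bj / sigma_b
    return luminosity * np.einsum("b,bi,bj->ij", 1.0 / sigma, dsigma, dsigma)


# Hypothetical toy inputs (NOT taken from the tutorial data)
toy_sigma = [1.0, 0.5, 0.1]
toy_dsigma = [[0.10, 0.00], [0.05, 0.02], [0.02, 0.03]]

toy_info = histogram_fisher_information(toy_sigma, toy_dsigma, luminosity=1000.0)
print(toy_info)
```

The result is a symmetric `n_params x n_params` matrix. Bins with small expected rates but large derivatives contribute the most, which is why the choice of binning matters for the histogram-based results.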

Here we use the SALLY approach:

In [4]:
info_sally, _ = fisher.full_information(
    theta=[0.0, 0.0],
    model_file="models/sally",
    luminosity=300.0 * 1000.0,
)

print("Fisher information after 300 ifb:\n{}".format(info_sally))

17:00 madminer.ml.base     INFO    Loading model from models/sally
17:00 madminer.fisherinfor INFO    Found 2 parameters in Score Estimator model, matching 2 physical parameters in MadMiner file
17:00 madminer.fisherinfor INFO    Evaluating rate Fisher information
17:00 madminer.fisherinfor INFO    Evaluating kinematic Fisher information on batch 1 / 1
17:00 madminer.ml.base     INFO    Loading evaluation data
17:00 madminer.ml.base     INFO    Calculating Fisher information


Fisher information after 300 ifb:
[[77.59920987  2.08284195]
 [ 2.08284195 17.10510623]]


For comparison, we can calculate the Fisher information in a 1D histogram of a single observable:

In [6]:
info_histo_1d, cov_histo_1d = fisher.histo_information(
    theta=[0.0, 0.0],
    luminosity=300.0 * 1000.0,
    observable="pt_j1",
    bins=[30.0, 100.0, 200.0, 400.0],
    histrange=[30.0, 400.0],
)

print("Histogram Fisher information after 300 ifb:\n{}".format(info_histo_1d))

17:00 madminer.fisherinfor INFO    Bins with largest statistical uncertainties on rates:
17:00 madminer.fisherinfor INFO      Bin 5: (0.21360 +/- 0.00999) fb (5 %)
17:00 madminer.fisherinfor INFO      Bin 1: (0.65462 +/- 0.01781) fb (3 %)
17:00 madminer.fisherinfor INFO      Bin 4: (1.43141 +/- 0.02562) fb (2 %)
17:00 madminer.fisherinfor INFO      Bin 3: (4.26209 +/- 0.04397) fb (1 %)
17:00 madminer.fisherinfor INFO      Bin 2: (8.78770 +/- 0.06473) fb (1 %)


AttributeError: 'NoneType' object has no attribute 'calculate_a'

We can do the same thing in 2D:

In [7]:
info_histo_2d, cov_histo_2d = fisher.histo_information_2d(
    theta=[0.0, 0.0],
    luminosity=300.0 * 1000.0,
    observable1="pt_j1",
    bins1=[30.0, 100.0, 200.0, 400.0],
    histrange1=[30.0, 400.0],
    observable2="delta_phi_jj",
    bins2=5,
    histrange2=[0, 6.2],
)

print("Histogram Fisher information after 300 ifb:\n{}".format(info_histo_2d))

17:00 madminer.fisherinfor INFO    Bins with largest statistical uncertainties on rates:
17:00 madminer.fisherinfor INFO      Bin (5, 2): (0.00376 +/- 0.00118) fb (31 %)
17:00 madminer.fisherinfor INFO      Bin (5, 3): (0.00705 +/- 0.00167) fb (24 %)
17:00 madminer.fisherinfor INFO      Bin (4, 2): (0.03723 +/- 0.00365) fb (10 %)
17:00 madminer.fisherinfor INFO      Bin (1, 2): (0.06269 +/- 0.00541) fb (9 %)
17:00 madminer.fisherinfor INFO      Bin (5, 4): (0.09857 +/- 0.00686) fb (7 %)


AttributeError: 'NoneType' object has no attribute 'calculate_a'

## 2. Calculating the Fisher information from an ALICES model

We can also calculate the Fisher information using an ALICES model:

In [None]:
info_alices, _ = fisher.full_information(
    theta=[0.0, 0.0],
    model_file="models/alices",
    luminosity=300.0 * 1000.0,
)

print("Fisher information using ALICES after 300 ifb:\n{}".format(info_alices))

## 3. Plot Fisher distances

We also provide a convenience function to plot contours of constant Fisher distance `d^2(theta, theta_ref) = I_ij(theta_ref) * (theta-theta_ref)_i * (theta-theta_ref)_j` (with an implicit sum over the indices `i` and `j`):

In [None]:
_ = plot_fisher_information_contours_2d(
    [info_sally, info_histo_1d, info_histo_2d, info_alices],
    [None, cov_histo_1d, cov_histo_2d, None],
    inline_labels=["SALLY", "1d", "2d", "ALICES"],
    xrange=(-0.3, 0.3),
    yrange=(-0.3, 0.3),
)
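
The Fisher distance defined above can also be evaluated by hand for individual parameter points. The following is a minimal NumPy sketch, assuming the SALLY Fisher information printed in section 1 (copied by hand). The helper `fisher_distance_squared` is a name introduced for this example and is not part of MadMiner.

```python
import numpy as np


def fisher_distance_squared(info, theta, theta_ref):
    """Return d^2 = sum_ij I_ij * (theta - theta_ref)_i * (theta - theta_ref)_j."""
    delta = np.asarray(theta, dtype=float) - np.asarray(theta_ref, dtype=float)
    return float(delta @ np.asarray(info, dtype=float) @ delta)


# SALLY Fisher information from section 1, copied by hand
info_sally_example = np.array(
    [
        [77.59920987, 2.08284195],
        [2.08284195, 17.10510623],
    ]
)

# Squared distance between theta = (0.1, 0) and the reference point (0, 0)
d2 = fisher_distance_squared(info_sally_example, theta=[0.1, 0.0], theta_ref=[0.0, 0.0])
print("d^2 = {:.4f}, d = {:.4f}".format(d2, np.sqrt(d2)))
```

Since `d^2` is a quadratic form in `theta - theta_ref`, the contours of constant distance are ellipses centered on the reference point.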