# Systematic Uncertainties and Their Propagation

<CENTER><img src="../../images/ATLASOD.gif" style="width:50%"></CENTER>

This notebook uses ATLAS Open Data https://opendata.atlas.cern to teach you about systematic uncertainties!

ATLAS Open Data provides open access to proton-proton collision data at the LHC for educational purposes. ATLAS Open Data resources are ideal for high-school, undergraduate and postgraduate students.

## The Uncertainty of Measurements

Some numerical statements are exact, such as the number of books on your desk or the number of siblings you have. However, all *measurements*, no matter how carefully they are taken, have some degree of uncertainty that can come from a variety of sources. The process of evaluating uncertainties and identifying sources of error is called **error analysis**. The goal of error analysis is to properly estimate uncertainties in measurements and to reduce them as much as possible.

### The Importance of Knowing the Uncertainty

The uncertainty associated with a measurement is just as important as the measured value itself because it tells us how well the measurement was made. If the uncertainty is not reported, we may be misled or be unable to draw any valid conclusion.

As an example, suppose a theory predicts a new particle with mass $m = 135$ GeV, fairly close to that of the Higgs boson ($m_H = 125$ GeV). Suppose two particle physicists have tested this theory and made mass measurements. The first physicist, named Shawn, reports his best estimate of the mass to be $m_\text{shawn} = 131$ GeV and says that it almost certainly lies between 124 and 138 GeV. The second physicist, named Danielle, reports her best estimate to be $m_\text{danielle} = 126$ GeV with a probable range from 124 to 128 GeV. We can summarize their results like this:

$$ m_\text{shawn} = (131 \pm 7)\text{ GeV} \hspace{1cm} \text{and} \hspace{1cm} m_\text{danielle} = (126 \pm 2)\text{ GeV}. $$

There are a few things to note here:

* **Precision of Measurements.** Although Danielle's measurement is much more precise, Shawn's measurement could also be right. Both physicists state a range within which they are confident $m$ lies, and these ranges overlap. Thus, it is possible that both statements are correct.

* **Uncertainties of Measurements.** The uncertainty of Shawn's measurement is so large that his result is pretty much useless. Both the mass of the predicted new particle and the mass of the Higgs boson lie within his range, so it is possible that Shawn actually found a Higgs boson and not a new particle. We cannot draw a valid conclusion from Shawn's measurement. However, Danielle's measurement indicates clearly that she found a Higgs boson; the mass of the Higgs boson lies within her range, while the mass of the predicted new particle lies far outside it.

* **Range of Uncertainty.** We see that in order to draw a conclusion from our measurement, the uncertainty must not be too large. The uncertainties do not need to be extremely small, but they must be small enough that a conclusion can reasonably be made.

The important point to take from this (very simplified) example is that without a stated uncertainty, your measurement is useless. If we knew only their best estimates (131 GeV for Shawn and 126 GeV for Danielle), not only would we have been unable to draw a conclusion, but we could have been misled by Shawn into thinking that he did find this new particle, since his result is closer to the predicted mass than to the Higgs boson mass.
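To make the comparison concrete, here is a minimal sketch that checks which candidate masses fall inside each physicist's quoted range. The function name `is_consistent` and the dictionary layout are our own illustrative choices, and treating the quoted range as a hard interval is a simplification.

```python
# Reported results from the example above: (best estimate, uncertainty) in GeV.
measurements = {"Shawn": (131.0, 7.0), "Danielle": (126.0, 2.0)}
# Candidate masses in GeV that the measurements are compared against.
hypotheses = {"new particle": 135.0, "Higgs boson": 125.0}


def is_consistent(best, delta, value):
    """Return True if value lies inside [best - delta, best + delta]."""
    return abs(value - best) <= delta


# Check every (physicist, hypothesis) combination.
compatibility = {
    (who, what): is_consistent(best, delta, mass)
    for who, (best, delta) in measurements.items()
    for what, mass in hypotheses.items()
}

for (who, what), ok in compatibility.items():
    status = "inside" if ok else "outside"
    print(f"{who}: the {what} mass is {status} the quoted range")
```

Shawn's range contains both candidate masses, so it cannot separate the two hypotheses, whereas Danielle's range contains only the Higgs boson mass.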

In ATLAS analyses, we consider uncertainties for several reasons:

1. **Accurate Parameter Estimation:** To get reliable estimates of the parameters of interest, such as the Higgs boson couplings or the top quark mass, we need to account for all sources of uncertainty. Ignoring systematic uncertainties can lead to biased estimates and incorrect conclusions.

2. **Robust Hypothesis Testing:** In testing theoretical models against experimental data, systematic uncertainties ensure that discrepancies between the observed data and theoretical predictions are not mistakenly attributed to new physics or phenomena.

3. **Credible Confidence Intervals:** Confidence intervals derived from the data should reflect the true level of uncertainty in the measurements. By incorporating uncertainties, these intervals provide a more realistic range of values for the parameters of interest.

4. **Improved Comparisons with Other Experiments:** Systematic uncertainties enable more meaningful comparisons between results from different experiments or analyses.

5. **Informed Decision Making in Future Experiments:** Understanding and quantifying uncertainties help guide the design and improvement of future experiments. By identifying the sources of error, we can target specific areas for enhancement--such as improving detector calibration methods or refining theoretical models--and so reduce uncertainties in future measurements.

When comparing detector data to simulations, you may see a difference that might seem significant. Whether this difference is interesting or important requires understanding uncertainties. Agreement within uncertainties implies that the observed and predicted values are consistent.

### Reporting Measurements

Every measurement should be reported as a measured value with its uncertainty and appropriate unit:

$$ \text{measurement} = \text{(measured value $\pm$ uncertainty) units} $$

or

$$ x = x_\text{best} \pm \delta x, $$

where $x_\text{best}$ represents the best estimate of the measurement of some quantity $x$ and $\delta x$ is the associated uncertainty of the best estimate. Sometimes the **relative uncertainty** is used:

$$ \text{relative uncertainty} = \left| \frac{\text{uncertainty}}{\text{measured value}} \right| = \left| \frac{\delta x}{x_\text{best}} \right|. $$

This better expresses the quality of the measurement. For example, an uncertainty $\delta x = 1$ mm has a very different meaning when it refers to a length $x_\text{best} = 3$ mm than when it refers to a length $x_\text{best} = 10$ m.
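As a quick illustration, the two lengths mentioned above can be compared numerically. This is only a sketch; the helper name `relative_uncertainty` is ours, and both lengths are converted to millimetres first.

```python
def relative_uncertainty(best, delta):
    """Relative uncertainty |delta / best| of a measurement best +/- delta."""
    return abs(delta / best)


# Same absolute uncertainty (1 mm) on two very different lengths.
short_length = relative_uncertainty(3.0, 1.0)       # 3 mm +/- 1 mm
long_length = relative_uncertainty(10_000.0, 1.0)   # 10 m = 10000 mm +/- 1 mm

print(f"3 mm length: {short_length:.1%}")
print(f"10 m length: {long_length:.2%}")
```

The same 1 mm uncertainty is about a third of the short length but only one part in ten thousand of the long one.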

## Types of Uncertainties

We often think of the word *error* as a mistake. In a scientific measurement, however, error refers to the fact that all measurements have uncertainties associated with them. Errors in scientific measurement are not mistakes and so cannot be eliminated by being very careful. The best we can do is identify sources of errors, make them as small as possible, and have a reliable estimate of how large they are. In particular, there are two types of errors: *statistical* and *systematic*.

### Statistical Uncertainties

**Statistical** (or **random**) **errors** are sources that cause unpredictable (random) fluctuations in a measurement. These types of errors can be detected statistically and can be reduced by taking a large number of measurements. 

As an example, suppose we use a stopwatch to time the swing of a pendulum. Each time we make a measurement, we may inadvertently start the watch too early or too late, and we may also stop the watch too early or too late. As a result, we may underestimate or overestimate the time for each swing. Since either possibility is equally likely, this is a source of *random* error. By repeating the measurement several times, we will see a spread in the measured times caused by this underestimating and overestimating.

Uncertainties related to statistical errors are called **statistical uncertainties** and are labeled as $\delta f_\text{stat}$. A particular experiment can have many sources of random error, and so will have many statistical uncertainties. We will see soon how we can combine statistical uncertainties.
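The pendulum example can be sketched with a small simulation. Everything numerical here is an assumption made for illustration: the true period (`TRUE_PERIOD`), the size of the reaction-time jitter (`REACTION_SIGMA`), and the choice of a Gaussian model for it. The statistical uncertainty on the average is estimated with the standard error of the mean, $\sigma/\sqrt{N}$.

```python
import math
import random
import statistics

TRUE_PERIOD = 2.0      # seconds; assumed true swing time
REACTION_SIGMA = 0.1   # seconds; assumed random start/stop jitter


def time_swings(n, rng):
    """Simulate n stopwatch readings with purely random timing errors."""
    return [rng.gauss(TRUE_PERIOD, REACTION_SIGMA) for _ in range(n)]


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of the mean."""
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, std_error


rng = random.Random(42)  # fixed seed so the sketch is reproducible
mean_small, err_small = mean_and_standard_error(time_swings(10, rng))
mean_large, err_large = mean_and_standard_error(time_swings(1000, rng))

print(f"N = 10:   {mean_small:.3f} +/- {err_small:.3f} s")
print(f"N = 1000: {mean_large:.3f} +/- {err_large:.3f} s")
```

With a hundred times more measurements, the uncertainty on the average shrinks by roughly a factor of ten.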

### Systematic Uncertainties

**Systematic errors** are sources that cause reproducible inaccuracies that are consistently in the same direction. These types of errors are difficult to detect and cannot be reduced by increasing the number of measurements. 

Consider the pendulum example again, in which we use a stopwatch to time each swing. Suppose our stopwatch is faulty and runs slow (or fast). Then all of our times will be underestimates (or overestimates). Since the times will all be wrong in the same direction (either all underestimates or all overestimates), this is a systematic error, and no amount of repetition with this stopwatch will reveal it. If we see that our measurements do not match our predictions, we may consider repeating the experiment with a different stopwatch. If the new measurements agree with our predictions, this suggests that the previous stopwatch was faulty, which lets us eliminate (or reduce) this source of error.

Uncertainties related to systematic errors are called **systematic uncertainties** and are labeled as $\delta f_\text{sys}$.
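A small simulation shows why repetition does not help here. This is a sketch under assumed numbers: a true period of 2.0 s, a stopwatch that runs 5% slow (`WATCH_RATE = 0.95`), and Gaussian reaction-time jitter of 0.1 s.

```python
import random
import statistics

TRUE_PERIOD = 2.0      # seconds; assumed true swing time
REACTION_SIGMA = 0.1   # seconds; assumed random timing jitter
WATCH_RATE = 0.95      # assumed faulty stopwatch that runs 5% slow


def time_swings_with_slow_watch(n, rng):
    """Simulate n readings: random jitter plus a stopwatch that runs slow."""
    return [WATCH_RATE * rng.gauss(TRUE_PERIOD, REACTION_SIGMA) for _ in range(n)]


rng = random.Random(7)  # fixed seed so the sketch is reproducible
readings = time_swings_with_slow_watch(10_000, rng)
mean_reading = statistics.mean(readings)
bias = mean_reading - TRUE_PERIOD

print(f"average of 10000 readings: {mean_reading:.3f} s")
print(f"offset from the true period: {bias:+.3f} s")
```

Even with ten thousand readings, the average settles near 1.9 s rather than 2.0 s: the random scatter averages out, but the offset from the slow stopwatch does not.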

#### Systematic Uncertainties in ATLAS Analyses

In ATLAS analyses there are various systematic uncertainties that have to be taken into account:

* **Luminosity Uncertainty.** ATLAS has released the 2015 and 2016 data; the 2015 dataset corresponds to an integrated luminosity of $3.24 \pm 0.04$ fb$^{-1}$, and the 2016 dataset to $33.40 \pm 0.30$ fb$^{-1}$. This means that for a physics process with a cross section of 1 fb (one femtobarn), we would expect to see 3.24 events in the 2015 data and 33.4 events in the 2016 data. The integrated luminosity, and hence the number of proton collisions, is known with an uncertainty of about 1%, which is the most precise luminosity measurement at a hadron collider to date (see [this page](https://arxiv.org/abs/2212.09379) for more information).
  
* **Scale Factor Uncertainties.** These arise when the real data and simulation differ in some regard. For example, perhaps in real data an electron is correctly identified 85% of the time, and in simulation it is only correctly identified 83% of the time. A scale factor is applied to the simulation to correct it to match the data, and the uncertainties on that scale factor are included as systematic uncertainties in an analysis.

* **Calibration Uncertainties.** These arise when, for example, the momentum of a physics object must be calculated from measurements in the detector. For example, the momentum of a jet is calculated from measurements of charged particles in the inner detector and measurements of energy deposits in the calorimeter, among other things. These individual measurements all have uncertainties, and there are additional uncertainties in the way they are combined to produce a final momentum. All those uncertainties need to be included in an analysis.

Ensuring that all the possible variations and uncertainties have been included is quite difficult, and can require a great deal of experience. One excellent starting point is always to check a comparable data analysis and understand all the sources of uncertainty that were included for that analysis. For more information, check out [this page](https://opendata.atlas.cern/docs/documentation/systematics).
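To see how the luminosity uncertainty feeds into a prediction, here is a minimal sketch that converts a cross section into an expected event count with its luminosity uncertainty. It assumes the cross section is known exactly and ignores every other source of uncertainty, which a real analysis would not do; the function name `expected_events` is ours.

```python
def expected_events(cross_section_fb, lumi_fb, lumi_unc_fb):
    """Expected event count N = sigma * L and its uncertainty from L alone.

    The cross section is treated as an exact number (a simplification),
    so the uncertainty on N is just |sigma| * delta_L.
    """
    n_events = cross_section_fb * lumi_fb
    n_unc = abs(cross_section_fb) * lumi_unc_fb
    return n_events, n_unc


# Integrated luminosities quoted above, in fb^-1: (value, uncertainty).
datasets = {"2015": (3.24, 0.04), "2016": (33.40, 0.30)}

# Expected yields for a process with a 1 fb cross section.
yields = {
    year: expected_events(1.0, lumi, unc)
    for year, (lumi, unc) in datasets.items()
}

for year, (n_events, n_unc) in yields.items():
    print(f"{year}: {n_events:.2f} +/- {n_unc:.2f} events "
          f"({n_unc / n_events:.1%} from luminosity)")
```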

## Combining Uncertainties

There are two instances in which we may need to combine uncertainties:

* **Multiple Uncertainties.** A quantity may have several statistical and/or systematic uncertainties which need to be combined in some way to give a *single* uncertainty. 

* **Indirect Quantities.** Often we are unable to measure a quantity directly and instead have to *calculate* it from quantities which *can* be directly measured. Since each measured quantity has an associated uncertainty, so too will the calculated quantity; the uncertainties of the individual measurements combine to form the uncertainty in the calculated quantity.

The procedure of combining uncertainties is known as **propagation of uncertainties** (the uncertainties *propagate* through the calculations).

Suppose we have a simulation of $ZZ$ background. The simulation will have dozens of different variations, and each variation will have its own systematic uncertainty. Using propagation of uncertainties, we can combine the uncertainties from these variations and show them together. The original Higgs boson discovery paper, for example, included a figure with the distribution of the mass of the candidate particle in events with four leptons:

<CENTER><img src="images/systematics_notebook/higgs_mass_distribution.png" style="width:40%"></CENTER>

**Figure 1.** Mass distribution of Higgs to $ZZ$ to four lepton candidate events from the [Higgs boson discovery](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/HIGG-2012-28/).

The hashed band on the estimated background (around the red histogram in this case) is a combination of all the uncertainties included when calculating that background. We also see that all the data points (the black dots) have long vertical lines; these are called **error bars**, and they show the combined uncertainty for each data point. In this case, they give the uncertainty in the event count per 5 GeV.

### Rules for Propagation of Uncertainties 

Suppose $z = z(x_1, x_2, \ldots, x_n)$ is any function of the quantities $x_1, x_2, \ldots, x_n$, whose uncertainties $\delta x_1, \delta x_2, \ldots, \delta x_n$ we know. Provided that all errors are independent, the uncertainty of $z$ is given by the general rule:

$$ \delta z = \sqrt{ \left(\frac{\partial z}{\partial x_1}\delta x_1 \right)^2 + \left(\frac{\partial z}{\partial x_2}\delta x_2 \right)^2 + \cdots + \left(\frac{\partial z}{\partial x_n}\delta x_n \right)^2 }. $$

This is sometimes written as

$$ \delta z = \frac{\partial z}{\partial x_1}\delta x_1 \oplus \frac{\partial z}{\partial x_2}\delta x_2 \oplus \cdots \oplus \frac{\partial z}{\partial x_n}\delta x_n, $$

where $\oplus$ denotes *addition in quadrature* (that is, you square each term, sum them up, and then take the square root). From this general rule, we can find expressions for particular forms of $z$.
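The general rule can also be applied numerically when the partial derivatives are awkward to write down. The sketch below estimates each partial derivative with a central finite difference; the function name `propagate`, the step size, and the rectangle-area example are all our own illustrative choices, and the inputs are assumed to be independent.

```python
import math


def propagate(func, values, uncertainties, rel_step=1e-6):
    """Propagate independent uncertainties through func using the general rule.

    Each partial derivative is estimated with a central finite difference,
    then the terms (dz/dx_i * delta_x_i) are added in quadrature.
    """
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, uncertainties)):
        h = rel_step * max(abs(x), 1.0)   # small step around x_i
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        partial = (func(*up) - func(*down)) / (2.0 * h)
        total += (partial * dx) ** 2
    return math.sqrt(total)


# Hypothetical example: area of a rectangle with measured width and height.
def area(width, height):
    return width * height


values = [4.0, 3.0]          # width, height (arbitrary units)
uncertainties = [0.1, 0.2]   # their uncertainties

delta_area = propagate(area, values, uncertainties)
print(f"area = {area(*values):.2f} +/- {delta_area:.2f}")
```

Here the two terms are $3 \times 0.1$ and $4 \times 0.2$, so the result should match $\sqrt{0.3^2 + 0.8^2} \approx 0.85$.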

#### Sums and Differences of Measured Quantities

If $z = x_1 \pm x_2 \pm \cdots \pm x_n$, then the uncertainty in $z$ is given by

$$ \delta z = \sqrt{ (\delta x_1)^2 + (\delta x_2)^2 + \cdots + (\delta x_n)^2 }. $$
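As a short worked example with made-up numbers, consider the difference of two independent measurements. The helper name `add_in_quadrature` is ours.

```python
import math


def add_in_quadrature(*uncertainties):
    """Square each uncertainty, sum the squares, and take the square root."""
    return math.sqrt(sum(u ** 2 for u in uncertainties))


# Hypothetical independent measurements (arbitrary units).
x1, dx1 = 10.0, 0.3
x2, dx2 = 4.0, 0.4

z = x1 - x2
dz = add_in_quadrature(dx1, dx2)

print(f"z = {z:.1f} +/- {dz:.1f}")
```

Note that the uncertainties are combined in the same way for a sum and for a difference: subtracting two quantities never reduces the absolute uncertainty.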

#### Products and Quotients of Measured Quantities

If $z = x_1 \times x_2 \times \cdots \times x_n$ or $z = x_1 \div x_2 \div \cdots \div x_n$, then the relative uncertainty in $z$ is given by

$$ \frac{\delta z}{|z|} = \sqrt{ \left(\frac{\delta x_1}{x_1} \right)^2 + \left(\frac{\delta x_2}{x_2} \right)^2 + \cdots + \left(\frac{\delta x_n}{x_n} \right)^2 }. $$
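Here is a small worked example for a quotient, again with made-up numbers and a helper name of our own (`relative_uncertainty_of_product`). The two inputs have relative uncertainties of 3% and 4%.

```python
import math


def relative_uncertainty_of_product(pairs):
    """Relative uncertainty of a product or quotient of independent quantities.

    pairs is a list of (value, uncertainty) tuples; the relative
    uncertainties are added in quadrature.
    """
    return math.sqrt(sum((dx / x) ** 2 for x, dx in pairs))


# Hypothetical independent measurements (arbitrary units).
x1, dx1 = 20.0, 0.6   # 3% relative uncertainty
x2, dx2 = 5.0, 0.2    # 4% relative uncertainty

z = x1 / x2
rel_dz = relative_uncertainty_of_product([(x1, dx1), (x2, dx2)])
dz = abs(z) * rel_dz

print(f"z = {z:.1f} +/- {dz:.1f} ({rel_dz:.0%})")
```

The relative uncertainties of 3% and 4% combine to 5%, not 7%, because independent uncertainties add in quadrature rather than linearly.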

#### Measured Quantity Times Exact Number

If $z = kx$, where $k$ is known exactly, then the relative uncertainty in $z$ is given by

$$ \frac{\delta z}{|z|} = \frac{\delta x}{|x|}. $$

#### Measured Quantity Raised to an Exact Power

If $z = x^n$ and $n$ is an exact number, then the relative uncertainty is given by

$$ \frac{\delta z}{|z|} = |n| \frac{\delta x}{|x|}. $$
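As a quick check, the power rule follows from the general rule with a single variable. This is only a sketch of the derivation, assuming $x \neq 0$:

$$ z = x^n \quad \Rightarrow \quad \frac{\partial z}{\partial x} = n x^{n-1} \quad \Rightarrow \quad \delta z = \left| n x^{n-1} \right| \delta x. $$

Dividing both sides by $|z| = |x|^n$ gives

$$ \frac{\delta z}{|z|} = \frac{|n| \, |x|^{n-1}}{|x|^{n}} \, \delta x = |n| \frac{\delta x}{|x|}. $$

For example, if $z = x^2$ and $x$ is known to 2%, then $z$ is known to 4%.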

### Effective Uncertainties

When performing a rigorous statistical analysis, having many uncertainties can cause a simple practical problem: it is slow to calculate all the necessary numbers! One common approach to work around this is to sum together small uncertainties beforehand and create **effective uncertainties.** These uncertainties don't represent a single variation in particular, but a sum of several. For example, jets can have more than 100 uncertainties, but these can be reduced to 20 or 30 effective terms.

The downside of effective uncertainties, however, is that the meaning of each effective uncertainty may be difficult to understand. For example, instead of simply representing "the uncertainty on the jet momentum from mis-modeling the charged particle reconstruction efficiency," we now have an effective uncertainty that represents some purely mathematical construct that combines this particular uncertainty with various other uncertainties. There are some physics objects with effective uncertainties that have clear names and can be easily understood, but some of them may not represent something physical and have a name like "effective nuisance parameter number 3".
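The idea of merging small terms can be sketched as follows. This is a toy illustration only, not the procedure used in ATLAS, which is more involved: the component names, their sizes, and the threshold are all hypothetical, and the components are assumed to be independent so that they can be added in quadrature.

```python
import math


def quadrature_sum(values):
    """Add independent uncertainties in quadrature."""
    return math.sqrt(sum(v * v for v in values))


def reduce_components(components, threshold):
    """Keep large components and merge the small ones into one effective term."""
    kept = {name: v for name, v in components.items() if v >= threshold}
    small = [v for v in components.values() if v < threshold]
    if small:
        # The merged term no longer corresponds to a single physical source.
        kept["effective_1"] = quadrature_sum(small)
    return kept


# Hypothetical relative uncertainties on some quantity.
components = {
    "comp_a": 0.030,
    "comp_b": 0.020,
    "comp_c": 0.004,
    "comp_d": 0.003,
    "comp_e": 0.002,
}

reduced = reduce_components(components, threshold=0.01)

print(f"{len(components)} components -> {len(reduced)} terms")
print(f"total before: {quadrature_sum(components.values()):.5f}")
print(f"total after:  {quadrature_sum(reduced.values()):.5f}")
```

The total uncertainty is unchanged, but the merged term `effective_1` has lost its physical interpretation, which is exactly the trade-off described above.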

## Accuracy and Precision

For a single measurement, **accuracy** tells you how close your measurement is to an ideal, theoretical, or accepted value (assuming one exists). For a group of measurements, it is how close the *average* is to the ideal value. It is often reported quantitatively by the **relative error:**

$$ \text{Relative error} = \frac{\text{measured value} - \text{ideal value}}{\text{ideal value}}. $$

A positive sign for the relative error indicates that the measured value was higher than the ideal value, and a negative sign indicates that it was lower. Often the relative error is multiplied by 100 to give a percentage. Poor accuracy is usually an indication of large *systematic errors*.
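A minimal sketch of the relative error, taking the Higgs boson mass of 125 GeV as the accepted value. The measured value of 126 GeV is Danielle's from the earlier example; the value of 124 GeV is made up to show the negative sign.

```python
def relative_error(measured, ideal):
    """Signed relative error of a measured value with respect to an ideal value."""
    return (measured - ideal) / ideal


HIGGS_MASS = 125.0  # GeV; accepted value used in this notebook

too_high = relative_error(126.0, HIGGS_MASS)  # measured above the ideal value
too_low = relative_error(124.0, HIGGS_MASS)   # hypothetical value below it

print(f"126 GeV: {too_high:+.1%}")
print(f"124 GeV: {too_low:+.1%}")
```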


For a group of measurements, **precision** tells you how close your observed values are to one another. In other words, it is the degree of consistency, reliability, reproducibility, and agreement among independent measurements of the same quantity. It is often reported quantitatively by the **standard deviation**:

$$ \sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2 }, $$

where $N$ is the number of measurements made, $x_i$ is the $i$th measured value, and $\bar{x}$ is the average of all the measured values. The standard deviation quantifies the *spread* of the measured values. A low standard deviation means a small spread in measurements (high precision), and a high standard deviation means a large spread in measurements (low precision). Poor precision is usually an indication of large *random errors*.
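The standard deviation formula above can be written out directly. In this sketch the list of pendulum times is made up, and the result is compared with `statistics.stdev` from the Python standard library, which uses the same $N-1$ denominator.

```python
import math
import statistics


def sample_standard_deviation(values):
    """Standard deviation with the N - 1 denominator, as in the formula above."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


# Hypothetical repeated timings of the same pendulum swing, in seconds.
times = [2.1, 1.9, 2.0, 2.2, 1.8]

sigma = sample_standard_deviation(times)

print(f"mean  = {statistics.mean(times):.2f} s")
print(f"sigma = {sigma:.3f} s")
print(f"statistics.stdev gives {statistics.stdev(times):.3f} s")
```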

### Summary: Target Practice

To summarize the concepts we have discussed so far, consider the four target practice experiments shown below. Here each experiment involves a series of shots fired at a target, with the "ideal value" being the center of the target.

<CENTER><img src="images/systematics_notebook/target_practice.png" style="width:40%"></CENTER>

**Figure 2.** Summarizing systematic and random errors, accuracy, and precision using a target practice analogy.

There are four cases to examine:

**(a)** None of the shots are close to the center of the target, so the accuracy is low. The systematic errors are high since the shots are all systematically off-centered in the same direction, in this case toward the upper right. On the other hand, the precision is high because all of the shots are close to one another. This means that the random errors are low.

**(b)** All of the shots made it to the center of the target, so the accuracy is high and the systematic errors are low. Furthermore, the shots are all close to one another, so the precision is also high and the random errors are low. This is the best case scenario, and it is what we strive for in any experiment.

**(c)** None of the shots are close to the center of the target, so the accuracy is low and systematic errors high. Worse still, the shots are not close to each other, so the precision is also low and random errors high. This is the worst case scenario, and it is what we try to avoid in any experiment.

**(d)** The shots are either fairly close to or at the center of the target, so the accuracy is high and systematic errors low. However, the shots are not very close to each other, so the precision is low and random errors high.
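The four cases can be mimicked with a toy simulation in which each shot has a fixed offset (the systematic error) plus Gaussian scatter (the random error). All numbers below are assumptions chosen to resemble the four cases, and the function names are ours.

```python
import math
import random


def fire_shots(bias, spread, n, rng):
    """Simulate n shots with a fixed offset (bias) and Gaussian scatter (spread)."""
    return [(rng.gauss(bias[0], spread), rng.gauss(bias[1], spread))
            for _ in range(n)]


def accuracy_and_precision(shots):
    """Return (distance of the mean shot from the centre, RMS scatter about the mean)."""
    n = len(shots)
    mean_x = sum(x for x, _ in shots) / n
    mean_y = sum(y for _, y in shots) / n
    offset = math.hypot(mean_x, mean_y)
    scatter = math.sqrt(
        sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in shots) / n
    )
    return offset, scatter


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# (bias, spread) for each case; the target centre is at (0, 0).
cases = {
    "a": ((2.0, 2.0), 0.2),  # large systematic error, small random error
    "b": ((0.0, 0.0), 0.2),  # small systematic error, small random error
    "c": ((2.0, 2.0), 1.5),  # large systematic error, large random error
    "d": ((0.0, 0.0), 1.5),  # small systematic error, large random error
}

results = {
    label: accuracy_and_precision(fire_shots(bias, spread, 200, rng))
    for label, (bias, spread) in cases.items()
}

for label, (offset, scatter) in results.items():
    print(f"({label}) offset of mean = {offset:.2f}, scatter = {scatter:.2f}")
```

A small offset corresponds to high accuracy, and a small scatter corresponds to high precision. Computing the offset requires knowing where the centre of the target is.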

Although this target practice analogy summarizes the concepts nicely, it is misleading in one important aspect. Since we are given the position of the target for each experiment, we are able to easily tell how accurate the shots were. Knowing the position of the target amounts to knowing the ideal value of a measured quantity, and in the vast majority of real measurements, we do *not* know this value. 

To think about the difficulty in not knowing the ideal value of a measured quantity, consider the target practice analogy again, but without the positions of the targets. Although we can still easily identify the precision and random errors of the shots, there is no way of knowing the accuracy or systematic errors of the shots. 

<CENTER><img src="images/systematics_notebook/target_practice_improved.png" style="width:40%"></CENTER>

**Figure 3.** The same four target practice experiments but without the position of the target. This represents the true nature of most experiments, in which we do not know the "ideal value" (the value in nature) of what we are measuring.