# Principles of Data Science

### Lecture 15: Confidence intervals

_MPhil in Data Intensive Science_

**University of Cambridge**

<h2 style="color: blue">Matt Kenzie</h2>

[mk652@cam.ac.uk](mailto:mk652@cam.ac.uk)

## Confidence intervals

- Today's lecture covers 
    - Confidence intervals
    - Parameter estimation at physical limits / boundaries
    - Constraints 
    - Feldman-Cousins interval estimation

- Learning objectives:
    - Understand what confidence intervals are and what they represent
    - Understand the issues with classical intervals near physical boundaries
    - Be able to deploy methods to circumvent these issues

## Recap

- We have seen that MLE, least-squares and MoM all give us parameter estimates, together with estimates of the uncertainty on them
    - Minimum variance bound approximation (inverse of Hessian matrix)
    - Profiled log likelihood
    
- Wilks' theorem:
    - $-2\Delta\ln L$ asymptotically approaches $\Delta \chi^2$


## Confidence intervals

 - We have estimation methods to provide *point estimates* (a value) for parameters
 - But we are also interested in uncertainties (i.e. some confidence interval)
 - What is a *confidence interval*?
     - We believe the *true* value of the parameter, $\theta$, lies within some interval, $\theta_l < \theta < \theta_h$, a fraction $\beta$ of the time
     - This is the confidence interval at confidence level (C.L.) $\beta$
     - If this is true the interval is said to "*cover*"
 - We typically quote uncertainties with $\beta = 0.683$, corresponding to $\pm 1\sigma$ of a normal distribution

## Z-scores and confidence levels 1D

<img src="plots/intervals1.png" alt="drawing" width="800">
<img src="plots/Zscores.png" alt="drawing" width="800">
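
The mapping between Z-scores and confidence levels in 1D can be checked numerically. The snippet below is a minimal sketch using only the Python standard library; the function names `z_to_cl` and `cl_to_z` are illustrative, not taken from the course code. It assumes a two-sided (central) interval of a normal distribution, for which the corresponding $-2\Delta\ln L$ threshold is $Z^2$.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_to_cl(z: float) -> float:
    """Two-sided confidence level contained within +/- z sigma."""
    return 2.0 * _STD_NORMAL.cdf(z) - 1.0


def cl_to_z(cl: float) -> float:
    """Z-score whose central interval contains probability cl."""
    return _STD_NORMAL.inv_cdf(0.5 * (1.0 + cl))


for z in (1, 2, 3):
    print(f"Z = {z}: CL = {z_to_cl(z):.4f}, -2 Delta lnL = {z ** 2}")
for cl in (0.90, 0.95, 0.99):
    print(f"CL = {cl:.2f}: Z = {cl_to_z(cl):.3f}")
```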

## Z-scores and confidence levels 2D
<img src="plots/intervals2.png" alt="drawing" width="800">
<img src="plots/Zscores2.png" alt="drawing" width="600">

<img src="plots/correlations.png" alt="drawing" width="1000">
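
In 2D the same confidence level needs a larger $\Delta\chi^2$ than in 1D. As a rough sketch (assuming the asymptotic regime where $-2\Delta\ln L$ follows a $\chi^2$ distribution with two degrees of freedom), the c.d.f. has the closed form $1 - e^{-x/2}$, so the thresholds can be computed directly. The function names are illustrative.

```python
import math


def delta_chi2_2d(cl: float) -> float:
    """Delta chi2 threshold enclosing probability cl for 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - cl)


def cl_2d(delta_chi2: float) -> float:
    """Probability enclosed by a given Delta chi2 contour for 2 degrees of freedom."""
    return 1.0 - math.exp(-0.5 * delta_chi2)


for cl in (0.6827, 0.9545, 0.9973):
    print(f"CL = {cl:.4f}: Delta chi2 = {delta_chi2_2d(cl):.2f}")

# The Delta chi2 = 1 contour, which gives 68.3% in 1D, encloses much less in 2D
print(f"Delta chi2 = 1 in 2D encloses {cl_2d(1.0):.3f}")
```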

## Bayesian credible intervals

- No such concept of *confidence* for a Bayesian so instead called *credible* intervals
- A Bayesian does not have a parameter estimate or confidence interval but instead has a *posterior p.d.f*
- Construct a *credible interval* by requiring

$$ \beta = \int_{\theta_l}^{\theta_h} p( \theta | X ) \, d\theta $$

- There are infinitely many of these, but the standard choice is the central (or narrowest) such interval

<img src="plots/credible_intervals1.png" alt="drawing" width="1000">


- For an upper limit can drop $\theta_l$ down to $\theta_{\text{min}}$

<img src="plots/credible_intervals2.png" alt="drawing" width="1000">
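
A credible interval can be read off a posterior evaluated on a grid. The sketch below assumes a toy setup that is not from the lecture plots: a single Gaussian measurement `x_obs` with unit resolution and a flat prior on $\theta \geq 0$. All function names are hypothetical. It integrates the posterior with the trapezoid rule and then finds the central interval and an upper limit from the cumulative distribution.

```python
import math


def posterior_grid(x_obs, sigma=1.0, theta_max=10.0, n=20001):
    """Posterior on a grid: Gaussian likelihood times a flat prior on theta >= 0."""
    step = theta_max / (n - 1)
    thetas = [i * step for i in range(n)]
    dens = [math.exp(-0.5 * ((x_obs - t) / sigma) ** 2) for t in thetas]
    norm = step * (sum(dens) - 0.5 * (dens[0] + dens[-1]))  # trapezoid rule
    return thetas, [d / norm for d in dens], step


def cumulative(dens, step):
    """Running trapezoid integral of the posterior density."""
    cdf = [0.0]
    for a, b in zip(dens[:-1], dens[1:]):
        cdf.append(cdf[-1] + 0.5 * (a + b) * step)
    return cdf


def quantile(thetas, cdf, q):
    """Smallest grid point whose cumulative probability reaches q."""
    for t, c in zip(thetas, cdf):
        if c >= q:
            return t
    return thetas[-1]


def central_interval(thetas, cdf, beta):
    """Interval with equal posterior probability in each tail."""
    return quantile(thetas, cdf, 0.5 * (1 - beta)), quantile(thetas, cdf, 0.5 * (1 + beta))


def upper_limit(thetas, cdf, beta):
    """Upper limit: theta_l is dropped to the lower edge of the prior."""
    return quantile(thetas, cdf, beta)


thetas, dens, step = posterior_grid(x_obs=0.5)
cdf = cumulative(dens, step)
print("central 68.3% interval:", central_interval(thetas, cdf, 0.683))
print("95% upper limit:", upper_limit(thetas, cdf, 0.95))
```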


## Neyman (classical) intervals

- For a frequentist the parameter, $\theta$, is **fixed**
- The interval is a member of a set of intervals such that

$$p( \theta \in [ \theta_l, \theta_h] ) = \beta $$

- Interval is built from the p.d.f of the observation, $X$, given a *fixed* value of the parameters $\theta$, $p(X; \theta)$

- <font color="red">This is **NOT** the same as the posterior</font> as $\theta$ is fixed

- Then can build a *confidence belt* from these p.d.f.s for different fixed values of $\theta$

    <img src="plots/confidence_belt.png" alt="drawing" width="1000">
    
- For some observation $X_0$ the *confidence interval* is the union of all values of $\theta$ for which $X_0$ falls inside the belt
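
The belt construction can be sketched for the simplest case. The example below assumes a Gaussian observable $X \sim N(\theta, \sigma)$ with known $\sigma$ and a central acceptance region for each fixed $\theta$; these choices and the function names are illustrative assumptions. The interval for an observation `x0` is found by scanning $\theta$ and keeping every value whose acceptance region contains `x0`.

```python
from statistics import NormalDist


def acceptance_region(theta, z, sigma=1.0):
    """Central region in X containing the requested probability for fixed theta."""
    return theta - z * sigma, theta + z * sigma


def confidence_interval(x0, beta, theta_grid, sigma=1.0):
    """Invert the belt: collect all theta whose acceptance region contains x0."""
    z = NormalDist().inv_cdf(0.5 * (1.0 + beta))
    accepted = []
    for theta in theta_grid:
        x_lo, x_hi = acceptance_region(theta, z, sigma)
        if x_lo <= x0 <= x_hi:
            accepted.append(theta)
    if not accepted:
        return None  # no theta on the grid is compatible with x0
    return min(accepted), max(accepted)


theta_grid = [-10.0 + 0.01 * i for i in range(2001)]  # scan from -10 to 10
print(confidence_interval(1.3, 0.6827, theta_grid))
```

For this unbounded Gaussian case the result reproduces the familiar $X_0 \pm 1\sigma$ interval, up to the grid spacing.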

## Intermediate summary

- This just formalises our discussion and definition of intervals
- This is what we mean by an uncertainty
- We will now see that the frequentist starts to run into problems if parameters get near physical limits

## Constraints in Fits

- <font color="green">*Discussion in lectures about applying constraints*</font>
- Can enforce by mathematical reparameterisation
- Or by inclusion of a constraint term in the likelihood
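
Both options can be illustrated with a small sketch. The numbers below are invented for illustration: a "data" measurement of $2.0 \pm 1.0$ and an external constraint of $1.0 \pm 0.5$ on the same parameter. The logistic map is one possible reparameterisation for a fraction and is an assumption, not necessarily the one used in the lectures.

```python
import math


def logistic(a):
    """Reparameterisation: map an unbounded fit parameter a onto a fraction in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def gaussian_constraint(theta, theta0, sigma0):
    """Term added to -2 ln L for an external measurement theta0 +/- sigma0."""
    return ((theta - theta0) / sigma0) ** 2


def data_m2lnl(theta):
    """Hypothetical -2 ln L from the data alone (measurement of 2.0 +/- 1.0)."""
    return ((theta - 2.0) / 1.0) ** 2


# Simple grid scan in place of a proper minimiser
grid = [0.001 * i for i in range(4001)]
best = min(grid, key=lambda t: data_m2lnl(t) + gaussian_constraint(t, 1.0, 0.5))
print(f"unconstrained best fit: 2.0, constrained best fit: {best:.3f}")
print(f"logistic(0) = {logistic(0.0)}")
```

With two Gaussian terms the constrained minimum is the inverse-variance weighted average of the two measurements, so the constraint pulls the best fit towards the better-measured value.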

# <font color="darkred">*Musical interlude...*</font>

## Estimates and intervals near physical boundaries

- The advantage of the Bayesian approach is that we can enforce a physical boundary
- E.g. if parameter $\mu \geq 0$ then can have a prior:
$$ p(\mu) = 0 \quad \forall \; \mu <0 $$
- But what does the frequentist do?

<img src="plots/conf_belt_norm.png" alt="drawing">

- It is possible to measure a value which gives an <font color="blue">**empty**</font> confidence interval!
- The interval will then suffer from <font color="blue">**undercoverage**</font>
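
To see how an empty interval arises, consider a sketch with an assumed Gaussian measurement of unit resolution and the physical bound $\mu \geq 0$. The central belt gives $[X_0 - z\sigma, X_0 + z\sigma]$, and only the part inside the physical region is allowed. The function name is hypothetical.

```python
from statistics import NormalDist


def bounded_central_interval(x0, beta=0.6827, sigma=1.0, mu_min=0.0):
    """Central interval from the belt, restricted to the physical region mu >= mu_min."""
    z = NormalDist().inv_cdf(0.5 * (1.0 + beta))
    lo, hi = x0 - z * sigma, x0 + z * sigma
    if hi < mu_min:
        return None  # every mu compatible with x0 is unphysical: empty interval
    return max(lo, mu_min), hi


for x0 in (1.5, 0.5, -0.5, -2.0):
    print(f"x0 = {x0:+.1f}: interval = {bounded_central_interval(x0)}")
```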

### A standard example of fitting a signal

- Imagine fitting a small signal on top of a small background
- I may want to enforce that the signal yield is positive
- Even if I don't, my p.d.f. has to be non-negative everywhere, so there will be a lower limit to the yield I can fit

<img src="plots/under_coverage_fit.png" alt="drawing" width="800">

- If I enforce the limit on the parameter it will skew my error matrix
- The (profile) likelihood runs up to a boundary where it cannot be computed

<img src="plots/under_coverage1.png" alt="drawing" width="800">

- If I run several toys I see that I do not get a nice normal distribution
- Instead the values "stack up" at the boundary
- The result is that I have a very <font color="blue">**serious coverage problem**</font>

<img src="plots/under_coverage2.png" alt="drawing" width="800">
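
The "stacking up" at the boundary is easy to reproduce with toys. This sketch replaces the signal fit by a much simpler assumed model: a Gaussian estimate with true value $\mu = 0.5$, unit resolution and the bound $\hat{\mu} \geq 0$ applied. The numbers and function name are illustrative only.

```python
import random
from statistics import NormalDist


def bounded_estimates(mu_true, n_toys, sigma=1.0, mu_min=0.0, seed=42):
    """Toy estimates of mu with the physical bound applied to each one."""
    rng = random.Random(seed)
    return [max(rng.gauss(mu_true, sigma), mu_min) for _ in range(n_toys)]


estimates = bounded_estimates(mu_true=0.5, n_toys=20000)
frac_at_boundary = sum(e == 0.0 for e in estimates) / len(estimates)

# Without the bound these toys would have given a negative estimate
expected = NormalDist(0.5, 1.0).cdf(0.0)
print(f"fraction of toys at the boundary: {frac_at_boundary:.3f}")
print(f"expected from the normal c.d.f.:  {expected:.3f}")
```

A sizeable fraction of the toys sits exactly at zero, so the distribution of estimates is no longer normal and intervals based on a normal approximation cannot be trusted.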

### Flip-flopping

- You may think one way around this issue is to:
   1) quote central value and uncertainty above some threshold
   2) quote upper limits below some threshold up to the boundary
   3) quote the upper limit at the boundary for any value beyond the boundary
   
- This leads to the wrong coverage in many regions 
- And discontinuities in the confidence belt (known as *flip-flopping*)

<img src="plots/flip_flop.png" alt="drawing" width="800">
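
The coverage of such a policy can be checked with toys. The sketch below makes specific assumptions that are not stated above: a Gaussian measurement with unit resolution, a 90% C.L. target, and a switch from upper limit to central interval at $3\sigma$. The function names are hypothetical.

```python
import random
from statistics import NormalDist

Z_UPPER = NormalDist().inv_cdf(0.90)    # one-sided 90% upper limit
Z_CENTRAL = NormalDist().inv_cdf(0.95)  # two-sided 90% central interval


def flip_flop_interval(x, threshold=3.0, sigma=1.0):
    """Quoted interval when the choice of interval type depends on the data."""
    if x >= threshold * sigma:
        return x - Z_CENTRAL * sigma, x + Z_CENTRAL * sigma
    # Below the threshold quote an upper limit; beyond the boundary use x = 0
    return 0.0, max(x, 0.0) + Z_UPPER * sigma


def coverage(mu_true, n_toys=20000, seed=7):
    """Fraction of toy experiments whose quoted interval contains mu_true."""
    rng = random.Random(seed)
    n_cover = 0
    for _ in range(n_toys):
        lo, hi = flip_flop_interval(rng.gauss(mu_true, 1.0))
        n_cover += lo <= mu_true <= hi
    return n_cover / n_toys


for mu in (0.5, 2.0, 5.0):
    print(f"mu = {mu}: coverage = {coverage(mu):.3f} (target 0.90)")
```

Under these assumptions the coverage at $\mu = 2$ comes out near 85% rather than the nominal 90%, which is the kind of wrong coverage the policy produces.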

## The Feldman-Cousins method

- A beautiful, and purely frequentist, approach to solving the problem
- Based on the idea that if you want the correct coverage, then why not construct the interval that has the correct coverage by definition
- This requires producing simulation samples and then computing where 68% lie
- For a given $X$ start from the ML estimate, $\hat{\mu}$, with the bound applied
- Then for some fixed value of $\mu$ find the likelihood ratio

$$ R = \frac{p(X|\mu)}{p(X|\hat{\mu})} $$

- Add values of $X$ to the acceptance region in order of decreasing $R$ (i.e. working outwards from the best fit point) until the desired probability content is reached

- The code for this is in the lecture notes

<img src="plots/fc_norm.png" alt="drawing" width="800">
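
As a rough illustration of the ordering rule, and separate from the code in the lecture notes, the sketch below treats a Gaussian measurement with unit resolution and the bound $\mu \geq 0$. For each $\mu$ it sums the probability of all $X$ ranked ahead of the observed `x0` (those with larger $R$); `x0` lies inside the acceptance region if that sum is still below $\beta$. The grid settings and function names are assumptions made for this example.

```python
import math


def likelihood_ratio(x, mu, sigma=1.0, mu_min=0.0):
    """R = p(x|mu) / p(x|mu_hat), with the best fit mu_hat bounded at mu_min."""
    mu_hat = max(x, mu_min)
    return math.exp(-0.5 * ((x - mu) ** 2 - (x - mu_hat) ** 2) / sigma ** 2)


def gauss_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def in_acceptance_region(x0, mu, beta, sigma=1.0, half_width=8.0, n=1601):
    """True if x0 is added to the region before probability beta is reached."""
    r0 = likelihood_ratio(x0, mu, sigma)
    step = 2.0 * half_width * sigma / (n - 1)
    content = 0.0
    for i in range(n):
        x = mu - half_width * sigma + i * step
        if likelihood_ratio(x, mu, sigma) > r0:  # x is ranked ahead of x0
            content += gauss_pdf(x, mu, sigma) * step
            if content >= beta:
                return False
    return True


def fc_interval(x0, beta, mu_grid):
    """Scan mu and keep the values whose acceptance region contains x0."""
    accepted = [mu for mu in mu_grid if in_acceptance_region(x0, mu, beta)]
    return (min(accepted), max(accepted)) if accepted else None


mu_grid = [0.02 * i for i in range(251)]  # physical region, 0 to 5
print("x0 = 0, 90% C.L.:", fc_interval(0.0, 0.90, mu_grid))
```

For `x0 = 0` at 90% C.L. the upper edge should come out close to 1.64, up to the grid resolution, and the interval is never empty even for negative `x0`.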

### Feldman-Cousins

- A more formulaic MC simulation based approach:
    
    1) Run the ML fit to determine $-2\ln L(\hat{\mu})$ at the best fit point $\hat{\mu}$
    2) Pick some other test value $\mu_0$ and compute $-2\ln L(\mu_0)$
    3) Now compute the log-likelihood ratio (LLR) between these points, $R = -2\Delta \ln L(\mu_0)$
    4) Simulate a set of toy experiments from the parameters at the point $\mu_0$
    5) Compute the LLR, $R'$, for each of these toys
    6) The 1-CL value at the scan point is given by the fraction of toys for which $R' > R$.
    
<img src="plots/fc_example.png" alt="drawing" width="800">
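
The toy-based recipe can be sketched for the same simplified problem: a single Gaussian measurement with unit resolution and the bound $\mu \geq 0$, for which the ML fit is just $\hat{\mu} = \max(X, 0)$. This is an assumed stand-in for a real fit, and the function names are hypothetical.

```python
import random


def m2_delta_lnl(x, mu0, sigma=1.0, mu_min=0.0):
    """LLR: -2 ln L(mu0) + 2 ln L(mu_hat), with mu_hat bounded at mu_min."""
    mu_hat = max(x, mu_min)
    return ((x - mu0) ** 2 - (x - mu_hat) ** 2) / sigma ** 2


def one_minus_cl(x_obs, mu0, n_toys=20000, sigma=1.0, seed=1):
    """Fraction of toys generated at mu0 with an LLR larger than the observed one."""
    rng = random.Random(seed)
    r_obs = m2_delta_lnl(x_obs, mu0, sigma)
    n_above = 0
    for _ in range(n_toys):
        x_toy = rng.gauss(mu0, sigma)  # toy experiment generated at mu0
        n_above += m2_delta_lnl(x_toy, mu0, sigma) > r_obs
    return n_above / n_toys


# Scan a few test values for an observation at x_obs = 0
for mu0 in (0.5, 1.0, 1.645, 2.5):
    print(f"mu0 = {mu0}: 1 - CL = {one_minus_cl(0.0, mu0):.3f}")
```

A scan point $\mu_0$ belongs to the interval at confidence level $\beta$ when its 1-CL value is larger than $1 - \beta$.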

# End of Lecture 15

By the end of this lecture you should:
   - Understand what confidence intervals are and what they represent
   - Understand the issues with classical intervals near physical boundaries
   - Be able to deploy methods to circumvent these issues