In [1]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

## Problem 1

In this problem we will explore the effects of correlations between data points on weighted averaging. Let’s assume we have four measurements of the cross-section for the production of top quarks at the Fermilab Tevatron collider in four different decay channels, each involving a single electron or a muon and 3 or 4 jets. (Don’t worry if this is all gibberish to you - the important thing is that we have four measurements, supposedly of the same true quantity.)

$$p\overline{p}X \rightarrow e/\mu + 3/4 \text{ jets}$$

For each measurement, there are four potential sources of uncertainty, as shown below:

| Channel       | Measured| Stat  | e-ID  | μ-ID  | Lumi |
|---------------|---------|-------|-------|-------|------|
| e + 3-jets    | 7.91    | 0.99  | 0.24  |   -   | 0.59 |
| e + 4-jets    | 6.82    | 0.76  | 0.18  |   -   | 0.45 |
| μ + 3-jets    | 7.69    | 1.06  |   -   | 0.40  | 0.63 |
| μ + 4-jets    | 9.00    | 0.98  |   -   | 0.37  | 0.59 |

where

**Measured**: the measured cross-section in picobarns (pb)

**Stat**: the statistical and other uncorrelated errors on the measurement
* uncorrelated for all measurements

**e/μ-ID**: the uncertainty due to estimating the efficiency for identifying electrons (e) or muons (μ)
* the e-ID uncertainties are +100% correlated between the e + n-jets channels
* the μ-ID uncertainties are +100% correlated between the μ + n-jets channels

**Lumi**: the uncertainty due to estimating the luminosity used in the measurement
* +100% correlated for all measurements  
    
Your job is to write code to calculate the weighted average of these measurements, taking into account the correlations. Use this code to answer the following questions.

### a) 
What is the weighted average and uncertainty of the measurements if correlations are ignored?

In [None]:
def w_ave_uncorr(measurements, errors):
    '''
    @param measurements: an array of all the measurements
    @param errors: an array of the errors
    @return the weighted average and uncertainty
    '''
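
One possible way to approach part a) is the usual inverse-variance weighted average, with each channel's total error taken as the quadrature sum of its individual sources. The sketch below is not the required solution: the function name `weighted_average_uncorrelated` and the variable names are illustrative assumptions, and a missing table entry ("-") is treated as zero.

```python
import numpy as np

def weighted_average_uncorrelated(measurements, errors):
    """Inverse-variance weighted mean and its uncertainty (no correlations)."""
    measurements = np.asarray(measurements, dtype=float)
    errors = np.asarray(errors, dtype=float)
    weights = 1.0 / errors**2                      # w_i = 1 / sigma_i^2
    mean = np.sum(weights * measurements) / np.sum(weights)
    sigma = 1.0 / np.sqrt(np.sum(weights))         # sigma^2 = 1 / sum(w_i)
    return mean, sigma

# Values copied from the table above; "-" entries are taken as 0 (assumption).
meas = np.array([7.91, 6.82, 7.69, 9.00])
stat = np.array([0.99, 0.76, 1.06, 0.98])
e_id = np.array([0.24, 0.18, 0.0, 0.0])
mu_id = np.array([0.0, 0.0, 0.40, 0.37])
lumi = np.array([0.59, 0.45, 0.63, 0.59])

# Total error per channel: all sources added in quadrature.
total_err = np.sqrt(stat**2 + e_id**2 + mu_id**2 + lumi**2)
avg, avg_err = weighted_average_uncorrelated(meas, total_err)
print(f"average = {avg:.3f} +/- {avg_err:.3f} pb")
```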

### b)
What is the covariance matrix for the measurements including the correlations described above?
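
As a hedged sketch of how such a matrix might be assembled: an uncorrelated source contributes only to the diagonal, while a +100% correlated source with per-channel errors $s_i$ contributes $s_i s_j$ to every element, which is the outer product of the error vector with itself. The helper name `build_covariance` is an illustrative assumption, and "-" entries in the table are again taken as zero.

```python
import numpy as np

# Per-channel errors from the table, in channel order e+3j, e+4j, mu+3j, mu+4j.
stat = np.array([0.99, 0.76, 1.06, 0.98])
e_id = np.array([0.24, 0.18, 0.0, 0.0])    # zero where the source does not apply
mu_id = np.array([0.0, 0.0, 0.40, 0.37])
lumi = np.array([0.59, 0.45, 0.63, 0.59])

def build_covariance(uncorrelated, *fully_correlated):
    """Diagonal term for uncorrelated errors plus one outer product
    per +100% correlated source."""
    cov = np.diag(np.asarray(uncorrelated, dtype=float) ** 2)
    for source in fully_correlated:
        source = np.asarray(source, dtype=float)
        cov += np.outer(source, source)      # cov_ij += s_i * s_j
    return cov

cov = build_covariance(stat, e_id, mu_id, lumi)
print(np.round(cov, 4))
```

Because the e-ID vector is zero for the muon channels (and vice versa), its outer product only couples the channels that actually share that uncertainty.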

### c)
Using that covariance matrix, calculate the weighted average and uncertainty of the measurements taking correlations properly into account.
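
A minimal sketch of one standard approach, assuming the generalized least-squares form of the weighted average: with covariance matrix $V$ and a vector of ones $\mathbf{1}$, the estimate is $\hat\mu = \mathbf{1}^T V^{-1} y \,/\, \mathbf{1}^T V^{-1} \mathbf{1}$ with variance $1 / \mathbf{1}^T V^{-1} \mathbf{1}$. The function name `weighted_average_correlated` is an illustrative assumption.

```python
import numpy as np

def weighted_average_correlated(measurements, cov):
    """Weighted average for correlated measurements.

    Returns the average, its uncertainty, and the per-measurement weights.
    """
    y = np.asarray(measurements, dtype=float)
    cov = np.asarray(cov, dtype=float)
    ones = np.ones_like(y)
    vinv_ones = np.linalg.solve(cov, ones)   # V^{-1} 1 without forming V^{-1}
    norm = ones @ vinv_ones                  # 1^T V^{-1} 1
    weights = vinv_ones / norm               # weights sum to 1
    return weights @ y, 1.0 / np.sqrt(norm), weights

# Sanity check: a diagonal covariance reproduces the inverse-variance average.
mean, sigma, weights = weighted_average_correlated([1.0, 3.0], np.diag([1.0, 4.0]))
print(mean, sigma, weights)
```

With strong positive correlations some weights can become negative; that is a feature of this estimator rather than a bug in the code.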

### d)
What are the $\chi^2$ and $\chi^2$ probability for the uncorrelated and correlated averages? Are these results likely to be consistent?
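
One way this could be computed, sketched under the assumption that SciPy is available: $\chi^2 = (y - \hat\mu)^T V^{-1} (y - \hat\mu)$ with $n - 1$ degrees of freedom (one parameter, the average, is fitted), and the probability is the upper tail of the $\chi^2$ distribution. The function name `chi2_of_average` is an illustrative assumption.

```python
import numpy as np
from scipy import stats

def chi2_of_average(measurements, cov, average):
    """chi^2, degrees of freedom, and chi^2 probability of a common average."""
    residuals = np.asarray(measurements, dtype=float) - average
    cov = np.asarray(cov, dtype=float)
    chi2 = residuals @ np.linalg.solve(cov, residuals)   # r^T V^{-1} r
    dof = residuals.size - 1                             # one fitted parameter
    p_value = stats.chi2.sf(chi2, dof)                   # P(chi^2 >= observed)
    return chi2, dof, p_value

# Toy check: two unit-error measurements at 1 and 3, averaged to 2.
chi2, dof, p_value = chi2_of_average([1.0, 3.0], np.eye(2), 2.0)
print(chi2, dof, p_value)
```

For the uncorrelated case the same function can be used by passing a diagonal matrix of squared total errors as `cov`.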

## Problem 2

In this problem we will explore the effects of non-Gaussian errors on the relationship between confidence regions and the fit covariance matrix. Use the example we did in class of data drawn from a linear model with:

$$ x \in [0,1]$$
$$f(x) = ax+b \text{  (measurements, y, drawn from this model)}$$
$$a = 1$$
$$b=0.5$$
$$N=10 \text{ measurements / experiment}$$
$$M=101 \text{  experiments}$$

But, for this problem, assume that the measurements, $y_i$, take values that are uniformly (rather than normally) distributed about their true values, $f(x_i) = ax_i + b$. Use:

$$ y_i - f(x_i) \in [-0.15, +0.15] \text{  (uniformly)}$$

Please do the following:

### a)
Simulate one experiment using the parameters above, with $y_i$ drawn uniformly from the region $[-0.15, +0.15]$ about the true value $f(x_i)$. Make a plot of $y_i$ vs. $x_i$, including errors on $y_i$ from the standard deviation of a uniform distribution (width/$\sqrt{12}$).
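
A minimal sketch of such a simulation follows. It assumes the $x_i$ are evenly spaced on $[0, 1]$, which the problem statement does not specify; the function names `simulate_experiment` and `plot_experiment` and the constants are illustrative assumptions.

```python
import numpy as np

A_TRUE, B_TRUE = 1.0, 0.5     # true slope and intercept
HALF_WIDTH = 0.15             # uniform noise spans [-0.15, +0.15]
N_POINTS = 10

def simulate_experiment(rng, n_points=N_POINTS):
    """Return x, y, and per-point errors for one simulated experiment."""
    x = np.linspace(0.0, 1.0, n_points)     # assumption: evenly spaced x
    noise = rng.uniform(-HALF_WIDTH, HALF_WIDTH, size=n_points)
    y = A_TRUE * x + B_TRUE + noise
    # Standard deviation of a uniform distribution: full width / sqrt(12).
    sigma = np.full(n_points, 2.0 * HALF_WIDTH / np.sqrt(12.0))
    return x, y, sigma

def plot_experiment(x, y, sigma):
    """Plot y vs x with error bars and the true line."""
    import matplotlib.pyplot as plt          # imported here so the simulation runs headless
    fig, ax = plt.subplots()
    ax.errorbar(x, y, yerr=sigma, fmt="o", label="simulated data")
    ax.plot(x, A_TRUE * x + B_TRUE, label="true line")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return ax

rng = np.random.default_rng(0)
x, y, sigma = simulate_experiment(rng)
# plot_experiment(x, y, sigma)   # uncomment in the notebook to draw the figure
```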

### b)
Now perform a $\chi^2$ fit to the data that you produced in part a), simultaneously extracting best-fit values for $a$ and $b$ and their covariance matrix. In your $\chi^2$, use the same errors on the $y_i$ as in part a) ($0.3/\sqrt{12}$), as is often assumed in these situations. What parameter values and covariance matrix does your fit return, and what are the $\chi^2$ and probability of the fit? Do your fit results agree with the values input to your simulation?
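
Because the model is linear in $a$ and $b$, the $\chi^2$ minimum has a closed form, so one possible sketch avoids a numerical minimizer entirely: with design matrix $A$ (columns $x_i$ and 1) and weights $W = \mathrm{diag}(1/\sigma_i^2)$, the covariance is $(A^T W A)^{-1}$ and the best-fit parameters are $(A^T W A)^{-1} A^T W y$. The function name `chi2_line_fit` is an illustrative assumption.

```python
import numpy as np

def chi2_line_fit(x, y, sigma):
    """Weighted linear least-squares fit of y = a*x + b.

    Returns (a, b) as an array, their 2x2 covariance matrix, chi^2, and dof.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), x.shape)
    A = np.column_stack([x, np.ones_like(x)])      # columns: slope, intercept
    w = 1.0 / sigma**2
    cov = np.linalg.inv(A.T @ (A * w[:, None]))    # (A^T W A)^{-1}
    params = cov @ (A.T @ (w * y))                 # best-fit (a, b)
    chi2 = np.sum(((y - A @ params) / sigma) ** 2)
    dof = x.size - 2                               # two fitted parameters
    return params, cov, chi2, dof

# Noise-free check: the fit should recover a = 1, b = 0.5 exactly.
x_demo = np.linspace(0.0, 1.0, 10)
params, cov, chi2, dof = chi2_line_fit(x_demo, 1.0 * x_demo + 0.5, 0.3 / np.sqrt(12.0))
print(params, chi2, dof)
```

The fit probability could then be obtained from the upper tail of the $\chi^2$ distribution, for example with `scipy.stats.chi2.sf(chi2, dof)` if SciPy is installed.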

### c)
Now simulate 1,000 more experiments and fit to each of them to extract 1,000 new estimates of a, b.
* Histogram the fitted values of $a$ and $b$ that you get. Are the means and widths of these histograms consistent with the true parameter values and the fit uncertainty you obtained in part b)?
* Produce a scatter plot of $b_i - b_\text{true}$ vs. $a_i - a_\text{true}$. Superimpose the boundary of the 68.3% confidence region you would expect from your fit to experiment 0 if the errors on $y_i$ were Gaussian with $\sigma_i = 0.3/\sqrt{12}$. What fraction of the fits fall within this confidence region?
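
The counting step can be sketched as follows. For two jointly Gaussian parameters, a point lies inside the confidence region of a given level when $\delta^T C^{-1} \delta \le -2\ln(1 - \text{level})$, where $\delta$ is the offset from the true parameters and $C$ is the fit covariance; for 68.3% the threshold is about 2.30. The sketch assumes evenly spaced $x_i$ and equal errors on every point, so $C$ is the same for every experiment; the function name `fraction_inside_region` and its arguments are illustrative assumptions.

```python
import numpy as np

def fraction_inside_region(sigma_assumed, n_exp=1000, n_pts=10, a=1.0, b=0.5,
                           half_width=0.15, level=0.683, seed=0):
    """Fraction of simulated fits inside the Gaussian confidence region."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_pts)                 # assumption: evenly spaced x
    A = np.column_stack([x, np.ones_like(x)])        # columns: slope, intercept
    # Fit covariance for equal errors: sigma^2 * (A^T A)^{-1}.
    cov = np.linalg.inv(A.T @ A) * sigma_assumed**2
    # Each row of y is one experiment with uniform noise about the true line.
    y = a * x + b + rng.uniform(-half_width, half_width, size=(n_exp, n_pts))
    # With equal errors the best fit does not depend on sigma_assumed.
    params = np.linalg.solve(A.T @ A, A.T @ y.T).T   # shape (n_exp, 2)
    delta = params - np.array([a, b])
    dchi2 = np.einsum("ij,jk,ik->i", delta, np.linalg.inv(cov), delta)
    threshold = -2.0 * np.log(1.0 - level)           # chi^2 quantile for 2 dof
    return np.mean(dchi2 <= threshold)

frac_sqrt12 = fraction_inside_region(0.3 / np.sqrt(12.0))
print(f"fraction inside 68.3% region: {frac_sqrt12:.3f}")
```

Calling the same function with a different `sigma_assumed` (for example 0.15) rescales the region without changing the fitted points, which makes the two error conventions easy to compare.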

### d)
Repeat the fits to your 1,000 experiments using $\sigma_i = 0.15$. (This is another common way of dealing with uniform uncertainty regions.) What fraction of fits fall within the Gaussian 68.3% confidence region in this case?