# A. Confidence limits for Poisson processes

$$ P(X\leq n) = \sum_{k=0}^{n} \frac{\lambda^{k}}{k!} e^{-\lambda} $$ 
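The cumulative sum above can be evaluated term by term without forming $\lambda^k$ and $k!$ separately, using the recurrence $p_k = p_{k-1}\,\lambda/k$ with $p_0 = e^{-\lambda}$. This avoids large intermediate numbers such as $12^{18}$. A minimal sketch (the helper name `poisson_cdf_by_hand` is hypothetical), cross-checked against `scipy.stats.poisson.cdf`:

```python
import math
from scipy.stats import poisson

def poisson_cdf_by_hand(n, lam):
    """P(X <= n) for X ~ Poisson(lam), summing p_k = p_{k-1} * lam / k."""
    term = math.exp(-lam)      # p_0 = e^{-lam}
    total = term
    for k in range(1, n + 1):
        term *= lam / k        # p_k obtained from p_{k-1}
        total += term
    return total

for lam in (1, 5, 12):
    for n in (0, 5, 18):
        print(lam, n, poisson_cdf_by_hand(n, lam), poisson.cdf(n, lam))
```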

## 1. Confidence interval for a given expectation value.
Write a program that lists, for fixed values of $\lambda=1, 2, \dots, 12$, the number of observed events $n$ for which there is at most a 10\% probability of observing (i) more than $n$ events, (ii) fewer than $n$ events, or (iii) fewer than $n$ or more than $n'$ events (central confidence interval). For the central interval, find the value $n$ below which there is at most a 5\% probability and the value $n'$ above which there is at most a 5\% probability.

In [1]:
import numpy as np
import math
from scipy.stats import poisson
import astropy.stats as astats

In [2]:
def poisson_10pc_n_events(tail):
    """Tabulate event counts for the Poisson means lambda = 1, ..., 12.

    tail='above': smallest n with P(X > n) <= 10%
    tail='below': largest n with P(X <= n) <= 10% (nan if even P(X = 0) > 10%)
    tail='both' : n with P(X <= n) <= 5% and n2 with P(X > n2) <= 5%
    """
    if tail not in ('above', 'below', 'both'):
        print('Please select which tail of the distribution you want to find: above, below, both')
        return
    mean_all = np.arange(1, 13)
    low_cut = 0.1 if tail == 'below' else 0.05    # probability allowed in the lower tail
    high_cut = 0.9 if tail == 'above' else 0.95   # CDF level where the upper tail starts
    n_events = []
    n2_events = []
    for mean in mean_all:
        p = 0.0                       # running sum P(X <= n)
        low_done = (tail == 'above')  # the 'above' case has no lower tail to find
        for n in range(100):
            p_old = p
            # float(mean): an integer power such as 12**18 overflows int64 and corrupts the sum
            p += float(mean)**n / math.factorial(n) * np.exp(-mean)
            if not low_done and p_old <= low_cut < p:
                # P(X <= n-1) <= low_cut; a crossing already at n = 0 means no valid count exists
                n_events.append(n - 1 if n > 0 else np.nan)
                low_done = True
                if tail == 'below':
                    break
            if tail != 'below' and p_old < high_cut <= p:
                # first n with P(X > n) = 1 - P(X <= n) <= 1 - high_cut
                if tail == 'above':
                    n_events.append(n)
                else:
                    n2_events.append(n)
                break

    if tail == 'both':
        print('The number of observed events n such that at most 10% of the probability are above n and below n2')
        print('Mean: \t n : \t n2')
        print('____________________')
        for mean, n1, n2 in zip(mean_all, n_events, n2_events):
            print(str(mean) + ":\t" + str(n1) + "\t" + str(n2))
    else:
        print('The number of observed events n such that at most 10% of the probability are ' + tail + ' n')
        print('Mean: \t n')
        print('___________')
        for mean, n1 in zip(mean_all, n_events):
            print(str(mean) + ": \t" + str(n1))

In [3]:
poisson_10pc_n_events(tail = 'above')

The number of observed events n such that at most 10% of the probability are above n
Mean: 	 n
___________
1: 	2
2: 	4
3: 	5
4: 	7
5: 	8
6: 	9
7: 	10
8: 	12
9: 	13
10: 	14
11: 	15
12: 	17


In [4]:
poisson_10pc_n_events(tail = 'below')

The number of observed events n such that at most 10% of the probability are below n
Mean: 	 n
___________
1: 	nan
2: 	nan
3: 	0
4: 	1
5: 	1
6: 	2
7: 	3
8: 	4
9: 	4
10: 	5
11: 	6
12: 	7


In [5]:
poisson_10pc_n_events(tail = 'both')

The number of observed events n such that at most 10% of the probability are above n and below n2
Mean: 	 n : 	 n2
____________________
1:	nan	3
2:	nan	5
3:	0	6
4:	0	8
5:	1	9
6:	1	10
7:	2	12
8:	3	13
9:	3	14
10:	4	15
11:	5	17
12:	6	18


I could also have done this with `scipy.stats.poisson`, but summing the probability mass function term by term is the old-fashioned way and shows how the cumulative probability builds up. Now that we can see how it works, I will use `scipy.stats.poisson` for the following examples.
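For comparison, the three tables can be reproduced in a few vectorised lines with `scipy.stats.poisson.ppf`, which for a discrete distribution returns the smallest $n$ with $P(X \leq n) \geq q$. A sketch, assuming the same conventions as the function above; the helper `lower_count` is hypothetical and returns the largest $n$ with $P(X \leq n) < q$ (identical to the hand-written version except at exact ties):

```python
import numpy as np
from scipy.stats import poisson

means = np.arange(1, 13)

def lower_count(q, mu):
    """Largest n with P(X <= n) < q, i.e. ppf(q) - 1; nan when no such n exists."""
    k = poisson.ppf(q, mu) - 1
    return np.where(k < 0, np.nan, k)

above = poisson.ppf(0.90, means)          # smallest n with P(X > n) <= 10%
below = lower_count(0.10, means)          # at most 10% at or below n
central_low = lower_count(0.05, means)    # 5% in the lower tail
central_high = poisson.ppf(0.95, means)   # 5% in the upper tail

for mu, a, b, lo, hi in zip(means, above, below, central_low, central_high):
    print(mu, a, b, lo, hi)
```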

## 2. Confidence interval given a measured value.
In a similar spirit, write a program that lists, for a fixed number of observed events $n = 0, 1, 2, \dots, 12$, the (i) upper, (ii) lower, and (iii) central CIs on $\lambda$ at 90\% confidence level. Describe the conceptual difference between this case and the one discussed in part 1.
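Written as equations, the one-sided 90% limits are the values of $\lambda$ at which the observed count sits exactly on the edge of a 10% tail (a Neyman-style inversion; the symbols $\lambda_{\rm up}$ and $\lambda_{\rm lo}$ are introduced here only for illustration):

$$ P(X \leq n \mid \lambda_{\rm up}) = \sum_{k=0}^{n} \frac{\lambda_{\rm up}^{k}}{k!} e^{-\lambda_{\rm up}} = 0.10, \qquad P(X \geq n \mid \lambda_{\rm lo}) = 1 - \sum_{k=0}^{n-1} \frac{\lambda_{\rm lo}^{k}}{k!} e^{-\lambda_{\rm lo}} = 0.10 $$

For $n = 0$ the first condition reduces to $e^{-\lambda_{\rm up}} = 0.10$, i.e. $\lambda_{\rm up} = \ln 10 \approx 2.303$, while the second has no solution because $P(X \geq 0) = 1$, so the lower limit is 0. For the central 90% interval, 0.10 is replaced by 0.05 in both conditions.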

In [6]:
mean_all = np.arange(0, 30, 0.01)   # grid of trial lambda values, step 0.01
n_all = np.arange(0, 13)
upper_limits = []
lower_limits = []

for n in n_all:
    # P(X <= n | lambda) decreases monotonically with lambda, so each limit
    # is taken as the first grid point past the crossing

    # Upper limit: P(X <= n | lambda) drops below 10%
    upper_limits.append(mean_all[np.argmax(poisson.cdf(n, mean_all) < 0.1)])

    # Lower limit: P(X >= n | lambda) = 1 - P(X <= n-1 | lambda) rises above 10%.
    # For n = 0 that probability is always 1, so the lower limit is 0.
    if n == 0:
        lower_limits.append(0.0)
    else:
        lower_limits.append(mean_all[np.argmax(poisson.cdf(n - 1, mean_all) < 0.9)])

In [7]:
print('One-sided lower and upper limits on lambda at 90% CL')
print('___________________________________________')
print('n:\t Lower:\t Upper:')
print('___________________________________________')
for n, low, up in zip(n_all, lower_limits, upper_limits):
    print(str(n) + '\t' + str(round(low, 2)) + '\t' + str(round(up, 2)))

One-sided lower and upper limits on lambda at 90% CL
___________________________________________
n:	 Lower:	 Upper:
___________________________________________
0	0.0	2.31
1	0.11	3.89
2	0.54	5.33
3	1.11	6.69
4	1.75	8.0
5	2.44	9.28
6	3.16	10.54
7	3.9	11.78
8	4.66	13.0
9	5.44	14.21
10	6.23	15.41
11	7.03	16.6
12	7.83	17.79


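For the central 90% CI the code is essentially the same, except that each tail of the distribution now holds 5%: the upper limit is where $P(X \leq n \mid \lambda)$ falls to 5% and the lower limit is where $P(X \geq n \mid \lambda)$ rises to 5%.

Conceptually, part 1 fixes $\lambda$ and asks which outcomes $n$ are improbable; because $n$ is discrete, the tail probability can only be bounded by 10% rather than hit exactly. Here $n$ is the fixed, measured quantity and $\lambda$ is scanned: the limits are the values of the continuous parameter $\lambda$ for which the observed $n$ would itself lie on the edge of a 10% tail, so they can be located to any desired precision.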
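A grid is not the only option. The sketch below solves the two tail conditions of the central 90% interval directly with a root finder (`scipy.optimize.brentq`) and cross-checks the result against the standard exact relation between Poisson tail sums and the chi-square distribution, $\lambda_{\rm lo} = \tfrac{1}{2}\chi^2_{\alpha/2}(2n)$ and $\lambda_{\rm up} = \tfrac{1}{2}\chi^2_{1-\alpha/2}(2n+2)$. The function names and the search bracket $[10^{-12}, 100]$ are assumptions made for this illustration:

```python
from scipy.optimize import brentq
from scipy.stats import chi2, poisson

def central_interval(n, cl=0.90):
    """Central interval on lambda for n observed events, (1 - cl)/2 in each tail."""
    tail = (1.0 - cl) / 2
    # Upper limit: P(X <= n | lambda) = tail (the CDF decreases as lambda grows)
    upper = brentq(lambda lam: poisson.cdf(n, lam) - tail, 1e-12, 100.0)
    if n == 0:
        return 0.0, upper              # P(X >= 0) = 1, so there is no lower limit
    # Lower limit: P(X >= n | lambda) = tail, i.e. P(X <= n - 1 | lambda) = 1 - tail
    lower = brentq(lambda lam: poisson.cdf(n - 1, lam) - (1.0 - tail), 1e-12, 100.0)
    return lower, upper

def central_interval_chi2(n, cl=0.90):
    """Same interval from the exact Poisson/chi-square relation."""
    tail = (1.0 - cl) / 2
    lower = chi2.ppf(tail, 2 * n) / 2 if n > 0 else 0.0
    upper = chi2.ppf(1.0 - tail, 2 * n + 2) / 2
    return lower, upper

print('n:\t Lower:\t Upper:')
for n in range(13):
    lower, upper = central_interval(n)
    print(f"{n}\t{lower:.2f}\t{upper:.2f}")
```

Root finding gives the limits to machine precision, whereas the 0.01 grid always reports the first grid point past the crossing.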

# B. Confidence limits for Poisson processes in the context of a counting experiment

$$ p(n;\lambda_S;\lambda_B)= \frac{(\lambda_S+\lambda_B)^n}{n!} \exp{(-[\lambda_S+\lambda_B])} $$

$$ p(n\geq n_{obs})=\sum_{n=n_{obs}}^\infty p(n;\lambda_S=0;\lambda_B)= 1 - \sum_{n=0}^{n_{obs}-1} \frac{\lambda_B^n}{n!} \exp{(-\lambda_B)} $$

Assume a counting experiment in which 5 events are observed, while $\lambda_B = 1.8$ background events are expected.

## 1. Establishing the presence of signal.
Is this a significant ($\geq 3\sigma$) excess, enough to establish the presence of a signal? In other words, calculate the probability of observing $n_{obs} = 5$ or more events assuming background only, with expectation value $\lambda_B = 1.8$, using Poisson statistics.

$$ p(n\geq 5)= 1 - \sum_{n=0}^{4} \frac{1.8^n}{n!} \exp{(-1.8)} $$

In [8]:
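# p-value: probability of n >= 5 from background alone, 1 - P(n <= 4)
prob = 1 - poisson.cdf(4, 1.8)
# 1 - 0.997: two-sided 3-sigma tail probability (coverage 99.73%, rounded)
three_sigma = 1 - 0.997
print('p-value = ' + str(prob))
print('3 sigma = ' + str(three_sigma))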

p-value = 0.036406661001083473
3 sigma = 0.0030000000000000027


The p-value (about 3.6%) is larger than the $3\sigma$ threshold (0.3%). Therefore, this is not a significant excess and does not establish the presence of a signal.
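To put the p-value on the same $\sigma$ scale as the criterion, it can be converted to an equivalent Gaussian significance with the standard normal survival function. A sketch, assuming the usual one-sided convention for an excess (so the $3\sigma$ threshold becomes about 0.00135 rather than the rounded two-sided 0.003 used above):

```python
from scipy.stats import norm, poisson

n_obs, lam_b = 5, 1.8

# One-sided p-value: probability of n >= n_obs from background alone
p_value = poisson.sf(n_obs - 1, lam_b)      # same as 1 - poisson.cdf(n_obs - 1, lam_b)

# Equivalent Gaussian significance: z such that P(Z >= z) = p_value
z = norm.isf(p_value)

# One-sided tail probability of a 3-sigma upward fluctuation
p_3sigma = norm.sf(3)

print(f"p = {p_value:.4f}  ->  z = {z:.2f} sigma   (3 sigma threshold: p = {p_3sigma:.5f})")
```

With these numbers the excess corresponds to roughly $1.8\sigma$, well short of $3\sigma$ under either convention.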

## 2. Upper limit on the number of signal events.
Determine an upper limit $\lambda_S^{max}$ for the number of signal events at 95\% CL. Such a limit is defined by the expected number of signal events $\lambda_S^{max}$ for which the probability of measuring $n_{obs}$ or fewer events falls to 5\%, assuming a Poisson distribution with mean $\lambda_B + \lambda_S^{max}$. To find the answer numerically, perform an interval search starting from the probabilities of observing $n_{obs}$ or fewer events for means $\lambda_B + \lambda_S^{min}$ and $\lambda_B + \lambda_S^{max}$ at the two ends of a trial interval. Stop the search when the uncertainty, i.e. the difference between the limits of the interval, is less than $10^{-5}$.

In [9]:
sig_mean_max = np.linspace(0, 10, 11)

print('Upper lim: \t p-value')
for i in range(len(sig_mean_max)):
    p_less_than_obs = poisson.cdf(5, (1.8+sig_mean_max[i]))
    print(str(sig_mean_max[i]) + " : \t\t" + str(p_less_than_obs))

Upper lim: 	 p-value
0.0 : 		0.9896219631338404
1.0 : 		0.934889686635759
2.0 : 		0.8155562560569335
3.0 : 		0.6510064372694917
4.0 : 		0.47831468715817593
5.0 : 		0.3269771300718833
6.0 : 		0.21025110554874318
7.0 : 		0.12838664508882555
8.0 : 		0.07504113738341638
9.0 : 		0.04225517364020971
10.0 : 		0.023043101774884243


In [10]:
sig_mean_max = np.linspace(8, 9, 11)

print('Upper lim: \t p-value')
for i in range(len(sig_mean_max)):
    p_less_than_obs = poisson.cdf(5, (1.8+sig_mean_max[i]))
    print(str(sig_mean_max[i]) + " : \t\t" + str(p_less_than_obs))

Upper lim: 	 p-value
8.0 : 		0.07504113738341638
8.1 : 		0.07096514087048697
8.2 : 		0.06708596287903189
8.3 : 		0.06339596386834864
8.4 : 		0.05988765537102165
8.5 : 		0.05655370984090572
8.6 : 		0.053386969043626664
8.7 : 		0.05038045108893583
8.8 : 		0.04752735620150341
8.9 : 		0.04482107132365227
9.0 : 		0.04225517364020971


In [11]:
sig_mean_max = np.linspace(8.7, 8.8, 11)

print('Upper lim: \t p-value')
for i in range(len(sig_mean_max)):
    p_less_than_obs = poisson.cdf(5, (1.8+sig_mean_max[i]))
    print(str(round(sig_mean_max[i],2)) + " : \t\t" + str(p_less_than_obs))

Upper lim: 	 p-value
8.7 : 		0.05038045108893583
8.71 : 		0.05008834815427274
8.72 : 		0.04979777259804961
8.73 : 		0.049508717740923115
8.74 : 		0.04922117692231874
8.75 : 		0.048935143500466564
8.76 : 		0.04865061085244038
8.77 : 		0.04836757237419089
8.78 : 		0.04808602148058197
8.79 : 		0.04780595160542332
8.8 : 		0.04752735620150341


In [12]:
sig_mean_max = np.linspace(8.71, 8.72, 11)

print('Upper lim: \t p-value')
for i in range(len(sig_mean_max)):
    p_less_than_obs = poisson.cdf(5, (1.8+sig_mean_max[i]))
    print(str(round(sig_mean_max[i],3)) + " : \t\t" + str(p_less_than_obs))

Upper lim: 	 p-value
8.71 : 		0.05008834815427272
8.711 : 		0.05005922197697671
8.712 : 		0.05003011106676179
8.713 : 		0.05000101541694229
8.714 : 		0.04997193502083388
8.715 : 		0.049942869871754565
8.716 : 		0.04991381996302396
8.717 : 		0.0498847852879639
8.718 : 		0.04985576583989758
8.719 : 		0.049826761612150244
8.72 : 		0.049797772598049524


In [13]:
0.05000101541694229 - 0.04997193502083388

2.9080396108410733e-05

In [14]:
sig_mean_max = np.linspace(8.713, 8.714, 11)

print('Upper lim: \t p-value')
for i in range(len(sig_mean_max)):
    p_less_than_obs = poisson.cdf(5, (1.8+sig_mean_max[i]))
    print(str(round(sig_mean_max[i],4)) + " : \t\t" + str(p_less_than_obs))

Upper lim: 	 p-value
8.713 : 		0.05000101541694232
8.7131 : 		0.04999810669102477
8.7132 : 		0.049995198117637606
8.7133 : 		0.04999228969677415
8.7134 : 		0.049989381428427807
8.7135 : 		0.04998647331259182
8.7136 : 		0.04998356534925956
8.7137 : 		0.04998065753842439
8.7138 : 		0.04997774988007933
8.7139 : 		0.04997484237421816
8.714 : 		0.04997193502083388


In [15]:
0.05000101541694232 - 0.04999810669102477

2.908725917544208e-06

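This gives $\lambda_S^{max} \approx 8.713$. Note, though, that the task's stopping rule refers to the width of the $\lambda_S$ interval, not to the difference between the probabilities computed in In [13] and In [15]. After In [14] the limit is bracketed by $[8.7130, 8.7131]$, an interval of width $10^{-4}$, so one more refinement would be needed to reach $10^{-5}$. Linear interpolation between the first two rows of In [14] gives $\lambda_S^{max} \approx 8.71303$.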
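The manual refinement above is a grid-based interval search. A bisection loop automates it and applies the stopping rule to the width of the $\lambda_S$ interval directly. A sketch; the function name `upper_limit_bisection` and the starting bracket $[0, 20]$ are assumptions:

```python
from scipy.stats import poisson

def upper_limit_bisection(n_obs, lam_b, cl=0.95, lo=0.0, hi=20.0, tol=1e-5):
    """Bisect on lambda_S until the bracket [lo, hi] is narrower than tol.

    Targets P(n <= n_obs | lam_b + lambda_S) = 1 - cl. The CDF decreases
    monotonically in lambda_S, so comparing with the target picks the half.
    """
    target = 1.0 - cl
    if poisson.cdf(n_obs, lam_b + lo) <= target or poisson.cdf(n_obs, lam_b + hi) > target:
        raise ValueError('initial interval does not bracket the limit')
    while hi - lo >= tol:
        mid = 0.5 * (lo + hi)
        if poisson.cdf(n_obs, lam_b + mid) > target:
            lo = mid      # still more than 5% probable: the limit lies above mid
        else:
            hi = mid      # 5% or less: the limit lies at or below mid
    return lo, hi

lo, hi = upper_limit_bisection(n_obs=5, lam_b=1.8)
print(lo, hi, hi - lo)
```

Each step halves the bracket, so going from a width of 20 to below $10^{-5}$ takes 21 evaluations of the CDF.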

In [16]:
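# Cross-check with astropy. This is not expected to reproduce the result exactly:
# the Kraft-Burrows-Nousek interval is Bayesian (uniform prior on the signal),
# and it is two-sided, whereas lambda_S^max above is a frequentist one-sided upper limit.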

cl_array = astats.poisson_conf_interval(5, interval = 'kraft-burrows-nousek', confidence_level = 0.95, background = 1.8)
cl_array

array([[0.0453289],
       [8.7740608]])

# C. Confidence limit determination with MC approaches

Verify the limit determined in problem B.2 above with toy Monte Carlo experiments. In each toy experiment, generate a random number from a Poisson distribution with mean $\lambda_B + \lambda_S^{max}$. Then count the number of experiments in which this random number is less than or equal to $n_{obs}$. By construction, the fraction of such experiments should be 5%.

In [17]:
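# 10^7 toy experiments, each a Poisson draw with mean lambda_B + lambda_S^max
# (no seed is set, so the fraction below varies slightly from run to run)
dist = np.random.poisson((1.8+8.713), 10000000)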

In [18]:
n_obs = 5
# Fraction of toys with at most n_obs events; 5% by construction
frac = np.mean(dist <= n_obs)
print(str(frac * 100) + "%")

5.008789999999999%
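How close to 5% should the toy result be? The number of toys with $n \leq n_{obs}$ is binomially distributed, so the estimated fraction has standard error $\sqrt{p(1-p)/N}$. The sketch below computes the exact target probability, that error, and the pull of a fresh toy run. It uses NumPy's `default_rng` with an arbitrary seed for reproducibility (an assumption; the notebook's run is unseeded), and also evaluates the pull of the notebook's own result of 5.00879%:

```python
import numpy as np
from scipy.stats import poisson

n_obs, lam_b, lam_s_max = 5, 1.8, 8.713
n_toys = 10_000_000

# Exact probability that a single toy gives n <= n_obs (the quantity the MC estimates)
p_exact = poisson.cdf(n_obs, lam_b + lam_s_max)

# Binomial standard error on the estimated fraction
sigma_frac = np.sqrt(p_exact * (1 - p_exact) / n_toys)

rng = np.random.default_rng(seed=1)     # arbitrary seed, chosen for reproducibility
frac = np.mean(rng.poisson(lam_b + lam_s_max, n_toys) <= n_obs)

# Pull of the value printed by the unseeded notebook run above
pull_notebook = (0.0500879 - p_exact) / sigma_frac

print(f"exact : {100 * p_exact:.4f}%")
print(f"MC    : {100 * frac:.4f}% +/- {100 * sigma_frac:.4f}%")
print(f"pulls : new run {(frac - p_exact) / sigma_frac:.2f} sigma, notebook {pull_notebook:.2f} sigma")
```

A pull of order one standard deviation means the toy result agrees with the 95% CL construction within its statistical precision.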
