# **Confidence interval estimation 2**

by Felix Fritzen <fritzen@simtech.uni-stuttgart.de>, November 2020

additional material for the course _Data processing for engineers and scientists_ at the University of Stuttgart


## **Objective**
Extend the previous confidence interval estimation, which used the central limit theorem and Student's t-distribution for the mean, to estimating the variance based on the $\chi^2$ distribution.

## **Outline**
- Confidence interval of the **variance** using the $\chi^2$ distribution


In [100]:
import numpy as np
from numpy.random import normal
from scipy.stats import chi2


## Use of the $\chi^2$ distribution for estimating the variance


**Assumptions**
- $Y_i \sim \mathcal{N}(0,1)$ independent, $i=1, \dots, n$ (true variance $\sigma^2 = 1$)
- estimate an interval $\mathcal{I}(\alpha)$ that contains the true variance with probability $\geq \alpha$
- use estimators for $\mu, \sigma^2$ and the $\chi^2(\nu)$ distribution with $\nu = n-1$ degrees of freedom
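The assumptions rest on a standard result for independent normal samples, restated here for reference. The symbols $Z_1$, $Z_2$ mirror the variables `Z1`, `Z2` in the code, and $s_n^2$ is the sample variance stored in `V`. The scaled sample variance follows a $\chi^2$ distribution with $\nu = n-1$ degrees of freedom, and this statement can be inverted for $\sigma^2$:

$$
s_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad \frac{(n-1)\, s_n^2}{\sigma^2} \sim \chi^2(\nu), \qquad \nu = n-1 .
$$

With the quantiles $Z_1 = F^{-1}_{\chi^2(\nu)}\big(\tfrac{1+\alpha}{2}\big)$ and $Z_2 = F^{-1}_{\chi^2(\nu)}\big(\tfrac{1-\alpha}{2}\big)$,

$$
P\left( Z_2 \le \frac{(n-1)\, s_n^2}{\sigma^2} \le Z_1 \right) = \alpha
\quad\Longleftrightarrow\quad
P\left( \frac{(n-1)\, s_n^2}{Z_1} \le \sigma^2 \le \frac{(n-1)\, s_n^2}{Z_2} \right) = \alpha ,
$$

so that $\mathcal{I}(\alpha) = \big[\, (n-1)\, s_n^2 / Z_1 ,\; (n-1)\, s_n^2 / Z_2 \,\big]$. Because $\chi^2(\nu)$ is continuous, the coverage is exactly $\alpha$ for normal data. The interval is equal-tailed (probability $(1-\alpha)/2$ in each tail) and therefore not symmetric about $s_n^2$.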

In [107]:
n       = 10
alpha   = 0.975
nu      = n-1
Z1      = chi2.ppf((1+alpha)/2, nu)   # upper chi^2 quantile
Z2      = chi2.ppf((1-alpha)/2, nu)   # lower chi^2 quantile
n_run   = 10
print('test running %d estimations over %d samples each' % (n_run, n))
print('---------------------------------------------------------------------------------')
for i in range(n_run):
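    y       = normal(0, 1, size=n)          # n draws from N(0, 1)
    mu      = y.mean()                      # sample mean
    V       = np.sum((y-mu)**2)/(n-1)       # unbiased sample variance s_n^2
    V_min   = V * (n-1)/Z1                  # lower bound of I(alpha)
    V_max   = V * (n-1)/Z2                  # upper bound of I(alpha)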
    print('run %3d .... I(%5.3f) =  [  %10.5f,   %10.5f  ] ;   s_n^2 = %10.5f' % (i+1, alpha, V_min, V_max, V))

# theoretical values: the interval obtained for s_n^2 = 1 (the true variance)
V_min = (n-1)/Z1
V_max = (n-1)/Z2
print('---------------------------------------------------------------------------------')
print('theoretical  I(%5.3f) =  [  %10.5f,   %10.5f  ] ;   s_n^2 = %10.5f' % (alpha, V_min, V_max, 1))

test running 10 estimations over 10 samples each
---------------------------------------------------------------------------------
run   1 .... I(0.975) =  [     0.43944,      4.16441  ] ;   s_n^2 =    1.02703
run   2 .... I(0.975) =  [     0.60375,      5.72143  ] ;   s_n^2 =    1.41103
run   3 .... I(0.975) =  [     0.36469,      3.45599  ] ;   s_n^2 =    0.85232
run   4 .... I(0.975) =  [     0.50769,      4.81119  ] ;   s_n^2 =    1.18654
run   5 .... I(0.975) =  [     0.42315,      4.01001  ] ;   s_n^2 =    0.98895
run   6 .... I(0.975) =  [     0.83520,      7.91483  ] ;   s_n^2 =    1.95197
run   7 .... I(0.975) =  [     0.68615,      6.50237  ] ;   s_n^2 =    1.60362
run   8 .... I(0.975) =  [     0.69924,      6.62636  ] ;   s_n^2 =    1.63420
run   9 .... I(0.975) =  [     0.22433,      2.12587  ] ;   s_n^2 =    0.52429
run  10 .... I(0.975) =  [     0.58592,      5.55251  ] ;   s_n^2 =    1.36937
-------------------------------------------------------------------------------
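The computation inside the loop can be packaged into a small helper. The sketch below uses a hypothetical function `variance_ci` that is not part of the original notebook. It computes $s_n^2$ with `numpy.var` and `ddof=1`, which equals `np.sum((y-mu)**2)/(n-1)`, and it uses `numpy.random.default_rng` in place of the legacy `normal` import. Since the $\chi^2$ argument only requires normally distributed data, the usage example draws from $\mathcal{N}(3, 4)$ to illustrate that mean and variance need not be 0 and 1.

```python
import numpy as np
from scipy.stats import chi2


def variance_ci(y, alpha):
    """Equal-tailed confidence interval for the variance of normal data.

    Hypothetical helper: assumes the entries of y are i.i.d. normal with
    unknown mean and variance. Returns (V_min, V_max, s2), where s2 is the
    unbiased sample variance s_n^2.
    """
    y = np.asarray(y, dtype=float)
    nu = y.size - 1                       # degrees of freedom
    s2 = y.var(ddof=1)                    # unbiased sample variance s_n^2
    Z1 = chi2.ppf((1 + alpha) / 2, nu)    # upper chi^2 quantile
    Z2 = chi2.ppf((1 - alpha) / 2, nu)    # lower chi^2 quantile
    return nu * s2 / Z1, nu * s2 / Z2, s2


# usage: 10 draws with mean 3 and true variance 4 (sigma = 2)
rng = np.random.default_rng(0)
y = rng.normal(3.0, 2.0, size=10)
V_min, V_max, s2 = variance_ci(y, 0.975)
print('I(0.975) =  [  %10.5f,   %10.5f  ] ;   s_n^2 = %10.5f' % (V_min, V_max, s2))
```

Scaling the data by a factor $c$ scales both bounds by $c^2$, so the interval is expressed in the units of the variance, not of the data.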

## Empirical validation (voluntary homework)
- compute the interval as given above
- perform a large number of random draws and estimate the variance for each run
- compute the relative frequency of finding the true variance (here: 1) inside the estimated interval and compare it to the target value $\alpha$

In [108]:
n_vali  = 200000
n_hit   = 0

print('---------------------------------------------------------------------------------')

Z1      = chi2.ppf((1+alpha)/2, nu)
Z2      = chi2.ppf((1-alpha)/2, nu)
for i in range(n_vali):
    y       = normal(0, 1, size=n)
    mu      = y.mean()
    V       = np.sum((y-mu)**2)/(n-1)
    V_min   = V * (n-1)/Z1
    V_max   = V * (n-1)/Z2
    n_hit   += int(V_min <= 1 <= V_max)    # count a hit if the true variance (1) lies in I(alpha)
P_hit = n_hit / n_vali

print('P_hit %7.5f (n_hit=%d) vs. alpha %7.5f (alpha*n_vali=%d)' % (P_hit, n_hit, alpha, alpha*n_vali) )
        

---------------------------------------------------------------------------------
P_hit 0.97552 (n_hit=195105) vs. alpha 0.97500 (alpha*n_vali=195000)
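The loop over `n_vali` runs can also be vectorized: draw all samples as the rows of a single array and compute every $s_n^2$ with one `var` call. The function `coverage` below is a hypothetical sketch under these assumptions: a seeded `default_rng`, and a population $\mathcal{N}(\mu, \sigma^2)$ with user-chosen `mu` and `sigma`. It should give a hit rate close to $\alpha$, also for a population other than $\mathcal{N}(0,1)$.

```python
import numpy as np
from scipy.stats import chi2


def coverage(n, alpha, n_vali, mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of the coverage of the chi^2 variance interval.

    Hypothetical helper: draws n_vali independent samples of size n from
    N(mu, sigma^2) at once and returns the fraction of intervals that
    contain the true variance sigma^2.
    """
    rng = np.random.default_rng(seed)
    nu = n - 1
    Z1 = chi2.ppf((1 + alpha) / 2, nu)            # upper chi^2 quantile
    Z2 = chi2.ppf((1 - alpha) / 2, nu)            # lower chi^2 quantile
    y = rng.normal(mu, sigma, size=(n_vali, n))   # one sample per row
    V = y.var(axis=1, ddof=1)                     # s_n^2 for every row
    V_min = V * nu / Z1
    V_max = V * nu / Z2
    hit = (V_min <= sigma**2) & (sigma**2 <= V_max)
    return hit.mean()                             # relative frequency of hits


P_hit = coverage(n=10, alpha=0.975, n_vali=200000)
print('P_hit %7.5f vs. alpha %7.5f' % (P_hit, 0.975))

# a population with mean 3 and variance 4 should give a similar coverage
P_hit_2 = coverage(10, 0.975, 200000, mu=3.0, sigma=2.0)
print('P_hit %7.5f (mu=3, sigma=2) vs. alpha %7.5f' % (P_hit_2, 0.975))
```

The whole array holds `n_vali * n` values (about 16 MB for the settings above), which trades memory for the speed of avoiding a Python loop.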
