## Confidence interval using the t distribution


$\text{Let } Z \text{ be a standard normal random variable and } W \text{ an independent chi-square random variable with } n \text{ degrees of freedom.}$

$\text{Then the ratio below follows a Student t distribution with } n \text{ degrees of freedom:}$

1) $$\frac{Z}{\sqrt{W/n}}$$
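
The ratio in 1) can be checked numerically. The cell below is a minimal simulation sketch, not a proof: the seed, the value `n = 5` and the number of draws are arbitrary illustrative choices that do not come from the derivation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed (arbitrary) so the run is reproducible
n = 5                            # degrees of freedom of W (illustrative value)
n_draws = 100_000                # number of simulated ratios (illustrative value)

z = rng.standard_normal(n_draws)        # Z ~ N(0, 1)
w = rng.chisquare(df=n, size=n_draws)   # W ~ chi-square(n), drawn independently of Z
ratio = z / np.sqrt(w / n)              # the combination in 1)

# Kolmogorov-Smirnov distance between the simulated ratio and two candidate cdfs
ks_t = stats.kstest(ratio, stats.t(df=n).cdf)
ks_norm = stats.kstest(ratio, stats.norm.cdf)
print(f'KS statistic vs t({n}): {ks_t.statistic:.4f}')
print(f'KS statistic vs N(0, 1): {ks_norm.statistic:.4f}')
```

If the statement holds, the distance to the t cdf should be close to zero and clearly smaller than the distance to the standard normal cdf, because the t distribution has heavier tails for small `n`.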

$\text{If a sample of size } n \text{ is taken from a population where the } X_{i} \text{ are independent and normally distributed with mean } \mu \text{ and standard deviation } \sigma$

$\text{Then: }$

2) $$\frac{\bar X - \mu}{\frac{\sigma}{\sqrt{n}}}$$  

$\text{is standard normal. On the other hand, it can be shown that: }$

3) $$\frac{(n - 1)S_{n}^2}{\sigma^2} \sim \chi_{n-1}^2$$

$\text{where } S_{n} \text{ is the sample standard deviation and } \chi_{n-1}^2 \text{ denotes a chi-square distribution with } (n - 1) \text{ degrees of freedom.}$
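
The result in 3) is stated without proof, so a quick simulation can make it more tangible. This is only a sketch under assumed values: `mu`, `sigma`, `n`, the seed and the number of repetitions are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)     # fixed seed (arbitrary) for reproducibility
mu, sigma, n = 5.0, 10.0, 8        # illustrative population parameters and sample size
n_samples = 50_000                 # number of simulated samples (illustrative value)

# Each row is one sample of size n from N(mu, sigma^2)
samples = rng.normal(loc=mu, scale=sigma, size=(n_samples, n))
s2 = samples.var(axis=1, ddof=1)   # sample variance S_n^2 of every row
scaled = (n - 1) * s2 / sigma**2   # the statistic in 3)

# A chi-square with k degrees of freedom has mean k and variance 2k
print(f'simulated mean: {scaled.mean():.3f}, expected: {n - 1}')
print(f'simulated variance: {scaled.var():.3f}, expected: {2 * (n - 1)}')

# Kolmogorov-Smirnov distance to the chi-square(n - 1) cdf
ks = stats.kstest(scaled, stats.chi2(df=n - 1).cdf)
print(f'KS statistic vs chi2({n - 1}): {ks.statistic:.4f}')
```

Note that `ddof=1` matters here: with the default `ddof=0` the variance would be divided by `n` instead of `n - 1` and the scaled statistic would no longer match 3).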

$\text{Dividing 3. by its degrees of freedom and taking the square root:}$

4) $$\sqrt{\frac{(n - 1)S_{n}^2}{\sigma^2 (n - 1)}} = \frac{S_{n}}{\sigma}$$

$\text{Since } \bar X \text{ and } S_{n} \text{ are independent for normal samples, dividing 2. by 4. gives a Student t distribution with } (n - 1) \text{ degrees of freedom:}$

5) $$\frac{\bar X - \mu}{\frac{S_{n}}{\sqrt{n}}}$$

$\text{A } (1-\alpha) \text{ confidence interval can be derived from:}$

6) $$ 1 - \alpha = P \Big( t_{n-1, \alpha/2} \leq \frac{\bar X - \mu}{\frac{S_{n}}{\sqrt{n}}} \leq t_{n-1, 1-\alpha/2} \Big)$$

$$ = P \Big( \bar X - t_{n-1, 1- \alpha/2}\frac{S_{n}}{\sqrt{n}} \leq \mu \leq \bar X + t_{n-1, 1-\alpha/2}\frac{S_{n}}{\sqrt{n}} \Big)$$

$\text{The second equality uses the symmetry of the t pdf, which gives } t_{n-1,\alpha/2} = - t_{n-1,1-\alpha/2}$

$\text{So the } (1-\alpha) \text{ confidence interval is } \Big[\bar X - t_{n-1, 1- \alpha/2}\frac{S_{n}}{\sqrt{n}}\, , \, \bar X + t_{n-1, 1-\alpha/2}\frac{S_{n}}{\sqrt{n}}\Big]$

In [93]:
import numpy as np
from scipy import stats

In [94]:
# Example data sample; any other list of observations can be used instead
x = np.random.normal(loc=5, scale=10, size=200)

# Sample size
sample_size = len(x)

In [95]:
# Sample mean
x_bar = x.mean()
print(f'mean: {x_bar}')

# Sample standard deviation (ddof=1 divides by n - 1)
x_std = np.std(x, ddof=1)
print(f'Sample st deviation: {x_std}')

mean: 5.49561203030529
Sample st deviation: 10.565759781883308


In [96]:
# Confidence level  (1 - alpha)
confidence_required = 0.95
alpha = 1 - confidence_required
dof = sample_size - 1

In [97]:
# t (1 - alpha/2) quantile
q = 1 - alpha/2
tq = stats.t.ppf(q, df = dof, loc=0, scale=1)
print(f'1 - alpha/2 = {q}, (1 - alpha/2) quantile: {tq}')

1 - alpha/2 = 0.975, (1 - alpha/2) quantile: 1.971956544249395


In [98]:
# Calculating the interval manually.
ci = (x_bar - tq*x_std/(sample_size)**0.5, x_bar + tq*x_std/(sample_size)**0.5)
print(f'{confidence_required*100} % confidence interval : [ {ci[0]} , {ci[1]}]')

95.0 % confidence interval : [ 4.022339555680617 , 6.968884504929964]


In [99]:
# Using scipy
ci_auto = stats.t.interval(
    confidence_required,
    df=dof,
    loc=x_bar,
    scale=x_std/(sample_size)**0.5
)
print(f'{confidence_required*100} % confidence interval : [ {ci_auto[0]} , {ci_auto[1]}]')


95.0 % confidence interval : [ 4.022339555680617 , 6.968884504929964]
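
The meaning of the confidence level can be illustrated by repeating the experiment many times and counting how often the interval contains the true mean. The cell below is a sketch under assumptions: `t_confidence_interval` is a hypothetical helper written for this illustration (it wraps the manual formula used above), and the seed and number of trials are arbitrary.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(sample, confidence=0.95):
    """Two-sided t interval for the mean (hypothetical helper for this sketch)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    std_error = sample.std(ddof=1) / np.sqrt(n)            # S_n / sqrt(n)
    tq = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)   # t_{n-1, 1-alpha/2}
    return sample.mean() - tq * std_error, sample.mean() + tq * std_error


rng = np.random.default_rng(2)     # fixed seed (arbitrary) for reproducibility
mu, sigma, n = 5.0, 10.0, 200      # same population and sample size as the example above
n_trials = 5_000                   # number of repeated experiments (illustrative value)

covered = 0
for _ in range(n_trials):
    low, high = t_confidence_interval(rng.normal(mu, sigma, n))
    covered += low <= mu <= high   # count the intervals that contain the true mean

coverage = covered / n_trials
print(f'empirical coverage of the 95 % interval: {coverage:.3f}')
```

The empirical coverage should land near 0.95. It is the procedure that captures the true mean in about 95 % of repetitions; any single interval either contains the mean or does not.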


### Calculation for two independent groups from populations with the same standard deviation

In [None]:
# number of observations
nx = 15
ny = 7
# mean and standard deviation for control group x
x_bar = 120
Sx = 20
# mean and standard deviation for treatment group y
y_bar = 130
Sy = 15

# By hypothesis, the population standard deviation is the same for the two groups

Pooled_variance = ((nx-1)*Sx**2 + (ny-1)*Sy**2)/(nx + ny - 2)
print(Pooled_variance)
Sp = Pooled_variance**0.5

dof = nx + ny - 2

standard_error = Sp*(1/nx + 1/ny)**0.5


# a confidence interval for the difference between the two means
ci_group = stats.t.interval(
    confidence_required,
    df=dof,
    loc=y_bar - x_bar,
    scale=standard_error
)
print(f'{confidence_required*100} % confidence interval : [ {ci_group[0]} , {ci_group[1]}]')


347.5
95.0 % confidence interval : [ -7.79921255159006 , 27.79921255159006]
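
The same calculation can be written out with the explicit quantile and cross-checked against a pooled two-sample t-test. This is a sketch, not the only way to do it: `pooled_t_interval` is a hypothetical helper introduced here, while `scipy.stats.ttest_ind_from_stats` is the existing SciPy function for a t-test from summary statistics.

```python
import numpy as np
from scipy import stats


def pooled_t_interval(mean_x, s_x, n_x, mean_y, s_y, n_y, confidence=0.95):
    """Interval for mean_y - mean_x assuming equal population variances
    (hypothetical helper for this sketch)."""
    dof = n_x + n_y - 2
    pooled_var = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / dof
    std_error = np.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    tq = stats.t.ppf(1 - (1 - confidence) / 2, df=dof)   # t_{dof, 1-alpha/2}
    diff = mean_y - mean_x
    return diff - tq * std_error, diff + tq * std_error


# Summary statistics from the example above (control x, treatment y)
low, high = pooled_t_interval(120, 20, 15, 130, 15, 7)
print(f'95 % confidence interval : [ {low} , {high} ]')

# Cross-check: the pooled t-test rejects equal means at the 5 % level
# exactly when the 95 % interval excludes 0
result = stats.ttest_ind_from_stats(130, 15, 7, 120, 20, 15, equal_var=True)
print(f't statistic: {result.statistic:.4f}, p-value: {result.pvalue:.4f}')
print(f'interval contains 0: {low <= 0 <= high}')
```

Because the interval contains 0, these data do not show a difference between the two group means at the 95 % confidence level, which agrees with a p-value above 0.05.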
