# Hypothesis Tests for One Sample

When conducting hypothesis tests for a single population, the appropriate method depends on the nature of the data and on what is known about the population. Below, we explore three common scenarios: testing a single population mean using a z-test (known variance), testing a single population mean using a t-test (unknown variance), and testing a single population proportion.

## Testing a Single Population Mean using a z-test (Known Variance)

**Assumptions:**
- A simple random sample is taken from the population.
- The population is normally distributed or the sample size is sufficiently large.
- The value of the population standard deviation is known (which is rarely the case in practice).

**Mathematical Formulation:**
$$ z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} $$

where:
- $ \bar{x} $ is the sample mean,
- $ \mu $ is the population mean assumed under the null hypothesis (`mu_0` in the code),
- $ \sigma $ is the population standard deviation,
- $ n $ is the sample size.
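
The formula above can be wrapped in a small helper. The following is a minimal sketch, not a library routine: the function name `z_test_one_sample` and the four-point sample are made up for illustration, and the test is assumed to be two-sided.

```python
import numpy as np
from scipy import stats


def z_test_one_sample(x, mu_0, sigma, alpha=0.05):
    """Two-sided one-sample z-test with known population sigma (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = (x.mean() - mu_0) / (sigma / np.sqrt(n))
    p_value = 2 * stats.norm.sf(abs(z))          # two-sided tail probability
    z_critical = stats.norm.ppf(1 - alpha / 2)
    # The p-value rule (p < alpha) and the critical-value rule (|z| > z_critical)
    # always lead to the same decision.
    reject = abs(z) > z_critical
    return z, p_value, reject


# Small hand-checkable sample: mean 10.5, n = 4, so z = (10.5 - 9.5) / (3 / 2)
z, p_value, reject = z_test_one_sample([9, 10, 11, 12], mu_0=9.5, sigma=3)
print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Returning the decision from the critical-value rule makes it explicit that comparing $|z|$ with the critical value and comparing the p-value with $\alpha$ are two views of the same test.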


### Example:

In [80]:
import numpy as np
from scipy import stats

In [81]:
# Let's generate a sample from a normal distribution with mean 10 and sigma 3.

mu_true = 10
sigma_true = 3
n = 50

x = stats.norm.rvs(size=n, loc=mu_true, scale=sigma_true)

State the following hypotheses:  
$$H_0: \mu = 9.5$$
$$H_1: \mu \neq 9.5$$  

and set the significance level to $\alpha = 0.05$.

In [83]:
alpha = 0.05
mu_0 = 9.5

x_hat = np.mean(x)
n = len(x)

z_statistic = (x_hat - mu_0)/(sigma_true/np.sqrt(n))
p_value = 2*stats.norm.cdf(-abs(z_statistic))
z_critical = stats.norm.ppf(1 - alpha/2)
print(f'z-statistic: {z_statistic: 3.4f} \np-value: {p_value: 3.4f} \nz-critical: {z_critical: 3.4f}')
print('Reject H0:', p_value < alpha)

z-statistic:  0.6174 
p-value:  0.5370 
z-critical:  1.9600
Reject H0: False


As we can see, the test fails to reject $H_0$, even though the sample was drawn from a population with mean 10 rather than 9.5. Is there a way to choose the sample size $n$ so that a difference of a given size is detected with high probability at significance level $\alpha$? Let's find out:  

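Given the significance level $\alpha$ and the Type II error probability $\beta$, let $z_{\frac{\alpha}{2}}$ and $z_{\beta}$ denote the corresponding upper quantiles of the standard normal distribution, and suppose the true mean is $\mu_1 > \mu_0$. Under $H_0$ the upper rejection threshold satisfies:  
$$P\left(\frac{\bar{X} -  \mu_0}{\frac{\sigma}{\sqrt{n}}} \geq z_{\frac{\alpha}{2}}\right) = \frac{\alpha}{2}$$
Under $H_1$ we want the sample mean to fall below that threshold with probability $\beta$ only:  
$$P\left(\frac{\bar{X} -  \mu_1}{\frac{\sigma}{\sqrt{n}}} \leq -z_{\beta}\right) = \beta$$  

Both conditions hold when the critical value $\bar{x}_c$ of the sample mean satisfies:  
$$\frac{\bar{x}_c -  \mu_0}{\frac{\sigma}{\sqrt{n}}} = z_{\frac{\alpha}{2}}$$
$$\frac{\bar{x}_c -  \mu_1}{\frac{\sigma}{\sqrt{n}}} = -z_{\beta}$$
Solving each equation for $\bar{x}_c$ and equating the results gives:  
$$\mu_0 + \frac{\sigma \cdot z_{\frac{\alpha}{2}}}{\sqrt{n}} = \mu_1 - \frac{\sigma \cdot z_{\beta}}{\sqrt{n}}$$  
With $d = \mu_1 - \mu_0$, the difference we want to detect (the effect size in the original units), this becomes:  
$$d = \frac{\sigma\cdot \left(z_{\frac{\alpha}{2}} + z_{\beta}\right)}{\sqrt{n}}$$  
And finally:
$$n = \frac{\sigma^2\cdot \left(z_{\frac{\alpha}{2}} + z_{\beta}\right)^2}{d^2}$$  
This derivation ignores the small probability of rejecting in the opposite tail, which is negligible for the values used here.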

In [84]:
# Let's calculate the required sample size, setting beta = 0.1 (power = 0.9)
beta = 0.1
d = 0.5

z_alpha = stats.norm.ppf(1 - alpha/2)
z_beta = abs(stats.norm.ppf(beta))
n = np.ceil(sigma_true**2*(z_alpha + z_beta)**2/d**2)
print(f'Desired sample size: {n}')

Desired sample size: 379.0


In [85]:
n = 379

x = stats.norm.rvs(size=n, loc=mu_true, scale=sigma_true)

x_hat = np.mean(x)
n = len(x)

z_statistic = (x_hat - mu_0)/(sigma_true/np.sqrt(n))
p_value = 2*stats.norm.cdf(-abs(z_statistic))
z_critical = stats.norm.ppf(1 - alpha/2)
print(f'z-statistic: {z_statistic: 3.4f} \np-value: {p_value: 3.4f} \nz-critical: {z_critical: 3.4f}')
print('Reject H0:', p_value < alpha)

z-statistic:  1.9855 
p-value:  0.0471 
z-critical:  1.9600
Reject H0: True


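With $n = 379$ the test rejects $H_0$ in this run. By construction, a sample of this size detects a true difference of 0.5 with probability about $1 - \beta = 0.9$, so a rejection is likely but not guaranteed.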
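
How reliably each sample size detects a difference of 0.5 can be checked with the analytic power of the two-sided z-test. The sketch below is illustrative: the function name `z_test_power` is made up here, and it assumes the same $\sigma = 3$, $d = 0.5$ and $\alpha = 0.05$ used in the example.

```python
import numpy as np
from scipy import stats


def z_test_power(n, d, sigma, alpha=0.05):
    """Power of the two-sided one-sample z-test when the true mean is mu_0 + d."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    shift = d * np.sqrt(n) / sigma               # mean of the z statistic under H1
    # Probability that z lands above z_alpha or below -z_alpha
    return stats.norm.sf(z_alpha - shift) + stats.norm.cdf(-z_alpha - shift)


for n in (50, 379):
    print(f"n = {n:3d}: power = {z_test_power(n, d=0.5, sigma=3):.3f}")
```

With $n = 50$ the power is only a little above 0.2, which is consistent with the first test failing to reject, while $n = 379$ is the smallest sample size that brings it to at least 0.9.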



## Testing a Single Population Mean using a t-test (Unknown Variance)

**Assumptions:**
- The data should be a simple random sample from a population.
- The population should be approximately normally distributed.
- The sample standard deviation is used to estimate the population standard deviation.

**Mathematical Formulation:**
$$ t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} $$

where:
- $ \bar{x} $ is the sample mean,
- $ \mu $ is the population mean assumed under the null hypothesis (`mu_0` in the code),
- $ s $ is the sample standard deviation,
- $ n $ is the sample size.

If the sample size is sufficiently large, this test can be robust even if the population is not normally distributed.  
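
The robustness claim can be illustrated with a small simulation. This is a rough sketch under assumptions chosen here for illustration: an exponential population (strongly skewed, true mean 1), sample sizes 10 and 200, 2000 repetitions, and an arbitrary random seed. It estimates how often a true $H_0$ is rejected at $\alpha = 0.05$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                   # fixed seed, chosen arbitrarily
n_sims, alpha = 2000, 0.05
rates = {}

for n in (10, 200):
    # Exponential(1) is strongly skewed; its true mean is 1, so H0: mu = 1 holds.
    samples = rng.exponential(scale=1.0, size=(n_sims, n))
    result = stats.ttest_1samp(samples, popmean=1.0, axis=1)
    rates[n] = np.mean(result.pvalue < alpha)    # empirical Type I error rate
    print(f"n = {n:3d}: empirical Type I error = {rates[n]:.3f}")
```

If the test is robust at a given sample size, the empirical rejection rate should be close to the nominal 0.05; for skewed populations this typically requires a larger $n$.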



**Degrees of Freedom:**

For a single-sample t-test, the degrees of freedom are given by $ df = n - 1 $. This reflects the number of independent pieces of information available in the sample.


### Example:

In [103]:
x_hat = np.mean(x)
n = len(x)
df = n - 1

s = np.std(x, ddof=1)  # sample standard deviation (n - 1 in the denominator)
t_statistic = (x_hat - mu_0)/(s/np.sqrt(n))
p_value = 2*stats.t.cdf(-abs(t_statistic), df=df)
t_critical = stats.t.ppf(1 - alpha/2, df=df)
print(f't-statistic: {t_statistic: 3.4f} \np-value: {p_value: 3.4f} \nt-critical: {t_critical: 3.4f}')
print('Reject H0:', p_value < alpha)

t-statistic:  2.0860 
p-value:  0.0376 
t-critical:  1.9663
Reject H0: True
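
The manual calculation can be cross-checked against `scipy.stats.ttest_1samp`, which performs a two-sided one-sample t-test by default. The sketch below draws its own sample with an arbitrary seed (an assumption made for reproducibility), so its numbers will differ from the run above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)                  # arbitrary seed for reproducibility
sample = rng.normal(loc=10, scale=3, size=379)
mu_0 = 9.5

# Manual computation, following the t formula
n = sample.size
s = np.std(sample, ddof=1)
t_manual = (sample.mean() - mu_0) / (s / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Library computation (two-sided by default)
result = stats.ttest_1samp(sample, popmean=mu_0)

print("statistics agree:", np.isclose(t_manual, result.statistic))
print("p-values agree:  ", np.isclose(p_manual, result.pvalue))
```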


## Testing a Single Population Proportion

**Assumptions:**
- A simple random sample is taken from the population.
- The conditions for a binomial distribution are met: there are a certain number $ n $ of independent trials, each trial has outcomes of success or failure, and each trial has the same probability of success $ p $.
- The sample is large enough for the binomial distribution to be well approximated by a normal distribution. A common rule of thumb is $ np > 5 $ and $ nq > 5 $, where $ q = 1 - p $.

**Mathematical Formulation:**
$$ z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} $$

where:
- $ \hat{p} $ is the sample proportion,
- $ p $ is the population proportion assumed under the null hypothesis (`p_0` in the code),
- $ q = 1 - p $,
- $ n $ is the sample size.

These tests provide valuable tools for drawing inferences about a single population based on sample data, facilitating informed decision-making in various fields.

### Example:

In [106]:
p = 0.45
n = 100

x = stats.bernoulli.rvs(size=n, p=p)

State the following hypotheses:  
$$H_0: p = 0.5$$
$$H_1: p \neq 0.5$$  

and set the significance level to $\alpha = 0.05$.

In [118]:
alpha = 0.05
p_0 = 0.5

n = len(x)
p_hat = np.mean(x)

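# Standard deviation of a single Bernoulli trial, evaluated under H0
sigma = np.sqrt(p_0*(1 - p_0))
# Alternative: estimate it from the sample instead of using p_0
#sigma = np.sqrt(p_hat*(1 - p_hat))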
z_statistic = (p_hat - p_0)/(sigma/np.sqrt(n))
p_value = 2*stats.norm.cdf(-abs(z_statistic))
z_critical = stats.norm.ppf(1 - alpha/2)
print(f'z-statistic: {z_statistic: 3.4f} \np-value: {p_value: 3.4f} \nz-critical: {z_critical: 3.4f}')
print('Reject H0:', p_value < alpha)

z-statistic: -2.0000 
p-value:  0.0455 
z-critical:  1.9600
Reject H0: True
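
The same test can be packaged together with the rule-of-thumb check from the assumptions. This is a minimal sketch: the function name `proportion_z_test` and the count of 40 successes in 100 trials are hypothetical, chosen because they give the same statistic as the run above.

```python
import numpy as np
from scipy import stats


def proportion_z_test(successes, n, p_0, alpha=0.05):
    """Two-sided one-sample z-test for a proportion (illustrative helper)."""
    # Rule-of-thumb check for the normal approximation, evaluated under H0
    if n * p_0 <= 5 or n * (1 - p_0) <= 5:
        raise ValueError("normal approximation is questionable: "
                         "need n*p_0 > 5 and n*(1 - p_0) > 5")
    p_hat = successes / n
    standard_error = np.sqrt(p_0 * (1 - p_0) / n)  # uses p_0, not p_hat
    z = (p_hat - p_0) / standard_error
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value, p_value < alpha


# Hypothetical data: 40 successes in 100 trials, tested against p_0 = 0.5
z, p_value, reject = proportion_z_test(40, 100, p_0=0.5)
print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Raising an error when the rule of thumb fails is one possible design choice; a warning would also be reasonable if the caller is expected to fall back to an exact method.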
