# Z-Test

## One-Sample Z-Test

A one-sample Z-test evaluates whether the mean of a single sample differs from a known or hypothesized population mean. The test is appropriate when the following conditions hold:

- The population from which the sample is drawn follows a normal distribution, or the sample is large enough (commonly $n > 30$) for the sample mean to be approximately normal.
- Only one sample is obtained.
- The hypothesis concerns the population mean.
- The population standard deviation is known.

The test statistic is computed using the formula:

$$ z = \frac {(\overline x - \mu)}{\frac{\sigma}{\sqrt n}}$$

where $\overline x$ denotes the sample mean, $\mu$ represents the population mean, $\sigma$ stands for the population standard deviation, and $n$ is the sample size.
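As a minimal sketch, the formula can be wrapped in a small helper. The function name `one_sample_z` and the input values are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from math import sqrt


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic for a one-sample Z-test."""
    standard_error = pop_sd / sqrt(n)  # sigma / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Hypothetical inputs: standard error = 8 / sqrt(64) = 1,
# so z = (52 - 50) / 1.
z_example = one_sample_z(sample_mean=52.0, pop_mean=50.0, pop_sd=8.0, n=64)
print(z_example)  # 2.0
```

A positive value means the sample mean lies above the hypothesized mean, and a negative value means it lies below.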

## One-Sample Z-Test: One-Tail

Suppose we have a pizza delivery shop with a historical average delivery time of 45 minutes and a standard deviation of 5 minutes. However, due to recent customer complaints, the shop decides to analyze the delivery time of the last 40 orders, revealing an average delivery time of 48 minutes. We aim to ascertain if the new mean significantly exceeds the population mean.

The null hypothesis ($H_0$) posits that the mean delivery time equals 45 minutes: $\mu = 45$. The alternative hypothesis ($H_1$) suggests that the mean delivery time surpasses 45 minutes: $\mu > 45$. Let's adopt a significance level of $\alpha = 0.05$. In this scenario, the region of rejection will be situated on the right tail.

In [1]:
z = (48-45)/(5/(40)**0.5)
print(z)

3.7947331922020555


In [2]:
import scipy.stats as stats
p_value = 1 - stats.norm.cdf(z) # cumulative distribution function
print(p_value)

7.390115516725526e-05


Since the p-value (about 0.00007) is less than $\alpha = 0.05$, we reject the null hypothesis. At the 0.05 level, the average delivery time of the sample is significantly greater than the historical population average.
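The same decision can be reached by comparing the statistic with a critical value instead of computing a p-value. The sketch below assumes the delivery-time figures from this example and uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`, so it runs without extra dependencies.

```python
from statistics import NormalDist

alpha = 0.05
z = (48 - 45) / (5 / 40 ** 0.5)

# Right-tail critical value: the point with area alpha to its right.
critical = NormalDist().inv_cdf(1 - alpha)

# Reject H0 when the statistic falls in the right-tail rejection region.
reject_h0 = z > critical
print(round(critical, 4), reject_h0)  # 1.6449 True
```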

## One-Sample Z-Test: Two-Tail

Suppose we aim to investigate whether a drug has an impact on IQ. In this scenario, we opt for a two-tail test because we're interested in determining whether the drug affects IQ, regardless of whether it has a positive or negative effect.

Given a significance level of $\alpha = 0.05$, each rejection region has an area of 0.025, one in the right tail and one in the left tail.

Assuming our population mean $\mu = 100$ and population standard deviation $\sigma = 15$, we conduct a study involving a sample of 100 subjects. Upon analysis, we discover that the mean IQ of the sample is 96.

In [3]:
z = (96-100)/(15/(100**0.5)) # sample mean minus hypothesized population mean
print("statistic: ", round(z, 4))

statistic:  -2.6667


In [4]:
import scipy.stats as stats
critical = stats.norm.ppf(1-0.025) # percent point function (inverse of the CDF)
print("Critical:", round(critical, 4))

Critical: 1.96


Since the absolute value of our test statistic (2.6667) is greater than the critical value (1.96), we reject the null hypothesis and conclude that the drug has a significant influence on IQ at a significance level of $\alpha = 0.05$.
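A two-tailed p-value gives an equivalent view of this decision. The sketch below assumes the IQ figures from this example, takes the statistic as sample mean minus hypothesized mean, and again uses `statistics.NormalDist` from the standard library rather than SciPy.

```python
from statistics import NormalDist

z = (96 - 100) / (15 / 100 ** 0.5)

# Two-tailed p-value: double the area beyond |z| in one tail.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 4), p_value < 0.05)  # -2.6667 True
```

Using `abs(z)` makes the calculation independent of the sign of the statistic, which is what a two-tailed test requires.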

## Two-Sample Z-Test

A two-sample z-test is similar to a one-sample z-test, with the main differences being:

- There are two groups/populations under consideration, and we draw one sample from each population.
- Both population distributions are assumed to be normal.
- Both population standard deviations are known.
- The formula for calculating the test statistic is:

$$z = \frac{\overline{x}_1 - \overline{x}_2} {\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

An organization manufactures LED bulbs in two production units, A and B. The quality control team believes that the quality of production at unit A is better than that of B. Quality is measured by how long a bulb works. The team takes samples from both units to test this. The sample mean lifetimes of LED bulbs at units A and B are 1001.34 and 810.47, respectively. The sample sizes are 40 and 44. The population variances are known: $\sigma_A^2 = 48127$ and $\sigma_B^2 = 59173$.

Conduct the appropriate test, at a 5% significance level, to verify the claim of the quality control team.

**Null hypothesis:** $H_0: \mu_A \leq \mu_B$  
**Alternate hypothesis:** $H_1: \mu_A > \mu_B$

Let's fix the level of significance at $\alpha = 0.05$.

In [5]:
z = (1001.34-810.47)/(48127/40+59173/44)**0.5
print(z)

3.781260568723408


In [6]:
import scipy.stats as stats
p_value = 1 - stats.norm.cdf(z)
p_value

7.801812433294586e-05

Since the p-value (0.000078) is less than $\alpha$ (0.05), we reject the null hypothesis. The LED bulbs produced at unit A have a significantly longer life than those at unit B, at a 5% significance level.
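The two-sample calculation can also be sketched as a reusable function. The name `two_sample_z` is hypothetical, the inputs are the values used in the cell above, and `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """z statistic for a difference of two means with known variances."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


z = two_sample_z(1001.34, 810.47, 48127, 59173, 40, 44)
p_value = 1 - NormalDist().cdf(z)  # right-tail test, H1: mu_A > mu_B
print(round(z, 4), p_value < 0.05)  # 3.7813 True
```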

## Hypothesis Tests with Proportions

Proportion tests are utilized with nominal data and are effective for comparing percentages or proportions. For instance, a survey collecting responses from a department in an organization might claim that 85% of people in the organization are satisfied with its policies. Historically, the satisfaction rate has been 82%. Here, we compare a percentage or proportion taken from the sample with a percentage/proportion from the population. The following are some characteristics of the sampling distribution of proportions:

- The sampling distribution of the sample proportion is approximately normal when the sample is large.
- The mean of this sampling distribution equals the population proportion ($p$).

The test statistic is given by the following equation:

$$ z = \frac{\overline{p} - p}{\sqrt{\frac{p(1-p)}{n}}} $$

Here, $\overline{p}$ is the sample proportion, $p$ is the population proportion, and $n$ is the sample size.
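A minimal sketch of this formula follows. The function name `one_proportion_z` is hypothetical. The proportions 0.85 and 0.82 come from the survey illustration above, while the sample size of 400 is an assumption made only for this example, since the text does not state one.

```python
from math import sqrt


def one_proportion_z(sample_prop, pop_prop, n):
    """z statistic for a one-sample proportion test."""
    # The standard error uses the hypothesized population proportion.
    standard_error = sqrt(pop_prop * (1 - pop_prop) / n)
    return (sample_prop - pop_prop) / standard_error


# n = 400 is a hypothetical sample size, not a value from the text.
z_survey = one_proportion_z(sample_prop=0.85, pop_prop=0.82, n=400)
print(round(z_survey, 2))
```

Note that the denominator is built from the hypothesized proportion $p$, not from the sample proportion.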

## One-Sample Proportion Z-Test

It is known that 40% of the total customers are satisfied with the services provided by a mobile service center. The customer service department of this center decides to conduct a survey for assessing the current customer satisfaction rate. It surveys 100 of its customers and finds that only 30 out of the 100 customers are satisfied with its services. Conduct a hypothesis test at a 5% significance level to determine if the percentage of satisfied customers has reduced from the initial satisfaction level (40%).

**Null Hypothesis:** $H_0: p = 0.4$  
**Alternate Hypothesis:** $H_1: p < 0.4$

The < sign indicates a lower-tail test.

Let's fix the level of significance at $\alpha = 0.05$.

In [7]:
z=(0.3-0.4)/((0.4)*(1-0.4)/100)**0.5
z

-2.041241452319316

In [8]:
import scipy.stats as stats

p=stats.norm.cdf(z)
p

0.02061341666858179

p-value (0.02) < 0.05. We reject the null hypothesis. At a 5% significance level, the percentage of customers satisfied with the service center’s services has reduced.

## Two-Sample Proportion Z-Test

Here, we compare proportions taken from two independent samples belonging to two different populations. The following equation gives the formula for the test statistic:

$$ z = \frac {(\overline{p}_1 - \overline{p}_2)}{\sqrt{\frac{p_c(1-p_c)}{N_1} + \frac{p_c(1-p_c)}{N_2}}}$$

In the preceding formula, $\overline{p}_1$ is the proportion from the first sample, and $\overline{p}_2$ is the proportion from the second sample. $N_1$ is the sample size of the first sample, and $N_2$ is the sample size of the second sample. $p_c$ is the pooled proportion.

$$\overline{p}_1 = \frac{x_1}{N_1}, \quad \overline{p}_2 = \frac{x_2}{N_2}, \quad p_c = \frac{x_1 + x_2}{N_1 + N_2}$$

In the preceding formula, $x_1$ is the number of successes in the first sample, and $x_2$ is the number of successes in the second sample.
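The pooled calculation can be sketched as follows. The function name `two_proportion_z` and the counts passed to it are hypothetical and serve only to illustrate the formulas.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for comparing two proportions with a pooled estimate."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # p_c in the formula
    # Same as sqrt(p_c(1 - p_c)/N_1 + p_c(1 - p_c)/N_2).
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical counts: 45 successes out of 100 versus 30 out of 100.
z_pooled = two_proportion_z(x1=45, n1=100, x2=30, n2=100)
print(round(z_pooled, 4))
```

Pooling reflects the null hypothesis that both populations share one proportion, so a single estimate is used for both variance terms.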

## Investigation of Passenger Compliance with Child Safety Guidelines

A ride-sharing company is investigating complaints by its drivers regarding passenger compliance with child safety guidelines, specifically concerning the use of child seats and seat belts. Surveys were independently conducted in two major cities, A and B, to gather data on passenger compliance. The company aims to determine if there is a difference in the proportion of passengers conforming to child safety guidelines between the two cities. The data for the two cities is summarized in the following table:

|                 |  City A | City B |
|-----------------|---------|--------|
|  Total surveyed |  200    | 230    |
| No. of complaints |  110    | 106    |

The company wants to evaluate whether the proportion of compliant passengers differs significantly between City A and City B.

## Hypotheses for Two-Sample Proportion Test

For the two-sample proportion test comparing compliance rates between City A and City B:

- Null hypothesis: $H_0: p_A = p_B$
- Alternative hypothesis: $H_1: p_A \neq p_B$

This constitutes a two-tail test because the region of rejection could be located on either side.

The significance level $\alpha$ is set at 0.05, resulting in an area of 0.025 in each tail.

In [9]:
x1,n1,x2,n2=110,200,106,230
p1=x1/n1
p2=x2/n2
pc=(x1+x2)/(n1+n2)
z_statistic=(p1-p2)/(((pc*(1-pc)/n1)+(pc*(1-pc)/n2))**0.5)
z_statistic

1.8437643201697864

In [10]:
critical = stats.norm.ppf(1-0.025)
critical

1.959963984540054

In [11]:
p_value = 2*(1-stats.norm.cdf(abs(z_statistic))) # two-tailed p-value from this test's statistic
round(p_value, 4)

0.0652

## Conclusion of Two-Sample Proportion Test

Based on the statistical analysis:

- Since the test statistic (1.84) is less than the critical value (1.96), or equivalently the two-tailed p-value is greater than 0.05, we fail to reject the null hypothesis.
- Therefore, there is no significant difference between the proportion of passengers in these cities complying with child safety norms, at a 5% significance level.
