# z-test
___

## One-sample z-test

This test is used when we want to verify if the population mean differs from its historical or hypothesized value.  
Criteria for a one-sample z-test:
+ The population from which the sample is drawn is normally distributed
+ The sample size is greater than 30
+ A single sample is drawn
+ We are testing for the population mean
+ The population standard deviation is known

Formula for calculating the test statistic:

$$ z = \frac {(\overline x - \mu)}{\frac{\sigma}{\sqrt n}}$$

where $\overline x$ is the sample mean, $\mu$ is the hypothesized population mean, $\sigma$ is the population standard
deviation, and $n$ is the sample size.  
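
The formula translates directly into a small helper. This is only a minimal sketch: the function name `one_sample_z` and its argument names are illustrative and do not come from any library, and the input values in the last lines are made-up numbers chosen purely to show the call.

```python
from math import sqrt

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return the one-sample z statistic (illustrative helper, not a library function)."""
    standard_error = pop_sd / sqrt(n)   # sigma / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical inputs for illustration only:
# sample mean 52, hypothesized mean 50, sigma 6, sample size 36
z_example = one_sample_z(52, 50, 6, 36)
print(z_example)
```

A positive value means the sample mean lies above the hypothesized mean, a negative value means it lies below.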

## One-sample z-test One-tail

A pizza shop has an average delivery time of 45 minutes with a standard deviation of 5 minutes. Due to some complaints from customers, the shop decided to analyze its last 40 orders. The average delivery time of these 40 orders was found to be 48 minutes. Is the new mean significantly greater than the population average?   
  
Null hypothesis $H_0$: $\mu$ = 45  
Alternate hypothesis $H_1$: $\mu$>45

Let's fix the level of significance at $\alpha$ = 0.05.

Because $H_1$ is $\mu$ > 45, the whole area of rejection (0.05) lies in the right tail.

In [1]:
z = (48-45)/(5/(40)**0.5)
print(z)

3.7947331922020555


In [2]:
import scipy.stats as stats
p_value = 1 - stats.norm.cdf(z) # cumulative distribution function
print(p_value)

7.390115516725526e-05


Since the p-value < $\alpha$, we reject the null hypothesis. At a significance level of 0.05, the
average delivery time of the sample is significantly greater than the historical population average.
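
The same decision can be reached by comparing the statistic with a critical value instead of computing a p-value. The sketch below assumes the standard library's `statistics.NormalDist` as a stand-in for `scipy.stats.norm`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
z = (48 - 45) / (5 / sqrt(40))                # same statistic as in the cell above
critical = NormalDist().inv_cdf(1 - alpha)    # right-tail cutoff of the standard normal
reject = z > critical                         # reject H0 when z falls in the right tail
print(round(z, 4), round(critical, 4), reject)
```

For a right-tail test at $\alpha$ = 0.05 the cutoff is about 1.645, so a statistic of roughly 3.79 falls well inside the rejection region.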

## One-sample z-test Two-tail

Suppose we want to check whether a drug has an influence on IQ. In this case we perform a two-tail test, because we do not need to know whether the drug affects IQ positively or negatively in particular. We just want to know whether it affects IQ at all.

For a two-tail test, we again fix the level of significance at $\alpha$ = 0.05.

The area of rejection is therefore 0.025 in each of the right and left tails.

The population mean is $\mu$ = 100 and the population standard deviation is $\sigma$ = 15. We measure a sample of 100 subjects and find their mean IQ to be 96.


In [3]:
z = (96-100)/(15/(100**0.5)) # sample mean minus hypothesized mean
print("statistic: ", round(z, 4))

statistic:  -2.6667


In [4]:
import scipy.stats as stats
critical = stats.norm.ppf(1-0.025) # percent point function (inverse of the CDF)
print("Critical:", round(critical, 4))

Critical: 1.96


Since the absolute value of our test statistic (2.6667) exceeds the critical value (1.96), we reject the null hypothesis and conclude that the drug has a significant influence on IQ values at a significance level of $\alpha$ = 0.05.
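
A two-tailed p-value gives the same verdict as the critical-value comparison. The following is a minimal sketch, assuming the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`; it takes the absolute value of the statistic so that the sign of the difference does not matter.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
z = (96 - 100) / (15 / sqrt(100))               # sample mean minus hypothesized mean
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # area in both tails beyond |z|
print(round(z, 4), p_value < alpha)
```

Doubling the one-tail area is what distinguishes this from the one-tail calculation: each tail contributes half of the total rejection probability.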

## Two-sample z-test  

A two-sample z-test is similar to a one-sample z-test, the only differences being as follows:
+ There are two groups/populations under consideration and we draw one sample from each population
+ Both the population distributions are normal
+ Both population standard deviations are known
+ The formula for calculating the test statistic is:
$$z = \frac{\overline x_1 - \overline x_2} {\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$$

An organization manufactures LED bulbs in two production units, A and B. The quality control team believes that the quality of production at unit A is better than that of B. Quality is measured by how long a bulb works. The team takes samples from both units to test this. The sample mean lifetimes of LED bulbs at units A and B are 1001.34 and 810.47, respectively. The sample sizes are 40 and 44. The population variances are
known: $\sigma_A^2$ = 48127 and $\sigma_B^2$ = 59173.  

Conduct the appropriate test, at 5% significance level, to verify the claim of the quality
control team.  

Null hypothesis: $H_0: \mu_A \le \mu_B$  
Alternate hypothesis: $H_1 : \mu_A > \mu_B$

Let's fix the level of significance at $\alpha$ = 0.05.

In [5]:
z = (1001.34-810.47)/(48127/40+59173/44)**0.5
print(z)

3.781260568723408


In [6]:
import scipy.stats as stats
p_value = 1 - stats.norm.cdf(z)
p_value

7.801812433294586e-05

Since the calculated p-value (0.000078) < $\alpha$ (0.05), we reject the null hypothesis. The LED bulbs produced at unit A have a significantly longer life than those at unit B, at a 5% level.
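
The calculation can also be wrapped in a reusable function. This is a sketch only: `two_sample_z` is an illustrative name rather than a library function, and `statistics.NormalDist` from the standard library is assumed here in place of `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean_1, mean_2, var_1, var_2, n_1, n_2):
    """z statistic for a difference of two means with known population variances."""
    standard_error = sqrt(var_1 / n_1 + var_2 / n_2)
    return (mean_1 - mean_2) / standard_error

# LED bulb figures from the example: unit A first, unit B second
z = two_sample_z(1001.34, 810.47, 48127, 59173, 40, 44)
p_value = 1 - NormalDist().cdf(z)   # right-tail area, matching H1: mu_A > mu_B
print(round(z, 4), p_value < 0.05)
```

Passing unit A first keeps the sign of the statistic aligned with the right-tail alternative $\mu_A > \mu_B$.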

## Hypothesis tests with proportions

Proportion tests are used with nominal data and are useful for comparing percentages or proportions. For example, a survey collecting responses from a department in an organization might claim that 85% of people in the organization are satisfied with its policies. Historically the satisfaction rate has been 82%. Here, we are comparing a percentage or a proportion taken from the sample with a percentage/proportion from the population. The following are some of the characteristics of the sampling distribution of proportions:

+ The sampling distribution of the proportions taken from the sample is approximately normal
+ The mean of this sampling distribution equals the population proportion ($p$)
+ Calculating the test statistic: the following equation gives the z-value
$$ z = \frac{\overline p - p}{\sqrt{\frac{p(1-p)}{n}}} $$

Where $\overline p$ is the sample proportion, $p$ is the population proportion, and $n$ is the sample size.

## One-sample proportion z-test

It is known that 40% of the total customers are satisfied with the services provided by a mobile service center. The customer service department of this center decides to conduct a survey for assessing the current customer satisfaction rate. It surveys 100 of its customers and finds that only 30 out of the 100 customers are satisfied with its services. Conduct a hypothesis test at a 5% significance level to determine if the percentage of satisfied customers has reduced from the initial satisfaction level (40%).

$H_0: p = 0.4$  
$H_1: p < 0.4$

The < sign indicates lower-tail test.

Fix the level of significance at $\alpha$ = 0.05.

In [7]:
z=(0.3-0.4)/((0.4)*(1-0.4)/100)**0.5
z

-2.041241452319316

In [8]:
import scipy.stats as stats

p=stats.norm.cdf(z)
p

0.02061341666858179

Since the p-value (0.02) < 0.05, we reject the null hypothesis. At a 5% significance level, the percentage of customers satisfied with the service center's services has reduced.
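
The same test can be expressed as a function of the raw counts. This is a minimal sketch under stated assumptions: `one_sample_proportion_z` is an illustrative name, not a library function, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p_null):
    """z statistic for a sample proportion against a hypothesized proportion."""
    p_hat = successes / n                               # sample proportion
    standard_error = sqrt(p_null * (1 - p_null) / n)    # uses the null proportion
    return (p_hat - p_null) / standard_error

z = one_sample_proportion_z(30, 100, 0.4)   # 30 satisfied customers out of 100
p_value = NormalDist().cdf(z)               # lower-tail area, matching H1: p < 0.4
print(round(z, 4), round(p_value, 4))
```

Note that the standard error is built from the hypothesized proportion (0.4), not from the observed one (0.3), because the test statistic is evaluated under the null hypothesis.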

## Two-sample proportion z-test

Here, we compare proportions taken from two independent samples belonging to two different populations. The following equation gives the formula for the test statistic:

$$ z = \frac {(\overline p_1 - \overline p_2)}{\sqrt{\frac{p_c(1-p_c)}{N_1} + \frac{p_c(1-p_c)}{N_2}}}$$

In the preceding formula, $\overline p_1$ is the proportion from the first sample, and $\overline p_2$ is the
proportion from the second sample. $N_1$ is the sample size of the first sample, and $N_2$ is the
sample size of the second sample. $p_c$ is the pooled proportion.

$$\overline p_1 = \frac{x_1}{N_1} ;  \overline p_2 = \frac {x_2}{N_2} ;  p_c = \frac {x_1 + x_2}{N_1 + N_2}$$  
In the preceding formula, $x_1$ is the number of successes in the first sample, and $x_2$ is the
number of successes in the second sample.

A ride-sharing company is investigating complaints by its drivers that some of the passengers (traveling with children) do not conform with child safety guidelines (for example, not bringing a child seat or not using the seat belt). The company undertakes surveys in two major cities. The surveys are collected independently, with one sample being taken from each city. From the data collected, it seems that the passengers in City B are more noncompliant than those in City A. The law enforcement authority wants to know if the proportion of passengers conforming with child safety guidelines is different for the two cities. The data for the two cities is given in the following table:

|                 |  City A | City B |
|-----------------|---------|--------|
|  Total surveyed |  200    | 230    |
|  No. compliant  |  110    | 106    |

Null hypothesis: $H_0: p_A = p_B$  
Alternate hypothesis: $H_1 : p_A \neq p_B$

This would be a two-tail test, because the region of rejection could be located on either side.

The level of significance is $\alpha$ = 0.05, so the area of rejection is 0.025 in each tail.

In [9]:
x1,n1,x2,n2=110,200,106,230
p1=x1/n1
p2=x2/n2
pc=(x1+x2)/(n1+n2)
z_statistic=(p1-p2)/(((pc*(1-pc)/n1)+(pc*(1-pc)/n2))**0.5)
z_statistic

1.8437643201697864

In [10]:
critical = stats.norm.ppf(1-0.025)
critical

1.959963984540054

In [11]:
p_value = 2*(1-stats.norm.cdf(z_statistic)) # two-tailed p-value from this test's statistic
print(round(p_value, 4))

0.0652

Since the test statistic (1.84) is smaller than the critical value (1.96), or equivalently the p-value is greater than 0.05, we fail to reject the null hypothesis.
At a 5% significance level, there is no significant difference between the proportions of passengers in the two cities who comply with child safety norms.
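
The whole procedure can be collected into one function that returns the statistic, the two-tailed p-value, and the decision. This is a sketch under stated assumptions: `two_proportion_z_test` is an illustrative name rather than a library function, and `statistics.NormalDist` from the standard library is used in place of `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed z-test for equal proportions (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)                                # pooled proportion
    standard_error = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))              # area in both tails
    return z, p_value, p_value < alpha

# City A: 110 compliant of 200 surveyed; City B: 106 compliant of 230 surveyed
z, p_value, reject = two_proportion_z_test(110, 200, 106, 230)
print(round(z, 4), reject)
```

Returning the decision together with the statistic keeps the p-value tied to the statistic it was computed from, which avoids accidentally reusing a variable left over from an earlier calculation.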