# Statistics-8 🤝

**Title :** $\textit{Fundamentals of Hypothesis Testing | One-Sample Tests}$\
**Author :** $\textit{Manideep Bangaru}$ 👨🏻‍💻

$\small \text{on 23/02/2020}$

### The Null and Alternative Hypothesis
The Null hypothesis states that a population parameter (such as the mean, the standard deviation, and so on) is equal to a hypothesized value.

The alternative hypothesis is what you might believe to be true or hope to prove true.

### The Critical value of Test Statistic
Hypothesis testing uses sample data to assess how consistent the observed sample is with the null hypothesis.

For this, you select a sample from the population and calculate the sample mean ($\bar{X}$).

This sample statistic is an estimate of the corresponding parameter, the population mean ($\mu$). Even if the null hypothesis is true, the sample statistic $\bar{X}$ is likely to differ from the value of the parameter $\mu$ because of variation due to sampling.

You do expect the sample statistic to be close to the population parameter if the null hypothesis is true. If the sample statistic is close to the population parameter, you have insufficient evidence to reject the null hypothesis.

However, if there is a large difference between the value of the sample statistic and the hypothesized value of the population parameter, you might conclude that the null hypothesis is false and you reject it.

In many cases, however, the difference is neither obviously small nor obviously large, so you cannot decide by inspection alone. You need a logical methodology that clearly defines what counts as "very close" and "very different".

Furthermore, hypothesis testing enables you to quantify the decision-making process by computing the probability of getting a certain sample result if the null hypothesis is true. You calculate this probability by determining the sampling distribution for the sample statistic of interest (e.g., the sample mean) and then computing the particular test statistic based on the given sample result. Because the sampling distribution for the test statistic often follows a well-known statistical distribution, such as the standardized normal distribution or t distribution, you can use these distributions to help decide whether to reject the null hypothesis.

### Regions of Rejection and Nonrejection
The sampling distribution of the test statistic is divided into two regions, a **region of rejection** (sometimes called the critical region) and a **region of nonrejection**.

If the test statistic falls into the region of nonrejection, you do not reject the null hypothesis. If it falls into the region of rejection, you reject the null hypothesis.

To make a decision concerning the null hypothesis, you first determine the **critical value** of the test statistic. The critical value divides the nonrejection region from the rejection region. Determining the critical value depends on the size of the rejection region.

Using hypothesis testing involves the risk of reaching an incorrect conclusion. You might wrongly **reject** a true null hypothesis, H0, or, conversely, you might wrongly **not reject** a false null hypothesis, H0. These types of risk are called Type I and Type II errors.

![Screenshot%202021-02-23%20at%208.27.12%20PM.png](attachment:Screenshot%202021-02-23%20at%208.27.12%20PM.png)

### Type I and Type II Errors
A **Type I error** occurs if you reject the null hypothesis, H0, when it is true and should not be rejected. A Type I error is a “false alarm.” The probability of a Type I error occurring is **$\large \alpha$**.

A **Type II error** occurs if you do not reject the null hypothesis, H0, when it is false and should be rejected. A Type II error represents a “missed opportunity” to take some corrective action. The probability of a Type II error occurring is **$\large \beta$**.

Traditionally, you control the Type I error by determining the risk level, $\large \alpha$ (the lowercase Greek letter alpha), that you are willing to have of rejecting the null hypothesis when it is true. This risk, or probability, of committing a Type I error is called the level of significance ($\large \alpha$). Because you specify the level of significance before you perform the hypothesis test, you directly control the risk of committing a Type I error. Traditionally, you select a level of 0.01, 0.05, or 0.10. The choice of a particular risk level for making a Type I error depends on the cost of making a Type I error. After you specify the value for $\large \alpha$, you know the size of the rejection region, because $\large \alpha$ is the probability of rejection when the null hypothesis is true. From this, you can then determine the critical value or values that divide the rejection and nonrejection regions.

The probability of committing a Type II error is called the $\large \beta$ risk. Unlike the Type I error, which you control through the selection of $\large \alpha$, the probability of making a Type II error depends on the difference between the hypothesized and actual values of the population parameter. Because large differences are easier to find than small ones, $\large \beta$ is small when the difference between the hypothesized and actual values of the population parameter is large, whereas $\large \beta$ is large when that difference is small.

### Probability of Type I and Type II Errors
The level of significance $\large \alpha$ of a statistical test is the probability of committing a Type I error.\
The $\large \beta$ risk is the probability of committing a Type II error.

* The complement of the probability of a Type I error, (1 - $\large \alpha$), is called the **confidence coefficient**. The confidence coefficient is the probability that you will not reject the null hypothesis, H0, when it is true and should not be rejected.

* The complement of the probability of a Type II error, (1 - $\large \beta$), is called the **power of a statistical test**. The power of a statistical test is the probability that you will reject the null hypothesis when it is false and should be rejected.
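The dependence of $\large \beta$ on the size of the true difference can be illustrated with a short calculation. The sketch below assumes a two-tail Z test with known $\sigma$; the function name `type_ii_risk` and all numeric values (hypothesized mean 100, $\sigma$ = 15, n = 25) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_risk(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a two-tail Z test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # When the true mean is mu_true, Z_STAT is normal with this mean and sd 1.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Beta is the probability that Z_STAT lands in the nonrejection region.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Hypothetical values: a small and a large gap between hypothesized and true mean.
beta_small_gap = type_ii_risk(mu0=100, mu_true=102, sigma=15, n=25)
beta_large_gap = type_ii_risk(mu0=100, mu_true=110, sigma=15, n=25)

print("beta (small gap):", round(beta_small_gap, 3))
print("beta (large gap):", round(beta_large_gap, 3))
print("power (large gap):", round(1 - beta_large_gap, 3))
```

With these assumed inputs the larger gap gives the smaller $\large \beta$ and therefore the higher power. When the true mean equals the hypothesized mean, the same expression returns the confidence coefficient, 1 - $\large \alpha$.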

### Z test for the mean (Standard Deviation - Known)
When the standard deviation, $\large \sigma$, is known (which rarely occurs), you use the Z test for the mean if the population is normally distributed. If the population is not normally distributed, you can still use the Z test if the sample size is large enough for the Central Limit Theorem to take effect. The equation below defines the $Z_{STAT}$ test statistic for determining the difference between the sample mean, $\bar{X}$, and the population mean, $\large \mu$, when the standard deviation, $\large \sigma$, is known.

$Z_{STAT} = \Large \frac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n}}}$
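As a minimal sketch, the $Z_{STAT}$ formula translates directly into code. The function name `z_stat` and the input values are hypothetical, chosen so that the result is easy to check by hand.

```python
from math import sqrt


def z_stat(sample_mean, mu0, sigma, n):
    """Z test statistic for the mean when the population sigma is known."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error


# Hypothetical inputs: sample mean 372.5, hypothesized mean 368, sigma 15, n 25.
z = z_stat(sample_mean=372.5, mu0=368, sigma=15, n=25)
print(z)  # 1.5
```

Here the standard error is 15 / 5 = 3, so a sample mean 4.5 units above the hypothesized mean lies 1.5 standard errors from it.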

### Hypothesis testing Using the critical value approach

The critical value approach compares the value of the computed $Z_{STAT}$ test statistic from the equation above to critical values that divide the normal distribution into regions of rejection and nonrejection. The critical values are expressed as standardized Z values that are determined by the level of significance.

For example, if you use a level of significance of 0.05, the size of the rejection region is 0.05. Because the null hypothesis contains an equal sign and the alternative hypothesis contains a not equal sign, you have a two-tail test in which the rejection region is divided into the two tails of the distribution, with two equal parts of 0.025 in each tail. For this two-tail test, a rejection region of 0.025 in each tail of the normal distribution results in a cumulative area of 0.025 below the lower critical value and a cumulative area of 0.975 (1 - 0.025) below the upper critical value (which leaves an area of 0.025 in the upper tail).

According to the cumulative standardized normal distribution table, the critical values that divide the rejection and nonrejection regions are -1.96 and +1.96.

If the $Z_{STAT}$ value lies between -1.96 and +1.96, it falls in the nonrejection region, so you conclude that there is insufficient evidence to reject the null hypothesis. If it is less than -1.96 or greater than +1.96, you reject the null hypothesis.
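Instead of reading the critical values from a table, you can compute them from the inverse cumulative distribution function of the standardized normal distribution. The following is a minimal sketch using Python's standard library; the helper names `two_tail_critical_values` and `decide` are hypothetical.

```python
from statistics import NormalDist


def two_tail_critical_values(alpha):
    """Lower and upper Z critical values with alpha / 2 in each tail."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper


def decide(z_stat, alpha=0.05):
    """Apply the critical value decision rule for a two-tail Z test."""
    lower, upper = two_tail_critical_values(alpha)
    if z_stat < lower or z_stat > upper:
        return "reject H0"
    return "do not reject H0"


lower, upper = two_tail_critical_values(0.05)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96
print(decide(1.50))  # do not reject H0
```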

### Hypothesis testing Using the p-value approach
The p-value is the probability of getting a test statistic equal to or more extreme than the sample result, given that the null hypothesis, H0, is true. The p-value is also known as the observed level of significance. Using the p-value to determine rejection and nonrejection is another approach to hypothesis testing.
The decision rules for rejecting H0 in the p-value approach are:

* If the p-value is greater than or equal to $\large \alpha$, do not reject the null hypothesis.
* If the p-value is less than $\large \alpha$, reject the null hypothesis.

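> <font color = 'red'>A small (or low) p-value indicates that a sample result this extreme would be unlikely if H0 were true, so it is evidence against H0. A large p-value indicates that the sample result is consistent with H0. The p-value is not the probability that H0 is true.</font>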

For example, suppose the test statistic resulted in a $Z_{STAT}$ value of +1.50 and you did not reject the null hypothesis because +1.50 was less than the upper critical value of +1.96 and greater than the lower critical value of -1.96.

To use the p-value approach for the two-tail test, you find the probability that the test statistic $Z_{STAT}$ is equal to or more extreme than 1.50 standard error units from the center of a standardized normal distribution. In other words, you need to compute the probability that the $Z_{STAT}$ value is greater than +1.50 along with the probability that the $Z_{STAT}$ value is less than -1.50.

The probability of a $Z_{STAT}$ value below -1.50 is 0.0668. The probability of a value below +1.50 is 0.9332, and the probability of a value above +1.50 is 1 - 0.9332 = 0.0668. Therefore, the p-value for this two-tail test is 0.0668 + 0.0668 = 0.1336. Thus, the probability of a test statistic equal to or more extreme than the sample result is 0.1336. Because 0.1336 is greater than $\large \alpha$ = 0.05, you do not reject the null hypothesis.
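The same two-tail p-value can be reproduced without a table. This is a minimal sketch using the standard library's normal distribution; the function name `two_tail_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_tail_p_value(z_stat):
    """Probability of a Z value at least as extreme as z_stat in either tail."""
    upper_tail = 1 - NormalDist().cdf(abs(z_stat))
    return 2 * upper_tail


alpha = 0.05
p_value = two_tail_p_value(1.50)
print(round(p_value, 4))  # 0.1336
print("reject H0" if p_value < alpha else "do not reject H0")  # do not reject H0
```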

### Can you ever know the Population Standard Deviation?
For most practical applications, you are unlikely to use a hypothesis-testing method that requires knowing $\sigma$. If you knew the population standard deviation, you would also know the population mean and therefore have no need to form and then test a hypothesis about the mean. **Then why study a hypothesis test of the mean which requires that $\sigma$ is known?** Explaining the fundamentals of hypothesis testing is simpler when using such a test. With a known population standard deviation, you can use the normal distribution and compute p-values using the tables of the normal distribution.

### t Test of Hypothesis for the Mean (Standard Deviation Unknown)

In virtually all hypothesis-testing situations concerning the population mean, $\mu$, you do not know the population standard deviation, $\sigma$. However, you will always be able to compute the sample standard deviation, S. If you assume that the population is normally distributed, then the sampling distribution of the mean will follow a t distribution with **n - 1** degrees of freedom and you can use the t test for the mean. If the population is not normally distributed, you can still use the t test if the population is not too skewed and the sample size is not too small.

$\large t_{STAT} = \large \frac{\bar{X}-\mu}{\frac{S}{\sqrt{n}}}$

where the $t_{STAT}$ test statistic follows a t distribution having **n-1** degrees of freedom.

> <font color = 'red'> _Here too, you can use either the critical value approach or the p-value approach, as with $Z_{STAT}$._</font>
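When you start from raw observations rather than summary statistics, you first compute $\bar{X}$ and S from the sample. The sketch below shows one way to do this with the standard library; the function name `t_stat` and the data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_stat(sample, mu0):
    """Return the t test statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / sqrt(n)), n - 1


# Hypothetical measurements tested against a hypothesized mean of 10.
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
t_value, df = t_stat(sample, mu0=10)
print("t_STAT:", round(t_value, 4), "degrees of freedom:", df)
```

Note that `statistics.stdev` computes the sample standard deviation, which is the S in the formula, whereas `statistics.pstdev` would compute the population version.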

### One-Tail tests

So far we have seen only two-tail tests, which have a rejection region on both sides of the distribution curve.

In contrast, some hypothesis tests are one-tail tests because they require an alternative hypothesis that focuses on a particular direction.

One example of a one-tail hypothesis test would test whether the population mean is less than a specified value.

One such situation involves the business problem concerning the service time at the drive-through window of a fast-food restaurant.

The speed with which customers are served is of critical importance to the success of the service. In one study, an audit of McDonald’s drive-throughs found a mean service time of 189.49 seconds, which was slower than the drive-throughs of several other fast-food chains. Suppose that McDonald’s began a quality improvement effort to reduce the service time by deploying an improved drive-through service process in a sample of 25 stores. Because McDonald’s would want to institute the new process in all of its stores only if the test sample saw a decreased drive-through time, the entire rejection region is located in the lower tail of the distribution.

#### The Critical value approach

You wish to determine whether the new drive-through process has a mean that is less than 189.49 seconds. To perform this one-tail hypothesis test, you use the six-step method :

+ **Step 1 :**
You define the null and alternative hypotheses:\
H0 : $\mu \ge$ 189.49 \
H1 : $\mu <$ 189.49

The alternative hypothesis contains the statement for which you are trying to find evidence. If the conclusion of the test is “reject H0,” there is statistical evidence that the mean drive-through time is less than the drive-through time in the old process. 

This would be reason to change the drive-through process for the entire population of stores. If the conclusion of the test is “do not reject H0,” then there is insufficient evidence that the mean drive-through time in the new process is significantly less than the drive-through time in the old process. If this occurs, there would be insufficient reason to institute the new drive-through process in the population of stores.

+ **Step 2 :**
You collect the data by selecting a sample of n = 25 stores. You decide to use $\large \alpha$ = 0.05.

+ **Step 3 :**
Because $\sigma$ is unknown, you use the t distribution and the $t_{STAT}$ test statistic. You need to assume that the drive-through time is normally distributed because a sample of only 25 drive-through times is selected.

+ **Step 4 :**
The rejection region is entirely contained in the lower tail of the sampling distribution of the mean because you want to reject $H_0$ only when the sample mean is significantly less than 189.49 seconds. When the entire rejection region is contained in one tail of the sampling distribution of the test statistic, the test is called a one-tail test, or directional test. If the alternative hypothesis includes the less than sign, the critical value of t is negative. As shown in Table 9.3 and Figure 9.10, because the entire rejection region is in the lower tail of the t distribution and contains an area of 0.05, due to the symmetry of the t distribution, the critical value of the t test statistic with 25 - 1 = 24 degrees of freedom is -1.7109.

The decision rule is
reject H0 if $t_{STAT}<$-1.7109;\
otherwise, do not reject H0.

+ **Step 5 :** 
From the sample of 25 stores you selected, you find that the sample mean service time at the drive-through equals 175.8 seconds and the sample standard deviation equals 21.3 seconds. Using n = 25, $\bar{X}$ = 175.8, and S = 21.3:

$t_{STAT} = \large \frac{\bar{X}-\mu}{\frac{S}{\sqrt{n}}}$

$t_{STAT} = \large \frac{175.8-189.49}{\frac{21.3}{\sqrt{25}}}$ = -3.2136

+ **Step 6 :**
Because $t_{STAT}$ = -3.2136 < -1.7109, you reject the null hypothesis. You conclude that the mean service time at the drive-through is less than 189.49 seconds. There is sufficient evidence to change the drive-through process for the entire population of stores.
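Steps 4 to 6 can be checked with a few lines of code. This sketch uses the summary statistics and the critical value of -1.7109 exactly as given in the steps above; the variable names are hypothetical.

```python
from math import sqrt

mu0 = 189.49          # hypothesized mean service time (seconds)
n = 25                # number of stores in the sample
x_bar = 175.8         # sample mean service time
s = 21.3              # sample standard deviation
t_critical = -1.7109  # lower-tail critical value for 24 df at alpha = 0.05 (from Step 4)

t_stat = (x_bar - mu0) / (s / sqrt(n))

# Lower-tail test: reject H0 only when t_STAT falls below the critical value.
decision = "reject H0" if t_stat < t_critical else "do not reject H0"
print(round(t_stat, 4), decision)  # -3.2136 reject H0
```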

**In a similar manner, you can perform this test using the p-value approach.**
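For the p-value approach you need the lower-tail area of the t distribution with 24 degrees of freedom below $t_{STAT}$ = -3.2136. Python's standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule. This is an illustrative approximation with hypothetical function names; in practice a statistics library (for example `scipy.stats.t.cdf`, if SciPy is installed) would normally be used instead.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_lower_tail(t, df, steps=2000):
    """Approximate P(T <= t) using Simpson's rule on [0, |t|] and symmetry."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_abs_t = total * h / 3
    return 0.5 - area_zero_to_abs_t if t < 0 else 0.5 + area_zero_to_abs_t


alpha = 0.05
p_value = t_lower_tail(-3.2136, df=24)  # one-tail (lower) p-value
decision = "reject H0" if p_value < alpha else "do not reject H0"
print("p-value:", round(p_value, 4), "->", decision)
```

As a consistency check, `t_lower_tail(-1.7109, 24)` should return approximately 0.05, which matches the critical value used in Step 4. The p-value for the sample result is well below $\large \alpha$ = 0.05, so both approaches lead to the same decision to reject H0.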