# **Concepts Of Hypothesis Testing**

## Understanding Hypothesis Testing

Let’s understand the basic difference between inferential statistics and hypothesis testing.

 

* **Inferential statistics** is used to find some population parameter (mostly population mean) when you have no initial number to start with. So, you start with the sampling activity and find out the sample mean. Then, you estimate the population mean from the sample mean using the confidence interval.

 

* **Hypothesis testing** is used to confirm your conclusion (or hypothesis) about the population parameter (which you know from EDA or your intuition). Through hypothesis testing, you can determine whether there is enough evidence to conclude if the hypothesis about the population parameter is true or not.

 

Both these modules have a few similar concepts, so don’t confuse terminology used in hypothesis testing with inferential statistics.

Hypothesis Testing starts with the formulation of these two hypotheses:

**Null hypothesis (H₀):** 
* Prevailing belief about the population.
* The status quo is true.

**Alternate hypothesis (H₁):** 
* A challenge to the status quo: it states that the status quo is false.
* The two hypotheses oppose each other.

For example:

The average commute time of an employee to the office is 36.5 minutes.

Null hypothesis: Average commute time = 36.5 minutes

Alternate hypothesis: Average commute time ≠ 36.5 minutes

**H0 means Null hypothesis**<br>
**H1 means Alternate hypothesis**

**Q. 1:**
Null and Alternate Hypotheses<br>

In the Maggi Noodles example, if you fail to reject the null hypothesis, what can you conclude from this statement?

**Solution:**
<br>
Maggi Noodles do not contain excess lead
✓ Correct<br>
Feedback:<br>
The null hypothesis is that the average lead content is less than or equal to 2.5 ppm. Since you fail to reject the null hypothesis, you can conclude that Maggi Noodles do not contain excess lead. Please note that you can only fail to reject the null hypothesis; you can never accept the null hypothesis.

**Q. 2:**
Types of Hypotheses<br>
The null and alternative hypotheses divide all possibilities into

**Solution**

2 non-overlapping sets
✓ Correct<br>
Feedback:<br>
Both the null and alternate hypotheses can’t be true at the same time. Only one of them will be true.

## Null and Alternate Hypotheses

The first step of hypothesis testing is the formulation of the null and alternate hypotheses for a given situation. Let’s learn how to do this through different examples.

But in some instances, if your claim statement has words like **“at least”**, **“at most”**, **“less than”**, or **“greater than”**, you cannot formulate the null hypothesis just from the claim statement (because it’s not necessary that the claim is always about the status quo).

You can use the following rule to formulate the null and alternate hypotheses:

The null hypothesis always has the following signs:  =  OR   ≤   OR    ≥

The alternate hypothesis always has the following signs:  ≠   OR  >   OR    <

Let's take an example:

![image.png](attachment:30856e50-aaad-4788-b1e2-f035a30afeda.png)
 
 
 **Q. 3**

Null and Alternate Hypotheses<br>
Flipkart claimed that its total valuation in December 2016 was $14 billion.

What would be the alternate hypothesis for the given situation?

**Answer:**

H₁: Total valuation ≠ $14 billion

![image.png](attachment:3a132560-f32f-4e38-89ea-23973993e3df.png)

**Q. 4**

The average commute time for an UpGrad employee to and from office is at least 35 minutes.

What will be the null and alternate hypotheses in this case if the average time is represented by μ?

**Answer:**
H₀: μ ≥ 35 minutes and H₁: μ < 35 minutes<br>
The null hypothesis is always formulated by either = or ≤ or ≥ whereas the alternate hypothesis is formulated by ≠ or > or <. In this case, the average time taken was greater than or equal to 35 minutes. So, that becomes the null hypothesis. Less than 35 minutes becomes the alternate hypothesis.

**Q. 5**
Goodyear claims that each of its tyres can travel more than 7500 miles on average before they need any replacement.

Assuming that the average travel distance is given by μ, what would be the null and the alternate hypothesis in this case?

**Answer:**
H₀: μ  ≤  7500 miles and H₁: μ > 7500 miles
<br>
The null hypothesis is always formulated by either = or ≤ or ≥ whereas the alternate hypothesis is formulated by ≠ or > or <. If you check the claim statement, it has the > sign (i.e. μ> 7500 miles). Hence the null hypothesis would be the complement of the claim statement i.e. μ ≤ 7500 miles, and the alternate hypothesis would be the claim statement itself or μ > 7500 miles.

## Making a Decision

Let's learn how to make the decision to either reject or fail to reject the null hypothesis, through an example of a friend playing archery.

**Ram's claim:** Avg score in archery = 70

**Over 5 games of Archery:**
* Avg score = 20 ---> Less likely to believe his claim of a score of 70
* Avg score = 65 ---> Good chance of believing his claim

So, at what point do you stop believing his claim?

What would be the null and alternate hypotheses for Ram’s claim?

**H₀: μ = 70 and H₁: μ ≠ 70**
<br>
**The null hypothesis is always formulated by either = or ≤ or ≥. Here, Ram's claim is that his score is equal to 70, so that becomes the null hypothesis. The alternate hypothesis is that his score is not equal to 70.**
<br>
<br>
**We need to find the critical point below 70 at which we can say that his actual score is really lower than 70.**

![image.png](attachment:ad78fbef-4212-4391-8c24-7e39bc4ccc73.png)

**Q. 6**

If your sample mean lies in the acceptance region, then:

**Answer:**
You fail to reject the null hypothesis<br>

If your sample mean lies in the acceptance region, you fail to reject the null hypothesis because the sample mean is not beyond the critical point, and you can consider the sample mean to be statistically consistent with the hypothesised population mean.

The formulation of the null and alternate hypotheses determines the type of the test and the position of the critical regions in the normal distribution.

 

You can tell the type of the test and the position of the critical region on the basis of the ‘sign’ in the alternate hypothesis.

      

* ≠ in H₁    →   Two-tailed test        →     Rejection region on both sides of distribution

* < in H₁    →   Lower-tailed test     →     Rejection region on left side of distribution

* > in H₁    →   Upper-tailed test     →     Rejection region on right side of distribution
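The mapping above can be written as a small lookup. This is only an illustrative sketch; the function name `tail_type` and the ASCII sign `!=` standing in for ≠ are assumptions made for the example.

```python
def tail_type(h1_sign: str) -> str:
    """Return the type of test implied by the sign in the alternate hypothesis."""
    mapping = {
        "!=": "two-tailed",    # rejection region on both sides
        "<": "lower-tailed",   # rejection region on the left side
        ">": "upper-tailed",   # rejection region on the right side
    }
    return mapping[h1_sign]


# Commute-time example: H1 is mu < 35, so the test is lower-tailed.
print(tail_type("<"))
```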


**Q. 7**

The average commute time for an UpGrad employee to and from office is at least 35 minutes.

If this hypothesis has to be tested, select the type of the test and the location of the critical region.

**Answer:**<br>

Lower-tailed test, with the rejection region on the left side
<br>
For this situation, the hypotheses would be formulated as H₀: μ ≥ 35 minutes and H₁: μ < 35 minutes. As < sign is used in alternate hypothesis, it would be a lower-tailed test and the rejection region would be on the left side of the distribution.

**Q. 8**

A courier claims that the average delivery time of goods is 3 days. Choose the correct formulation of hypothesis along with the correct test.

**Answer:**
H₀: μ = 3 days and H₁: μ ≠ 3 days, Test: Two-tailed test<br>
H0 always has an equal sign. Since H1 has an unequal sign, we need to test both sides.

**Q. 9**

A researcher claims that the weight of an average male Bengal tiger is less than 220 KG.

**Answer:**<br>
H₀: μ ≥ 220 kg, H₁: μ < 220 kg, Test: Lower-tailed test


## Critical Value Method

Now, let’s learn how to find the critical values for the critical region in the distribution and make the final decision of rejecting or failing to reject the null hypothesis.

(Note: In the accompanying video, the graph showing the distribution of average sales data at 1:06 incorrectly displays 370.6 as the sample mean instead of 370.16. Also, at 3:41 it should be σx̄ (sigma subscript x-bar) = 15 instead of σ = 15.)

**Q. 10**

What will be the standard deviation of the distribution of sample means if the population has a mean of 350 and a standard deviation of 90, a sample mean of 370.16, and a sample size of 36?


**Answer:**
<br>

15<br>

<h5 style = 'color:Blue'>The standard deviation of a distribution of sample means is obtained by dividing the population standard deviation by the square root of the sample size. So, 90/√36 = 90/6 = 15.</h5>
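The same calculation as a minimal Python sketch. The function name `standard_error` is an assumption chosen for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)


# Population standard deviation 90, sample size 36
sem = standard_error(90, 36)
print(sem)  # 15.0
```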



**IMPORTANT!!!**

* The critical value is the point at which you make your decision. For example, if H0 says the mean sales is 70, you might decide to reject it if you get a sample mean greater than 80 or less than 20. In other words, you choose your critical values. These critical values are themselves means in the sampling distribution, which contains many possible sample means whose probabilities follow a normal distribution. So, instead of fixing the means directly, you choose the probability that you treat as the cutoff. That is the alpha value, and it decides which sample mean is too big or too small. The absolute value of the mean that counts as too far differs from case to case, so we always state the cutoff as a probability and convert it to a mean for each case. This probability is known as the significance level (α). Significance level = 1 - confidence level.
* From the significance level, find the **z-score using the Excel formula NORM.S.INV()** applied to the cumulative probability of the critical point (1 - α/2 for a two-tailed test, 1 - α for a one-tailed test). If α is not specified, use 0.05. **This z-score is known as z-critical or Zc.**
* **Multiply Zc by the standard error of the mean (SEM)** to get the margin of error. (This is the same way you find the margin of error of a confidence interval.)
* Add the margin of error to, and subtract it from, the hypothesised mean to get the UCV and LCV.
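As a rough Python counterpart to the NORM.S.INV() step, the standard library's `statistics.NormalDist().inv_cdf` returns the z-score for a given cumulative probability. The helper name `z_critical` and its `two_tailed` flag are assumptions made for this sketch.

```python
from statistics import NormalDist


def z_critical(alpha: float = 0.05, two_tailed: bool = True) -> float:
    """z-score whose upper-tail area is alpha/2 (two-tailed) or alpha (one-tailed)."""
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf takes the cumulative probability up to the critical point
    return NormalDist().inv_cdf(1 - tail_area)


print(round(z_critical(0.05), 2))                    # 1.96
print(round(z_critical(0.05, two_tailed=False), 3))  # 1.645
```

For a lower-tailed test, the critical z-score is the negative of the one-tailed value.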

**Problem 2**

The problem statement might be upper-tailed, lower-tailed or two-tailed, with upper and lower boundaries. In every case, you look up a single z-critical value for the tail area. For a two-tailed test, the same value is used with both signs (+Zc and -Zc).

![image.png](attachment:2b16c275-1c65-49b1-a1db-28d37e28969b.png)

After formulating the hypothesis, the steps you have to follow to make a decision using the critical value method are as follows:

* Calculate the value of Zc from the given value of α (significance level). Take it as 5% if not specified in the problem.

* Calculate the critical values (UCV and LCV) from the value of Zc.

* Make the decision on the basis of the value of the sample mean x̄ with respect to the critical values (UCV and LCV).

**Q. 11**<br>

What will be the area of the critical region on the right-hand side of the distribution if the significance level (α) for a two-tailed test is 3%?

**Answer:**
<br>
<h5 style ='color:Blue'>Here, the value of α is 0.03 (or 3%), so the area of the rejection region would be 0.03 and the area of the acceptance region would be 0.97. In addition, since this is a two-tailed test, the area of the critical region on the right-hand side would be half of 0.03, i.e. 0.015.</h5>

**Q. 12**<br>
What would be the area of the critical region on the right-hand side of the distribution if the significance level (α) for an upper-tailed test is 3%?

**Answer:**
<br>
<h5 style ='color:Blue'>Here, the value of α is 0.03 (or 3%), so the area of the critical region would be 0.03 and the area of the acceptance region would be 0.97. Since this is an upper-tailed test, the critical region is only on the right-hand side of the distribution, and the area of the critical region would be 0.03.</h5>

**Q. 13**<br>
What would be the value of the cumulative probability of UCV if the significance level (α) for an upper-tailed test is 3%?

**Answer:**
<br>
<h5 style ='color:Blue'>The area of the critical region in this case would be 0.03 (as calculated in the last question), which would be the area beyond the UCV point in the distribution. So, the area till the UCV point would be 1 - 0.03, i.e. 0.97. This would be the cumulative probability of that point, going by the definition of cumulative probability.</h5>

**Pre-requisite!!!**<br>
A manufacturer claims that the average life of its product is 36 months. An auditor selects a sample of 49 units of the product, and calculates the average life to be 34.5 months. The population standard deviation is 4 months. Test the manufacturer’s claim at 3% significance level using the critical value method.
 

First, you need to formulate the hypotheses for this two-tailed test, which would be:

                                   H₀:μ = 36 months and H₁: μ ≠ 36 months


Now, you need to follow the three steps to find the critical values and make a decision.

Try out the three-step process by answering the following questions.

**Q. 14**<br>
1st step: Calculate the value of Zc from the given value of α (significance level).

Calculate the z-critical score for the two-tailed test at 3% significance level.

**Answer:**
<h5 style = 'color:Blue'>For 3% significance level, you would have two critical regions on both sides with a total area of 0.03. So, the area of the critical region on the right side would be 0.015, which means that the area till UCV (cumulative probability of that point) would be 1 - 0.015 = 0.985. So, you need to find the z-value of 0.985. The z-score for 0.9850 in the z-table is 2.17 (2.1 on the vertical axis and 0.07 on the horizontal axis).</h5>

**Q. 15**<br>
2nd step: Calculate the critical values (UCV and LCV) from the value of Zc.

Find out the UCV and LCV values for Zc = 2.17.

μ = 36 months        σ = 4 months       n (Sample size) = 49

**Answer:**
<h5 style = 'color:Blue'>The critical values can be calculated from μ ± Zc × (σ/√N) as 36 ± 2.17(4/√49) = 36 ± 1.24, which comes out to be 37.24 and 34.76.</h5>

**Q. 16**<br>
3rd step: Make the decision on the basis of the value of the sample mean x̄ with respect to the critical values (UCV and LCV).

What would be the result of this hypothesis test?

UCV = 37.24 months                 LCV = 34.76 months              Sample mean (x̄) = 34.5 months

**Answer:**
<h5 style = 'color:Blue'>The UCV and LCV values for this test are 37.24 and 34.76. The sample mean in this case is 34.5 months, which is less than LCV. So, this implies that the sample mean lies in the critical region and you can reject the null hypothesis.</h5>
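The three steps for this two-tailed example can be sketched in Python as follows. The function and variable names are assumptions for illustration; the numbers come from the problem statement.

```python
import math
from statistics import NormalDist


def critical_values_two_tailed(mu0, sigma, n, alpha):
    """Return (LCV, UCV) for a two-tailed z-test on the mean."""
    z_c = NormalDist().inv_cdf(1 - alpha / 2)   # step 1: Zc from alpha
    margin = z_c * sigma / math.sqrt(n)         # step 2: Zc * standard error
    return mu0 - margin, mu0 + margin


mu0, sigma, n, x_bar, alpha = 36, 4, 49, 34.5, 0.03
lcv, ucv = critical_values_two_tailed(mu0, sigma, n, alpha)

# Step 3: reject H0 if the sample mean falls outside [LCV, UCV]
reject = x_bar < lcv or x_bar > ucv
print(round(lcv, 2), round(ucv, 2), reject)  # 34.76 37.24 True
```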

## Critical Value Method - Examples

**Q. 17**<br>

Consider this problem — H₀: μ ≤ 350 and H₁: μ > 350

In case of a two-tailed test, you find the z-score of 0.975 in the z-table, since 0.975 was cumulative probability of UCV in that case. In this problem, what would be the cumulative probability of critical point in this example for the same significance level of 5%?

**Answer:**

<h5 style = 'color:Blue'>0.950<br>
In this problem, the area of the critical region beyond the only critical point, which is on the right side, is 0.05 (in the last problem, it was 0.025). So, the cumulative probability of the critical point (the total area till that point) would be 0.950.
</h5>

**Q. 18**<br>
Consider this problem — H₀: μ ≤ 350 and H₁: μ > 350

The next step would be to find the Zc, which would basically be the z-score for the value of 0.950. Look at the z-table and find the value of Zc.

**Answer:**<br>
<h5 style = 'color:Blue'>0.950 is not there in the z-table. So, look for the numbers nearest to 0.950. You can see that the z-score for 0.9495 is 1.64 (1.6 on the vertical axis and 0.04 on the horizontal axis), and the z-score for 0.9505 is 1.65. So, taking the average of these two, the z-score for 0.9500 is <b>1.645</b>.</h5>

**Q. 19**<br>
Consider this problem, H₀: μ ≤ 350 and H₁: μ > 350

So, the Zc comes out to be 1.645. Now, find the critical value for the given Zc and make the decision to reject or fail to reject the null hypothesis.

μ = 350     σ = 90       N (Sample size) = 36    
 x̄ = 370.16

**Answer:**<br>
<h5 style = 'color:Blue'>The critical value can be calculated from μ + Zc × (σ/√N). 350 + 1.645(90/√36) = 374.67. Since 370.16 (x̄) is less than 374.67, x̄ lies in the acceptance region and you fail to reject the null hypothesis.</h5>
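For the upper-tailed case there is only one critical value, on the right. A minimal sketch of that calculation, with the function name `upper_critical_value` assumed for illustration:

```python
import math
from statistics import NormalDist


def upper_critical_value(mu0, sigma, n, alpha=0.05):
    """Critical value for an upper-tailed z-test (H1: mu > mu0)."""
    z_c = NormalDist().inv_cdf(1 - alpha)   # all of alpha sits in the right tail
    return mu0 + z_c * sigma / math.sqrt(n)


x_bar = 370.16
ucv = upper_critical_value(mu0=350, sigma=90, n=36, alpha=0.05)
reject = x_bar > ucv
print(round(ucv, 2), reject)  # 374.67 False
```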

# Master In Hypothesis Concepts With Questions:

The questions below will give you a grip on the concepts of hypothesis testing:

Government regulatory bodies have specified that the maximum permissible amount of lead in any food product is 2.5 parts per million or 2.5 ppm. Let’s say you are an analyst working at the food regulatory body of India FSSAI. Suppose you take 100 random samples of Sunshine from the market and have them tested for the amount of lead. The mean lead content turns out to be 2.6 ppm with a standard deviation of 0.6.

 

One thing you can notice here is that the standard deviation of the sample is given as 0.6, instead of the population’s standard deviation. In such a case, you can approximate the population’s standard deviation to the sample’s standard deviation, which is 0.6 in this case.

**Q. 20**
<br>
Select the correct null and alternate hypotheses in this case.

**Answer:**
<br>
H₀: Average lead content ≤ 2.5 ppm and H₁: Average lead content > 2.5 ppm<br>
<h5 style = "color:Blue">The null hypothesis is your assumption about the population — it is based on the status quo. It always makes an argument about the population using the equality sign. The null hypothesis in this case would be that the average lead content in the food material is less than or equal to 2.5 ppm. And the alternate hypothesis is that the average lead content is greater than 2.5 ppm.</h5>

**Q. 21**<br>
Calculate the z-critical score for this test at 3% significance level.

**Answer:**<br>
1.88
<h5 style = "color:Blue">This is a one-tailed test. So, for 3% significance level, you would have only one critical region on the right side with a total area of 0.03. This means that the area till the critical point (the cumulative probability of that point) would be 1 - 0.030 = 0.970. So, you need to find the z-value of 0.970. The z-score for 0.9699 (~0.970) in the z-table is 1.88.</h5>

**Q. 22**
<br>
Now, you need to find out the critical values and make a decision on whether to raise a regulatory alarm against Sunshine or not. Select the correct option.

**Answer:**
<br>
Critical value = 2.61 ppm and Decision: Don’t raise a regulatory alarm
<br>
<h5 style = "color:Blue">Feedback:
The critical value can be calculated from μ + Zc × (σ/√N), as 2.5 + 1.88(0.6/√100) = 2.61 ppm. You need to use the + sign since the critical value is on the right-hand side (upper-tailed test). Since the sample mean 2.6 ppm is less than the critical value (2.61 ppm), you fail to reject the null hypothesis and don’t raise a regulatory alarm against Sunshine.</h5>

**Q. 23**
<br>
The critical value for this test at 3% significance level comes out to be 2.61 ppm. If you take more than 100 samples (with the same sample mean and standard deviation), how would the z-score and critical value change?

**Answer:**
<br>
<h5 style = "color:Blue">The z-score would remain the same, but the critical value would decrease.<br>
Since Zc is calculated from the given value of α (3%), it remains the same. The critical value is calculated using the formula μ + Zc × (σ/√N), since it is an upper-tailed test. If you increase the value of N, the term σ/√N shrinks, so the critical value decreases and moves closer to μ.
</h5>
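To see this effect numerically, the sketch below recomputes the critical value for the lead example at a few sample sizes. The sizes 400 and 900 are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

mu0, sigma, alpha = 2.5, 0.6, 0.03
z_c = NormalDist().inv_cdf(1 - alpha)   # depends only on alpha, not on N

# Critical value for each (hypothetical) sample size
critical_values = {
    n: mu0 + z_c * sigma / math.sqrt(n)
    for n in (100, 400, 900)
}

for n, cv in critical_values.items():
    print(n, round(z_c, 2), round(cv, 3))
```

Under these assumed sizes, the critical value drops below the sample mean of 2.6 ppm once N reaches 400, so the same sample mean would then fall in the critical region.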

# Hypothesis Testing - II

## The p-value Method

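The p-value is the probability, assuming the null hypothesis is true, of getting a sample result at least as extreme as the one observed. The higher the p-value, the weaker the evidence against the null hypothesis, so you are more likely to fail to reject it. The lower the p-value, the stronger the evidence, so you are more likely to reject it.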

![image.png](attachment:f6770a50-4c3b-4f58-b0b2-92295ef4cbee.png)

After formulating the null and alternate hypotheses, the steps to follow in order to make a decision using the p-value method are as follows:

* Calculate the value of the z-score for the sample mean point on the distribution.
* Calculate the p-value from the cumulative probability for the given z-score using the z-table.
* Make a decision on the basis of the p-value (multiply it by 2 for a two-tailed test) with respect to the given value of α (significance level).

To find the correct p-value from the z-score, find the cumulative probability first, by simply looking at the z-table, which gives you the area under the curve till that point.
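Instead of a z-table lookup, the cumulative probability can be taken from `statistics.NormalDist().cdf`. The sketch below is one possible helper; the name `p_value_from_z` and the `tail` labels are assumptions for illustration.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str = "two") -> float:
    """p-value for a z-score; tail is 'lower', 'upper' or 'two'."""
    cumulative = NormalDist().cdf(z)   # area under the curve up to z
    if tail == "lower":
        return cumulative
    if tail == "upper":
        return 1 - cumulative
    # Two-tailed: double the smaller of the two tail areas
    return 2 * min(cumulative, 1 - cumulative)


print(round(p_value_from_z(-2.0, "lower"), 4))  # 0.0228
print(round(p_value_from_z(-2.0, "two"), 4))    # 0.0455
```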


Now try some questions:

**Problems!!!**

You are working as a data analyst at an auditing firm. A manufacturer claims that the average life of its product is 36 months. An auditor selects a sample of 49 units of the product and calculates the average life to be 34.5 months. The population standard deviation is 4 months. Test the manufacturer’s claim at a 3% significance level using the p-value method.


First, formulate the hypothesis for this two-tailed test, which would be:

H₀: μ = 36 months and H₁: μ ≠ 36 months

Now, you need to follow the three steps to find the p-value and make a decision.
Try out the three-step process by answering the following questions.

**Q. 24**<br>
Step 1: Calculate the value of the z-score for the sample mean point of the distribution. Calculate the z-score for the sample mean (x̄) = 34.5 months.

**Answer:**
<br>
You can calculate the z-score for the sample mean of 34.5 months using the formula: (x̄ - μ)/(σ/√n). This gives you (34.5 - 36)/(4/√49) = (-1.5) * 7/4 = -2.62. Notice that since the sample mean lies on the left side of the hypothesised mean of 36 months, the z-score comes out to be negative.

**Q. 25**<br>
Calculate the p-value from the cumulative probability for the given z-score using the z-table. Find out the p-value for a z-score of -2.62 (corresponding to the sample mean of 34.5 months). 

Hint: The sample mean is on the left side of the distribution, and it is a two-tailed test.

**Answer:**
<br>
The value in the z-table corresponding to -2.6 on the vertical axis and 0.02 on the horizontal axis is 0.0044. Since the sample mean is on the left side of the distribution and this is a two-tailed test, the p-value would be 2 * 0.0044 = 0.0088.

**Q. 26**<br>
 Make the decision on the basis of the p-value with respect to the given value of α (significance level). What would the result of this hypothesis test be?
 
 **Answer:**<br>
 Here, the p-value comes out to be 2 * 0.0044 = 0.0088. Since the p-value is less than the significance level (0.0088 < 0.03), you reject the null hypothesis that the average lifespan of the manufacturer's product is 36 months.
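The critical value method and the p-value method should always lead to the same decision. The sketch below checks this for the manufacturer example; variable names are assumptions for illustration. Because the code does not round the z-score to -2.62, its p-value comes out at roughly 0.0087 rather than the table-based 0.0088.

```python
import math
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 36, 4, 49, 34.5, 0.03
sem = sigma / math.sqrt(n)   # standard error of the mean

# Critical value method: compare the distance from mu0 with the margin Zc * SEM
z_c = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(x_bar - mu0) > z_c * sem

# p-value method: two-tailed, so double the tail area beyond |z|
z = (x_bar - mu0) / sem
p_value = 2 * NormalDist().cdf(-abs(z))
reject_by_p_value = p_value < alpha

print(round(z, 3), round(p_value, 4))
print(reject_by_critical_value, reject_by_p_value)
```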

## The p-value Method: Examples

Let’s say you work at a pharmaceutical company that manufactures an antipyretic drug in tablet form, with paracetamol as the active ingredient. An antipyretic drug reduces fever. The amount of paracetamol deemed safe by the drug regulatory authorities is 500 mg. If the value of paracetamol is too low, it will make the drug ineffective and become a quality issue for your company. On the other hand, a value that is too high would become a serious regulatory issue.

 
There are 10 identical manufacturing lines in the pharma plant, each of which produces approximately 10,000 tablets per hour.


Your task is to take a few samples, measure the amount of paracetamol in them, and test the hypothesis that the manufacturing process is running successfully, i.e., the paracetamol content is within regulation. You have the time and resources to take about 900 sample tablets and measure the paracetamol content in each.


Upon sampling 900 tablets, you get an average content of 510 mg with a standard deviation of 110. What does the test suggest if you set the significance level at 5%? Should you be happy with the manufacturing process, or should you ask the production team to alter the process? Is it a regulatory alarm or a quality issue?

Solve the following questions in order to find the answers to the questions stated above.

One thing you can notice here is that the standard deviation of the sample of 900 is given as 110 instead of the population standard deviation. In such a case, you can assume the population standard deviation to be the same as the sample standard deviation, which is 110 in this case.


**Q. 27**<br>
The p-Value Method
Calculate the Z-score for the sample mean (x̄) = 510 mg.

**Answer:**<br>
You can calculate the Z-score for the sample mean of 510 mg using the formula: (x̄ - μ) / (σ/√N). This gives you (510 - 500)/(110/√900) = (10)/(110/30) = 2.73. Notice that since the sample mean lies on the right side of the hypothesised mean of 500 mg, the Z-score comes out to be positive.

**Q. 28**<br>
Find out the p-value for the Z-score of 2.73 (corresponding to the sample mean of 510 mg).

**Answer:**
<br>
The value in the Z-table corresponding to 2.7 on the vertical axis and 0.03 on the horizontal axis is 0.9968. Since the sample mean is on the right side of the distribution and this is a two-tailed test (because we want to test whether the value of the paracetamol is too low or too high), the p-value would be 2 * (1 - 0.9968) = 2 * 0.0032 = 0.0064.

**Q. 29**<br>

Based on this hypothesis test, what decision would you make about the manufacturing process?

**Answer:**<br>
Here, the p-value comes out to be 0.0064, which is less than the significance level (0.0064 < 0.05), and a smaller p-value gives you greater evidence against the null hypothesis. So, you reject the null hypothesis that the average amount of paracetamol in medicines is 500 mg. Since the sample mean (510 mg) is above 500 mg, this is a regulatory alarm for the company, and the manufacturing process needs to change.
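The paracetamol test can be reproduced with a few lines of Python. This is a sketch under the same assumption the text makes (the sample standard deviation of 110 stands in for the population value); the `issue` labels are illustrative names, not part of the original problem.

```python
import math
from statistics import NormalDist

mu0, s, n, x_bar, alpha = 500, 110, 900, 510, 0.05

z = (x_bar - mu0) / (s / math.sqrt(n))    # about 2.73
p_value = 2 * NormalDist().cdf(-abs(z))   # two-tailed test
reject = p_value < alpha

# The sign of z tells you which side the problem is on
if not reject:
    issue = "none"
elif z > 0:
    issue = "regulatory"   # content too high
else:
    issue = "quality"      # content too low

print(round(z, 2), round(p_value, 4), issue)
```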

A nationwide survey claimed that the unemployment rate of a country is at least 8%. However, the government claimed that the survey was wrong and the unemployment rate is less than that. The government asked about 36 people, and the unemployment rate came out to be 7%. The population standard deviation is 3%.

**Q. 30**<br>
What are the null and alternative hypotheses in this case?

**Answer:**
<br>
**H0:μ ≥ 8% and H1:μ < 8%**<br>
The null hypothesis must contain a = sign. The two arithmetic signs possible in this scenario are >= or <. Since >= has an equal to sign it becomes the null hypothesis.

**Q. 31**<br>
Based on the information above, conduct a hypothesis test at a 5% significance level using the p-value method. What is the Z-score of the sample mean point x̄ = 7%?

**Answer:**<br>
**μ = 8%; σ = 3%; n = 36; x̄ = 7%; S.E. = σ/√n = 3/√36 = 0.5. Now, Z = (x̄ − μ)/S.E. = (7 − 8)/0.5 = −2**

**Q. 32**<br>
Calculate the p-value from the cumulative probability for the given Z-score using the Z-table. In other words, find out the p-value for the Z-score of -2.0 (corresponding to the sample mean of 7%).

**Answer:**<br>
**The p-value corresponding to a Z-score of -2.0 is 0.0228.**

**Q. 33**<br>
Make the decision on the basis of the p-value with respect to the given value of α (significance level).

**Answer:**<br>
**Since the p-value of the sample mean (0.0228) is less than the significance level of 0.05, we reject the null hypothesis H0:μ≥8%.**
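An equivalent way to get this lower-tailed p-value, sketched below, is to model the sampling distribution of the mean directly as a normal distribution with mean μ and standard deviation S.E., and read off the cumulative probability at the sample mean. Variable names are assumptions for illustration.

```python
import math
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 8.0, 3.0, 36, 7.0, 0.05   # values in percent

# Sampling distribution of the mean under H0: Normal(mu0, sigma / sqrt(n))
sampling_dist = NormalDist(mu=mu0, sigma=sigma / math.sqrt(n))

# Lower-tailed test: the p-value is the area to the left of the sample mean
p_value = sampling_dist.cdf(x_bar)
reject = p_value < alpha

print(round(p_value, 4), reject)  # 0.0228 True
```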

# Types of Errors

While doing hypothesis testing, there is always a possibility of making the wrong decision about your hypothesis; such instances are referred to as 'errors'.
There are two types of errors that you might make in the hypothesis testing process: type-I error and type-II error.

![image.png](attachment:0f3c49b7-d533-4484-a1d5-e251afcaff95.png)

* A type-I error occurs when you reject a true null hypothesis. Its probability is represented by α.

* A type-II error occurs when you fail to reject a false null hypothesis. Its probability is represented by β.

The power of any hypothesis test is defined by 1 - β. The power of the test or the calculation of β is beyond the scope of this course. You can study more about the power of a test at   [Link](https://statisticsbyjim.com/hypothesis-testing/types-errors-hypothesis-testing/).
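A small simulation can make the meaning of α concrete: if the null hypothesis is actually true, a test at significance level α should wrongly reject it in roughly α of all repeated samples. The sketch below is illustrative only; the population values, the number of trials and the random seed are assumptions, not part of the course material.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)   # fixed seed so the run is repeatable

mu0, sigma, n, alpha, trials = 36, 4, 49, 0.05, 2000
z_c = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical z-score

rejections = 0
for _ in range(trials):
    # Draw a sample from a population where H0 (mu = mu0) is true
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_c:
        rejections += 1   # rejecting a true H0 is a type-I error

type_1_rate = rejections / trials
print(type_1_rate)   # expected to be close to alpha
```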



**Q. 34**<br>
Suppose you are conducting a hypothesis test where the sample size is 49. Now, you want to conduct another hypothesis test on a different sample, where the sample size is 121. The p-value calculated in the first case comes out to be 0.0512. What will happen to the p-value in the second case if you observe the same values for the sample mean and the sample standard deviation for both the cases?

**Answer:**
<br>
<h5 style = 'color: Blue'>With an increase in the sample size, the denominator of the Z-score decreases, and thus, the absolute value of Z-score increases, which means that the sample mean would move away from the central tendency towards the tails. This means that the p-value would actually decrease. Conceptually, increasing the sample size will make the distribution of the sample means narrower, and chances of the sample mean falling in the critical region increase. So, the p-value will decrease.</h5>
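The size of the drop can be estimated with a short calculation. The question does not say whether the test is one-tailed or two-tailed, so the sketch below assumes a two-tailed test; it recovers the z-score from the first p-value and rescales it by √(121/49), since the sample mean and standard deviation are unchanged.

```python
import math
from statistics import NormalDist

p_first, n_first, n_second = 0.0512, 49, 121

# Assumption: two-tailed test, so each tail holds half of the p-value
z_first = NormalDist().inv_cdf(1 - p_first / 2)

# Same (x_bar - mu) and sigma, so z grows in proportion to sqrt(n)
z_second = z_first * math.sqrt(n_second / n_first)
p_second = 2 * (1 - NormalDist().cdf(z_second))

print(round(z_first, 2), round(z_second, 2), round(p_second, 4))
```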

**Q. 35**<br>
Consider the null hypothesis that a process produces no more than the maximum permissible rate of defective items. In this situation, a type-II error would be ___.

**Answer:**
<br>
<h5 style = 'color: Blue'>A type-II error refers to not rejecting an incorrect null hypothesis. So, a type-II error would signify that the null hypothesis is actually incorrect, i.e., the process actually produces more than the maximum permissible rate of defective items, but you fail to reject it. In other words, you think that it does not produce more than the maximum permissible rate of defective items.</h5>

**Q. 36**<br>
A screening test for a serious but curable disease is similar to hypothesis testing. In this instance, the null hypothesis would be that the person does not have the disease, and the alternative hypothesis would be that the person has the disease. If the null hypothesis is rejected, it means that the disease is detected and treatment will be provided to the particular patient. Otherwise, it will not. Assuming that the treatment does not have serious side effects, in this scenario, it is better to increase the probability of ___.

**Answer:**
<br>
<h5 style = 'color: Blue'>A type-I error. Here, a type-I error means rejecting a true null hypothesis, i.e., the test detects the disease in a person who does not have it, and that person receives a treatment that has no serious side effects. A type-II error means failing to reject a false null hypothesis, i.e., a person who has the disease goes undetected and untreated, which is far more costly. For a fixed sample size, reducing the probability of one type of error generally increases the probability of the other, so in this scenario it is better to accept a higher probability of a type-I error.</h5>

# Business Understanding

## Click-Through Rate (**CTR**)
<br>
Click-Through Rate is a fairly generic yet important term in the online marketing industry. In plain terms, it measures the action rate of some entity, be it a banner ad, the home page of a website and so on. Now let's understand how the Click-Through Rate or CTR can be defined in the search context.

So in the case of online search, the CTR is the proportion of searches that were successful. When you're trying to compute the Search CTR, the relevant formula is given as -

**Search CTR =** (**Total number of successful searches**)/(**Total number of searches**)
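The formula translates directly into code. The function name `search_ctr` and the counts used below are hypothetical, chosen only to illustrate the calculation.

```python
def search_ctr(successful_searches: int, total_searches: int) -> float:
    """Proportion of searches that were successful."""
    if total_searches <= 0:
        raise ValueError("total_searches must be positive")
    return successful_searches / total_searches


# Hypothetical counts: 450 successful searches out of 1000
print(search_ctr(450, 1000))  # 0.45
```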


