<b><u>Level of significance and Confidence</u></b>

<b>We cannot be 100% sure when drawing conclusions about a population from sample data.</b>

In statistics, the level of significance and confidence are two related concepts often used in hypothesis testing and constructing confidence intervals.

1. **Level of Significance (α)**:
   - The level of significance, denoted by α, is the probability of rejecting the null hypothesis when it is actually true.
   - It represents the risk of making a Type I error, which occurs when you reject a true null hypothesis.
   - Commonly used levels of significance include 0.05 (5%), 0.01 (1%), and 0.10 (10%). These values are somewhat arbitrary and are chosen based on the context of the problem and the acceptable balance between Type I and Type II errors.

   <b>*p-value*</b>:

      The p-value is a probability value used to determine the statistical significance of an observed effect in hypothesis testing.
      It quantifies the strength of evidence against the null hypothesis. A smaller p-value indicates stronger evidence against the null hypothesis.

      If the p-value is less than or equal to the chosen level of significance (α), typically 0.05, then the null hypothesis is rejected in favor of the alternative hypothesis.

      A common interpretation is: "The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample data, assuming the null hypothesis is true."

      The p-value is compared directly with the level of significance (α):

      If the p-value is less than or equal to α, you reject the null hypothesis.
      
      If the p-value is greater than α, you fail to reject the null hypothesis.

2. **Confidence Level**:
   - The confidence level is the complement of the level of significance. If the level of significance is α, then the confidence level is (1 - α).
   - It represents the long-run proportion of confidence intervals that will contain the true population parameter, assuming the sampling procedure is repeated a large number of times.
   - For example, if you construct a 95% confidence interval for a population parameter, it means that if you were to construct many confidence intervals using the same procedure, approximately 95% of them would contain the true population parameter.
   - Common confidence levels include 90%, 95%, and 99%, but other values can be used as well.
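The repeated-sampling reading of a confidence level can be checked with a small simulation. The sketch below is only illustrative: the population values (mean 165, standard deviation 8), the sample size, the seed, and the function name `coverage_rate` are assumptions made for the demonstration, and it uses a z-interval with a known population standard deviation to keep the code within the standard library.

```python
import random
from statistics import NormalDist, mean


def coverage_rate(true_mean=165.0, sigma=8.0, n=50, confidence=0.95,
                  repetitions=2000, seed=0):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        # Draw one sample from the assumed population and build its interval.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / repetitions


print(f"Observed coverage: {coverage_rate():.3f}")
```

With enough repetitions the observed coverage should settle close to the chosen confidence level, which is the sense in which "95% confident" is meant.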

These concepts are often used together in hypothesis testing and estimation:

- In hypothesis testing, you choose a level of significance (e.g., α = 0.05), and if the p-value calculated from the sample data is less than α, you reject the null hypothesis. The smaller the α, the less likely you are to reject the null hypothesis when it is true, leading to a more stringent test.
  
- In constructing confidence intervals, you choose a confidence level (e.g., 95%), and the resulting interval is an estimate for the population parameter. The higher the confidence level, the wider the interval, because a wider interval is needed for the procedure to capture the true parameter more often.
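The link between confidence level and interval width can be seen by computing the same interval at several levels. This is a minimal sketch, assuming a known population standard deviation so that a z-interval applies; the input values and the helper name `z_interval` are illustrative, not part of the notes above.

```python
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Two-sided z-interval for a mean with known population sigma."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin


# Illustrative inputs: sample mean 163, sigma 8, n = 50.
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(x_bar=163.0, sigma=8.0, n=50, confidence=level)
    print(f"{level:.0%} interval: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

Only the critical value changes between the three lines of output, so the growth in width comes entirely from asking for a higher confidence level.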

Both concepts are essential in statistical inference, helping researchers draw conclusions from data while considering the inherent uncertainty involved in statistical analysis.

**Example 1**: Consider an example involving the average height of students in a school.


Suppose we want to test the hypothesis that the average height of students in the school is 165 centimeters. We collect a sample of 50 students and calculate their average height, which turns out to be 163 centimeters, with a standard deviation of 8 centimeters.

1. **Level of Significance (α)**:
   Let's choose a significance level of α = 0.05. This means we're willing to accept a 5% chance of incorrectly rejecting the null hypothesis (that the average height is 165 cm) when it's actually true.

2. **Hypothesis Test**:
   - Null Hypothesis (H0): The average height of students is 165 centimeters.
   - Alternative Hypothesis (H1): The average height of students is not 165 centimeters.

   We will use a one-sample t-test to compare the sample mean to the hypothesized population mean.

3. **Confidence Level**:
   Let's construct a 95% confidence interval for the true average height of students in the school.

Now, let's perform the calculations:

- **Hypothesis Test**:
  Using the sample data, we calculate the t-statistic and corresponding p-value. The standard error is 8/√50 ≈ 1.13, so t = (163 - 165)/1.13 ≈ -1.77 with 49 degrees of freedom, which gives a two-sided p-value of about 0.08. Since the p-value is greater than α (0.05), we fail to reject the null hypothesis. This means the sample does not provide sufficient evidence to conclude that the average height of students differs from 165 centimeters.

- **Confidence Interval**:
  Using the sample mean, standard deviation, and the appropriate t-distribution value (about 2.01 for 49 degrees of freedom, from a t-table or software), we calculate the 95% confidence interval as 163 ± 2.01 × 1.13, which is approximately (160.7, 165.3) centimeters. This means we are 95% confident that the true average height of students in the school lies between 160.7 and 165.3 centimeters. The interval contains 165, which agrees with the decision not to reject the null hypothesis.
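As a check on the arithmetic, the t-statistic and interval can be computed directly from the summary statistics given for this example (n = 50, mean 163 cm, standard deviation 8 cm). The sketch below uses only the standard library, so the critical value is an assumption taken from a t-table rather than computed; the variable names are illustrative.

```python
import math

n = 50            # sample size from the example
x_bar = 163.0     # sample mean (cm)
s = 8.0           # sample standard deviation (cm)
mu_0 = 165.0      # mean under the null hypothesis (cm)

# Assumption: two-sided 5% critical value of t with n - 1 = 49 degrees
# of freedom, read from a t-table (about 2.0096).
T_CRIT_49 = 2.0096

standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error
margin = T_CRIT_49 * standard_error
ci = (x_bar - margin, x_bar + margin)
reject_h0 = abs(t_stat) > T_CRIT_49

print(f"t = {t_stat:.2f}")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"reject H0: {reject_h0}")
```

The test and the interval use the same critical value, so they always agree: the null value 165 lies inside the 95% interval exactly when the two-sided test at α = 0.05 fails to reject.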

In this example, the level of significance (α = 0.05) helped us make a decision about the null hypothesis based on the p-value, while the confidence level (95%) provided an interval estimate for the true population parameter (average height). Both concepts are crucial in drawing conclusions from the data while considering uncertainty.

**Example 2**: Consider an example involving a medical study investigating the effectiveness of a new drug in reducing blood pressure.

1. **Level of Significance (α)**:
   Let's set our significance level at α = 0.05, indicating that we're willing to accept a 5% chance of incorrectly concluding that the drug is effective (rejecting the null hypothesis) when it's actually not.

2. **Hypothesis Test**:
   - Null Hypothesis (H0): The new drug has no effect on reducing blood pressure.
   - Alternative Hypothesis (H1): The new drug is effective in reducing blood pressure.

   Because each patient is measured both before and after taking the drug, we will use a paired t-test on the mean change in blood pressure.

3. **Confidence Level**:
   Let's construct a 95% confidence interval for the mean reduction in blood pressure due to the drug.

Now, let's perform the calculations:

- **Hypothesis Test**:
  Suppose we collect data from a sample of 100 patients and measure their blood pressure before and after administering the drug. We calculate the mean reduction in blood pressure and find that it's statistically significant (p < 0.05). Therefore, we reject the null hypothesis and conclude that the new drug is effective in reducing blood pressure.

- **Confidence Interval**:
  Using the same sample data, we calculate the 95% confidence interval for the mean reduction in blood pressure. Let's say we find it to be (5 mmHg, 10 mmHg). This means we are 95% confident that the true mean reduction in blood pressure due to the drug lies between 5 mmHg and 10 mmHg.
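When the same patients are measured before and after treatment, the test works on the per-patient differences. The sketch below shows the mechanics on a tiny made-up data set: the eight before/after readings are hypothetical values invented for illustration (they are not the study's data), and the one-sided critical value for 7 degrees of freedom is an assumption taken from a t-table.

```python
import math
from statistics import mean, stdev

# Hypothetical readings (mmHg) for 8 patients, for illustration only.
before = [150, 142, 160, 155, 148, 152, 145, 158]
after = [143, 138, 151, 149, 140, 147, 139, 150]

# Reduction for each patient (positive means blood pressure went down).
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_diff = mean(differences)
se_diff = stdev(differences) / math.sqrt(n)   # sample SD of the differences
t_stat = mean_diff / se_diff

# Assumption: one-sided 5% critical value of t with n - 1 = 7 degrees
# of freedom, read from a t-table (about 1.895).
T_CRIT_7 = 1.895

print(f"mean reduction = {mean_diff:.3f} mmHg, t = {t_stat:.2f}")
print(f"reject H0: {t_stat > T_CRIT_7}")
```

Pairing removes the patient-to-patient variation in baseline blood pressure, which is why only the spread of the differences enters the standard error.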

In this medical example, the level of significance helped us make a decision about the effectiveness of the drug based on the p-value, while the confidence interval provided an estimate for the true mean reduction in blood pressure with a certain level of confidence. Both concepts are crucial in medical research for drawing conclusions about treatment effectiveness while considering uncertainty and variability in patient responses.

<b><u>Hypothesis Tests</u></b>

**Z-test and t-test**: Used for testing hypotheses about population means when the population standard deviation is known (Z-test) or unknown (t-test).

**Chi-square test**: Used for testing hypotheses about the association between categorical variables.

**ANOVA (Analysis of Variance)**: Used for comparing means across multiple groups.

**F-test:** Used for testing hypotheses about variances or regression coefficients.

Hypothesis testing is a fundamental tool in statistics, helping researchers draw conclusions from data and make informed decisions based on evidence.

<img src="https://leanmanufacturing.online/wp-content/uploads/2020/10/Hypothesis-test-decision-tree.png" height="350">

<img src="https://sixsigmadsi.com/wp-content/uploads/2023/12/SSDSI-Infographics-13.jpg" height="450">

<img src="https://slideplayer.com/slide/3829413/13/images/3/Quick+recipe+%231%3A+One+sample+T+test.jpg" height="450">

**Steps of Hypothesis Testing**

**One more**
<img src="https://global-s3.s3.us-west-2.amazonaws.com/Hypothesis_Test_3726aa8772.jpg" height="450">

<img src ="https://miro.medium.com/v2/resize:fit:1268/1*QXlzi81JAPZZ_esB5y3FVA.png" height="450">

<img src ="https://miro.medium.com/v2/resize:fit:484/1*UW92hOUSdgyUhSQH2YOwKQ.png">

**Tail Tests**

<img src ="https://ars.els-cdn.com/content/image/3-s2.0-B9780128008522000092-f09-06-9780128008522.jpg" width="450">

Q1: A child psychologist says that the average time working mothers spend talking to their children is at most 11 minutes per day.
To test this hypothesis, you conduct an experiment with a random sample of 100 working mothers and find that they spend an average of 11.5 minutes per day talking with their children. Assume prior research suggests that the population standard deviation is 2.3 minutes. Conduct the test at the 5% level of significance (α = 0.05).
Perform the hypothesis test:


Steps:
To conduct hypothesis testing for this scenario, we will follow these steps:

1. **Formulate Hypotheses**:
   - Null Hypothesis (H0): The average time working mothers spend talking to their children is at most 11 minutes per day.
     (H0: $\mu \le 11$)
   - Alternative Hypothesis (Ha): The average time working mothers spend talking to their children is greater than 11 minutes per day.
     (Ha: $\mu > 11$)

2. **Choose a Significance Level (α)**:
   - Given α = 0.05 (5%). This is a one-tailed (right-tailed) test, because the alternative hypothesis states that the mean is greater than 11.

3. **Collect Data**:
   - Sample size (n) = 100 working mothers.
   - Sample mean $\bar{x} = 11.5$ minutes per day.
   - Population standard deviation $\sigma = 2.3$ minutes (based on prior research).

4. **Calculate Test Statistic**:
   - Since the population standard deviation $\sigma$ is known and the sample size is large (n > 30), we can use the z-test.
   - The test statistic for testing the population mean is calculated using:
   
<img src="https://study.com/cimages/multimages/16/zscoreformulaone.png">

     $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
   - Substitute the values:
     $Z = \dfrac{11.5 - 11}{2.3/\sqrt{100}} = \dfrac{0.5}{0.23} \approx 2.17$

5. **Determine Critical Region**:
   - Since the alternative hypothesis is one-tailed (greater than), we are interested in the right tail of the standard normal distribution.
   - At a significance level of 0.05, the critical value for a one-tailed test is Zα = 1.645 (obtained from the standard normal distribution table, or z-table).

   To read it from the table, search the body of the table for the area closest to 0.05. That area appears in the negative half of the table, at row -1.6 and column 0.04, which gives z = -1.64. Since 0.05 falls midway between the entries for -1.64 and -1.65, the value 1.645 is commonly used. Because the distribution is symmetric, the same magnitude applies to the right tail.
   
<img src="Z Table-Ref-1.png" height="450">

<a href ="https://users.stat.ufl.edu/~mripol/STA2023/Z-table.pdf">Z Table Ref</a>

6. **Make a Decision**:
   - If the calculated test statistic (Z) is greater than the critical value (1.645), we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

7. **Draw Conclusion**:
   - If we reject the null hypothesis, it means there is sufficient evidence to conclude that the average time working mothers spend talking to their children is greater than 11 minutes per day. Otherwise, we do not have enough evidence to support this claim.

Now, let's apply the decision rule to the calculated test statistic:

Since 2.17 > 1.645 (critical value), we reject the null hypothesis.

Therefore, we can conclude that there is sufficient evidence at the 5% significance level to suggest that the average time working mothers spend talking to their children is greater than 11 minutes per day.
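The critical-value approach above can be reproduced in a few lines. This is a minimal sketch using `statistics.NormalDist` from the Python standard library in place of the printed z-table; the function name `right_tailed_z_test` is an illustrative choice.

```python
import math
from statistics import NormalDist


def right_tailed_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Return (z statistic, critical value, reject H0?) for Ha: mu > mu_0."""
    z_stat = (x_bar - mu_0) / (sigma / math.sqrt(n))
    # The critical value leaves an area of alpha in the right tail.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return z_stat, z_crit, z_stat > z_crit


z_stat, z_crit, reject = right_tailed_z_test(x_bar=11.5, mu_0=11, sigma=2.3, n=100)
print(f"z = {z_stat:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

`inv_cdf(1 - alpha)` plays the role of the reverse table lookup: it returns the z-score with the requested area to its left.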



**Another Approach using P value**

α = 0.05

Z statistic = 2.17

The p-value is read from the z-table using this statistic. Split Z = 2.17 into 2.1 and 0.07, find 2.1 in the leftmost column (under z) and 0.07 in the top row, and read the value at their intersection.

<img src="Z Table-Ref.png" height="350">


The z-table provides the area to the left of the z-score, and since we are conducting a one-tailed test (greater than), we're interested in the area to the right of the z-score.

For a z-score of 2.17, the table gives an area of 0.9850 to the left, so the area to the right is:

**P-value** = 1 - 0.9850 = 0.015

<a href ="https://users.stat.ufl.edu/~mripol/STA2023/Z-table.pdf">Z Table Ref</a>


**Compare with the Significance Level (α)**:

Given that the significance level is 0.05, and 0.015 < 0.05, the p-value is less than the significance level.

**Make a Decision**:

Since the p-value is less than the significance level, we reject the null hypothesis.

**Draw Conclusion**:

We conclude that there is sufficient evidence at the 5% significance level to suggest that the average time working mothers spend talking to their children is greater than 11 minutes per day.
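The table lookup for the p-value can also be done in code. A minimal sketch, again using `statistics.NormalDist` from the standard library as a stand-in for the z-table; the variable names are illustrative.

```python
from statistics import NormalDist

z_stat = 2.17   # z statistic from the worked example
alpha = 0.05

area_left = NormalDist().cdf(z_stat)   # what the z-table reports
p_value = 1 - area_left                # right-tail area, since Ha is "greater than"

print(f"area to the left  = {area_left:.4f}")
print(f"p-value           = {p_value:.3f}")
print(f"reject H0: {p_value <= alpha}")
```

The critical-value approach and the p-value approach always lead to the same decision, because z exceeds the critical value exactly when the tail area beyond z is smaller than α.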

----

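<b>Q</b> The average height of all people in a city is 168 cm, with a population standard deviation of σ = 3.9 cm. A sample of 36 individuals is taken and their average height is found to be 169.5 cm. Test the hypothesis at the 5% level of significance (α = 0.05).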

<b>Ans</b>:

Null hypothesis: H0: $\mu = 168$ cm

Alternative hypothesis: Ha: $\mu \ne 168$ cm

α/2 = 0.05/2 = 0.025 (as this is a two-tailed test)

Z statistic = $\dfrac{169.5 - 168}{3.9/\sqrt{36}} = \dfrac{1.5}{0.65} \approx 2.31$

Z critical (for a tail area of 0.025, from the z-table) = ±1.96

As |Z statistic| = 2.31 is greater than Z critical = 1.96, we reject the null hypothesis.



**Using p value:**

Z statistic = 2.31

Area to the left of z (from the z-table) = 0.9896

Area to the right of z = 1 - 0.9896 = 0.0104

As this is a two-tailed test, the tail areas on both sides are added: p-value = 0.0104 + 0.0104 = 0.0208

This p-value is less than α (0.05), hence we reject the null hypothesis.
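The two-tailed version of the test can be sketched the same way. The code below uses `statistics.NormalDist` from the standard library instead of the z-table and does not round the statistic before looking up the area, so its last digit may differ slightly from table-based figures; the function name `two_tailed_z_test` is an illustrative choice.

```python
import math
from statistics import NormalDist


def two_tailed_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Return (z statistic, critical value, p-value) for Ha: mu != mu_0."""
    std_normal = NormalDist()
    z_stat = (x_bar - mu_0) / (sigma / math.sqrt(n))
    # alpha is split equally between the two tails.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Double the tail area beyond |z| to cover both tails.
    p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))
    return z_stat, z_crit, p_value


z_stat, z_crit, p_value = two_tailed_z_test(x_bar=169.5, mu_0=168, sigma=3.9, n=36)
print(f"z = {z_stat:.2f}, critical value = ±{z_crit:.2f}, p-value = {p_value:.3f}")
print(f"reject H0: {abs(z_stat) > z_crit}")
```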