In [1]:
import numpy as np
import pandas as pd
from scipy import stats

# What Is a Hypothesis Test?

YouTube Link: https://www.youtube.com/watch?v=fb8BSFr0isg <br />
YouTube Link 2: https://www.youtube.com/watch?v=0oc49DyA3hU

Hypothesis testing is a fundamental concept in statistics and research methodology. It is a formal process used to make inferences about populations based on sample data. Hypothesis testing involves assessing the validity of a statement or hypothesis about a population parameter (such as a mean, proportion, or variance) by analyzing sample data.

A test always starts by formulating two mutually exclusive hypotheses:

- **Null Hypothesis (H0)**: The default assumption. It typically states that there is no effect, no difference, or no change in the population parameter, and it is the claim the test looks for evidence against.
- **Alternative Hypothesis (Ha or H1)**: The claim you are looking for evidence to support. Depending on the research question, it states that there is an effect, a difference, or a change in the population parameter.



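## Null and Alternative Hypotheses

Definition: A hypothesis is an idea that can be tested. The null hypothesis is the default claim, and the alternative hypothesis is the competing claim that the data may support.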
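
For a test about a population mean μ, the two hypotheses are usually written in one of three forms. The reference value μ₀ is notation assumed here for illustration; it is not defined elsewhere in these notes.

$$
\begin{aligned}
\text{Two-tailed:} \quad & H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \neq \mu_0 \\
\text{Left-tailed:} \quad & H_0: \mu \ge \mu_0 \quad \text{vs.} \quad H_1: \mu < \mu_0 \\
\text{Right-tailed:} \quad & H_0: \mu \le \mu_0 \quad \text{vs.} \quad H_1: \mu > \mu_0
\end{aligned}
$$

The direction of H1 decides which tail (or tails) of the sampling distribution forms the rejection region.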

## Steps Involved in Hypothesis Test

The steps below describe how a hypothesis test is typically conducted:

1. **State the Hypotheses**:
   - **Null Hypothesis (H0)**: Begin by stating the null hypothesis, which is usually a statement of no effect, no difference, or no change in the population parameter. It represents the default assumption.
   - **Alternative Hypothesis (Ha or H1)**: Define the alternative hypothesis, which is the claim you are looking for evidence to support. It can assert the presence of an effect, a difference, or a change in the population parameter.

2. **Collect and Analyze Data**:
   - Collect a sample of data from the population of interest.
   - Perform appropriate statistical analysis on the sample data. The choice of analysis depends on the nature of your data and the hypothesis being tested. Common analyses include t-tests, chi-squared tests, ANOVA, regression analysis, etc.

3. **Choose a Significance Level (Alpha, α)**:
   - Select a significance level, denoted as α, which is the probability of making a Type I error (incorrectly rejecting a true null hypothesis). Common choices for α include 0.05 (5%) and 0.01 (1%). Ideally, fix α before looking at the data so that the choice is not influenced by the results.

4. **Calculate the Test Statistic**:
   - Based on the sample data and the null hypothesis, calculate a test statistic that quantifies the difference between the sample results and what is expected under the null hypothesis. The specific test statistic depends on the hypothesis test you're conducting.

5. **Determine the Critical Region**:
   - Determine the critical region of the test, which is the range of values of the test statistic that would lead to the rejection of the null hypothesis. You can find critical values from statistical tables or software.

6. **Compare the Test Statistic to Critical Values or Calculate p-value**:
   - Use either of two equivalent decision rules:
     - Compare the calculated test statistic to the critical values. If the test statistic falls within the critical region, you reject the null hypothesis. If it falls outside the critical region, you fail to reject the null hypothesis.
     - Alternatively, calculate a p-value: the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. If the p-value is less than α, you reject the null hypothesis. Otherwise, you fail to reject the null hypothesis.

7. **Make a Decision**:
   - Based on the comparison of the test statistic to critical values or the calculation of the p-value, make a decision:
     - If the test statistic is in the critical region or the p-value is less than α, reject the null hypothesis in favor of the alternative hypothesis.
     - If the test statistic is not in the critical region or the p-value is greater than α, fail to reject the null hypothesis.

8. **Draw a Conclusion**:
   - Based on your decision, draw a conclusion about the population parameter. If you reject the null hypothesis, the data provide evidence in favor of the alternative hypothesis. If you fail to reject the null hypothesis, you do not have sufficient evidence to support the alternative hypothesis; this does not prove that the null hypothesis is true.

9. **Report Results**:
   - Communicate your findings, including the decision made, the test statistic or p-value, the chosen significance level, and the implications for the research question.

Hypothesis testing is a structured and systematic approach to making statistical inferences and is widely used in various fields to make informed decisions based on data.
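
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not a prescribed method: the measurements are simulated, the reference value `mu_0 = 500` and all variable names are assumptions, and a one-sample t-test (`scipy.stats.ttest_1samp`) is used only because it is one common choice when the population standard deviation is unknown.

```python
import numpy as np
from scipy import stats

# Step 1: hypotheses. H0: mu = 500, H1: mu != 500 (hypothetical reference value)
mu_0 = 500

# Step 2: collect data (simulated here for illustration only)
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=497, scale=10, size=40)

# Step 3: choose the significance level
alpha = 0.05

# Step 4: compute the test statistic (and its two-tailed p-value)
result = stats.ttest_1samp(sample, popmean=mu_0)
t_stat, p_value = result.statistic, result.pvalue

# Step 5: critical value for a two-tailed t-test with n - 1 degrees of freedom
df = len(sample) - 1
critical_t = stats.t.ppf(1 - alpha / 2, df)

# Steps 6-7: both decision rules lead to the same decision
reject_by_critical_value = abs(t_stat) > critical_t
reject_by_p_value = p_value < alpha

# Steps 8-9: report the statistic, the threshold, and the decision
print(f"t = {t_stat:.3f}, critical t = {critical_t:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_by_p_value else "Fail to reject H0")
```

Comparing the statistic to the critical value and comparing the p-value to α are two views of the same rule, so the two boolean flags always agree.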

# Performing a Z-test

A Z-test is a statistical procedure that compares a sample mean to a known or claimed population mean to determine whether the difference between them is statistically significant. It is typically used when the sample is large enough and the population standard deviation is known. Here are the steps to perform a Z-test:

**Step 1: Define your null and alternative hypotheses:**
- Null Hypothesis (H0): The population the sample was drawn from has mean μ, so any gap between the sample mean and μ is due to sampling variation alone.
- Alternative Hypothesis (H1): The population mean differs from μ (two-tailed test), or is specifically smaller or larger than μ (one-tailed test).

**Step 2: Collect and organize your data:**
- Gather your sample data and calculate the sample mean (x̄).

**Step 3: Determine the population mean (μ) and standard deviation (σ):**
- You should know or have access to the population mean and standard deviation. If not, you may need to estimate them based on historical data or other sources.

**Step 4: Calculate the standard error (SE):**
- The standard error is the standard deviation of the sampling distribution of the sample mean. You can calculate it using the formula:
  SE = σ / √(n)
  Where σ is the population standard deviation, and n is the sample size.

**Step 5: Calculate the Z-score:**
- The Z-score measures how many standard errors your sample mean is away from the population mean. You can calculate it using the formula:
  Z = (x̄ - μ) / SE

**Step 6: Determine the significance level (α) and find the critical Z-value:**
- The significance level (α) is the probability of making a Type I error (rejecting the null hypothesis when it's true). Common values for α are 0.05 and 0.01. Find the corresponding critical Z-value from a standard normal distribution table or calculator. For a two-tailed test, α is split between the two tails, so the critical value is the 1 − α/2 quantile (about 1.96 for α = 0.05).

**Step 7: Compare the calculated Z-score to the critical Z-value:**
- If the absolute value of your calculated Z-score is greater than the critical Z-value, then you can reject the null hypothesis (H0) in favor of the alternative hypothesis (H1).

**Step 8: Draw a conclusion:**
- Based on the comparison in Step 7, make a conclusion about whether there is a statistically significant difference between the sample mean and the population mean.

Remember that the Z-test assumes that the sample mean is approximately normally distributed. This holds if the data themselves are normally distributed, or if the sample is large enough (a common rule of thumb is n > 30) for the Central Limit Theorem to apply. If your sample size is small or you don't know the population standard deviation, use a t-test instead.
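
To see why the choice between a Z-test and a t-test matters mostly for small samples, the sketch below compares the two-tailed 5% critical values of the standard normal and Student's t distributions. The sample sizes are arbitrary values chosen for illustration.

```python
from scipy import stats

alpha = 0.05
z_critical = stats.norm.ppf(1 - alpha / 2)  # does not depend on sample size

# Illustrative sample sizes (assumed, not taken from the text)
for n in (5, 10, 30, 100, 1000):
    t_critical = stats.t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:4d}: t critical = {t_critical:.4f}, z critical = {z_critical:.4f}")
```

The t critical value is always larger than the Z critical value and approaches it as n grows, which is why the two tests give nearly identical decisions for large samples.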

### Problem 
Suppose you are a quality control manager at a beverage company, and you want to determine if a new production process for filling soda bottles is consistent with the advertised fill volume. The company claims that the average fill volume of the bottles is 500 ml with a standard deviation of 10 ml. To test this claim, you take a random sample of 36 bottles and measure their fill volumes. The sample mean fill volume is 498 ml.

Perform a Z-test at a 5% significance level to determine if there is evidence to reject the company's claim.

### Solution (Python):

In [10]:
# Given data
population_mean = 500  # claimed population mean fill volume (ml)
population_stddev = 10  # known population standard deviation (ml)
sample_size = 36
sample_mean = 498  # sample mean fill volume (ml)
significance_level = 0.05

# Calculate the standard error of the sample mean
standard_error = population_stddev / np.sqrt(sample_size)
print("Standard Error: ", round(standard_error, 4))

# Calculate the Z-score: distance of the sample mean from the claimed mean, in standard errors
z_score = (sample_mean - population_mean) / standard_error
print("Z Score: ", round(z_score, 4))

# Find the critical Z-value for a two-tailed test at 5% significance level
critical_z_value = stats.norm.ppf(1 - (significance_level/2))
print("Critical Z Value: ", critical_z_value)

# Calculate the two-tailed p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(np.abs(z_score))
print("P Value:", round(p_value, 4))

# Perform the Z-test
if np.abs(z_score) > critical_z_value:
    print("Reject the Null Hypothesis")
else:
    print("Fail to Reject the Null Hypothesis")

Standard Error:  1.6667
Z Score:  -1.2
Critical Z Value:  1.959963984540054
P Value: 0.2301
Fail to Reject the Null Hypothesis
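
The same calculation can be wrapped in a small helper so that one-tailed tests are handled too. The function name `one_sample_z_test` and its `alternative` parameter are hypothetical, introduced here only for illustration; the sketch assumes the population standard deviation is known.

```python
import numpy as np
from scipy import stats

def one_sample_z_test(sample_mean, population_mean, population_stddev,
                      sample_size, alternative="two-sided"):
    """Return (z_score, p_value) for a one-sample Z-test (hypothetical helper)."""
    standard_error = population_stddev / np.sqrt(sample_size)
    z_score = (sample_mean - population_mean) / standard_error
    if alternative == "two-sided":
        p_value = 2 * stats.norm.sf(abs(z_score))   # both tails
    elif alternative == "less":
        p_value = stats.norm.cdf(z_score)           # lower tail only
    elif alternative == "greater":
        p_value = stats.norm.sf(z_score)            # upper tail only
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return z_score, p_value

# Soda-bottle example from the problem statement
z_score, p_value = one_sample_z_test(498, 500, 10, 36)
print(f"z = {z_score:.4f}, p = {p_value:.4f}")  # z = -1.2000, p = 0.2301
```

Because the p-value (about 0.23) is well above 0.05, the sample gives no evidence against the claimed mean of 500 ml.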


# Rejection Region Approach

In hypothesis testing, the rejection region approach is a way to determine whether to reject the null hypothesis based on the test statistic and a predefined significance level (alpha). It involves comparing the test statistic to critical values (or cutoff points) determined by the chosen significance level. If the test statistic falls into the rejection region, you reject the null hypothesis; otherwise, you fail to reject it.
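
As a rough sketch of how the rejection region depends on the direction of the alternative hypothesis, the hypothetical helper below (`z_rejection_region` is not a SciPy function) derives the critical Z-value(s) for each case with `scipy.stats.norm.ppf`.

```python
from scipy import stats

def z_rejection_region(alpha, alternative="two-sided"):
    """Describe the Z rejection region for a significance level (hypothetical helper)."""
    if alternative == "two-sided":
        critical = stats.norm.ppf(1 - alpha / 2)   # alpha split across both tails
        return f"Z < {-critical:.4f} or Z > {critical:.4f}"
    if alternative == "less":
        return f"Z < {stats.norm.ppf(alpha):.4f}"      # all of alpha in the lower tail
    if alternative == "greater":
        return f"Z > {stats.norm.ppf(1 - alpha):.4f}"  # all of alpha in the upper tail
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")

for alternative in ("two-sided", "less", "greater"):
    print(f"{alternative:>9}: {z_rejection_region(0.05, alternative)}")
# two-sided: Z < -1.9600 or Z > 1.9600
#      less: Z < -1.6449
#   greater: Z > 1.6449
```

A one-tailed test places all of α in a single tail, so its critical value is closer to zero than the two-tailed one at the same significance level.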

Let's create a problem using the rejection region approach and then solve it using Python with NumPy and SciPy:

**Problem:**

A car manufacturer claims that a new type of engine they developed produces an average fuel efficiency of 40 miles per gallon (mpg) or more. To test this claim, an independent consumer organization takes a random sample of 25 cars equipped with these engines and measures their fuel efficiency. The sample mean fuel efficiency is 38 mpg, with a sample standard deviation of 4 mpg. Test the manufacturer's claim at a 5% significance level using the rejection region approach.

**Solution (Python):**

- Null Hypothesis (H0): The new engine's average fuel efficiency is 40 miles per gallon (mpg) or more (μ ≥ 40).
- Alternative Hypothesis (H1): The new engine's average fuel efficiency is less than 40 mpg (μ < 40).

Because H1 points in one direction only, this is a left-tailed test: the rejection region lies entirely in the lower tail.


In [20]:
# Given Data
sample_size = 25
sample_mean = 38
sample_std = 4
population_mean = 40
significance_level = 0.05

# Calculate the test statistic (Z-score).
# Note: the sample standard deviation stands in for the unknown population value,
# so with n = 25 the normal critical value is only an approximation.
z_score = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))
print("Test Statistic (Z Score):", round(z_score, 4))

# H1 is "mean < 40", so this is a left-tailed test:
# the critical Z-value cuts off the lowest 5% of the standard normal distribution
critical_z_value = stats.norm.ppf(significance_level)
print("Critical Z Value:", round(critical_z_value, 4))

# The rejection region is everything below the critical value
print("Rejection Region: Z <", round(critical_z_value, 4))

# Determine if the test statistic falls into the rejection region
if z_score < critical_z_value:
    print("Reject the Null Hypothesis")
else:
    print("Fail to Reject the Null Hypothesis")

Test Statistic (Z Score): -2.5
Critical Z Value: -1.6449
Rejection Region: Z < -1.6449
Reject the Null Hypothesis
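
Since the problem gives a sample standard deviation and only 25 observations, a t-distribution with n - 1 = 24 degrees of freedom is arguably the more appropriate reference. The sketch below reruns the rejection-region comparison under that assumption; it is an alternative check, not part of the original solution.

```python
import numpy as np
from scipy import stats

sample_size, sample_mean, sample_std = 25, 38, 4
population_mean, significance_level = 40, 0.05

# The test statistic has the same formula; only the reference distribution changes
t_stat = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Left-tailed critical value and p-value from Student's t with n - 1 degrees of freedom
critical_t = stats.t.ppf(significance_level, df=sample_size - 1)
p_value = stats.t.cdf(t_stat, df=sample_size - 1)

print(f"t = {t_stat:.4f}, critical t = {critical_t:.4f}, p = {p_value:.4f}")
print("Reject the Null Hypothesis" if t_stat < critical_t
      else "Fail to Reject the Null Hypothesis")
```

The t critical value (about -1.71) is slightly more extreme than the normal one (about -1.645), but the statistic of -2.5 still falls in the rejection region, so the conclusion is unchanged.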
