<a href="https://colab.research.google.com/github/Gamearonx/Hypothesis-Testing/blob/main/Assignment_Hypothesis_testing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#Import the libraries
import pandas as pd
import scipy
import numpy as np
from scipy import stats

1. Hypothesis Formulation:
- A company claims that their new energy drink increases focus and alertness.
Formulate the null and alternative hypotheses for testing this claim.


In [5]:
def test_energy_drink(drink_group, control_group, alpha=0.05):
    t_stat, p_value = stats.ttest_ind(drink_group, control_group, alternative='greater')
    conclusion = "Reject H₀" if p_value < alpha else "Fail to reject H₀"
    return {"t_stat": t_stat, "p_value": p_value, "conclusion": conclusion}

# Example Usage
drink = [85, 90, 88, 92, 87]
control = [80, 82, 79, 81, 83]

print(test_energy_drink(drink, control))


{'t_stat': 5.28571428571429, 'p_value': 0.00037053392084119457, 'conclusion': 'Reject H₀'}


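Hypotheses: H₀: μ_drink ≤ μ_control (the drink does not increase focus and alertness) versus H₁: μ_drink > μ_control (the drink increases focus and alertness). This is a one-tailed (right-tailed) test since we are specifically testing for an increase in focus and alertness rather than any change (increase or decrease).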

Explanation:

1. Define the Data: Create lists for drink_group and control_group.
2. Run the Function: Call test_energy_drink(drink, control).
3. Examine Output: The output includes:
   - t_stat: The difference between the group means divided by its standard error.
   - p_value: The probability of a t-statistic at least this large if H₀ were true. A value below alpha means the difference is statistically significant.

Conclusion:

If the p-value is less than 0.05, we reject H₀ and conclude that the energy drink significantly improves focus and alertness. If the p-value is 0.05 or greater, we fail to reject H₀: the data do not provide sufficient evidence that the drink improves focus and alertness.

Here the p-value (about 0.00037) is less than 0.05, so we reject the null hypothesis.
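
To show where the t_stat value comes from, the following sketch recomputes it by hand using only Python's standard library. It assumes the pooled-variance (equal variance) form, which is the default for `stats.ttest_ind`. The helper name `pooled_t_statistic` is illustrative and not part of the original notebook.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t-statistic with pooled variance (hypothetical helper)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pooled sample variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    # Standard error of the difference in means
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_error

drink = [85, 90, 88, 92, 87]
control = [80, 82, 79, 81, 83]
t_manual = pooled_t_statistic(drink, control)
print(round(t_manual, 4))  # 5.2857, matching t_stat above
```

The mean difference is 7.4 and the standard error is 1.4, so the statistic is a scaled difference rather than the raw difference in means.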

2. Significance Level Selection:
- A researcher is conducting a study on the effects of exercise on weight loss. What
significance level should they choose for their hypothesis test and why?


In [6]:
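def choose_significance_level(stakes: str) -> float:
    """
    Returns the significance level based on study stakes.
    Raises ValueError if stakes is not 'low', 'moderate', or 'high'.
    """
    levels = {'low': 0.10, 'moderate': 0.05, 'high': 0.01}
    key = stakes.lower()
    if key not in levels:
        # Raise the error instead of returning the exception object
        raise ValueError("Invalid input. Choose 'low', 'moderate', or 'high'.")
    return levels[key]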

# Example usage:
alpha = choose_significance_level("moderate")
print(f"Recommended significance level: {alpha}")


Recommended significance level: 0.05


Explanation:
1. Define Stakes: Choose 'low', 'moderate', or 'high'.
2. Run the Function: Call choose_significance_level(stakes).
   Example: alpha = choose_significance_level("moderate").
3. Examine Output: It will return the corresponding significance level:
   - 0.10 for low
   - 0.05 for moderate
   - 0.01 for high

This gives the recommended `alpha` for hypothesis testing based on the study's stakes.

Conclusion:
This function helps researchers select an appropriate significance level based on
the study's stakes. A moderate significance level (0.05) is commonly used, balancing
the risks of Type I and Type II errors. High-stakes studies may use a stricter level (0.01),
while exploratory studies may opt for a more lenient level (0.10). For a study of exercise
and weight loss, where a false positive has modest consequences, 0.05 is a reasonable default.
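
One way to see what a stricter alpha costs is to look at the critical value the test statistic must exceed. The sketch below assumes a right-tailed z-test and uses `statistics.NormalDist` from the standard library. It is an illustration of the trade-off, not part of the original answer.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Critical z-value for a right-tailed test at each significance level
critical_values = {
    alpha: standard_normal.inv_cdf(1 - alpha)
    for alpha in (0.10, 0.05, 0.01)
}

for alpha, z_crit in critical_values.items():
    print(f"alpha = {alpha:.2f} -> reject H0 when z > {z_crit:.3f}")
```

A smaller alpha pushes the threshold higher, which lowers the Type I error rate but makes real effects harder to detect.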

3. Interpreting p-values:
- In a study investigating the effectiveness of a new teaching method, the calculated
p-value is 0.03. What does this p-value indicate about the null hypothesis?


In [7]:
def interpret_p_value(p_value, alpha=0.05):
    return "Reject the null hypothesis" if p_value < alpha else "Fail to reject the null hypothesis"

# Example usage
p_value = 0.03
print(interpret_p_value(p_value))


Reject the null hypothesis


Explanation:

1. Define the function `interpret_p_value(p_value, alpha=0.05)` to compare the p_value with the significance level (alpha).
2. Call the function with a specific p_value (for example, `p_value = 0.03`) to interpret the result.
3. Print the output of the function, which will state whether to reject or fail to reject the null hypothesis.

Conclusion: A p-value of 0.03 means that, if the null hypothesis were true, there would be a 3% probability of observing data at least as extreme as the data actually observed. It is not the probability that the null hypothesis is true. Since 0.03 is less than the usual significance level of 0.05, the result is statistically significant: we reject the null hypothesis, which is evidence that the new teaching method has an effect.
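
As a rough illustration of where a p-value near 0.03 could come from, the sketch below converts a z-statistic into a two-tailed p-value with the standard library. The statistic 2.17 is a hypothetical value chosen for the example, since the problem reports only the p-value.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Probability of a standard normal value at least as extreme as z in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z_observed = 2.17  # hypothetical test statistic chosen for illustration
p_value = two_tailed_p_value(z_observed)
print(round(p_value, 3))  # 0.03
```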

4. Type I and Type II Errors:
- Describe a scenario in which a Type I error could occur in hypothesis testing. How
does it differ from a Type II error?

In [8]:
def hypothesis_test(null_mean, alt_mean, sample_std, sample_size, alpha=0.05):
    # H0: the population mean equals null_mean.

    # Type I scenario: H0 is true, so draw the sample from N(null_mean, sample_std)
    sample_data = np.random.normal(null_mean, sample_std, sample_size)

    # Conduct one-sample t-test of H0 on data for which H0 really holds
    _, p_value = stats.ttest_1samp(sample_data, null_mean)

    # Type I Error: Reject H0 when it's true
    type_I_error = p_value < alpha

    # Type II scenario: H0 is false, the population mean is actually alt_mean
    alternative_data = np.random.normal(alt_mean, sample_std, sample_size)
    _, p_value_alt = stats.ttest_1samp(alternative_data, null_mean)

    # Type II Error: Fail to reject H0 when the alternative is true
    type_II_error = p_value_alt >= alpha

    return type_I_error, type_II_error

# Example usage (no random seed is set, so the flags vary from run to run)
type_I, type_II = hypothesis_test(null_mean=50, alt_mean=52, sample_std=5, sample_size=30)
print(f"Type I Error: {type_I}, Type II Error: {type_II}")


Type I Error: True, Type II Error: False


Explanation:

1. **Generate a sample under H0** from a normal distribution with the null mean, the given standard deviation, and the given sample size.
2. **Conduct a one-sample t-test** comparing that sample with the null mean.
3. **Determine Type I Error** by checking if the p-value is less than the significance level (alpha), which means a true H0 was rejected.
4. **Simulate the Type II scenario** by generating a second sample whose population mean is the alternative mean, so H0 is false.
5. **Determine Type II Error** by checking if the p-value for the second sample is at least alpha, which means a false H0 was not rejected.

Conclusion: In the output shown, a Type I error occurred and a Type II error did not. Because the samples are random, other runs give other combinations.

1. **Type I Error (False Positive)**: The test rejects the null hypothesis (H0) when it is actually true. Here the first sample is drawn from a population whose mean really is 50, yet random variation can make the sample look significantly different from 50. With alpha = 0.05 this happens in about 5% of runs.

2. **Type II Error (False Negative)**: The test fails to reject the null hypothesis (H0) when the alternative hypothesis (H1) is actually true. Here the second sample is drawn from a population with mean 52, so H0 (mean = 50) is false. A Type II error would mean the test missed that difference. In the output shown the test detected it, so the flag is False.

The two errors differ in which hypothesis is actually true. A Type I error is a false alarm under a true H0, and its rate is set by alpha. A Type II error is a missed effect under a true H1, and its rate depends on the effect size, the variability, and the sample size.
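
A single run only shows whether each error happened once. To estimate how often each error occurs, the test can be repeated many times. The sketch below does this with the standard library only, under some assumptions that differ from the cell above: it uses a two-tailed z-test with a known standard deviation instead of a t-test, and the function names, the seed, and the number of trials are illustrative choices.

```python
import math
import random
from statistics import NormalDist, mean

def rejects_h0(sample, null_mean, sigma, alpha=0.05):
    """Two-tailed z-test with known sigma; True when H0: mu = null_mean is rejected."""
    z = (mean(sample) - null_mean) / (sigma / math.sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def estimate_error_rates(null_mean, alt_mean, sigma, n, trials=2000, seed=0):
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    type_1 = type_2 = 0
    for _ in range(trials):
        # H0 true: any rejection is a Type I error
        sample_h0 = [rng.gauss(null_mean, sigma) for _ in range(n)]
        type_1 += rejects_h0(sample_h0, null_mean, sigma)
        # H1 true: any failure to reject is a Type II error
        sample_h1 = [rng.gauss(alt_mean, sigma) for _ in range(n)]
        type_2 += not rejects_h0(sample_h1, null_mean, sigma)
    return type_1 / trials, type_2 / trials

type_1_rate, type_2_rate = estimate_error_rates(null_mean=50, alt_mean=52, sigma=5, n=30)
print(f"Type I rate: {type_1_rate:.3f}, Type II rate: {type_2_rate:.3f}")
```

With these settings the Type I rate should land near alpha = 0.05, while the Type II rate is much larger because a 2-unit shift is hard to detect with 30 observations and a standard deviation of 5.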

5. Right-tailed Hypothesis Testing:
- A manufacturer claims that their new light bulb lasts, on average, more than 1000
hours. Conduct a right-tailed hypothesis test with a significance level of 0.05, given a
sample mean of 1050 hours and a sample standard deviation of 50 hours.


In [9]:
import scipy.stats as stats
import math

def right_tailed_hypothesis_test(sample_mean, sample_std, sample_size, population_mean, alpha=0.05):
    # Calculate the test statistic (z-score)
    z = (sample_mean - population_mean) / (sample_std / math.sqrt(sample_size))

    # Critical value for a right-tailed test at alpha significance level
    critical_value = stats.norm.ppf(1 - alpha)

    # Return if we reject the null hypothesis
    return z > critical_value

# Example usage (the problem gives no sample size; n = 30 is assumed):
result = right_tailed_hypothesis_test(1050, 50, 30, 1000)
print(result)


True


Explanation:

1. **Calculate the z score**: $z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, where $\bar{x}$ is the sample mean, $\mu_0$ is the population mean under the null hypothesis, $s$ is the sample standard deviation, and $n$ is the sample size.
2. **Determine the critical value**: Find the z value corresponding to the significance level alpha using `stats.norm.ppf(1 - alpha)`.
3. **Compare the z score to the critical value**: Check if the z score exceeds the critical value for rejection.
4. **Reject the null hypothesis**: If z is greater than the critical value, reject the null hypothesis.

Conclusion:
If the function returns True, we reject the null hypothesis, meaning the sample provides enough evidence to support the claim that the light bulb lasts more than 1000 hours on average.
If the function returns False, we do not have enough evidence to reject the null hypothesis.
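In this case the function returns True: the z score (about 5.48) exceeds the critical value (about 1.645), so we reject the null hypothesis. The problem does not state a sample size, so this result rests on the assumed sample of 30 bulbs.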
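
The same decision can be reached through a p-value instead of a critical value. The sketch below uses only the standard library and keeps the sample size of 30 that the example above assumes.

```python
import math
from statistics import NormalDist

sample_mean, population_mean = 1050, 1000
sample_std = 50
sample_size = 30  # assumed, as in the example above; the problem does not state it

z = (sample_mean - population_mean) / (sample_std / math.sqrt(sample_size))
# Right-tailed p-value: probability of a z-score at least this large under H0
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}")   # z = 5.477
print(p_value < 0.05)   # True
```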

6. Two-Tailed Hypothesis Testing:
- A researcher wants to determine if there is a difference in mean exam scores between
two groups of students. Formulate the null and alternative hypotheses for this study as a
two-tailed test

In [10]:
def two_tailed_hypothesis_test(group_1_mean, group_2_mean):
    # Null and alternative hypotheses
    null_hypothesis = "H0: μ1 = μ2"
    alternative_hypothesis = "H1: μ1 ≠ μ2"

    # Observed difference in sample means (descriptive only: deciding whether
    # it is significant also needs the standard deviations and sample sizes)
    difference_in_means = group_1_mean - group_2_mean

    return null_hypothesis, alternative_hypothesis, difference_in_means

# Example usage:
result = two_tailed_hypothesis_test(75, 80)
print(result)


('H0: μ1 = μ2', 'H1: μ1 ≠ μ2', -5)


Explanation:

Formulate the hypotheses for a two-tailed test on the means of two groups.

- Null Hypothesis (H0): The mean exam scores of the two groups are equal (μ1 = μ2).
- Alternative Hypothesis (H1): The mean exam scores of the two groups differ (μ1 ≠ μ2).

Conclusion:
The test is two-tailed because H1 allows a difference in either direction. The difference in sample means alone (here -5) cannot decide the test, and it must not be compared with alpha, which is a probability. Reject H0 only when the p-value of a two-sample test is below alpha. In that case the data provide enough evidence to conclude that there is a significant difference in the means of the two groups.
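
To show how a two-tailed decision would be made once spread and sample sizes are known, here is a minimal sketch using a large-sample z-test from the standard library. The helper name, the standard deviations (10), and the group sizes (40) are made-up values for illustration. The problem itself supplies none of them.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(mean_1, std_1, n_1, mean_2, std_2, n_2, alpha=0.05):
    """Large-sample two-tailed z-test for H0: mu1 = mu2 (hypothetical helper)."""
    std_error = math.sqrt(std_1 ** 2 / n_1 + std_2 ** 2 / n_2)
    z = (mean_1 - mean_2) / std_error
    # Two-tailed: count extreme values in both directions
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Means of 75 and 80 as in the example call; standard deviations and
# group sizes are invented for illustration only
z, p_value, reject = two_tailed_z_test(75, 10, 40, 80, 10, 40)
print(f"z = {z:.3f}, p = {p_value:.3f}, reject H0: {reject}")
```

With smaller groups or larger spread, the same 5-point difference could fail to reach significance, which is why the means alone are not enough.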



7. One-sample t-test:
- A manufacturer claims that the mean weight of their cereal boxes is 500 grams. A
sample of 30 cereal boxes has a mean weight of 490 grams and a standard deviation of
20 grams. Conduct a one-sample t-test to determine if there is evidence to support the
manufacturer's claim at a significance level of 0.05.

In [11]:
def one_sample_t_test(sample_mean, population_mean, sample_std, sample_size, significance_level):
    t_stat = (sample_mean - population_mean) / (sample_std / (sample_size ** 0.5))
    p_value = stats.t.sf(abs(t_stat), sample_size - 1) * 2  # Two-tailed test
    return t_stat, p_value, p_value < significance_level

# Example values:
sample_mean = 490
population_mean = 500
sample_std = 20
sample_size = 30
significance_level = 0.05

t_stat, p_value, reject_null = one_sample_t_test(sample_mean, population_mean, sample_std, sample_size, significance_level)

print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")
print("Reject null hypothesis" if reject_null else "Fail to reject null hypothesis")


T-statistic: -2.7386127875258306
P-value: 0.01043738949886733
Reject null hypothesis




Explanation:

1. **Calculate the t-statistic**: This measures the difference between the sample mean and the population mean, relative to the variation in the sample.
2. **Find the p-value**: This represents the probability of obtaining a result at least as extreme as the observed result, assuming the null hypothesis is true.
3. **Compare the p-value to the significance level**:
   - If the p-value is **less than 0.05**, we **reject the null hypothesis**, meaning there is sufficient evidence to suggest the true mean weight is different from 500 grams.
   - If the p-value is **greater than or equal to 0.05**, we **fail to reject the null hypothesis**, meaning there is insufficient evidence to suggest the true mean weight is different from 500 grams.

Conclusion: Since the p-value (about 0.0104) is less than the significance level of 0.05, we reject the null hypothesis. There is enough statistical evidence to conclude that the mean weight is significantly different from 500 grams, so the sample does not support the manufacturer's claim.
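
The arithmetic behind the t-statistic can be checked step by step without SciPy. In the sketch below, the critical value 2.045 is read from a standard t-table for 29 degrees of freedom at a two-tailed alpha of 0.05. It is assumed rather than computed, because the standard library has no t-distribution.

```python
import math

sample_mean, claimed_mean = 490, 500
sample_std, sample_size = 20, 30

std_error = sample_std / math.sqrt(sample_size)   # about 3.651
t_stat = (sample_mean - claimed_mean) / std_error

# Two-tailed critical value for alpha = 0.05 and 29 degrees of freedom,
# taken from a standard t-table (assumed here rather than computed)
t_critical = 2.045

reject_null = abs(t_stat) > t_critical
print(f"t = {t_stat:.4f}, reject H0: {reject_null}")  # t = -2.7386, reject H0: True
```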


8. Two-sample t-test:
- A researcher wants to compare the mean reaction times of two different groups of
participants in a driving simulation study. Group A has a mean reaction time of 0.6
seconds with a standard deviation of 0.1 seconds, while Group B has a mean reaction
time of 0.55 seconds with a standard deviation of 0.08 seconds. Conduct a two-sample
t-test to determine if there is a significant difference in mean reaction times between the
groups at a significance level of 0.01.


In [12]:
def two_sample_t_test(mean_a, std_a, n_a, mean_b, std_b, n_b, alpha=0.01):
    t_stat, p_value = stats.ttest_ind_from_stats(mean_a, std_a, n_a, mean_b, std_b, n_b)
    return p_value < alpha

# Example input values (the problem gives no sample sizes; 30 per group is assumed)
mean_a = 0.6
std_a = 0.1
n_a = 30
mean_b = 0.55
std_b = 0.08
n_b = 30

# Conduct the t-test
result = two_sample_t_test(mean_a, std_a, n_a, mean_b, std_b, n_b)
print("Significant difference:", result)


Significant difference: False


Explanation:

Steps to perform a two-sample t-test:

1. **State the hypotheses:** Null hypothesis (no difference) vs. alternative hypothesis (there's a difference).
2. **Choose the significance level (α):** Typically 0.01 or 0.05.
3. **Calculate the t-statistic** using sample means, standard deviations, and sizes.
4. **Compare the p-value** with α: If p-value < α, reject the null hypothesis; otherwise, fail to reject it.

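Conclusion:

The function returns False: with 30 participants per group the p-value is roughly 0.04, which is greater than 0.01, so we fail to reject the null hypothesis. At the 0.01 significance level there is not enough evidence of a difference in mean reaction times between the two groups.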
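
For reference, the t-statistic can be reproduced from the summary statistics by hand. The sketch below assumes the pooled-variance form (the default for `stats.ttest_ind_from_stats`) and the same assumed group size of 30. The helper name is illustrative.

```python
import math

def pooled_t_from_stats(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Pooled-variance t-statistic and degrees of freedom from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * std_a ** 2 + (n_b - 1) * std_b ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df

# Sample sizes of 30 per group are assumed, as in the example above
t_stat, df = pooled_t_from_stats(0.6, 0.1, 30, 0.55, 0.08, 30)
print(f"t = {t_stat:.3f}, df = {df}")
```

A t-table gives a two-tailed critical value of about 2.66 at alpha = 0.01 near 60 degrees of freedom, and the statistic here is about 2.14, which is consistent with failing to reject H0.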

9. Process Control Example:
- A call center manager implements a new training program aimed at reducing call
waiting times. The average waiting time before the training program was 4.5 minutes, and
after the program, it is measured to be 4.0 minutes with a standard deviation of 0.8
minutes. Conduct a hypothesis test to determine if there is evidence that the training
program has reduced waiting times, using a significance level of 0.05.

In [13]:
import math

def hypothesis_test(before_mean, after_mean, std_dev, n, alpha=0.05):
    # Calculate standard error
    standard_error = std_dev / math.sqrt(n)

    # Calculate z-score
    z_score = (after_mean - before_mean) / standard_error

    # Get the critical value for one-tailed test
    z_critical = stats.norm.ppf(1 - alpha)

    # Left-tailed test: reject H0 only when z falls below the negative critical value
    if z_score < -z_critical:
        return f"Z-score: {z_score}, Critical value: {z_critical}. Reject the null hypothesis: Training reduced waiting times."
    else:
        return f"Z-score: {z_score}, Critical value: {z_critical}. Fail to reject the null hypothesis: No reduction in waiting times."

# Example usage (the problem gives no sample size; n = 30 is assumed)
before_mean = 4.5
after_mean = 4.0
std_dev = 0.8
n = 30

print(hypothesis_test(before_mean, after_mean, std_dev, n))


Z-score: -3.4232659844072884, Critical value: 1.6448536269514722. Reject the null hypothesis: Training reduced waiting times.


Explanation:

1. **State Hypotheses**: Null hypothesis (H0): No reduction in waiting times; Alternative hypothesis (H1): Waiting times are reduced.
2. **Set Significance Level**: Choose alpha = 0.05 for a one-tailed test.
3. **Calculate Z-Score**: Find the Z-score using the observed data (mean, standard deviation, and sample size).
4. **Compare Z-Score to Critical Value**: This is a left-tailed test, so reject H0 if Z is less than the negative of the critical value (Z < -1.645); otherwise, fail to reject H0.

Conclusion:

Reject the null hypothesis: Z is about -3.42, well below -1.645, so there is evidence that the training program reduced waiting times (given the assumed sample of 30 calls).
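
The left-tailed decision can also be expressed through a p-value, which avoids sign mistakes with the critical value. This is a minimal sketch with the standard library, keeping the assumed sample size of 30.

```python
import math
from statistics import NormalDist

before_mean, after_mean = 4.5, 4.0
std_dev = 0.8
n = 30  # assumed sample size, as in the example above

z_score = (after_mean - before_mean) / (std_dev / math.sqrt(n))
# Left-tailed p-value: probability of a z-score at least this far below zero under H0
p_value = NormalDist().cdf(z_score)

print(f"z = {z_score:.3f}, p = {p_value:.5f}, reject H0: {p_value < 0.05}")
```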

10. Interpreting Results:
- After conducting a hypothesis test, the calculated p-value is 0.02. What can you
conclude about the null hypothesis based on this result, assuming a significance level of
0.05?


In [14]:
def interpret_hypothesis_test(p_value, significance_level=0.05):
    if p_value < significance_level:
        return "Reject the null hypothesis"
    else:
        return "Fail to reject the null hypothesis"

# Example usage
p_value = 0.02
result = interpret_hypothesis_test(p_value)
print(result)


Reject the null hypothesis


Explanation:

1. **State the Hypotheses**: Define the null (H0) and alternative (Ha) hypotheses.
2. **Choose the Significance Level**: Select the significance level (alpha), typically 0.05.
3. **Collect Data & Calculate Test Statistic**: Perform the experiment and compute the relevant test statistic.
4. **Determine the P-Value**: Calculate the p-value based on the test statistic.
5. **Make a Decision**: If p-value < alpha, reject H0; otherwise, fail to reject H0.

Conclusion:

With a p-value of 0.02, which is less than the significance level of 0.05, the result suggests that there is sufficient evidence to reject the null hypothesis. This implies that the observed data provides enough support to favor the alternative hypothesis, indicating that the effect or difference being tested is statistically significant at the 5% level.
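
The decision depends on the significance level chosen before the test. As a small illustration, the hypothetical helper below applies the same rule as `interpret_hypothesis_test` at three common levels.

```python
def decisions_by_alpha(p_value, alphas=(0.10, 0.05, 0.01)):
    """Map each significance level to the decision it implies for this p-value."""
    return {
        alpha: "Reject H0" if p_value < alpha else "Fail to reject H0"
        for alpha in alphas
    }

for alpha, decision in decisions_by_alpha(0.02).items():
    print(f"alpha = {alpha:.2f}: {decision}")
# alpha = 0.10: Reject H0
# alpha = 0.05: Reject H0
# alpha = 0.01: Fail to reject H0
```

A p-value of 0.02 is significant at the 5% level but would not be at the stricter 1% level.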