In [None]:
'''
Inferential statistics:
It lets us draw conclusions about a population based on a sample of data.
This module provides functions to perform inferential statistics, including hypothesis testing and confidence intervals.
It includes functions for t-tests, z-tests, chi-squared tests, and confidence intervals for means and proportions.
It is designed to be used in conjunction with the `statistics` module for descriptive statistics.
'''

In [None]:
'''
Hypothesis Testing:
Definition:
    Hypothesis testing is a statistical method used to make inferences about a population based on a sample. 
    It involves formulating a null hypothesis (H0) and an alternative hypothesis (H1), 
    then using sample data to determine whether to reject or fail to reject the null hypothesis.
Hypothesis Testing Mechanism:
    1. Formulate the null hypothesis (H0) and alternative hypothesis (H1).
    (The null hypothesis is a statement of no effect or no difference,
    while the alternative hypothesis is the claim you are looking for evidence to support.)
    2. Choose a significance level (alpha), typically 0.05.
    3. Collect sample data and calculate the test statistic.
    4. Determine the critical value or the p-value for the test statistic.
    5. Compare the test statistic to the critical value, or the p-value to alpha, to make a decision.
    6. Draw conclusions about the population based on the results.

Example:
    Suppose we want to test whether a new drug is effective in reducing blood pressure.
    - Null Hypothesis (H0): The drug has no effect on blood pressure.
    - Alternative Hypothesis (H1): The drug reduces blood pressure.
    - We collect sample data, calculate the test statistic, and find the corresponding critical value or p-value.
    - If the test statistic falls in the rejection region (beyond the critical value) or, equivalently, the p-value is less than alpha, we reject H0 and conclude that the data support the drug reducing blood pressure.
'''
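
The six steps above can be sketched for the blood pressure example as follows. This is a minimal illustration, not a definitive analysis: the `changes` array is made-up data (after minus before, in mmHg), and the test is assumed to be a left-tailed one-sample t-test on those changes.

```python
import numpy as np
from scipy import stats

# Hypothetical data: change in blood pressure (after - before, mmHg)
# for 10 patients. Negative values mean the pressure dropped.
changes = np.array([-8, -5, -12, 3, -7, -9, -2, -6, 1, -10])

# Step 1: H0: mean change = 0 (no effect); H1: mean change < 0 (drug reduces).
mu0 = 0.0

# Step 2: choose the significance level.
alpha = 0.05

# Step 3: test statistic t = (x_bar - mu0) / (s / sqrt(n)).
n = len(changes)
x_bar = changes.mean()
s = changes.std(ddof=1)  # sample standard deviation
t_stat = (x_bar - mu0) / (s / np.sqrt(n))

# Step 4: left-tailed critical value and p-value from the t distribution
# with n - 1 degrees of freedom.
df = n - 1
t_crit = stats.t.ppf(alpha, df)
p_value = stats.t.cdf(t_stat, df)

# Step 5: two equivalent decision rules for a left-tailed test.
reject_by_critical_value = t_stat < t_crit
reject_by_p_value = p_value < alpha

# Step 6: state the conclusion.
print("t statistic:", t_stat)
print("critical value:", t_crit)
print("p-value:", p_value)
print("Reject H0:", bool(reject_by_p_value))
```

Both decision rules always agree, because the critical value is simply the point where the p-value equals alpha.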

In [None]:
'''
p-value:
Definition:
    The p-value is the probability of observing a test statistic at least as extreme as the one calculated from the sample data, 
    assuming that the null hypothesis is true. It quantifies the evidence against the null hypothesis.
Mechanism:
    1. Calculate the test statistic from the sample data.
    2. Determine the distribution of the test statistic under the null hypothesis.
    3. Calculate the p-value as the probability of observing a test statistic as extreme or more extreme than the one calculated.
    4. Compare the p-value to the significance level (alpha) to make a decision about the null hypothesis.
Formula:
    p-value = P(T >= t | H0) for a right-tailed test
    p-value = P(T <= t | H0) for a left-tailed test
    p-value = 2 * min(P(T >= t | H0), P(T <= t | H0)) for a two-tailed test
Example:
    Suppose we have a school with students who take a math test.
    - Null Hypothesis (H0): The average score of students is 75.
    - Alternative Hypothesis (H1): The average score of students is not 75.
    - We collect sample data and calculate the test statistic.
    - If we find a p-value of 0.03, it means there is a 3% chance of observing a test statistic at least as extreme as ours if the null hypothesis is true.
    - If our significance level (alpha) is 0.05, we reject the null hypothesis because 0.03 < 0.05, indicating that the average score is significantly different from 75.

'''
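
The three tail formulas above can be sketched with the standard library alone. This is an illustrative sketch that assumes the test statistic follows a standard normal distribution under H0 (a z-test); for a t-test, the t distribution's CDF would take the place of the normal CDF. The helper name `p_values` is hypothetical.

```python
from statistics import NormalDist


def p_values(z):
    """Return right-, left-, and two-tailed p-values for an observed statistic z,
    assuming the statistic is standard normal under H0 (hypothetical helper)."""
    null_dist = NormalDist(mu=0.0, sigma=1.0)
    left = null_dist.cdf(z)       # P(T <= t | H0)
    right = 1.0 - left            # P(T >= t | H0)
    two = 2.0 * min(left, right)  # two-tailed: double the smaller tail
    return {"right": right, "left": left, "two": two}


result = p_values(1.96)
for tail, p in result.items():
    print(tail, round(p, 4))
```

For z = 1.96 the two-tailed p-value lands very close to 0.05, which is why 1.96 is the familiar two-sided critical value at alpha = 0.05.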

In [2]:
# Implement the above example in code as a one-sample t-test
import numpy as np
from scipy import stats
def test_math_scores(scores, null_hypothesis_mean=75, alpha=0.05):
    """
    Perform a one-sample t-test to determine if the average score of students is significantly different from the null hypothesis mean.
    
    Parameters:
    scores (list or np.array): Sample data of student scores.
    null_hypothesis_mean (float): The mean score under the null hypothesis.
    alpha (float): Significance level for the test.
    
    Returns:
    dict: Results of the hypothesis test including test statistic, p-value, and conclusion.
    """
    # Calculate the sample mean and standard deviation
    sample_mean = np.mean(scores)
    sample_std = np.std(scores, ddof=1)  # Sample standard deviation
    n = len(scores)  # Sample size
    
    # Perform the t-test
    t_statistic, p_value = stats.ttest_1samp(scores, null_hypothesis_mean)
    
    # Determine if we reject or fail to reject the null hypothesis
    if p_value < alpha:
        conclusion = f"Reject the null hypothesis: The average score is significantly different from {null_hypothesis_mean}."
    else:
        conclusion = f"Fail to reject the null hypothesis: The average score is not significantly different from {null_hypothesis_mean}."
    
    return {
        'sample_mean': sample_mean,
        'sample_std': sample_std,
        't_statistic': t_statistic,
        'p_value': p_value,
        'conclusion': conclusion
    }
# Example usage
scores = [78, 75, 74, 75, 75, 76, 75, 78, 73, 74]
result = test_math_scores(scores)
print("Sample Mean:", result['sample_mean'])
print("Sample Standard Deviation:", result['sample_std'])
print("T-Statistic:", result['t_statistic'])
print("P-Value:", result['p_value'])
print("Conclusion:", result['conclusion'])


Sample Mean: 75.3
Sample Standard Deviation: 1.636391694484477
T-Statistic: 0.5797409637607422
P-Value: 0.5763121535388385
Conclusion: Fail to reject the null hypothesis: The average score is not significantly different from 75.
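
As a cross-check, the t-statistic in the output above can be reproduced by hand from the one-sample formula t = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n)). The sketch below uses only the standard library; the variable names are illustrative.

```python
import math
import statistics

scores = [78, 75, 74, 75, 75, 76, 75, 78, 73, 74]
mu0 = 75  # mean under the null hypothesis

n = len(scores)
sample_mean = statistics.mean(scores)
sample_std = statistics.stdev(scores)       # uses the n - 1 denominator
standard_error = sample_std / math.sqrt(n)  # estimated std. dev. of the sample mean

# One-sample t statistic with n - 1 degrees of freedom
t_manual = (sample_mean - mu0) / standard_error
print("Manual t-statistic:", t_manual)
```

The sample mean is only about 0.58 standard errors above 75, which is why the p-value is large and the null hypothesis is not rejected.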
