# Statistical Inference and Hypothesis Testing

**Statistical Inference**

Statistical inference is the process of drawing conclusions about a population from a sample of data taken from that population. Statistical methods are used to analyze the sample and to generalize its properties to the entire population. There are two main branches of statistical inference:

1. Estimation
    - Estimation involves estimating unknown parameters of a population based on sample data. This can include point estimation, where a single value is used to estimate a parameter, or interval estimation, where a range of values is provided as an estimate along with a level of confidence.

2. Hypothesis Testing
    - Hypothesis testing assesses a specific claim about a population parameter. It determines whether the sample data provide enough evidence to reject a default assumption, called the null hypothesis.
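
The two forms of estimation described above can be sketched as follows. This is a minimal illustration, not a prescribed method: the simulated `heights` sample, the seed, and the 95% confidence level are all assumptions chosen for the example, and the interval relies on the t distribution, which presumes roughly normal data.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 30 heights in cm, simulated for illustration only
rng = np.random.default_rng(0)
heights = rng.normal(loc=170, scale=5, size=30)

# Point estimation: a single value (the sample mean) estimates the population mean
point_estimate = heights.mean()

# Interval estimation: a 95% confidence interval around the sample mean,
# built from the t distribution with n - 1 degrees of freedom
confidence = 0.95
standard_error = stats.sem(heights)  # sample standard deviation / sqrt(n)
ci_low, ci_high = stats.t.interval(
    confidence, len(heights) - 1, loc=point_estimate, scale=standard_error
)

print(f"Point estimate of the mean: {point_estimate:.2f} cm")
print(f"{confidence:.0%} confidence interval: ({ci_low:.2f}, {ci_high:.2f}) cm")
```

The interval is symmetric around the point estimate, and it narrows as the sample size grows or the data become less variable.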
    
**Hypothesis Testing**

Hypothesis testing is a formal statistical method for making decisions about population parameters based on sample data. It proceeds in five steps:

1. Formulate Hypotheses:
    - Null Hypothesis $(H_0)$: This is a statement of no effect or no difference, often representing the status quo or a default assumption.
    - Alternative Hypothesis $(H_1)$ or $(H_a)$: This is a statement that contradicts the null hypothesis and represents the effect or difference that the researcher is trying to detect.
    
2. Choose Significance Level $(\alpha)$:
    - The significance level, often denoted by $\alpha$, represents the probability of making a Type I error (incorrectly rejecting a true null hypothesis). Commonly used values for $\alpha$ include 0.05 and 0.01.
    
3. Collect and Analyze Data:
    - Collect a sample of data and perform statistical analysis, depending on the nature of the hypothesis being tested (e.g., t-test, chi-square test, etc.).
    
4. Calculate Test Statistic and P-value:
    - Compute a test statistic that summarizes the information in the sample, then calculate the p-value: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.
    
5. Make a Decision:
    - Compare the p-value to the significance level:
        - If $p \leq \alpha$, reject the null hypothesis. The sample provides enough evidence in favor of the alternative hypothesis.
        - If $p > \alpha$, fail to reject the null hypothesis. The sample does not provide enough evidence in favor of the alternative hypothesis; this does not prove that the null hypothesis is true.
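
To make step 4 concrete, the sketch below computes the test statistic and p-value of a two-sided one-sample t-test by hand, using $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$, and compares them with `scipy.stats.ttest_1samp`. The simulated `sample`, the seed, and the hypothesized mean `mu_0` are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 30 simulated measurements (illustration only)
rng = np.random.default_rng(1)
sample = rng.normal(loc=170, scale=5, size=30)
mu_0 = 170  # value of the population mean under the null hypothesis

# Test statistic: distance between the sample mean and mu_0,
# measured in units of the estimated standard error
n = sample.size
sample_mean = sample.mean()
sample_std = sample.std(ddof=1)  # ddof=1 gives the sample standard deviation
t_manual = (sample_mean - mu_0) / (sample_std / np.sqrt(n))

# Two-sided p-value: probability of a t-statistic at least this extreme
# in either tail of the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Cross-check against SciPy's built-in one-sample t-test
t_scipy, p_scipy = stats.ttest_1samp(sample, mu_0)

print(f"Manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both routes should agree up to floating-point error. Writing the computation out shows that the p-value depends only on the test statistic and the degrees of freedom.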

```python
import numpy as np
from scipy import stats

# Set a random seed for reproducibility
np.random.seed(42)

# Generate a random sample of heights (assuming a normal distribution)
sample_size = 30
population_mean = 170
sample_heights = np.random.normal(loc=population_mean, scale=5, size=sample_size)

# Display the generated sample heights
print("Sample Heights:", sample_heights)

# Define the null hypothesis: population mean height is equal to 170 cm
# Define the alternative hypothesis: population mean height is not equal to 170 cm
null_hypothesis = "Population mean height is 170 cm"
alternative_hypothesis = "Population mean height is not 170 cm"

# Set the significance level (alpha)
alpha = 0.05

# Perform a two-sided one-sample t-test against the hypothesized mean
t_statistic, p_value = stats.ttest_1samp(sample_heights, population_mean)

# Display the results of the hypothesis test
print("\nNull Hypothesis:", null_hypothesis)
print("\nAlternative Hypothesis:", alternative_hypothesis)
print("\nSignificance Level (alpha):", alpha)
print("\nResults of the t-test:")
print("\nT-statistic:", t_statistic)
print("\nP-value:", p_value)

# Reject the null hypothesis when the p-value is at most the significance level
if p_value <= alpha:
    print("\nReject the null hypothesis. There is enough evidence to suggest a significant difference.")
else:
    print("\nFail to reject the null hypothesis. There is not enough evidence to suggest a significant difference.")
```

```text
Sample Heights: [172.48357077 169.30867849 173.23844269 177.61514928 168.82923313
 168.82931522 177.89606408 173.83717365 167.65262807 172.71280022
 167.68291154 167.67135123 171.20981136 160.43359878 161.37541084
 167.18856235 164.9358444  171.57123666 165.45987962 162.93848149
 177.32824384 168.8711185  170.33764102 162.87625907 167.27808638
 170.55461295 164.24503211 171.87849009 166.99680655 168.54153125]

Null Hypothesis: Population mean height is 170 cm

Alternative Hypothesis: Population mean height is not 170 cm

Significance Level (alpha): 0.05

Results of the t-test:

T-statistic: -1.1450173670383608

P-value: 0.26156414618800256

Fail to reject the null hypothesis. There is not enough evidence to suggest a significant difference.
```
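
The sample above was drawn from a population whose mean really is 170 cm, so rejecting the null hypothesis here would have been a Type I error. The sketch below illustrates what the significance level means in that situation: when many such experiments are repeated with the null hypothesis true, the test should reject in roughly a fraction $\alpha$ of them. The number of experiments, the seed, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)

alpha = 0.05
n_experiments = 10_000  # hypothetical number of repeated experiments
sample_size = 30
true_mean = 170         # the null hypothesis is true by construction

# Each row is one simulated experiment drawn from the null population
samples = rng.normal(loc=true_mean, scale=5, size=(n_experiments, sample_size))

# Run a two-sided one-sample t-test on every row at once
_, p_values = stats.ttest_1samp(samples, true_mean, axis=1)

# Fraction of experiments in which a true null hypothesis was rejected
false_rejection_rate = np.mean(p_values <= alpha)

print(f"Observed Type I error rate: {false_rejection_rate:.3f} (alpha = {alpha})")
```

The observed rate should land close to 0.05, with some simulation noise. Lowering `alpha` makes false rejections rarer, at the cost of making real differences harder to detect.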


# Conclusion

In conclusion, we conducted a hypothesis test to investigate whether the mean height of the population from which the sample was drawn differs from the hypothesized value of 170 cm. The two-sided one-sample t-test yielded a t-statistic of approximately $-1.145$ and a p-value of approximately $0.262$. With a pre-defined significance level $(\alpha)$ of 0.05, we compared the p-value to alpha and found that $p > \alpha$. Consequently, we fail to reject the null hypothesis: there is insufficient evidence to suggest that the population mean height differs from 170 cm.