# Inferential Statistics Notebook

### **Explanation of Inferential Statistics Concepts:**

1. **Confidence Intervals**:
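   - A confidence interval gives a range of plausible values for a population parameter (e.g., the mean), computed from sample data. At a 95% confidence level, about 95% of intervals constructed this way over repeated sampling would contain the true parameter.
   - For a mean, it is centered on the sample mean, with a margin of error equal to the standard error multiplied by a critical value from the t-distribution with n - 1 degrees of freedom.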

2. **Hypothesis Testing**:
   - A statistical method used to make decisions about population parameters based on sample data.
   - Involves:
     - Formulating a null hypothesis (H0) and an alternative hypothesis (H1).
     - Calculating a test statistic (e.g., t-statistic).
     - Comparing the p-value against a significance level (α) to determine whether to reject H0.

3. **Comparing Two Means (Independent Samples t-test)**:
   - Used to determine whether there is a significant difference between the means of two independent groups.
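   - Assumes that observations are independent, that each group is approximately normally distributed, and that the group variances are equal (homogeneity of variance). When the variances differ, Welch's t-test (`equal_var=False` in `scipy.stats.ttest_ind`) is the usual alternative.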

4. **ANOVA (Analysis of Variance)**:
   - A statistical method used to compare means across three or more groups.
   - Tests the null hypothesis that all group means are equal against the alternative hypothesis that at least one group mean is different.
   - Based on the F-statistic, which compares the variance between groups to the variance within groups.
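
To make the F-statistic in the last item concrete, here is a minimal sketch that computes it from the between-group and within-group sums of squares and compares the result with `scipy.stats.f_oneway`. The helper name `one_way_f`, the three group means, and the seed are illustrative assumptions and are not part of the notebook's code cell.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three groups with different means, common std of 15
rng = np.random.default_rng(0)
groups = [rng.normal(loc, 15, 30) for loc in (100, 105, 110)]

def one_way_f(groups):
    """Compute the one-way ANOVA F-statistic and p-value from sums of squares."""
    k = len(groups)                           # number of groups
    n_total = sum(len(g) for g in groups)     # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_between = ss_between / (k - 1)         # mean square between groups
    ms_within = ss_within / (n_total - k)     # mean square within groups
    f_stat = ms_between / ms_within
    # Upper-tail probability of the F distribution with (k - 1, n_total - k) df
    p_value = stats.f.sf(f_stat, k - 1, n_total - k)
    return f_stat, p_value

f_manual, p_manual = one_way_f(groups)
f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"Manual: F = {f_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```

The two lines of output should agree up to floating-point error, which shows that the F-statistic is simply the ratio of the between-group mean square to the within-group mean square.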

### Usage Instructions:
1. **Run the code** to see how inferential statistics work with sample data.
2. Modify the sample data generation process to simulate different scenarios or distributions.
3. Explore further statistical tests or analyses as needed.
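
As one possible way to follow steps 2 and 3, the sketch below changes the data generation so that the two groups have unequal spreads and sizes, then checks the equal-variance assumption with Levene's test and compares the pooled t-test with Welch's t-test. The group parameters, seed, and variable names are assumptions chosen for illustration, not values taken from the notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Hypothetical scenario: a 5-point mean shift, but the second group
# has a much larger standard deviation and a different sample size.
group_a = rng.normal(loc=100, scale=15, size=30)
group_b = rng.normal(loc=105, scale=40, size=60)

# Levene's test: the null hypothesis is that the groups have equal variances.
levene_stat, levene_p = stats.levene(group_a, group_b)
print(f"Levene statistic: {levene_stat:.2f}, p-value: {levene_p:.4f}")

# Pooled (Student's) t-test assumes equal variances.
t_pooled, p_pooled = stats.ttest_ind(group_a, group_b, equal_var=True)
# Welch's t-test drops the equal-variance assumption.
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Pooled t-test: t = {t_pooled:.2f}, p = {p_pooled:.4f}")
print(f"Welch t-test:  t = {t_welch:.2f}, p = {p_welch:.4f}")
```

When Levene's test suggests unequal variances, the Welch result is generally the safer one to report, since the pooled test can be misleading if group sizes also differ.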

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

# Set the seaborn plot theme (set_theme is the current name for sns.set)
sns.set_theme(style="whitegrid")

# Generating sample data
np.random.seed(0)
sample_size = 30
population_mean = 100
population_std = 15

# Generating a sample from a normal distribution
sample_data = np.random.normal(population_mean, population_std, sample_size)

# Function to display sample statistics
def display_sample_statistics(data):
    sample_mean = np.mean(data)
    sample_std = np.std(data, ddof=1)  # Sample standard deviation
    print(f"Sample Mean: {sample_mean:.2f}")
    print(f"Sample Standard Deviation: {sample_std:.2f}")

# Display sample statistics
display_sample_statistics(sample_data)

# 1. Confidence Intervals
def confidence_interval(data, confidence=0.95):
    n = len(data)
    mean = np.mean(data)
    std_err = stats.sem(data)
    h = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)
    return (mean - h, mean + h)

ci = confidence_interval(sample_data)
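# Format the bounds explicitly so the output does not show raw NumPy float reprs
print(f"95% Confidence Interval: ({ci[0]:.2f}, {ci[1]:.2f})")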

# Visualizing the sample data and confidence interval
plt.figure(figsize=(10, 5))
sns.histplot(sample_data, bins=10, kde=True, color="skyblue")
plt.axvline(ci[0], color='red', linestyle='--', label='Lower CI')
plt.axvline(ci[1], color='green', linestyle='--', label='Upper CI')
plt.axvline(np.mean(sample_data), color='black', label='Sample Mean')
plt.title("Sample Data with Confidence Interval")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()

# 2. Hypothesis Testing
# Null hypothesis: Sample comes from a population with a mean of 100
# Alternative hypothesis: Sample comes from a population with a mean different from 100

# One-sample t-test
t_statistic, p_value = stats.ttest_1samp(sample_data, population_mean)
print(f"T-Statistic: {t_statistic:.2f}")
print(f"P-Value: {p_value:.4f}")

# Conclusion of hypothesis test
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

# 3. Comparing Two Means (Independent Samples t-test)
# Generating another sample
sample_data_2 = np.random.normal(population_mean + 5, population_std, sample_size)

# Visualizing the two samples
plt.figure(figsize=(10, 5))
sns.histplot(sample_data, bins=10, kde=True, color="skyblue", label="Sample 1")
sns.histplot(sample_data_2, bins=10, kde=True, color="orange", label="Sample 2")
plt.title("Comparison of Two Samples")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()

# Performing an independent samples t-test
t_statistic_2, p_value_2 = stats.ttest_ind(sample_data, sample_data_2)
print(f"T-Statistic (Independent Samples): {t_statistic_2:.2f}")
print(f"P-Value (Independent Samples): {p_value_2:.4f}")

# Conclusion of the independent samples t-test
if p_value_2 < alpha:
    print("Reject the null hypothesis for independent samples.")
else:
    print("Fail to reject the null hypothesis for independent samples.")

# 4. ANOVA (Analysis of Variance)
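# Generating two more samples so that ANOVA compares four groups in total
sample_data_3 = np.random.normal(population_mean + 10, population_std, sample_size)
sample_data_4 = np.random.normal(population_mean + 15, population_std, sample_size)

# Performing one-way ANOVA across all four samples
# (sample_data_4 was previously generated but never passed to the test)
f_statistic, p_value_anova = stats.f_oneway(
    sample_data, sample_data_2, sample_data_3, sample_data_4
)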
print(f"F-Statistic: {f_statistic:.2f}")
print(f"P-Value (ANOVA): {p_value_anova:.4f}")

# Conclusion of ANOVA
if p_value_anova < alpha:
    print("Reject the null hypothesis in ANOVA.")
else:
    print("Fail to reject the null hypothesis in ANOVA.")